[bisq-network/bisq-core] Fix broadcast issues (#165)

Tue Aug 21 19:20:20 UTC 2018

chirhonul commented on this pull request.

I have left some review comments.

Nothing major, just nits.

> @@ -29,13 +30,22 @@
     @Nullable
     private final Transaction localTx;
     @Getter
-    private int delay;
-
-    public TxBroadcastTimeoutException(Transaction localTx, int delay) {
+    private final int delay;
+    @Getter
+    private final Wallet wallet;
+
+    /**
+     *

(Optional) It's great that the docstring was added here, but maybe there could be some description, either here or in a class-level docstring, of what the `TxBroadCastTimeoutException` means semantically?

> @@ -42,8 +42,41 @@
         void onSuccess(Transaction transaction);

         default void onTimeout(TxBroadcastTimeoutException exception) {
-            log.error("TxBroadcaster.onTimeout " + exception.toString());
-            onFailure(exception);
+            Transaction tx = exception.getLocalTx();
+            if (tx != null) {
+                String txId = tx.getHashAsString();
+                log.warn("TxBroadcaster.onTimeout called: {}\n" +
+                                "We optimistically assume that the tx broadcast succeeds later and call onSuccess on the " +
+                                "callback handler. This behaviour carries less potential problems than if we would trigger " +
+                                "a failure (e.g. which would cause a failed create offer attempt of failed take offer attempt).\n" +
+                                "We have no guarantee how long it will take to get the information that sufficiently BTC " +

Nit: Where we say `sufficiently BTC nodes`, it seems more clear to instead say `sufficently many Bitcoin nodes`.

> @@ -42,8 +42,41 @@
         void onSuccess(Transaction transaction);

         default void onTimeout(TxBroadcastTimeoutException exception) {
-            log.error("TxBroadcaster.onTimeout " + exception.toString());
-            onFailure(exception);
+            Transaction tx = exception.getLocalTx();
+            if (tx != null) {
+                String txId = tx.getHashAsString();
+                log.warn("TxBroadcaster.onTimeout called: {}\n" +
+                                "We optimistically assume that the tx broadcast succeeds later and call onSuccess on the " +
+                                "callback handler. This behaviour carries less potential problems than if we would trigger " +
+                                "a failure (e.g. which would cause a failed create offer attempt of failed take offer attempt).\n" +
+                                "We have no guarantee how long it will take to get the information that sufficiently BTC " +
+                                "nodes have reported back to BitcoinJ that the tx is in their mempool.\n" +
+                                "In normal situations " +
+                                "that's very fast but in some cases it can take minutes (mostly related to Tor connection " +
+                                "issues). So if we just go on in the application logic and treat it as successful and the " +
+                                "tx will be broadcasted successfully later all is fine.\n" +

Nit: s/`broadcasted`/`broadcast`

> -            log.error("TxBroadcaster.onTimeout " + exception.toString());
-            onFailure(exception);
+            Transaction tx = exception.getLocalTx();
+            if (tx != null) {
+                String txId = tx.getHashAsString();
+                log.warn("TxBroadcaster.onTimeout called: {}\n" +
+                                "We optimistically assume that the tx broadcast succeeds later and call onSuccess on the " +
+                                "callback handler. This behaviour carries less potential problems than if we would trigger " +
+                                "a failure (e.g. which would cause a failed create offer attempt of failed take offer attempt).\n" +
+                                "We have no guarantee how long it will take to get the information that sufficiently BTC " +
+                                "nodes have reported back to BitcoinJ that the tx is in their mempool.\n" +
+                                "In normal situations " +
+                                "that's very fast but in some cases it can take minutes (mostly related to Tor connection " +
+                                "issues). So if we just go on in the application logic and treat it as successful and the " +
+                                "tx will be broadcasted successfully later all is fine.\n" +
+                                "If it will fail to get broadcasted, " +

Nit: s/`broadcasted`/`broadcast`

> +            if (tx != null) {
+                String txId = tx.getHashAsString();
+                log.warn("TxBroadcaster.onTimeout called: {}\n" +
+                                "We optimistically assume that the tx broadcast succeeds later and call onSuccess on the " +
+                                "callback handler. This behaviour carries less potential problems than if we would trigger " +
+                                "a failure (e.g. which would cause a failed create offer attempt of failed take offer attempt).\n" +
+                                "We have no guarantee how long it will take to get the information that sufficiently BTC " +
+                                "nodes have reported back to BitcoinJ that the tx is in their mempool.\n" +
+                                "In normal situations " +
+                                "that's very fast but in some cases it can take minutes (mostly related to Tor connection " +
+                                "issues). So if we just go on in the application logic and treat it as successful and the " +
+                                "tx will be broadcasted successfully later all is fine.\n" +
+                                "If it will fail to get broadcasted, " +
+                                "it will lead to a failure state, the same as if we would trigger a failure due the timeout." +
+                                "So we can assume that this behaviour will lead to less problems as otherwise.\n" +
+                                "Long term we should implement better monitoring for Tor and the provided Bitcoin nodes to " +

(Optional) I think the text of this log message is really useful to remember why we are doing this. But maybe it would be better to discuss future how we could make future improvements etc in the form of a `TODO` comment in the code, or a Github issue?

> +                                "callback handler. This behaviour carries less potential problems than if we would trigger " +
+                                "a failure (e.g. which would cause a failed create offer attempt of failed take offer attempt).\n" +
+                                "We have no guarantee how long it will take to get the information that sufficiently BTC " +
+                                "nodes have reported back to BitcoinJ that the tx is in their mempool.\n" +
+                                "In normal situations " +
+                                "that's very fast but in some cases it can take minutes (mostly related to Tor connection " +
+                                "issues). So if we just go on in the application logic and treat it as successful and the " +
+                                "tx will be broadcasted successfully later all is fine.\n" +
+                                "If it will fail to get broadcasted, " +
+                                "it will lead to a failure state, the same as if we would trigger a failure due the timeout." +
+                                "So we can assume that this behaviour will lead to less problems as otherwise.\n" +
+                                "Long term we should implement better monitoring for Tor and the provided Bitcoin nodes to " +
+                                "find out why those delays happen and add some rollback behaviour to the app state in case " +
+                                "the tx will never get broadcasted.",
+                        exception.toString(),
+                        txId,

There seems to be three arguments after the main string given, but I only see one `{}` in the string itself. It seems that either there are some missing `{}` where `txId` and `exception.getDelay()` should be placed in the string, or those arguments should be dropped from `log.warn()`.

> +                                "issues). So if we just go on in the application logic and treat it as successful and the " +
+                                "tx will be broadcasted successfully later all is fine.\n" +
+                                "If it will fail to get broadcasted, " +
+                                "it will lead to a failure state, the same as if we would trigger a failure due the timeout." +
+                                "So we can assume that this behaviour will lead to less problems as otherwise.\n" +
+                                "Long term we should implement better monitoring for Tor and the provided Bitcoin nodes to " +
+                                "find out why those delays happen and add some rollback behaviour to the app state in case " +
+                                "the tx will never get broadcasted.",
+                        exception.toString(),
+                        txId,
+                        exception.getDelay());
+
+                // The commit is required in case the tx is spent by a follow up tx as otherwise there would be an
+                // InsufficientMoneyException thrown. But in some test scenarios we also got issues with wallet
+                // inconsistency if the tx was committed twice. It should be prevented by the maybeCommitTx methods but
+                // not 100% if that is always the case. Just added that comment to make clear that this might be a risky

Nit: I think the comment mentioning why the comment was added is a little too meta.. maybe instead of:
```
Just added that comment to make clear that this might be a risky strategy and might need improvement if we get problems.
```

We could say:
```
This might be risky and we should re-evalute the choices we made here if there's future issues.
```

> +                                "If it will fail to get broadcasted, " +
+                                "it will lead to a failure state, the same as if we would trigger a failure due the timeout." +
+                                "So we can assume that this behaviour will lead to less problems as otherwise.\n" +
+                                "Long term we should implement better monitoring for Tor and the provided Bitcoin nodes to " +
+                                "find out why those delays happen and add some rollback behaviour to the app state in case " +
+                                "the tx will never get broadcasted.",
+                        exception.toString(),
+                        txId,
+                        exception.getDelay());
+
+                // The commit is required in case the tx is spent by a follow up tx as otherwise there would be an
+                // InsufficientMoneyException thrown. But in some test scenarios we also got issues with wallet
+                // inconsistency if the tx was committed twice. It should be prevented by the maybeCommitTx methods but
+                // not 100% if that is always the case. Just added that comment to make clear that this might be a risky
+                // strategy and might need improvement if we get problems.
+                exception.getWallet().maybeCommitTx(tx);

(Optional) It feels a little odd to have the `TxBroadcastTimeoutException` class know about the `Wallet`.. I understand why we need that reference here, but it feels like it's not a concern that the exception class should need to handle.

> @@ -94,8 +126,8 @@ public void onSuccess(@Nullable Transaction result) {
                             UserThread.execute(() -> callback.onSuccess(tx));
                         } else {
                             stopAndRemoveTimer(txId);
-                            UserThread.execute(() -> callback.onFailure(new TxBroadcastException("We got an onSuccess callback for " +
-                                    "a broadcast which got already triggered the timeout.", txId)));
+                            log.warn("We got an onSuccess callback for a broadcast which got already triggered " +

Nit: Instead of `for a broadcast which got already triggered the timeout` it seems more clear to say `for a broadcast which already triggered the timeout`.

> @@ -92,6 +92,7 @@ protected void run() {
                         @Override
                         public void onSuccess(Transaction transaction) {
                             if (!completed) {
+                                trade.setDepositTx(transaction);

(Optional) It looks surprising to find the `transaction` referenced in this `onSuccess()` handler, when we are in a context of creating the very same `depositTx`.. I understand from conversation with @ManfredKarrer that this edit was necessary when testing on regtest, where the `onSuccess()` handler would fire before the `trade.setDepositTx(depositTx)` line on L118. Maybe a comment could be added here to explain this, to help future maintainers understand?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/bisq-network/bisq-core/pull/165#pullrequestreview-148205147
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bisq.network/pipermail/bisq-github/attachments/20180821/1ebfd523/attachment-0001.html>