[bisq-network/bisq] No connection, Peer discovery message fills 220Mb of log in short period of time (Issue #5996)

jmacxx notifications at github.com
Thu Jan 27 05:06:34 CET 2022


There is a minor resource leak of callable future tasks in `PeerGroup.java`.

The leak occurs during a connection group backoff period.  A connection group backoff is initiated whenever a connection dies if there is no existing backoff in progress.  The group backoff is indicated in the logs by a message like "Waiting 41869 ms before next connect attempt to [address]".  During this time, if another connection dies it will leak one callable future task.  
In the case of Bisq with maxconnections=7 and no inet availablity, the leak will be 6 futures for every connection group backoff.

The leak grows with each cycle of connection attempts.

`triggerConnectionsJob` is invoked by each of the leaked resources perpetually (with random delays) until all connections are restored - this is what fills up the logs.

Observed size of the leaked future was 100 after 5 minutes, 200 after 10 minutes, 300 after an hour, 1300 after 10 hours.  
It has the ability to completely fill 220Mb of bisq logs if left running with no network connection for several days.  (Surprisingly, this happens in more user logs than you would expect -> possibly they configured to use their own node which then went down).



Example: [bisq_20.zip](https://github.com/bisq-network/bisq/files/7947440/bisq_20.zip) from a user; 82000 lines filled by the leaked log messages in half an hour.  All 20 of their logs were like this.


----


### To reproduce:

- Run Bisq with no reachable internet.  After an hour the logging will clearly show the problem.
- Make the internet reachable again.  Observe that nodes become connected after a few minutes.




Modifying [this logging code](https://github.com/bisq-network/bitcoinj/blob/42bbae98c08a7410a96fce48fb4174495e846f51/core/src/main/java/org/bitcoinj/core/PeerGroup.java#L560) enables you to see the leak count increase over time.


    log.info("Waiting {} ms before next connect attempt {} leak={}", delay, addrToTry == null ? "" : "to " + addrToTry, scheduledThreadPoolExecutor.getQueue().size());


----



### The problem from a code perspective.

PeerGroup `triggerConnectionsJob`:  basic function is "do the next thing necessary regarding making peer connections", which in many cases can be simply do nothing at this time and try again later.  That means it can be called many times without ill-behaviour, (except producing exessive logging, and the resource leak).

Algorithm:  The job does one of the following, in order of precedence:

    1- check for a local bitcoin node. (only done first time)
    2- perform a discovery process to find nodes to connect to.  (Bisq does not use discovery, instead we provide a list of node addresses).
    3- if ipv6 connectivity has gone down, remove all ipv6 node addresses from the list of potential connections.
    4- decide there are no available nodes to connect to at this time, so schedule a future time to re-do triggerConnectionsJob.
    5- decide we have enough active node connections, so do nothing.
    6- decide that we have a node to connect to, but the backoff time has not elapsed yet, so log and schedule a future time to re-do triggerConnectionsJob.
    7- open a connection to an available node, and schedule an immediate call to triggerConnectionsJob (so that all available connections get started).

When a node connection dies, triggerConnectionsJob is invoked via callable future, which then essentially calls itself via delayed futures bootstrapping until all new connections can be made.
When connections have been made, it stops calling itself, relying on the connection death event to re-start the process.
If more than one connection dies within 10 seconds (typically or whatever the variable backoff time happens to be), then callable futures are leaked.


----

### Example:

    bitcoinj is provided with 4 addresses for connecting to: a,b,c,d.
    (in practice it is provided 16 federated Bisq addresses, simplified here for demonstration purposes).

    bitcoinj opens connections to a,b,c,d as described above algorithm point 7.
    the connection to a dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future1)
    the connection to b dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future2)
    the connection to c dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future3)
    the connection to d dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future4)

    at this point future1 is useful, future 2,3,4 are leaked resources.

    10 seconds passes.
    the scheduler is invoked by future1, it opens connections to a,b,c,d as described above algorithm point 7.
    the scheduler is invoked by future2, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future5).
    the scheduler is invoked by future3, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future6).
    the scheduler is invoked by future4, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future7).

    future 5,6,7 are now the leaked resources

    the connection to a dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future8)
    the connection to b dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future9)
    the connection to c dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future10)
    the connection to d dies, bitcoinj triggers the scheduler which creates a future call for +10 seconds to reconnect (resource=future11)

    at this point future8 is useful, future 5,6,7,9,10,11 are leaked resources.

    the scheduler is invoked by future5, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future12).
    the scheduler is invoked by future6, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future13).
    the scheduler is invoked by future7, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future14).

    the scheduler is invoked by future8, it opens connections to a,b,c,d as described above algorithm point 6.
    the scheduler is invoked by future9, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future15).
    the scheduler is invoked by future10, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future16).
    the scheduler is invoked by future11, it logs peer discovery provided no addresses and will retry in 10 seconds (resource=future17).

    future 12,13,14,15,16,17 are now the leaked resources

    [repeats]

----


### Solution:

The leak and excessive logging can be mitigated by the following change at [L580 of PeerGroup.java](https://github.com/bisq-network/bitcoinj/blob/42bbae98c08a7410a96fce48fb4174495e846f51/core/src/main/java/org/bitcoinj/core/PeerGroup.java#L580):

    private void triggerConnections() {
    +    if (scheduledThreadPoolExecutor.getQueue().size() > 1) {
    +        return;
    +    }

(`scheduledThreadPoolExecutor` has to be made accessible via `createPrivateExecutor` in order to query its queue size).


----

### Conclusion

Overall I suppose the leak is fairly benign, the annoyance is user's logs can fill up easily, preventing any problem diagnosis.  The resource leak seems quite small and does clear up once connectivity is restored.



-- 
Reply to this email directly or view it on GitHub:
https://github.com/bisq-network/bisq/issues/5996#issuecomment-1022828159
You are receiving this because you are subscribed to this thread.

Message ID: <bisq-network/bisq/issues/5996/1022828159 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bisq.network/pipermail/bisq-github/attachments/20220126/9627b9b3/attachment-0001.htm>


More information about the bisq-github mailing list