[bisq-network/bisq] Redesign DAO state monitoring to avoid performance degradation (Issue #5779)

chimp1984 notifications at github.com
Tue Oct 26 14:58:52 CEST 2021


The PR https://github.com/bisq-network/bisq/pull/5782 is now ready for review.

### Here is a specification of the new behaviour.

There is a new boolean field in PreferencePayload, `useFullModeDaoMonitor`, which indicates whether the user wants to create the daoStateHashes themselves or take the hashes received from seed nodes during initial BSQ block parsing.

As described above, that dramatically reduces the cost of parsing. For 4000 blocks (about 1 month) parsing takes about 10-15 min. if `useFullModeDaoMonitor` is set to true and about 2-4 sec. if it is set to false (the default).
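
A minimal sketch (all names except `useFullModeDaoMonitor` are hypothetical) of how such a flag could gate hash creation during initial block parsing:

```java
public class BlockParsingSketch {
    private final boolean useFullModeDaoMonitor;

    public BlockParsingSketch(boolean useFullModeDaoMonitor) {
        this.useFullModeDaoMonitor = useFullModeDaoMonitor;
    }

    void onBlockParsed(int blockHeight) {
        if (useFullModeDaoMonitor) {
            // Full mode: hash the daoState at every block (~100-200 ms each,
            // which adds up to ~10-15 min. for 4000 blocks).
            createHashFromDaoState(blockHeight);
        }
        // Lite mode: skip hashing here. Hashes are requested from the seed
        // nodes once initial parsing has completed (~2-4 sec. in total).
    }

    void createHashFromDaoState(int blockHeight) {
        // Placeholder for the expensive serialize-and-hash step.
    }
}
```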

If `useFullModeDaoMonitor` is disabled (default for normal users) it behaves as follows:
- The user gets the missing BSQ blocks from the seed nodes. Parsing is super fast (usually < 1 ms). We do not create the daoStateHash at each new block.
- Once initial BSQ block parsing is done we request the DaoStateHashes from the seed nodes, starting from the last hash we have. The data is about 26 bytes per hash (as we removed the prevHash, it is now 50% smaller).
- When we receive the hashes we create our `DaoStateBlock` from the peer's `DaoStateHash` and set the `isSelfCreated` field to false to indicate that we have not created the hash ourselves.
- We apply the checks and add the peer to the `peersMap`.
- For the last block we instead do the hashing ourselves, using our daoState and the previous hash (from the peer), add the peer to our `peersMap` and apply the conflict checks (see the sketch after this list).
- After that we create a snapshot of our current state
- From then on we create the hash ourselves at each new block. The shortcut with the peers' hashes is only applied during parsing to avoid the performance penalty.
- Snapshotting then happens at the next snapshot interval (a 20-block grid).
- If the user shuts down and starts up again, they might see some of the blocks for which we had created the hash ourselves now delivered by the peer again. That is because those blocks had not been added to the next snapshot yet (that only happens after about 20 blocks have passed). The important thing is that the current block's hash is always created by ourselves, so we can verify that this hash, based on the previous hash from the peer, is not in conflict with the network.
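
Here is a minimal sketch of that lite-mode flow. `DaoStateBlock`, `DaoStateHash` and `isSelfCreated` come from the description above; the field layout and the use of SHA-256 are my assumptions for illustration:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class LiteModeSyncSketch {
    // Simplified stand-ins for the real classes.
    record DaoStateHash(int height, byte[] hash) {}
    record DaoStateBlock(DaoStateHash stateHash, boolean isSelfCreated) {}

    static List<DaoStateBlock> applyPeersHashes(List<DaoStateHash> peersHashes,
                                                byte[] mySerializedDaoState)
            throws NoSuchAlgorithmException {
        List<DaoStateBlock> result = new ArrayList<>();
        for (int i = 0; i < peersHashes.size(); i++) {
            DaoStateHash peerHash = peersHashes.get(i);
            if (i < peersHashes.size() - 1) {
                // Adopt the peer's hash directly; mark it as not self-created.
                result.add(new DaoStateBlock(peerHash, false));
            } else {
                // For the last block we hash our own daoState chained to the
                // previous (peer-provided) hash; a conflict with the network
                // would surface here.
                byte[] prevHash = i > 0 ? peersHashes.get(i - 1).hash() : new byte[0];
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                digest.update(mySerializedDaoState);
                digest.update(prevHash);
                result.add(new DaoStateBlock(
                        new DaoStateHash(peerHash.height(), digest.digest()), true));
            }
        }
        return result;
    }
}
```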

If `useFullModeDaoMonitor` is enabled (default for headless apps like seed nodes) it behaves as follows (the behaviour has not really changed):
- The user gets the missing BSQ blocks from the seed nodes (in lite mode, otherwise from bitcoind RPC). Parsing is super fast (usually < 1 ms) but creating the hash at each new block takes about 100-200 ms. This is because the daoState data is already quite large; the `spentInfoMap`, for instance, has about 80k items. Several steps add up to that cost:
    - 1. Serializing the daoState excluding the blockchain
    - 2. Adding the last block
    - 3. Concatenating the data as bytes with the previous hash
    - 4. Creating a hash out of that data
    - Some small parts could be optimized, but that would improve the overall cost by only 20-30% (the hash creation is sketched after this list)
- Snapshots are created at each snapshot interval. This is done on background threads but still costs about 1-4 sec. as the daoState is about 150 MB. Writing to disk includes serialisation and the write operation; both are slow for data that large.
- After the initial sync is done, we create our hash at each new block and a new snapshot at each snapshot interval.
- We request the hashes from the seed nodes and apply the checks to see if we are in sync.
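
The four hashing steps above could look roughly like this; the `byte[]` parameters stand in for the real serialized data and SHA-256 is an assumption for illustration:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FullModeHashSketch {
    static byte[] createHash(byte[] serializedDaoStateWithoutBlocks,
                             byte[] serializedLastBlock,
                             byte[] prevHash) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(serializedDaoStateWithoutBlocks); // 1. daoState excl. the blockchain
        digest.update(serializedLastBlock);             // 2. the last block
        digest.update(prevHash);                        // 3. concatenated with the previous hash
        return digest.digest();                         // 4. hash of the combined data
    }
}
```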

### What are the drawbacks if `useFullModeDaoMonitor` is disabled?
- If all nodes have it disabled, potential conflicts might not be discovered, as there would be very few nodes actually creating the hash from their daoState. At least the seed nodes run in enabled mode, but it would be good if developers and power users also enabled the mode for a bit more resilience.
- As long as there are seed nodes with correct hashes, a potential conflict should be discovered. All seed nodes need to have the same hashes; if not, we have to investigate and find the issue (a minimal conflict check is sketched below).
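
As an illustration, a conflict check against the hashes collected in the `peersMap` could be as simple as the following (the map layout and method name are hypothetical):

```java
import java.util.Arrays;
import java.util.Map;

public class ConflictCheckSketch {
    // Returns true if any peer reports a different hash for the same block.
    static boolean isInConflict(byte[] myHash, Map<String, byte[]> peersMap) {
        return peersMap.values().stream()
                .anyMatch(peerHash -> !Arrays.equals(peerHash, myHash));
    }
}
```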

### Any other option to fix that performance problem?
The DAO state scaling problem needs to be addressed at some point. When redesigning it there might be new options for dealing with the issue. At the moment I don't see any easy solution, and the one suggested here comes with quite low risk while improving the user experience a lot.




