[bisq-network/bisq] Reduce initial request size (#4233)

chimp1984 notifications at github.com
Fri Aug 7 18:00:46 UTC 2020


@dmos62 Thanks for your input!

After a discussion with @sqrrm we agreed the best way forward is to limit a first mitigation to trade statistics objects only. This will reduce the bandwidth requirement by about 50% as well as seed node processing load by about 50% (as trade statistics make up about 50% of the hashes). It will also reduce the RAM required by the Bisq app if that data is only loaded on demand.

Rough idea:

Prune the existing trade statistics by removing all data older than a cut-off date, e.g. 2 months (TBD). Beyond that, handling of trade statistics data stays exactly as it is now, but instead of 4 years of data we only deal with 2 months. I assume that will be about 1-5% of the current data.
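
As a rough sketch of the pruning, assuming a generic keyed map and a placeholder payload interface (none of these names are actual Bisq classes, and the 2-month constant is the TBD value from above):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;

class TradeStatisticsPruner {
    // Assumed cut-off of 2 months (~60 days); the real value is TBD.
    private static final long CUT_OFF_MS = Instant.now()
            .minus(60, ChronoUnit.DAYS).toEpochMilli();

    interface DatedPayload {
        long getDateMs(); // creation date in ms since epoch
    }

    // Drop every entry older than the cut-off date.
    static <K, V extends DatedPayload> void prune(Map<K, V> map) {
        map.entrySet().removeIf(e -> e.getValue().getDateMs() < CUT_OFF_MS);
    }
}
```
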
At each release we add a snapshot named with the release version which contains all historical data up to a defined cut-off date (it can overlap with the in-memory data to be on the safe side; details have to be defined once we work on it).
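
A hypothetical sketch of looking up such a version-named bucket as a shipped resource file; the naming scheme and the store name are assumptions, the real layout is one of the details to define:

```java
import java.io.InputStream;
import java.util.Optional;

class HistoricalDataBuckets {
    // e.g. "TradeStatistics3Store_1.3.9" would contain all historical
    // data up to the cut-off date defined for release 1.3.9.
    static Optional<InputStream> openBucket(String releaseVersion) {
        String resourceName = "/TradeStatistics3Store_" + releaseVersion;
        return Optional.ofNullable(
                HistoricalDataBuckets.class.getResourceAsStream(resourceName));
    }
}
```
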
This historical data is available to new users as it is in the binary. Users who have not updated will request it with a new request call from the seed nodes: they pass the versions they have and receive the missing ones. This is only done once the data is actually needed (the user opens the historical trade statistics UI). We could even split that UI into recent trades, which covers the in-memory data, and historical trades, which makes the request and/or loads the shipped resource files. After leaving this UI we unload that data from RAM. Loaded data and in-memory data might overlap, but as they are hash maps those duplicates get cancelled out.
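
To illustrate the on-demand part (all type names here are placeholders, not actual Bisq classes): buckets received from the seed node or loaded from the shipped resources get merged with the recent in-memory data into one hash map, where overlapping entries collapse to a single copy, and leaving the UI just drops the reference:

```java
import java.util.HashMap;
import java.util.Map;

class OnDemandHistoricalData<K, V> {
    private Map<K, V> historical; // null while the historical UI is closed

    // Merge buckets received from a seed node (or loaded from shipped
    // resource files) with the recent in-memory data. Overlapping entries
    // cancel out because both sides are keyed by the same hash.
    void load(Map<String, Map<K, V>> bucketsByVersion, Map<K, V> inMemoryData) {
        Map<K, V> merged = new HashMap<>(inMemoryData);
        bucketsByVersion.values().forEach(merged::putAll);
        historical = merged;
    }

    // Called when the user leaves the historical trade statistics UI.
    void unload() {
        historical = null; // data becomes eligible for garbage collection
    }
}
```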

This approach greatly reduces the risk, as if anything goes wrong we only screw up trade statistics, which are non-critical data.
If the concept of using version-based buckets works out well, it can be extended to the next data type, like account witness objects (there we don't have the concept of on-demand data, as all the data is required).
 
Backward compatibility:
We need to take care that not-updated nodes still work properly. I guess there are no big issues besides potential extra load when an updated seed node requests data from a not-updated seed. Here we get a bit of extra load, as the not-updated seed node would deliver all the missing data (up to the hard cap limit). The updated node would ignore that historical data, as it will not pass the date-check predicate. By coordinating the seed updates and assigning each seed another seed to connect to at startup (one which has already updated), we can minimize that load. Beyond that, we could use other tools like capabilities, version numbers, new data fields as flags, or an activation date to make the transition smooth.
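
A minimal sketch of that date-check predicate (the names are placeholders): the updated node only accepts entries newer than its cut-off, so the extra historical data a not-updated seed delivers is dropped instead of stored, and the only cost is the wasted transfer:

```java
import java.util.function.Predicate;

class DateCheckPredicate {
    // dateMs: creation date of a received entry in ms since epoch.
    // Entries failing the check are never added to the local map.
    static Predicate<Long> newerThan(long cutOffMs) {
        return dateMs -> dateMs >= cutOffMs;
    }
}
```
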
Users who have not updated for longer than 2 months would see gaps in the trade statistics, but that is not a major issue as that data is only used for informational purposes. We could even require an update via the filter after a certain time.

I might work on that myself in a couple of weeks if I find time. Otherwise, if anyone wants to pick it up, please leave a message.
