[bisq-network/bisq] Reduce initial request size (#4233)

chimp1984 notifications at github.com
Thu Aug 6 06:11:03 UTC 2020


A few "high level" comments/thoughts:

By splitting up the data into groups we are creating "blocks" (I use that terminology intentionally). Uniquely identifying a block in a distributed system is a challenge.

The approach used here takes a shortcut: the release maintainer is the single entity that defines what the right "block" is. This might not be an issue (or at least it does not make things worse than they are now), but I am not 100% sure. The checkpoint terminology reveals the centralisation issue.

Is there another way to do it?
We have date fields in most (all?) of the relevant data structures and could use them for sorting the items, but there could be multiple items with the same date. To get unique IDs and a stable sort order we could add the hash fields, create a tuple {date, hash} as the ID, and sort primarily by date, secondarily by hash.
But there can be late insertions as well, which could mutate our data blocks. For certain data we don't allow insertion outside a defined time tolerance window (`DateTolerantPayload`). So we could at least consider blocks of a certain age as immutable. Newer blocks might get "reorged". Handling "reorgs" sounds like looking for trouble though...
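
To make the {date, hash} idea concrete, here is a minimal Python sketch (not Bisq code). The `Item` class, `split_blocks`, the count-based `block_size` and the one-day `TOLERANCE_MS` are all assumptions made for illustration; the real tolerance would come from whatever `DateTolerantPayload` enforces.

```python
from dataclasses import dataclass

# Hypothetical tolerance window; an assumption, not the value Bisq uses.
TOLERANCE_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Item:
    date_ms: int   # date field carried by the payload
    hash_hex: str  # payload hash, used as tie-breaker for equal dates


def item_id(item):
    """Unique ID and sort key: primarily date, secondarily hash."""
    return (item.date_ms, item.hash_hex)


def split_blocks(items, block_size, now_ms):
    """Return (immutable_blocks, mutable_tail).

    Items older than the tolerance window can no longer receive late
    insertions, so full blocks built from them are treated as immutable.
    Everything else stays in a tail that may still get "reorged".
    """
    ordered = sorted(items, key=item_id)
    cutoff = now_ms - TOLERANCE_MS
    settled = [i for i in ordered if i.date_ms < cutoff]
    pending = [i for i in ordered if i.date_ms >= cutoff]
    full = len(settled) - len(settled) % block_size
    blocks = [settled[k:k + block_size] for k in range(0, full, block_size)]
    return blocks, settled[full:] + pending
```

Two nodes holding the same settled items would derive the same blocks without anyone defining them centrally, but only if every node enforces the same tolerance rule.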

Another approach could be to use a fixed time window (e.g. 10 days) and fill it with all data whose date falls into that period. The GetData request would send the index of the time window and the hash of its content. If the seed node has a different hash, it will send the whole package. For older data that should be very rare, and the additional network overhead might be a good trade-off for the simpler approach and low processing costs. The hash gets calculated once and is stored with the package. A problem is how to handle the case where a user has more data items than the seed node: the user would then request the package again each time. If a seed node is missing data, it would get DoSed by the network...
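
A rough Python sketch of the time-window variant, under stated assumptions: items are modelled as `(date_ms, hash_hex)` tuples with fixed-length hex hashes, the window hash is a SHA-256 over the sorted item hashes, and the function names are made up for this example.

```python
import hashlib
from collections import defaultdict

WINDOW_MS = 10 * 24 * 60 * 60 * 1000  # 10-day window, as in the example above


def window_index(date_ms):
    return date_ms // WINDOW_MS


def window_hashes(items):
    """Map window index -> hash over the sorted item hashes in that window.

    Assumes fixed-length hex hashes, so plain concatenation is unambiguous.
    """
    buckets = defaultdict(list)
    for date_ms, hash_hex in items:
        buckets[window_index(date_ms)].append(hash_hex)
    return {
        idx: hashlib.sha256("".join(sorted(hashes)).encode()).hexdigest()
        for idx, hashes in buckets.items()
    }


def windows_to_send(requester_hashes, seed_items):
    """Seed side: indices of windows whose hash differs from the requester's."""
    seed_hashes = window_hashes(seed_items)
    return sorted(
        idx for idx, h in seed_hashes.items() if requester_hashes.get(idx) != h
    )
```

The sketch also shows the weak spot: if the requester holds one item the seed lacks, the hashes for that window never match, so `windows_to_send` keeps returning the same window on every request.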

Another approach might be to use Bloom filters for the data request, but that is likely expensive for the seed node to process. The load on the seed node to process the response is probably a bigger problem than the 2.5 MB request.
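
For a feel of the processing cost, here is a toy Bloom filter in Python. It is only a sketch: the class, the SHA-256-based bit positions and `items_to_send` are assumptions for illustration, not an existing Bisq or library API.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key)
        )


def items_to_send(request_filter, seed_keys):
    """Seed side: probe the filter once per stored key."""
    return [k for k in seed_keys if not request_filter.might_contain(k)]
```

The seed node has to compute `num_hashes` hashes for every item it stores, on every request, which is where the processing cost comes from. False positives would also cause a few missing items to be skipped.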

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/bisq-network/bisq/pull/4233#issuecomment-669723550