[bisq-network/projects] Reduce bandwidth-requirement on Bisq app startup (#25)

Florian Reimair notifications at github.com
Wed Apr 1 11:58:44 UTC 2020


> Why would a Bisq node need to send so much data to its seednode(s)?
- sorry, I took that as common knowledge, because that is how Bisq always worked
- let the ELI5 commence:
  1. on startup, the Bisq app asks the seed node for a "distributed-database update"
  2. To avoid having the seednode send all the data (> 12MB), Bisq tells the seednode which objects it already has (ie. it sends data *to* the seednode).
  3. The seednode then sends only the data the Bisq app does not already have.

- The trouble comes with success: we now have more than 100k objects in the "distributed database", which makes Bisq send 100k "keys" to the seednode (100k * 20-byte hashes = 2MB, which is substantial and rising).
- And as if that were not enough: for redundancy, the Bisq app asks two seednodes for data
- given a "bad" internet connection, Bisq simply fails to start
  - ie. if net upstream is below ~35kB/s = 280kb/s (roughly 4MB within the 120-second connection timeout)
  - that does not seem like a lot, but there are bug reports out there (labeled as critical bugs), and I encountered it myself while away from home
  - Tor is not at fault: p50 of tor speed is [28Mb/s](https://metrics.torproject.org/advbwdist-perc.html?start=2020-01-02&end=2020-04-01&p=100&p=50)
  - we need more bandwidth as time goes on (because the database grows -> the number of objects grows -> the request size grows)
  - if we succeed with bisq, the required bandwidth will outgrow infrastructure development
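To make the bullet points above concrete, here is the back-of-envelope arithmetic (assumptions: 20-byte hashes, two seednodes queried, 120-second connection timeout, as stated above):

```python
# Startup upload cost and the minimum upstream needed to beat the timeout.

NUM_OBJECTS = 100_000   # objects in the distributed database
HASH_SIZE = 20          # bytes per key sent to the seednode
SEED_NODES = 2          # queried for redundancy
TIMEOUT_S = 120         # connection timeout in seconds

request_bytes = NUM_OBJECTS * HASH_SIZE    # per seednode: 2 MB
total_upload = request_bytes * SEED_NODES  # per startup: 4 MB
min_upstream = total_upload / TIMEOUT_S    # ~33 kB/s just to make the timeout
```

Anything below roughly 33 kB/s of upstream means the request cannot complete within the timeout, matching the failure mode described above.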

> Do you intend here to check these binary blobs into the main Bisq repository, or something else? I would really like to avoid adding more binary data to the repository (as we're already doing with all the stuff in `p2p/src/main/resources`).

yes, I intend to check these binary blobs into the main Bisq repository. It is exactly about the stuff in `p2p/src/main/resources` which is a snapshot of the "distributed-database" we ship with each release.
- At the moment, there is only one blob that gets bigger and bigger. Plus, it replaces the old one, so the repo size grows by `size(t) = size(t-1) + size(newData)` per release. (Actually, it is several files for different message types, but overall it is one blob of data.)
- After this project is done, a new blob will be added with every release, with `size(t) = size(newData)`; the "old" blobs are left untouched and used as they are (historical data does not change).
- Doing it that way is a very minimal change to the current release process, and we can focus on fixing the real issue quickly.
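The difference between the two growth formulas compounds quickly. A small illustration (made-up numbers: a 12MB initial snapshot and 1MB of new data per release; git history keeps every committed version, so the old scheme adds the full replaced snapshot each time):

```python
# Compare total bytes added to the repository under the two schemes.

def repo_growth(releases: int, initial: int, delta: int) -> tuple[int, int]:
    """Return (old_scheme, new_scheme) total bytes added over `releases`."""
    old_scheme = 0  # each release commits the full, replaced snapshot
    new_scheme = 0  # each release commits only the new delta blob
    snapshot = initial
    for _ in range(releases):
        snapshot += delta
        old_scheme += snapshot   # size(t) = size(t-1) + size(newData)
        new_scheme += delta      # size(t) = size(newData)
    return old_scheme, new_scheme

old, new = repo_growth(10, 12_000_000, 1_000_000)  # 175 MB vs. 10 MB
```

Under these assumptions, ten releases add 175MB to the repo under the current scheme but only 10MB under the proposed one.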

> I'd like us to look into doing this properly with Git LFS instead
- I totally agree that we have to move away from committing binary data to the repo, but
  - using [_insert your favorite storage technology here_] does not collide with this project
  - can be done later
  - should be done sooner than later
  - will look into Git LFS as a follow-up project

All in all, this project aims to make small steps towards a more reliable service. Rethinking the storage synchronization is a whole other can of worms.

Btw, just checked: we now have 110k objects; at the time of project creation it was 104k -> approx. +5% in 25 days.
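If that observed growth rate were to continue (a rough compounding projection, illustration only), the per-seednode request size would evolve like this:

```python
# Project the per-seednode request size, assuming the ~5% / 25 days
# growth observed above continues to compound.

HASH_SIZE = 20  # bytes per key

def request_size_after(days: int, objects: int = 110_000,
                       growth: float = 0.05, period: int = 25) -> int:
    """Projected per-seednode request size in bytes after `days`."""
    objects *= (1 + growth) ** (days / period)
    return int(objects * HASH_SIZE)

# Today: 110k objects -> 2.2 MB per seednode; after a year at this rate,
# the request would more than double.
```

This is of course only a trend line, but it shows why the request size outpaces typical home-connection upstream improvements.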




https://github.com/bisq-network/projects/issues/25#issuecomment-607206262