[bisq-network/bisq] Track p2p data store files using Git LFS (#4114)

Thu Apr 2 19:07:31 UTC 2020

When I put together this PR earlier today, I was grossly underestimating the number of clones that this repository gets every month. This is the Clones graph in `Insights > Traffic`:

<img width="1002" alt="image" src="https://user-images.githubusercontent.com/301810/78289139-f8659980-7521-11ea-9746-ee0670b894e9.png">

We averaged roughly 28 clones per day from 14 unique cloners over the last 14 days. I'm assuming that most of those cloners are not just different Travis CI machines. If most of them are in fact Travis CI, then we might be OK from a bandwidth perspective because of how we're caching LFS data in Travis, as mentioned above. But let's assume for a moment that all of those clones are normal, non-Travis clones. That would mean 28 (clones/day) x 30 (days/month) x 65 (MB of LFS data per clone) / 1024 (MB/GB) = `53 GB per month` of LFS bandwidth, which would already put us over the 50 GB/month limit of one data pack. One data pack is $60/year, and we'd need at least two, meaning $120 per year. That's not prohibitively expensive for the DAO to pay, but it does make me pause and think before pushing to execute this.
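The bandwidth estimate above can be reproduced with a short script. This is a back-of-the-envelope sketch that uses only the figures from this comment. Like the comment, it assumes that every clone downloads the full 65 MB of LFS data and that one data pack adds 50 GB/month of bandwidth for $60/year. It ignores any free bandwidth quota.

```python
import math

# Figures from the comment; the 30-day month and "every clone downloads all
# LFS objects" are the comment's own simplifying (worst-case) assumptions.
CLONES_PER_DAY = 28
DAYS_PER_MONTH = 30
LFS_MB_PER_CLONE = 65
MB_PER_GB = 1024

# One GitHub LFS data pack, as described in the comment.
PACK_BANDWIDTH_GB = 50
PACK_COST_PER_YEAR_USD = 60


def monthly_lfs_bandwidth_gb(clones_per_day, days, mb_per_clone):
    """Worst-case monthly LFS download volume in GB."""
    return clones_per_day * days * mb_per_clone / MB_PER_GB


def packs_needed(bandwidth_gb):
    """Whole data packs needed to cover the given monthly bandwidth."""
    return math.ceil(bandwidth_gb / PACK_BANDWIDTH_GB)


gb = monthly_lfs_bandwidth_gb(CLONES_PER_DAY, DAYS_PER_MONTH, LFS_MB_PER_CLONE)
packs = packs_needed(gb)
print(f"{gb:.1f} GB/month -> {packs} data packs -> ${packs * PACK_COST_PER_YEAR_USD}/year")
# prints "53.3 GB/month -> 2 data packs -> $120/year"
```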

Let's step back for a moment. I think we have the following options:

1. **Do nothing.** Stay with the status quo: let the repository size continue to grow with every release and keep all data stores in Git history forever. We've been tracking these binaries since July 2018. In roughly 20 months, they've grown the repository from around 20 MB to 360 MB. This growth will only continue from here on out, and likely faster, assuming that Bisq continues to succeed and therefore accumulates more of this data (e.g. historical trade data) more rapidly.

2. **Use Git LFS as proposed here.** Bite the bullet, pay the $120 per year and stop the growth of the repository now.

3. **Run our own Git LFS server.** This wouldn't necessarily save money, as we'd have to maintain the instance and pay for the bandwidth anyway. It's also not obvious what the go-to implementation would be; see https://github.com/git-lfs/git-lfs/wiki/Implementations.

4. **Re-think our approach to distributing data stores entirely.** This is a larger conversation, but now may be the right time to have it. I was absent when the decision was made to start tracking and distributing these data stores, but I believe it was a pragmatic decision made to alleviate growing load on our seed nodes. So the question arises: why put all of this on the seed nodes in the first place? Why not have the entire network of Bisq nodes share this data with one another, more or less the way that Bitcoin full nodes help new or recently offline nodes catch up with the blockchain? If we combined this approach with relaxing the requirement that new Bisq clients have all historical data right out of the gate, we could even make it efficient. Certain kinds of data, e.g. account signing data, would need to be comprehensive, i.e. new clients would want and need to get all of that data up front. But they don't necessarily need the full 56,000-trade history (at time of writing) just to get going. They could request the last N trades up front, and then slowly catch up over time, or only catch up with the full history on request from the user. So the basic idea is: why not take this load off our seed nodes and distribute it across the whole network? This may well have been considered and dismissed for good reason, and my apologies if this is rehashing old territory, but the refresher might be good to have anyway.
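The "recent trades first, full history later" part of option (4) can be sketched roughly as below. This is a hypothetical illustration, not Bisq's actual P2P protocol. `TradeRecord`, `Peer`, `fetch_before`, `Client`, `bootstrap` and `backfill_step` are invented names. A real design would also need to validate data, choose peers, and handle peers that disagree.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch only: none of these names come from the Bisq code base.


@dataclass(frozen=True)
class TradeRecord:
    """Simplified stand-in for one historical trade-statistics entry."""
    trade_id: int
    timestamp: int

    def key(self) -> Tuple[int, int]:
        # Sort/paging key; trade_id breaks ties between equal timestamps.
        return (self.timestamp, self.trade_id)


class Peer:
    """Any network node that already holds the trade history."""

    def __init__(self, history: List[TradeRecord]):
        self._history = sorted(history, key=TradeRecord.key)

    def fetch_before(self, cursor: Optional[Tuple[int, int]], limit: int) -> List[TradeRecord]:
        """Return up to `limit` of the newest records older than `cursor`, oldest first."""
        if limit <= 0:
            return []
        older = [r for r in self._history if cursor is None or r.key() < cursor]
        return older[-limit:]


class Client:
    """A new node that starts with recent trades and backfills the rest later."""

    def __init__(self) -> None:
        self.trades: List[TradeRecord] = []

    def bootstrap(self, peer: Peer, recent_n: int) -> None:
        # Up front: fetch only the most recent N trades.
        self.trades = peer.fetch_before(None, recent_n)

    def backfill_step(self, peer: Peer, batch_size: int) -> bool:
        # Later: prepend one batch of older history; returns False once nothing is left.
        cursor = self.trades[0].key() if self.trades else None
        batch = peer.fetch_before(cursor, batch_size)
        self.trades = batch + self.trades
        return bool(batch)


history = [TradeRecord(trade_id=i, timestamp=1_000 + i) for i in range(10)]
peer = Peer(history)
client = Client()
client.bootstrap(peer, recent_n=3)
print([r.trade_id for r in client.trades])  # [7, 8, 9]
while client.backfill_step(peer, batch_size=4):
    pass  # in practice this could run slowly in the background, or on user request
print(len(client.trades))  # 10
```

The sketch pages by a `(timestamp, trade_id)` cursor rather than by timestamp alone. This prevents two trades with the same timestamp from being skipped at a batch boundary.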

Discussion is welcome. We don't have to pull the trigger on this PR urgently (though I would like to do it before we add or update any more data stores); better to get this right. Having considered all of the above, I would probably still go with option (2), i.e. merging this PR and just paying for GitHub's LFS service, as it's the easiest thing to do and will just work. But I'd like to hear others' thoughts.

View it on GitHub:
https://github.com/bisq-network/bisq/pull/4114#issuecomment-608048830