[bisq-network/bisq] ConcurrentModificationException FileManager::saveToFile (#3752)

Julian Knutsen notifications at github.com
Tue Dec 10 19:35:12 UTC 2019


I got a little too deep into the perf work here and wanted to share the summary of information for the person that picks this up.

I think guaranteeing correctness outweighs performance at a high-level. The current code "works" because there isn't a lot of state change once Bisq is up and running and even if there is corruption it is likely overridden quickly enough with the next save. All of this persistence is just a network bandwidth optimization so worst case, refetching everything from the seed nodes is an option.

But, fixing the correctness and keeping it performant is not a small project. The persisted objects have multiple fields and mutable getters/setters which make it extremely difficult to understand the calling contexts for each object and how to efficiently synchronize them. There are plenty of ways this can be done at the design-level, but translating that into actually PRs will require quite a bit of work.

With that said, I have a few small diffs and traces I want to share to help future viewers understand the performance.

### Make it correct

This patch just does the `toProtoMessage` synchronization on the `UserThread`. It ensures that the iteration bug can't happen because the iteration also happens on the `UserThread`.

### Make it fast (sort of...)
If the `toProtoMessage` has to be done on the `UserThread`, the next step is to profile that code and see what is slow. The results show that the `DaoStateStore` is the worst offender followed by `AccountAgeWitnessStore` and `TradeStatistics2Store` as far as `totalTime`. The preferences payload is also over 100ms.

![toProtoMessage_UserThread_comparisons](https://user-images.githubusercontent.com/8082291/70560417-6504f480-1b3d-11ea-81fd-9861dc5bdecd.JPG)

**Why is DaoStateStore slow?**

Millions of "fast" calls. Having to serialize every transaction and every block is expensive.

![DaoStateStore_toProtoMessage](https://user-images.githubusercontent.com/8082291/70560675-e2306980-1b3d-11ea-9891-2d153c447d5b.JPG)

__Can we cache Block/Tx toProtoMessage?__

No. It turns out that the overhead of caching those objects not only increases the memory footprint but generating the `hashCode()` for the cache/map takes just as long as serialization.

__Can we cache serialized DaoState for faster iterative saves?__
Potentially, but this requires synchronizing all DaoState modifications and without testing it is high-risk to modify any of that code.

__Can we use concurrency?__

Yes. Using the `parallelStream()` construct for the top 3 objects (`DaoStateStore`, `AccountAgeWitnessStore`, `TradeStatistics2Store`) shows some performance gains. Keep in mind this is on a 4 core VM so actual results on low-end hardware won't be nearly as interesting. It also uses the global ForkJoin pool so if this pattern is repeated on multiple threads it also won't be as beneficial. Even with these changes, you are looking at 100ms+ delays on the `UserThread` which may be a non-starter.

![toProtoMessage_UserThread_parallelStream_comparison](https://user-images.githubusercontent.com/8082291/70561163-d09b9180-1b3e-11ea-9253-3bdc76ada7b1.JPG)

So overall, these are the "best" option that I found in my investigation that doesn't require an extended dev effort. Performance is tricky and there isn't enough testing of the persistable objects where I would feel confident making changes without a lot of manual testing overhead or high-risk bug introduction. See https://github.com/bisq-network/bisq/pull/3773 for my comments on similar perf optimizations in the same layer.

I am also just documenting this here as opposed to PRs because I don't think I am the right person to be responsible for this change and the performance impacts moving forward. I'm happy to help investigate, but someone with much more context of the system should be responsible for the maintenance of it.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/bisq-network/bisq/issues/3752#issuecomment-564208822
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bisq.network/pipermail/bisq-github/attachments/20191210/e4baa7bc/attachment.html>


More information about the bisq-github mailing list