<p></p>

<p><b>@dmos62</b> requested changes on this pull request.</p>

<p>Requested changes. Mostly related to clean up, but still significant. I found readability difficult. Would appreciate more helpful names and more intermediary variables.</p>

<p>I wrote up a description of the PR and the problem it's solving. I'll include it below. Some of the things there may not be correct; it represents my ongoing learning about Bisq's P2P (and this PR). <a class="user-mention" data-hovercard-type="user" data-hovercard-url="/users/freimair/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/freimair">@freimair</a>, after corrections, if they're necessary, would you consider including some of this information in an appropriate place in the code? I think future devs will appreciate an explanation of why this datastore design was chosen. I would appreciate feedback if I got something wrong.</p>

<hr>

<p>A datastore is a store of Bisq p2p objects. It is synced between peers-peers and peers-seednodes,<br>

and this way makes up a distributed database.</p>

<p>I presume that a peer's datastore is synced in this way:</p>

<ul>

<li>a peer/client starts up;</li>

<li>it starts a "bulk sync" procedure:

<ul>

<li>gathers up UIDs (or hashes) of the objects it already has in its datastore;</li>

<li>uploads those UIDs to a seed node;</li>

<li>seed node takes a set difference of the passed UIDs and the UIDs of the objects it has;</li>

<li>seed node uploads to the peer/client the objects corresponding to that difference;</li>

</ul>

</li>

<li>peer then enters p2p mode, listening for and publishing updates.</li>

</ul>

<p>The problem is that this "bulk sync" procedure includes the peer sending a large number of UIDS<br>

to the seed node, which can timeout, in case of insufficient network throughput, causing the<br>

syncing to unrecoverably terminate.</p>

<p>That shows two problems:</p>

<ul>

<li>Bisq does not recover from throughput issues;</li>

<li>the bulk sync is bandwidth-intensive.</li>

</ul>

<p>This spec addresses the bandwidth-intensity.</p>

<p>Chronological information in the p2p objects would allow querying it by comparing objects to a<br>

timestamp-like value; however, the p2p objects do not contain chronological information. It is<br>

unclear whether introducing a timestamp would have any negative effects, but it would involve<br>

making sensitive changes.</p>

<p>Freimair's proposal is about using chronologically-indexed datastore "checkpoints" that are<br>

distributed with releases</p>

<ul>

<li>(and thus identical across clients and seednodes)</li>

<li>each checkpoint contains objects generated since the last checkpoint

<ul>

<li>the first checkpoint contains all prior objects</li>

<li>this effectively assigns chronological information to objects

<ul>

<li>precision of which, for the purposes of bulk sync, corresponds to frequency of releases;</li>

</ul>

</li>

</ul>

</li>

<li>this way a bulk sync can be initiated by identifying

<ul>

<li>the latest checkpoint that the client has</li>

<li>and the objects it has stored since then (objects not in any checkpoint known to him)</li>

<li>which reduces the bandwidth requirement

<ul>

<li>to the equivalent of uploading UIDs that are newer than the release used

<ul>

<li>because identifying the checkpoint is negligible bandwidth-wise.</li>

</ul>

</li>

</ul>

</li>

</ul>

</li>

</ul>

<p>Requirements for the checkpoint datastore:</p>

<ul>

<li>initializable from the previous monolithic datastore or from scratch;</li>

<li>a "running" checkpoint has to be maintained;

<ul>

<li>which holds objects generated since the last "release" checkpoint;</li>

</ul>

</li>

<li>at no point can the datastore be mutated, except via appends;</li>

<li>"release" checkpoints must be identical across peers and seednodes.</li>

</ul><hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439847849">p2p/src/main/java/bisq/network/p2p/storage/persistence/SplitStoreService.java</a>:</p>

<pre style='color:#555'>> +        if (getMap().containsKey(hash))

+            return;

+

+        store.getMap().put(hash, payload);

+        persist();

+    }

+

+    @Override

+    protected PersistableNetworkPayload putIfAbsent(P2PDataStorage.ByteArray hash,

+                                                    PersistableNetworkPayload payload) {

+        // make sure we do not add data that we already have (in a bin of historical data)

+        if (getMap().containsKey(hash))

+            return null;

+

+        PersistableNetworkPayload previous = store.getMap().putIfAbsent(hash, payload);

+        persist();

</pre>

<p>Reading this is a bit difficult, because <code>getMap()</code> is easy to confuse with <code>store.getMap()</code>.</p>

<p><code>getMap().containsKey(hash) == false</code> implies <code>store.getMap().containsKey(hash) == false</code>, because <code>getMap()</code> is a union of <code>store</code> and <code>store</code>s inside <code>history</code>, so L76 here will always insert a new entry. I'd like to point out that these <code>.put</code> and <code>.putIfAbsent</code> implementations (their returns) have different semantics than in <code>java.util.Map</code> (<code>.put</code> returns <code>void</code>). Same in <code>MapStoreService</code>.</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439854542">p2p/src/main/java/bisq/network/p2p/storage/P2PDataStorage.java</a>:</p>

<pre style='color:#555'>>          // We collect the keys of the PersistableNetworkPayload items so we exclude them in our request.

         // PersistedStoragePayload items don't get removed, so we don't have an issue with the case that

         // an object gets removed in between PreliminaryGetDataRequest and the GetUpdatedDataRequest and we would

         // miss that event if we do not load the full set or use some delta handling.

-        Set<byte[]> excludedKeys = this.appendOnlyDataStoreService.getMap().keySet().stream()

-                .map(e -> e.bytes)

-                .collect(Collectors.toSet());

+        Set<byte[]> excludedKeys;

+        if (seedNodeRepository != null && seedNodeRepository.isSeedNode(networkNode.getNodeAddress())) {

</pre>

<p>Meaning of this predicate not immediately obvious: maybe <code>var weAreASeedNode = seedNodeRepository != null && seedNodeRepository.isSeedNode(networkNode.getNodeAddress())</code>?</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439854746">p2p/src/main/java/bisq/network/p2p/storage/P2PDataStorage.java</a>:</p>

<pre style='color:#555'>>          // We collect the keys of the PersistableNetworkPayload items so we exclude them in our request.

         // PersistedStoragePayload items don't get removed, so we don't have an issue with the case that

         // an object gets removed in between PreliminaryGetDataRequest and the GetUpdatedDataRequest and we would

         // miss that event if we do not load the full set or use some delta handling.

-        Set<byte[]> excludedKeys = this.appendOnlyDataStoreService.getMap().keySet().stream()

-                .map(e -> e.bytes)

-                .collect(Collectors.toSet());

+        Set<byte[]> excludedKeys;

+        if (seedNodeRepository != null && seedNodeRepository.isSeedNode(networkNode.getNodeAddress())) {

+            excludedKeys = this.appendOnlyDataStoreService.getMap().keySet().stream()

+                    .map(e -> e.bytes)

+                    .collect(Collectors.toSet());

+        } else {

+            excludedKeys = this.appendOnlyDataStoreService.getMap("since " + Version.VERSION).keySet().stream()

+                    .map(e -> e.bytes)

+                    .collect(Collectors.toSet());

+        }

         Set<byte[]> excludedKeysFromPersistedEntryMap = this.map.keySet()

                 .stream()

                 .map(e -> e.bytes)

                 .collect(Collectors.toSet());

</pre>

<p><code>.keySet().stream().map(e -> e.bytes).collect(Collectors.toSet())</code> is applied 3 times in this method. Moving it to a method (like <code>getKeySetInBytes(Map)</code>) would improve readability.</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439856223">p2p/src/main/java/bisq/network/p2p/storage/P2PDataStorage.java</a>:</p>

<pre style='color:#555'>> @@ -205,24 +214,61 @@ public GetUpdatedDataRequest buildGetUpdatedDataRequest(NodeAddress senderNodeAd

         return new GetUpdatedDataRequest(senderNodeAddress, nonce, this.getKnownPayloadHashes());

     }

+    /**

+     * Create the special key. <br><br>

</pre>

<p>"Special key" is a problem semantically, in that it doesn't describe how it is special. Better to be concrete: it's the release version string in bytes. That's not my main point here, but I noticed a few phrasings like this so felt that it should be mentioned.</p>

<p>The actual problem is that there's string and byte manipulations in various files in this PR. You can put all this logic in <code>bisq/common/app/Version.java</code> under <code>toP2PDataStorageKey</code>, <code>static fromP2PDataStorageKey</code>, etc. Most places where strings are being passed around, <code>Version</code>s should be passed around instead. For example, <code>Version.history</code> would do better as <code>List<Version></code>, than <code>List<String></code>.</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439856316">p2p/src/main/java/bisq/network/p2p/storage/persistence/AppendOnlyDataStoreService.java</a>:</p>

<pre style='color:#555'>> @@ -55,6 +55,18 @@ public void readFromResources(String postFix) {

         services.forEach(service -> service.readFromResources(postFix));

     }

+    /**

+     * TODO to be implemented

</pre>

<p>Probably forgot to remove this.</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439856592">p2p/src/main/java/bisq/network/p2p/storage/persistence/SplitStoreService.java</a>:</p>

<pre style='color:#555'>> + * <li>shoveling around data in case Bisq gets updated to a new version</li>

+ * <li>takes care of setting up a fresh Bisq install</li>

+ * <li>makes sure that historical data cannot be altered easily</li>

+ * <li>adding data to the store will only amend the live store</li>

+ * <li>takes care of keeping the legacy API functional</li>

+ * <li>adds the feature of filtering object queries by Bisq release</li>

+ * </ul>

+ * </p>

+ *

+ * <h3>Further reading</h3><p><ul>

+ *     <li><a href="https://github.com/bisq-network/projects/issues/25">project description</a></li>

+ * </ul></p>

+ */

+@Slf4j

+public abstract class SplitStoreService<T extends SplitStore> extends MapStoreService<T, PersistableNetworkPayload> {

+    protected HashMap<String, SplitStore> history;

</pre>

<p><code>HashMap<Version, SplitStore></code> would be better.</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439859633">p2p/src/main/java/bisq/network/p2p/storage/persistence/SplitStoreService.java</a>:</p>

<pre style='color:#555'>> +    public Map<P2PDataStorage.ByteArray, PersistableNetworkPayload> getMap(String filter) {

+        HashMap<P2PDataStorage.ByteArray, PersistableNetworkPayload> result = new HashMap<>();

+        result.putAll(store.getMap());

+

+        // TODO do a proper language, possibly classes

+        if (filter.startsWith("since ")) {

+            String finalFilter = filter.replace("since ", "");

+            if (!finalFilter.equals(Version.VERSION)) {

+                history.entrySet().stream()

+                        .filter(entry -> parseSpecialKey(entry.getKey()) > parseSpecialKey(finalFilter))

+                        .forEach(entry -> result.putAll(entry.getValue().getMap()));

+            }

+        }

+

+        return result;

+    }

</pre>

<p><code>getMap(String)</code> is cryptic. <code>getObjectsSince(Version)</code> would be much better. Notice that gets rid of the language/DSL. Doing a language is not necessary, because currently there's only one usage for these version keys here: getting objects from newer releases than the one specified.</p>

<p><code>parseSpecialKey(finalFilter)</code> is called a lot, could be cached.</p>

<hr>

<p>In <a href="https://github.com/bisq-network/bisq/pull/4233#discussion_r439859962">p2p/src/main/java/bisq/network/p2p/storage/persistence/SplitStoreService.java</a>:</p>

<pre style='color:#555'>> +            log.error("Could not copy resourceFile {} to {}.\n{}", resourceFileName, destinationFile.getAbsolutePath(), e.getMessage());

+            e.printStackTrace();

+        }

+

+        // split

+        // - get all

+        SplitStore historicalStore = readStore(destinationFile.getName());

+        // - subtract all that is in resource files

+        store.getMap().keySet().removeAll(historicalStore.getMap().keySet());

+

+        // - create new file with leftovers

+        storage.initAndGetPersisted(store, 0);

+        storage.queueUpForSave();

+

+        return historicalStore;

+    }

</pre>

<p>Why are we copying historical datastore checkpoints/splits out of resources? Because that's what the live-store-only solution we currently use does? For troubleshooting purposes? Seems unnecessary, though I may be missing something. I presume we keep the live store in the working directory, because we can't mutate resource files (is that even true?).</p>

<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">—<br />You are receiving this because you are subscribed to this thread.<br />Reply to this email directly, <a href="https://github.com/bisq-network/bisq/pull/4233#pullrequestreview-430234666">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe-auth/AJFFTNWLPX2FMFPU2IFASSDRWUWBTANCNFSM4MY33J4A">unsubscribe</a>.<img src="https://github.com/notifications/beacon/AJFFTNWDYAWTRC5WWTGRVG3RWUWBTA5CNFSM4MY33J4KYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGODGSNYKQ.gif" height="1" width="1" alt="" /></p>

<script type="application/ld+json">[

{

"@context": "http://schema.org",

"@type": "EmailMessage",

"potentialAction": {

"@type": "ViewAction",

"target": "https://github.com/bisq-network/bisq/pull/4233#pullrequestreview-430234666",

"url": "https://github.com/bisq-network/bisq/pull/4233#pullrequestreview-430234666",

"name": "View Pull Request"

},

"description": "View this Pull Request on GitHub",

"publisher": {

"@type": "Organization",

"name": "GitHub",

"url": "https://github.com"

}

}

]</script>