[bisq-network/proposals] ProtectedStorageEntry Deltas (#140)

Julian Knutsen notifications at github.com
Sun Nov 17 01:30:01 UTC 2019


Before `ProtectedStorageEntrys`, the `PersistableNetworkPayload` objects were append-only and immutable. This meant there were never issues with interleaved remove operations or with `GetData` updates while nodes were offline. It was OK to just ask for all of the new objects when a node came online, and the set of payloads would sync. But, with the introduction of `ProtectedStorageEntry` objects that can be removed and receive updates, there is a window of risk where nodes will not see operations that occurred while they were offline.

The current initialization path uses a sequence of messages between an initializing node (consumer) and a seed node (provider). Currently, this sequence is limited to retrieving only `ProtectedStorageEntrys` that exist on the provider node, but not on the consumer node.

### Current Synchronization Messages
```
message PreliminaryGetDataRequest {
    int32 nonce = 21;
    repeated bytes excluded_keys = 2;
    repeated int32 supported_capabilities = 3;
}

message GetDataResponse {
    int32 request_nonce = 1;
    bool is_get_updated_data_response = 2;
    repeated StorageEntryWrapper data_set = 3;
    repeated int32 supported_capabilities = 4;
    repeated PersistableNetworkPayload persistable_network_payload_items = 5;
}

message GetUpdatedDataRequest {
    NodeAddress sender_node_address = 1;
    int32 nonce = 2;
    repeated bytes excluded_keys = 3;
}
```
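
To make the failure cases below easier to follow, here is a simplified sketch of the provider-side behaviour described above: the provider only returns objects whose payload hash is *not* in the consumer's `excluded_keys`, so it has no way to signal anything else (including removals). The class and field types are illustrative, not the actual Bisq code; hashes are shown as hex strings for brevity.

```
import java.util.*;
import java.util.stream.Collectors;

// Illustrative only. Shows why the current GetData sequence can only ADD data
// on the consumer side.
class GetDataProviderSketch {
    // Provider's current view: payload hash -> entry. Entries that were removed
    // are simply no longer present in this map.
    private final Map<String, Object> knownEntries = new HashMap<>();

    // excludedKeys = payload hashes the consumer already has (e.g. loaded from disk).
    List<Object> buildGetDataResponse(Set<String> excludedKeys) {
        return knownEntries.entrySet().stream()
                .filter(e -> !excludedKeys.contains(e.getKey())) // only NEW objects are returned
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
        // Nothing here can tell the consumer "hash X was removed while you were
        // offline"; excluded keys the provider doesn't recognize are silently ignored.
    }
}
```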

### Specific Failure Cases

**1. Lost Remove Messages**

The current message sequence focuses only on adding missing data, not on synchronizing state between the consumer and provider. This causes issues in two cases:

_a. RemoveData when node is offline_

`ProtectedStoragePayload` objects implementing the `PersistablePayload` interface are loaded from disk early in startup. This means that the payload hashes for these objects will be sent in the initial `PreliminaryGetDataRequest` message. Even if the provider node doesn't recognize one of those hashes (because the entry was removed), it only sends NEW objects, so the consumer node will never delete the should-be-deleted `ProtectedStorageEntry`.

This is the current bug with Compensation Requests (#3610).

_b. RemoveData between PreliminaryGetDataRequest & GetUpdatedDataRequest_

Until the HS (hidden service) is published, the P2PDataStorage layer does not receive broadcast messages. But the timing of HS publication is not synchronized with the `PreliminaryGetDataRequest`->`GetDataResponse`->`GetUpdatedDataRequest` sequence.

This means that if a `RemoveData` message was broadcast BEFORE the HS was published, but AFTER the `GetDataResponse` was sent to the consumer, the consumer would add the should-be-removed Entry and would never learn that the Entry was removed. The TTL would remove it eventually, but in the meantime the consumer may show UI artifacts or make state updates based on this invalid data.

**2. Lost Metadata Updates**

For `ProtectedStorageEntrys`, since only the payload hash is sent to the provider, the consumer will not receive any updates for known payloads whose metadata has changed. Metadata here refers to the `metadata` section below.

```
    // metadata (fields of ProtectedStorageEntry)
    private byte[] signature;
    private long creationTimeStamp;
    private final int sequenceNumber;

    // immutable data
    private final ProtectedStoragePayload protectedStoragePayload;
```

So, if an update (i.e. `addProtectedStorageEntry()` for an existing Payload) was sent while a node was offline, the consumer node will not receive that update. I believe in the current system this isn't an issue, because there are no persistent `ProtectedStoragePayload` objects that depend on metadata (OfferPayload is non-persistent). But, if the future holds more payload types that depend on metadata updates, it may make sense to close this gap.

## Design Goals
1. Remove risk window for lost removes and metadata updates
2. All ProtectedStorageEntry updates must be validated (no implied trust of SeedNodes). This turns out not to be fully possible; I explain why below. (_See Known Issues Section_)
3. SeedNode (old) <--> Bisq (new) should work
4. Bisq (old) <--> SeedNode (new) should work

## ProtectedStorageEntry Delta Design
This section focuses on a design for the P2PDataStorage system: `ProtectedStorageEntryDelta`. I'll explain the basic concepts and implementation, and then tie them back to how they solve the initialization problems described above.

A `ProtectedStorageEntryDelta` is the minimum set of data required to perform an update or delete operation on a `ProtectedStorageEntry`. Think of them as a lightweight `ProtectedStorageEntry` that is only useful for **_objects that exist_**. They leverage the fact that `Payload` objects already exist on the peer, so they do not need to be retransmitted.

```
enum DeltaType {
    UPDATE = 1,
    REMOVE = 2,
}

public class ProtectedStorageEntryDelta {
    private final int deltaType;                 // 4 bytes 
    private final byte[] signature;              // 46 bytes
    private final byte[] hashOfPayload;          // 32 bytes
    private final int sequenceNumber;            // 4 bytes
    private long creationTimeStamp;              // 8 bytes 
} // 94 Bytes vs 124 Byte RefreshOffer and 1700 Byte RemoveData
```
### Benefits
Currently, the system relies on a full ProtectedStorageEntry message to remove data (`RemoveDataMessage`, ~1700 bytes) and a smaller message for updates (`RefreshOfferMessage`, ~123 bytes). Using a new lightweight `ProtectedStorageEntryDelta` object can give the same functionality while reducing the on-wire size.

- During the GetData sequence, the provider can send `RemoveDelta` objects to the consumer to inform it of `RemoveDataMessage`s that happened while the node was offline. This is cheaper, in both in-memory storage and bandwidth, than storing the full `RemoveDataMessage` and sending it to them.

- During the GetData sequence, the provider could send `UpdateDelta` objects to the consumer to inform it of `AddDataMessage`s that happened while the node was offline. This is cheaper, in both in-memory storage and bandwidth, than storing the last full `AddDataMessage` and sending it to them.

   _Important to note that I don't think there are any interesting consumers of this right now. The only `Payload` that cares about the metadata is the `OfferPayload`, and since it is non-persistable it will never be sent from the consumer to the provider in the `PreliminaryGetDataRequest`._

- Future development work could replace `RefreshOfferMessage` with `UpdateDelta` reducing on-wire size by ~20%.
- Future development work could replace `RemoveDataMessage` with `RemoveDelta` reducing on-wire size by ~95%.
- Future development could optimize `addProtectedStorageEntry` to use an `UpdateDelta` as an optimization if the `Payload` already exists.


### Implementation Behavior
   
**RemoveDelta**

* If a `RemoveDelta` is received for a `ProtectedStorageEntry` that does not exist, it is discarded.
  * Without an existing `Payload` we can't validate the signature.  
* If a `RemoveDelta` is received for a `ProtectedStorageEntry` that does exist:
  * Validation is performed on the post-delta object using checks similar to RefreshTTL: a `ProtectedStorageEntry` is created from the existing `protectedStoragePayload` and the new metadata provided in the `ProtectedStorageEntryDelta`. It is then passed to the `remove()` call as if it had been sent across the wire as a full `RemoveDataMessage(ProtectedStorageEntry)`.
    * This is subtle, but the `Payload.ownerPubKey` can be used in both the Mailbox and Non-Mailbox case. We should talk through it more to validate that assertion.
  * Using the common `remove()` path ensures all the sequence number checks, remove-before-add handling, and other validation are performed equivalently between this delta-based remove and a full remove (a combined sketch follows the UpdateDelta notes below).

**UpdateDelta**

* If an `UpdateDelta` is received for a `ProtectedStorageEntry` that does not exist, it is discarded.
  * Without an existing `Payload` we can't validate the signature.
* If an `UpdateDelta` is received for a `ProtectedStorageEntry` that does exist:
  * Validation is performed on the post-delta object using checks similar to RefreshTTL: the new `ProtectedStorageEntry` is constructed with the existing `protectedStoragePayload` and `Payload.ownerPubKey` and the new metadata provided in the `ProtectedStorageEntryDelta`. It is then passed to the `addProtectedStorageEntry()` call as if it had originally been sent across the wire as a full `AddDataMessage(ProtectedStorageEntry)`.
  * Using the common `addProtectedStorageEntry()` path ensures all sequence number checks and other validation are performed equivalently between this delta-based add and a full add (sketched below).
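
A rough sketch of the consumer-side handling of both flows, written as if it were a helper inside `P2PDataStorage`. The constructor, accessor names, and elided arguments are assumptions for illustration, not the actual Bisq API; the essential point is that a delta is never applied directly, but is rebuilt into a full `ProtectedStorageEntry` and pushed through the existing `remove()` / `addProtectedStorageEntry()` validation.

```
// Hypothetical helper inside P2PDataStorage; names and signatures are illustrative.
private void applyDelta(ProtectedStorageEntryDelta delta) {
    ProtectedStorageEntry existing = map.get(delta.getHashOfPayload());
    if (existing == null)
        return; // without the existing Payload we can't validate the signature

    // Rebuild a full entry: payload and ownerPubKey come from the local store,
    // metadata (signature, sequenceNumber, creationTimeStamp) comes from the delta.
    ProtectedStorageEntry reconstructed = new ProtectedStorageEntry(
            existing.getProtectedStoragePayload(),
            existing.getOwnerPubKey(),
            delta.getSequenceNumber(),
            delta.getSignature(),
            delta.getCreationTimeStamp());

    if (delta.getDeltaType() == DeltaType.REMOVE) {
        // Same sequence number, signature, and remove-before-add checks as a
        // full RemoveDataMessage(ProtectedStorageEntry); remaining args elided.
        remove(reconstructed, ...);
    } else {
        // Same checks as a full AddDataMessage(ProtectedStorageEntry) for a
        // payload that is already known locally; remaining args elided.
        addProtectedStorageEntry(reconstructed, ...);
    }
}
```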
    
    
### Known Issues

#### Malicious Nodes
One of the goals of this design was to not put too much trust in seed nodes. But, after looking through the current implementation I don't believe this is possible without a major update to the P2P messages.

The main issue is that there is no signature verification of the message type, only the hash(seqNr, payload). This means any malicious node can just take the contents of a valid `AddDataMessage`, put them inside a `RemoveDataMessage`, and it will be accepted by peer nodes as a `remove()`.

We just rely on the fact that the original `AddDataMessage` with the same sequence number will propagate faster than the malicious `RemoveDataMessage`.

With this current constraint, there is also no way for a consumer node to validate that a `RemoveDelta` sent from a seed node was originally a fully-formed `RemoveDataMessage` that was sent to the network by the original owner. So any of these designs must be aware that a malicious actor can cause issues regardless of this design.

**_A potential fix for this is to update the Entry signatures to include the message type. This would remove the rebroadcast attack vector by giving nodes a way to validate the operation type as well so they wouldn't remove data that wasn't intentionally deleted by the owner._**
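
As a rough illustration of that fix: today the signature effectively covers only the hash of (seqNr, payload), so a signature produced for an add can be replayed inside a remove. If the operation type were part of the signed bytes, receivers could verify intent as well. The class, helper names, message layout, and algorithm choice below are assumptions, not the actual Bisq signing code.

```
import java.security.PrivateKey;
import java.security.Signature;

// Hypothetical sketch: include the operation type in the bytes the owner signs,
// so a signature produced for an ADD can no longer be reused for a REMOVE.
final class SignedOperationSketch {
    enum OpType { ADD, REMOVE, UPDATE }

    static byte[] sign(PrivateKey ownerKey, byte[] payloadHash, int sequenceNumber, OpType op) throws Exception {
        byte[] message = concat(payloadHash, intToBytes(sequenceNumber), new byte[]{(byte) op.ordinal()});
        Signature sig = Signature.getInstance("SHA256withDSA"); // algorithm choice is illustrative
        sig.initSign(ownerKey);
        sig.update(message);
        return sig.sign();
    }

    private static byte[] intToBytes(int v) {
        return java.nio.ByteBuffer.allocate(4).putInt(v).array();
    }

    private static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) len += p.length;
        byte[] out = new byte[len];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }
}
```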

#### `creationTimeStamp` not verifiable
Similar to the above issue, it isn't possible for a consumer to verify that the `creationTimeStamp` in an `UpdateDelta` message is valid. This makes this path fairly uninteresting until the other Entry fields are signed or there is some common timekeeping mechanism for the network that doesn't rely on the system clock. The only path that would use this feature right now is the RefreshTTL path, and that relies on the current system time anyway.

## Development Plan [~4wks]

**_Fix #1a (RemoveData when node is offline)_**

There are quite a few ideas here. Some are good long term; others are useful short term. The goal isn't to fix everything at once, but to make progress while keeping things maintainable long term. I think the minimum version of this just implements `RemoveDelta` and uses it only in the `GetData` sequence. This helps with a few of the edge cases and allows something to be released sooner that fixes a big issue. If we like the idea of using `UpdateDelta`, it may make sense to incorporate that into the initial object design.

This path would fix the current issue with Compensation Requests and is a possible long term solution for other `PersistableProtectedStoragePayload` types.

I would recommend the following development plan:

### Phase 1 (Refactor GetData Path) [1wk]
I would like to push more of the knowledge about creating `GetData` messages and processing `GetDataResponse` messages down into `P2PDataStorage`. This will allow us to test the `P2PDataStorage` API that constructs and processes the objects in this GetData path, so that when we change it we can verify it still works.

I'm also hopeful that we would be able to write the interop tests as P2PDataStorage unit tests, which would be a big win.

This would get all the plumbing and test fixtures in good shape so all changes will be easily testable.

```
New APIs (elided parameters left as "...")

class P2PDataStorage {
  // Consumer side: build the (Preliminary)GetDataRequest, including excluded_keys.
  public GetData generateGetData(boolean isPreliminaryDataRequest, ..., ...);

  // Provider side: build the GetDataResponse for an incoming request.
  public GetDataResponse generateGetDataResponse(GetData request, ..., ...);

  // Consumer side: apply the response contents to the local store.
  public void processGetDataResponse(GetDataResponse response);
}
```
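
As a rough usage sketch of those APIs (the handler class name and elided arguments are hypothetical), the networking handlers would become thin wrappers, so request construction and response processing can be unit tested directly against `P2PDataStorage`:

```
// Hypothetical call site after the refactor; only sending/receiving stays in the handler.
class RequestDataHandlerSketch {
    private final P2PDataStorage p2PDataStorage;

    RequestDataHandlerSketch(P2PDataStorage p2PDataStorage) {
        this.p2PDataStorage = p2PDataStorage;
    }

    void requestData(boolean isPreliminaryDataRequest) {
        GetData request = p2PDataStorage.generateGetData(isPreliminaryDataRequest, ..., ...);
        // networkNode.sendMessage(peer, request) -- sending itself is unchanged
    }

    void onGetDataResponse(GetDataResponse response) {
        p2PDataStorage.processGetDataResponse(response);
    }
}
```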

### Phase 2 (Update the new APIs above to utilize `Delta` objects) [2wk]
This will validate the design and ensure any strange test cases are covered. If it doesn't work, that will be obvious with just a few days of lost time.

- Updates to ProtectedMailboxStorageEntry to handle `RemoveDelta` requirements
- Implementation and tests of `private void applyDeltas(...)` [Consumer Side]
- Implementation and tests of `private void createDeltas(...)` [Provider Side]
  - Involves storing a `RemoveDelta` at deletion time for future consumers (see the sketch after this list)
- `P2PDataStorage::generateGetData()` and `P2PDataStorage::processGetDataResponse()` tests updated with new test cases including an old GetDataResponse (older seed node) and old GetData (nulled fields from older Bisq node)
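
A rough provider-side sketch of the "store a `RemoveDelta` at deletion time" item above. Class and method names, and the use of hex strings as map keys, are assumptions for brevity; the real keys would be the raw payload-hash bytes.

```
import java.util.*;

// Hypothetical provider-side bookkeeping: when an entry is removed, keep only a
// small RemoveDelta keyed by payload hash. When building a GetDataResponse, any
// hash the consumer reports in excluded_keys that we know was removed gets its
// delta echoed back, instead of being silently ignored as it is today.
class RemoveDeltaStoreSketch {
    private final Map<String, ProtectedStorageEntryDelta> removedDeltas = new HashMap<>();

    // Called from the existing remove() path at deletion time.
    void onEntryRemoved(String payloadHashHex, ProtectedStorageEntryDelta removeDelta) {
        removedDeltas.put(payloadHashHex, removeDelta);
    }

    // Called while building the GetDataResponse for a consumer.
    List<ProtectedStorageEntryDelta> deltasFor(Set<String> consumerExcludedKeys) {
        List<ProtectedStorageEntryDelta> result = new ArrayList<>();
        for (String hash : consumerExcludedKeys) {
            ProtectedStorageEntryDelta delta = removedDeltas.get(hash);
            if (delta != null)
                result.add(delta); // consumer still holds this payload; tell it about the remove
        }
        return result;
    }
}
```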

### Phase 3 (Manual Testing) [2-3 days]
This involves developing a test plan that covers combinations of old & new seed nodes and Bisq clients.

### Phase 4 (Rollout) [whenever the next release happens]
Do we have dedicated seed nodes on testnet for beta testing? We could potentially start this during Phase 3.

With all the previous testing, everything should work regardless of the provider's version or the consumer's version. So, we can choose to either roll out the seed node updates first or just do a normal release.

## Future Work
Here are some high-level ideas for next steps.

1. **_Remove failure 1b (RemoveData between PreliminaryGetDataRequest & GetUpdatedDataRequest)_**  
This would involve synchronizing HS publishing (more importantly, the first point at which we can receive broadcast messages) with the GetData path, to ensure that we can receive BroadcastMessages before we send a `GetUpdatedDataRequest`. This ensures we will see any `RemoveData` messages that would otherwise have been lost.

2. **_Remove failure 2 (Lost Metadata Updates)_**  
This is more about closing a potential issue later on if any persistent `ProtectedStoragePayload` objects start caring about `Entry.creationTimeStamp`. For now, it may be fine that consumer nodes have older seqNrs and creationTimeStamps.

3. **_Optimize RemoveDataMessage_**  
By replacing `RemoveDataMessage` w/ `RemoveDelta` we would save about 95% of the bandwidth for those messages. They are infrequent, so it may not be a priority.

4. **_Optimize RefreshTTL_**  
This message currently uses 32 bytes that aren't needed. Replacing it with an `UpdateDelta` would save some bandwidth.
