[ovs-dev] [PATCH v4 2/3] ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.

Ilya Maximets i.maximets at ovn.org
Fri Apr 24 20:48:57 UTC 2020


On 4/24/20 10:23 PM, Han Zhou wrote:
> 
> 
> On Thu, Apr 23, 2020 at 12:04 PM Ilya Maximets <i.maximets at ovn.org <mailto:i.maximets at ovn.org>> wrote:
>>
>> On 4/23/20 8:22 PM, Ilya Maximets wrote:
>> > On 4/22/20 8:38 PM, Dumitru Ceara wrote:
>> >> Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
>> >> (i.e., "monitor_cond_since" method) with the initial monitor condition
>> >> MC1.
>> >>
>> >> Assuming the following two transactions are executed on the
>> >> ovsdb-server:
>> >> TXN1: "insert record R1 in table T1"
>> >> TXN2: "insert record R2 in table T2"
>> >>
>> >> If the client's monitor condition MC1 for table T2 matches R2 then the
>> >> client will receive the following update3 message:
>> >> method="update3", "insert record R2 in table T2", last-txn-id=TXN2
>> >>
>> >> At this point, if the presence of the new record R2 in the IDL triggers
>> >> the client to update its monitor condition to MC2 and add a clause for
>> >> table T1 which matches R1, a monitor_cond_change message is sent to the
>> >> server:
>> >> method="monitor_cond_change", "clauses from MC2"
>> >>
>> >> In normal operation the ovsdb-server will reply with a new update3
>> >> message of the form:
>> >> method="update3", "insert record R1 in table T1", last-txn-id=TXN2
>> >>
>> >> However, if the connection drops in the meantime, this last update might
>> >> get lost.
>> >>
>> >> It might happen that during the reconnect a new transaction happens
>> >> that modifies the original record R1:
>> >> TXN3: "modify record R1 in table T1"
>> >>
>> >> When the client reconnects, it will try to perform a fast resync by
>> >> sending:
>> >> method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2
>> >>
>> >> Because TXN2 is still in the ovsdb-server transaction history, the
>> >> server replies with the changes from the most recent transactions only,
>> >> i.e., TXN3:
>> >> result="true", last-txbb-id=TXN3, "modify record R1 in table T1"
>> >>
>> >> This causes the IDL on the client to end up in an inconsistent
>> >> state because it has never seen the update that created R1.
>> >>
>> >> Such a scenario is described in:
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22
>> >>
>> >> To avoid hitting this issue, whenever a reconnect happens (either
>> >> due to an IDL retry or due to network issues), we clear db->last_id
>> >> in ovsdb_idl_restart_fsm() in any of the following cases:
>> >> - a monitor condition has been changed locally and the corresponding
>> >>   request was not sent yet (i.e., idl->data.cond_changed == true).
>> >> - a monitor_cond_change request is in flight.
>> >>
>> >> This ensures that updates of type "insert" that happened before the last
>> >> transaction known by the IDL but didn't match old monitor conditions are
>> >> sent upon reconnect if the monitor condition has changed to include them
>> >> in the meantime.
>> >>
>> >> CC: Han Zhou <hzhou at ovn.org <mailto:hzhou at ovn.org>>
>> >> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
>> >> Signed-off-by: Dumitru Ceara <dceara at redhat.com <mailto:dceara at redhat.com>>
>> >> ---
>> >>  lib/ovsdb-idl.c |   19 +++++++++++++++++++
>> >>  1 file changed, 19 insertions(+)
>> >>
>> >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> >> index 1535ad7..67c4745 100644
>> >> --- a/lib/ovsdb-idl.c
>> >> +++ b/lib/ovsdb-idl.c
>> >> @@ -690,6 +690,25 @@ ovsdb_idl_send_request(struct ovsdb_idl *idl, struct jsonrpc_msg *request)
>> >>  static void
>> >>  ovsdb_idl_restart_fsm(struct ovsdb_idl *idl)
>> >>  {
>> >> +    /* If there's an outstanding request of type monitor_cond_change and
>> >> +     * we're in monitor_cond_since mode then we can't trust that all relevant
>> >> +     * updates from transaction idl->data.last_id have been received as we
>> >> +     * might have relaxed the monitor condition with our last request and
>> >> +     * might be missing previously not monitored records.
>> >> +     *
>> >> +     * Same reasoning applies for the case when a monitor condition has been
>> >> +     * changed locally but the monitor_cond_change request was not sent yet.
>> >> +     *
>> >> +     * In both cases, clear last_id to make sure that the next time
>> >> +     * monitor_cond_since is sent (i.e., after reconnect) we get the complete
>> >> +     * view of the database.
>> >> +     */
>> >> +    if (idl->data.cond_changed ||
>> >> +            (idl->request_id &&
>> >> +                idl->data.monitoring == OVSDB_IDL_MONITORING_COND_SINCE)) {
>> >> +        uuid_zero(&idl->data.last_id);
>> >
>> > As Han pointed out during the OVN IRC meeting, when there is a change to the SB,
>> > e.g. creating a new DP, it causes all HVs to change their conditions, and at the
>> > same time one of the SB servers might be disconnected, triggering this bug on a
>> > big number of nodes at once.  If all of these nodes request the full database at
>> > the same time, this might be an issue.
>> >
>> > One possibility that came to mind is that we could store 'last_id' for the
>> > previous successful cond_change and use it instead of 0 on re-connection if there
>> > was an in-flight cond_change.  This way we could reduce the pressure on the SB server.
>>
>> Thinking more about this, this idea would not work: with the new conditions we
>> might need to receive changes that happened even before the last successful
>> cond_change, since we might relax the conditions incrementally.
>>
>> So, the current solution seems the most correct for now.  I'd like to hear some
>> other thoughts on this issue with massive re-connections, though, if any.
>>
> 
> Ideally, this can be solved by the mechanism below:
> 1. If there is an uncompleted cond_change (either not sent, or in-flight) when the
> server disconnects, the IDL first reconnects with the old condition and with last_id
> set to the current data position, so that the server knows the old condition first.
> 2. Then it retries the cond_change request with the new condition.  The server will
> send all the updates needed for the delta between the old and new conditions.

Yeah, we've been discussing the same approach with Dumitru today.

> This may need more complex logic in the IDL implementation, though hopefully not too complex.

An implementation suggestion might be to store "new_conditions" along with "conditions":

- set_conditions() should set "new_conditions" if they are not equal to "conditions"
  or to the current "new_conditions".
- monitor_cond_since should always use old "conditions".
- monitor_cond_change should always send "new_conditions".
  On reply from the server "conditions" should be freed, "new_conditions" moved to
  "conditions" and "new_conditions" cleared.

In this case, on re-connection, both "conditions" and "new_conditions" will be present
and two sequential requests (monitor_cond_since, monitor_cond_change) will be triggered
naturally by the existing code.
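
To illustrate the idea, here is a very rough, untested sketch of that bookkeeping.
The struct, field and function names below are simplified placeholders, not the real
ovsdb-idl structures or API; conditions are represented as plain strings just to keep
the example self-contained:

/* Sketch only: simplified stand-in for the ovsdb-idl condition handling. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct idl_sketch {
    char *conditions;       /* Last condition acknowledged by the server. */
    char *new_conditions;   /* Locally changed condition, not acked yet. */
};

/* set_conditions(): remember the new condition only if it differs from
 * both the acked condition and an already pending one. */
static void
set_conditions(struct idl_sketch *idl, const char *cond)
{
    if ((idl->conditions && !strcmp(idl->conditions, cond))
        || (idl->new_conditions && !strcmp(idl->new_conditions, cond))) {
        return;
    }
    free(idl->new_conditions);
    idl->new_conditions = strdup(cond);
}

/* On (re)connect, monitor_cond_since always carries the old, acked
 * condition, so the server's reply is consistent with last_id. */
static void
send_monitor_cond_since(const struct idl_sketch *idl)
{
    printf("monitor_cond_since: %s\n",
           idl->conditions ? idl->conditions : "[true]");
}

/* If a condition change is pending, a follow-up monitor_cond_change
 * carries the new condition; the server then sends only the delta
 * between the old and new conditions. */
static void
send_monitor_cond_change(const struct idl_sketch *idl)
{
    if (idl->new_conditions) {
        printf("monitor_cond_change: %s\n", idl->new_conditions);
    }
}

/* On the server's reply to monitor_cond_change, promote "new_conditions"
 * to "conditions" and clear it. */
static void
monitor_cond_change_acked(struct idl_sketch *idl)
{
    free(idl->conditions);
    idl->conditions = idl->new_conditions;
    idl->new_conditions = NULL;
}

int
main(void)
{
    struct idl_sketch idl = { strdup("[true]"), NULL };

    set_conditions(&idl, "[[\"datapath\",\"==\",\"dp1\"]]");

    /* Reconnect while the change is still pending: both requests are
     * sent in sequence, old condition first, then the new one. */
    send_monitor_cond_since(&idl);
    send_monitor_cond_change(&idl);
    monitor_cond_change_acked(&idl);

    free(idl.conditions);
    free(idl.new_conditions);
    return 0;
}

The key point is that the acked condition and the pending one are kept separately,
so a reconnect can replay them in order and the server can compute the delta itself.
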

> 
> Thanks,
> Han
> 
>> >
>> >> +    }
>> >> +
>> >>      ovsdb_idl_send_schema_request(idl, &idl->server);
>> >>      ovsdb_idl_transition(idl, IDL_S_SERVER_SCHEMA_REQUESTED);
>> >>      idl->data.monitoring = OVSDB_IDL_NOT_MONITORING;
>> >>
>> >
>>


