[ovs-dev] [PATCH v5 2/3] ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.

Dumitru Ceara dceara at redhat.com
Wed May 27 09:03:18 UTC 2020


On 5/27/20 3:41 AM, Han Zhou wrote:
> Thanks Dumitru. Please see my comments below.
> 
> On Thu, May 7, 2020 at 4:21 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>
>> Assume an ovsdb client is connected to a database using OVSDB_MONITOR_V3
>> (i.e., the "monitor_cond_since" method) with the initial monitor condition
>> MC1.
>>
>> Assume the following two transactions are executed on the
>> ovsdb-server:
>> TXN1: "insert record R1 in table T1"
>> TXN2: "insert record R2 in table T2"
>>
>> If the client's monitor condition MC1 for table T2 matches R2 then the
>> client will receive the following update3 message:
>> method="update3", "insert record R2 in table T2", last-txn-id=TXN2
>>
>> At this point, if the presence of the new record R2 in the IDL triggers
>> the client to update its monitor condition to MC2 and add a clause for
>> table T1 which matches R1, a monitor_cond_change message is sent to the
>> server:
>> method="monitor_cond_change", "clauses from MC2"
>>
>> In normal operation the ovsdb-server will reply with a new update3
>> message of the form:
>> method="update3", "insert record R1 in table T1", last-txn-id=TXN2
>>
>> However, if the connection drops in the meantime, this last update might
>> get lost.
>>
>> It might happen that during the reconnect a new transaction is executed
>> that modifies the original record R1:
>> TXN3: "modify record R1 in table T1"
>>
>> When the client reconnects, it will try to perform a fast resync by
>> sending:
>> method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2
>>
>> Because TXN2 is still in the ovsdb-server transaction history, the
>> server replies with the changes from the most recent transactions only,
>> i.e., TXN3:
>> result="true", last-txbb-id=TXN3, "modify record R1 in table T1"
>>
>> This causes the IDL on the client to end up in an inconsistent
>> state because it has never seen the update that created R1.
>>
>> Such a scenario is described in:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22
>>
>> To avoid this issue, the IDL will now maintain (up to) 3 different
>> types of conditions for each DB table:
>> - new_cond: condition that has been set by the IDL client but has
>>   not yet been sent to the server through monitor_cond_change.
>> - req_cond: condition that has been sent to the server but the reply
>>   acknowledging the change hasn't been received yet.
>> - ack_cond: condition that has been acknowledged by the server.
>>
>> Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
>> disconnect):
>> - if there is a known last_id txn-id the code ensures that new_cond
>>   will contain the most recent condition set by the IDL client
>>   (either req_cond if there was a request in flight, or new_cond
>>   if the IDL client set a condition while the IDL was disconnected)
>> - if there is no known last_id txn-id the code ensures that ack_cond will
>>   contain the most recent conditions set by the IDL client regardless of
>>   whether they were acked by the server or not.
>>
>> When monitor_cond_since/monitor_cond requests are sent, they will
>> always include ack_cond, and if new_cond is not NULL a follow-up
>> monitor_cond_change will be generated afterwards.
>>
>> On the other hand, ovsdb_idl_db_set_condition() will always modify new_cond.
>>
>> This ensures that updates of type "insert" that happened before the last
>> transaction known by the IDL but didn't match old monitor conditions are
>> sent upon reconnect if the monitor condition has changed to include them
>> in the meantime.
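To illustrate the intended lifecycle of the three per-table slots, here is a
minimal sketch. The calls mirror the helpers added by this patch, but the
fragment is illustrative only (not part of the change), and 'client_cond'
is just a placeholder name for the condition passed in by the IDL client:

    /* Client sets a new condition (ovsdb_idl_db_set_condition): */
    ovsdb_idl_condition_clone(&table->new_cond, client_cond);

    /* monitor_cond_change is sent (ovsdb_idl_db_compose_cond_change): */
    ovsdb_idl_condition_move(&table->req_cond, &table->new_cond);

    /* Server acks the change (ovsdb_idl_db_ack_condition): */
    ovsdb_idl_condition_move(&table->ack_cond, &table->req_cond);
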
>>
>> CC: Han Zhou <hzhou at ovn.org>
>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>> ---
>>  lib/ovsdb-idl-provider.h |    8 ++
>>  lib/ovsdb-idl.c          |  148 +++++++++++++++++++++++++++++++++++++++-------
>>  tests/ovsdb-idl.at       |   56 +++++++++++++++++
>>  3 files changed, 187 insertions(+), 25 deletions(-)
>>
>> diff --git a/lib/ovsdb-idl-provider.h b/lib/ovsdb-idl-provider.h
>> index 30d1d08..00497d9 100644
>> --- a/lib/ovsdb-idl-provider.h
>> +++ b/lib/ovsdb-idl-provider.h
>> @@ -122,8 +122,12 @@ struct ovsdb_idl_table {
>>      unsigned int change_seqno[OVSDB_IDL_CHANGE_MAX];
>>      struct ovs_list indexes;    /* Contains "struct ovsdb_idl_index"s */
>>      struct ovs_list track_list; /* Tracked rows (ovsdb_idl_row.track_node). */
>> -    struct ovsdb_idl_condition condition;
>> -    bool cond_changed;
>> +    struct ovsdb_idl_condition *ack_cond; /* Last condition acked by the
>> +                                           * server. */
>> +    struct ovsdb_idl_condition *req_cond; /* Last condition requested to the
>> +                                           * server. */
>> +    struct ovsdb_idl_condition *new_cond; /* Latest condition set by the IDL
>> +                                           * client. */
>>  };
>>
>>  struct ovsdb_idl_class {
>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> index 1535ad7..557f61c 100644
>> --- a/lib/ovsdb-idl.c
>> +++ b/lib/ovsdb-idl.c
>> @@ -240,6 +240,10 @@ static void ovsdb_idl_send_monitor_request(struct ovsdb_idl *,
>>                                             struct ovsdb_idl_db *,
>>                                             enum ovsdb_idl_monitor_method);
>>  static void ovsdb_idl_db_clear(struct ovsdb_idl_db *db);
>> +static void ovsdb_idl_db_ack_condition(struct ovsdb_idl_db *db);
>> +static void ovsdb_idl_db_sync_condition(struct ovsdb_idl_db *db);
>> +static void ovsdb_idl_condition_move(struct ovsdb_idl_condition **dst,
>> +                                     struct ovsdb_idl_condition **src);
>>
>>  struct ovsdb_idl {
>>      struct ovsdb_idl_db server;
>> @@ -422,9 +426,11 @@ ovsdb_idl_db_init(struct ovsdb_idl_db *db, const struct ovsdb_idl_class *class,
>>              = table->change_seqno[OVSDB_IDL_CHANGE_MODIFY]
>>              = table->change_seqno[OVSDB_IDL_CHANGE_DELETE] = 0;
>>          table->db = db;
>> -        ovsdb_idl_condition_init(&table->condition);
>> -        ovsdb_idl_condition_add_clause_true(&table->condition);
>> -        table->cond_changed = false;
>> +        table->ack_cond = NULL;
>> +        table->req_cond = NULL;
>> +        table->new_cond = xmalloc(sizeof *table->new_cond);
>> +        ovsdb_idl_condition_init(table->new_cond);
>> +        ovsdb_idl_condition_add_clause_true(table->new_cond);
>>      }
>>      db->monitor_id = json_array_create_2(json_string_create("monid"),
>>                                           json_string_create(class->database));
>> @@ -556,12 +562,15 @@ ovsdb_idl_set_shuffle_remotes(struct ovsdb_idl *idl, bool shuffle)
>>  static void
>>  ovsdb_idl_db_destroy(struct ovsdb_idl_db *db)
>>  {
>> +    struct ovsdb_idl_condition *null_cond = NULL;
>>      ovs_assert(!db->txn);
>>      ovsdb_idl_db_txn_abort_all(db);
>>      ovsdb_idl_db_clear(db);
>>      for (size_t i = 0; i < db->class_->n_tables; i++) {
>>          struct ovsdb_idl_table *table = &db->tables[i];
>> -        ovsdb_idl_condition_destroy(&table->condition);
>> +        ovsdb_idl_condition_move(&table->ack_cond, &null_cond);
>> +        ovsdb_idl_condition_move(&table->req_cond, &null_cond);
>> +        ovsdb_idl_condition_move(&table->new_cond, &null_cond);
>>          ovsdb_idl_destroy_indexes(table);
>>          shash_destroy(&table->columns);
>>          hmap_destroy(&table->rows);
>> @@ -690,6 +699,12 @@ ovsdb_idl_send_request(struct ovsdb_idl *idl, struct jsonrpc_msg *request)
>>  static void
>>  ovsdb_idl_restart_fsm(struct ovsdb_idl *idl)
>>  {
>> +    /* Resync data DB table conditions to avoid missing updates due to
>> +     * conditions that were in flight or changed locally while the connection
>> +     * was down.
>> +     */
>> +    ovsdb_idl_db_sync_condition(&idl->data);
>> +
>>      ovsdb_idl_send_schema_request(idl, &idl->server);
>>      ovsdb_idl_transition(idl, IDL_S_SERVER_SCHEMA_REQUESTED);
>>      idl->data.monitoring = OVSDB_IDL_NOT_MONITORING;
>> @@ -797,7 +812,9 @@ ovsdb_idl_process_response(struct ovsdb_idl *idl, struct jsonrpc_msg *msg)
>>           * do, it's a "monitor_cond_change", which means that the conditional
>>           * monitor clauses were updated.
>>           *
>> -         * If further condition changes were pending, send them now. */
>> +         * Mark the last requested conditions and acked and if further
> 
> typo: s/and acked/as acked
> 

Ack :)

>> +         * condition changes were pending, send them now. */
>> +        ovsdb_idl_db_ack_condition(&idl->data);
>>          ovsdb_idl_send_cond_change(idl);
>>          idl->data.cond_seqno++;
>>          break;
>> @@ -1493,30 +1510,60 @@ ovsdb_idl_condition_equals(const struct ovsdb_idl_condition *a,
>>  }
>>
>>  static void
>> -ovsdb_idl_condition_clone(struct ovsdb_idl_condition *dst,
>> +ovsdb_idl_condition_clone(struct ovsdb_idl_condition **dst,
>>                            const struct ovsdb_idl_condition *src)
>>  {
>> -    ovsdb_idl_condition_init(dst);
>> +    if (*dst) {
>> +        ovsdb_idl_condition_destroy(*dst);
>> +    } else {
>> +        *dst = xmalloc(sizeof **dst);
>> +    }
>> +    ovsdb_idl_condition_init(*dst);
>>
>> -    dst->is_true = src->is_true;
>> +    (*dst)->is_true = src->is_true;
>>
>>      const struct ovsdb_idl_clause *clause;
>>      HMAP_FOR_EACH (clause, hmap_node, &src->clauses) {
>> -        ovsdb_idl_condition_add_clause__(dst, clause, clause->hmap_node.hash);
>> +        ovsdb_idl_condition_add_clause__(*dst, clause, clause->hmap_node.hash);
>>      }
>>  }
>>
>> +static void
>> +ovsdb_idl_condition_move(struct ovsdb_idl_condition **dst,
>> +                         struct ovsdb_idl_condition **src)
>> +{
>> +    if (*dst) {
>> +        ovsdb_idl_condition_destroy(*dst);
>> +        free(*dst);
>> +    }
>> +    *dst = *src;
>> +    *src = NULL;
>> +}
>> +
>>  static unsigned int
>>  ovsdb_idl_db_set_condition(struct ovsdb_idl_db *db,
>>                             const struct ovsdb_idl_table_class *tc,
>>                             const struct ovsdb_idl_condition *condition)
>>  {
>> +    struct ovsdb_idl_condition *table_cond;
>>      struct ovsdb_idl_table *table = ovsdb_idl_db_table_from_class(db, tc);
>>      unsigned int seqno = db->cond_seqno;
>> -    if (!ovsdb_idl_condition_equals(condition, &table->condition)) {
>> -        ovsdb_idl_condition_destroy(&table->condition);
>> -        ovsdb_idl_condition_clone(&table->condition, condition);
>> -        db->cond_changed = table->cond_changed = true;
>> +
>> +    /* Compare the new condition to the last known condition which can be
>> +     * either "new" (not sent yet), "requested" or "acked", in this order.
>> +     */
>> +    if (table->new_cond) {
>> +        table_cond = table->new_cond;
>> +    } else if (table->req_cond) {
>> +        table_cond = table->req_cond;
>> +    } else {
>> +        table_cond = table->ack_cond;
>> +    }
>> +    ovs_assert(table_cond);
>> +
>> +    if (!ovsdb_idl_condition_equals(condition, table_cond)) {
>> +        ovsdb_idl_condition_clone(&table->new_cond, condition);
>> +        db->cond_changed = true;
>>          poll_immediate_wake();
>>          return seqno + 1;
>>      }
>> @@ -1561,9 +1608,8 @@ ovsdb_idl_condition_to_json(const struct ovsdb_idl_condition *cnd)
>>  }
>>
>>  static struct json *
>> -ovsdb_idl_create_cond_change_req(struct ovsdb_idl_table *table)
>> +ovsdb_idl_create_cond_change_req(const struct ovsdb_idl_condition *cond)
>>  {
>> -    const struct ovsdb_idl_condition *cond = &table->condition;
>>      struct json *monitor_cond_change_request = json_object_create();
>>      struct json *cond_json = ovsdb_idl_condition_to_json(cond);
>>
>> @@ -1583,8 +1629,9 @@ ovsdb_idl_db_compose_cond_change(struct ovsdb_idl_db *db)
>>      for (size_t i = 0; i < db->class_->n_tables; i++) {
>>          struct ovsdb_idl_table *table = &db->tables[i];
>>
>> -        if (table->cond_changed) {
>> -            struct json *req = ovsdb_idl_create_cond_change_req(table);
>> +        if (table->new_cond) {
>> +            struct json *req =
>> +                ovsdb_idl_create_cond_change_req(table->new_cond);
>>              if (req) {
>>                  if (!monitor_cond_change_requests) {
>>                      monitor_cond_change_requests = json_object_create();
>> @@ -1593,7 +1640,11 @@ ovsdb_idl_db_compose_cond_change(struct ovsdb_idl_db *db)
>>                               table->class_->name,
>>                               json_array_create_1(req));
>>              }
>> -            table->cond_changed = false;
>> +            /* Mark the new condition as requested by moving it to req_cond.
>> +             * If there's already a requested condition, that's a bug.
>> +             */
>> +            ovs_assert(table->req_cond == NULL);
>> +            ovsdb_idl_condition_move(&table->req_cond, &table->new_cond);
>>          }
>>      }
>>
>> @@ -1608,6 +1659,60 @@ ovsdb_idl_db_compose_cond_change(struct ovsdb_idl_db *db)
>>      return jsonrpc_create_request("monitor_cond_change", params, NULL);
>>  }
>>
>> +/* Marks all requested table conditions in 'db' as acked by the server.
>> + * It should be called when the server replies to monitor_cond_change
>> + * requests.
>> + */
>> +static void
>> +ovsdb_idl_db_ack_condition(struct ovsdb_idl_db *db)
>> +{
>> +    for (size_t i = 0; i < db->class_->n_tables; i++) {
>> +        struct ovsdb_idl_table *table = &db->tables[i];
>> +
>> +        if (table->req_cond) {
>> +            ovsdb_idl_condition_move(&table->ack_cond, &table->req_cond);
>> +        }
>> +    }
>> +}
>> +
>> +/* Should be called when the IDL fsm is restarted and resyncs table conditions
>> + * based on the state the DB is in:
>> + * - if a non-zero last_id is available for the DB then upon reconnect,
>> + *   if the IDL will use monitor_cond_since, it should first request acked
> 
> This sentence is a little bit confusing. "if the IDL will use
> monitor_cond_since," - I think if last_id is non-zero it is definitely
> using monitor_cond_since, right? The code doesn't have this check either.
> 

There is actually a case in which, even if last_id is non-zero, we will not
be using monitor_cond_since when reconnecting, even though this might not
happen in proper deployments:

When reconnecting, we might choose a different server (or the server
instance might get upgraded/downgraded), and if the instance we connect
to doesn't support monitor_cond_since, the IDL client will fall back to
monitor_cond:

https://github.com/openvswitch/ovs/blob/3423cd97f88fe6a8de8b649d79fe6ac83bce94d1/lib/ovsdb-idl.c#L758

I agree that the sentence is a bit confusing; would this work better?

"if a non-zero last_id is available and if the server instance the
client reconnects to supports monitor_cond_since, the IDL client should
first request acked conditions to avoid missing updates about records
that were added before the transaction with txn-id == last_id."
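For illustration only, the fallback decision roughly looks like the sketch
below. The helper name and its arguments are hypothetical; the real decision
is made when the IDL processes the _Server database (see the link above):

    /* Hypothetical sketch of picking the monitor method on (re)connect,
     * based on what the server instance supports; not the actual code. */
    static enum ovsdb_idl_monitor_method
    choose_monitor_method(bool server_has_monitor_cond_since,
                          bool server_has_monitor_cond)
    {
        if (server_has_monitor_cond_since) {
            /* Fast resync: send monitor_cond_since with the acked
             * conditions and the last known txn-id. */
            return OVSDB_IDL_MM_MONITOR_COND_SINCE;
        } else if (server_has_monitor_cond) {
            /* Different or downgraded server instance: fall back to
             * monitor_cond; last_id cannot be used. */
            return OVSDB_IDL_MM_MONITOR_COND;
        } else {
            return OVSDB_IDL_MM_MONITOR;
        }
    }
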

>> + *   conditions to avoid missing updates about records that were added before
>> + *   the transaction with txn-id == last_id. If there were requested
>> + *   condition changes in flight (i.e., req_cond not NULL) and the IDL
>> + *   client didn't set new conditions (i.e., new_cond is NULL) then move
>> + *   req_cond to new_cond to trigger a follow up cond_change request.
>> + * - if there's no last_id available for the DB then it's safe to use the
>> + *   latest conditions set by the IDL client even if they weren't acked yet.
>> + */
>> +static void
>> +ovsdb_idl_db_sync_condition(struct ovsdb_idl_db *db)
>> +{
>> +    bool ack_all = uuid_is_zero(&db->last_id);
>> +
>> +    db->cond_changed = false;
>> +    for (size_t i = 0; i < db->class_->n_tables; i++) {
>> +        struct ovsdb_idl_table *table = &db->tables[i];
>> +
>> +        if (ack_all) {
>> +            if (table->new_cond) {
>> +                ovsdb_idl_condition_move(&table->req_cond, &table->new_cond);
>> +            }
>> +
>> +            if (table->req_cond) {
>> +                ovsdb_idl_condition_move(&table->ack_cond, &table->req_cond);
>> +            }
>> +        } else {
>> +            if (table->req_cond && !table->new_cond) {
>> +                ovsdb_idl_condition_move(&table->new_cond, &table->req_cond);
>> +                db->cond_changed = true;
> 
> What if new_cond is not NULL? Or is that impossible? If it is impossible,
> shall we assert it instead of adding the if check?
> 

If !ack_all (i.e., last_id != 0) we have two cases:
1) new_cond is set, in which case the sequence of events upon reconnect
should be:

"monitor_cond_since(ack_cond, last_id)" followed by
"monitor_cond_change(new_cond)"

As soon as monitor_cond_change(new_cond) is sent, new_cond will be moved
to req_cond.

2) new_cond is not set, in which case the sequence of events upon
reconnect should be:

"monitor_cond_since(ack_cond, last_id)" followed by
"monitor_cond_change(req_cond)" if req_cond != NULL (i.e., a request was
in flight when the disconnect happened)

To maintain the same logic of always sending
monitor_cond_change(new_cond) it's safe to just move req_cond to
new_cond in this second case.
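In code form, the !ack_all branch boils down to something like this
(condensed from the patch above; the comments just annotate the two cases):

    if (table->new_cond) {
        /* Case 1: the client changed the condition while disconnected.
         * Reconnect sends monitor_cond_since(ack_cond, last_id) and then
         * monitor_cond_change(new_cond); new_cond moves to req_cond when
         * that request goes out. */
    } else if (table->req_cond) {
        /* Case 2: a condition change was in flight when the connection
         * dropped.  Move it back to new_cond so the regular path re-sends
         * it as monitor_cond_change(new_cond) after the resync. */
        ovsdb_idl_condition_move(&table->new_cond, &table->req_cond);
        db->cond_changed = true;
    }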

What do you think?

Thanks,
Dumitru

>> +            }
>> +        }
>> +    }
>> +}
>> +
>>  static void
>>  ovsdb_idl_send_cond_change(struct ovsdb_idl *idl)
>>  {
>> @@ -2062,13 +2167,12 @@ ovsdb_idl_send_monitor_request(struct ovsdb_idl *idl, struct ovsdb_idl_db *db,
>>              monitor_request = json_object_create();
>>              json_object_put(monitor_request, "columns", columns);
>>
>> -            const struct ovsdb_idl_condition *cond = &table->condition;
>> +            const struct ovsdb_idl_condition *cond = table->ack_cond;
>>              if ((monitor_method == OVSDB_IDL_MM_MONITOR_COND ||
>>                   monitor_method == OVSDB_IDL_MM_MONITOR_COND_SINCE) &&
>> -                !ovsdb_idl_condition_is_true(cond)) {
>> +                cond && !ovsdb_idl_condition_is_true(cond)) {
>>                  json_object_put(monitor_request, "where",
>>                                  ovsdb_idl_condition_to_json(cond));
>> -                table->cond_changed = false;
>>              }
>>              json_object_put(monitor_requests, tc->name,
>>                              json_array_create_1(monitor_request));
>> @@ -2076,8 +2180,6 @@ ovsdb_idl_send_monitor_request(struct ovsdb_idl *idl, struct ovsdb_idl_db *db,
>>      }
>>      free_schema(schema);
>>
>> -    db->cond_changed = false;
>> -
>>      struct json *params = json_array_create_3(
>>                                json_string_create(db->class_->database),
>>                                json_clone(db->monitor_id),
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index b5cbee7..4efed88 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -1828,3 +1828,59 @@ m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
>>
>>  OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL connects to leader], 3, ['remote'])
>>  OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL reconnects to leader], 3, ['remote' '+remotestop' 'remote'])
>> +
>> +# Same as OVSDB_CHECK_IDL but uses the C IDL implementation with tcp
>> +# and multiple remotes.
>> +m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
>> +  [AT_SETUP([$1 - C - tcp])
>> +   AT_KEYWORDS([ovsdb server idl positive tcp socket $5])
>> +   m4_define([LPBK],[127.0.0.1])
>> +   AT_CHECK([ovsdb_cluster_start_idltest $2 "ptcp:0:"LPBK])
>> +   PARSE_LISTENING_PORT([s1.log], [TCP_PORT_1])
>> +   PARSE_LISTENING_PORT([s2.log], [TCP_PORT_2])
>> +   PARSE_LISTENING_PORT([s3.log], [TCP_PORT_3])
>> +   remotes=tcp:LPBK:$TCP_PORT_1,tcp:LPBK:$TCP_PORT_2,tcp:LPBK:$TCP_PORT_3
>> +
>> +   m4_if([$3], [], [],
>> +     [AT_CHECK([ovsdb-client transact $remotes $3], [0], [ignore], [ignore])])
>> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl tcp:LPBK:$TCP_PORT_1 $4],
>> +            [0], [stdout], [ignore])
>> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
>> +            [0], [$5])
>> +   AT_CLEANUP])
>> +
>> +# Checks that monitor_cond_since works fine when disconnects happen
>> +# with cond_change requests in flight (i.e., IDL is properly updated).
>> +OVSDB_CHECK_CLUSTER_IDL_C([simple idl, monitor_cond_since, cluster disconnect],
>> +  3,
>> +  [['["idltest",
>> +       {"op": "insert",
>> +       "table": "simple",
>> +       "row": {"i": 1,
>> +               "r": 1.0,
>> +               "b": true}},
>> +       {"op": "insert",
>> +       "table": "simple",
>> +       "row": {"i": 2,
>> +               "r": 1.0,
>> +               "b": true}}]']],
>> +  [['condition simple []' \
>> +    'condition simple [["i","==",2]]' \
>> +    'condition simple [["i","==",1]]' \
>> +    '+reconnect' \
>> +    '["idltest",
>> +      {"op": "update",
>> +       "table": "simple",
>> +       "where": [["i", "==", 1]],
>> +       "row": {"r": 2.0 }}]']],
>> +  [[000: change conditions
>> +001: empty
>> +002: change conditions
>> +003: i=2 r=1 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
>> +004: change conditions
>> +005: reconnect
>> +006: i=2 r=1 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
>> +007: {"error":null,"result":[{"count":1}]}
>> +008: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
>> +009: done
>> +]])
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


