[ovs-dev] [PATCH v5 2/3] ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.
Dumitru Ceara
dceara at redhat.com
Wed May 27 09:03:18 UTC 2020
On 5/27/20 3:41 AM, Han Zhou wrote:
> Thanks Dumitru. Please see my comments below.
>
> On Thu, May 7, 2020 at 4:21 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>
>> Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
>> (i.e., "monitor_cond_since" method) with the initial monitor condition
>> MC1.
>>
>> Assuming the following two transactions are executed on the
>> ovsdb-server:
>> TXN1: "insert record R1 in table T1"
>> TXN2: "insert record R2 in table T2"
>>
>> If the client's monitor condition MC1 for table T2 matches R2 then the
>> client will receive the following update3 message:
>> method="update3", "insert record R2 in table T2", last-txn-id=TXN2
>>
>> At this point, if the presence of the new record R2 in the IDL triggers
>> the client to update its monitor condition to MC2 and add a clause for
>> table T1 which matches R1, a monitor_cond_change message is sent to the
>> server:
>> method="monitor_cond_change", "clauses from MC2"
>>
>> In normal operation the ovsdb-server will reply with a new update3
>> message of the form:
>> method="update3", "insert record R1 in table T1", last-txn-id=TXN2
>>
>> However, if the connection drops in the meantime, this last update might
>> get lost.
>>
>> It might happen that during the reconnect a new transaction happens
>> that modifies the original record R1:
>> TXN3: "modify record R1 in table T1"
>>
>> When the client reconnects, it will try to perform a fast resync by
>> sending:
>> method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2
>>
>> Because TXN2 is still in the ovsdb-server transaction history, the
>> server replies with the changes from the most recent transactions only,
>> i.e., TXN3:
>> result="true", last-txn-id=TXN3, "modify record R1 in table T1"
>>
>> This causes the IDL on the client to end up in an inconsistent
>> state because it has never seen the update that created R1.
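The gap above can be condensed into a toy model (illustrative names only, not OVS code): on fast resync the server replays only transactions newer than the client's last_id, so TXN1's insert of R1 is never re-sent even though the new condition MC2 would now match it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the race described above.  The server keeps a short
 * transaction history; on monitor_cond_since it replays only the
 * transactions with an id greater than the client's last_id. */
struct toy_txn {
    int id;              /* 1 = TXN1, 2 = TXN2, 3 = TXN3 */
    bool is_insert_r1;   /* true only for TXN1, the insert of R1 in T1 */
};

static const struct toy_txn history[] = {
    { 1, true },   /* TXN1: insert R1 in T1 */
    { 2, false },  /* TXN2: insert R2 in T2 */
    { 3, false },  /* TXN3: modify R1 in T1 */
};

/* Returns true if a fast resync starting after 'last_id' would replay
 * the update that created R1. */
static bool
resync_replays_r1_insert(int last_id)
{
    for (size_t i = 0; i < sizeof history / sizeof history[0]; i++) {
        if (history[i].id > last_id && history[i].is_insert_r1) {
            return true;
        }
    }
    return false;
}
```

With last_id == TXN2 the client receives only TXN3's modify, i.e. a modify for a row whose insert it never saw; only a resync with no known txn-id replays the insert.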
>>
>> Such a scenario is described in:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22
>>
>> To avoid this issue, the IDL will now maintain (up to) 3 different
>> types of conditions for each DB table:
>> - new_cond: condition that has been set by the IDL client but has
>> not yet been sent to the server through monitor_cond_change.
>> - req_cond: condition that has been sent to the server but the reply
>> acknowledging the change hasn't been received yet.
>> - ack_cond: condition that has been acknowledged by the server.
>>
>> Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
>> disconnect):
>> - if there is a known last_id txn-id the code ensures that new_cond
>> will contain the most recent condition set by the IDL client
>> (either req_cond if there was a request in flight, or new_cond
>> if the IDL client set a condition while the IDL was disconnected)
>> - if there is no known last_id txn-id the code ensures that ack_cond will
>> contain the most recent conditions set by the IDL client regardless
>> of whether they were acked by the server or not.
>>
>> When monitor_cond_since/monitor_cond requests are sent they will
>> always include ack_cond and if new_cond is not NULL a follow up
>> monitor_cond_change will be generated afterwards.
>>
>> On the other hand, ovsdb_idl_db_set_condition() will always modify new_cond.
>>
>> This ensures that updates of type "insert" that happened before the last
>> transaction known by the IDL but didn't match old monitor conditions are
>> sent upon reconnect if the monitor condition has changed to include them
>> in the meantime.
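The "most recent condition" the commit message refers to follows a fixed priority; a minimal sketch (conditions modeled as opaque strings here, struct ovsdb_idl_condition in the IDL):

```c
#include <stddef.h>

/* Illustrative sketch: the condition the IDL treats as most recent for a
 * table follows the priority new_cond > req_cond > ack_cond, i.e. a
 * not-yet-sent condition wins over an in-flight request, which wins over
 * the last condition the server acknowledged. */
static const char *
latest_cond(const char *new_cond, const char *req_cond, const char *ack_cond)
{
    return new_cond ? new_cond : req_cond ? req_cond : ack_cond;
}
```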
>>
>> CC: Han Zhou <hzhou at ovn.org>
>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>> ---
>> lib/ovsdb-idl-provider.h | 8 ++
>> lib/ovsdb-idl.c | 148 +++++++++++++++++++++++++++++++++++++++-------
>> tests/ovsdb-idl.at | 56 +++++++++++++++++
>> 3 files changed, 187 insertions(+), 25 deletions(-)
>>
>> diff --git a/lib/ovsdb-idl-provider.h b/lib/ovsdb-idl-provider.h
>> index 30d1d08..00497d9 100644
>> --- a/lib/ovsdb-idl-provider.h
>> +++ b/lib/ovsdb-idl-provider.h
>> @@ -122,8 +122,12 @@ struct ovsdb_idl_table {
>> unsigned int change_seqno[OVSDB_IDL_CHANGE_MAX];
>> struct ovs_list indexes; /* Contains "struct ovsdb_idl_index"s */
>> struct ovs_list track_list; /* Tracked rows (ovsdb_idl_row.track_node). */
>> - struct ovsdb_idl_condition condition;
>> - bool cond_changed;
>> + struct ovsdb_idl_condition *ack_cond; /* Last condition acked by the
>> + * server. */
>> + struct ovsdb_idl_condition *req_cond; /* Last condition requested to the
>> + * server. */
>> + struct ovsdb_idl_condition *new_cond; /* Latest condition set by the IDL
>> + * client. */
>> };
>>
>> struct ovsdb_idl_class {
>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> index 1535ad7..557f61c 100644
>> --- a/lib/ovsdb-idl.c
>> +++ b/lib/ovsdb-idl.c
>> @@ -240,6 +240,10 @@ static void ovsdb_idl_send_monitor_request(struct ovsdb_idl *,
>> struct ovsdb_idl_db *,
>> enum ovsdb_idl_monitor_method);
>> static void ovsdb_idl_db_clear(struct ovsdb_idl_db *db);
>> +static void ovsdb_idl_db_ack_condition(struct ovsdb_idl_db *db);
>> +static void ovsdb_idl_db_sync_condition(struct ovsdb_idl_db *db);
>> +static void ovsdb_idl_condition_move(struct ovsdb_idl_condition **dst,
>> + struct ovsdb_idl_condition **src);
>>
>> struct ovsdb_idl {
>> struct ovsdb_idl_db server;
>> @@ -422,9 +426,11 @@ ovsdb_idl_db_init(struct ovsdb_idl_db *db, const struct ovsdb_idl_class *class,
>> = table->change_seqno[OVSDB_IDL_CHANGE_MODIFY]
>> = table->change_seqno[OVSDB_IDL_CHANGE_DELETE] = 0;
>> table->db = db;
>> - ovsdb_idl_condition_init(&table->condition);
>> - ovsdb_idl_condition_add_clause_true(&table->condition);
>> - table->cond_changed = false;
>> + table->ack_cond = NULL;
>> + table->req_cond = NULL;
>> + table->new_cond = xmalloc(sizeof *table->new_cond);
>> + ovsdb_idl_condition_init(table->new_cond);
>> + ovsdb_idl_condition_add_clause_true(table->new_cond);
>> }
>> db->monitor_id = json_array_create_2(json_string_create("monid"),
>> json_string_create(class->database));
>> @@ -556,12 +562,15 @@ ovsdb_idl_set_shuffle_remotes(struct ovsdb_idl *idl, bool shuffle)
>> static void
>> ovsdb_idl_db_destroy(struct ovsdb_idl_db *db)
>> {
>> + struct ovsdb_idl_condition *null_cond = NULL;
>> ovs_assert(!db->txn);
>> ovsdb_idl_db_txn_abort_all(db);
>> ovsdb_idl_db_clear(db);
>> for (size_t i = 0; i < db->class_->n_tables; i++) {
>> struct ovsdb_idl_table *table = &db->tables[i];
>> - ovsdb_idl_condition_destroy(&table->condition);
>> + ovsdb_idl_condition_move(&table->ack_cond, &null_cond);
>> + ovsdb_idl_condition_move(&table->req_cond, &null_cond);
>> + ovsdb_idl_condition_move(&table->new_cond, &null_cond);
>> ovsdb_idl_destroy_indexes(table);
>> shash_destroy(&table->columns);
>> hmap_destroy(&table->rows);
>> @@ -690,6 +699,12 @@ ovsdb_idl_send_request(struct ovsdb_idl *idl, struct jsonrpc_msg *request)
>> static void
>> ovsdb_idl_restart_fsm(struct ovsdb_idl *idl)
>> {
>> + /* Resync data DB table conditions to avoid missing updates due to
>> + * conditions that were in flight or changed locally while the connection
>> + * was down.
>> + */
>> + ovsdb_idl_db_sync_condition(&idl->data);
>> +
>> ovsdb_idl_send_schema_request(idl, &idl->server);
>> ovsdb_idl_transition(idl, IDL_S_SERVER_SCHEMA_REQUESTED);
>> idl->data.monitoring = OVSDB_IDL_NOT_MONITORING;
>> @@ -797,7 +812,9 @@ ovsdb_idl_process_response(struct ovsdb_idl *idl, struct jsonrpc_msg *msg)
>> * do, it's a "monitor_cond_change", which means that the conditional
>> * monitor clauses were updated.
>> *
>> - * If further condition changes were pending, send them now. */
>> + * Mark the last requested conditions and acked and if further
>
> typo: s/and acked/as acked
>
Ack :)
>> + * condition changes were pending, send them now. */
>> + ovsdb_idl_db_ack_condition(&idl->data);
>> ovsdb_idl_send_cond_change(idl);
>> idl->data.cond_seqno++;
>> break;
>> @@ -1493,30 +1510,60 @@ ovsdb_idl_condition_equals(const struct ovsdb_idl_condition *a,
>> }
>>
>> static void
>> -ovsdb_idl_condition_clone(struct ovsdb_idl_condition *dst,
>> +ovsdb_idl_condition_clone(struct ovsdb_idl_condition **dst,
>> const struct ovsdb_idl_condition *src)
>> {
>> - ovsdb_idl_condition_init(dst);
>> + if (*dst) {
>> + ovsdb_idl_condition_destroy(*dst);
>> + } else {
>> + *dst = xmalloc(sizeof **dst);
>> + }
>> + ovsdb_idl_condition_init(*dst);
>>
>> - dst->is_true = src->is_true;
>> + (*dst)->is_true = src->is_true;
>>
>> const struct ovsdb_idl_clause *clause;
>> HMAP_FOR_EACH (clause, hmap_node, &src->clauses) {
>> - ovsdb_idl_condition_add_clause__(dst, clause, clause->hmap_node.hash);
>> + ovsdb_idl_condition_add_clause__(*dst, clause, clause->hmap_node.hash);
>> }
>> }
>>
>> +static void
>> +ovsdb_idl_condition_move(struct ovsdb_idl_condition **dst,
>> + struct ovsdb_idl_condition **src)
>> +{
>> + if (*dst) {
>> + ovsdb_idl_condition_destroy(*dst);
>> + free(*dst);
>> + }
>> + *dst = *src;
>> + *src = NULL;
>> +}
>> +
>> static unsigned int
>> ovsdb_idl_db_set_condition(struct ovsdb_idl_db *db,
>> const struct ovsdb_idl_table_class *tc,
>> const struct ovsdb_idl_condition *condition)
>> {
>> + struct ovsdb_idl_condition *table_cond;
>> struct ovsdb_idl_table *table = ovsdb_idl_db_table_from_class(db, tc);
>> unsigned int seqno = db->cond_seqno;
>> - if (!ovsdb_idl_condition_equals(condition, &table->condition)) {
>> - ovsdb_idl_condition_destroy(&table->condition);
>> - ovsdb_idl_condition_clone(&table->condition, condition);
>> - db->cond_changed = table->cond_changed = true;
>> +
>> + /* Compare the new condition to the last known condition which can be
>> + * either "new" (not sent yet), "requested" or "acked", in this
> order.
>> + */
>> + if (table->new_cond) {
>> + table_cond = table->new_cond;
>> + } else if (table->req_cond) {
>> + table_cond = table->req_cond;
>> + } else {
>> + table_cond = table->ack_cond;
>> + }
>> + ovs_assert(table_cond);
>> +
>> + if (!ovsdb_idl_condition_equals(condition, table_cond)) {
>> + ovsdb_idl_condition_clone(&table->new_cond, condition);
>> + db->cond_changed = true;
>> poll_immediate_wake();
>> return seqno + 1;
>> }
>> @@ -1561,9 +1608,8 @@ ovsdb_idl_condition_to_json(const struct ovsdb_idl_condition *cnd)
>> }
>>
>> static struct json *
>> -ovsdb_idl_create_cond_change_req(struct ovsdb_idl_table *table)
>> +ovsdb_idl_create_cond_change_req(const struct ovsdb_idl_condition *cond)
>> {
>> - const struct ovsdb_idl_condition *cond = &table->condition;
>> struct json *monitor_cond_change_request = json_object_create();
>> struct json *cond_json = ovsdb_idl_condition_to_json(cond);
>>
>> @@ -1583,8 +1629,9 @@ ovsdb_idl_db_compose_cond_change(struct ovsdb_idl_db *db)
>> for (size_t i = 0; i < db->class_->n_tables; i++) {
>> struct ovsdb_idl_table *table = &db->tables[i];
>>
>> - if (table->cond_changed) {
>> - struct json *req = ovsdb_idl_create_cond_change_req(table);
>> + if (table->new_cond) {
>> + struct json *req =
>> + ovsdb_idl_create_cond_change_req(table->new_cond);
>> if (req) {
>> if (!monitor_cond_change_requests) {
>> monitor_cond_change_requests = json_object_create();
>> @@ -1593,7 +1640,11 @@ ovsdb_idl_db_compose_cond_change(struct ovsdb_idl_db *db)
>> table->class_->name,
>> json_array_create_1(req));
>> }
>> - table->cond_changed = false;
>> + /* Mark the new condition as requested by moving it to req_cond.
>> + * If there's already a requested condition, that's a bug.
>> + */
>> + ovs_assert(table->req_cond == NULL);
>> + ovsdb_idl_condition_move(&table->req_cond, &table->new_cond);
>> }
>> }
>>
>> @@ -1608,6 +1659,60 @@ ovsdb_idl_db_compose_cond_change(struct ovsdb_idl_db *db)
>> return jsonrpc_create_request("monitor_cond_change", params, NULL);
>> }
>>
>> +/* Marks all requested table conditions in 'db' as acked by the server.
>> + * It should be called when the server replies to monitor_cond_change
>> + * requests.
>> + */
>> +static void
>> +ovsdb_idl_db_ack_condition(struct ovsdb_idl_db *db)
>> +{
>> + for (size_t i = 0; i < db->class_->n_tables; i++) {
>> + struct ovsdb_idl_table *table = &db->tables[i];
>> +
>> + if (table->req_cond) {
>> + ovsdb_idl_condition_move(&table->ack_cond, &table->req_cond);
>> + }
>> + }
>> +}
>> +
>> +/* Should be called when the IDL fsm is restarted and resyncs table conditions
>> + * based on the state the DB is in:
>> + * - if a non-zero last_id is available for the DB then upon reconnect,
>> + * if the IDL will use monitor_cond_since, it should first request acked
>
> This sentence is a little bit confusing. "if the IDL will use
> monitor_cond_since," - I think if last_id is non-zero it is definitely
> using monitor_cond_since, right? The code doesn't have this check either.
>
There is actually a case when even if last_id is non-zero we will not be
using monitor_cond_since when reconnecting, even though this might not
happen in proper deployments:
When reconnecting we might choose a different server (or the server
instance might get upgraded/downgraded) and if the instance we connect
to doesn't support monitor_cond_since, the IDL client will fall back to
monitor_cond:
https://github.com/openvswitch/ovs/blob/3423cd97f88fe6a8de8b649d79fe6ac83bce94d1/lib/ovsdb-idl.c#L758
I agree that the sentence is a bit confusing, would this work better?
"if a non-zero last_id is available and if the server instance the
client reconnects to supports monitor_cond_since, the IDL client should
first request acked conditions to avoid missing updates about records
that were added before the transaction with txn-id == last_id."
>> + * conditions to avoid missing updates about records that were added before
>> + * the transaction with txn-id == last_id. If there were requested
>> + * condition changes in flight (i.e., req_cond not NULL) and the IDL
>> + * client didn't set new conditions (i.e., new_cond is NULL) then move
>> + * req_cond to new_cond to trigger a follow up cond_change request.
>> + * - if there's no last_id available for the DB then it's safe to use the
>> + * latest conditions set by the IDL client even if they weren't acked yet.
>> + */
>> +static void
>> +ovsdb_idl_db_sync_condition(struct ovsdb_idl_db *db)
>> +{
>> + bool ack_all = uuid_is_zero(&db->last_id);
>> +
>> + db->cond_changed = false;
>> + for (size_t i = 0; i < db->class_->n_tables; i++) {
>> + struct ovsdb_idl_table *table = &db->tables[i];
>> +
>> + if (ack_all) {
>> + if (table->new_cond) {
>> + ovsdb_idl_condition_move(&table->req_cond, &table->new_cond);
>> + }
>> +
>> + if (table->req_cond) {
>> + ovsdb_idl_condition_move(&table->ack_cond, &table->req_cond);
>> + }
>> + } else {
>> + if (table->req_cond && !table->new_cond) {
>> + ovsdb_idl_condition_move(&table->new_cond, &table->req_cond);
>> + db->cond_changed = true;
>
> What if new_cond is not NULL? Or it is impossible? If it is impossible,
> shall we assert it instead of adding in the if check?
>
If !ack_all (i.e., last_id != 0) we have two cases:
1) new_cond is set in which case the sequence of events upon reconnect
should be:
"monitor_cond_since(ack_cond, last_id)" followed by
"monitor_cond_change(new_cond)"
As soon as monitor_cond_change(new_cond) is sent, new_cond will be moved
to req_cond.
2) new_cond is not set in which case the sequence of events upon
reconnect should be:
"monitor_cond_since(ack_cond, last_id)" followed by
"monitor_cond_change(req_cond)" if req_cond != NULL (i.e., a request was
in flight when the disconnect happened)
To maintain the same logic of always sending
monitor_cond_change(new_cond) it's safe to just move req_cond to
new_cond in this second case.
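The two branches can be condensed into a toy model (illustrative only; the real code moves struct ovsdb_idl_condition pointers and frees the overwritten destination):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the per-table resync rules discussed above, with
 * conditions as opaque pointers.  "Move" transfers the pointer and
 * NULLs the source, as ovsdb_idl_condition_move() does. */
static void
toy_move(const char **dst, const char **src)
{
    *dst = *src;
    *src = NULL;
}

static void
toy_sync_condition(bool have_last_id, const char **new_cond,
                   const char **req_cond, const char **ack_cond)
{
    if (!have_last_id) {
        /* No fast resync possible: fold the newest condition into
         * ack_cond so the initial monitor request carries it. */
        if (*new_cond) {
            toy_move(req_cond, new_cond);
        }
        if (*req_cond) {
            toy_move(ack_cond, req_cond);
        }
    } else if (*req_cond && !*new_cond) {
        /* Case 2 above: re-issue the in-flight request as a follow-up
         * monitor_cond_change after monitor_cond_since(ack_cond). */
        toy_move(new_cond, req_cond);
    }
}
```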
What do you think?
Thanks,
Dumitru
>> + }
>> + }
>> + }
>> +}
>> +
>> static void
>> ovsdb_idl_send_cond_change(struct ovsdb_idl *idl)
>> {
>> @@ -2062,13 +2167,12 @@ ovsdb_idl_send_monitor_request(struct ovsdb_idl *idl, struct ovsdb_idl_db *db,
>> monitor_request = json_object_create();
>> json_object_put(monitor_request, "columns", columns);
>>
>> - const struct ovsdb_idl_condition *cond = &table->condition;
>> + const struct ovsdb_idl_condition *cond = table->ack_cond;
>> if ((monitor_method == OVSDB_IDL_MM_MONITOR_COND ||
>> monitor_method == OVSDB_IDL_MM_MONITOR_COND_SINCE) &&
>> - !ovsdb_idl_condition_is_true(cond)) {
>> + cond && !ovsdb_idl_condition_is_true(cond)) {
>> json_object_put(monitor_request, "where",
>> ovsdb_idl_condition_to_json(cond));
>> - table->cond_changed = false;
>> }
>> json_object_put(monitor_requests, tc->name,
>> json_array_create_1(monitor_request));
>> @@ -2076,8 +2180,6 @@ ovsdb_idl_send_monitor_request(struct ovsdb_idl *idl, struct ovsdb_idl_db *db,
>> }
>> free_schema(schema);
>>
>> - db->cond_changed = false;
>> -
>> struct json *params = json_array_create_3(
>> json_string_create(db->class_->database),
>> json_clone(db->monitor_id),
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index b5cbee7..4efed88 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -1828,3 +1828,59 @@ m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
>>
>> OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL connects to leader], 3, ['remote'])
>> OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL reconnects to leader], 3, ['remote' '+remotestop' 'remote'])
>> +
>> +# same as OVSDB_CHECK_IDL but uses C IDL implementation with tcp
>> +# with multiple remotes.
>> +m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
>> + [AT_SETUP([$1 - C - tcp])
>> + AT_KEYWORDS([ovsdb server idl positive tcp socket $5])
>> + m4_define([LPBK],[127.0.0.1])
>> + AT_CHECK([ovsdb_cluster_start_idltest $2 "ptcp:0:"LPBK])
>> + PARSE_LISTENING_PORT([s1.log], [TCP_PORT_1])
>> + PARSE_LISTENING_PORT([s2.log], [TCP_PORT_2])
>> + PARSE_LISTENING_PORT([s3.log], [TCP_PORT_3])
>> + remotes=tcp:LPBK:$TCP_PORT_1,tcp:LPBK:$TCP_PORT_2,tcp:LPBK:$TCP_PORT_3
>> +
>> + m4_if([$3], [], [],
>> + [AT_CHECK([ovsdb-client transact $remotes $3], [0], [ignore], [ignore])])
>> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl tcp:LPBK:$TCP_PORT_1 $4],
>> + [0], [stdout], [ignore])
>> + AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
>> + [0], [$5])
>> + AT_CLEANUP])
>> +
>> +# Checks that monitor_cond_since works fine when disconnects happen
>> +# with cond_change requests in flight (i.e., IDL is properly updated).
>> +OVSDB_CHECK_CLUSTER_IDL_C([simple idl, monitor_cond_since, cluster disconnect],
>> + 3,
>> + [['["idltest",
>> + {"op": "insert",
>> + "table": "simple",
>> + "row": {"i": 1,
>> + "r": 1.0,
>> + "b": true}},
>> + {"op": "insert",
>> + "table": "simple",
>> + "row": {"i": 2,
>> + "r": 1.0,
>> + "b": true}}]']],
>> + [['condition simple []' \
>> + 'condition simple [["i","==",2]]' \
>> + 'condition simple [["i","==",1]]' \
>> + '+reconnect' \
>> + '["idltest",
>> + {"op": "update",
>> + "table": "simple",
>> + "where": [["i", "==", 1]],
>> + "row": {"r": 2.0 }}]']],
>> + [[000: change conditions
>> +001: empty
>> +002: change conditions
>> +003: i=2 r=1 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
>> +004: change conditions
>> +005: reconnect
>> +006: i=2 r=1 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
>> +007: {"error":null,"result":[{"count":1}]}
>> +008: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
>> +009: done
>> +]])
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev