[ovs-dev] [PATCH v7] ovsdb-idl: Force IDL retry when missing updates encountered.
Han Zhou
hzhou at ovn.org
Wed Jul 1 14:22:12 UTC 2020
On Wed, Jul 1, 2020 at 7:17 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 7/1/20 4:10 PM, Han Zhou wrote:
> > Hi Dumitru,
> >
> > Just a few minor comments. With them addressed:
> > Acked-by: Han Zhou <hzhou at ovn.org <mailto:hzhou at ovn.org>>
> >
> >
> > On Thu, Jun 18, 2020 at 1:45 PM Dumitru Ceara <dceara at redhat.com
> > <mailto:dceara at redhat.com>> wrote:
> >>
> >> Adds a generic recovery mechanism which triggers an IDL retry with fast
> >> resync disabled in case the IDL has detected that it ended up in an
> >> inconsistent state due to other bugs in the ovsdb-server/ovsdb-idl
> >> implementation.
> >>
> >> Additionally, this commit also:
> >> - bumps IDL semantic error logs to level ERR to make them more
> >> visible.
> >> - triggers an IDL retry in cases when the IDL client used to try to
> >> recover (i.e., trying to add an existing row, trying to remove a non
> >> existent row). This is required because even though the specific
> >> inconsistency can be fixed locally the client has no clue about the
> >> state of the server. For example, if the server hit a bug and isn't
> >> accepting any other transactions from its cluster leader the client
> >> will have no other chance of detecting this.
> >
> > I think this is not a good example. In this case even if we retry we
> > could still connect to the same stale server and get stale data
> > (normally such servers should inform clients that they are disconnect
> > from the cluster and client IDL should retry other servers, but we are
> > talking about bug situation). The change here is in fact useful when the
> > server is already recovered from bugs (probably with operator
> > intervention), but clients already have inconsistent data, and now it is
> > an opportunity to correct all inconsistency on the client side. I'd
> > suggest removing the example.
> >
>
> Hi Han,
>
> Thanks for the review. Sure, I'll remove this part.
>
> >>
> >> CC: Andy Zhou <azhou at ovn.org <mailto:azhou at ovn.org>>
> >> CC: Han Zhou <hzhou at ovn.org <mailto:hzhou at ovn.org>>
> >> CC: Ilya Maximets <i.maximets at ovn.org <mailto:i.maximets at ovn.org>>
> >> Fixes: db2b5757328c ("lib: add monitor2 support in ovsdb-idl.")
> >> Signed-off-by: Dumitru Ceara <dceara at redhat.com
> > <mailto:dceara at redhat.com>>
> >>
> >> ---
> >> V7:
> >> - Address Ilya's comments:
> >> - improve result returning in ovsdb_idl_process_update2
> >> - For consistency use the same way of returning results in
> >> ovsdb_idl_process_update and make error handling generic.
> >> - Bump semantic log level to ERR to make them more visible and force
IDL
> >> retry on every semantic error.
> >> ---
> >> lib/ovsdb-idl.c | 169
> > ++++++++++++++++++++++++++++++++++++--------------------
> >> 1 file changed, 109 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >> index 0a18261..ef3b97b 100644
> >> --- a/lib/ovsdb-idl.c
> >> +++ b/lib/ovsdb-idl.c
> >> @@ -321,14 +321,19 @@ static bool
> > ovsdb_idl_handle_monitor_canceled(struct ovsdb_idl *,
> >> static void ovsdb_idl_db_parse_update(struct ovsdb_idl_db *,
> >> const struct json
*table_updates,
> >> enum ovsdb_idl_monitor_method
> > method);
> >> -static bool ovsdb_idl_process_update(struct ovsdb_idl_table *,
> >> - const struct uuid *,
> >> - const struct json *old,
> >> - const struct json *new);
> >> -static bool ovsdb_idl_process_update2(struct ovsdb_idl_table *,
> >> - const struct uuid *,
> >> - const char *operation,
> >> - const struct json *row);
> >> +enum update_result {
> >> + OVSDB_IDL_UPDATE_DB_CHANGED,
> >> + OVSDB_IDL_UPDATE_NO_CHANGES,
> >> + OVSDB_IDL_UPDATE_INCONSISTENT,
> >> +};
> >> +static enum update_result ovsdb_idl_process_update(struct
> > ovsdb_idl_table *,
> >> + const struct uuid
*,
> >> + const struct json
> > *old,
> >> + const struct json
> > *new);
> >> +static enum update_result ovsdb_idl_process_update2(struct
> > ovsdb_idl_table *,
> >> + const struct uuid
*,
> >> + const char
> > *operation,
> >> + const struct json
> > *row);
> >> static void ovsdb_idl_insert_row(struct ovsdb_idl_row *, const struct
> > json *);
> >> static void ovsdb_idl_delete_row(struct ovsdb_idl_row *);
> >> static bool ovsdb_idl_modify_row(struct ovsdb_idl_row *, const struct
> > json *);
> >> @@ -2418,6 +2423,7 @@ ovsdb_idl_db_parse_update__(struct ovsdb_idl_db
*db,
> >> version_suffix,
> > table->class_->name);
> >> }
> >> SHASH_FOR_EACH (table_node, json_object(table_update)) {
> >> + enum update_result result = OVSDB_IDL_UPDATE_NO_CHANGES;
> >> const struct json *row_update = table_node->data;
> >> struct uuid uuid;
> >>
> >> @@ -2450,13 +2456,13 @@ ovsdb_idl_db_parse_update__(struct
> > ovsdb_idl_db *db,
> >> operation = ops[i];
> >> row = shash_find_data(json_object(row_update),
> > operation);
> >>
> >> - if (row) {
> >> - if (ovsdb_idl_process_update2(table, &uuid,
> > operation,
> >> - row)) {
> >> - db->change_seqno++;
> >> - }
> >> - break;
> >> + if (!row) {
> >> + continue;
> >> }
> >> +
> >> + result = ovsdb_idl_process_update2(table, &uuid,
> >> + operation,
row);
> >> + break;
> >> }
> >>
> >> /* row_update2 should contain one of the objects */
> >> @@ -2487,10 +2493,24 @@ ovsdb_idl_db_parse_update__(struct
> > ovsdb_idl_db *db,
> >> "and \"new\" members");
> >> }
> >>
> >> - if (ovsdb_idl_process_update(table, &uuid, old_json,
> >> - new_json)) {
> >> - db->change_seqno++;
> >> - }
> >> + result = ovsdb_idl_process_update(table, &uuid,
old_json,
> >> + new_json);
> >> + }
> >> +
> >> + switch (result) {
> >> + case OVSDB_IDL_UPDATE_DB_CHANGED:
> >> + db->change_seqno++;
> >> + break;
> >> + case OVSDB_IDL_UPDATE_NO_CHANGES:
> >> + break;
> >> + case OVSDB_IDL_UPDATE_INCONSISTENT:
> >> + memset(&db->last_id, 0, sizeof db->last_id);
> >> + ovsdb_idl_retry(db->idl);
> >> + return ovsdb_error(NULL,
> >> + "<row_update%s> received for
> > inconsistent "
> >> + "IDL: reconnecting IDL and resync
> > all "
> >> + "data",
> >> + version_suffix);
> >> }
> >> }
> >> }
> >> @@ -2523,9 +2543,22 @@ ovsdb_idl_get_row(struct ovsdb_idl_table
> > *table, const struct uuid *uuid)
> >> return NULL;
> >> }
> >>
> >> -/* Returns true if a column with mode OVSDB_IDL_MODE_RW changed, false
> >> - * otherwise. */
> >> -static bool
> >> +/* Returns OVSDB_IDL_UPDATE_DB_CHANGED if a column with mode
> >> + * OVSDB_IDL_MODE_RW changed.
> >> + *
> >> + * Some IDL inconsistencies can be detected when processing updates:
> >> + * - trying to insert an already existing row
> >> + * - trying to update a missing row
> >> + * - trying to delete a non existent row
> >> + *
> >> + * In such cases OVSDB_IDL_UPDATE_INCONSISTENT is returned.
> >> + * Even though the IDL client could recover, it's best to report the
> >> + * inconsistent state because the state the server is in is unknown
> > so the
> >> + * safest thing to do is to retry (potentially connecting to a new
> > server).
> >> + *
> >> + * Returns OVSDB_IDL_UPDATE_NO_CHANGES otherwise.
> >> + */
> >> +static enum update_result
> >> ovsdb_idl_process_update(struct ovsdb_idl_table *table,
> >> const struct uuid *uuid, const struct json
*old,
> >> const struct json *new)
> >> @@ -2539,10 +2572,10 @@ ovsdb_idl_process_update(struct
> > ovsdb_idl_table *table,
> >> /* XXX perhaps we should check the 'old' values? */
> >> ovsdb_idl_delete_row(row);
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot delete missing row
> > "UUID_FMT" "
> >> - "from table %s",
> >> - UUID_ARGS(uuid), table->class_->name);
> >> - return false;
> >> + VLOG_ERR_RL(&semantic_rl, "cannot delete missing row
> > "UUID_FMT" "
> >> + "from table %s",
> >> + UUID_ARGS(uuid), table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else if (!old) {
> >> /* Insert row. */
> >> @@ -2551,35 +2584,50 @@ ovsdb_idl_process_update(struct
> > ovsdb_idl_table *table,
> >> } else if (ovsdb_idl_row_is_orphan(row)) {
> >> ovsdb_idl_insert_row(row, new);
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot add existing row
> > "UUID_FMT" to "
> >> - "table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> - return ovsdb_idl_modify_row(row, new);
> >> + VLOG_ERR_RL(&semantic_rl, "cannot add existing row
> > "UUID_FMT" to "
> >> + "table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else {
> >> /* Modify row. */
> >> if (row) {
> >> /* XXX perhaps we should check the 'old' values? */
> >> if (!ovsdb_idl_row_is_orphan(row)) {
> >> - return ovsdb_idl_modify_row(row, new);
> >> + return ovsdb_idl_modify_row(row, new)
> >> + ? OVSDB_IDL_UPDATE_DB_CHANGED
> >> + : OVSDB_IDL_UPDATE_NO_CHANGES;
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot modify missing but
"
> >> - "referenced row "UUID_FMT" in table %s",
> >> - UUID_ARGS(uuid), table->class_->name);
> >> - ovsdb_idl_insert_row(row, new);
> >> + VLOG_ERR_RL(&semantic_rl, "cannot modify missing but "
> >> + "referenced row "UUID_FMT" in table %s",
> >> + UUID_ARGS(uuid), table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot modify missing row
> > "UUID_FMT" "
> >> - "in table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> - ovsdb_idl_insert_row(ovsdb_idl_row_create(table, uuid),
new);
> >> + VLOG_ERR_RL(&semantic_rl, "cannot modify missing row
> > "UUID_FMT" "
> >> + "in table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> }
> >>
> >> - return true;
> >> + return OVSDB_IDL_UPDATE_DB_CHANGED;
> >
> > This return should never be reached.
> >
>
> I'm not sure I understand your point here. There are valid code paths
> that can lead to this return. Do you mean to move it to all places that
> might lead up to here?
>
Sorry, my bad. Please ignore this one and the next.
> >> }
> >>
> >> -/* Returns true if a column with mode OVSDB_IDL_MODE_RW changed, false
> >> - * otherwise. */
> >> -static bool
> >> +/* Returns OVSDB_IDL_UPDATE_DB_CHANGED if a column with mode
> >> + * OVSDB_IDL_MODE_RW changed.
> >> + *
> >> + * Some IDL inconsistencies can be detected when processing updates:
> >> + * - trying to insert an already existing row
> >> + * - trying to update a missing row
> >> + * - trying to delete a non existent row
> >> + *
> >> + * In such cases OVSDB_IDL_UPDATE_INCONSISTENT is returned.
> >> + * Even though the IDL client could recover, it's best to report the
> >> + * inconsistent state because the state the server is in is unknown
> > so the
> >> + * safest thing to do is to retry (potentially connecting to a new
> > server).
> >> + *
> >> + * Otherwise OVSDB_IDL_UPDATE_NO_CHANGES is returned.
> >> + */
> >> +static enum update_result
> >> ovsdb_idl_process_update2(struct ovsdb_idl_table *table,
> >> const struct uuid *uuid,
> >> const char *operation,
> >> @@ -2593,10 +2641,10 @@ ovsdb_idl_process_update2(struct
> > ovsdb_idl_table *table,
> >> if (row && !ovsdb_idl_row_is_orphan(row)) {
> >> ovsdb_idl_delete_row(row);
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot delete missing row
> > "UUID_FMT" "
> >> - "from table %s",
> >> - UUID_ARGS(uuid), table->class_->name);
> >> - return false;
> >> + VLOG_ERR_RL(&semantic_rl, "cannot delete missing row
> > "UUID_FMT" "
> >> + "from table %s",
> >> + UUID_ARGS(uuid), table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else if (!strcmp(operation, "insert") || !strcmp(operation,
> > "initial")) {
> >> /* Insert row. */
> >> @@ -2605,34 +2653,35 @@ ovsdb_idl_process_update2(struct
> > ovsdb_idl_table *table,
> >> } else if (ovsdb_idl_row_is_orphan(row)) {
> >> ovsdb_idl_insert_row(row, json_row);
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot add existing row
> > "UUID_FMT" to "
> >> - "table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> - ovsdb_idl_delete_row(row);
> >> - ovsdb_idl_insert_row(row, json_row);
> >> + VLOG_ERR_RL(&semantic_rl, "cannot add existing row
> > "UUID_FMT" to "
> >> + "table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else if (!strcmp(operation, "modify")) {
> >> /* Modify row. */
> >> if (row) {
> >> if (!ovsdb_idl_row_is_orphan(row)) {
> >> - return ovsdb_idl_modify_row_by_diff(row, json_row);
> >> + return ovsdb_idl_modify_row_by_diff(row, json_row)
> >> + ? OVSDB_IDL_UPDATE_DB_CHANGED
> >> + : OVSDB_IDL_UPDATE_NO_CHANGES;
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot modify missing but
"
> >> - "referenced row "UUID_FMT" in table %s",
> >> - UUID_ARGS(uuid), table->class_->name);
> >> - return false;
> >> + VLOG_ERR_RL(&semantic_rl, "cannot modify missing but "
> >> + "referenced row "UUID_FMT" in table %s",
> >> + UUID_ARGS(uuid), table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "cannot modify missing row
> > "UUID_FMT" "
> >> - "in table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> - return false;
> >> + VLOG_ERR_RL(&semantic_rl, "cannot modify missing row
> > "UUID_FMT" "
> >> + "in table %s", UUID_ARGS(uuid),
> > table->class_->name);
> >> + return OVSDB_IDL_UPDATE_INCONSISTENT;
> >> }
> >> } else {
> >> - VLOG_WARN_RL(&semantic_rl, "unknown operation %s to "
> >> - "table %s", operation, table->class_->name);
> >> - return false;
> >> + VLOG_ERR_RL(&semantic_rl, "unknown operation %s to "
> >> + "table %s", operation, table->class_->name);
> >> + return OVSDB_IDL_UPDATE_NO_CHANGES;
> >> }
> >>
> >> - return true;
> >> + return OVSDB_IDL_UPDATE_DB_CHANGED;
> >
> > This return should never be reached.
> >
>
> Same as for ovsdb_idl_process_update(), there are valid paths that lead
> here.
>
> Thanks,
> Dumitru
>
> >> }
> >>
> >> /* Recursively add rows to tracked change lists for current row
> >> --
> >> 1.8.3.1
> >>
>
More information about the dev
mailing list