[ovs-dev] [PATCH] ovsdb-server: Replace in-memory DB contents at raft install_snapshot.

Han Zhou hzhou at ovn.org
Wed Aug 5 17:48:58 UTC 2020


On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> Every time a follower has to install a snapshot received from the
> leader, it should also replace the data in memory. Right now this only
> happens when snapshots are installed that also change the schema.
>
> This can lead to inconsistent DB data on follower nodes and the snapshot
> may fail to get applied.
>
> CC: Han Zhou <hzhou at ovn.org>
> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft
install_snapshot.")
> Signed-off-by: Dumitru Ceara <dceara at redhat.com>

Thanks Dumitru! This is a great finding, and sorry for my mistake.
This patch looks good to me. Just one minor comment below on the test case.
Otherwise:

Acked-by: Han Zhou <hzhou at ovn.org>

> ---
>  ovsdb/ovsdb-server.c    | 21 +++++++++++++--------
>  tests/idltest.ovsschema |  9 +++++++++
>  tests/ovsdb-cluster.at  | 28 +++++++++++++++++++++++++---
>  tests/ovsdb-idl.at      |  1 +
>  4 files changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> index ef4e996..fd7891a 100644
> --- a/ovsdb/ovsdb-server.c
> +++ b/ovsdb/ovsdb-server.c
> @@ -543,13 +543,14 @@ parse_txn(struct server_config *config, struct db
*db,
>            const struct ovsdb_schema *schema, const struct json *txn_json,
>            const struct uuid *txnid)
>  {
> -    if (schema && (!db->db->schema || strcmp(schema->version,
> -                                             db->db->schema->version))) {
> +    if (schema) {
>          /* We're replacing the schema (and the data).  Destroy the
database
>           * (first grabbing its storage), then replace it with the new
schema.
>           * The transaction must also include the replacement data.
>           *
> -         * Only clustered database schema changes go through this path.
*/
> +         * Only clustered database schema changes and snapshot installs
> +         * go through this path.
> +         */
>          ovs_assert(txn_json);
>          ovs_assert(ovsdb_storage_is_clustered(db->db->storage));
>
> @@ -559,11 +560,15 @@ parse_txn(struct server_config *config, struct db
*db,
>              return error;
>          }
>
> -        ovsdb_jsonrpc_server_reconnect(
> -            config->jsonrpc, false,
> -            (db->db->schema
> -             ? xasprintf("database %s schema changed", db->db->name)
> -             : xasprintf("database %s connected to storage",
db->db->name)));
> +        if (!db->db->schema ||
> +            strcmp(schema->version, db->db->schema->version)) {
> +            ovsdb_jsonrpc_server_reconnect(
> +                config->jsonrpc, false,
> +                (db->db->schema
> +                ? xasprintf("database %s schema changed", db->db->name)
> +                : xasprintf("database %s connected to storage",
> +                            db->db->name)));
> +        }
>
>          ovsdb_replace(db->db, ovsdb_create(ovsdb_schema_clone(schema),
NULL));
>
> diff --git a/tests/idltest.ovsschema b/tests/idltest.ovsschema
> index e02b975..e04755e 100644
> --- a/tests/idltest.ovsschema
> +++ b/tests/idltest.ovsschema
> @@ -54,6 +54,15 @@
>        },
>        "isRoot" : true
>      },
> +    "indexed": {
> +      "columns": {
> +        "i": {
> +          "type": "integer"
> +        }
> +      },
> +      "indexes": [["i"]],
> +      "isRoot" : true
> +    },
>      "simple": {
>        "columns": {
>          "b": {
> diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
> index 9714545..06d27c9 100644
> --- a/tests/ovsdb-cluster.at
> +++ b/tests/ovsdb-cluster.at
> @@ -332,15 +332,31 @@ for i in `seq $n`; do
>      AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
>  done
>
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> +      {"op": "insert",
> +       "table": "indexed",
> +       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
> +
>  # Kill one follower (s2) and write some data to cluster, so that the
follower is falling behind
>  printf "\ns2: stopping\n"
>  OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s2], [s2.pid])
>
> +# Delete "i":1 and readd it to get a different UUID for it.
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> +      {"op": "delete",
> +       "table": "indexed",
> +       "where": [["i", "==", 1]]}]]'], [0], [ignore], [ignore])
> +
>  AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
>        {"op": "insert",
> -       "table": "simple",
> +       "table": "indexed",
>         "row": {"i": 1}}]]'], [0], [ignore], [ignore])
>
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> +      {"op": "insert",
> +       "table": "indexed",
> +       "row": {"i": 2}}]]'], [0], [ignore], [ignore])
> +
>  # Compact leader online to generate snapshot
>  AT_CHECK([ovs-appctl -t "`pwd`"/s1 ovsdb-server/compact])
>
> @@ -355,8 +371,14 @@ AT_CHECK([ovsdb_client_wait unix:s2.ovsdb
$schema_name connected])
>  # succeed.
>  AT_CHECK([ovsdb-client transact unix:s2.ovsdb '[["idltest",
>        {"op": "insert",
> -       "table": "simple",
> -       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
> +       "table": "indexed",
> +       "row": {"i": 3}}]]'], [0], [ignore], [ignore])
> +
> +# The snapshot should overwrite the in-memory contents of the DB on S2
> +# without generating any constraint violations.
> +AT_CHECK([grep 'Transaction causes multiple rows in "indexed" table to
have identical values (1) for index on column "i"' -c s2.log], [1], [dnl
> +0
> +])

Beside checking the error log, it may be better to simply check if the i=2
row is found in S2. Without this fix i=2 shouldn't get inserted. This can
better ensure the correctness, in case the log text changes and the grep
doesn't detect the problem.

>
>  for i in `seq $n`; do
>      OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 4efed88..789ae23 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -954,6 +954,7 @@ AT_CHECK([sort stdout | uuidfilt], [0],
>
>  # Check that ovsdb-idl figured out that table link2 and column l2 are
missing.
>  AT_CHECK([grep ovsdb_idl stderr | sort], [0], [dnl
> +test-ovsdb|ovsdb_idl|idltest database lacks indexed table (database
needs upgrade?)
>  test-ovsdb|ovsdb_idl|idltest database lacks link2 table (database needs
upgrade?)
>  test-ovsdb|ovsdb_idl|idltest database lacks simple5 table (database
needs upgrade?)
>  test-ovsdb|ovsdb_idl|idltest database lacks singleton table (database
needs upgrade?)
> --
> 1.8.3.1
>


More information about the dev mailing list