[ovs-dev] [PATCH] ovsdb-server: Replace in-memory DB contents at raft install_snapshot.
Han Zhou
hzhou at ovn.org
Wed Aug 5 17:48:58 UTC 2020
On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> Every time a follower has to install a snapshot received from the
> leader, it should also replace the data in memory. Right now this only
> happens when snapshots are installed that also change the schema.
>
> This can lead to inconsistent DB data on follower nodes and the snapshot
> may fail to get applied.
>
> CC: Han Zhou <hzhou at ovn.org>
> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft
install_snapshot.")
> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
Thanks Dumitru! This is a great finding, and sorry for my mistake.
This patch looks good to me. Just one minor comment below on the test case.
Otherwise:
Acked-by: Han Zhou <hzhou at ovn.org>
> ---
> ovsdb/ovsdb-server.c | 21 +++++++++++++--------
> tests/idltest.ovsschema | 9 +++++++++
> tests/ovsdb-cluster.at | 28 +++++++++++++++++++++++++---
> tests/ovsdb-idl.at | 1 +
> 4 files changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> index ef4e996..fd7891a 100644
> --- a/ovsdb/ovsdb-server.c
> +++ b/ovsdb/ovsdb-server.c
> @@ -543,13 +543,14 @@ parse_txn(struct server_config *config, struct db
*db,
> const struct ovsdb_schema *schema, const struct json *txn_json,
> const struct uuid *txnid)
> {
> - if (schema && (!db->db->schema || strcmp(schema->version,
> - db->db->schema->version))) {
> + if (schema) {
> /* We're replacing the schema (and the data). Destroy the
database
> * (first grabbing its storage), then replace it with the new
schema.
> * The transaction must also include the replacement data.
> *
> - * Only clustered database schema changes go through this path.
*/
> + * Only clustered database schema changes and snapshot installs
> + * go through this path.
> + */
> ovs_assert(txn_json);
> ovs_assert(ovsdb_storage_is_clustered(db->db->storage));
>
> @@ -559,11 +560,15 @@ parse_txn(struct server_config *config, struct db
*db,
> return error;
> }
>
> - ovsdb_jsonrpc_server_reconnect(
> - config->jsonrpc, false,
> - (db->db->schema
> - ? xasprintf("database %s schema changed", db->db->name)
> - : xasprintf("database %s connected to storage",
db->db->name)));
> + if (!db->db->schema ||
> + strcmp(schema->version, db->db->schema->version)) {
> + ovsdb_jsonrpc_server_reconnect(
> + config->jsonrpc, false,
> + (db->db->schema
> + ? xasprintf("database %s schema changed", db->db->name)
> + : xasprintf("database %s connected to storage",
> + db->db->name)));
> + }
>
> ovsdb_replace(db->db, ovsdb_create(ovsdb_schema_clone(schema),
NULL));
>
> diff --git a/tests/idltest.ovsschema b/tests/idltest.ovsschema
> index e02b975..e04755e 100644
> --- a/tests/idltest.ovsschema
> +++ b/tests/idltest.ovsschema
> @@ -54,6 +54,15 @@
> },
> "isRoot" : true
> },
> + "indexed": {
> + "columns": {
> + "i": {
> + "type": "integer"
> + }
> + },
> + "indexes": [["i"]],
> + "isRoot" : true
> + },
> "simple": {
> "columns": {
> "b": {
> diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
> index 9714545..06d27c9 100644
> --- a/tests/ovsdb-cluster.at
> +++ b/tests/ovsdb-cluster.at
> @@ -332,15 +332,31 @@ for i in `seq $n`; do
> AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
> done
>
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> + {"op": "insert",
> + "table": "indexed",
> + "row": {"i": 1}}]]'], [0], [ignore], [ignore])
> +
> # Kill one follower (s2) and write some data to cluster, so that the
follower is falling behind
> printf "\ns2: stopping\n"
> OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s2], [s2.pid])
>
> +# Delete "i":1 and readd it to get a different UUID for it.
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> + {"op": "delete",
> + "table": "indexed",
> + "where": [["i", "==", 1]]}]]'], [0], [ignore], [ignore])
> +
> AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> {"op": "insert",
> - "table": "simple",
> + "table": "indexed",
> "row": {"i": 1}}]]'], [0], [ignore], [ignore])
>
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> + {"op": "insert",
> + "table": "indexed",
> + "row": {"i": 2}}]]'], [0], [ignore], [ignore])
> +
> # Compact leader online to generate snapshot
> AT_CHECK([ovs-appctl -t "`pwd`"/s1 ovsdb-server/compact])
>
> @@ -355,8 +371,14 @@ AT_CHECK([ovsdb_client_wait unix:s2.ovsdb
$schema_name connected])
> # succeed.
> AT_CHECK([ovsdb-client transact unix:s2.ovsdb '[["idltest",
> {"op": "insert",
> - "table": "simple",
> - "row": {"i": 1}}]]'], [0], [ignore], [ignore])
> + "table": "indexed",
> + "row": {"i": 3}}]]'], [0], [ignore], [ignore])
> +
> +# The snapshot should overwrite the in-memory contents of the DB on S2
> +# without generating any constraint violations.
> +AT_CHECK([grep 'Transaction causes multiple rows in "indexed" table to
have identical values (1) for index on column "i"' -c s2.log], [1], [dnl
> +0
> +])
Beside checking the error log, it may be better to simply check if the i=2
row is found in S2. Without this fix i=2 shouldn't get inserted. This can
better ensure the correctness, in case the log text changes and the grep
doesn't detect the problem.
>
> for i in `seq $n`; do
> OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 4efed88..789ae23 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -954,6 +954,7 @@ AT_CHECK([sort stdout | uuidfilt], [0],
>
> # Check that ovsdb-idl figured out that table link2 and column l2 are
missing.
> AT_CHECK([grep ovsdb_idl stderr | sort], [0], [dnl
> +test-ovsdb|ovsdb_idl|idltest database lacks indexed table (database
needs upgrade?)
> test-ovsdb|ovsdb_idl|idltest database lacks link2 table (database needs
upgrade?)
> test-ovsdb|ovsdb_idl|idltest database lacks simple5 table (database
needs upgrade?)
> test-ovsdb|ovsdb_idl|idltest database lacks singleton table (database
needs upgrade?)
> --
> 1.8.3.1
>
More information about the dev
mailing list