[ovs-dev] [PATCH v2] ovsdb: Use column diffs for ovsdb and raft log entries.

Ilya Maximets i.maximets at ovn.org
Thu Jan 14 01:11:21 UTC 2021


Currently, ovsdb-server stores complete value for the column in a database
file and in a raft log in case this column changed.  This means that
transaction that adds, for example, one new acl to a port group creates
a log entry with all UUIDs of all existing acls + one new.  Same for
ports in logical switches and routers and more other columns with sets
in Northbound DB.

There could be thousands of acls in one port group or thousands of ports
in a single logical switch.  And the typical use case is to add one new
if we're starting a new service/VM/container or adding one new node in a
kubernetes or OpenStack cluster.  This generates huge amount of traffic
within ovsdb raft cluster, grows overall memory consumption and hurts
performance since all these UUIDs are parsed and formatted to/from json
several times and stored on disks.  And more values we have in a set -
more space a single log entry will occupy and more time it will take to
process by ovsdb-server cluster members.

Simple test:

1. Start OVN sandbox with clustered DBs:
   # make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run a script that creates one port group and adds 4000 acls into it:
   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

4. Check the current memory consumption of ovsdb-server processes and
   space occupied by database files:
   # ls sandbox/[ns]b*.db -alh
   # ps -eo vsz,rss,comm,cmd | egrep '=[ns]b[123].pid'

Test results with current ovsdb log format:

   On-disk Nb DB size     :  ~369 MB
   RSS of Nb ovsdb-servers:  ~2.7 GB
   Time to finish the test:  ~2m

In order to mitigate memory consumption issues and reduce computational
load on ovsdb-servers let's store diff between old and new values
instead.  This will make size of each log entry that adds single acl to
port group (or port to logical switch or anything else like that) very
small and independent from the number of already existing acls (ports,
etc.).

Added a new marker '_is_diff' into a file transaction to specify that
this transaction contains diffs instead of replacements for the existing
data.

One side effect is that this change will actually increase the size of
file transaction that removes more than a half of entries from the set,
because diff will be larger than the resulted new value.  However, such
operations are rare.

Test results with change applied:

   On-disk Nb DB size     :  ~2.7 MB  ---> reduced by 99%
   RSS of Nb ovsdb-servers:  ~580 MB  ---> reduced by 78%
   Time to finish the test:  ~1m27s   ---> reduced by 27%

After this change new ovsdb-server is still able to read old databases,
but old ovsdb-server will not be able to read new ones.
Since new servers could join ovsdb cluster dynamically it's hard to
implement any runtime mechanism to handle cases where different
versions of ovsdb-server joins the cluster.  However we still need to
handle cluster upgrades.  For this case added special command line
argument to disable new functionality.  Documentation updated with the
recommended way to upgrade the ovsdb cluster.

Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
---

Version 2:
  - Added 'Downgrading' section to docs.
  - Added Ack from Dumitru.

 Documentation/ref/ovsdb.7.rst | 65 +++++++++++++++++++++++++++
 NEWS                          |  8 ++++
 ovsdb/file.c                  | 82 ++++++++++++++++++++++++++++++++---
 ovsdb/file.h                  |  2 +
 ovsdb/ovsdb-server.c          |  8 ++++
 ovsdb/ovsdb-tool.c            |  8 +++-
 tests/ovsdb-server.at         |  4 +-
 tests/ovsdb-tool.at           |  3 +-
 8 files changed, 170 insertions(+), 10 deletions(-)

diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
index da4dbedd2..e4f1bf766 100644
--- a/Documentation/ref/ovsdb.7.rst
+++ b/Documentation/ref/ovsdb.7.rst
@@ -204,6 +204,14 @@ split-brain.
 
 Open vSwitch 2.6 introduced support for the active-backup service model.
 
+.. important::
+
+   There was a change of a database file format in version 2.15.
+   To upgrade/downgrade the ``ovsdb-server`` processes across this version
+   follow the instructions described under
+   `Upgrading from version 2.14 and earlier to 2.15 and later`_ and
+   `Downgrading from version 2.15 and later to 2.14 and earlier`_.
+
 Clustered Database Service Model
 --------------------------------
 
@@ -270,11 +278,68 @@ vSwitch to another, upgrading them one at a time will keep the cluster healthy
 during the upgrade process.  (This is different from upgrading a database
 schema, which is covered later under `Upgrading or Downgrading a Database`_.)
 
+.. important::
+
+   There was a change of a database file format in version 2.15.
+   To upgrade/downgrade the ``ovsdb-server`` processes across this version
+   follow the instructions described under
+   `Upgrading from version 2.14 and earlier to 2.15 and later`_ and
+   `Downgrading from version 2.15 and later to 2.14 and earlier`_.
+
 Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
 ``ovsdb-tool`` and ``ovsdb-client`` change ephemeral columns into persistent
 ones when they work with schemas for clustered databases.  Future versions of
 OVSDB might add support for this feature.
 
+Upgrading from version 2.14 and earlier to 2.15 and later
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There is a change of a database file format in version 2.15 that doesn't allow
+older versions of ``ovsdb-server`` to read the database file modified by the
+``ovsdb-server`` version 2.15 or later.  This also affects runtime
+communications between servers in **active-backup** and **cluster** service
+models. To upgrade the ``ovsdb-server`` processes from one version of Open
+vSwitch (2.14 or earlier) to another (2.15 or higher) instructions below should
+be followed. (This is different from upgrading a database schema, which is
+covered later under `Upgrading or Downgrading a Database`_.)
+
+In case of **standalone** service model no special handling during upgrade is
+required.
+
+For the **active-backup** service model, administrator needs to update backup
+``ovsdb-server`` first and the active one after that, or shut down both servers
+and upgrade at the same time.
+
+For the **cluster** service model recommended upgrade strategy is following:
+
+1. Upgrade processes one at a time.  Each ``ovsdb-server`` process after
+   upgrade should be started with ``--disable-file-column-diff`` command line
+   argument.
+
+2. When all ``ovsdb-server`` processes upgraded, use ``ovs-appctl`` to invoke
+   ``ovsdb/file/column-diff-enable`` command on each of them or restart all
+   ``ovsdb-server`` processes one at a time without
+   ``--disable-file-column-diff`` command line option.
+
+Downgrading from version 2.15 and later to 2.14 and earlier
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Similar to upgrading covered under `Upgrading from version 2.14 and earlier to
+2.15 and later`_, downgrading from the ``ovsdb-server`` version 2.15 and later
+to 2.14 and earlier requires additional steps. (This is different from
+upgrading a database schema, which is covered later under
+`Upgrading or Downgrading a Database`_.)
+
+For all service models it's required to:
+
+1. Stop all ``ovsdb-server`` processes (single process for **standalone**
+   service model, all involved processes for **active-backup** and **cluster**
+   service models).
+
+2. Compact all database files with ``ovsdb-tool compact`` command.
+
+3. Downgrade and restart ``ovsdb-server`` processes.
+
 Understanding Cluster Consistency
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/NEWS b/NEWS
index 617fe8e6a..7fe58a7b5 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,14 @@
 Post-v2.14.0
 ---------------------
    - OVSDB:
+     * Changed format in which ovsdb transactions are stored in database files.
+       Now each transaction contains diff of data instead of the whole new
+       value of a column.
+       New ovsdb-server process will be able to read old database format, but
+       old processes will *fail* to read database created by the new one.
+       For cluster and active-backup service models follow upgrade instructions
+       in 'Upgrading from version 2.14 and earlier to 2.15 and later' section
+       of ovsdb(7).
      * New unixctl command 'ovsdb-server/get-db-storage-status' to show the
        status of the storage that's backing a database.
      * New unixctl command 'ovsdb-server/memory-trim-on-compaction on|off'.
diff --git a/ovsdb/file.c b/ovsdb/file.c
index 0af077fce..0b8bdfe37 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -34,6 +34,7 @@
 #include "table.h"
 #include "timeval.h"
 #include "transaction.h"
+#include "unixctl.h"
 #include "uuid.h"
 #include "util.h"
 #include "openvswitch/vlog.h"
@@ -53,8 +54,33 @@ static void ovsdb_file_txn_add_row(struct ovsdb_file_txn *,
                                    const struct ovsdb_row *new,
                                    const unsigned long int *changed);
 
+/* If set to 'true', file transactions will contain difference between
+ * datums of old and new rows and not the whole new datum for the column. */
+static bool use_column_diff = true;
+
+static void
+ovsdb_file_column_diff_enable(struct unixctl_conn *conn, int argc OVS_UNUSED,
+                              const char *argv[] OVS_UNUSED,
+                              void *arg OVS_UNUSED)
+{
+    use_column_diff = true;
+    unixctl_command_reply(conn, NULL);
+}
+
+void
+ovsdb_file_column_diff_disable(void)
+{
+    if (!use_column_diff) {
+        return;
+    }
+    use_column_diff = false;
+    unixctl_command_register("ovsdb/file/column-diff-enable", "",
+                             0, 0, ovsdb_file_column_diff_enable, NULL);
+}
+
 static struct ovsdb_error *
 ovsdb_file_update_row_from_json(struct ovsdb_row *row, bool converting,
+                                bool row_contains_diff,
                                 const struct json *json)
 {
     struct ovsdb_table_schema *schema = row->table->schema;
@@ -84,6 +110,20 @@ ovsdb_file_update_row_from_json(struct ovsdb_row *row, bool converting,
         if (error) {
             return error;
         }
+        if (row_contains_diff
+            && !ovsdb_datum_is_default(&row->fields[column->index],
+                                       &column->type)) {
+            struct ovsdb_datum new_datum;
+
+            error = ovsdb_datum_apply_diff(&new_datum,
+                                           &row->fields[column->index],
+                                           &datum, &column->type);
+            ovsdb_datum_destroy(&datum, &column->type);
+            if (error) {
+                return error;
+            }
+            ovsdb_datum_swap(&datum, &new_datum);
+        }
         ovsdb_datum_swap(&row->fields[column->index], &datum);
         ovsdb_datum_destroy(&datum, &column->type);
     }
@@ -93,7 +133,7 @@ ovsdb_file_update_row_from_json(struct ovsdb_row *row, bool converting,
 
 static struct ovsdb_error *
 ovsdb_file_txn_row_from_json(struct ovsdb_txn *txn, struct ovsdb_table *table,
-                             bool converting,
+                             bool converting, bool row_contains_diff,
                              const struct uuid *row_uuid, struct json *json)
 {
     const struct ovsdb_row *row = ovsdb_table_get_row(table, row_uuid);
@@ -107,14 +147,16 @@ ovsdb_file_txn_row_from_json(struct ovsdb_txn *txn, struct ovsdb_table *table,
         return NULL;
     } else if (row) {
         return ovsdb_file_update_row_from_json(ovsdb_txn_row_modify(txn, row),
-                                               converting, json);
+                                               converting, row_contains_diff,
+                                               json);
     } else {
         struct ovsdb_error *error;
         struct ovsdb_row *new;
 
         new = ovsdb_row_create(table);
         *ovsdb_row_get_uuid_rw(new) = *row_uuid;
-        error = ovsdb_file_update_row_from_json(new, converting, json);
+        error = ovsdb_file_update_row_from_json(new, converting,
+                                                row_contains_diff, json);
         if (error) {
             ovsdb_row_destroy(new);
         } else {
@@ -127,7 +169,9 @@ ovsdb_file_txn_row_from_json(struct ovsdb_txn *txn, struct ovsdb_table *table,
 static struct ovsdb_error *
 ovsdb_file_txn_table_from_json(struct ovsdb_txn *txn,
                                struct ovsdb_table *table,
-                               bool converting, struct json *json)
+                               bool converting,
+                               bool row_contains_diff,
+                               struct json *json)
 {
     struct shash_node *node;
 
@@ -147,6 +191,7 @@ ovsdb_file_txn_table_from_json(struct ovsdb_txn *txn,
         }
 
         error = ovsdb_file_txn_row_from_json(txn, table, converting,
+                                             row_contains_diff,
                                              &row_uuid, txn_row_json);
         if (error) {
             return error;
@@ -176,6 +221,13 @@ ovsdb_file_txn_from_json(struct ovsdb *db, const struct json *json,
         return ovsdb_syntax_error(json, NULL, "object expected");
     }
 
+    struct json *is_diff = shash_find_data(json->object, "_is_diff");
+    bool row_contains_diff = false;
+
+    if (is_diff && is_diff->type == JSON_TRUE) {
+        row_contains_diff = true;
+    }
+
     txn = ovsdb_txn_create(db);
     SHASH_FOR_EACH (node, json->object) {
         const char *table_name = node->name;
@@ -187,6 +239,10 @@ ovsdb_file_txn_from_json(struct ovsdb *db, const struct json *json,
             if (!strcmp(table_name, "_date")
                 && node_json->type == JSON_INTEGER) {
                 continue;
+            } else if (!strcmp(table_name, "_is_diff")
+                       && (node_json->type == JSON_TRUE
+                           || node_json->type == JSON_FALSE)) {
+                continue;
             } else if (!strcmp(table_name, "_comment") || converting) {
                 continue;
             }
@@ -197,7 +253,7 @@ ovsdb_file_txn_from_json(struct ovsdb *db, const struct json *json,
         }
 
         error = ovsdb_file_txn_table_from_json(txn, table, converting,
-                                               node_json);
+                                               row_contains_diff, node_json);
         if (error) {
             goto error;
         }
@@ -353,6 +409,9 @@ ovsdb_file_txn_annotate(struct json *json, const char *comment)
     if (comment) {
         json_object_put_string(json, "_comment", comment);
     }
+    if (use_column_diff) {
+        json_object_put(json, "_is_diff", json_boolean_create(true));
+    }
     json_object_put(json, "_date", json_integer_create(time_wall_msec()));
     return json;
 }
@@ -383,17 +442,26 @@ ovsdb_file_txn_add_row(struct ovsdb_file_txn *ftxn,
             const struct ovsdb_column *column = node->data;
             const struct ovsdb_type *type = &column->type;
             unsigned int idx = column->index;
+            struct ovsdb_datum datum;
+            struct json *column_json;
 
             if (idx != OVSDB_COL_UUID && column->persistent
                 && (old
                     ? bitmap_is_set(changed, idx)
                     : !ovsdb_datum_is_default(&new->fields[idx], type)))
             {
+                if (old && use_column_diff) {
+                    ovsdb_datum_diff(&datum, &old->fields[idx],
+                                     &new->fields[idx], type);
+                    column_json = ovsdb_datum_to_json(&datum, type);
+                    ovsdb_datum_destroy(&datum, type);
+                } else {
+                    column_json = ovsdb_datum_to_json(&new->fields[idx], type);
+                }
                 if (!row) {
                     row = json_object_create();
                 }
-                json_object_put(row, column->name,
-                                ovsdb_datum_to_json(&new->fields[idx], type));
+                json_object_put(row, column->name, column_json);
             }
         }
     }
diff --git a/ovsdb/file.h b/ovsdb/file.h
index 40833a4d4..be4f6ad27 100644
--- a/ovsdb/file.h
+++ b/ovsdb/file.h
@@ -23,6 +23,8 @@ struct ovsdb;
 struct ovsdb_schema;
 struct ovsdb_txn;
 
+void ovsdb_file_column_diff_disable(void);
+
 struct json *ovsdb_to_txn_json(const struct ovsdb *, const char *comment);
 struct json *ovsdb_file_txn_to_json(const struct ovsdb_txn *);
 struct json *ovsdb_file_txn_annotate(struct json *, const char *comment);
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 0e60e2b87..29a2bace8 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -1793,6 +1793,7 @@ parse_options(int argc, char *argv[],
         OPT_SYNC_EXCLUDE,
         OPT_ACTIVE,
         OPT_NO_DBS,
+        OPT_FILE_COLUMN_DIFF,
         VLOG_OPTION_ENUMS,
         DAEMON_OPTION_ENUMS,
         SSL_OPTION_ENUMS,
@@ -1815,6 +1816,7 @@ parse_options(int argc, char *argv[],
         {"sync-exclude-tables", required_argument, NULL, OPT_SYNC_EXCLUDE},
         {"active", no_argument, NULL, OPT_ACTIVE},
         {"no-dbs", no_argument, NULL, OPT_NO_DBS},
+        {"disable-file-column-diff", no_argument, NULL, OPT_FILE_COLUMN_DIFF},
         {NULL, 0, NULL, 0},
     };
     char *short_options = ovs_cmdl_long_options_to_short_options(long_options);
@@ -1905,6 +1907,10 @@ parse_options(int argc, char *argv[],
             add_default_db = false;
             break;
 
+        case OPT_FILE_COLUMN_DIFF:
+            ovsdb_file_column_diff_disable();
+            break;
+
         case '?':
             exit(EXIT_FAILURE);
 
@@ -1942,6 +1948,8 @@ usage(void)
     printf("\nOther options:\n"
            "  --run COMMAND           run COMMAND as subprocess then exit\n"
            "  --unixctl=SOCKET        override default control socket name\n"
+           "  --disable-file-column-diff\n"
+           "                          don't use column diff in database file\n"
            "  -h, --help              display this help message\n"
            "  -V, --version           display version information\n");
     exit(EXIT_SUCCESS);
diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 1b49b6fc8..b8560f850 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -391,6 +391,9 @@ compact_or_convert(const char *src_name_, const char *dst_name_,
         ovs_fatal(retval, "%s: failed to lock lockfile", dst_name);
     }
 
+    /* Resulted DB will contain a single transaction without diff anyway. */
+    ovsdb_file_column_diff_disable();
+
     /* Save a copy. */
     struct ovsdb *ovsdb = (new_schema
                            ? ovsdb_file_read_as_schema(src_name, new_schema)
@@ -648,6 +651,8 @@ static void
 print_db_changes(struct shash *tables, struct smap *names,
                  const struct ovsdb_schema *schema)
 {
+    struct json *is_diff = shash_find_data(tables, "_is_diff");
+    bool diff = (is_diff && is_diff->type == JSON_TRUE);
     struct shash_node *n1;
 
     int i = 0;
@@ -691,7 +696,8 @@ print_db_changes(struct shash *tables, struct smap *names,
                     printf(" insert row %.8s:\n", row_uuid);
                 }
             } else {
-                printf(" row %s (%.8s):\n", old_name, row_uuid);
+                printf(" row %s (%.8s)%s:\n", old_name, row_uuid,
+                                              diff ? " diff" : "");
             }
 
             if (columns->type == JSON_OBJECT) {
diff --git a/tests/ovsdb-server.at b/tests/ovsdb-server.at
index 0b15758f2..926abce3a 100644
--- a/tests/ovsdb-server.at
+++ b/tests/ovsdb-server.at
@@ -715,7 +715,9 @@ ovsdb_check_online_compaction() {
       [0], [stdout])
     if test $model = standalone; then
         dnl Check that all the crap is in fact in the database log.
-        AT_CHECK([[uuidfilt db | grep -v ^OVSDB | sed 's/"_date":[0-9]*/"_date":0/' | ovstest test-json --multiple -]], [0],
+        AT_CHECK([[uuidfilt db | grep -v ^OVSDB | \
+            sed 's/"_date":[0-9]*/"_date":0/' |  sed 's/"_is_diff":true,//' | \
+            ovstest test-json --multiple -]], [0],
 [[{"cksum":"12345678 9","name":"ordinals","tables":{"ordinals":{"columns":{"name":{"type":"string"},"number":{"type":"integer"}},"indexes":[["number"]]}},"version":"5.1.3"}
 {"_comment":"add row for zero 0","_date":0,"ordinals":{"<0>":{"name":"zero"}}}
 {"_comment":"delete row for 0","_date":0,"ordinals":{"<0>":null}}
diff --git a/tests/ovsdb-tool.at b/tests/ovsdb-tool.at
index b8e830f5b..12ad6fb3f 100644
--- a/tests/ovsdb-tool.at
+++ b/tests/ovsdb-tool.at
@@ -93,7 +93,8 @@ AT_CHECK(
     done]],
   [0], [stdout], [ignore])
 dnl Check that all the crap is in fact in the database log.
-AT_CHECK([[uuidfilt db | grep -v ^OVSDB | sed 's/"_date":[0-9]*/"_date":0/' | ovstest test-json --multiple -]], [0],
+AT_CHECK([[uuidfilt db | grep -v ^OVSDB | sed 's/"_date":[0-9]*/"_date":0/' | \
+            sed 's/"_is_diff":true,//' | ovstest test-json --multiple -]], [0],
   [[{"cksum":"12345678 9","name":"ordinals","tables":{"ordinals":{"columns":{"name":{"type":"string"},"number":{"type":"integer"}},"indexes":[["number"]]}},"version":"5.1.3"}
 {"_comment":"add row for zero 0","_date":0,"ordinals":{"<0>":{"name":"zero"}}}
 {"_comment":"delete row for 0","_date":0,"ordinals":{"<0>":null}}
-- 
2.25.4



More information about the dev mailing list