[ovs-dev] [PATCH v2 9/9] docs: Add documentation for ovsdb relay mode.
i.maximets at ovn.org
Sat Jun 12 02:00:08 UTC 2021
Add the main documentation for the relay service model and a tutorial with
a use case and configuration examples.
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
Documentation/automake.mk | 1 +
Documentation/ref/ovsdb.7.rst | 62 ++++++++++++--
Documentation/topics/index.rst | 1 +
Documentation/topics/ovsdb-relay.rst | 124 +++++++++++++++++++++++++++
NEWS | 3 +
ovsdb/ovsdb-server.1.in | 27 +++---
6 files changed, 200 insertions(+), 18 deletions(-)
create mode 100644 Documentation/topics/ovsdb-relay.rst
diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index bc30f94c5..213d9c867 100644
@@ -52,6 +52,7 @@ DOC_SOURCE = \
+ Documentation/topics/ovsdb-relay.rst \
diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
index e4f1bf766..a5b8a9c33 100644
@@ -121,13 +121,14 @@ schema checksum from a schema or database file, respectively.
-OVSDB supports three service models for databases: **standalone**,
-**active-backup**, and **clustered**. The service models provide different
-compromises among consistency, availability, and partition tolerance. They
-also differ in the number of servers required and in terms of performance. The
-standalone and active-backup database service models share one on-disk format,
-and clustered databases use a different format, but the OVSDB programs work
-with both formats. ``ovsdb(5)`` documents these file formats.
+OVSDB supports four service models for databases: **standalone**,
+**active-backup**, **relay** and **clustered**. The service models provide
+different compromises among consistency, availability, and partition tolerance.
+They also differ in the number of servers required and in terms of performance.
+The standalone and active-backup database service models share one on-disk
+format, and clustered databases use a different format, but the OVSDB programs
+work with both formats. ``ovsdb(5)`` documents these file formats. Relay
+databases have no on-disk storage.
RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
any particular service model.
@@ -406,6 +407,50 @@ following consequences:
that the client previously read. The OVSDB client library in Open vSwitch
uses this feature to avoid servers with stale data.
+Relay Service Model
+A **relay** database is a way to scale out read-mostly access to an
+existing database working in any service model, including relay.
+A relay database creates and maintains an OVSDB connection with another OVSDB
+server. It uses this connection to maintain an in-memory copy of the remote
+database (a.k.a. the ``relay source``), keeping the copy up-to-date as the
+database content changes on the relay source in real time.
+The purpose of a relay server is to scale out the number of database clients.
+Read-only transactions and monitor requests are fully handled by the relay
+server itself. For transactions that request database modifications, the
+relay works as a proxy between the client and the relay source, i.e. it
+forwards transactions and replies between them.
+Compared to the clustered and active-backup models, the relay service model
+provides read and write access to the database similarly to a clustered
+database (and is even more scalable), but with the generally insignificant
+performance overhead of an active-backup model. At the same time, it doesn't
+increase availability; that needs to be covered by the service model of the
+relay source.
+A relay database has no on-disk storage and therefore cannot be converted to
+any other service model.
+If there is already a database started in any service model, start a relay
+database server with ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
+``<DB_NAME>`` is the database name as specified in the schema of the database
+that the existing server runs, and ``<relay source>`` is an OVSDB connection
+method (see `Connection Methods`_ below) that connects to the existing
+database server. ``<relay source>`` can contain a comma-separated list of
+connection methods, e.g. to connect to any server of the clustered database.
+Multiple relay servers can be started for the same relay source.
+Since the way relay handles read and write transactions is very similar
+to the clustered model where "cluster" means "set of relay servers connected
+to the same relay source", "follower" means "relay server" and the "leader"
+means "relay source", the same consistency consequences as for the clustered
+model apply to relay as well (see `Understanding Cluster Consistency`_).
+Open vSwitch 2.16 introduced support for the relay service model.
@@ -414,7 +459,8 @@ Replication, in this context, means to make, and keep up-to-date, a read-only
copy of the contents of a database (the ``replica``). One use of replication
is to keep an up-to-date backup of a database. A replica used solely for
backup would not need to support clients of its own. A set of replicas that do
-serve clients could be used to scale out read access to the primary database.
+serve clients could be used to scale out read access to the primary database;
+however, `Relay Service Model`_ is more suitable for that purpose.
A database replica is set up in the same way as a backup server in an
active-backup pair, with the difference that the replica is never promoted to
diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
index 0036567eb..d8ccbd757 100644
@@ -44,6 +44,7 @@ OVS
diff --git a/Documentation/topics/ovsdb-relay.rst b/Documentation/topics/ovsdb-relay.rst
new file mode 100644
@@ -0,0 +1,124 @@
+ Copyright 2021, Red Hat, Inc.
+ Licensed under the Apache License, Version 2.0 (the "License"); you may
+ not use this file except in compliance with the License. You may obtain
+ a copy of the License at
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+ Convention for heading levels in Open vSwitch documentation:
+ ======= Heading 0 (reserved for the title in a document)
+ ------- Heading 1
+ ~~~~~~~ Heading 2
+ +++++++ Heading 3
+ ''''''' Heading 4
+ Avoid deeper levels because they do not render well.
+Scaling OVSDB Access With Relay
+Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal of
+increasing database scalability for big deployments, mainly OVN (Open Virtual
+Network) Southbound Database deployments. This document describes the main
+concept and provides configuration examples.
+What is OVSDB Relay?
+Relay is a database service model in which one ``ovsdb-server`` (``relay``)
+connects to another standalone or clustered database server
+(``relay source``) and maintains an in-memory copy of its data, receiving
+all the updates via this OVSDB connection. The relay server handles all the
+read-only requests (monitors and transactions) on its own and forwards all the
+transactions that require database modifications to the relay source.
+Why is this needed?
+Some OVN deployments could have hundreds or even thousands of nodes. On each
+of these nodes there is an ovn-controller, which is connected to the
+OVN_Southbound database that is served by a standalone or clustered OVSDB.
+A standalone database is handled by a single ovsdb-server process, and a
+clustered one could consist of 3 to 5 ovsdb-server processes. For the
+clustered database, a higher number of servers may significantly increase
+transaction latency because these servers need to reach consensus. So, in the
+end, a limited number of ovsdb-server processes serves an ever-growing number
+of clients, and this leads to performance issues.
+Read-only access could be scaled up with OVSDB replication on top of the
+active-backup service model, but ovn-controller is a read-mostly client, not
+a read-only one, i.e. it needs to execute write transactions from time to
+time.
+Here the relay service model comes into play.
+A solution for the scaling issue could look like a 2-tier deployment, where
+a set of relay servers is connected to the main database cluster
+(OVN_Southbound) and clients (ovn-controller) are connected to these relay
+servers::
+                                   172.16.0.1
+  +--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
+  |                    |   |                       |
+  |     Clustered      |   |                       +---+ client-2
+  |      Database      |   |                        ...
+  |                    |   |                       +---+ client-N
+  |      10.0.0.2      |   |
+  |   ovsdb-server-2   |   |       172.16.0.2
+  |   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
+  |   |        |       |   |                       |
+  |   |        +       +---+                       +---+ client-N+2
+  |   |    10.0.0.1    |   |                        ...
+  |   | ovsdb-server-1 |   |                       +---+ client-2N
+  |   |        +       |   |
+  |   |        |       |   |
+  |   +        +       |   +  ...      ...      ... ...    ...
+  |   ovsdb-server-3   |   |
+  |      10.0.0.3      |   |                       +---+ client-KN-1
+  |                    |   |       172.16.0.K      |
+  +--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN
+In practice, the picture might look a bit more complex, because all relay
+servers might connect to any member of a main cluster and clients might
+connect to any relay server of their choice.
+Assuming that the servers of the main cluster are started like this::
+ $ ovsdb-server --remote=ptcp:6642:10.0.0.1 ovn-sb-1.db
+The same applies to the other two servers. In this case, relay servers could
+be started like this::
+ $ REMOTES=tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
+ $ ovsdb-server --remote=ptcp:6642:172.16.0.1 relay:OVN_Southbound:$REMOTES
+ $ ...
+ $ ovsdb-server --remote=ptcp:6642:172.16.0.K relay:OVN_Southbound:$REMOTES
+Every relay server can connect to any cluster member of its choice; fairness
+of load distribution is achieved by shuffling the remotes.
+The actual clients can be configured to connect to any of the relay
+servers. For ovn-controllers the configuration could look like this::
+ $ REMOTES=tcp:172.16.0.1:6642,...,tcp:172.16.0.K:6642
+ $ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES
+A setup like this allows the system to serve ``K * N`` clients while having
+only ``K`` actual connections on the main clustered database, keeping it in a
+stable state.
+It's also possible to create multi-tier deployments by connecting one set
+of relay servers to another (smaller) set of relay servers, or even create
+tree-like structures at the cost of increased latency for write transactions,
+because they will be forwarded multiple times.
diff --git a/NEWS b/NEWS
index ebba17b22..391b0abba 100644
@@ -1,6 +1,9 @@
+ * Introduced new database service model - "relay". Targeted to scale out
+ read-mostly access (ovn-controller) to existing databases.
+ For more information: ovsdb(7) and Documentation/topics/ovsdb-relay.rst
* New command line options --record/--replay for ovsdb-server and
ovsdb-client to record and replay all the incoming transactions,
monitors, etc. More datails in Documentation/topics/record-replay.rst.
diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index fdd52e8f6..dac0f02cb 100644
@@ -10,6 +10,7 @@ ovsdb\-server \- Open vSwitch database server
@@ -35,12 +36,15 @@ For an introduction to OVSDB and its implementation in Open vSwitch,
Each OVSDB file may be specified on the command line as \fIdatabase\fR.
-If none is specified, the default is \fB@DBDIR@/conf.db\fR. The database
-files must already have been created and initialized using, for
-example, \fBovsdb\-tool\fR's \fBcreate\fR, \fBcreate\-cluster\fR, or
+Relay databases may be specified on the command line as
+\fIrelay:schema_name:remote\fR. For a detailed description of the relay
+database argument, see \fBovsdb\fR(7).
+If no database files or relay databases are specified, the default is
+\fB@DBDIR@/conf.db\fR. The database files must already have been created and
+initialized using, for example, \fBovsdb\-tool\fR's \fBcreate\fR,
+\fBcreate\-cluster\fR, or \fBjoin\-cluster\fR command.
-This OVSDB implementation supports standalone, active-backup, and
+This OVSDB implementation supports standalone, active-backup, relay and
clustered database service models, as well as database replication.
See the Service Models section of \fBovsdb\fR(7) for more information.
@@ -50,7 +54,9 @@ successfully join a cluster (if the database file is freshly created
with \fBovsdb\-tool join\-cluster\fR) or connect to a cluster that it
has already joined. Use \fBovsdb\-client wait\fR (see
\fBovsdb\-client\fR(1)) to wait until the server has successfully
-joined and connected to a cluster.
+joined and connected to a cluster. The same is true for relay databases.
+The same commands can be used to wait for a relay database to connect to
+the relay source (remote).
In addition to user-specified databases, \fBovsdb\-server\fR version
2.9 and later also always hosts a built-in database named
@@ -243,10 +249,11 @@ not list remotes added indirectly because they were read from the
database by configuring a
-.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
-Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR. The database
-file must already have been created and initialized using, for example,
+.IP "\fBovsdb\-server/add\-db \fIdatabase\fR"
+Adds the \fIdatabase\fR to the running \fBovsdb\-server\fR. \fIdatabase\fR
+can be a database file or a relay description in the following format:
+\fIrelay:schema_name:remote\fR. The database file must already have been
+created and initialized using, for example, \fBovsdb\-tool create\fR.
.IP "\fBovsdb\-server/remove\-db \fIdatabase\fR"
Removes \fIdatabase\fR from the running \fBovsdb\-server\fR. \fIdatabase\fR
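Not part of the patch: for readers who want to script the 2-tier setup described in Documentation/topics/ovsdb-relay.rst, a minimal Python sketch of how the command lines could be assembled. The helper names (`relay_db_arg`, `relay_server_argv`, `ovn_remote_argv`) are hypothetical and exist only for illustration. The sketch only builds argument lists and runs nothing. It assumes the `relay:<DB_NAME>:<relay source>` format documented in this patch and the `ptcp:<port>[:<ip>]` passive connection method from ovsdb(7).

```python
from typing import List


def relay_db_arg(db_name: str, sources: List[str]) -> str:
    """Build the relay:<DB_NAME>:<relay source> database argument.

    `sources` holds OVSDB connection methods; several of them are joined
    into the comma-separated list that the documentation describes.
    """
    if not db_name or not sources:
        raise ValueError("need a database name and at least one relay source")
    return "relay:{}:{}".format(db_name, ",".join(sources))


def relay_server_argv(listen: str, db_name: str, sources: List[str]) -> List[str]:
    """Argument list for one relay server (hypothetical helper)."""
    return ["ovsdb-server", "--remote=" + listen, relay_db_arg(db_name, sources)]


def ovn_remote_argv(relays: List[str]) -> List[str]:
    """Argument list that points ovn-controller at the relay servers."""
    return ["ovs-vsctl", "set", "Open_vSwitch", ".",
            "external-ids:ovn-remote=" + ",".join(relays)]


# Addresses taken from the example topology in the document.
cluster = ["tcp:10.0.0.{}:6642".format(i) for i in (1, 2, 3)]
argv = relay_server_argv("ptcp:6642:172.16.0.1", "OVN_Southbound", cluster)
print(" ".join(argv))
print(" ".join(ovn_remote_argv(["tcp:172.16.0.1:6642", "tcp:172.16.0.2:6642"])))
```

Returning argument lists instead of a shell string leaves it to the caller to pass them to something like `subprocess.run` without shell quoting concerns.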