[ovs-dev] [PATCH v3 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

Mark Gray mark.d.gray at redhat.com
Thu Jul 15 16:45:15 UTC 2021


On 14/07/2021 14:50, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But some clients are not read-only, only read-mostly.
> One of the main examples is ovn-controller, which mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that change some database tables.
> 
> The Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. OpenStack agents monitoring updates.
> At high scale and with a large database, ovsdb-server spends
> too much time processing monitor updates, and it's necessary to
> move this load somewhere else.  This patch-set introduces the
> functionality required to scale out read-mostly connections via
> a new OVSDB 'relay' service model.
> 
> In this new service model, ovsdb-server connects to an existing OVSDB
> server and maintains an in-memory copy of the database.  It serves
> read-only transactions and monitor requests on its own, but forwards
> write transactions to the relay source.
> 
> Key differences from active-backup replication:
> - Support for "write" transactions.
> - No on-disk storage (probably faster operation).
> - Support for multiple remotes (connect to the clustered db).
> - Doesn't try to keep the connection open as long as possible, but
>   reconnects faster to other remotes to avoid missing updates.
> - No need to know the complete database schema beforehand,
>   only the schema name.
> - Can be used along with other standalone and clustered databases
>   by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server into read-only mode)
> - Supports the modern version of monitors (monitor_cond_since),
>   because it is based on ovsdb-cs.
> - Can be chained, i.e. multiple relays can be connected one to
>   another in a row or in a tree-like form (see the sketch below).
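> 
> For illustration, a minimal sketch of chained relays (the addresses
> and ports below are made-up placeholders, not part of the patch-set):
> 
>   # On host 10.0.0.2, a first-level relay connected to the main server:
>   ovsdb-server --remote=ptcp:16642 relay:OVN_Southbound:tcp:10.0.0.1:6642
> 
>   # On another host, a second-level relay connected to the first one:
>   ovsdb-server --remote=ptcp:16642 relay:OVN_Southbound:tcp:10.0.0.2:16642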
> 
> Bringing all of the above functionality to the existing active-backup
> replication doesn't look right, as it would make replication less
> reliable for the actual backup use case.  It would also be much
> harder from the implementation point of view, because the current
> replication code is not based on ovsdb-cs or idl, so all the required
> features would likely be duplicated, or replication would have to be
> fully re-written on top of ovsdb-cs with severe modifications of the
> former.
> 
> Relay sits somewhere in the middle between active-backup replication
> and the clustered model, taking a lot from both, and is therefore
> hard to implement on top of either of them.
> 
> To run ovsdb-server in relay mode, the user simply needs to run:
> 
>   ovsdb-server --remote=punix:db.sock relay:<schema-name>:<remotes>
> 
> e.g.
> 
>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
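> 
> or, to follow all members of a 3-server cluster (a sketch: the
> addresses below are placeholders; multiple remotes are given as a
> comma-separated list):
> 
>   ovsdb-server --remote=punix:db.sock \
>       relay:OVN_Southbound:tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642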
> 
> More details and examples in the documentation in the last patch
> of the series.
> 
> I actually tried to implement transaction forwarding on top of
> active-backup replication in v1 of this series, but it required
> a lot of tricky changes, including schema format changes, in order
> to bring the required information to the end clients, so I decided
> to fully rewrite the functionality in v2 with a different approach.
> 
> 
>  Testing
>  =======
> 
> Some scale tests were performed with OVSDB Relays that mimic OVN
> workloads with ovn-kubernetes.
> Tests were performed with ovn-heater (https://github.com/dceara/ovn-heater)
> on the scenario ocp-120-density-heavy:
>  https://github.com/dceara/ovn-heater/blob/master/test-scenarios/ocp-120-density-heavy.yml
> In short, the test gradually creates a lot of OVN resources and
> checks that the network is configured correctly (by pinging different
> namespaces).  The test includes 120 chassis (created by
> ovn-fake-multinode), 31250 LSPs spread evenly across 120 LSes, 3 LBs
> with 15625 VIPs each, attached to all node LSes, etc.  The test was
> performed with monitor-all=true.
> 
> Note 1:
>  - Memory consumption is checked at the end of a test in the following
>    way: 1) check RSS, 2) compact the database, 3) check RSS again.
>    It's observed that ovn-controllers in this test are fairly slow,
>    and a backlog builds up on monitors because ovn-controllers are
>    not able to receive updates fast enough.  This contributes to the
>    RSS of the process, especially in combination with a glibc bug
>    (glibc doesn't free fastbins back to the system).  Memory trimming
>    on compaction is enabled in the test, so after compaction we can
>    see a more or less real value of the RSS at the end of the test
>    without the backlog noise.  (Compaction on a relay in this case is
>    just a plain malloc_trim().)
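> 
>    For reference, the check can be done roughly like this (a sketch;
>    the control socket path is an example and depends on the setup):
> 
>      # 1) Check RSS.
>      grep VmRSS /proc/$(pidof ovsdb-server)/status
>      # 2) Compact the database (just malloc_trim() on a relay).
>      ovs-appctl -t /var/run/openvswitch/ovsdb-server.ctl ovsdb-server/compact
>      # 3) Check RSS again.
>      grep VmRSS /proc/$(pidof ovsdb-server)/status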
> 
> Note 2:
>  - I didn't collect memory consumption (RSS) after compaction for the
>    test with 10 relays, because I got the idea only after that test
>    had finished and another one had already started, and a run takes
>    a significant amount of time.  So, values marked with a star (*)
>    are an approximation based on results from other tests, and hence
>    might not be fully accurate.
> 
> Note 3:
>  - 'Max. poll' is the maximum of the 'long poll intervals' logged by
>    ovsdb-server during the test.  Poll intervals that involved database
>    compaction (huge disk writes) are the same in all tests and are
>    excluded from the results.  (The Sb DB size in the test is 256MB,
>    fully compacted.)  'Number of intervals' is just the number of
>    logged unreasonably long poll intervals.
>    Also note that ovsdb-server logs only compactions that took > 1s,
>    so poll intervals that involved compaction but took under 1s cannot
>    be reliably excluded from the test results.
>    'central' - main Sb DB servers.
>    'relay'   - relay servers connected to the central ones.
>    'before'/'after' - RSS before and after compaction + malloc_trim().
>    'time'    - total time the process spent in the Running state.
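> 
>    For reference, the counted entries are ovsdb-server log lines
>    roughly of this shape (the values here are made up):
> 
>      timeval|WARN|Unreasonably long 2214ms poll interval (1890ms user, 310ms system)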
> 
> 
> Baseline (3 main servers, 0 relays):
> ++++++++++++++++++++++++++++++++++++++++
> 
>                RSS
> central  before    after    clients  time     Max. poll   Number of intervals
>          7552924   3828848   ~41     109:50   5882        1249
>          7342468   4109576   ~43     108:37   5717        1169
>          5886260   4109496   ~39      96:31   4990        1233
>          ---------------------------------------------------------------------
>              20G       12G   126     314:58   5882        3651
> 
> 3x3 (3 main servers, 3 relays):
> +++++++++++++++++++++++++++++++
> 
>                 RSS
> central  before    after    clients  time     Max. poll   Number of intervals
>          6228176   3542164   ~1-5    36:53    2174        358
>          5723920   3570616   ~1-5    24:03    2205        382
>          5825420   3490840   ~1-5    35:42    2214        309
>          ---------------------------------------------------------------------
>            17.7G     10.6G      9    96:38    2214        1049
> 
> relay    before    after    clients  time     Max. poll   Number of intervals
>          2174328    726576    37     69:44    5216        627
>          2122144    729640    32     63:52    4767        625
>          2824160    751384    51     89:09    5980        627
>          ---------------------------------------------------------------------
>               7G      2.2G    120   222:45    5980        1879
> 
> Total:   =====================================================================
>            24.7G     12.8G    129    319:23   5980        2928
> 
> 3x10 (3 main servers, 10 relays):
> +++++++++++++++++++++++++++++++++
> 
>                RSS
> central  before    after    clients  time    Max. poll   Number of intervals
>          6190892    ---      ~1-6    42:43   2041         634
>          5687576    ---      ~1-5    27:09   2503         405
>          5958432    ---      ~1-7    40:44   2193         450
>          ---------------------------------------------------------------------
>            17.8G   ~10G*       16   110:36   2503         1489
> 
> relay    before    after    clients  time    Max. poll   Number of intervals
>          1331256    ---       9      22:58   1327         140
>          1218288    ---      13      28:28   1840         621
>          1507644    ---      19      41:44   2869         623
>          1257692    ---      12      27:40   1532         517
>          1125368    ---       9      22:23   1148         105
>          1380664    ---      16      35:04   2422         619
>          1087248    ---       6      18:18   1038           6
>          1277484    ---      14      34:02   2392         616
>          1209936    ---      10      25:31   1603         451
>          1293092    ---      12      29:03   2071         621
>          ---------------------------------------------------------------------
>            12.6G    5-7G*    120    285:11   2869         4319
> 
> Total:   =====================================================================
>            30.4G    15-17G*  136    395:47   2869         5808
> 
> 

Thanks for running these and sharing the data. It looks promising and is
not a hugely intrusive change to the code base, so it looks good to me.

Acked-by: Mark D. Gray <mark.d.gray at redhat.com>

>  Conclusions from the test:
>  ==========================
> 
> 1. Relays relieve a lot of pressure from the main Sb DB servers.
>    In my testing, total CPU time on the main servers goes down from
>    314 to 96-110 minutes, which is 3 times lower.
>    During the test, the number of registered 'unreasonably long poll
>    interval's on the main servers goes down by 3-4 times.  At the same
>    time, the maximum duration of these intervals goes down by a factor
>    of 2.5.  Also, the factor should be higher with an increased number
>    of clients.
> 
> 2. Since the number of clients is significantly lower, memory
>    consumption of the main Sb DB servers also goes down by ~12%.
> 
> 3. For the 3x3 test, total memory consumed by all processes increased
>    by only 6%, and total CPU usage increased by 1.2%.  Poll intervals
>    on the relay servers are comparable to poll intervals on the main
>    servers with no relays, but poll intervals on the main servers are
>    significantly better (see conclusion #1).  In general, it seems
>    that for this test, running 3 relays next to 3 main Sb DB servers
>    significantly increases cluster stability and responsiveness
>    without a noticeable increase in memory or CPU usage.
> 
> 4. For the 3x10 test, total memory consumed by all processes increased
>    by ~50-70%*, and total CPU usage increased by 26% compared with the
>    baseline setup.  At the same time, poll intervals on both the main
>    and relay servers are lower by a factor of 2-4 (depending on the
>    particular server).  In general, the cluster with 10 relays is much
>    more stable and responsive, with reasonably low memory consumption
>    and CPU time overhead.
> 
> 
> 
> Future work:
> - Add support for transaction history (it could just be inherited
>   from the transaction ids received from the relay source).  This
>   will allow clients to utilize monitor_cond_since while working
>   with a relay.
> - Possibly try to inherit min_index from the relay source to give
>   clients the ability to detect relays with stale data.
> - Probably, add support for both of the above to standalone databases,
>   so relays will be able to inherit not only from clustered ones.
> 
> 
> Version 3:
>   - Fixed issue with incorrect schema equality check.
>   - Fixed transaction leak if inconsistent data received from the
>     source.
>   - Minor fixes for style, wording and typos.
> 
> Version 2:
>   - Dropped implementation on top of active-backup replication.
>   - Implemented new 'relay' service model.
>   - Updated documentation and wrote a separate topic with examples
>     and ascii-graphics.  That's why v2 seems larger.
> 
> Ilya Maximets (9):
>   jsonrpc-server: Wake up jsonrpc session if there are completed
>     triggers.
>   ovsdb: storage: Allow setting the name for the unbacked storage.
>   ovsdb: table: Expose functions to execute operations on ovsdb tables.
>   ovsdb: row: Add support for xor-based row updates.
>   ovsdb: New ovsdb 'relay' service model.
>   ovsdb: relay: Add support for transaction forwarding.
>   ovsdb: relay: Reflect connection status in _Server database.
>   ovsdb: Make clients aware of relay service model.
>   docs: Add documentation for ovsdb relay mode.
> 
>  Documentation/automake.mk            |   1 +
>  Documentation/ref/ovsdb.7.rst        |  62 ++++-
>  Documentation/topics/index.rst       |   1 +
>  Documentation/topics/ovsdb-relay.rst | 124 +++++++++
>  NEWS                                 |   3 +
>  lib/ovsdb-cs.c                       |  15 +-
>  ovsdb/_server.ovsschema              |   7 +-
>  ovsdb/_server.xml                    |  35 +--
>  ovsdb/automake.mk                    |   4 +
>  ovsdb/execution.c                    |  18 +-
>  ovsdb/file.c                         |   2 +-
>  ovsdb/jsonrpc-server.c               |   3 +-
>  ovsdb/ovsdb-client.c                 |   2 +-
>  ovsdb/ovsdb-server.1.in              |  27 +-
>  ovsdb/ovsdb-server.c                 | 105 +++++---
>  ovsdb/ovsdb.c                        |  11 +
>  ovsdb/ovsdb.h                        |   9 +-
>  ovsdb/relay.c                        | 385 +++++++++++++++++++++++++++
>  ovsdb/relay.h                        |  38 +++
>  ovsdb/replication.c                  |  83 +-----
>  ovsdb/row.c                          |  30 ++-
>  ovsdb/row.h                          |   6 +-
>  ovsdb/storage.c                      |  13 +-
>  ovsdb/storage.h                      |   2 +-
>  ovsdb/table.c                        |  70 +++++
>  ovsdb/table.h                        |  14 +
>  ovsdb/transaction-forward.c          | 182 +++++++++++++
>  ovsdb/transaction-forward.h          |  44 +++
>  ovsdb/trigger.c                      |  49 +++-
>  ovsdb/trigger.h                      |  41 +--
>  python/ovs/db/idl.py                 |  16 ++
>  tests/ovsdb-server.at                |  85 +++++-
>  tests/test-ovsdb.c                   |   6 +-
>  33 files changed, 1297 insertions(+), 196 deletions(-)
>  create mode 100644 Documentation/topics/ovsdb-relay.rst
>  create mode 100644 ovsdb/relay.c
>  create mode 100644 ovsdb/relay.h
>  create mode 100644 ovsdb/transaction-forward.c
>  create mode 100644 ovsdb/transaction-forward.h
> 


