[ovs-git] [openvswitch/ovs] 7865bc: ovsdb: Add raft memory usage to memory report.

Ilya Maximets noreply at github.com
Tue Nov 3 15:59:21 UTC 2020


  Branch: refs/heads/branch-2.13
  Home:   https://github.com/openvswitch/ovs
  Commit: 7865bc49d3545a8aa2e9a501535b0ff2802ab8c0
      https://github.com/openvswitch/ovs/commit/7865bc49d3545a8aa2e9a501535b0ff2802ab8c0
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M ovsdb/ovsdb.c
    M ovsdb/raft.c
    M ovsdb/raft.h
    M ovsdb/storage.c
    M ovsdb/storage.h

  Log Message:
  -----------
  ovsdb: Add raft memory usage to memory report.

Memory reports can be found in the logs or by calling the 'memory/show'
appctl command.  For ovsdb-server, the report includes information about
db cells, monitor connections with their backlog size, etc.  But it
doesn't contain any information about memory consumed by raft.
Backlogs of raft connections can be extremely large because of
snapshot installation requests, each of which simply contains the whole
database.  In less healthy clusters, where one of the ovsdb servers is
not able to handle all the incoming raft traffic in a timely manner,
the backlog on the sender's side can cause significant memory
consumption issues.

Add new 'raft-connections' and 'raft-backlog' counters to the
memory report to better track such conditions.
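
As a rough illustration of what the two counters summarize, here is a
minimal sketch.  The struct and function names are hypothetical and are
not taken from ovsdb/raft.c; it only assumes that each raft connection
exposes its current backlog in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of raft memory usage (illustration only). */
struct raft_usage {
    unsigned int connections;   /* Number of raft connections. */
    uint64_t backlog;           /* Total backlog across them, in bytes. */
};

/* Sums the per-connection backlogs given in 'backlogs[0..n-1]'. */
struct raft_usage
collect_raft_usage(const uint64_t *backlogs, size_t n)
{
    struct raft_usage usage = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        usage.connections++;
        usage.backlog += backlogs[i];
    }
    return usage;
}
```

A report built from these two values makes it visible when a single slow
peer causes the sender's total backlog to grow.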

Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: fd29479f04fce3a0d805a456136d536f4ca95776
      https://github.com/openvswitch/ovs/commit/fd29479f04fce3a0d805a456136d536f4ca95776
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Report jsonrpc backlog in kilobytes.

While sending snapshots, the backlog on raft connections can quickly
grow beyond 4GB, which overflows the raft-backlog counter.

Report it in kB instead.  (Using kB and not KB to match the
ru_maxrss counter reported by the kernel.)
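
The overflow can be illustrated with a small sketch.  It assumes the
counter is a 32-bit unsigned value, as the 4GB limit suggests.  The
helper names are hypothetical, and the divisor of 1000 is an assumption
made for illustration, not a statement about the actual code.

```c
#include <stdint.h>

/* Hypothetical helper: stores a byte count in a 32-bit counter.
 * Values of 2^32 bytes or more wrap around. */
uint32_t
backlog_as_bytes(uint64_t backlog_bytes)
{
    return (uint32_t) backlog_bytes;
}

/* Hypothetical helper: reports the same backlog in kB.  The divisor
 * (1000) is an assumption; the point is that the scaled value fits in
 * 32 bits for any realistic backlog. */
uint32_t
backlog_as_kb(uint64_t backlog_bytes)
{
    return (uint32_t) (backlog_bytes / 1000);
}
```

For a backlog of 5000000000 bytes, the first helper wraps to 705032704,
while the second returns 5000000.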

Fixes: 3423cd97f88f ("ovsdb: Add raft memory usage to memory report.")
Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 49453540d930021b0737c6c9e5b811be924a6491
      https://github.com/openvswitch/ovs/commit/49453540d930021b0737c6c9e5b811be924a6491
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Add log length to the memory report.

In many cases, a big part of the memory consumed by the ovsdb-server
process is the raft log, so it's important to add its length to the
memory report.

Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: fc7a644a731d0b961f14d82a5794049bb2f3b795
      https://github.com/openvswitch/ovs/commit/fc7a644a731d0b961f14d82a5794049bb2f3b795
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M NEWS
    M configure.ac
    M ovsdb/ovsdb-server.1.in
    M ovsdb/ovsdb-server.c
    M ovsdb/ovsdb.c
    M ovsdb/ovsdb.h

  Log Message:
  -----------
  ovsdb-server: Reclaim heap memory after compaction.

Compaction happens at most once every 10 minutes.  That is a long time
interval for a heavily loaded ovsdb-server in cluster mode.
In 10 minutes, raft logs could grow to tens of thousands of entries
with tens of gigabytes in total size.
While compaction cleans up raft log entries, in many cases the memory
is not returned to the system but kept in the heap of the running
ovsdb-server process, and it could stay in this condition for a really
long time.  In the end, one performance spike could lead to fast
growth of the raft log, and this memory will not be released to the
system for a really long time, even if the database is empty.

A simple example of how to reproduce this with the OVN sandbox:

1. make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run the following script, which creates 1 port group, adds 4000 acls
   and removes all of that in the end:

   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

3. Stop one of the Northbound DB servers:
   ovs-appctl -t $(pwd)/sandbox/nb1 exit

   Make sure that ovsdb-server didn't compact the database before
   it was stopped.  Now we have a db file on disk that contains
   4000 fairly big transactions.

4. Try to start the same ovsdb-server with this file.

   # cd sandbox && ovsdb-server <...> nb1.db

   At this point, ovsdb-server reads all the transactions from the db
   file and applies them one by one as fast as it can.
   When it finishes, the raft log contains 4000 entries and
   ovsdb-server consumes (on my system) ~13GB of memory while the
   database is empty.  libc will likely never return this memory
   to the system or, at least, will hold it for a really long time.

This patch adds a new command 'ovsdb-server/memory-trim-on-compaction'.
It's disabled by default, but once enabled, ovsdb-server will call
'malloc_trim(0)' after every successful compaction to try to return
unused heap memory to the system.  This is glibc-specific, so we
need to detect function availability at build time.
It is disabled by default since it adds from 1% to 30% (depending on
the current state) to the snapshot creation time and, also, subsequent
memory allocations will likely require requests to the kernel, which
might be slower.  It could be enabled by default later if considered
broadly beneficial.
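
The mechanism described above can be sketched roughly as follows.  This
is a minimal illustration, not the code from ovsdb/ovsdb.c: the function
and variable names are hypothetical, and the '__GLIBC__' check is a
stand-in assumption for the build-time detection mentioned in the
message.

```c
#include <stdbool.h>
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>
#endif

/* Hypothetical runtime switch, standing in for the state toggled by
 * 'ovsdb-server/memory-trim-on-compaction'.  Off by default. */
static bool trim_on_compaction = false;

void
set_trim_on_compaction(bool enable)
{
    trim_on_compaction = enable;
}

/* Hypothetical hook to call after a successful compaction.
 * Returns true if a trim was attempted. */
bool
trim_after_compaction(void)
{
    if (!trim_on_compaction) {
        return false;
    }
#if defined(__GLIBC__)
    /* Ask glibc to release free memory at the top of the heap. */
    malloc_trim(0);
    return true;
#else
    /* malloc_trim() is glibc-specific; do nothing elsewhere. */
    return false;
#endif
}
```

Keeping the switch off by default mirrors the trade-off stated above:
trimming costs time during snapshot creation and may slow down the
allocations that follow.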

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 625a6f0f0e3038c3e90bc42393a5b0898007af0d
      https://github.com/openvswitch/ovs/commit/625a6f0f0e3038c3e90bc42393a5b0898007af0d
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M ovsdb/raft-private.c
    M ovsdb/raft-private.h
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Avoid having more than one snapshot in-flight.

Previous commit 8c2c503bdb0d ("raft: Avoid sending equal snapshots.")
took a "safe" approach and only avoided sending exactly identical
snapshot installation requests.  However, it doesn't make much sense to
send more than one snapshot at a time.  If an obsolete snapshot gets
installed, the leader will re-send the most recent one.

With this change, the leader will have only 1 snapshot in-flight per
connection.  This will reduce backlogs on raft connections in case a
new snapshot is created while an 'install_snapshot_request' is in
progress or if the election timer is changed in that period.

Also, not tracking the exact 'install_snapshot_request' we've sent
allows us to simplify the code.
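
The idea of a single in-flight snapshot per connection can be sketched
with one boolean flag.  This is an illustration under assumptions, not
the actual implementation: the struct and function names are
hypothetical and do not come from ovsdb/raft.c or ovsdb/raft-private.h.

```c
#include <stdbool.h>

/* Hypothetical per-connection state (illustration only). */
struct conn_state {
    bool snapshot_in_flight;  /* A snapshot request awaits its reply. */
};

/* Returns true if the leader may send a snapshot on 'conn' now, and
 * marks the snapshot as in flight.  Returns false if one is already
 * outstanding, in which case nothing should be sent. */
bool
try_send_snapshot(struct conn_state *conn)
{
    if (conn->snapshot_in_flight) {
        return false;
    }
    conn->snapshot_in_flight = true;
    return true;
}

/* To be called when the reply to the outstanding request arrives. */
void
snapshot_reply_received(struct conn_state *conn)
{
    conn->snapshot_in_flight = false;
}
```

Because only a flag is kept, there is no need to remember which exact
request was sent, which matches the simplification noted above.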

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Fixes: 8c2c503bdb0d ("raft: Avoid sending equal snapshots.")
Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/7a69ccf7e48d...625a6f0f0e30

