[ovs-git] [openvswitch/ovs] e3afed: raft: Add log length to the memory report.

Ilya Maximets noreply at github.com
Tue Nov 3 15:59:11 UTC 2020


  Branch: refs/heads/branch-2.14
  Home:   https://github.com/openvswitch/ovs
  Commit: e3afed72b726411d6e3426b8dddae9db8f524a8c
      https://github.com/openvswitch/ovs/commit/e3afed72b726411d6e3426b8dddae9db8f524a8c
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Add log length to the memory report.

In many cases a big part of the memory consumed by the ovsdb-server
process is the raft log, so it's important to add its length to the
memory report.

Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
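
As a rough sketch of the idea (not necessarily the exact patch), the
ovsdb-server memory report is a name-to-counter map ('struct simap'),
so the raft module can account for its log length in whatever hook
fills that map.  The field and function names below are assumptions
made for illustration only:

   #include "simap.h"

   /* Sketch: report how many entries the in-memory raft log holds,
    * assuming the log covers the half-open index range
    * [log_start, log_end). */
   static void
   raft_report_memory(const struct raft *raft, struct simap *usage)
   {
       simap_increase(usage, "raft-log",
                      raft->log_end - raft->log_start);
   }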


  Commit: 04bf64bf7d61c696646ad539f6e719e2c761f048
      https://github.com/openvswitch/ovs/commit/04bf64bf7d61c696646ad539f6e719e2c761f048
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M NEWS
    M configure.ac
    M ovsdb/ovsdb-server.1.in
    M ovsdb/ovsdb-server.c
    M ovsdb/ovsdb.c
    M ovsdb/ovsdb.h

  Log Message:
  -----------
  ovsdb-server: Reclaim heap memory after compaction.

Compaction happens at most once in 10 minutes.  That is a big time
interval for a heavily loaded ovsdb-server in cluster mode.
In 10 minutes the raft log can grow to tens of thousands of entries
totaling tens of gigabytes.
While compaction cleans up raft log entries, the memory in many cases
is not returned to the system, but kept in the heap of the running
ovsdb-server process, and it can stay in this condition for a really
long time.  As a result, one performance spike can lead to fast growth
of the raft log, and this memory will never (or not for a really long
time) be released to the system, even if the database is empty.

A simple example of how to reproduce this with the OVN sandbox:

1. make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run the following script, which creates 1 port group, adds 4000 ACLs,
   and removes all of them at the end:

   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

3. Stop one of the Northbound DB servers:
   ovs-appctl -t $(pwd)/sandbox/nb1 exit

   Make sure that ovsdb-server didn't compact the database before
   it was stopped.  Now we have a db file on disk that contains
   4000 fairly big transactions inside.

4. Try to start the same ovsdb-server with this file.

   # cd sandbox && ovsdb-server <...> nb1.db

   At this point ovsdb-server reads all the transactions from the db
   file and performs them one by one as fast as it can.
   When it finishes, the raft log contains 4000 entries and
   ovsdb-server consumes (on my system) ~13GB of memory while the
   database is empty.  And libc will likely never return this memory
   to the system, or, at least, will hold it for a really long time.

This patch adds a new command 'ovsdb-server/memory-trim-on-compaction'.
It's disabled by default, but once enabled, ovsdb-server will call
'malloc_trim(0)' after every successful compaction to try to return
unused heap memory back to the system.  This is glibc-specific, so we
need to detect the function's availability at build time.
It is disabled by default since it adds from 1% to 30% (depending on
the current state) to the snapshot creation time, and subsequent memory
allocations will likely require requests to the kernel, which might be
slower.  It could be enabled by default later if considered broadly
beneficial.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
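
A minimal sketch of the trimming step described above, assuming the
build probes for the function with something like
AC_CHECK_FUNCS([malloc_trim]) in configure.ac and that a 'trim_memory'
flag is toggled by the new appctl command (the helper and flag names
here are assumptions):

   #ifdef HAVE_MALLOC_TRIM
   #include <malloc.h>   /* glibc declares malloc_trim() here. */
   #endif

   /* Called after every successful database compaction. */
   static void
   maybe_trim_heap(bool trim_memory)
   {
   #ifdef HAVE_MALLOC_TRIM
       if (trim_memory) {
           /* A pad of 0 asks glibc to release as much unused heap
            * memory back to the kernel as it can. */
           malloc_trim(0);
       }
   #else
       (void) trim_memory;
   #endif
   }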


  Commit: 66280ef6ac64b2f77e332952fb2a2a847d158f83
      https://github.com/openvswitch/ovs/commit/66280ef6ac64b2f77e332952fb2a2a847d158f83
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-11-03 (Tue, 03 Nov 2020)

  Changed paths:
    M ovsdb/raft-private.c
    M ovsdb/raft-private.h
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Avoid having more than one snapshot in-flight.

The previous commit 8c2c503bdb0d ("raft: Avoid sending equal snapshots.")
took a "safe" approach and only avoided re-sending exactly the same
snapshot installation request.  However, it doesn't make much sense to
send more than one snapshot at a time.  If an obsolete snapshot gets
installed, the leader will re-send the most recent one.

With this change the leader will have only 1 snapshot in-flight per
connection.  This will reduce backlogs on raft connections in case a
new snapshot is created while an 'install_snapshot_request' is in
progress, or if the election timer is changed in that period.

Also, not tracking the exact 'install_snapshot_request' we've sent
allows the code to be simplified.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Fixes: 8c2c503bdb0d ("raft: Avoid sending equal snapshots.")
Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
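
A minimal sketch of the "one snapshot in flight per connection"
bookkeeping, assuming a per-server boolean flag; the actual field and
helper names in ovsdb/raft-private.h and ovsdb/raft.c may differ:

   struct raft_server {
       /* ... existing per-server state ... */

       /* True while an install_snapshot_request sent to this server
        * is still unanswered; cleared when its reply arrives. */
       bool install_snapshot_request_in_progress;
   };

   /* Leader side: send a snapshot only if none is outstanding. */
   if (!s->install_snapshot_request_in_progress) {
       raft_send_install_snapshot_request(raft, s);  /* assumed name */
       s->install_snapshot_request_in_progress = true;
   }

   /* When install_snapshot_reply arrives from 's', allow the next
    * (possibly newer) snapshot to be sent. */
   s->install_snapshot_request_in_progress = false;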


Compare: https://github.com/openvswitch/ovs/compare/f2db6431047b...66280ef6ac64

