[ovs-dev] [PATCH] ovsdb: Don't let transaction history grow larger than the database.

Han Zhou hzhou at ovn.org
Mon Nov 8 19:44:57 UTC 2021


On Mon, Nov 8, 2021 at 11:33 AM Ilya Maximets <i.maximets at ovn.org> wrote:
>
> On 11/8/21 20:14, Han Zhou wrote:
> >
> >
> > On Mon, Nov 8, 2021 at 10:49 AM Ilya Maximets <i.maximets at ovn.org> wrote:
> >>
> >> On 11/8/21 19:24, Han Zhou wrote:
> >> >
> >> >
> >> > On Fri, Nov 5, 2021 at 2:55 PM Ilya Maximets <i.maximets at ovn.org> wrote:
> >> >>
> >> >> On 10/23/21 08:40, Han Zhou wrote:
> >> >> >
> >> >> >
> >> >> > On Thu, Oct 14, 2021 at 7:58 AM Ilya Maximets <i.maximets at ovn.org> wrote:
> >> >> >>
> >> >> >> If a user frequently changes a lot of rows in a database,
> >> >> >> transaction history could grow way larger than the database
> >> >> >> itself.  This wastes a lot of memory and also makes
> >> >> >> monitor_cond_since slower than usual monitor_cond if the
> >> >> >> transaction id is old enough, because re-construction of the
> >> >> >> changes from a history is slower than just creation of the
> >> >> >> initial database snapshot.  This is also the case if a user
> >> >> >> deleted a lot of data, so transaction history still holds all
> >> >> >> of it while the database itself doesn't.
> >> >> >>
> >> >> >> In case of the current lb-per-service model in ovn-kubernetes,
> >> >> >> each load-balancer is added to every logical switch/router.
> >> >> >> Such a transaction touches more than half of an OVN_Northbound
> >> >> >> database.  And each of these transactions is added to the
> >> >> >> transaction history.  Since the transaction history depth is
> >> >> >> 100, in the worst case scenario it will hold 100 copies of the
> >> >> >> database, increasing memory consumption dramatically.  In tests
> >> >> >> with 3000 LBs and 120 LSs, memory goes up to 3 GB, while holding
> >> >> >> at 30 MB if transaction history is disabled in the code.
> >> >> >>
> >> >> >> Fix that by keeping a count of the number of ovsdb_atom's in
> >> >> >> the database and not allowing the total number of atoms in the
> >> >> >> transaction history to grow larger than this value.  Counting
> >> >> >> atoms is fairly cheap because we don't need to iterate over
> >> >> >> them, so it doesn't have a significant performance impact.  It
> >> >> >> would be ideal to measure the size of individual atoms, but that
> >> >> >> would hurt performance.  Counting cells instead of atoms is not
> >> >> >> sufficient, because OVN users are adding hundreds or thousands
> >> >> >> of atoms to a single cell, so cells largely differ in size.
> >> >> >>
> >> >> >> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> >> >> >> ---
> >> >> >>  ovsdb/ovsdb.c       |  5 ++++
> >> >> >>  ovsdb/ovsdb.h       |  3 +++
> >> >> >>  ovsdb/transaction.c | 56 +++++++++++++++++++++++++++++++++++++++++++--
> >> >> >>  3 files changed, 62 insertions(+), 2 deletions(-)
> >> >>
> >> >> <snip>
> >> >>
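Just to spell out how I read the approach, here is a rough sketch of the
limiting logic.  It is purely illustrative and not the actual patch: the
struct and field names below are made up, while the real counting lives in
the files listed in the diffstat above.

    /* Illustrative sketch only, not the real ovsdb code. */
    #include <stddef.h>

    struct txn_entry {
        struct txn_entry *next;    /* Older -> newer list of history entries. */
        size_t n_atoms;            /* Atoms held by this history entry.       */
    };

    struct db_state {
        struct txn_entry *oldest;  /* Head (oldest) of the transaction history. */
        size_t n_txns;             /* Number of entries in the history.         */
        size_t n_history_atoms;    /* Total atoms across all history entries.   */
        size_t n_db_atoms;         /* Atoms currently stored in the database.   */
    };

    /* Evict the oldest history entries while the history holds more atoms
     * than the database itself, on top of the existing depth limit of 100. */
    static void
    trim_history(struct db_state *db)
    {
        while (db->oldest
               && (db->n_history_atoms > db->n_db_atoms || db->n_txns > 100)) {
            struct txn_entry *victim = db->oldest;

            db->oldest = victim->next;
            db->n_history_atoms -= victim->n_atoms;
            db->n_txns--;
            /* The real code would destroy the cloned transaction here. */
        }
    }

That also matches what the test below checks: the txn-history-atoms counter
reported by memory/show should never exceed the atoms counter.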
> >> >> >
> >> >> > Thanks Ilya. This looks good to me. Although the atom count is
> >> >> > not equal to the size of the data, it should be sufficient for
> >> >> > the purpose of controlling history size.
> >> >> > It would be good to have a test case to ensure this behavior,
> >> >> > and also to ensure the atom count is as expected, just to avoid
> >> >> > messing up the counters by mistake in the future.  (If that
> >> >> > happens, it is not easily noticed because there is no functional
> >> >> > difference, only a possible performance impact on fast-resync.)
> >> >> >
> >> >> > Acked-by: Han Zhou <hzhou at ovn.org>
> >> >>
> >> >> Thanks for review!  I prepared a test below for the number of atoms.
> >> >> If it looks good to you, I can fold it in before applying the patch.
> >> >>
> >> >> Han, Dumitru, What do you think?
> >> >>
> >> >> diff --git a/tests/ovsdb-server.at b/tests/ovsdb-server.at
> >> >> index ac243d6a7..2b742f78b 100644
> >> >> --- a/tests/ovsdb-server.at
> >> >> +++ b/tests/ovsdb-server.at
> >> >> @@ -1228,6 +1228,69 @@ AT_CHECK([test $logged_updates -lt $logged_nonblock_updates])
> >> >>  AT_CHECK_UNQUOTED([ovs-vsctl get open_vswitch . system_version], [0],
> >> >>    [xyzzy$counter
> >> >>  ])
> >> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >> >> +AT_CLEANUP
> >> >> +
> >> >> +AT_SETUP([ovsdb-server transaction history size])
> >> >> +on_exit 'kill `cat *.pid`'
> >> >> +
> >> >> +dnl Start an ovsdb-server with the clustered vswitchd schema.
> >> >> +AT_CHECK([ovsdb-tool create-cluster db dnl
> >> >> +            $abs_top_srcdir/vswitchd/vswitch.ovsschema unix:s1.raft],
> >> >> +         [0], [ignore], [ignore])
> >> >> +AT_CHECK([ovsdb-server --detach --no-chdir --pidfile dnl
> >> >> +            --log-file --remote=punix:db.sock db],
> >> >> +         [0], [ignore], [ignore])
> >> >> +AT_CHECK([ovs-vsctl --no-wait init])
> >> >> +
> >> >> +dnl Create a bridge with N ports per transaction.  Increase N every 4
> >> >> +dnl iterations.  And then remove the bridges.  By increasing the size
> >> >> +dnl of transactions, we ensure that they take up a significant
> >> >> +dnl percentage of the total database size, so the transaction history
> >> >> +dnl will not be able to hold all of them.
> >> >> +dnl
> >> >> +dnl The test verifies that the number of atoms in the transaction
> >> >> +dnl history is always less than the number of atoms in the database.
> >> >> +get_n_atoms () {
> >> >> +    n=$(ovs-appctl -t ovsdb-server memory/show dnl
> >> >> +            | tr ' ' '\n' | grep atoms | grep "^$1:" | cut -d ':' -f 2)
> >> >> +    if test X"$n" = "X"; then
> >> >> +        n=0
> >> >> +    fi
> >> >> +    echo $n
> >> >> +}
> >> >> +
> >> >> +check_atoms () {
> >> >> +    n_db_atoms=$(get_n_atoms atoms)
> >> >> +    n_txn_history_atoms=$(get_n_atoms txn-history-atoms)
> >> >> +    echo "n_db_atoms:          $n_db_atoms"
> >> >> +    echo "n_txn_history_atoms: $n_txn_history_atoms"
> >> >> +    AT_CHECK([test $n_txn_history_atoms -le $n_db_atoms])
> >> >> +}
> >> >> +
> >> >> +add_ports () {
> >> >> +    for j in $(seq 1 $2); do
> >> >> +        printf " -- add-port br$1 p$1-%d" $j
> >> >> +    done
> >> >> +}
> >> >> +
> >> >> +initial_db_atoms=$(get_n_atoms atoms)
> >> >> +
> >> >> +for i in $(seq 1 100); do
> >> >> +    cmd=$(add_ports $i $(($i / 4 + 1)))
> >> >> +    AT_CHECK([ovs-vsctl --no-wait add-br br$i $cmd])
> >> >> +    check_atoms
> >> >> +done
> >> >> +
> >> >> +for i in $(seq 1 100); do
> >> >> +    AT_CHECK([ovs-vsctl --no-wait del-br br$i])
> >> >> +    check_atoms
> >> >> +done
> >> >> +
> >> >
> >> > Thanks Ilya for the test case. I noticed that the case when
> >> > txn_history is bigger than the data does not seem to be covered by
> >> > the above loops. While i increases from 1 to 100, the history is
> >> > smaller than or equal to the whole data. Then, when the first
> >> > deletion happens, because there are at most 100 entries in the
> >> > history, the oldest history entry is removed, which is the same
> >> > size as the new transaction, because it deletes the same br that
> >> > was added in the oldest transaction; so the txn history size is
> >> > still smaller than or equal to the whole data, and so on ... It
> >> > seems that simply changing the loop count to 99 or less would have
> >> > the scenario covered.
> >>
> >> Hmm.  Deletions are happening in the same order, i.e. small bridges
> >> first.  And here is how the last addition and the first deletion
> >> looked when I ran the test on my machine (I added an extra call for
> >> memory/show to print actual history size):
> >>
> >> # make -j8 check TESTSUITEFLAGS='-k history -v'
> >> <...>
> >> ./ovsdb-server.at:1281: ovs-vsctl --no-wait add-br br$i $cmd
> >> atoms:26572 cells:89364 monitors:0 raft-log:102 txn-history:49 txn-history-atoms:26479
> >> n_db_atoms:          26572
> >> n_txn_history_atoms: 26479
> >> ./ovsdb-server.at:1268: test $n_txn_history_atoms -le $n_db_atoms
> >> ./ovsdb-server.at:1287: ovs-vsctl --no-wait del-br br$i
> >> atoms:26527 cells:89218 monitors:0 raft-log:103 txn-history:49 txn-history-atoms:26341
> >> n_db_atoms:          26527
> >> n_txn_history_atoms: 26341
> >> <...>
> >>
> >> So, the history doesn't grow larger than 49 entries in that case.
> >> Later, due to deletion of small bridges, history grows up to 55 entries
> >> and gradually goes down after that with removal of larger bridges.
> >>
> >> Is it different in your setup?  Or did I misunderstand?
> >
> > Ok, I ran it myself and it turned out I was wrong. I was expecting
> > that the atoms in the transaction history entry that adds a br and
> > its ports would be the same as the total atoms added to the DB by
> > this transaction, but it turned out that the transaction history
> > increases by more than what's added to the DB. So my assumption
> > "When i increases from 1 to 100, the history is smaller than or
> > equal to the whole data" was wrong. Do you happen to know why the
> > txn has more than what's added to the DB?
>
> Yep.  A transaction basically holds 2x of some of the
> database data, because it holds both old and new versions
> of rows.  Addition of a bridge here is not purely addition,
> because ovs-vsctl will also add this new bridge to the
> 'bridges' column of the row in 'Open_vSwitch' table.
> And it will also change the value of the 'next_cfg' column
> in that row.  These are mutations that require having
> both old and new versions of a row in the transaction.
>
> So, each transaction will hold old and new versions of the
> single row from 'Open_vSwitch' table along with all the new
> data (bridge itself and ports).  Hence it will be a bit
> larger than the added new data.

OK. Sorry, I forgot that the bridges column will be updated as well.
Thanks for the explanation!
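
So, if I read it right, a history entry counts both versions of every
changed row, roughly like this (again just an illustration with made-up
names, not the real transaction code):

    /* Illustration only: why the history entry for "add a bridge" is larger
     * than the net growth of the database. */
    #include <stddef.h>

    struct row_version {
        size_t n_atoms;            /* Atoms stored in this version of the row. */
    };

    struct txn_row {
        struct row_version *old;   /* NULL for a pure insert.  */
        struct row_version *new;   /* NULL for a pure delete.  */
    };

    /* Atoms that a single changed row contributes to the history entry. */
    static size_t
    txn_row_atoms(const struct txn_row *r)
    {
        size_t n = 0;

        if (r->old) {
            n += r->old->n_atoms;  /* The row as it was before the commit. */
        }
        if (r->new) {
            n += r->new->n_atoms;  /* The row as it is after the commit.   */
        }
        /* The mutated Open_vSwitch row has both versions, so it counts
         * twice, on top of the newly inserted Bridge and Port rows. */
        return n;
    }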

Han

>
> >
> >>
> >> >
> >> >> +dnl After removing all the bridges, the number of atoms in the database
> >> >> +dnl should return to its initial value.
> >> >> +AT_CHECK([test $(get_n_atoms atoms) -eq $initial_db_atoms])
> >> >> +
> >> >>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >> >>  AT_CLEANUP
> >> >>
> >> >> ---
> >>
>

