[ovs-dev] ovn northbound ovsdb-server’s memory usage problem

Ben Pfaff blp at ovn.org
Mon Mar 7 23:17:24 UTC 2016


On Mon, Mar 07, 2016 at 01:57:22PM -0800, Han Zhou wrote:
> On Sun, Mar 6, 2016 at 11:02 PM, Lei Huang <huang.f.lei at gmail.com> wrote:
> >
> > Hi,
> >
> >
> > During a scalability test, we found that the OVN northbound ovsdb-server’s
> > memory usage becomes very high while creating and binding ports. The test
> > steps are:
> >
> > 1. Create 1000 sandboxes
> >
> > 2. Create 5 lswitches and 200 lports for each lswitch, on the northbound
> > database
> >
> > 3. For each lport created in step 2, bind it to a randomly chosen sandbox,
> > then use ‘ovn-nbctl wait-until Logical_Port <port-name> up=true’ to wait
> > until the port is ‘up’ in the northbound database
> >
> > 4. Go to step 2 (a rough sketch of this loop follows below)
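> >
> > (A minimal sketch of steps 2-4, assuming the usual sandbox layout; the
> > lswitch/lport/sandbox names and the iface-id binding step here are
> > illustrative, not the exact rally scenario:
> >
> >     for sw in $(seq 5); do
> >         ovn-nbctl lswitch-add ls-$sw
> >         for p in $(seq 200); do
> >             lport=lp-$sw-$p
> >             ovn-nbctl lport-add ls-$sw $lport
> >             # bind the lport inside a randomly chosen sandbox, then wait
> >             sbox=sandbox-$(( (RANDOM % 1000) + 1 ))
> >             ovs-vsctl --db=unix:$sbox/db.sock add-port br-int $lport \
> >                 -- set Interface $lport external_ids:iface-id=$lport
> >             ovn-nbctl wait-until Logical_Port $lport up=true
> >         done
> >     done
> > )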
> >
> >
> > The VmRSS memory usage (in kB) is as below:
> >
> >
> > lport number  ovn-northd   nb ovsdb-server   sb ovsdb-server
> >     0               1292              1124              7104
> >  5000              21704           6476420             17424
> > 10000              41680          25446148             31808
> > 15000              61600          56893928            133864
> > 20000              81844         100955012            176868
> >
> >
> > The memory log is as below:
> >
> > 2016-03-02T09:48:32.024Z|00003|memory|INFO|1124 kB peak resident set size after 10.0 seconds
> > 2016-03-02T16:24:46.430Z|00005|memory|INFO|peak resident set size grew 150% in last 23774.4 seconds, from 1124 kB to 2808 kB
> > 2016-03-02T16:25:06.432Z|00007|memory|INFO|peak resident set size grew 130% in last 20.0 seconds, from 2808 kB to 6448 kB
> > 2016-03-02T16:25:16.433Z|00009|memory|INFO|peak resident set size grew 54% in last 10.0 seconds, from 6448 kB to 9948 kB
> > 2016-03-02T16:25:26.434Z|00011|memory|INFO|peak resident set size grew 52% in last 10.0 seconds, from 9948 kB to 15072 kB
> > 2016-03-02T16:25:36.436Z|00013|memory|INFO|peak resident set size grew 178% in last 10.0 seconds, from 15072 kB to 41956 kB
> > 2016-03-02T16:26:36.439Z|00015|memory|INFO|peak resident set size grew 140% in last 60.0 seconds, from 41956 kB to 100696 kB
> > 2016-03-02T16:27:36.444Z|00017|memory|INFO|peak resident set size grew 78% in last 60.0 seconds, from 100696 kB to 179108 kB
> > 2016-03-02T16:28:46.444Z|00019|memory|INFO|peak resident set size grew 55% in last 70.0 seconds, from 179108 kB to 277412 kB
> > 2016-03-02T16:39:26.953Z|00021|memory|INFO|peak resident set size grew 50% in last 640.5 seconds, from 277412 kB to 417472 kB
> > 2016-03-02T16:41:06.960Z|00023|memory|INFO|peak resident set size grew 67% in last 100.0 seconds, from 417472 kB to 696732 kB
> > 2016-03-02T16:44:17.180Z|00025|memory|INFO|peak resident set size grew 54% in last 190.2 seconds, from 696732 kB to 1071080 kB
> > 2016-03-02T16:55:02.317Z|00027|memory|INFO|peak resident set size grew 68% in last 645.1 seconds, from 1071080 kB to 1794276 kB
> > 2016-03-02T17:03:22.544Z|00029|memory|INFO|peak resident set size grew 50% in last 500.2 seconds, from 1794276 kB to 2698100 kB
> > 2016-03-02T17:12:18.359Z|00031|memory|INFO|peak resident set size grew 54% in last 535.8 seconds, from 2698100 kB to 4158336 kB
> > 2016-03-02T17:27:59.897Z|00070|memory|INFO|peak resident set size grew 55% in last 941.5 seconds, from 4158336 kB to 6458136 kB
> > 2016-03-02T17:51:04.005Z|00136|memory|INFO|peak resident set size grew 53% in last 1384.1 seconds, from 6458136 kB to 9896844 kB
> > 2016-03-02T18:37:52.913Z|00243|memory|INFO|peak resident set size grew 57% in last 2808.9 seconds, from 9896844 kB to 15539132 kB
> > 2016-03-02T19:21:33.977Z|00414|memory|INFO|peak resident set size grew 51% in last 2621.1 seconds, from 15539132 kB to 23403744 kB
> > 2016-03-02T20:27:53.938Z|00661|memory|INFO|peak resident set size grew 51% in last 3980.0 seconds, from 23403744 kB to 35348044 kB
> > 2016-03-02T22:11:13.200Z|00957|memory|INFO|peak resident set size grew 53% in last 6199.3 seconds, from 35348044 kB to 53943084 kB
> > 2016-03-03T00:49:32.196Z|01306|memory|INFO|peak resident set size grew 51% in last 9499.0 seconds, from 53943084 kB to 81658988 kB
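> >
> > (For reference, a rough way to sample these numbers while the test runs;
> > the pidfile and unixctl socket names are the ones from the valgrind
> > command below, so treat the exact paths as assumptions:
> >
> >     grep VmRSS /proc/$(cat ovsdb-server-nb.pid)/status      # VmRSS, as in the table above
> >     ovs-appctl -t "$PWD/ovsdb-server-nb.ctl" memory/show    # ovsdb-server's own memory report
> > )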
> >
> >
> >
> > In another, smaller test environment, I checked the northbound
> > ovsdb-server with valgrind:
> >
> > $ valgrind --leak-check=full --show-leak-kinds=all ovsdb-server --no-chdir
> > --pidfile=ovsdb-server-nb.pid --unixctl=ovsdb-server-nb.ctl -vconsole:off
> > -vsyslog:off -vfile:info --log-file=ovsdb-server-nb.log
> > --remote=punix:/home/rally/controller-sandbox/db-nb.sock conf-nb.db ovnnb.db
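> >
> > (If a finer breakdown of where the heap grows is wanted, the same
> > invocation can also be run under valgrind’s heap profiler; this is only a
> > generic suggestion, not something done in this test:
> >
> >     $ valgrind --tool=massif ovsdb-server <same arguments as above>
> >     $ ms_print massif.out.<pid>
> > )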
> >
> >
> > But no obvious memory leak was found. So I made ovsdb-server exit without
> > running its cleanup code, to see what the memory is used for; the code
> > change is as below:
> >
> >
> > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> > index fa662b1..4498eaa 100644
> > --- a/ovsdb/ovsdb-server.c
> > +++ b/ovsdb/ovsdb-server.c
> > @@ -339,7 +339,7 @@ main(int argc, char *argv[])
> >                               ovsdb_server_disable_monitor2, jsonrpc);
> >
> >      main_loop(jsonrpc, &all_dbs, unixctl, &remotes, run_process, &exiting);
> > -
> > +    return 0;
> >      ovsdb_jsonrpc_server_destroy(jsonrpc);
> >      SHASH_FOR_EACH_SAFE(node, next, &all_dbs) {
> >          struct db *db = node->data;
> >
> > After 100 ports were created and bound, the memory used was about 200 MB,
> > and the output of valgrind is as below:
> >
> > ==23712== 24,994,032 bytes in 520,709 blocks are possibly lost in loss record 228 of 229
> > ==23712==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==23712==    by 0x44140C: xcalloc (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x44145A: xzalloc (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x410DFF: ovsdb_monitor_changes_update (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x4110F6: ovsdb_monitor_change_cb (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x41780A: ovsdb_txn_for_each_change (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x4119A0: ovsdb_monitor_commit (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x41767C: ovsdb_txn_commit_ (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x41775D: ovsdb_txn_commit (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x419011: ovsdb_execute (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x414D2E: ovsdb_trigger_try (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x414A8B: ovsdb_trigger_init (in /usr/local/sbin/ovsdb-server)
> > ==23712==
> > ==23712== 124,299,840 bytes in 520,709 blocks are possibly lost in loss record 229 of 229
> > ==23712==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==23712==    by 0x441487: xmalloc (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x40F975: clone_monitor_row_data (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x410E70: ovsdb_monitor_changes_update (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x4110F6: ovsdb_monitor_change_cb (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x41780A: ovsdb_txn_for_each_change (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x4119A0: ovsdb_monitor_commit (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x41767C: ovsdb_txn_commit_ (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x41775D: ovsdb_txn_commit (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x419011: ovsdb_execute (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x414D2E: ovsdb_trigger_try (in /usr/local/sbin/ovsdb-server)
> > ==23712==    by 0x414A8B: ovsdb_trigger_init (in /usr/local/sbin/ovsdb-server)
> > ==23712==
> > ==23712== LEAK SUMMARY:
> > ==23712==    definitely lost: 159 bytes in 4 blocks
> > ==23712==    indirectly lost: 398 bytes in 13 blocks
> > ==23712==      possibly lost: 212,065,579 bytes in 3,963,543 blocks
> > ==23712==    still reachable: 1,156,295 bytes in 9,686 blocks
> > ==23712==         suppressed: 0 bytes in 0 blocks
> >
> >
> > It seems that about 150 MB of memory is used by the ovsdb monitor. I am
> > looking into this problem, but because I’m not familiar with
> > ovsdb-server’s source, this may take some time. Do you have any idea
> > about this problem?
> >
> >
> > BR
> >
> > Huang Lei
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
> 
> 
> Thanks Lei for the report and the analysis! An update on the problem:
> 
> Both Andy and Lei found the bug in the ovsdb monitor: when a monitor is
> removed, its tracked changes were not dereferenced, so their memory was
> never released.
> 
> Andy provided a quick fix and I just verified it: no obvious memory
> increase for the NB ovsdb-server.
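> 
> (A simple way to re-check this while re-running the test, assuming the same
> pidfile name that Lei used in the valgrind command above:
> 
>     watch -n 10 'grep VmRSS /proc/$(cat ovsdb-server-nb.pid)/status'
> )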
> 
> Big thanks to Andy and Lei. Looking forward to the formal patch from Andy :)

That's great news.  Thanks Andy and Lei!


