[ovs-discuss] [OVN] Cluster mode ovsdb memory keeps increasing

刘梦馨 liumengxinfly at gmail.com
Mon Dec 16 07:44:24 UTC 2019


Thanks for your advice.

Maybe it's still related to our scenario. What we did was test whether our new
OVN CNI can hold 10,000 pods in a Kubernetes cluster, so we created lots of
containers in a very short period of time, and the test stopped at about 3,000
containers when we hit the memory issue.

I will try to use a larger server and slow down the container creation to
see the difference.
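
A simple way to watch the RSS of both ovsdb-server processes while the test
runs is a loop along these lines (the pidfile paths are the ovn-ctl defaults
on my hosts and are only an assumption for other setups):

#!/bin/bash
# Sample the NB/SB ovsdb-server RSS (in KB) every 10 seconds.
while true; do
    nb_pid=$(cat /var/run/openvswitch/ovnnb_db.pid 2>/dev/null)
    sb_pid=$(cat /var/run/openvswitch/ovnsb_db.pid 2>/dev/null)
    nb_rss=$(awk '/VmRSS/ {print $2}' "/proc/$nb_pid/status" 2>/dev/null)
    sb_rss=$(awk '/VmRSS/ {print $2}' "/proc/$sb_pid/status" 2>/dev/null)
    echo "$(date +%s) nb_rss_kb=$nb_rss sb_rss_kb=$sb_rss"
    sleep 10
done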

Han Zhou <hzhou at ovn.org> wrote on Mon, Dec 16, 2019 at 2:53 PM:

> Hmm... I am not sure if it is normal. In our environment, with an even
> larger scale (in terms of number of ports), the memory is usually less than
> 1 GB for each ovsdb-server process, and we didn't see any symptom of a
> memory leak (after months of clustered-mode deployment in a live
> environment).
>
> Could you check which DB (NB or SB) is the major memory consumer? If it is
> the SB DB, it is normal to see a memory spike the first time you switch from
> standalone mode to clustered mode if you have a big number of compute nodes
> connected to the SB DB. After it stabilizes, the memory footprint should
> decrease. Of course, it is possible that you have more complex scenarios
> that trigger more memory consumption (or a memory leak). Could you try a
> server with more memory (to avoid the OOM killer), to see if it stabilizes
> at some point or just keeps increasing day after day?
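>
> For example, something along these lines will show the RSS (in KB) of both
> ovsdb-server processes, and the command line tells you which one serves the
> NB DB and which one the SB DB:
>
>     ps -o pid,rss,args -C ovsdb-server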
>
> Besides, I believe Ben has some better troubleshooting steps for memory
> issues of ovsdb-server. Ben, could you suggest?
>
> Thanks,
> Han
>
> On Sun, Dec 15, 2019 at 9:56 PM 刘梦馨 <liumengxinfly at gmail.com> wrote:
> >
> > After more iterations (6 in my environment) the RSS usage stabilized at
> > 759128 KB.
> >
> > This is a really simplified test. In our real environment we run about
> > 3000 containers with lots of other operations, like setting routes and
> > load balancers, all the ovn-sb operations, etc. The memory consumption can
> > quickly go up to 6 GB (NB and SB together) and lead to a system OOM. Is
> > that a reasonable resource consumption in your experience? I don't
> > remember the actual numbers for standalone DB resource consumption, but in
> > the same environment it didn't lead to an OOM.
> >
> > Han Zhou <hzhou at ovn.org> wrote on Mon, Dec 16, 2019 at 1:05 PM:
> >>
> >> Thanks for the details. I tried the same command with a for loop.
> >>
> >> After the first 4 iterations, the RSS of the first NB server increased to
> >> 572888 KB. After that, it stayed the same for the next 3 iterations. So
> >> it seems to just build up memory buffers and then stay at that level
> >> without further increase, which doesn't look like a memory leak. Could
> >> you try more iterations and see if it still continuously increases?
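> >>
> >> In case it is useful, the kind of loop I mean is roughly the following,
> >> where test.sh stands for the script from your mail and the pidfile path
> >> is the ovn-ctl default (both names are assumptions for your setup):
> >>
> >>     for run in $(seq 1 20); do
> >>         ./test.sh
> >>         rss=$(ps -o rss= -p "$(cat /var/run/openvswitch/ovnnb_db.pid)")
> >>         echo "run $run: NB ovsdb-server RSS = ${rss} KB"
> >>     done
> >>
> >> If the RSS keeps climbing run after run, that points at a leak; if it
> >> flattens out, it is more likely just buffers being retained.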
> >>
> >> Thanks,
> >> Han
> >>
> >> On Sun, Dec 15, 2019 at 7:54 PM 刘梦馨 <liumengxinfly at gmail.com> wrote:
> >> >
> >> > Hi, Han
> >> >
> >> > In my test scenario, I use ovn-ctl to start a one-node OVN with
> >> > clustered-mode DBs and no chassis bound to ovn-sb, just to check the
> >> > memory usage of ovn-nb. Then I use a script to add a logical switch,
> >> > add 1000 ports, set dynamic addresses, and then delete the logical
> >> > switch.
> >> >
> >> > #!/bin/bash
> >> > ovn-nbctl ls-add ls1
> >> > for i in {1..1000}; do
> >> >   ovn-nbctl lsp-add ls1 ls1-vm$i
> >> >   ovn-nbctl lsp-set-addresses ls1-vm$i dynamic
> >> > done
> >> > ovn-nbctl ls-del ls1
> >> >
> >> > I run this script repeatedly and watch the memory change.
> >> >
> >> > After 5 runs (5000 lsp adds and deletes), the RSS of the NB server
> >> > increased to 667 MB. The NB database file grew to 119 MB and was not
> >> > automatically compacted. After a manual compaction the DB file size
> >> > went back to 11 KB, but the memory usage didn't change.
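> >> >
> >> > (A manual compaction like this can be triggered with the usual appctl
> >> > command, roughly as below; the ctl socket path is the ovn-ctl default
> >> > on my machine and may differ elsewhere:)
> >> >
> >> >     ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl \
> >> >         ovsdb-server/compact OVN_Northbound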
> >> >
> >> >
> >> >
> >> > Han Zhou <hzhou at ovn.org> wrote on Sat, Dec 14, 2019 at 3:40 AM:
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Dec 11, 2019 at 12:51 AM 刘梦馨 <liumengxinfly at gmail.com>
> wrote:
> >> >> >
> >> >> >
> >> >> > We are using OVS/OVN 2.12.0 to implement our container network.
> >> >> > After switching from a standalone ovndb to cluster-mode ovndb, we
> >> >> > noticed that the memory consumption of both ovnnb and ovnsb keeps
> >> >> > increasing after each operation and never decreases.
> >> >> >
> >> >> > We did some profiling with valgrind. The leak check reported a
> >> >> > 16-byte leak in fork_and_wait_for_startup, which obviously is not
> >> >> > the main cause. Later we used massif to profile the memory
> >> >> > consumption, and we put the result in the attachment.
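> >> >> >
> >> >> > (A massif run along these lines produces the kind of output shown
> >> >> > below; the paths are our defaults and the exact ovsdb-server command
> >> >> > line, which normally comes from ovn-ctl, is an assumption here:)
> >> >> >
> >> >> >     valgrind --tool=massif --pages-as-heap=yes \
> >> >> >         --massif-out-file=nb-massif.out \
> >> >> >         ovsdb-server --remote=punix:/var/run/openvswitch/ovnnb_db.sock \
> >> >> >         /etc/openvswitch/ovnnb_db.db
> >> >> >     # after stopping the server:
> >> >> >     ms_print nb-massif.out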
> >> >> >
> >> >> > Most of the memory comes from two parts: ovsthread_wrapper
> >> >> > (ovs-thread.c:378), which allocates a subprogram_name, and
> >> >> > jsonrpc_send (jsonrpc.c:253), as shown below (I skipped the
> >> >> > duplicated jsonrpc stacks).
> >> >> >
> >> >> > However, I found that both parts have a related free operation
> >> >> > nearby, so I don't know how to explore this memory issue further.
> >> >> > I'm not aware of any difference here between cluster mode and
> >> >> > standalone mode.
> >> >> >
> >> >> > Can anyone give some advice or hints? Thanks!
> >> >> >
> >> >> > 100.00% (357,920,768B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
> >> >> > ->78.52% (281,038,848B) 0x66FDD49: mmap (in /usr/lib64/libc-2.17.so)
> >> >> > | ->37.50% (134,217,728B) 0x66841EF: new_heap (in /usr/lib64/libc-2.17.so)
> >> >> > | | ->37.50% (134,217,728B) 0x6684C22: arena_get2.isra.3 (in /usr/lib64/libc-2.17.so)
> >> >> > | |   ->37.50% (134,217,728B) 0x668AACC: malloc (in /usr/lib64/libc-2.17.so)
> >> >> > | |     ->37.50% (134,217,728B) 0x4FDC613: xmalloc (util.c:138)
> >> >> > | |       ->37.50% (134,217,728B) 0x4FDC78E: xvasprintf (util.c:202)
> >> >> > | |         ->37.50% (134,217,728B) 0x4FDC877: xasprintf (util.c:343)
> >> >> > | |           ->37.50% (134,217,728B) 0x4FA548D: ovsthread_wrapper (ovs-thread.c:378)
> >> >> > | |             ->37.50% (134,217,728B) 0x5BE5E63: start_thread (in /usr/lib64/libpthread-2.17.so)
> >> >> > | |               ->37.50% (134,217,728B) 0x670388B: clone (in /usr/lib64/libc-2.17.so)
> >> >> > | |
> >> >> > | ->36.33% (130,023,424B) 0x6686DF3: sysmalloc (in /usr/lib64/libc-2.17.so)
> >> >> > | | ->36.33% (130,023,424B) 0x6687CA8: _int_malloc (in /usr/lib64/libc-2.17.so)
> >> >> > | |   ->28.42% (101,711,872B) 0x66890C0: _int_realloc (in /usr/lib64/libc-2.17.so)
> >> >> > | |   | ->28.42% (101,711,872B) 0x668B160: realloc (in /usr/lib64/libc-2.17.so)
> >> >> > | |   |   ->28.42% (101,711,872B) 0x4FDC9A3: xrealloc (util.c:149)
> >> >> > | |   |     ->28.42% (101,711,872B) 0x4F1DEB2: ds_reserve (dynamic-string.c:63)
> >> >> > | |   |       ->28.42% (101,711,872B) 0x4F1DED3: ds_put_uninit (dynamic-string.c:73)
> >> >> > | |   |         ->28.42% (101,711,872B) 0x4F1DF0B: ds_put_char__ (dynamic-string.c:82)
> >> >> > | |   |           ->26.37% (94,371,840B) 0x4F2B09F: json_serialize_string (dynamic-string.h:93)
> >> >> > | |   |           | ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
> >> >> > | |   |           | | ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
> >> >> > | |   |           | |   ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
> >> >> > | |   |           | |     ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
> >> >> > | |   |           | |       ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
> >> >> > | |   |           | |         ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
> >> >> > | |   |           | |           ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
> >> >> > | |   |           | |             ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
> >> >> > | |   |           | |               ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
> >> >> > | |   |           | |                 ->12.01% (42,991,616B) 0x4F2D82A: json_to_ds (json.c:1525)
> >> >> > | |   |           | |                   ->12.01% (42,991,616B) 0x4F2EA49: jsonrpc_send (jsonrpc.c:253)
> >> >> > | |   |           | |                     ->12.01% (42,991,616B) 0x4C3A68A: ovsdb_jsonrpc_server_run (jsonrpc-server.c:1104)
> >> >> > | |   |           | |                       ->12.01% (42,991,616B) 0x10DCC1: main (ovsdb-server.c:209)
> >> >> >
> >> >>
> >> >> Thanks for reporting the issue. Could you describe your test
> scenario (the operations), the scale, the db file size and the memory (RSS)
> data of the NB/SB?
> >> >> Clustered mode maintains some extra data, such as RAFT logs, compared
> >> >> to standalone mode, but the memory should not increase forever,
> >> >> because the RAFT logs get compacted periodically.
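> >> >>
> >> >> If you want to check whether compaction is actually happening, you can
> >> >> watch the on-disk file size and the cluster state, e.g. (the file and
> >> >> socket locations below are the usual defaults and may differ on your
> >> >> hosts):
> >> >>
> >> >>     ls -lh /etc/openvswitch/ovnnb_db.db
> >> >>     ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl \
> >> >>         cluster/status OVN_Northbound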
> >> >>
> >> >> Thanks,
> >> >> Han
> >> >
> >> >
> >> >
> >> > --
> >> > 刘梦馨
> >> > Blog: http://oilbeater.com
> >> > Weibo: @oilbeater
> >
> >
> >
> > --
> > 刘梦馨
> > Blog: http://oilbeater.com
> > Weibo: @oilbeater
>


-- 
刘梦馨
Blog: http://oilbeater.com
Weibo: @oilbeater <http://weibo.com/oilbeater>