[ovs-discuss] [OVN] Cluster mode ovsdb memory keeps increasing

Han Zhou hzhou at ovn.org
Mon Dec 16 06:53:04 UTC 2019


Hmm... I am not sure if it is normal. In our environment, with an even
larger scale (in terms of number of ports), the memory is usually less than
1 GB for each ovsdb-server process, and we didn't see any symptoms of a
memory leak (after months of clustered-mode deployment in a live
environment).

Could you check which DB (NB or SB) is the major memory consumer? If it is
the SB DB, it is normal to see a memory spike the first time you switch from
standalone mode to clustered mode if you have a large number of compute
nodes connected to the SB DB. After it stabilizes, the memory footprint
should decrease. Of course, it is possible that you have more complex
scenarios that trigger more memory consumption (or a memory leak). Could you
try a server with more memory (to avoid the OOM killer), to see whether it
stabilizes at some point or just keeps increasing day after day?
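
For example, something along these lines should show which process is the
one growing and what ovsdb-server itself accounts for (the control socket
paths below assume the default run directory used by ovn-ctl, so adjust
them for your installation):

# RSS of each ovsdb-server process, together with its command line
ps -eo pid,rss,args | grep '[o]vsdb-server'

# ovsdb-server's own memory accounting for the NB and SB databases
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl memory/show
ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl memory/show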

Besides, I believe Ben has better troubleshooting steps for memory issues
in ovsdb-server. Ben, could you suggest some?

Thanks,
Han

On Sun, Dec 15, 2019 at 9:56 PM 刘梦馨 <liumengxinfly at gmail.com> wrote:
>
> After more iterations (6 in my environment) the RSS usage stabilized at
> 759128 KB.
>
> This is a really simplified test; in our real environment we run about
> 3000 containers with lots of other operations, like setting routes,
> load balancers, all the ovn-sb operations, etc. The memory consumption can
> quickly go up to 6 GB (NB and SB together) and lead to a system OOM. Is
> that a reasonable resource consumption in your experience? I don't
> remember the actual numbers for standalone DB resource consumption, but in
> the same environment it didn't lead to an OOM.
>
> Han Zhou <hzhou at ovn.org> wrote on Mon, Dec 16, 2019 at 1:05 PM:
>>
>> Thanks for the details. I tried the same command with a for loop.
>>
>> After the first 4 iterations, the RSS of the first NB server increased
>> to 572888 KB. After that, it stayed the same for the next 3 iterations. So
>> it seems to just build up memory buffers and then stay at that level
>> without further increase, which doesn't look like a memory leak. Could you
>> try more iterations and see if it still increases continuously?
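>>
>> (A sketch of the kind of loop I mean, with test.sh being your script
>> below; the pidfile path depends on how ovn-ctl started the NB DB:)
>>
>> for i in {1..10}; do
>>   ./test.sh
>>   ps -o rss= -p "$(cat /var/run/openvswitch/ovnnb_db.pid)"
>> done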
>>
>> Thanks,
>> Han
>>
>> On Sun, Dec 15, 2019 at 7:54 PM 刘梦馨 <liumengxinfly at gmail.com> wrote:
>> >
>> > Hi, Han
>> >
>> > In my test scenario, I use ovn-ctl to start a one-node OVN with
>> > clustered-mode DBs and no chassis bound to ovn-sb, to check just the
>> > memory usage of ovn-nb.
>> > Then I use a script to add a logical switch, add 1000 ports, set
>> > dynamic addresses, and then delete the logical switch:
>> >
>> > #!/bin/bash
>> > ovn-nbctl ls-add ls1
>> > for i in {1..1000}; do
>> >   ovn-nbctl lsp-add ls1 ls1-vm$i
>> >   ovn-nbctl lsp-set-addresses ls1-vm$i dynamic
>> > done
>> > ovn-nbctl ls-del ls1
>> >
>> > I run this script repeatedly and watch the memory change.
>> >
>> > After 5 runs (5000 lsp adds and deletes), the RSS of the NB server
>> > increased to 667 MB.
>> > The NB DB file grew to 119 MB and was not automatically compacted. After
>> > a manual compaction the DB file size went back to 11 KB, but the memory
>> > usage didn't change.
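>> >
>> > (For reference, the manual compaction was done with something like the
>> > following; the control socket path depends on how ovn-ctl started the
>> > NB DB:)
>> >
>> > ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/compact OVN_Northbound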
>> >
>> >
>> >
>> > Han Zhou <hzhou at ovn.org> wrote on Sat, Dec 14, 2019 at 3:40 AM:
>> >>
>> >>
>> >>
>> >> On Wed, Dec 11, 2019 at 12:51 AM 刘梦馨 <liumengxinfly at gmail.com> wrote:
>> >> >
>> >> >
>> >> > We are using ovs/ovn 2.12.0 to implement our container network.
>> >> > After switching from standalone ovsdb to clustered-mode ovsdb, we
>> >> > noticed that the memory consumption of both ovnnb and ovnsb keeps
>> >> > increasing after each operation and never decreases.
>> >> >
>> >> > We did some profiling with valgrind. The leak check reported a
>> >> > 16-byte leak in fork_and_wait_for_startup, which obviously is not the
>> >> > main cause. Later we used massif to profile the memory consumption and
>> >> > put the result in the attachment.
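>> >> >
>> >> > (The profile below was collected with massif in page-level mode,
>> >> > roughly like this; the exact ovsdb-server options are omitted:)
>> >> >
>> >> > valgrind --tool=massif --pages-as-heap=yes ovsdb-server <usual NB DB options>
>> >> > ms_print massif.out.<pid>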
>> >> >
>> >> > Most of the memory comes from two parts: ovsthread_wrapper
>> >> > (ovs-thread.c:378), which allocates a subprogram_name, and
>> >> > jsonrpc_send (jsonrpc.c:253), as shown below (I skipped the duplicated
>> >> > jsonrpc stacks).
>> >> >
>> >> > However, I found that both parts have a corresponding free operation
>> >> > nearby, so I don't know how to explore this memory issue further. I'm
>> >> > not aware of the relevant differences between clustered mode and
>> >> > standalone mode here.
>> >> >
>> >> > Can anyone give some advice or hints? Thanks!
>> >> >
>> >> > 100.00% (357,920,768B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
>> >> > ->78.52% (281,038,848B) 0x66FDD49: mmap (in /usr/lib64/libc-2.17.so)
>> >> > | ->37.50% (134,217,728B) 0x66841EF: new_heap (in /usr/lib64/libc-2.17.so)
>> >> > | | ->37.50% (134,217,728B) 0x6684C22: arena_get2.isra.3 (in /usr/lib64/libc-2.17.so)
>> >> > | |   ->37.50% (134,217,728B) 0x668AACC: malloc (in /usr/lib64/libc-2.17.so)
>> >> > | |     ->37.50% (134,217,728B) 0x4FDC613: xmalloc (util.c:138)
>> >> > | |       ->37.50% (134,217,728B) 0x4FDC78E: xvasprintf (util.c:202)
>> >> > | |         ->37.50% (134,217,728B) 0x4FDC877: xasprintf (util.c:343)
>> >> > | |           ->37.50% (134,217,728B) 0x4FA548D: ovsthread_wrapper (ovs-thread.c:378)
>> >> > | |             ->37.50% (134,217,728B) 0x5BE5E63: start_thread (in /usr/lib64/libpthread-2.17.so)
>> >> > | |               ->37.50% (134,217,728B) 0x670388B: clone (in /usr/lib64/libc-2.17.so)
>> >> > | |
>> >> > | ->36.33% (130,023,424B) 0x6686DF3: sysmalloc (in /usr/lib64/libc-2.17.so)
>> >> > | | ->36.33% (130,023,424B) 0x6687CA8: _int_malloc (in /usr/lib64/libc-2.17.so)
>> >> > | |   ->28.42% (101,711,872B) 0x66890C0: _int_realloc (in /usr/lib64/libc-2.17.so)
>> >> > | |   | ->28.42% (101,711,872B) 0x668B160: realloc (in /usr/lib64/libc-2.17.so)
>> >> > | |   |   ->28.42% (101,711,872B) 0x4FDC9A3: xrealloc (util.c:149)
>> >> > | |   |     ->28.42% (101,711,872B) 0x4F1DEB2: ds_reserve (dynamic-string.c:63)
>> >> > | |   |       ->28.42% (101,711,872B) 0x4F1DED3: ds_put_uninit (dynamic-string.c:73)
>> >> > | |   |         ->28.42% (101,711,872B) 0x4F1DF0B: ds_put_char__ (dynamic-string.c:82)
>> >> > | |   |           ->26.37% (94,371,840B) 0x4F2B09F: json_serialize_string (dynamic-string.h:93)
>> >> > | |   |           | ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
>> >> > | |   |           | | ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
>> >> > | |   |           | |   ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
>> >> > | |   |           | |     ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
>> >> > | |   |           | |       ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
>> >> > | |   |           | |         ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
>> >> > | |   |           | |           ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
>> >> > | |   |           | |             ->12.01% (42,991,616B) 0x4F2B3EA: json_serialize (json.c:1651)
>> >> > | |   |           | |               ->12.01% (42,991,616B) 0x4F2B540: json_serialize (json.c:1626)
>> >> > | |   |           | |                 ->12.01% (42,991,616B) 0x4F2D82A: json_to_ds (json.c:1525)
>> >> > | |   |           | |                   ->12.01% (42,991,616B) 0x4F2EA49: jsonrpc_send (jsonrpc.c:253)
>> >> > | |   |           | |                     ->12.01% (42,991,616B) 0x4C3A68A: ovsdb_jsonrpc_server_run (jsonrpc-server.c:1104)
>> >> > | |   |           | |                       ->12.01% (42,991,616B) 0x10DCC1: main (ovsdb-server.c:209)
>> >> >
>> >>
>> >> Thanks for reporting the issue. Could you describe your test scenario
>> >> (the operations), the scale, the DB file sizes, and the memory (RSS)
>> >> data of the NB/SB?
>> >> Clustered mode maintains some extra data, such as the RAFT log, compared
>> >> to standalone mode, but it should not grow forever, because the RAFT log
>> >> gets compacted periodically.
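>> >>
>> >> (One way to confirm the compaction is happening is to watch the "Log"
>> >> range reported by cluster/status; its lower bound moves forward each
>> >> time a snapshot is taken. The control socket paths here assume the
>> >> default ovn-ctl run directory:)
>> >>
>> >> ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound
>> >> ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound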
>> >>
>> >> Thanks,
>> >> Han
>> >
>> >
>> >
>> > --
>> > 刘梦馨
>> > Blog: http://oilbeater.com
>> > Weibo: @oilbeater
>
>
>
> --
> 刘梦馨
> Blog: http://oilbeater.com
> Weibo: @oilbeater