[ovs-discuss] ovs-vswitchd process huge memory consumption

Oleg Bondarev obondarev at mirantis.com
Thu Feb 28 13:18:48 UTC 2019


Hi Ben,

so here are some examples of what those repeated blocks look like:
https://pastebin.com/JBUaeX44
https://pastebin.com/wKreHDJf
https://pastebin.com/f41knqgn

All those blocks are mostly filled with sequences like this one:
"0000 0000 0000 0000 fa16 3e39 83c4 0000  ..........>9....
 0000 0030 0000 0000 0000 0000 0000 4014  ...0..........@.
 0000 0000 0000 0000 0000 0000 ffff ffff  ................
 ffff ffff ffff 0000 0000 0fff 0000 0000  ................
 0000 0000 0000 0000 6500 0000 0000 0000  ........e.......
 0000 0000 0000 4014 0000 0000 0000 0000  ......@........."

where "fa 16 3e xx xx xx" are openstack neutron ports mac addresses
(actually VMs MACs as confirmed on the env with issue).
Can we figure something out of this?
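
To put a rough number on it, something like the sketch below could be run
over the raw core file to count occurrences of that fa:16:3e OUI (this is
just a hypothetical helper I'd hack up, not one of Ben's scripts; the 1 MiB
read size is arbitrary):

    import sys

    # Count occurrences of the fa:16:3e prefix (the OUI Neutron uses for VM
    # port MACs) in a raw core dump, reading in 1 MiB chunks and keeping a
    # 2-byte tail so matches across chunk boundaries are not missed.
    PREFIX = bytes.fromhex("fa163e")
    count = 0
    tail = b""
    with open(sys.argv[1], "rb") as f:
        while True:
            block = f.read(1 << 20)
            if not block:
                break
            data = tail + block
            count += data.count(PREFIX)
            tail = data[-2:]
    print(f"{count} occurrences of fa:16:3e")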

Thanks,
Oleg

On Wed, Feb 27, 2019 at 8:41 AM Oleg Bondarev <obondarev at mirantis.com>
wrote:

>
>
> On Tue, Feb 26, 2019 at 1:41 PM Oleg Bondarev <obondarev at mirantis.com>
> wrote:
>
>> Hi,
>>
>> thanks for the scripts, so here's the output for a 24G core dump:
>> https://pastebin.com/hWa3R9Fx
>> there are 271 entries of 4 MB - does that seem like something we should
>> take a closer look at?
>>
>
> not 4 MB but ~67 MB each, of course.
>
>
>>
>> Thanks,
>> Oleg
>>
>> On Tue, Feb 26, 2019 at 3:26 AM Ben Pfaff <blp at ovn.org> wrote:
>>
>>> Some combinations of kernel bonding with Open vSwitch don't necessarily
>>> work that well.  I have forgotten which ones are problematic or why.
>>> However, the problems are functional ones (the bonds don't work well),
>>> not operational ones like memory leaks.  A memory leak would be a bug
>>> whether or not the configuration used kernel bonds in a way that we
>>> discourage.
>>>
>>> With small OpenFlow flow tables (maybe a few thousand flows) and a small
>>> number of ports (maybe up to 100 or so), using the Linux kernel module
>>> (not DPDK), I'd be surprised to see OVS use more than 100 MB.  I'd also
>>> be surprised to see OVS memory usage grow past, say, the first day of
>>> use.
>>>
>>> With large numbers of OpenFlow flows (e.g. hundreds of thousands), my
>>> rule of thumb is that OVS tends to use about 1 to 2 kB RAM per flow.
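
Just to translate that rule of thumb into numbers (the flow counts below
are made up for illustration, not numbers from our environment):

    # Ben's rule of thumb: roughly 1-2 kB of RAM per OpenFlow flow.
    # Illustrative flow counts only.
    for flows in (10_000, 100_000, 500_000):
        low, high = flows * 1 / 1024, flows * 2 / 1024
        print(f"{flows:>7} flows: ~{low:.0f}-{high:.0f} MB")
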
>>>
>>> A dump of a 500 MB process could still be enlightening.  Usually, when
>>> there is a memory leak, one sees an extraordinary number of unfreed
>>> memory blocks of a single size, and by looking at their size and their
>>> contents one can often figure out what allocated them, and that usually
>>> leads one to figure out why they did not get freed.
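
In case it helps anyone reproduce that kind of analysis without the
attached scripts, a very crude approximation is to slide a fixed-size
window over the raw dump and count identical windows; leaked blocks of a
single size tend to show up as huge repeat counts. A rough sketch (the
64-byte window, the hashing, and skipping all-zero windows are my own
arbitrary choices, and on a dump this large the counter itself needs a fair
amount of RAM):

    import collections
    import hashlib
    import sys

    # Hash every 64-byte window of the dump and count repeats; print the
    # most common non-zero windows.  A leak usually stands out as an
    # extraordinary number of identical (or near-identical) blocks.
    WINDOW = 64
    counts = collections.Counter()
    samples = {}
    with open(sys.argv[1], "rb") as f:
        while True:
            chunk = f.read(WINDOW)
            if len(chunk) < WINDOW:
                break
            if chunk.count(0) == WINDOW:
                continue  # skip all-zero windows
            key = hashlib.blake2b(chunk, digest_size=8).digest()
            counts[key] += 1
            samples.setdefault(key, chunk)
    for key, n in counts.most_common(5):
        print(n, samples[key].hex())
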
>>>
>>> On Mon, Feb 25, 2019 at 09:46:28PM +0000, Fernando Casas Schössow wrote:
>>> >
>>> > I read in a few places that mixing OS networking features (like
>>> > bonding) and OVS is not a good idea and that the recommendation is to
>>> > do everything at the OVS level.  That's why I assumed the
>>> > configuration was not OK (even though it worked correctly for around
>>> > two years, albeit with the high memory usage I detected).
>>> >
>>> > How many MB of RAM would you consider normal in a small setup like
>>> > this one?  Just to give myself an idea.
>>> >
>>> > I just finished a maintenance window on this server that required a
>>> > reboot.
>>> > Right after the reboot ovs-vswitchd is using 14 MB of RAM.
>>> > I will keep monitoring the process memory usage and report back
>>> > after two weeks or so.
>>> >
>>> > Would it make sense to get a process dump for analysis even if memory
>>> > usage is not going as high (several GBs) as before the config change?
>>> > In other words, if I find that the process memory usage grows to
>>> > around 500 MB but then becomes steady and is not growing anymore,
>>> > would it make sense to collect a dump for analysis?
>>> >
>>> > On Mon, Feb 25, 2019 at 5:48 PM, Ben Pfaff <blp at ovn.org> wrote:
>>> >
>>> > Both configurations should work, so probably you did find a bug
>>> > causing a memory leak in the former configuration.  464 MB actually
>>> > sounds like a lot also.
>>> >
>>> > On Sun, Feb 24, 2019 at 02:58:02PM +0000, Fernando Casas Schössow wrote:
>>> >
>>> > Hi Ben,
>>> >
>>> > In my case I think I found the cause of the issue, and it was indeed a
>>> > misconfiguration on my side.  Yet I'm not really sure why the
>>> > misconfiguration was causing the high memory usage in OVS.  The server
>>> > has 4 NICs, bonded in two bonds of two.  The problem, I think, was
>>> > that the bonding was done at the OS level (Linux kernel bonding)
>>> > instead of at the OVS level, so there were two interfaces at the OS
>>> > level (bond0 and bond1) with bond0 added to OVS as an uplink port.  I
>>> > changed that configuration, removed all the bonding at the OS level
>>> > and instead created the bonds at the OVS level.  Then I restarted the
>>> > service so I could monitor memory usage.  After this change, memory
>>> > usage grew from 10 MB (at service start) to 464 MB after a few hours
>>> > and then stayed at that level until today (a week later).  I'm still
>>> > monitoring the process memory usage, but as I said it has been steady
>>> > for almost a week, so I will keep monitoring it for a couple more
>>> > weeks just in case and report back.
>>> >
>>> > Thanks.
>>> >
>>> > Kind regards,
>>> > Fernando
>>> >
>>> > On Sat, Feb 23, 2019 at 12:23 AM, Ben Pfaff <blp at ovn.org> wrote:
>>> >
>>> > It's odd that two people would notice the same problem at the same
>>> > time on old branches.  Anyway, I'm attaching the scripts I have.  They
>>> > are rough.  The second one invokes the first one as a subprocess; it
>>> > is probably the one you should use.  I might have to walk you through
>>> > how to use it, or write better documentation myself.  Anyway, it
>>> > should be a start.
>>> >
>>> > On Wed, Feb 20, 2019 at 07:15:26PM +0400, Oleg Bondarev wrote:
>>> >
>>> > Ah, sorry, I missed the "ovs-vswitchd memory consumption behavior"
>>> > thread.  So I guess I'm also interested in the scripts for analyzing
>>> > the heap in a core dump :)
>>> >
>>> > Thanks,
>>> > Oleg
>>> >
>>> > On Wed, Feb 20, 2019 at 7:00 PM Oleg Bondarev <obondarev at mirantis.com>
>>> > wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > OVS 2.8.0, uptime 197 days, 44G RAM.
>>> > > ovs-appctl memory/show reports:
>>> > > "handlers:35 ofconns:4 ports:73 revalidators:13 rules:1099 udpif
>>> > > keys:686"
>>> > >
>>> > > Similar data on other nodes of the OpenStack cluster.
>>> > > Seems usage grows gradually over time.
>>> > > Are there any known issues, like
>>> > > http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-14970?
>>> > > Please advise on the best way to debug.
>>> > >
>>> > > Thanks,
>>> > > Oleg
>>> >
>>> > _______________________________________________
>>> > discuss mailing list
>>> > discuss at openvswitch.org
>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>> >
>>>
>>