[ovs-discuss] ovs-vswitchd process huge memory consumption

Oleg Bondarev obondarev at mirantis.com
Tue Mar 12 13:27:04 UTC 2019


Hi Ben,

So I decided to try to find the leak with the help of AddressSanitizer's
leak detector (LeakSanitizer), as you recommended.
Would the following approach make sense?

 - build OVS from source with CFLAGS="-g -O2 -fsanitize=leak
   -fno-omit-frame-pointer -fno-common"
 - service openvswitch-switch stop
 - replace the binary /usr/lib/openvswitch-switch/ovs-vswitchd with the
   instrumented one
 - service openvswitch-switch start
 - load the cluster to trigger the possible memory leaks
 - look for the leak sanitizer report in syslog (a rough sketch of the whole
   procedure is below)

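Roughly what I have in mind (just a sketch - the configure invocation,
binary location and sanitizer option are assumptions based on a standard
source build and the path above; ./boot.sh is needed first when building
from git):

  ./configure CFLAGS="-g -O2 -fsanitize=leak -fno-omit-frame-pointer -fno-common"
  make -j$(nproc)
  service openvswitch-switch stop
  cp vswitchd/ovs-vswitchd /usr/lib/openvswitch-switch/ovs-vswitchd
  # LSAN_OPTIONS=log_path=/var/log/ovs-lsan in the daemon's environment
  # would redirect the leak report to a file instead of stderr
  service openvswitch-switch start

As far as I understand, the standalone leak checker only prints its report
when the process exits, so after loading the cluster I'd stop ovs-vswitchd
cleanly before looking for the report.
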
Thanks,
Oleg

On Thu, Mar 7, 2019 at 1:33 PM Oleg Bondarev <obondarev at mirantis.com> wrote:

> Hi Ben,
>
> attaching the full dump-heap script output for an 11G core dump; perhaps
> it can bring some more clarity.
>
> Thanks,
> Oleg
>
> On Thu, Mar 7, 2019 at 11:54 AM Oleg Bondarev <obondarev at mirantis.com>
> wrote:
>
>>
>>
>> On Wed, Mar 6, 2019 at 7:01 PM Oleg Bondarev <obondarev at mirantis.com>
>> wrote:
>>
>>>
>>> I'm wondering whether this could be malloc() not returning memory to the
>>> system after peak loads:
>>>
>>> "Occasionally, free can actually return memory to the operating system
>>> and make the process smaller. Usually, all it can do is allow a later call
>>> to malloc to reuse the space. In the meantime, the space remains in your
>>> program as part of a free-list used internally by malloc." [1]
>>>
>>> Does that sound sane? If so, what would be the best way to check it?
>>>
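>>> One check I could try (a rough sketch, not verified on this setup): ask
>>> glibc to release its free lists and see whether RSS drops, e.g. by
>>> calling malloc_trim() in the running ovs-vswitchd from gdb:
>>>
>>>     gdb -p $(pidof ovs-vswitchd) -batch \
>>>         -ex 'call (int) malloc_trim(0)' \
>>>         -ex 'detach'
>>>
>>> If RSS stays high after that, the memory is presumably still referenced
>>> (or leaked) rather than just sitting on malloc's free lists.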
>>
>> It seems that's not the case. On one of the nodes, memory usage by
>> ovs-vswitchd grew from 84G to 87G over the past week, and it grows
>> gradually on the other nodes as well.
>>
>>
>>>
>>> [1] http://www.gnu.org/software/libc/manual/pdf/libc.pdf
>>>
>>> Thanks,
>>> Oleg
>>>
>>> On Wed, Mar 6, 2019 at 12:34 PM Oleg Bondarev <obondarev at mirantis.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Mar 6, 2019 at 1:08 AM Ben Pfaff <blp at ovn.org> wrote:
>>>>
>>>>> Starting from 0x30, this looks like a "minimatch" data structure, which
>>>>> is a kind of compressed bitwise match against a flow.
>>>>>
>>>>> 00000030: 0000 0000 0000 4014 0000 0000 0000 0000
>>>>> 00000040: 0000 0000 0000 0000 fa16 3e2b c5d5 0000 0000 0022 0000 0000
>>>>>
>>>>> 00000058: 0000 0000 0000 4014 0000 0000 0000 0000
>>>>> 00000068: 0000 0000 ffff ffff ffff ffff ffff 0000 0000 0fff 0000 0000
>>>>>
>>>>> I think this corresponds to a flow of this form:
>>>>>
>>>>>
>>>>> pkt_mark=0xc5d5/0xffff,skb_priority=0x3e2bfa16,reg13=0,mpls_label=2,mpls_tc=1,mpls_ttl=0,mpls_bos=0
>>>>>
>>>>> Is that at all meaningful?  Does it match anything that appears in the
>>>>> OpenFlow flow table?
>>>>>
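>>>>> A quick way to check might be something along these lines (just a
>>>>> sketch; the grep pattern is only a rough filter):
>>>>>
>>>>>     for br in $(ovs-vsctl list-br); do
>>>>>         ovs-ofctl dump-flows "$br" | grep -E 'pkt_mark|skb_priority|mpls'
>>>>>     done
>>>>>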
>>>>
>>>> Not sure. Actually, fa:16:3e:2b:c5:d5 is the MAC address of a Neutron
>>>> port (this is an OpenStack cluster) - a VM port. fa:16:3e/fa:16:3f are
>>>> the standard Neutron MAC prefixes. That makes me think these might be
>>>> actual Ethernet packets (broadcasts?) that are somehow stuck in memory.
>>>> I didn't find anything similar in the flow tables. I'm attaching the
>>>> flows of all 5 OVS bridges on the node.
>>>>
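>>>> (If it helps to pin down which port owns that MAC, something like this
>>>> should work - a sketch assuming the usual external_ids that Neutron's
>>>> OVS agent sets on its ports:
>>>>
>>>>     ovs-vsctl --columns=name,external_ids find Interface \
>>>>         external_ids:attached-mac='"fa:16:3e:2b:c5:d5"'
>>>> )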
>>>>
>>>>>
>>>>> Are you using the kernel or DPDK datapath?
>>>>>
>>>>
>>>> It's the kernel datapath, no DPDK. Ubuntu with a 4.13.0-45 kernel.
>>>>
>>>>
>>>>>
>>>>> On Tue, Mar 05, 2019 at 08:42:14PM +0400, Oleg Bondarev wrote:
>>>>> > Hi,
>>>>> >
>>>>> > thanks for your help!
>>>>> >
>>>>> > On Tue, Mar 5, 2019 at 7:26 PM Ben Pfaff <blp at ovn.org> wrote:
>>>>> >
>>>>> > > You're talking about the email where you dumped out a repeating
>>>>> > > sequence from some blocks?  That might be the root of the problem,
>>>>> > > if you can provide some more context.  I didn't see from the message
>>>>> > > where you found the sequence (was it just at the beginning of each
>>>>> > > of the 4 MB blocks you reported separately, or somewhere else), how
>>>>> > > many copies of it, or if you were able to figure out how long each
>>>>> > > of the blocks was.  If you can provide that information I might be
>>>>> > > able to learn some things.
>>>>> > >
>>>>> >
>>>>> > Yes, those were the beginnings of the 0x4000000-size blocks reported
>>>>> > by the script.  I also checked the 0x8000000 blocks reported and the
>>>>> > content is the same.  Examples of how those blocks end:
>>>>> >  - https://pastebin.com/D9M6T2BA
>>>>> >  - https://pastebin.com/gNT7XEGn
>>>>> >  - https://pastebin.com/fqy4XDbN
>>>>> >
>>>>> > So basically the contents of the blocks are sequences of:
>>>>> >
>>>>> > 00000020: 0000 0000 0000 0000 6500 0000 0000 0000  ........e.......
>>>>> > 00000030: 0000 0000 0000 4014 0000 0000 0000 0000  ......@.........
>>>>> > 00000040: 0000 0000 0000 0000 fa16 3e2b c5d5 0000  ..........>+....
>>>>> > 00000050: 0000 0022 0000 0000 0000 0000 0000 4014  ..."..........@.
>>>>> > 00000060: 0000 0000 0000 0000 0000 0000 ffff ffff  ................
>>>>> > 00000070: ffff ffff ffff 0000 0000 0fff 0000 0000  ................
>>>>> >
>>>>> > following each other, sometimes separated by sequences like this:
>>>>> >
>>>>> > 00001040: 6861 6e64 6c65 7232 3537 0000 0000 0000  handler257......
>>>>> >
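>>>>> > (In case anyone wants to poke at the blocks themselves: one way to
>>>>> > pull one out of the core for inspection is roughly
>>>>> >
>>>>> >   gdb ./ovs-vswitchd core -batch \
>>>>> >       -ex 'dump binary memory /tmp/block.bin <start> <start>+0x4000000'
>>>>> >   xxd /tmp/block.bin | less
>>>>> >
>>>>> > where <start> is a block address from the script output; the path and
>>>>> > addresses are just placeholders.)
>>>>> >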
>>>>> > I ran the scripts against core dumps from several compute nodes with
>>>>> > the issue, and the picture is pretty much the same: 0x4000000 blocks
>>>>> > and fewer 0x8000000 blocks.  I also checked a core dump from a compute
>>>>> > node where OVS memory consumption was OK: no such block sizes were
>>>>> > reported.
>>>>> >
>>>>> >
>>>>> > >
>>>>> > > On Tue, Mar 05, 2019 at 09:07:55AM +0400, Oleg Bondarev wrote:
>>>>> > > > Hi Ben,
>>>>> > > >
>>>>> > > > I haven't had a chance to debug the scripts yet, but just in case
>>>>> > > > you missed my last email with the examples of repeating blocks and
>>>>> > > > sequences - do you think we still need to analyze further?  Will
>>>>> > > > the scripts tell us more about the heap?
>>>>> > > >
>>>>> > > > Thanks,
>>>>> > > > Oleg
>>>>> > > >
>>>>> > > > On Thu, Feb 28, 2019 at 10:14 PM Ben Pfaff <blp at ovn.org> wrote:
>>>>> > > >
>>>>> > > > > On Tue, Feb 26, 2019 at 01:41:45PM +0400, Oleg Bondarev wrote:
>>>>> > > > > > Hi,
>>>>> > > > > >
>>>>> > > > > > thanks for the scripts, so here's the output for a 24G core
>>>>> > > > > > dump: https://pastebin.com/hWa3R9Fx
>>>>> > > > > > There are 271 entries of 4 MB - does that seem like something
>>>>> > > > > > we should take a closer look at?
>>>>> > > > >
>>>>> > > > > I think that this output really just indicates that the script
>>>>> > > > > failed.  It analyzed a lot of regions but didn't output anything
>>>>> > > > > useful.  If it had worked properly, it would have told us a lot
>>>>> > > > > about data blocks that had been allocated and freed.
>>>>> > > > >
>>>>> > > > > The next step would have to be to debug the script.  It
>>>>> > > > > definitely worked for me before, because I have fixed at least
>>>>> > > > > 3 or 4 bugs based on it, but it also definitely is a quick hack
>>>>> > > > > and not something that I can stand behind.  I'm not sure how to
>>>>> > > > > debug it at a distance.  It has a large comment that describes
>>>>> > > > > what it's trying to do.  Maybe that would help you, if you want
>>>>> > > > > to try to debug it yourself.  I guess it's also possible that
>>>>> > > > > glibc has changed its malloc implementation; if so, then it
>>>>> > > > > would probably be necessary to start over and build a new
>>>>> > > > > script.
>>>>> > > > >
>>>>> > >
>>>>>
>>>>