[ovs-dev] [PATCH 1/1] [branch-1.4] [ofproto-dpif] Memory leak at specific PACKET_INs

Ben Pfaff blp at nicira.com
Tue Apr 23 00:13:28 UTC 2013


I think I see the problem.  It is subtle.  I'll write up a fix
tomorrow morning.

On Mon, Apr 22, 2013 at 08:45:52PM +0100, Zoltan Kiss wrote:
> I found one thing which might be related: these flow_del's are
> called by facet_remove, and before them there is usually another
> flow_del called by facet_unexpected, which looks almost the same,
> except that the hardware addresses look like:
> sha=01:0a:00:4e:00:00,tha=00:00:ff:ff:ff:ff
> But the captures never show such ARP packets. Is it possible that
> these fields got corrupted when they were installed to the datapath?
> This source/target hardware address combination is often found by
> facet_unexpected, but never appears in the captures.
> 
> Regards,
> 
> Zoli
> 
> On 22/04/13 19:13, Zoltan Kiss wrote:
> >Hi,
> >
> >On 16/04/13 23:53, Zoltan Kiss wrote:
> >>On 15/04/13 18:15, Ben Pfaff wrote:
> >>>On Mon, Apr 15, 2013 at 03:59:52PM +0100, Zoltan Kiss wrote:
> >>>>When the packet is sent to the controller due to a userspace rule
> >>>>(and not a kernel-space flow), execute_controller_action is invoked
> >>>>with clone=true, so handle_flow_miss retains ownership of the packet
> >>>>buffer. But if it returns true (which means the packet had only a
> >>>>PACKET_IN action), nothing frees the buffer.
> >>>
> >>>I think you're right.  But in that case, wouldn't it solve the problem
> >>>in a better way (doing less memory allocation and copying) by passing
> >>>clone=false, instead of passing clone=true and then freeing the packet
> >>>in the caller?
> >>
> >>It sounds reasonable, and I was thinking about that, but I was worried
> >>about the side effects. Now I've tried it, and it does indeed seem to
> >>cause problems. Broadcast ARP packets trigger these warnings:
> >>
> >>dpif|WARN|Dropped 26 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
> >>dpif|WARN|system@xenbr0: failed to flow_del (No such file or directory) in_port(1),eth(src=ab:cd:ef:12:34:56,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.0.0.1,tip=10.0.0.2,op=1,sha=ab:cd:ef:12:34:56,tha=00:00:00:00:00:00)
> >>
> >>
> >>
> >>These messages appear continuously if I install that joker rule in
> >>userspace, and they do not appear with the original clone=true version.
> >>I haven't found out yet why this happens; it shouldn't really change
> >>anything but the time when the packet is freed.
> >
> >I've tried to find out why these warnings come with clone=false, but I
> >haven't succeeded yet. I've checked this code path:
> >
> >handle_flow_miss
> >  execute_controller_action
> >   send_packet_in_action
> >    connmgr_send_packet_in (this is where clone makes a difference, as we pass rw_packet)
> >     schedule_packet_in
> >      ofputil_encode_packet_in (this is where we either dupe the buffer or use the original)
> >      pinsched_send (in my tests no pinscheduler was involved)
> >       do_send_packet_in (after this the code won't know about the content of the packet)
> >
> >But I couldn't find any place where it matters whether we clone and
> >pass a copy (and free the original immediately after
> >execute_controller_action), or just give away the original buffer. And
> >frankly, I'm running out of ideas about where else to check. Does anything
> >touch that buffer in parallel? Any ideas?
> >
> >Regards,
> >
> >Zoli
> 
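
The ownership question in the quoted thread (clone=true leaves the buffer
with the caller, clone=false hands it over) can be illustrated with a
minimal, self-contained sketch. This is not the ofproto-dpif code: struct
pkt, pkt_new, pkt_clone, pkt_free, send_to_controller, the handle_miss_*
variants, and run_variant are all hypothetical stand-ins, and live_pkts is
just a counter added so the leak is observable.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer (not the real struct ofpbuf). */
struct pkt {
    unsigned char *data;
    size_t size;
};

static int live_pkts;   /* Buffers allocated but not yet freed. */

static struct pkt *pkt_new(const void *data, size_t size)
{
    struct pkt *p = malloc(sizeof *p);
    if (!p) {
        abort();
    }
    p->data = malloc(size ? size : 1);
    if (!p->data) {
        abort();
    }
    memcpy(p->data, data, size);
    p->size = size;
    live_pkts++;
    return p;
}

static struct pkt *pkt_clone(const struct pkt *p)
{
    return pkt_new(p->data, p->size);
}

static void pkt_free(struct pkt *p)
{
    free(p->data);
    free(p);
    live_pkts--;
}

/* Hypothetical consumer.  With clone=true it works on a private copy and
 * the caller keeps ownership of 'p'; with clone=false it takes ownership
 * of 'p' and frees it itself. */
static void send_to_controller(struct pkt *p, bool clone)
{
    struct pkt *owned = clone ? pkt_clone(p) : p;
    /* ... encode and queue the packet-in here ... */
    pkt_free(owned);
}

/* The reported pattern: clone=true, but the caller never frees 'p'. */
static void handle_miss_leaky(struct pkt *p)
{
    send_to_controller(p, true);
}

/* First option from the thread: keep clone=true, free in the caller. */
static void handle_miss_free_in_caller(struct pkt *p)
{
    send_to_controller(p, true);
    pkt_free(p);
}

/* Second option from the thread: clone=false, ownership is transferred. */
static void handle_miss_transfer(struct pkt *p)
{
    send_to_controller(p, false);
}

/* Runs one variant on a fresh buffer and reports how many buffers are
 * still allocated afterwards (0 means nothing leaked). */
static int run_variant(void (*handler)(struct pkt *))
{
    live_pkts = 0;
    handler(pkt_new("arp", 3));
    return live_pkts;
}
```

In this toy model, run_variant(handle_miss_leaky) leaves one buffer
allocated, while the other two variants leave none. It says nothing about
why clone=false triggers the flow_del warnings in the real code path.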


