[ovs-dev] [PATCH 1/1] [branch-1.4] [ofproto-dpif] Memory leak at specific PACKET_INs

Ben Pfaff blp at nicira.com
Tue Apr 23 17:17:51 UTC 2013


I posted a fix.  Please review it:
        http://openvswitch.org/pipermail/dev/2013-April/026864.html

On Mon, Apr 22, 2013 at 05:13:28PM -0700, Ben Pfaff wrote:
> I think I see the problem.  It is subtle.  I'll write up a fix
> tomorrow morning.
> 
> On Mon, Apr 22, 2013 at 08:45:52PM +0100, Zoltan Kiss wrote:
> > I found one thing which might be related: these flow_del's are
> > called by facet_remove, and before them there is usually another
> > flow_del called by facet_unexpected, which looks almost the same,
> > except the hardware addresses look like:
> > sha=01:0a:00:4e:00:00,tha=00:00:ff:ff:ff:ff
> > But the captures never show such ARP packets. Is it possible that
> > these fields got corrupted when they were installed to the datapath?
> > This src-target hw address combination is often found by
> > facet_unexpected, and never appears in the captures.
> > 
> > Regards,
> > 
> > Zoli
> > 
> > On 22/04/13 19:13, Zoltan Kiss wrote:
> > >Hi,
> > >
> > >On 16/04/13 23:53, Zoltan Kiss wrote:
> > >>On 15/04/13 18:15, Ben Pfaff wrote:
> > >>>On Mon, Apr 15, 2013 at 03:59:52PM +0100, Zoltan Kiss wrote:
> > >>>>When the packet is sent to the controller due to a userspace rule
> > >>>>(and not a kernel-space flow), execute_controller_action is invoked
> > >>>>with clone=true, so handle_flow_miss retains ownership of the packet
> > >>>>buffer. But if execute_controller_action returns true (which means
> > >>>>the packet had only a PACKET_IN action), nothing frees the buffer.
> > >>>
> > >>>I think you're right.  But in that case, wouldn't it solve the problem
> > >>>in a better way (doing less memory allocation and copying) by passing
> > >>>clone=false, instead of passing clone=true and then freeing the packet
> > >>>in the caller?
> > >>
> > >>It sounds reasonable, and I was thinking about that, but I was worried
> > >>about the side effects. Now I've tried it, and it does seem to cause
> > >>problems. Broadcast ARP packets trigger these warnings:
> > >>
> > >>dpif|WARN|Dropped 26 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
> > >>dpif|WARN|system@xenbr0: failed to flow_del (No such file or directory) in_port(1),eth(src=ab:cd:ef:12:34:56,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.0.0.1,tip=10.0.0.2,op=1,sha=ab:cd:ef:12:34:56,tha=00:00:00:00:00:00)
> > >>
> > >>
> > >>
> > >>These messages come continuously if I install that joker rule in
> > >>userspace, and they don't appear with the original clone=true version. I
> > >>haven't found out yet why this happens; it shouldn't really change
> > >>anything but the time when the packet is freed.
> > >
> > >I've tried to find out why these warnings come with clone=false, but I
> > >haven't succeeded yet. I've checked this code path:
> > >
> > >handle_flow_miss
> > >  execute_controller_action
> > >   send_packet_in_action
> > >    connmgr_send_packet_in (this is where clone makes a difference as we
> > >pass rw_packet)
> > >     schedule_packet_in
> > >      ofputil_encode_packet_in (this is where we either dupe the buffer
> > >or use the original)
> > >      pinsched_send (in my tests there was no pinscheduler involved)
> > >       do_send_packet_in (after this the code won't know about the
> > >content of the packet)
> > >
> > >But I couldn't find any place where it mattered whether we clone and
> > >pass a copy (and free the original immediately after
> > >execute_controller_action), or just give away the original buffer. And
> > >frankly, I'm running out of ideas where else to check. Does anyone touch
> > >that buffer in parallel? Any ideas?
> > >
> > >Regards,
> > >
> > >Zoli
> > 
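
For readers following the thread, the buffer-ownership question discussed
above can be sketched in a few lines of C. This is only an illustration
under stated assumptions, not the OVS code and not the posted fix: struct
pkt, pkt_new, pkt_delete, send_to_controller, the miss_* handlers, and the
live_pkts counter are all hypothetical names invented for the example. It
models a send path that consumes whatever buffer it sends, and compares a
caller that passes clone=true and never frees the original with the two
alternatives raised in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer; not the OVS struct ofpbuf. */
struct pkt {
    size_t size;
    unsigned char *data;
};

/* Number of buffers allocated and not yet freed. */
static int live_pkts;

static struct pkt *pkt_new(const void *data, size_t size)
{
    struct pkt *p = malloc(sizeof *p);
    assert(p != NULL);
    p->data = malloc(size ? size : 1);
    assert(p->data != NULL);
    memcpy(p->data, data, size);
    p->size = size;
    live_pkts++;
    return p;
}

static void pkt_delete(struct pkt *p)
{
    if (p) {
        free(p->data);
        free(p);
        live_pkts--;
    }
}

/* Models a send path that always consumes the buffer it transmits.
 * clone=true:  a copy is sent and freed; the caller still owns 'p'.
 * clone=false: ownership of 'p' itself passes to this function. */
static void send_to_controller(struct pkt *p, bool clone)
{
    struct pkt *to_send = clone ? pkt_new(p->data, p->size) : p;
    /* ... encoding and transmission would happen here ... */
    pkt_delete(to_send);
}

/* The leak pattern: the caller keeps ownership but never frees 'p'. */
static void miss_leaky(struct pkt *p)
{
    send_to_controller(p, true);
}

/* First alternative: keep clone=true and free the original in the caller. */
static void miss_free_in_caller(struct pkt *p)
{
    send_to_controller(p, true);
    pkt_delete(p);
}

/* Second alternative: pass clone=false and hand the buffer over. */
static void miss_transfer(struct pkt *p)
{
    send_to_controller(p, false);
}

/* Returns how many buffers a handler leaves allocated for one packet. */
static int live_after(void (*handler)(struct pkt *))
{
    int before = live_pkts;
    handler(pkt_new("arp", 3));
    return live_pkts - before;
}
```

In this model, live_after(miss_leaky) leaves one buffer allocated per
packet, while the other two handlers leave none. The model says nothing
about why clone=false produced the flow_del warnings reported above; for
that, see the fix linked at the top of this message.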


