[ovs-discuss] kernel panic under heavy receive load

Chris Dunlop chris at onthe.net.au
Fri Mar 6 03:13:38 UTC 2015


On Wed, Mar 04, 2015 at 03:56:41PM -0500, Xu (Simon) Chen wrote:
> Now I think about it... Maybe I run into this problem because I turned on
> Sflow collection on my OVS nodes...

Yes, I was also running sflow collection. I think the bug only bites
when sflow collection is enabled, because it's the sflow sampling that
produces the "nla_type(a) == OVS_ACTION_ATTR_SAMPLE" case in
do_execute_actions(), which then drops us into the problematic sample()
function that shares the skb.
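
As a rough illustration of why the extra reference matters, here is a
user-space toy model of the reference counting involved. It is only a
sketch, not the datapath code: every toy_* name is made up for this
example, and the only things it models are that skb_get() takes an extra
reference, that skb_shared() is true when the count is not 1, and that
pskb_expand_head() refuses a shared skb. Where the kernel would call
BUG(), the model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sk_buff: only the reference count is modelled. */
struct toy_skb {
    int users;  /* number of holders of this buffer */
};

/* Modelled on skb_get(): one more holder, so the buffer becomes shared. */
static struct toy_skb *toy_skb_get(struct toy_skb *skb)
{
    assert(skb->users > 0);  /* caller must already hold a reference */
    skb->users++;
    return skb;
}

/* Modelled on skb_shared(): true when there is more than one holder. */
static bool toy_skb_shared(const struct toy_skb *skb)
{
    return skb->users != 1;
}

/*
 * Modelled on the check at the top of pskb_expand_head(): a shared
 * buffer must not be modified. The kernel calls BUG() in that case;
 * this model reports failure by returning false.
 */
static bool toy_expand_head(const struct toy_skb *skb)
{
    return !toy_skb_shared(skb);
}

/*
 * Hypothetical model of the sample() path. With take_extra_ref set it
 * mimics v2.3.1, which calls skb_get() before running the nested
 * userspace action; with it clear it mimics the path once that call
 * is gone. Returns true if the nested action could expand the head.
 */
static bool toy_sample(struct toy_skb *skb, bool take_extra_ref)
{
    bool ok;

    if (take_extra_ref)
        toy_skb_get(skb);

    ok = toy_expand_head(skb);  /* the step reached via the userspace action */

    if (take_extra_ref)
        skb->users--;           /* drop the extra reference again */

    return ok;
}
```

Starting from an unshared buffer (users == 1), toy_sample(&skb, true)
returns false, which corresponds to the BUG() case, while
toy_sample(&skb, false) returns true.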

So, for anyone who hits this BUG() and comes looking for answers: the
quick fix is to turn off sflow collection; the permanent solution is to
install v2.3.2 once it's out; and in the meantime you can cherry-pick
commit d7ff93d75 on top of v2.3.1 to get your sflow collection working
again.

Cheers!

Chris

> On Wednesday, March 4, 2015, Chris Dunlop <chris at onthe.net.au> wrote:
> 
> > On Sat, Feb 28, 2015 at 12:49:31PM +1100, Chris Dunlop wrote:
> > > On Fri, Feb 27, 2015 at 08:30:42PM -0500, Xu (Simon) Chen wrote:
> > > > On Fri, Feb 27, 2015 at 6:14 PM, Pravin Shelar <pshelar at nicira.com> wrote:
> > > > > So it looks like vhost is generating shared skb. Can you try same
> > > > > test on latest upstream kernel?
> > > >
> > > > I don't know whether it's vhost, as for me the VM receiving high volume
> > > > consistently crashes.
> > > >
> > > > I previously had trouble building OVS datapath module on newer kernels.
> > > > What is the latest kernel that I should try?
> > >
> > > According to:
> > >
> > > https://github.com/openvswitch/ovs/blob/master/FAQ.md
> > >
> > > ...v3.14 *is* the latest kernel supported by the out-of-tree OVS
> > > kernel module, given that ovs 2.4.x isn't tagged yet and "git
> > > log v2.3..master" doesn't reveal anything obviously aimed at
> > > providing >v3.14 compatibility.
> > >
> > > Can you try with the in-tree OVS module?
> >
> > Simon, I think your crash will be fixed if you cherry-pick this
> > openvswitch commit (from master) on top of v2.3.1:
> >
> > ----------------------------------------------------------------------
> > commit d7ff93d7532717ea9d610a7181f24c773170be80
> > Author: Andy Zhou <azhou at nicira.com>
> > Date:   Fri Aug 29 13:20:23 2014 -0700
> >
> >     datapath: simplify sample action implementation
> >
> >     The current sample() function implementation is more complicated
> >     than necessary in handling single user space action optimization
> >     and skb reference counting. There is no functional changes.
> >
> >     Signed-off-by: Andy Zhou <azhou at nicira.com>
> >     Acked-by: Pravin B Shelar <pshelar at nicira.com>
> > ----------------------------------------------------------------------
> >
> > The commit isn't actually designed to address the problem directly;
> > however, in simplifying sample() it removes a call to skb_get(). The
> > skb_get() makes the skb shared, which later causes us to hit the BUG().
> > E.g. your v2.3.1 stack trace shows this call path:
> >
> >   netdev_frame_hook
> >   + netdev_port_receive
> >     | skb is guaranteed not-shared, via:
> >     |   skb = skb_share_check(skb, GFP_ATOMIC);
> >     + ovs_vport_receive
> >       + ovs_dp_process_received_packet
> >         + ovs_dp_process_packet_with_key
> >           + ovs_execute_actions
> >             + do_execute_actions
> >               | nla_type(a) == OVS_ACTION_ATTR_SAMPLE
> >               + sample
> >                 | skb is made shared here, via:
> >                 |   sample_skb = skb;
> >                 |   skb_get(skb);
> >                 + do_execute_actions
> >                   | nla_type(a) == OVS_ACTION_ATTR_USERSPACE
> >                   + output_userspace
> >                     + ovs_dp_upcall
> >                       + queue_userspace_packet
> >                         + skb_checksum_help
> >                           + pskb_expand_head
> >                             | if (skb_shared(skb))
> >                             |         BUG();        BOOM!!!
> >
> >
> > I think commit d7ff93d should be added to v2.3.2 (if indeed that's ever
> > produced).
> >
> > Cheers,
> >
> > Chris
> >


