[ovs-discuss] kernel panic under heavy receive load

Chris Dunlop chris at onthe.net.au
Wed Mar 4 14:16:00 UTC 2015


On Sat, Feb 28, 2015 at 12:49:31PM +1100, Chris Dunlop wrote:
> On Fri, Feb 27, 2015 at 08:30:42PM -0500, Xu (Simon) Chen wrote:
> > On Fri, Feb 27, 2015 at 6:14 PM, Pravin Shelar <pshelar at nicira.com> wrote:
> > > So it looks like vhost is generating shared skb. Can you try same test
> > > on latest upstream kernel?
> > 
> > I don't know whether it's vhost, as for me the VM receiving high volume
> > consistently crashes.
> > 
> > I previously had trouble building OVS datapath module on newer kernels.
> > What is the latest kernel that I should try?
> 
> According to:
> 
> https://github.com/openvswitch/ovs/blob/master/FAQ.md
> 
> ...v3.14 *is* the latest kernel supported by the out-of-tree OVS
> kernel module, given that ovs 2.4.x isn't tagged yet and "git
> log v2.3..master" doesn't reveal anything obviously aimed at
> providing >v3.14 compatibility.
> 
> Can you try with the in-tree OVS module?

Simon, I think your crash will be fixed if you cherry-pick this
openvswitch commit (from master) on top of v2.3.1:

----------------------------------------------------------------------
commit d7ff93d7532717ea9d610a7181f24c773170be80
Author: Andy Zhou <azhou at nicira.com>
Date:   Fri Aug 29 13:20:23 2014 -0700

    datapath: simplify sample action implementation

    The current sample() function implementation is more complicated
    than necessary in handling single user space action optimization
    and skb reference counting. There is no functional changes.

    Signed-off-by: Andy Zhou <azhou at nicira.com>
    Acked-by: Pravin B Shelar <pshelar at nicira.com>
----------------------------------------------------------------------

The commit isn't designed to address the problem directly, but in
simplifying sample() it removes a call to skb_get(). That skb_get() takes
an extra reference on the skb, which makes it shared, and a shared skb
later hits the BUG() in pskb_expand_head().
E.g. your v2.3.1 stack trace shows this call path:

  netdev_frame_hook
  + netdev_port_receive
    | skb is guaranteed not-shared, via:
    |   skb = skb_share_check(skb, GFP_ATOMIC);
    + ovs_vport_receive
      + ovs_dp_process_received_packet
        + ovs_dp_process_packet_with_key
          + ovs_execute_actions
            + do_execute_actions
              | nla_type(a) == OVS_ACTION_ATTR_SAMPLE
              + sample
                | skb is made shared here, via:
                |   sample_skb = skb;
                |   skb_get(skb);
                + do_execute_actions
                  | nla_type(a) == OVS_ACTION_ATTR_USERSPACE
                  + output_userspace
                    + ovs_dp_upcall
                      + queue_userspace_packet
                        + skb_checksum_help
                          + pskb_expand_head
                            | if (skb_shared(skb))
                            |         BUG();        BOOM!!!
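
To make the sharing rule in that trace concrete, here is a minimal
userspace sketch. It is not kernel code: struct model_skb and all the
model_*() helpers are hypothetical stand-ins I made up for illustration,
modelling only the reference count behind skb_get(), skb_shared(),
skb_share_check() and the guard in pskb_expand_head(). Two deliberate
simplifications: the real skb_share_check() clones the skb rather than
copying the data, and the real guard calls BUG() where this sketch
returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a refcount plus a payload. */
struct model_skb {
    int users;              /* models skb->users */
    size_t len;
    unsigned char *data;
};

/* Allocate a buffer held by exactly one owner (users == 1). */
static struct model_skb *model_alloc(size_t len)
{
    struct model_skb *skb = calloc(1, sizeof(*skb));
    if (!skb)
        return NULL;
    skb->data = calloc(len ? len : 1, 1);
    if (!skb->data) {
        free(skb);
        return NULL;
    }
    skb->len = len;
    skb->users = 1;
    return skb;
}

/* Models skb_get(): take another reference, making the buffer shared. */
static struct model_skb *model_get(struct model_skb *skb)
{
    assert(skb->users > 0);
    skb->users++;
    return skb;
}

/* Models skb_shared(): true when there is more than one holder. */
static bool model_shared(const struct model_skb *skb)
{
    return skb->users != 1;
}

/* Models kfree_skb(): drop one reference, free on the last one. */
static void model_put(struct model_skb *skb)
{
    if (--skb->users == 0) {
        free(skb->data);
        free(skb);
    }
}

/*
 * Models skb_share_check(): if the buffer is shared, return a private
 * copy and drop the caller's reference on the original.
 * Returns NULL if the copy cannot be allocated.
 */
static struct model_skb *model_share_check(struct model_skb *skb)
{
    struct model_skb *copy;

    if (!model_shared(skb))
        return skb;
    copy = model_alloc(skb->len);
    if (copy)
        memcpy(copy->data, skb->data, skb->len);
    model_put(skb);
    return copy;
}

/*
 * Models the guard in pskb_expand_head(): the data area may only be
 * reallocated when nobody else holds a reference. Returns -1 where the
 * kernel would BUG(), -2 on allocation failure, 0 on success.
 */
static int model_expand_head(struct model_skb *skb, size_t extra)
{
    unsigned char *p;

    if (model_shared(skb))
        return -1;
    p = realloc(skb->data, skb->len + extra);
    if (!p)
        return -2;
    memset(p + skb->len, 0, extra);
    skb->data = p;
    skb->len += extra;
    return 0;
}
```

In this model, calling model_get() on a freshly allocated buffer and
then model_expand_head() on it returns -1, which corresponds to the
sample() -> ... -> pskb_expand_head() path above. Passing the buffer
through model_share_check() first yields a private copy that can be
expanded safely.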


I think commit d7ff93d should be added to v2.3.2 (if indeed that's ever
released).

Cheers,

Chris


