[ovs-discuss] kernel panic under heavy receive load

Pravin Shelar pshelar at nicira.com
Wed Mar 4 20:46:40 UTC 2015


On Wed, Mar 4, 2015 at 6:16 AM, Chris Dunlop <chris at onthe.net.au> wrote:
> On Sat, Feb 28, 2015 at 12:49:31PM +1100, Chris Dunlop wrote:
>> On Fri, Feb 27, 2015 at 08:30:42PM -0500, Xu (Simon) Chen wrote:
>> > On Fri, Feb 27, 2015 at 6:14 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>> > > So it looks like vhost is generating shared skb. Can you try same test
>> > > on latest upstream kernel?
>> >
>> > I don't know whether it's vhost, as for me the VM receiving high volume
>> > consistently crashes.
>> >
>> > I previously had trouble building OVS datapath module on newer kernels.
>> > What is the latest kernel that I should try?
>>
>> According to:
>>
>> https://github.com/openvswitch/ovs/blob/master/FAQ.md
>>
>> ...v3.14 *is* the latest kernel supported by the out-of-tree OVS
>> kernel module, given that ovs 2.4.x isn't tagged yet and "git
>> log v2.3..master" doesn't reveal anything obviously aimed at
>> providing >v3.14 compatibility.
>>
>> Can you try with the in-tree OVS module?
>
> Simon, I think your crash will be fixed if you cherry-pick this
> openvswitch commit (from master) on top of v2.3.1:
>
> ----------------------------------------------------------------------
> commit d7ff93d7532717ea9d610a7181f24c773170be80
> Author: Andy Zhou <azhou at nicira.com>
> Date:   Fri Aug 29 13:20:23 2014 -0700
>
>     datapath: simplify sample action implementation
>
>     The current sample() function implementation is more complicated
>     than necessary in handling single user space action optimization
>     and skb reference counting. There is no functional changes.
>
>     Signed-off-by: Andy Zhou <azhou at nicira.com>
>     Acked-by: Pravin B Shelar <pshelar at nicira.com>
> ----------------------------------------------------------------------
>
> The commit isn't actually designed to address the problem directly;
> however, in simplifying sample() it removes a call to skb_get(). The
> skb_get() makes the skb shared, which later causes us to hit the BUG().
> E.g. your v2.3.1 stack trace shows this call path:
>
>   netdev_frame_hook
>   + netdev_port_receive
>     | skb is guaranteed not-shared, via:
>     |   skb = skb_share_check(skb, GFP_ATOMIC);
>     + ovs_vport_receive
>       + ovs_dp_process_received_packet
>         + ovs_dp_process_packet_with_key
>           + ovs_execute_actions
>             + do_execute_actions
>               | nla_type(a) == OVS_ACTION_ATTR_SAMPLE
>               + sample
>                 | skb is made shared here, via:
>                 |   sample_skb = skb;
>                 |   skb_get(skb);
>                 + do_execute_actions
>                   | nla_type(a) == OVS_ACTION_ATTR_USERSPACE
>                   + output_userspace
>                     + ovs_dp_upcall
>                       + queue_userspace_packet
>                         + skb_checksum_help
>                           + pskb_expand_head
>                             | if (skb_shared(skb))
>                             |         BUG();        BOOM!!!
>
>
> I think commit d7ff93d should be added to v2.3.2 (if indeed that's ever produced).
>
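A minimal userspace model of the reference counting in the call path above may make the failure easier to see. This is a sketch under stated assumptions, not kernel code. `struct fake_skb` and all `fake_*` helpers are hypothetical stand-ins that only track the sk_buff `users` count. They mirror what skb_get(), skb_shared(), skb_share_check() and kfree_skb() do to that count, and `fake_expand_head()` returns -1 where pskb_expand_head() would hit BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only the reference
 * count ("users") matters for this illustration. */
struct fake_skb {
    int users;
};

static struct fake_skb *fake_alloc(void)
{
    struct fake_skb *skb = malloc(sizeof(*skb));
    if (skb)
        skb->users = 1;
    return skb;
}

/* Models skb_get(): take an extra reference on the SAME buffer. */
static struct fake_skb *fake_get(struct fake_skb *skb)
{
    skb->users++;
    return skb;
}

/* Models skb_shared(): true when more than one holder exists. */
static bool fake_shared(const struct fake_skb *skb)
{
    return skb->users != 1;
}

/* Models kfree_skb(): drop one reference, free on the last one. */
static void fake_put(struct fake_skb *skb)
{
    assert(skb->users > 0);
    if (--skb->users == 0)
        free(skb);
}

/* Models skb_share_check(): if shared, return a private copy and
 * drop our reference to the original (NULL if the copy fails). */
static struct fake_skb *fake_share_check(struct fake_skb *skb)
{
    if (fake_shared(skb)) {
        struct fake_skb *nskb = fake_alloc();
        fake_put(skb);
        skb = nskb;
    }
    return skb;
}

/* Models only the shared-skb check in pskb_expand_head():
 * returns -1 where the kernel would BUG(). */
static int fake_expand_head(const struct fake_skb *skb)
{
    return fake_shared(skb) ? -1 : 0;
}

/* Replays the relevant steps of the v2.3.1 path and returns what
 * fake_expand_head() reports at the skb_checksum_help() point. */
static int replay_v231_path(void)
{
    struct fake_skb *skb = fake_alloc();
    if (!skb)
        return -2;
    skb = fake_share_check(skb);                 /* netdev_port_receive() */
    if (!skb)
        return -2;
    struct fake_skb *sample_skb = fake_get(skb); /* sample(): skb_get(skb) */
    int ret = fake_expand_head(sample_skb);      /* pskb_expand_head() */
    fake_put(sample_skb);
    fake_put(skb);
    return ret;
}
```

In this model replay_v231_path() returns -1, the shared-skb condition that panics the kernel. Dropping the extra reference taken in sample() before the upcall makes the same check pass, which matches the observation that removing skb_get() avoids the BUG().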

Thanks for investigating the issue. I have backported the patch to branch-2.3.


