[ovs-discuss] kernel panic under heavy receive load
Pravin Shelar
pshelar at nicira.com
Wed Mar 4 20:46:40 UTC 2015
On Wed, Mar 4, 2015 at 6:16 AM, Chris Dunlop <chris at onthe.net.au> wrote:
> On Sat, Feb 28, 2015 at 12:49:31PM +1100, Chris Dunlop wrote:
>> On Fri, Feb 27, 2015 at 08:30:42PM -0500, Xu (Simon) Chen wrote:
>> > On Fri, Feb 27, 2015 at 6:14 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>> > > So it looks like vhost is generating shared skb. Can you try same test
>> > > on latest upstream kernel?
>> >
>> > I don't know whether it's vhost, as for me the VM receiving high volume
>> > consistently crashes.
>> >
>> > I previously had trouble building OVS datapath module on newer kernels.
>> > What is the latest kernel that I should try?
>>
>> According to:
>>
>> https://github.com/openvswitch/ovs/blob/master/FAQ.md
>>
>> ...v3.14 *is* the latest kernel supported by the out-of-tree OVS
>> kernel module, given that ovs 2.4.x isn't tagged yet and "git
>> log v2.3..master" doesn't reveal anything obviously aimed at
>> providing >v3.14 compatibility.
>>
>> Can you try with the in-tree OVS module?
>
> Simon, I think your crash will be fixed if you cherry-pick this
> openvswitch commit (from master) on top of v2.3.1:
>
> ----------------------------------------------------------------------
> commit d7ff93d7532717ea9d610a7181f24c773170be80
> Author: Andy Zhou <azhou at nicira.com>
> Date: Fri Aug 29 13:20:23 2014 -0700
>
> datapath: simplify sample action implementation
>
> The current sample() function implementation is more complicated
> than necessary in handling single user space action optimization
> and skb reference counting. There is no functional changes.
>
> Signed-off-by: Andy Zhou <azhou at nicira.com>
> Acked-by: Pravin B Shelar <pshelar at nicira.com>
> ----------------------------------------------------------------------
>
> The commit isn't actually designed to address the problem directly,
> however in simplifying sample() it removes a call to skb_get(). The
> skb_get() makes the skb shared, which later causes us to hit the BUG().
> E.g. your v2.3.1 stack trace shows this call path:
>
>   netdev_frame_hook
>   + netdev_port_receive
>     | skb is guaranteed not-shared, via:
>     |   skb = skb_share_check(skb, GFP_ATOMIC);
>     + ovs_vport_receive
>       + ovs_dp_process_received_packet
>         + ovs_dp_process_packet_with_key
>           + ovs_execute_actions
>             + do_execute_actions
>               | nla_type(a) == OVS_ACTION_ATTR_SAMPLE
>               + sample
>                 | skb is made shared here, via:
>                 |   sample_skb = skb;
>                 |   skb_get(skb);
>                 + do_execute_actions
>                   | nla_type(a) == OVS_ACTION_ATTR_USERSPACE
>                   + output_userspace
>                     + ovs_dp_upcall
>                       + queue_userspace_packet
>                         + skb_checksum_help
>                           + pskb_expand_head
>                             | if (skb_shared(skb))
>                             |   BUG(); BOOM!!!
>
>
> I think commit d7ff93d should be added to v2.3.2 (if indeed that's ever produced).
>
Thanks for investigating the issue. I have backported the patch for branch-2.3.
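
For anyone following the thread later, the reference-count reasoning in the call path above can be sketched as a small userspace model. This is only an illustration, not kernel code: struct toy_skb and every toy_* function are hypothetical stand-ins, and the model assumes that "shared" means a user count other than 1 (which is how skb_shared() tests skb->users) and that the expand-head step refuses a shared buffer (the kernel calls BUG() there instead of returning a value).

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff; only the user count is modelled. */
struct toy_skb {
    int users; /* models skb->users */
};

/* A freshly received buffer has exactly one holder. */
static void toy_skb_init(struct toy_skb *skb)
{
    skb->users = 1;
}

/* Models skb_get(): take one extra reference on the same buffer. */
static struct toy_skb *toy_skb_get(struct toy_skb *skb)
{
    skb->users++;
    return skb;
}

/* Models skb_shared(): true when the user count is not exactly 1. */
static bool toy_skb_shared(const struct toy_skb *skb)
{
    return skb->users != 1;
}

/*
 * Models the guard at the top of pskb_expand_head(). Returns false in
 * the situation where the real kernel function would hit BUG().
 */
static bool toy_expand_head_allowed(const struct toy_skb *skb)
{
    return !toy_skb_shared(skb);
}

/*
 * Models the v2.3.1-style sample() path from the trace: an extra
 * reference is taken, then the nested userspace action runs on the
 * same buffer and ends up in the expand-head guard.
 */
static bool toy_sample_with_get(struct toy_skb *skb)
{
    toy_skb_get(skb);
    return toy_expand_head_allowed(skb);
}

/* Models the path with the extra reference removed: the buffer stays unshared. */
static bool toy_sample_without_get(struct toy_skb *skb)
{
    return toy_expand_head_allowed(skb);
}
```

Calling toy_sample_without_get() on a freshly initialised toy_skb returns true, while toy_sample_with_get() on the same buffer returns false and leaves the user count at 2, which corresponds to the BUG() at the bottom of the trace.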