[ovs-dev] [PATCH] vxlan: Fix for the packet loop issue in vxlan

Pravin Shelar pshelar at ovn.org
Thu Jun 14 02:39:58 UTC 2018


On Wed, Jun 13, 2018 at 5:16 AM, Neelakantam Gaddam
<neelugaddam at gmail.com> wrote:
> It should be Qdisc's busylock in dev_queue_xmit function. I have seen the
> issue in kernel version is 3.10.87 vanilla.
>

I see, This looks like general problem which exist in upstream kernel.
You would be able to reproduce it even with linux bridge and vxlan
device.
Proposed patch is specific solution. You can add another layer of
bridge and the patch would not handle same issue in that
configuration. Therefore I am bit hesitant to apply this patch.

>
>
> On Wed, Jun 13, 2018 at 12:21 PM, Pravin Shelar <pshelar at ovn.org> wrote:
>>
>> On Tue, Jun 12, 2018 at 10:58 PM, Neelakantam Gaddam
>> <neelugaddam at gmail.com> wrote:
>> > Hi Pravin,
>> >
>> > I have seen the below crash.
>> >
>> > [<ffffffff80864cd4>] show_stack+0x6c/0xf8
>> >
>> > [<ffffffff80ad1628>] do_raw_spin_lock+0x168/0x170
>> >
>> > [<ffffffff80bf7b1c>] dev_queue_xmit+0x43c/0x470
>> >
>> > [<ffffffff80c32c08>] ip_finish_output+0x250/0x490
>> >
>> > [<ffffffffc0115664>] rpl_iptunnel_xmit+0x134/0x218 [openvswitch]
>> >
>> > [<ffffffffc0120f28>] rpl_vxlan_xmit+0x430/0x538 [openvswitch]
>> >
>> > [<ffffffffc00f9de0>] do_execute_actions+0x18f8/0x19e8 [openvswitch]
>> >
>> > [<ffffffffc00fa2b0>] ovs_execute_actions+0x90/0x208 [openvswitch]
>> >
>> > [<ffffffffc0101860>] ovs_dp_process_packet+0xb0/0x1a8 [openvswitch]
>> >
>> > [<ffffffffc010c5d8>] ovs_vport_receive+0x78/0x130 [openvswitch]
>> >
>> > [<ffffffffc010ce6c>] internal_dev_xmit+0x34/0x98 [openvswitch]
>> >
>> > [<ffffffff80bf74d0>] dev_hard_start_xmit+0x2e8/0x4f8
>> >
>> > [<ffffffff80c10e48>] sch_direct_xmit+0xf0/0x238
>> >
>> > [<ffffffff80bf78b8>] dev_queue_xmit+0x1d8/0x470
>> >
>> > [<ffffffff80c5ffe4>] arp_process+0x614/0x628
>> >
>> > [<ffffffff80bf0cb0>] __netif_receive_skb_core+0x2e8/0x5d8
>> >
>> > [<ffffffff80bf4770>] process_backlog+0xc0/0x1b0
>> >
>> > [<ffffffff80bf501c>] net_rx_action+0x154/0x240
>> >
>> > [<ffffffff8088d130>] __do_softirq+0x1d0/0x218
>> >
>> > [<ffffffff8088d240>] do_softirq+0x68/0x70
>> >
>> > [<ffffffff8088d3a0>] local_bh_enable+0xa8/0xb0
>> >
>> > [<ffffffff80bf5c88>] netif_rx_ni+0x20/0x30
>> >
>> >
>> >
>> >
>> > I have spent some time in investigation and found that crash is because
>> > of
>> > spinlock recursion in dev_queue_xmit function.
>> > The packet path traced is : netif_rx->arp->dev_queue_xmit(internal
>> > port)->vxlan_xmit->dev_queue_xmit(internal port).
>> >
>>
>> Which spin-lock is it? I am surprised to see a lock taken in fast
>> path. Can you also share kernel version?
>>
>> > The macro (XMIT_RECURSION_LIMIT) is defined as 10. This limit wont
>> > prevent
>> > the crash since the recursion is 2 only for my configuration.
>>
>> right, The recursion limit is to avoid stack overflow.
>>
>> >
>> >
>> >
>> > On Wed, Jun 13, 2018 at 4:11 AM, Pravin Shelar <pshelar at ovn.org> wrote:
>> >>
>> >> On Tue, Jun 12, 2018 at 10:11 AM, Neelakantam Gaddam
>> >> <neelugaddam at gmail.com> wrote:
>> >> >
>> >> > Hi Pravin,
>> >> >
>> >> > The below configuration is causing the spinlock recursion issue.
>> >> >
>> >> > I am able to see the issue with below configuration.
>> >> >
>> >> >
>> >> >
>> >> > ovs-vsctl add-br br0
>> >> >
>> >> > ovs-vsctl add-bond br0 bond0 p1p1 p1p2
>> >> >
>> >> > ovs-vsctl set port bond0 lacp=active bond_mode=balance-tcp
>> >> >
>> >> > ifconfig br0 100.0.0.1 up
>> >> >
>> >> > ovs-vsctl add-port br0 veth0
>> >> >
>> >> > ovs-vsctl add-port br0 vx0 -- set interface vx0 type=vxlan
>> >> > options:local_ip=100.0.0.1 options:remote_ip=100.0.0.2
>> >> > option:key=flow
>> >> >
>> >> >
>> >> >
>> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, tun_id=100,
>> >> > in_port=4, action=output:3"
>> >> >
>> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, in_port=3,
>> >> > actions=set_field:100->tun_id output:4"
>> >> >
>> >> >
>> >> >
>> >> > The same bridge br0 is being used as the local interface for vxlan
>> >> > tunnel. Even though this configuration is invalid, we should not see
>> >> > the
>> >> > kernel crash.
>> >> >
>> >>
>> >> I agree this should not cause crash.
>> >> Can you post the crash or investigate why it is crashing I think such
>> >> configuration would hit the networking stack recursion limit
>> >> (XMIT_RECURSION_LIMIT) and then the packet would be dropped. I am not
>> >> sure which spinlock recursion issue you are referring to.
>> >>
>> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jun 12, 2018 at 11:55 AM, Pravin Shelar <pshelar at ovn.org>
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, May 22, 2018 at 10:16 PM, Neelakantam Gaddam
>> >> >> <neelugaddam at gmail.com> wrote:
>> >> >>>
>> >> >>> This patch fixes the kernel soft lockup issue with vxlan
>> >> >>> configuration
>> >> >>> where the tunneled packet is sent on the same bridge where vxlan
>> >> >>> port
>> >> >>> is
>> >> >>> attched to. It detects the loop in vxlan xmit functionb and drops
>> >> >>> if
>> >> >>> loop is
>> >> >>> detected.
>> >> >>>
>> >> >>> Signed-off-by: Neelakantam Gaddam <neelugaddam at gmail.com>
>> >> >>> ---
>> >> >>>  datapath/linux/compat/vxlan.c | 6 ++++--
>> >> >>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> >> >>>
>> >> >>> diff --git a/datapath/linux/compat/vxlan.c
>> >> >>> b/datapath/linux/compat/vxlan.c
>> >> >>> index 287dad2..00562fa 100644
>> >> >>> --- a/datapath/linux/compat/vxlan.c
>> >> >>> +++ b/datapath/linux/compat/vxlan.c
>> >> >>> @@ -1115,7 +1115,8 @@ static void vxlan_xmit_one(struct sk_buff
>> >> >>> *skb,
>> >> >>> struct net_device *dev,
>> >> >>>                         goto tx_error;
>> >> >>>                 }
>> >> >>>
>> >> >>> -               if (rt->dst.dev == dev) {
>> >> >>> +               if ((rt->dst.dev == dev) ||
>> >> >>> +                       (OVS_CB(skb)->input_vport->dev ==
>> >> >>> rt->dst.dev)) {
>> >> >>
>> >> >>
>> >> >> I am not sure which case the  OVS_CB(skb)->input_vport->dev is not
>> >> >> same
>> >> >> as the dev when there is recursion. Can you explain how to reproduce
>> >> >> it?
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Thanks & Regards
>> >> > Neelakantam Gaddam
>> >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards
>> > Neelakantam Gaddam
>
>
>
>
> --
> Thanks & Regards
> Neelakantam Gaddam


More information about the dev mailing list