[ovs-dev] [PATCH] vxlan: Fix for the packet loop issue in vxlan

Neelakantam Gaddam neelugaddam at gmail.com
Wed Jun 13 12:16:30 UTC 2018


It should be Qdisc's busylock in dev_queue_xmit function. I have seen the
issue in kernel version is 3.10.87 vanilla.



On Wed, Jun 13, 2018 at 12:21 PM, Pravin Shelar <pshelar at ovn.org> wrote:

> On Tue, Jun 12, 2018 at 10:58 PM, Neelakantam Gaddam
> <neelugaddam at gmail.com> wrote:
> > Hi Pravin,
> >
> > I have seen the below crash.
> >
> > [<ffffffff80864cd4>] show_stack+0x6c/0xf8
> >
> > [<ffffffff80ad1628>] do_raw_spin_lock+0x168/0x170
> >
> > [<ffffffff80bf7b1c>] dev_queue_xmit+0x43c/0x470
> >
> > [<ffffffff80c32c08>] ip_finish_output+0x250/0x490
> >
> > [<ffffffffc0115664>] rpl_iptunnel_xmit+0x134/0x218 [openvswitch]
> >
> > [<ffffffffc0120f28>] rpl_vxlan_xmit+0x430/0x538 [openvswitch]
> >
> > [<ffffffffc00f9de0>] do_execute_actions+0x18f8/0x19e8 [openvswitch]
> >
> > [<ffffffffc00fa2b0>] ovs_execute_actions+0x90/0x208 [openvswitch]
> >
> > [<ffffffffc0101860>] ovs_dp_process_packet+0xb0/0x1a8 [openvswitch]
> >
> > [<ffffffffc010c5d8>] ovs_vport_receive+0x78/0x130 [openvswitch]
> >
> > [<ffffffffc010ce6c>] internal_dev_xmit+0x34/0x98 [openvswitch]
> >
> > [<ffffffff80bf74d0>] dev_hard_start_xmit+0x2e8/0x4f8
> >
> > [<ffffffff80c10e48>] sch_direct_xmit+0xf0/0x238
> >
> > [<ffffffff80bf78b8>] dev_queue_xmit+0x1d8/0x470
> >
> > [<ffffffff80c5ffe4>] arp_process+0x614/0x628
> >
> > [<ffffffff80bf0cb0>] __netif_receive_skb_core+0x2e8/0x5d8
> >
> > [<ffffffff80bf4770>] process_backlog+0xc0/0x1b0
> >
> > [<ffffffff80bf501c>] net_rx_action+0x154/0x240
> >
> > [<ffffffff8088d130>] __do_softirq+0x1d0/0x218
> >
> > [<ffffffff8088d240>] do_softirq+0x68/0x70
> >
> > [<ffffffff8088d3a0>] local_bh_enable+0xa8/0xb0
> >
> > [<ffffffff80bf5c88>] netif_rx_ni+0x20/0x30
> >
> >
> >
> >
> > I have spent some time in investigation and found that crash is because
> of
> > spinlock recursion in dev_queue_xmit function.
> > The packet path traced is : netif_rx->arp->dev_queue_xmit(internal
> > port)->vxlan_xmit->dev_queue_xmit(internal port).
> >
>
> Which spin-lock is it? I am surprised to see a lock taken in fast
> path. Can you also share kernel version?
>
> > The macro (XMIT_RECURSION_LIMIT) is defined as 10. This limit wont
> prevent
> > the crash since the recursion is 2 only for my configuration.
>
> right, The recursion limit is to avoid stack overflow.
>
> >
> >
> >
> > On Wed, Jun 13, 2018 at 4:11 AM, Pravin Shelar <pshelar at ovn.org> wrote:
> >>
> >> On Tue, Jun 12, 2018 at 10:11 AM, Neelakantam Gaddam
> >> <neelugaddam at gmail.com> wrote:
> >> >
> >> > Hi Pravin,
> >> >
> >> > The below configuration is causing the spinlock recursion issue.
> >> >
> >> > I am able to see the issue with below configuration.
> >> >
> >> >
> >> >
> >> > ovs-vsctl add-br br0
> >> >
> >> > ovs-vsctl add-bond br0 bond0 p1p1 p1p2
> >> >
> >> > ovs-vsctl set port bond0 lacp=active bond_mode=balance-tcp
> >> >
> >> > ifconfig br0 100.0.0.1 up
> >> >
> >> > ovs-vsctl add-port br0 veth0
> >> >
> >> > ovs-vsctl add-port br0 vx0 -- set interface vx0 type=vxlan
> >> > options:local_ip=100.0.0.1 options:remote_ip=100.0.0.2 option:key=flow
> >> >
> >> >
> >> >
> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, tun_id=100,
> >> > in_port=4, action=output:3"
> >> >
> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, in_port=3,
> >> > actions=set_field:100->tun_id output:4"
> >> >
> >> >
> >> >
> >> > The same bridge br0 is being used as the local interface for vxlan
> >> > tunnel. Even though this configuration is invalid, we should not see
> the
> >> > kernel crash.
> >> >
> >>
> >> I agree this should not cause crash.
> >> Can you post the crash or investigate why it is crashing I think such
> >> configuration would hit the networking stack recursion limit
> >> (XMIT_RECURSION_LIMIT) and then the packet would be dropped. I am not
> >> sure which spinlock recursion issue you are referring to.
> >>
> >>
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Jun 12, 2018 at 11:55 AM, Pravin Shelar <pshelar at ovn.org>
> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Tue, May 22, 2018 at 10:16 PM, Neelakantam Gaddam
> >> >> <neelugaddam at gmail.com> wrote:
> >> >>>
> >> >>> This patch fixes the kernel soft lockup issue with vxlan
> configuration
> >> >>> where the tunneled packet is sent on the same bridge where vxlan
> port
> >> >>> is
> >> >>> attched to. It detects the loop in vxlan xmit functionb and drops if
> >> >>> loop is
> >> >>> detected.
> >> >>>
> >> >>> Signed-off-by: Neelakantam Gaddam <neelugaddam at gmail.com>
> >> >>> ---
> >> >>>  datapath/linux/compat/vxlan.c | 6 ++++--
> >> >>>  1 file changed, 4 insertions(+), 2 deletions(-)
> >> >>>
> >> >>> diff --git a/datapath/linux/compat/vxlan.c
> >> >>> b/datapath/linux/compat/vxlan.c
> >> >>> index 287dad2..00562fa 100644
> >> >>> --- a/datapath/linux/compat/vxlan.c
> >> >>> +++ b/datapath/linux/compat/vxlan.c
> >> >>> @@ -1115,7 +1115,8 @@ static void vxlan_xmit_one(struct sk_buff
> *skb,
> >> >>> struct net_device *dev,
> >> >>>                         goto tx_error;
> >> >>>                 }
> >> >>>
> >> >>> -               if (rt->dst.dev == dev) {
> >> >>> +               if ((rt->dst.dev == dev) ||
> >> >>> +                       (OVS_CB(skb)->input_vport->dev ==
> >> >>> rt->dst.dev)) {
> >> >>
> >> >>
> >> >> I am not sure which case the  OVS_CB(skb)->input_vport->dev is not
> same
> >> >> as the dev when there is recursion. Can you explain how to reproduce
> it?
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards
> >> > Neelakantam Gaddam
> >
> >
> >
> >
> > --
> > Thanks & Regards
> > Neelakantam Gaddam
>



-- 
Thanks & Regards
Neelakantam Gaddam


More information about the dev mailing list