[ovs-discuss] Slowness on neighbour learning

Mika Väisänen mika.vaisanen at iki.fi
Fri Apr 7 12:40:39 UTC 2017


The Linux host was not sending anything to the old port after the active
slave was changed to eno4.

I tried to abuse the existing grat_arp_lock timer to avoid neighbour
updates for a while after receiving gratuitous ARPs.  It actually helped
with the problem of the revalidator relearning the old port.

Unfortunately there is still a random delay of 100-500 ms after the port
change, during which traffic does not get through.

While investigating this I started polling the neighbour info with the
ovs-appctl fdb/show command. When I polled it at 100 ms intervals, the
ARP delay also dropped to 100 ms. How the ovs-appctl program can affect
the neighbour update timing is beyond my understanding.
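
The polling described above could be scripted roughly as follows. This is
only a sketch: the bridge name swu0 is taken from the log lines quoted
further down, the helper names (parse_fdb, watch_moves, fetch_from_ovs) are
made up for illustration, and the parser assumes that fdb/show prints a
header row followed by "port VLAN MAC Age" columns, which may differ between
OVS versions.

```python
import subprocess
import time


def parse_fdb(text):
    """Parse fdb/show-style output into {(vlan, mac): port}.

    Assumption: one header line, then rows of 'port VLAN MAC Age'.
    """
    table = {}
    for line in text.splitlines()[1:]:        # skip the header row
        fields = line.split()
        if len(fields) != 4:
            continue                          # ignore anything unexpected
        port, vlan, mac, _age = fields
        table[(int(vlan), mac.lower())] = port
    return table


def watch_moves(fetch, interval=0.1, rounds=50, sleep=time.sleep):
    """Call fetch() once per interval and record every MAC that changes port."""
    moves = []
    previous = {}
    for _ in range(rounds):
        current = parse_fdb(fetch())
        for key, port in current.items():
            old = previous.get(key)
            if old is not None and old != port:
                moves.append((time.monotonic(), key, old, port))
        previous = current
        sleep(interval)
    return moves


def fetch_from_ovs(bridge="swu0"):
    # Needs a running ovs-vswitchd and permission to talk to it.
    result = subprocess.run(
        ["ovs-appctl", "fdb/show", bridge],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling watch_moves(fetch_from_ovs) would poll every 100 ms for 50 rounds and
return the observed port changes with monotonic timestamps. It only observes
the table; it does not explain why polling changes the timing.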

BR, Mika

On 6.4.2017 22.15, "Joe Stringer" <joe at ovn.org> wrote:

> On 6 April 2017 at 11:57, Mika Väisänen <mika.vaisanen at gmail.com> wrote:
> > Hello,
> >
> > Is it normal OVS behaviour that a neighbour update (a MAC moving from one
> > OVS switch port to another) can cause a 100-500 ms break in traffic? Is
> > there any way to configure it to be faster?
> >
> > In my case a Linux host is connected with two bonded Ethernet links to a
> > server running an OVS 2.5.2 switch. When I change the active bonding slave
> > on the Linux host, it causes a 100-500 ms break in traffic between the
> > Linux host and other hosts on the network. When I run the same test with a
> > HW switch, there is no noticeable traffic break at all.
> >
> > While investigating this, I found some strangeness in the way the MAC is
> > learned by OVS. In the following example I have moved the active slave
> > from interface eno3 to eno4.  The first learning looks correct
> > (handler51), but then revalidator56 learns the MAC on the old port again:
> >
> > 2017-04-06T09:51:25.179Z|00339|ofproto_dpif_xlate(handler51)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 4
> > 2017-04-06T09:51:25.179Z|00344|ofproto_dpif_xlate(handler51)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 5
> > 2017-04-06T09:51:25.179Z|00349|ofproto_dpif_xlate(handler51)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 64
> > 2017-04-06T09:51:25.247Z|00065|ofproto_dpif_xlate(revalidator56)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno3 in VLAN 4
> > 2017-04-06T09:51:25.247Z|00066|ofproto_dpif_xlate(revalidator56)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno3 in VLAN 5
> > 2017-04-06T09:51:25.249Z|00067|ofproto_dpif_xlate(revalidator56)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 4
> >
> > Why is the revalidator refreshing old neighbour information? Could it be
> > causing the slowness, or is it totally irrelevant?
>
> I wonder if there's a race that happens here.
>
> Let's say that at T0, revalidator runs, forwarding is all correct, and
> the datapath flows are all fine.
> At T1, a packet arrives for eno3 VLAN 4, and is forwarded correctly.
> There's now one packet attributed to the datapath flow which
> revalidator will need to translate and attribute stats for.
> At T2, the active slave is shifted from eno3 to eno4. Traffic starts
> to flow over eno4, so you see the handler threads setting up new flows
> to handle this traffic (correctly).
> At T3, the revalidator thread wakes up, and starts dumping all of the
> datapath flows. When it finds the flow that was hit in T1, it will
> translate this flow, attribute stats, and execute side effects such as
> learning the MAC. If it had learned the MAC at the exact moment that the
> packet arrived, then it would have correctly learned that the MAC
> existed on eno3. However, it's not aware that the traffic has since
> shifted to eno4, so it attributes and learns on eno3. The revalidator
> thread continues to dump the datapath flows and finds the one that
> handles the traffic now on eno4, and translates that one which also
> has traffic. This makes the learning happen again on eno4.
>
> Thereafter, I'm guessing that you don't send the traffic on eno3 so
> there will be no packets to attribute, no MAC should be learnt, and
> eventually the revalidator will time out the flow.
>
> It may be possible to mitigate this if there were to be some sort
> of 'learning ratelimit' where a MAC that shifts to a new interface
> cannot be relearnt for X seconds. It could try to be smart and track
> the previous interface, then if the MAC shifts to a new interface we
> don't perform learning for the previous interface, or it could be
> something a bit more general as just 'don't learn a particular MAC
> more than once a second'.
>
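
The race described above, and the "track the previous interface" variant of
the proposed learning ratelimit, can be illustrated with a toy model. This is
a sketch under stated assumptions and not OVS code: MacTable, hold_down and
replay are hypothetical names, the one-second hold-down is an arbitrary
choice, and the event times are the seconds fields of the log timestamps,
with an invented first event standing in for T1.

```python
class MacTable:
    """Toy MAC learning table with an optional hold-down after a move."""

    def __init__(self, hold_down=0.0):
        self.hold_down = hold_down   # seconds a moved MAC may not move back
        self.entries = {}            # (vlan, mac) -> (port, previous_port, moved_at)

    def learn(self, vlan, mac, port, now):
        """Record mac on port; return True if the table changed."""
        key = (vlan, mac)
        entry = self.entries.get(key)
        if entry is None:
            self.entries[key] = (port, None, now)
            return True
        current, previous, moved_at = entry
        if port == current:
            return False
        # Hold-down: refuse a move back to the previous port shortly after a move.
        if port == previous and now - moved_at < self.hold_down:
            return False
        self.entries[key] = (port, current, now)
        return True

    def port_of(self, vlan, mac):
        return self.entries[(vlan, mac)][0]


MAC = "02:11:61:61:70:25"
EVENTS = [
    (25.000, "eno3"),  # T1: traffic still on eno3 (time is an assumption)
    (25.179, "eno4"),  # T2: handler learns the new port
    (25.247, "eno3"),  # T3: revalidator replays stats for the old flow
    (25.249, "eno4"),  # T3: revalidator reaches the new flow
]


def replay(table):
    """Feed the events into the table and return the port after each one."""
    history = []
    for now, port in EVENTS:
        table.learn(4, MAC, port, now)
        history.append(table.port_of(4, MAC))
    return history


print(replay(MacTable()))               # MAC flaps back to eno3 at T3
print(replay(MacTable(hold_down=1.0)))  # stale relearn on eno3 is ignored
```

Without the hold-down the table briefly points back at eno3, matching the
revalidator56 log lines. The cost of this variant is that a genuine fail-back
to the previous port within the hold-down window would also be delayed.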