[ovs-discuss] Possible bug with OVS LACP + VPC
Ethan J. Jackson
ejj at eecs.berkeley.edu
Tue Jan 17 19:50:25 UTC 2017
Short answer is no, it doesn't ring a bell in particular. It definitely
doesn't sound like the expected behavior.
FWIW, based on my very rusty memory of the code, it sounds like one of two
things is happening:
1) The LACP protocol implementation itself has a bug in which it's sending
the incorrect state on the happy slave.
or 2) For some reason packets intended for just eth1 are getting broadcast
to eth0 as well because of how the OpenFlow is set up.
Those are just guesses though, sorry I can't be of more help.
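
To help check guess 1 against a capture, here is a rough sketch that decodes
the one-octet LACP actor state field and shows which flags differ between two
LACPDUs seen on the same slave. The bit layout is the standard one from
IEEE 802.1AX; the function names and the sample values 0x3D and 0x05 are
made up for illustration and are not taken from the capture in this thread.

```python
# Bit positions of the LACP actor/partner state octet (IEEE 802.1AX).
LACP_STATE_FLAGS = [
    (0x01, "activity"),
    (0x02, "timeout"),
    (0x04, "aggregation"),
    (0x08, "synchronization"),
    (0x10, "collecting"),
    (0x20, "distributing"),
    (0x40, "defaulted"),
    (0x80, "expired"),
]


def decode_lacp_state(state):
    """Return the set of flag names that are set in an LACP state octet."""
    if not 0 <= state <= 0xFF:
        raise ValueError("LACP state must fit in one octet")
    return {name for bit, name in LACP_STATE_FLAGS if state & bit}


def diff_lacp_state(before, after):
    """Return (flags only in `before`, flags only in `after`)."""
    flags_before = decode_lacp_state(before)
    flags_after = decode_lacp_state(after)
    return flags_before - flags_after, flags_after - flags_before


if __name__ == "__main__":
    # Hypothetical values: a healthy slave, then a LACPDU that has lost sync.
    lost, gained = diff_lacp_state(0x3D, 0x05)
    print("cleared:", sorted(lost))
    print("newly set:", sorted(gained))
```

If the LACPDU sent on eth0 right after eth1 comes back has the
synchronization, collecting, or distributing bits cleared compared with the
previous one, that would point toward the protocol implementation rather than
the OpenFlow setup.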
On Tue, Jan 17, 2017 at 9:10 AM, Ben Pfaff <blp at ovn.org> wrote:
> On Thu, Jan 12, 2017 at 06:36:52PM -0600, Chad Norgan wrote:
> > I've been doing some bug chasing around some unintended impacts we've
> > been noticing on our bonded hypervisors. The servers have a bond with
> > two slave interfaces each going to a different upstream switch which
> > have been configured with a Virtual PortChannel (VPC). To OVS, the VPC
> > configuration makes the switches appear as if they are a single device
> > with a single PortChannel. The configuration works great, but we have
> > noticed some unexpected data plane outages when interfaces come back
> > up, not when they go down.
> > For instance, if my server has eth0 and eth1 in a bond and I down the
> > link on eth1, everything is fine. When I re-enable eth1 and it starts
> > to negotiate LACP again, it causes eth0's LACP status to go
> > unsynchronized and stop passing traffic. I have packet captures for
> > this scenario here:
> > https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98.
> > If you look at lines 49-57 of the 2nd file, you can see that when the
> > 2nd interface is brought back online, a rogue LACPDU is sent out the
> > working slave interface with a LACP state that doesn't match the
> > current slave. The state mismatch then causes the switch to stop
> > forwarding and restart the LACP negotiation.
> > Does anyone have an idea on why this might be happening?
> Thank you for this bug report. This may explain certain other bug
> reports that we've received over the last few years that did not have
> enough information to follow up on. I guess that, probably, one of us
> will have to dive into it.
> However, before we do that--since I don't think that anyone on the OVS
> team currently has a good working understanding of LACP--I'd like to ask
> Ethan about it. Ethan, does this ring any bells for you? Do you have
> any idea where one might start looking on this?
Ethan J. Jackson