[ovs-discuss] Possible bug with OVS LACP + VPC

Fri Jan 13 00:36:52 UTC 2017

I've been chasing a bug that causes unintended outages on our bonded
hypervisors. Each server has a bond with two slave interfaces, each
going to a different upstream switch. The two switches are configured
with a Virtual PortChannel (VPC), which makes them appear to OVS as a
single device with a single PortChannel. The configuration works
great, but we have noticed unexpected data plane outages when
interfaces come back up, not when they go down.

For instance, if my server has eth0 and eth1 in a bond and I down the
link on eth1, everything is fine. When I re-enable eth1 and it starts
to negotiate LACP again, eth0's LACP state goes unsynchronized and
eth0 stops passing traffic. I have packet captures for this scenario
here:
https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98.
If you look at lines 49-57 of the 2nd file, you can see that when the
2nd interface is brought back online, a rogue LACPDU is sent out the
working slave interface with an LACP state that doesn't match that
slave's current state. The state mismatch then causes the switch to
stop forwarding and restart the LACP negotiation.
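
To make the state mismatch easier to spot when reading the captures,
here is a minimal Python sketch that decodes the LACP actor state byte
into its named flags and reports which flags changed between two
LACPDUs. The bit layout follows the standard LACP state field; the
function names and the two sample values (0x3d and 0x05) are
hypothetical and are not taken from the captures above.

```python
# LACP actor/partner state flags, least significant bit first,
# following the standard LACP state field layout.
LACP_STATE_FLAGS = (
    "activity",
    "timeout",
    "aggregation",
    "synchronization",
    "collecting",
    "distributing",
    "defaulted",
    "expired",
)


def decode_lacp_state(state):
    """Return the set of flag names that are set in an LACP state byte."""
    return {
        name
        for bit, name in enumerate(LACP_STATE_FLAGS)
        if state & (1 << bit)
    }


def diff_lacp_state(expected, observed):
    """Return (flags_lost, flags_gained) going from expected to observed."""
    before = decode_lacp_state(expected)
    after = decode_lacp_state(observed)
    return sorted(before - after), sorted(after - before)


if __name__ == "__main__":
    # Hypothetical values for illustration only:
    # 0x3d = activity, aggregation, synchronization, collecting, distributing
    # 0x05 = activity, aggregation
    lost, gained = diff_lacp_state(0x3D, 0x05)
    print("flags lost:  ", lost)
    print("flags gained:", gained)
```

Running this against the state byte of the last good LACPDU on eth0
and the state byte of the rogue one should show whether the
synchronization, collecting, or distributing flags were dropped.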

Does anyone have any idea why this might be happening?

Thanks in advance,
Chad