[ovs-discuss] Possible bug with OVS LACP + VPC

Ben Pfaff blp at ovn.org
Tue Jan 17 17:10:37 UTC 2017


On Thu, Jan 12, 2017 at 06:36:52PM -0600, Chad Norgan wrote:
> I've been doing some bug chasing around some unintended impacts we've
> been noticing on our bonded hypervisors. The servers have a bond with
> two slave interfaces, each going to a different upstream switch; the
> switches have been configured with a Virtual PortChannel (VPC). To OVS, the VPC
> configuration makes the switches appear as if they are a single device
> with a single PortChannel. The configuration works great, but we have
> noticed some unexpected data plane outages when interfaces come back
> up, not when they go down.
> 
> For instance, if my server has eth0 and eth1 in a bond and I down the
> link on eth1, everything is fine. When I re-enable eth1 and it starts
> to negotiate LACP again, it causes eth0's LACP status to go
> unsynchronized and stop passing traffic. I have packet captures for
> this scenario here:
> https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98.
> If you look at lines 49-57 of the 2nd file, you can see that when the
> 2nd interface is brought back online, a rogue LACPDU is sent out the
> working slave interface with a LACP state that doesn't match the
> current slave. The state mismatch then causes the switch to stop
> forwarding and restart the LACP negotiation.
> 
> Does anyone have an idea on why this might be happening?

Thank you for this bug report.  This may explain certain other bug
reports that we've received over the last few years that did not have
enough information to follow up on.  I guess that, probably, one of us
will have to dive into it.

However, before we do that--since I don't think that anyone on the OVS
team currently has a good working understanding of LACP--I'd like to ask
Ethan about it.  Ethan, does this ring any bells for you?  Do you have
any idea where one might start looking on this?
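
For anyone comparing the LACPDUs in those captures, here is a minimal
sketch of decoding the one-octet LACP state field into its flags, so
that a "state mismatch" between two PDUs can be read off directly.  It
assumes the standard IEEE 802.1AX bit layout; the function names and
the two sample values are hypothetical and are not taken from the gist
or from the OVS source.

```python
# Flag names for the LACP state octet as laid out in IEEE 802.1AX,
# listed from bit 0 (least significant) to bit 7.
LACP_STATE_FLAGS = (
    "activity",
    "timeout",
    "aggregation",
    "synchronization",
    "collecting",
    "distributing",
    "defaulted",
    "expired",
)


def decode_lacp_state(state):
    """Return the set of flag names that are set in an LACP state octet."""
    if not 0 <= state <= 0xFF:
        raise ValueError("LACP state must fit in one octet")
    return {name for bit, name in enumerate(LACP_STATE_FLAGS)
            if state & (1 << bit)}


def diff_lacp_state(before, after):
    """Return (flags cleared, flags newly set) between two state octets."""
    old, new = decode_lacp_state(before), decode_lacp_state(after)
    return sorted(old - new), sorted(new - old)


# Hypothetical sample values, NOT copied from the packet captures.
in_sync = 0x3D      # activity, aggregation, synchronization, collecting, distributing
out_of_sync = 0x05  # activity, aggregation only

cleared, newly_set = diff_lacp_state(in_sync, out_of_sync)
print("cleared:", cleared)
print("newly set:", newly_set)
```

Feeding it the actor state from the LACPDU sent just before eth1 comes
back and from the unexpected one sent on eth0 should show which flags
changed between them.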

Thanks,

Ben.

