<div dir="ltr">Short answer: no, this doesn't ring a bell in particular. It definitely doesn't sound like the expected behavior.<div><br></div><div>FWIW, based on my very rusty memory of the code, it sounds like one of two things is happening:</div><div><br></div><div>1) The LACP implementation itself has a bug that makes it send the incorrect state on the happy slave.</div><div><br></div><div>or 2) For some reason, packets intended for just eth1 are getting broadcast to eth0 as well because of how OpenFlow is set up.</div><div><br></div><div>Those are just guesses, though. Sorry I can't be of more help.</div><div><br></div><div>Ethan</div><div><br></div><div><br></div><div><br></div><div class="gmail_extra"><div class="gmail_quote">On Tue, Jan 17, 2017 at 9:10 AM, Ben Pfaff <span dir="ltr"><<a href="mailto:blp@ovn.org" target="_blank">blp@ovn.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, Jan 12, 2017 at 06:36:52PM -0600, Chad Norgan wrote:<br>
> I've been doing some bug chasing around some unintended impacts we've<br>
> been noticing on our bonded hypervisors. The servers have a bond with<br>
> two slave interfaces, each going to a different upstream switch; the<br>
> switches are configured with a Virtual PortChannel (VPC). To OVS, the VPC<br>
> configuration makes the switches appear as if they are a single device<br>
> with a single PortChannel. The configuration works great, but we have<br>
> noticed some unexpected data plane outages when interfaces come back<br>
> up, not when they go down.<br>
><br>
> For instance, if my server has eth0 and eth1 in a bond and I down the<br>
> link on eth1, everything is fine. When I re-enable eth1 and it starts<br>
> to negotiate LACP again, it causes eth0's LACP status to go<br>
> unsynchronized and stop passing traffic. I have packet captures for<br>
> this scenario here:<br>
> <a href="https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98" rel="noreferrer" target="_blank">https://gist.github.com/<wbr>beardymcbeards/<wbr>7bd9feca87c0574e996a397d90d5ff<wbr>98</a>.<br>
> If you look at lines 49-57 of the 2nd file, you can see that when the<br>
> 2nd interface is brought back online, a rogue LACPDU is sent out the<br>
> working slave interface with a LACP state that doesn't match the<br>
> current slave. The state mismatch then causes the switch to stop<br>
> forwarding and restart the LACP negotiation.<br>
><br>
> Does anyone have an idea on why this might be happening?<br>
<br>
Thank you for this bug report. This may explain certain other bug<br>
reports that we've received over the last few years that did not have<br>
enough information to follow up on. I guess that, probably, one of us<br>
will have to dive into it.<br>
<br>
However, before we do that--since I don't think that anyone on the OVS<br>
team currently has a good working understanding of LACP--I'd like to ask<br>
Ethan about it. Ethan, does this ring any bells for you? Do you have<br>
any idea where one might start looking on this?<br>
<br>
Thanks,<br>
<br>
Ben.<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Ethan J. Jackson</div><div><a href="http://ejj.sh" style="font-size:12.8px" target="_blank">ejj.sh</a></div></div></div></div></div>
</div></div>