[ovs-dev] [PATCH RFC v2] lacp: Prefer slaves with running partner when selecting lead

Andy Zhou azhou at nicira.com
Mon Aug 4 19:08:48 UTC 2014


Sorry it took a while to get back to you.  I am just coming up to
speed on the OVS LACP implementation, so my understanding may not be
correct.  Please feel free to point out anything I have wrong.

According to the Wikipedia MC-LAG entry, there is no standard for it;
implementations are mostly designed and built by vendors.

After reading through the commit message and comparing it with the
802.1AX spec, I feel this looks like a bug or a configuration issue
in the MC-LAG implementation. When the partner on port A comes back,
shouldn't it wait for MC-LAG sync before using the default profile to
exchange state with OVS?


On Mon, Jul 14, 2014 at 3:11 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
>> This patch modifies the LACP selection logic by preferring slaves with up and
>> running partners when looking for a lead.
>> That fixes the following scenario:
>> - bond has 2 ports, A and B, their other ends are in separate chassis with
>>   MC-LAG sync
>> - the partner of port A is restarted
>> - port B is still working
>> - the partner on port A comes back, but temporarily it is using a default
>>   config, as MC-LAG hasn't synced yet
>> - apparently that default config has a sys_priority which is smaller than the
>>   other, still running port, plus completely different sys_id
>> - therefore OVS chooses port A even though it will never come up into
>>   collecting-distributing state
>> - and port B is disabled, causing the whole bond to go down
>> Checking through the 802.1ax standard, when port A comes up again, the two
>> links fall apart due to the different LAG IDs. They should be attached to
>> different Aggregators, and the Aggregators should live separately. In OVS there
>> is no such concept as Aggregator, but I think it should be said that it has only
>> one Aggregator, and it has a unique policy to choose which ports can join.
>> Although changing the chassis' default config can also fix this, detecting
>> such problems is quite hard, therefore I think it is still valid to improve
>> things on the OVS side.
>> Btw. the Linux kernel bonding driver's LACP implementation allows more
>> aggregators, and therefore it could handle this situation properly.
>> Signed-off-by: Zoltan Kiss <zoltan.kiss at citrix.com>
> I verified that the unit tests still pass with this applied.
> Andy Zhou said he'd review the patch.
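
For reference, the selection rule described in the quoted commit message
could be sketched roughly as below. This is only an illustration, not the
code from the patch: struct member, better_lead() and choose_lead() are
hypothetical names, and the partner_running flag stands in for whatever
check the patch actually uses to decide that a partner is up. The sketch
assumes that a lower system priority value is preferred, as the commit
message implies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one bond member.  These are not the
 * structures used in the OVS LACP code. */
struct member {
    const char *name;
    uint16_t partner_sys_priority; /* Assumption: lower value is preferred. */
    bool partner_running;          /* Assumption: partner is up and running. */
};

/* Returns true if 'a' should be preferred over 'b' as lead.  A running
 * partner is compared first; priority is only used to break ties. */
static bool
better_lead(const struct member *a, const struct member *b)
{
    if (a->partner_running != b->partner_running) {
        return a->partner_running;
    }
    return a->partner_sys_priority < b->partner_sys_priority;
}

/* Returns the preferred lead among 'n' members, or NULL if 'n' is 0. */
static const struct member *
choose_lead(const struct member *members, size_t n)
{
    const struct member *lead = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!lead || better_lead(&members[i], lead)) {
            lead = &members[i];
        }
    }
    return lead;
}
```

With port A's partner not yet running at priority 100 and port B's partner
running at priority 200, this sketch picks B; comparing priority alone
would pick A, which is the failure in the scenario above.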
