[ovs-dev] [PATCH RFC v2] lacp: Prefer slaves with running partner when selecting lead

Flavio Leitner fbl at redhat.com
Wed Aug 6 21:31:37 UTC 2014


On Tue, Aug 05, 2014 at 09:44:40PM -0700, Ethan Jackson wrote:
> Based on my (long ago) reading of the LACP spec, only supporting a
> single aggregator is a valid configuration.  Furthermore, it's what

It is, no question about that.

> makes the most sense given the structure of the OVS bonding
> configuration.  I'd really rather not make a non-standard change to
> the protocol to support a buggy upstream MLAG implementation, because I
> don't know how it could affect other, less buggy switches.  My
> preference is to shelve this for now, FWIW.

Just to make it clear, I am talking about re-selection of the
aggregator, which is not specified in any standard, so it's not a
non-standard change.  A real-life example is the Linux bonding driver,
which has been doing that by default for years without issues.
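
To illustrate what such a "stable" re-selection policy amounts to, here
is a minimal Python sketch.  It is not OVS or bonding driver code: the
Port and Aggregator classes, select_stable(), and the simple "up" flag
are hypothetical names I am assuming for illustration only, and a real
implementation would look at LACP state rather than a boolean.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    name: str
    speed_mbps: int
    up: bool = True  # simplification: stands in for "slave is usable"


@dataclass
class Aggregator:
    lag_id: str  # hypothetical label for the partner's LAG ID
    ports: List[Port] = field(default_factory=list)

    def usable(self) -> bool:
        # Usable while at least one member port is up.
        return any(p.up for p in self.ports)

    def bandwidth(self) -> int:
        # Aggregate bandwidth counts only the ports that are up.
        return sum(p.speed_mbps for p in self.ports if p.up)


def select_stable(active: Optional[Aggregator],
                  candidates: List[Aggregator]) -> Optional[Aggregator]:
    """Keep the active aggregator while it is usable; otherwise pick the
    usable candidate with the largest aggregate bandwidth."""
    if active is not None and active.usable():
        return active  # no re-selection on transient partner changes
    usable = [a for a in candidates if a.usable()]
    if not usable:
        return None
    return max(usable, key=lambda a: a.bandwidth())


# Scenario from this thread: port B keeps working while the partner of
# port A returns with a different (default) LAG ID.
agg_b = Aggregator("synced-partner", [Port("B", 10000)])
agg_a = Aggregator("default-config", [Port("A", 10000)])
current = select_stable(agg_b, [agg_a, agg_b])  # stays on agg_b
```

With this policy the aggregator holding port B stays active as long as
port B is usable, so a partner that reappears with a better priority
cannot take the bond down on its own.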

Anyway, I am just commenting on this RFC; I have no intention of
pushing any patch for that now.

fbl

> 
> Ethan
> 
> On Tue, Aug 5, 2014 at 2:16 PM, Flavio Leitner <fbl at redhat.com> wrote:
> > On Mon, Aug 04, 2014 at 12:08:48PM -0700, Andy Zhou wrote:
> >> Zoltan,
> >>
> >> Sorry it took a while to get back to you.  I am just coming up to
> >> speed on the OVS LACP implementation, so my understanding may not be
> >> correct.  Please feel free to point it out if I am wrong.
> >>
> >> According to the Wikipedia MC-LAG entry, there is no standard for it;
> >> implementations are mostly designed and built by vendors.
> >>
> >> After reading through the commit message and comparing it with the
> >> 802.1AX spec, I feel this is a bug or configuration issue in the
> >> MC-LAG implementation.  When the partner on port A comes back,
> >> shouldn't it wait for MC-LAG sync before using the default profile
> >> to exchange state with OVS?
> >
> > I agree that it sounds like a problem in the MC-LAG.  However, I also
> > agree that OVS could do better.
> >
> > The aggregator selection policy is somewhat of a gray area, not defined
> > in any spec.  The bonding driver offers the ad_select= parameter, which
> > allows switching to a new aggregator only if, for instance, all the
> > ports in the active aggregator are down.
> >
> > The Team driver implementing 802.3ad also provides a selection policy
> > parameter.  The default is to consider the prio in the LACPDU, but you
> > can also tell it not to select any other aggregator while the current
> > one is still usable, or to select by bandwidth or by number of ports
> > available.
> >
> > My suggestion, if we want to change something, is to stick with the
> > bonding driver's default behavior for selecting a new aggregator:
> > """
> > stable or 0
> >
> >   The active aggregator is chosen by largest aggregate
> >   bandwidth.
> >
> >   Reselection of the active aggregator occurs only when all
> >   slaves of the active aggregator are down or the active
> >   aggregator has no slaves.
> >
> >   This is the default value.
> > """
> > Documentation/networking/bonding.txt
> >
> > That would avoid problems with transient states like the reported one.
> >
> > fbl
> >
> >> On Mon, Jul 14, 2014 at 3:11 PM, Ben Pfaff <blp at nicira.com> wrote:
> >> > On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
> >> >> This patch modifies the LACP selection logic by preferring slaves with up and
> >> >> running partners when looking for a lead.
> >> >> That fixes the following scenario:
> >> >> - bond has 2 ports, A and B, their other ends are in separate chassis with
> >> >>   MC-LAG sync
> >> >> - the partner of port A is restarted
> >> >> - port B is still working
> >> >> - the partner on port A comes back, but temporarily it is using a default
> >> >>   config, as MC-LAG hasn't synced yet
> >> >> - apparently that default config has a sys_priority which is smaller than that
> >> >>   of the other, still running port, plus a completely different sys_id
> >> >> - therefore OVS chooses port A even though it will never come up into
> >> >>   collecting-distributing state
> >> >> - and port B is disabled, causing the whole bond to go down
> >> >>
> >> >> Checking through the 802.1AX standard, when port A comes up again, the two
> >> >> links fall apart due to the different LAG IDs. They should be attached to
> >> >> different Aggregators, and the Aggregators should live separately. In OVS there
> >> >> is no such concept as an Aggregator, but I think it should be said that it has only
> >> >> one Aggregator, and it has a unique policy to choose which ports can join.
> >> >> Although changing the chassis' default config can also fix this, detecting
> >> >> such problems is quite hard, therefore I think it is still valid to improve things
> >> >> on the OVS side.
> >> >> Btw. the Linux kernel bonding driver's LACP implementation allows multiple
> >> >> aggregators, and therefore it could handle this situation properly.
> >> >>
> >> >> Signed-off-by: Zoltan Kiss <zoltan.kiss at citrix.com>
> >> >
> >> > I verified that the unit tests still pass with this applied.
> >> >
> >> > Andy Zhou said he'd review the patch.
> >> _______________________________________________
> >> dev mailing list
> >> dev at openvswitch.org
> >> http://openvswitch.org/mailman/listinfo/dev
> >>
> 


