[ovs-dev] [PATCH RFC] lacp: Prefer slaves with running partner when selecting lead

Anoob Soman anoob.soman at citrix.com
Wed Jul 2 12:54:21 UTC 2014


Hi Zoli,

On 01/07/14 20:52, Zoltan Kiss wrote:
> This patch modifies the LACP selection logic by prefering a slaves with up and
> running partners when looking for a lead.
> That fixes the following scenario:
> - bond has 2 ports, A and B, their other ends are in separate chassis with
>    MC-LAG sync
> - the partner of port A is restarted
> - port B is still working
> - the partner on port A comes back, but temporarily it is using a default
>    config, as MC-LAG haven't synced yet
> - apparently that default config has a sys_priority which is smaller than the
>    other, still running port, plus completely different sys_id
> - therefore OVS choose port A despite it won't ever comes up into
>    collecting-distributing state
> - and port B is disabled, causing the whole bond goes down
>
> Checking through the 802.1ax standard, when port A comes up again, the two
> links fall apart due to the different LAG IDs. They should be attached to
> different Aggregators, and the Aggregators should live separately. In OVS there
> is no such concept as Aggregator, but I think it should be said that it has only
> one Aggregator, and it has an unique policy to choose which ports can join.
> Although changing the chassis' default config can also fix this, detecting
> such problems quite hard, therefore I think it is still valid to improve things
> in OVS side.
> Btw. the Linux kernel bonding drivers' LACP implementation allows more
> aggregators, and therefore it could handle this situation properly.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss at citrix.com>
> ---
> diff --git a/lib/lacp.c b/lib/lacp.c
> index 0d30e51..da590a7 100644
> --- a/lib/lacp.c
> +++ b/lib/lacp.c
> @@ -579,6 +579,12 @@ lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
>       lacp_unlock();
>   }
>   
> +static bool
> +lacp_partner_ready(struct slave *slave)
> +{
> +    return slave->partner.state & (LACP_STATE_COL & LACP_STATE_DIST);
This should be (LACP_STATE_COL | LACP_STATE_DIST)
> +}
> +
>   /* Static Helpers. */
>   
>   /* Updates the attached status of all slaves controlled by 'lacp' and sets its
> @@ -616,7 +622,8 @@ lacp_update_attached(struct lacp *lacp) OVS_REQUIRES(mutex)
>           slave->attached = true;
>           slave_get_priority(slave, &pri);
>   
> -        if (!lead || memcmp(&pri, &lead_pri, sizeof pri) < 0) {
> +        if (!lead || (memcmp(&pri, &lead_pri, sizeof pri) < 0 &&
> +                      lacp_partner_ready(lead))) {
>               lead = slave;
>               lead_pri = pri;
>           }
We should pass slave instead of lead to lacp_partner_ready()




More information about the dev mailing list