[ovs-dev] [PATCH RFC] lacp: Prefer slaves with running partner when selecting lead
Anoob Soman
anoob.soman at citrix.com
Wed Jul 2 12:54:21 UTC 2014
Hi Zoli,
On 01/07/14 20:52, Zoltan Kiss wrote:
> This patch modifies the LACP selection logic by prefering a slaves with up and
> running partners when looking for a lead.
> That fixes the following scenario:
> - bond has 2 ports, A and B, their other ends are in separate chassis with
> MC-LAG sync
> - the partner of port A is restarted
> - port B is still working
> - the partner on port A comes back, but temporarily it is using a default
> config, as MC-LAG haven't synced yet
> - apparently that default config has a sys_priority which is smaller than the
> other, still running port, plus completely different sys_id
> - therefore OVS choose port A despite it won't ever comes up into
> collecting-distributing state
> - and port B is disabled, causing the whole bond goes down
>
> Checking through the 802.1ax standard, when port A comes up again, the two
> links fall apart due to the different LAG IDs. They should be attached to
> different Aggregators, and the Aggregators should live separately. In OVS there
> is no such concept as Aggregator, but I think it should be said that it has only
> one Aggregator, and it has an unique policy to choose which ports can join.
> Although changing the chassis' default config can also fix this, detecting
> such problems quite hard, therefore I think it is still valid to improve things
> in OVS side.
> Btw. the Linux kernel bonding drivers' LACP implementation allows more
> aggregators, and therefore it could handle this situation properly.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss at citrix.com>
> ---
> diff --git a/lib/lacp.c b/lib/lacp.c
> index 0d30e51..da590a7 100644
> --- a/lib/lacp.c
> +++ b/lib/lacp.c
> @@ -579,6 +579,12 @@ lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
> lacp_unlock();
> }
>
> +static bool
> +lacp_partner_ready(struct slave *slave)
> +{
> + return slave->partner.state & (LACP_STATE_COL & LACP_STATE_DIST);
This should be (LACP_STATE_COL | LACP_STATE_DIST)
> +}
> +
> /* Static Helpers. */
>
> /* Updates the attached status of all slaves controlled by 'lacp' and sets its
> @@ -616,7 +622,8 @@ lacp_update_attached(struct lacp *lacp) OVS_REQUIRES(mutex)
> slave->attached = true;
> slave_get_priority(slave, &pri);
>
> - if (!lead || memcmp(&pri, &lead_pri, sizeof pri) < 0) {
> + if (!lead || (memcmp(&pri, &lead_pri, sizeof pri) < 0 &&
> + lacp_partner_ready(lead))) {
> lead = slave;
> lead_pri = pri;
> }
We should pass slave instead of lead to lacp_partner_ready()
More information about the dev
mailing list