[ovs-discuss] OVS doesn't detach LACP port immediately when link down

Ethan Jackson ethan at nicira.com
Thu Jun 7 18:14:47 UTC 2012


The patch doesn't actually fix a problem; it just makes the LACP state
machine a little more straightforward.  For this reason we
decided not to backport it.  To be a little more specific:

Before this patch:
In an LACP bond, if you unplugged one of the slaves, the bond would
fail over as expected, but the output of ovs-appctl lacp/show would say
that the slave was healthy for longer than it should.  This doesn't
actually cause problems; it's just a bit strange.

After this patch:
ovs-appctl lacp/show reports the slave as down right away, as expected.

If the missing patch is causing actual problems for you beyond what I've
described above, we can reconsider backporting it.
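
To tell which case you're in, a check along these lines might help.  It
is only a sketch: bond0 is the bond name from your original report, and
OVS_APPCTL is just a shell variable made up for this snippet (it is not
an OVS setting).  It prints the bond module's view and the LACP module's
view one after the other so you can compare them after taking the
carrier down.

```shell
#!/bin/sh
# Sketch only. bond0 comes from the original report; override with BOND=...
# OVS_APPCTL is a convenience variable for this snippet, not an OVS setting.
BOND=${BOND:-bond0}
OVS_APPCTL=${OVS_APPCTL:-ovs-appctl}

show_bond_state() {
    # Bond module's view: this is what decides which slaves carry traffic.
    "$OVS_APPCTL" bond/show "$1"
    # LACP module's view: advisory only. Without the patch it can keep
    # listing a slave as healthy until the LACP timeout expires.
    "$OVS_APPCTL" lacp/show "$1"
}

if command -v "$OVS_APPCTL" >/dev/null 2>&1; then
    show_bond_state "$BOND"
else
    echo "$OVS_APPCTL not found; run this on the host running ovs-vswitchd" >&2
fi
```

With the carrier down, bond/show should mark the interface "disabled"
even if lacp/show still lists the slave as healthy.  If bond/show does
not mark it disabled, that goes beyond what I've described and is worth
following up on.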

Ethan

On Thu, Jun 7, 2012 at 6:10 AM, Dan Constantinescu
<dconstantinescu at rim.com> wrote:
> Hi Ethan,
>
> I'm looking at the recently announced 1.4.2 and 1.5.0 releases, and the patches you merged into master for this fix are not applied.
> Is there a particular reason for that, or were the releases perhaps tagged before you merged the patch?
>
> Thanks,
> dan
>
>
> -----Original Message-----
> From: Ethan Jackson [mailto:ethan at nicira.com]
> Sent: Tuesday, May 22, 2012 2:43 PM
> To: Dan Constantinescu
> Cc: bugs at openvswitch.org
> Subject: Re: [ovs-discuss] OVS doesn't detach LACP port immediately when link down
>
> Sounds good, glad to be helpful.
> Ethan
>
> On Tue, May 22, 2012 at 7:11 AM, Dan Constantinescu
> <dconstantinescu at rim.com> wrote:
>> Hi Ethan,
>>
>> I've applied the patch you indicated to the openvswitch-1.4.1 release and it works as expected.
>>
>> Thanks,
>> dan
>>
>>
>> -----Original Message-----
>> From: Dan Constantinescu
>> Sent: Thursday, May 17, 2012 4:44 PM
>> To: 'Ethan Jackson'
>> Cc: bugs at openvswitch.org
>> Subject: RE: [ovs-discuss] OVS doesn't detach LACP port immediately when link down
>>
>> Thanks for the reply Ethan - everything you've said makes sense.
>> I'll test with the code in branch-1.7 and let you know.
>>
>>
>>
>> -----Original Message-----
>> From: Ethan Jackson [mailto:ethan at nicira.com]
>> Sent: Wednesday, May 16, 2012 6:09 PM
>> To: Dan Constantinescu
>> Cc: bugs at openvswitch.org
>> Subject: Re: [ovs-discuss] OVS doesn't detach LACP port immediately when link down
>>
>> You're right that when an interface goes down, this should be
>> reflected in the LACP state machine, and this issue is fixed already
>> on master in the following commit:
>>
>> commit 3e5b3fdbf52cee41142ed8e2bf5cab9f49146d97
>> Author: Ethan Jackson <ethan at nicira.com>
>> Date:   Fri Mar 2 12:24:55 2012 -0800
>>
>>    lacp: Notify LACP module when carrier changes.
>>
>>    Without this patch, when a slave's carrier goes down, the LACP
>>    module (as evidenced by ovs-appctl lacp/show) would consider the
>>    slave current until it hadn't received LACP PDUs for the requisite
>>    amount of time.  It should instead, immediately mark the slave
>>    expired.  This shouldn't actually affect the behavior of LACP bonds
>>    because the bond module won't choose to send traffic out a slave
>>    whose carrier is down.
>>
>>    Signed-off-by: Ethan Jackson <ethan at nicira.com>
>>
>> However, this shouldn't actually be causing problems for you.  The
>> LACP module doesn't decide which interfaces are actually used in a
>> bond.  It only acts in an advisory role to the bonding module.  When
>> the carrier of the NIC goes down, the bond will stop forwarding
>> traffic.  You can verify that the bonding module notices the carrier
>> has changed using the ovs-appctl bond/show <bond_name> command.  When
>> you take the carrier down, the interface should be marked "disabled".
>>
>> So in summary: You are correct that the LACP module should be
>> detaching the link.  This is fixed on branch-1.7 and master of the
>> repository.  This should only be an aesthetic problem and should not
>> actually affect how traffic is forwarded.  If any of the previous
>> statements aren't true please follow up.
>>
>> Thanks,
>> Ethan
>>
>>
>>
>>
>>
>> On Wed, May 16, 2012 at 5:53 AM, Dan Constantinescu
>> <dconstantinescu at rim.com> wrote:
>>>
>>>
>>> * The Open vSwitch version number (as output by "ovs-vswitchd --version").
>>>
>>>
>>>
>>> ovs-vswitchd (Open vSwitch) 1.2.2
>>>
>>> Compiled Oct 18 2011 19:28:29
>>>
>>> OpenFlow versions 0x1:0x1
>>>
>>>
>>>
>>>
>>>
>>> * Upstream switch
>>>
>>>
>>>
>>> Cisco Nexus 7000
>>>
>>>
>>>
>>>
>>>
>>> * What you did that make the problem appear.
>>>
>>>
>>>
>>> - configured OVS with LACP bonding:
>>>
>>> ovs-vsctl add-bond br0 bond0 eth0 eth1 bond-mode=balance-tcp lacp=passive other_config:lacp-time=slow
>>>
>>>
>>>
>>> - turn an interface down (either at switch or server side):
>>>
>>> ifconfig eth0 down
>>>
>>>
>>>
>>> - verify the link is reported down by the OS:
>>>
>>> ethtool eth0
>>>
>>> Settings for eth0:
>>>
>>>         [...]
>>>
>>>         Link detected: no
>>>
>>>
>>>
>>>
>>>
>>> * What you expected to happen.
>>>
>>>
>>>
>>> If the status of a physical link is down, no matter what LACP timeout value
>>> I select, that port should be disabled and removed from the Link Aggregation
>>> group immediately.
>>>
>>>
>>>
>>>
>>>
>>> * What actually happened.
>>>
>>>
>>>
>>> - the port is not detached immediately, as reported by
>>> ovs-appctl lacp/show bond0
>>>
>>> - the server keeps trying to send packets over the eth0 data path
>>>
>>> - eth0 is eventually detached after the LACP timeout kicks in, which is
>>> about 90 seconds later.
>>>
>>> - the problem is mitigated by using lacp-time=fast, but this is not an
>>> option in our case because we lose Cisco ISSU support.
>>>
>>>
>>>
>>>
>>>
>>> * The kernel version on which Open vSwitch is running
>>>
>>>
>>>
>>> Linux version 3.0.0-12-server (buildd@crested) (gcc version 4.6.1
>>> (Ubuntu/Linaro 4.6.1-9ubuntu3) )
>>>
>>> Ubuntu 11.10
>>>
>>>
>>>
>>>
>>>
>>> * The output of "ovs-dpctl show".
>>>
>>>
>>>
>>> # ovs-dpctl show
>>> system@br0:
>>>         lookups: frags:0, hit:1463478513, missed:1213642, lost:852
>>>         port 0: br0 (internal)
>>>         port 1: eth1
>>>         port 2: eth0
>>>         port 3: br112 (internal)
>>>         port 95: tap3a0
>>>         port 96: tap3t1
>>>         port 97: tap4a0
>>>         port 98: tap4t1
>>>         port 112: vnet0
>>>         port 113: vnet1
>>>         port 114: vnet2
>>>         port 115: vnet3
>>>         port 116: vnet4
>>>         port 117: vnet5
>>>         port 118: vnet6
>>>         port 119: vnet7
>>>         port 120: vnet8
>>>         port 121: vnet9
>>>         port 122: vnet10
>>>         port 123: vnet11
>>>         port 124: vnet12
>>>         port 125: vnet13
>>>         port 126: vnet14
>>>         port 127: vnet15
>>>         port 128: vnet16
>>>         port 129: vnet17
>>>         port 130: vnet18
>>>         port 131: vnet19
>>>
>>>
>>>
>>>
>>>
>>> * A fix or workaround, if you have one.
>>>
>>>
>>>
>>> The problem is mitigated by using lacp-time=fast, but this is not an option
>>> in our case because we lose switch ISSU support.
>>>
>>>
>>>
>>>
>>>
>>> * Any other information that you think might be relevant.
>>>
>>>
>>>
>>> I tried both miimon and carrier for other_config:bond-detect-mode, with the
>>> same result. It seems that OVS link detection only kicks in when
>>> lacp-time expires, which is not what the LACP timeout is meant for.
>>>
>>
>>
>


