[ovs-discuss] LACP bonding issue

Jay Vosburgh jay.vosburgh at canonical.com
Tue Nov 25 18:15:30 UTC 2014

Nussbaum, Jacob <jdnussb at ilstu.edu> wrote:

>I’m sending this out again hoping someone has an idea, because this is
>baffling me. I have attached my configuration for both VMs.
>I’m trying to configure a bonded tunnel between the two VMs (two bridges,
>S1 and br0).
>Each time I set lacp=active, all of my links besides one are disabled. The
>bridges are running on two separate VMs using VXLAN tunnels.
>Anyone seen anything similar, or know of something that may correct this?

	I haven't used the OVS LACP implementation much, but I have some
experience with the bonding LACP implementation, and I see something in
your status that looks familiar, below.

>user at docker:~$ sudo ovs-vsctl show
>    Bridge "br0"
>        Port "bond0"
>            Interface "vxlan3"
>                type: vxlan
>                options: {remote_ip=""}
>            Interface "vxlan1"
>                type: vxlan
>                options: {remote_ip=""}
>            Interface "vxlan4"
>                type: vxlan
>                options: {remote_ip=""}
>            Interface "vxlan2"
>                type: vxlan
>                options: {remote_ip=""}
>        Port "vxlan5"
>            Interface "vxlan5"
>        Port "vxlan6"
>            Interface "vxlan6"
>        Port "br0"
>            Interface "br0"
>                type: internal
>        Port "vxlan8"
>            Interface "vxlan8"
>        Port "vxlan7"
>            Interface "vxlan7"
>    ovs_version: "2.0.2"
>user at docker:~$ sudo ovs-appctl bond/show bond0
>---- bond0 ----
>bond_mode: balance-tcp
>bond-hash-basis: 0
>updelay: 0 ms
>downdelay: 0 ms
>next rebalance: 2120 ms
>lacp_status: negotiated
>slave vxlan1: disabled
>        may_enable: false
>slave vxlan2: enabled
>        active slave
>        may_enable: true
>slave vxlan3: disabled
>        may_enable: false
>slave vxlan4: disabled
>        may_enable: false
>user at docker:~$ sudo ovs-appctl lacp/show
>---- bond0 ----
>        status: passive negotiated
>        sys_id: 9e:78:7e:1f:09:44
>        sys_priority: 65534
>        aggregation key: 1
>        lacp_time: slow
>slave: vxlan1: defaulted detached
>        port_id: 1
>        port_priority: 65535
>        may_enable: false
>        actor sys_id: 9e:78:7e:1f:09:44
>        actor sys_priority: 65534
>        actor port_id: 1
>        actor port_priority: 65535
>        actor key: 1
>        actor state: aggregation defaulted
>        partner sys_id: 00:00:00:00:00:00
>        partner sys_priority: 0
>        partner port_id: 0
>        partner port_priority: 0
>        partner key: 0
>        partner state:

	In my experience with the bonding LACP implementation, the above
(partner MAC, et al, all zeroes) indicates that the port in question is
not receiving LACPDUs from the link partner.

>slave: vxlan2: current attached
>        port_id: 2
>        port_priority: 65535
>        may_enable: true
>        actor sys_id: 9e:78:7e:1f:09:44
>        actor sys_priority: 65534
>        actor port_id: 2
>        actor port_priority: 65535
>        actor key: 1
>        actor state: aggregation synchronized collecting distributing
>        partner sys_id: aa:88:94:85:19:43
>        partner sys_priority: 65534
>        partner port_id: 6
>        partner port_priority: 65535
>        partner key: 5
>        partner state: activity aggregation synchronized collecting distributing

	This port is presumably exchanging LACPDUs correctly with the
link partner, because the partner values are filled in with specific
information that matches the vm2.txt you supplied (e.g., in vm2.txt,
port_id 6 shows this port as its partner).

	LACPDUs are sent as Ethernet multicasts to a specific
destination address (01:80:c2:00:00:02), so I would expect that for a
given configuration with similar ports, either all would be delivered or
none would be.  It is very curious that only one port functions; perhaps
something in the OVS or VXLAN forwarding is confused by the single
destination MAC address shared by all LACPDUs (I don't know; I'm just
speculating here).
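	One way to check whether the LACPDUs are actually arriving is to
capture on the underlay.  A rough sketch (interface name is a
placeholder; adjust for your setup):

```shell
# LACPDUs use EtherType 0x8809 with multicast destination
# 01:80:c2:00:00:02.  Since the bond slaves here are VXLAN tunnels,
# the LACPDUs travel encapsulated in the underlay's VXLAN traffic,
# so capture the VXLAN UDP port on the physical interface and let
# tcpdump decode the inner frames.  "eth0" is a placeholder for the
# VM's underlay interface; 4789 is the IANA VXLAN port, adjust if
# your tunnels are configured differently.
sudo tcpdump -nn -e -i eth0 'udp port 4789'
```

	If LACPDUs for all four tunnels show up here but only one slave
ever enables, that would point at the receive side rather than the
network.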

	There is a fallback mechanism in the 802.1AX standard that would
permit one port of an aggregator to function as an individual port when
no LACPDUs are exchanged at all, but that doesn't appear to be the case
here (as in that case there would be no partner sys_id, et al).

	I also notice that one bond (vm1.txt) is in LACP passive mode,
and the other (vm2.txt) is active.  In principle this ought to be fine
(the passive bond responding to received LACPDUs from the active bond),
but I would suggest setting both ends to LACP active and see if that
helps.  It might, and it shouldn't break anything.
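	Assuming the bond is named bond0 on both VMs, as in the output
above, switching the passive end to active would look something like
this:

```shell
# Run on each VM so that both ends of the aggregate initiate LACP
# negotiation rather than waiting for the peer.  "bond0" matches the
# port name shown in the ovs-vsctl output above.
sudo ovs-vsctl set port bond0 lacp=active
```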

	Also, setting the LACP rate (lacp_time) to fast instead of slow
should make things converge more quickly for testing purposes.
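	Again assuming the port name from the output above, that would
be something like:

```shell
# Request fast LACP timers (LACPDUs every 1 s instead of every 30 s),
# which makes the aggregate converge, and failures show up, much
# sooner while testing.
sudo ovs-vsctl set port bond0 other_config:lacp-time=fast
```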


	-Jay Vosburgh, jay.vosburgh at canonical.com
