[ovs-discuss] LACP bonding issue

Nussbaum, Jacob jdnussb at ilstu.edu
Sat Nov 29 03:59:37 UTC 2014


Thinking it may be a driver issue, I configured the same VMs on VirtualBox using Intel PRO/1000 MT Desktop (82540EM) adapters.  Sadly, the results were the same.  

Is there anything I can do with my configuration that may change the results?

Jacob

-----Original Message-----
From: Nussbaum, Jacob 
Sent: Tuesday, November 25, 2014 8:50 PM
To: Jay Vosburgh
Cc: 'discuss at openvswitch.org'
Subject: RE: [ovs-discuss] LACP bonding issue

Jay,
First off thank you for your response.  

I made the changes, and I still had the same issue.  At first I changed the setup slightly and set the interface type for the VXLAN tunnels to the devices they would be using for tunneling.  After that didn't work, I went back to the previous way I had done it.  
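For reference, the bond was created along these lines (reconstructed from memory rather than copied, with the remote_ip values matching the ovs-vsctl show output quoted below):

    sudo ovs-vsctl add-bond br0 bond0 vxlan1 vxlan2 vxlan3 vxlan4 \
        -- set interface vxlan1 type=vxlan options:remote_ip=10.0.0.10 \
        -- set interface vxlan2 type=vxlan options:remote_ip=10.0.0.11 \
        -- set interface vxlan3 type=vxlan options:remote_ip=10.0.0.12 \
        -- set interface vxlan4 type=vxlan options:remote_ip=10.0.0.13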

It seems to work when I leave LACP off and use balance-slb as my bond_mode, but again it only transmits traffic across one link at a time.  
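In case it's relevant, I've been toggling between the two modes with commands along these lines:

    sudo ovs-vsctl set port bond0 bond_mode=balance-slb lacp=off
    sudo ovs-vsctl set port bond0 bond_mode=balance-tcp lacp=active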

In case it helps, these VMs are Ubuntu 14.04 guests running on a Windows Server 2012 R2 Hyper-V host.  eth1, eth2, eth3, and eth4 on both VMs are each on their own separate network within Hyper-V.  I was thinking it might be a driver issue with Hyper-V, but wasn't sure if there was a mistake in my configuration.  

Jacob
________________________________________
From: Jay Vosburgh [jay.vosburgh at canonical.com]
Sent: Tuesday, November 25, 2014 12:15 PM
To: Nussbaum, Jacob
Cc: 'discuss at openvswitch.org'
Subject: Re: [ovs-discuss] LACP bonding issue

Nussbaum, Jacob <jdnussb at ilstu.edu> wrote:

>I'm sending this out again hoping someone has an idea because this is 
>baffling me. I have attached my configuration for both VMs.
>
>I'm trying to configure a bonded tunnel between the 2 VMs (2 bridges, 
>S1 and br0).
>
>Each time I set lacp=active, all of my links except one are disabled. The 
>bridges are running on two separate VMs using VXLAN tunnels.
>
>Anyone seen anything similar or know of something that may correct this 
>issue?

        I haven't used the OVS LACP implementation much, but I have some experience with the bonding LACP implementation, and I see something in your status that looks familiar, below.

>user at docker:~$ sudo ovs-vsctl show
>a1a5cdb9-0815-4a70-93f6-6d0eb8d6d32c
>    Bridge "br0"
>        Port "bond0"
>            Interface "vxlan3"
>                type: vxlan
>                options: {remote_ip="10.0.0.12"}
>            Interface "vxlan1"
>                type: vxlan
>                options: {remote_ip="10.0.0.10"}
>            Interface "vxlan4"
>                type: vxlan
>                options: {remote_ip="10.0.0.13"}
>            Interface "vxlan2"
>                type: vxlan
>                options: {remote_ip="10.0.0.11"}
>        Port "vxlan5"
>            Interface "vxlan5"
>        Port "vxlan6"
>            Interface "vxlan6"
>        Port "br0"
>            Interface "br0"
>                type: internal
>        Port "vxlan8"
>            Interface "vxlan8"
>        Port "vxlan7"
>            Interface "vxlan7"
>    ovs_version: "2.0.2"
>user at docker:~$ sudo ovs-appctl bond/show bond0
>---- bond0 ----
>bond_mode: balance-tcp
>bond-hash-basis: 0
>updelay: 0 ms
>downdelay: 0 ms
>next rebalance: 2120 ms
>lacp_status: negotiated
>
>slave vxlan1: disabled
>        may_enable: false
>
>slave vxlan2: enabled
>        active slave
>        may_enable: true
>
>slave vxlan3: disabled
>        may_enable: false
>
>slave vxlan4: disabled
>        may_enable: false
>
>user at docker:~$ sudo ovs-appctl lacp/show
>---- bond0 ----
>        status: passive negotiated
>        sys_id: 9e:78:7e:1f:09:44
>        sys_priority: 65534
>        aggregation key: 1
>        lacp_time: slow
>
>slave: vxlan1: defaulted detached
>        port_id: 1
>        port_priority: 65535
>        may_enable: false
>
>        actor sys_id: 9e:78:7e:1f:09:44
>        actor sys_priority: 65534
>        actor port_id: 1
>        actor port_priority: 65535
>        actor key: 1
>        actor state: aggregation defaulted
>
>        partner sys_id: 00:00:00:00:00:00
>        partner sys_priority: 0
>        partner port_id: 0
>        partner port_priority: 0
>        partner key: 0
>        partner state:

        In my experience with the bonding LACP implementation, the above (partner MAC, et al, all zeroes) indicates that the port in question is not receiving LACPDUs from the link partner.
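        A quick way to see which members are in that state is to list each slave alongside its partner sys_id; any member still showing an all-zeroes partner is not hearing from the other side.  Something like this (bond0 being the port name from your output):

        sudo ovs-appctl lacp/show bond0 | grep -E 'slave|partner sys_id'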

>slave: vxlan2: current attached
>        port_id: 2
>        port_priority: 65535
>        may_enable: true
>
>        actor sys_id: 9e:78:7e:1f:09:44
>        actor sys_priority: 65534
>        actor port_id: 2
>        actor port_priority: 65535
>        actor key: 1
>        actor state: aggregation synchronized collecting distributing
>
>        partner sys_id: aa:88:94:85:19:43
>        partner sys_priority: 65534
>        partner port_id: 6
>        partner port_priority: 65535
>        partner key: 5
>        partner state: activity aggregation synchronized collecting distributing

        This port is presumably exchanging LACPDUs correctly with the link partner, since its partner values are filled in with specific link partner information that matches the vm2.txt you supplied (e.g., in vm2.txt, port_id 6 shows this port as its partner).

        LACPDUs are sent as Ethernet multicasts to a specific destination, so I would expect that for a given configuration with similar ports, either all would be delivered or none would be.  This seems very curious in that only one port functions; perhaps something in the OVS or VXLAN forwarding is confused by the single MAC address for all LACPDUs (I don't know; I'm just speculating here).
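        If you want to see where they stop, it might be worth capturing on the underlay NICs on both sides and checking whether the encapsulated LACPDUs show up on all four paths or only one.  Assuming the default VXLAN UDP port of 4789 and the eth1-eth4 naming from your setup, something like:

        sudo tcpdump -e -ni eth1 udp port 4789

        repeated for each of eth1 through eth4 should make it clear which tunnels are actually carrying traffic.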

        There is a fallback mechanism in the 802.1AX standard that would permit one port of an aggregator to function as an individual port when no LACPDUs are exchanged at all, but that doesn't appear to be the case here (as in that case there would be no partner sys_id, et al).

        I also notice that one bond (vm1.txt) is in LACP passive mode, and the other (vm2.txt) is active.  In principle this ought to be fine (the passive bond responding to received LACPDUs from the active bond), but I would suggest setting both ends to LACP active and seeing if that helps.  It might, and it shouldn't break anything.
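        That would be something like the following on the passive side (the port is named bond0 in the output you sent):

        sudo ovs-vsctl set port bond0 lacp=active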

        Also, setting the LACP rate (lacp_time) to fast instead of slow should make things converge more quickly for testing purposes.
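        In OVS that is a per-port other_config setting, e.g.:

        sudo ovs-vsctl set port bond0 other_config:lacp-time=fast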

        -J

---
        -Jay Vosburgh, jay.vosburgh at canonical.com


