[ovs-discuss] openvswitch issue with lacp

Gernot Poerner gpo at spreadshirt.net
Tue Feb 23 09:40:00 UTC 2016


Hi there,

We currently use openvswitch in conjunction with our virtualization solution,
OpenNebula (http://www.opennebula.org).

Our setup runs on Ubuntu 14.04 on Dell R620 servers with 4x 10Gig network adapters,
connected to 2 Force Ten 4820T switches using VLT and a port channel in LACP (802.3ad)
mode, so the 2 interfaces of each bond are connected to 2 different switches.

We had this running without openvswitch in our former setup with linux
bonding/ifenslave like so:

/etc/network/interfaces

# The primary network interface
auto eth0
iface eth0 inet manual
    bond-master bond1

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth3
iface eth3 inet manual
    bond-master bond1

auto eth4
iface eth4 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address  192.168.1.10
    netmask 255.255.255.0
    bond-lacp-rate 1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100

auto bond1
iface bond1 inet static
    address 192.168.2.10
    netmask 255.255.255.0
    gateway 192.168.2.1
    bond-lacp-rate 1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100

This worked as expected and we never had issues with it.

We now have a similar setup configured with openvswitch, and /etc/network/interfaces
now looks like this:

allow-vmbr0 bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth1 eth4
  ovs_options bond_mode=balance-tcp lacp=active other-config:lacp-time=fast other_config:lacp-fallback-ab=true 

allow-vmbr1 bond1
iface bond1 inet manual
  ovs_bridge vmbr1
  ovs_type OVSBond
  ovs_bonds eth0 eth3
  ovs_options bond_mode=balance-tcp lacp=active other-config:lacp-time=fast other_config:lacp-fallback-ab=true

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports bond0 vlan-pub

auto vmbr1
allow-ovs vmbr1
iface vmbr1 inet manual
  ovs_type OVSBridge
  ovs_ports bond1 vlan-prv

allow-vmbr0 vlan-pub
iface vlan-pub inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  address 192.168.1.10
  netmask 255.255.255.0
  
allow-vmbr1 vlan-prv
iface vlan-prv inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr1
  address 192.168.2.10
  netmask 255.255.255.0
  gateway 192.168.2.1


This also works as expected until it suddenly stops working. At random times, these nodes
lose their network connection completely, sometimes on only one bond interface, sometimes
on both. All the VMs running on an affected node also lose their connections.

As far as I can see, it is also not really related to the amount of traffic: there are
low-traffic machines which stop earlier than high-traffic ones.

After a restart of openvswitch, everything works again until it stops the next time.
This is very annoying since, in the worst case, we have to connect to the console via DRAC
to fix the issue and get everything working again.

The ovs-vswitchd log files also don't give much information; I can provide one if needed.
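
Since the log says little, the bond and LACP state at the moment of failure is probably more useful. A minimal sketch of how it could be captured before restarting openvswitch, assuming the bond names bond0 and bond1 from the configuration above. dump_bond_state is a hypothetical helper name; ovs-appctl bond/show, ovs-appctl lacp/show and ovs-vsctl list port are standard Open vSwitch commands.

```shell
# Hypothetical helper: print bond, LACP and Port-table state for each bond given.
# Each command's output is preceded by a "### <command>" header line.
dump_bond_state() {
    for bond in "$@"; do
        for cmd in "ovs-appctl bond/show $bond" \
                   "ovs-appctl lacp/show $bond" \
                   "ovs-vsctl list port $bond"; do
            echo "### $cmd"
            # $cmd is intentionally unquoted so it splits into command and arguments.
            $cmd 2>&1 || echo "(command failed)"
        done
    done
}
```

It could be called as `dump_bond_state bond0 bond1 > /tmp/bond-state.txt` on an affected node. More verbose logging for the relevant modules can probably be enabled with `ovs-appctl vlog/set lacp:file:dbg` and `ovs-appctl vlog/set bond:file:dbg`.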

What we have already tried to fix this:

- Upgraded openvswitch from 2.3.1-1 to 2.4.0-1 -> no effect
- Use balance-slb instead of balance-tcp -> no effect
- Use ovs_options bond_mode=active-backup -> this is our current workaround, all our machines are currently set to active failover and in this mode we have no problems. 
- Upgraded switch firmware to latest version -> no effect
- Today I even tried an openvswitch snapshot from git (cc245ce87d3de9c2a66ee42719ab413e464fb2de) -> the upgrade first broke the network connection and I had issues with starting/restarting, so I did not pursue this any further

output of ovs-vswitchd --version

ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Oct  5 2015 11:12:38

Output of cat /proc/version

Linux version 3.13.0-63-generic (buildd@lgw01-18) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015

output from ovs-dpctl show
system@ovs-system:
	lookups: hit:30279107869 missed:103538298 lost:94
	flows: 430
	masks: hit:106941297278 total:4 hit/pkt:3.52
	port 0: ovs-system (internal)
	port 1: vlan-prv (internal)
	port 2: eth4
	port 3: eth2
	port 4: bond1 (internal)
	port 5: vmbr1 (internal)
	port 6: vlan-pub (internal)
	port 7: vmbr0 (internal)
	port 8: eth5
	port 9: eth3
	port 10: bond0 (internal)
	port 11: vnet0
	port 12: vnet1
	port 13: vnet2
	port 14: vnet3
	port 15: vnet4
	port 16: vnet5
	port 17: vnet6
	port 18: vnet7
	port 19: vnet8
	port 20: vnet9
	port 21: vnet10
	port 22: vnet11
	port 23: vnet12
	port 24: vnet13
	port 25: vnet14
	port 26: vnet15
	port 27: vnet16
	port 28: vnet17
	port 29: vnet18
	port 30: vnet19
	port 31: vnet20
	port 32: vnet21
	port 33: vnet22
	port 34: vnet23
	port 35: vnet24
	port 36: vnet25
	port 37: vnet26
	port 38: vnet27
	port 39: vnet28
	port 40: vnet29
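
As a rough reading aid for the lookup counters above, here is a small sketch that computes the share of packets that missed the kernel flow table, i.e. missed / (hit + missed) as a percentage. miss_ratio is a hypothetical helper name, and it assumes the "lookups:" line format shown in this output.

```shell
# Hypothetical helper: read "ovs-dpctl show" output on stdin and print the
# datapath miss ratio in percent, computed as 100 * missed / (hit + missed).
miss_ratio() {
    awk '/lookups:/ {
        for (i = 1; i <= NF; i++) {
            split($i, kv, ":")            # fields look like "hit:30279107869"
            if (kv[1] == "hit")    hit = kv[2]
            if (kv[1] == "missed") missed = kv[2]
        }
        if (hit + missed > 0) printf "%.2f\n", 100 * missed / (hit + missed)
    }'
}
```

Used as `ovs-dpctl show | miss_ratio`. With the counters above this comes out well below one percent.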

I did not attach the contents of /etc/openvswitch/conf.db yet as this is currently 
~200k and it's also running in active-backup mode right now anyway. I can provide 
this if needed.

Thanks a lot for having a look into this

Gernot Poerner


