[ovs-discuss] openvswitch issue with lacp
Gernot Poerner
gpo at spreadshirt.net
Tue Feb 23 09:40:00 UTC 2016
Hi there,
We currently use openvswitch together with our virtualization solution,
OpenNebula (http://www.opennebula.org).
Our setup runs Ubuntu 14.04 on Dell R620 servers, each with four 10GbE network adapters
connected to two Force Ten 4820T switches. The switches use VLT and a port channel in
LACP (802.3ad) mode, so each pair of interfaces is split across the two switches.
Our previous setup ran without openvswitch, using Linux
bonding/ifenslave like this:
/etc/network/interfaces

# The primary network interface
auto eth0
iface eth0 inet manual
    bond-master bond1

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth3
iface eth3 inet manual
    bond-master bond1

auto eth4
iface eth4 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    bond-lacp-rate 1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100

auto bond1
iface bond1 inet static
    address 192.168.2.10
    netmask 255.255.255.0
    gateway 192.168.2.1
    bond-lacp-rate 1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100
This worked as expected and we never had issues with it.
We now have a similar setup configured with openvswitch, and /etc/network/interfaces
now looks like this:
allow-vmbr0 bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eth1 eth4
    ovs_options bond_mode=balance-tcp lacp=active other-config:lacp-time=fast other_config:lacp-fallback-ab=true

allow-vmbr1 bond1
iface bond1 inet manual
    ovs_bridge vmbr1
    ovs_type OVSBond
    ovs_bonds eth0 eth3
    ovs_options bond_mode=balance-tcp lacp=active other-config:lacp-time=fast other_config:lacp-fallback-ab=true

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 vlan-pub

auto vmbr1
allow-ovs vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports bond1 vlan-prv

allow-vmbr0 vlan-pub
iface vlan-pub inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    address 192.168.1.10
    netmask 255.255.255.0

allow-vmbr1 vlan-prv
iface vlan-prv inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    address 192.168.2.10
    netmask 255.255.255.0
    gateway 192.168.2.1
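For comparison, the bond0 stanza above should correspond roughly to the ovs-vsctl calls below. This is a sketch, assuming the ifupdown integration hands ovs_bonds and ovs_options to ovs-vsctl add-bond more or less verbatim. The create_lacp_bond function and the VSCTL variable are hypothetical helpers, not part of openvswitch; setting VSCTL=echo turns the function into a dry run.

```shell
# Hypothetical helper: create an LACP bond the way the "bond0" stanza does.
# VSCTL defaults to ovs-vsctl; set VSCTL=echo to print the commands instead.
create_lacp_bond() {
    bridge=$1
    port=$2
    shift 2                      # remaining arguments are the member interfaces
    vsctl=${VSCTL:-ovs-vsctl}
    "$vsctl" --may-exist add-br "$bridge"
    # --may-exist leaves an already existing bond untouched (options included).
    "$vsctl" --may-exist add-bond "$bridge" "$port" "$@" \
        bond_mode=balance-tcp lacp=active \
        other_config:lacp-time=fast \
        other_config:lacp-fallback-ab=true
}

# Example: create_lacp_bond vmbr0 bond0 eth1 eth4
```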
This also works as expected, until it suddenly stops working. At random, the nodes lose
their network connection completely, sometimes on only one bond interface and sometimes
on both. All the VMs running on an affected node lose their connections as well.
As far as I can see, it is not related to the amount of traffic either: some
low-traffic machines stop earlier than high-traffic ones.
After a restart of openvswitch, everything starts working again until it stops the next time.
This is very annoying, since in the worst case we have to connect to the console via DRAC
to fix the issue and get everything working again.
The ovs-vswitchd log files don't give much information either; I can provide one if needed.
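Since the state is gone after a restart, it may help to capture the bond and LACP state on an affected node before restarting openvswitch. A minimal sketch, assuming the bond port names bond0 and bond1 from the configuration above; collect_lacp_state and OVS_APPCTL are hypothetical names, while bond/show and lacp/show are existing ovs-appctl commands.

```shell
# Hypothetical helper: dump bond and LACP state for each given bond port.
# OVS_APPCTL defaults to ovs-appctl; set OVS_APPCTL=echo for a dry run.
collect_lacp_state() {
    appctl=${OVS_APPCTL:-ovs-appctl}
    for port in "$@"; do
        echo "=== $port ==="
        "$appctl" bond/show "$port"   # bond mode, LACP status, per-member state
        "$appctl" lacp/show "$port"   # actor and partner LACP details
    done
}

# Example, on an affected node before the restart:
#   collect_lacp_state bond0 bond1 > /tmp/lacp-state.txt
```

To get more detail into the log file than the defaults provide, debug logging for the relevant modules can be enabled beforehand with ovs-appctl vlog/set lacp:file:dbg and ovs-appctl vlog/set bond:file:dbg.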
What we have already tried to fix this:
- Upgraded openvswitch from 2.3.1-1 to 2.4.0-1 -> no effect
- Used balance-slb instead of balance-tcp -> no effect
- Used ovs_options bond_mode=active-backup -> this is our current workaround; all our machines are currently set to active-backup failover, and in this mode we have no problems.
- Upgraded switch firmware to latest version -> no effect
- Today I even tried an openvswitch snapshot from git (cc245ce87d3de9c2a66ee42719ab413e464fb2de) -> the upgrade broke the network connection at first and I had issues with starting/restarting, so I did not pursue this any further
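When comparing modes, the bond mode can also be switched on a running system with ovs-vsctl instead of editing /etc/network/interfaces and restarting. A sketch under the same assumptions as above: set_bond_mode and VSCTL are hypothetical helpers, and this only changes the Open vSwitch database, leaving /etc/network/interfaces as it is.

```shell
# Hypothetical helper: change the bond_mode of an existing OVS bond port.
# VSCTL defaults to ovs-vsctl; set VSCTL=echo for a dry run.
set_bond_mode() {
    vsctl=${VSCTL:-ovs-vsctl}
    case $2 in
        active-backup|balance-slb|balance-tcp) ;;   # the modes tried above
        *) echo "unsupported bond_mode: $2" >&2; return 1 ;;
    esac
    "$vsctl" set port "$1" bond_mode="$2"
}

# Example: set_bond_mode bond0 active-backup
```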
Output of ovs-vswitchd --version:
ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Oct 5 2015 11:12:38
Output of cat /proc/version:
Linux version 3.13.0-63-generic (buildd@lgw01-18) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015
Output of ovs-dpctl show:
system@ovs-system:
    lookups: hit:30279107869 missed:103538298 lost:94
    flows: 430
    masks: hit:106941297278 total:4 hit/pkt:3.52
    port 0: ovs-system (internal)
    port 1: vlan-prv (internal)
    port 2: eth4
    port 3: eth2
    port 4: bond1 (internal)
    port 5: vmbr1 (internal)
    port 6: vlan-pub (internal)
    port 7: vmbr0 (internal)
    port 8: eth5
    port 9: eth3
    port 10: bond0 (internal)
    port 11: vnet0
    port 12: vnet1
    port 13: vnet2
    port 14: vnet3
    port 15: vnet4
    port 16: vnet5
    port 17: vnet6
    port 18: vnet7
    port 19: vnet8
    port 20: vnet9
    port 21: vnet10
    port 22: vnet11
    port 23: vnet12
    port 24: vnet13
    port 25: vnet14
    port 26: vnet15
    port 27: vnet16
    port 28: vnet17
    port 29: vnet18
    port 30: vnet19
    port 31: vnet20
    port 32: vnet21
    port 33: vnet22
    port 34: vnet23
    port 35: vnet24
    port 36: vnet25
    port 37: vnet26
    port 38: vnet27
    port 39: vnet28
    port 40: vnet29
I have not attached the contents of /etc/openvswitch/conf.db yet, as it is currently
~200k and the system is running in active-backup mode right now anyway. I can provide
it if needed.
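Instead of the whole ~200k conf.db, the records that matter for the bonds could be extracted with ovs-vsctl list. A sketch: dump_bond_records and VSCTL are hypothetical helpers, and the member interface names (for example eth1 eth4 for bond0, per the configuration above) have to be passed explicitly.

```shell
# Hypothetical helper: print the OVSDB rows for one bond port and its members.
# VSCTL defaults to ovs-vsctl; set VSCTL=echo for a dry run.
dump_bond_records() {
    vsctl=${VSCTL:-ovs-vsctl}
    port=$1
    shift                               # remaining arguments: member interfaces
    "$vsctl" list port "$port"          # bond_mode, lacp, other_config, ...
    "$vsctl" list interface "$@"        # link_state, lacp_current, ...
}

# Example: dump_bond_records bond0 eth1 eth4
```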
Thanks a lot for looking into this.
Gernot Poerner