[ovs-discuss] OVS 2.3.3 to OVS 2.5.1 Upgrade - now seeing random connectivity issues
Kris G. Lindgren
klindgren at godaddy.com
Fri Feb 24 01:08:57 UTC 2017
Hello all,
We are trying to track down a problem that started after a recent OVS update in our OpenStack environment. We updated from OVS 2.3.3 to OVS 2.5.1, and since then servers and VMs have been dropping off the network. On the switches we see bursts of mac_move notifications, sometimes up to 26k per second, which cause the switch to go into defense mode and disable MAC learning. Most of the time we see only a few MAC moves per minute, when we should be seeing exactly 0. Our networking team believes that either there is a temporary loop in the network or the HVs are somehow forwarding broadcast packets that they shouldn’t be. The only thing that we see around the time of an event is the following:
2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22
2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19
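To put numbers on the flapping, the switch log can be tallied per MAC and VLAN. The following is only a rough sketch in Python: the regular expression is written against the two log lines above, and the name `count_mac_moves` is made up for illustration.

```python
import re
from collections import Counter

# Pattern written against the MAC-move syslog lines quoted above.
MAC_MOVE_RE = re.compile(
    r"%FWM-6-MAC_MOVE_NOTIFICATION: Host (?P<mac>[0-9a-f.]+) "
    r"in vlan (?P<vlan>\d+) is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)

def count_mac_moves(lines):
    """Count move notifications per (mac, vlan); lines that do not match are skipped."""
    moves = Counter()
    for line in lines:
        m = MAC_MOVE_RE.search(line)
        if m:
            moves[(m.group("mac"), int(m.group("vlan")))] += 1
    return moves

sample = [
    "2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22",
    "2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19",
]

for (mac, vlan), n in count_mac_moves(sample).items():
    print(f"{mac} vlan {vlan}: {n} moves")
```

Run over a full log, this should show whether the moves are concentrated on a few VM MACs or spread across every host behind a given port-channel.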
TCPDUMPS:
12:11:20.374794 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.374941 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.376145 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:21.374628 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:21.375057 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:22.374578 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
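The first two ARP requests in the capture are identical and arrive about 147 microseconds apart, which is the sort of thing a reflected or duplicated frame would look like. A small Python sketch for spotting such pairs in a longer capture follows. The 1 ms window is an arbitrary assumption, and depending on where tcpdump was run (for example on a device that sees both bond members), a repeat may be legitimate.

```python
import re
from datetime import datetime, timedelta

# tcpdump line: "HH:MM:SS.micros <rest of frame description>"
LINE_RE = re.compile(r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) (?P<frame>.+)$")

def find_duplicate_frames(lines, window=timedelta(milliseconds=1)):
    """Return (first_ts, second_ts, frame) for identical frames seen within `window`."""
    last_seen = {}   # frame text -> (parsed timestamp, timestamp string)
    duplicates = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        frame = m.group("frame")
        prev = last_seen.get(frame)
        if prev is not None and ts - prev[0] <= window:
            duplicates.append((prev[1], m.group("ts"), frame))
        last_seen[frame] = (ts, m.group("ts"))
    return duplicates

ARP_REQ = ("fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), "
           "length 64: vlan 413, p 0, ethertype ARP, "
           "Request who-has 10.198.39.254 tell 10.198.38.178, length 46")
ARP_REPLY = ("00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), "
             "length 64: vlan 413, p 0, ethertype ARP, "
             "Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46")
sample = [
    "12:11:20.374794 " + ARP_REQ,
    "12:11:20.374941 " + ARP_REQ,
    "12:11:20.376145 " + ARP_REPLY,
    "12:11:21.374628 " + ARP_REQ,
]

for first, second, frame in find_duplicate_frames(sample):
    print(f"duplicate at {first} and {second}: {frame[:40]}...")
```

The request repeated one second later is the VM's normal ARP retry and is not flagged, since it falls outside the window.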
Output of ovs-vsctl:
ac83a7ff-0157-437c-bfba-8c038ec77c74
    Bridge br-ext
        Port br-ext
            Interface br-ext
                type: internal
        Port "bond0"
            Interface "p3p1"
            Interface "p3p2"
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "ext-vlan-215"
            tag: 215
            Interface "ext-vlan-215"
                type: patch
                options: {peer="br215-ext"}
    Bridge br-int
        fail_mode: secure
        Port "int-br215"
            Interface "int-br215"
                type: patch
                options: {peer="phy-br215"}
        Port "qvo99ae272d-f8"
            tag: 1
            Interface "qvo99ae272d-f8"
        Port "qvo1d5492c0-df"
            tag: 1
            Interface "qvo1d5492c0-df"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo6b7f3219-90"
            tag: 1
            Interface "qvo6b7f3219-90"
        Port "qvo3b4f81ed-f4"
            tag: 1
            Interface "qvo3b4f81ed-f4"
    Bridge "br215"
        Port "br215"
            Interface "br215"
                type: internal
        Port "phy-br215"
            Interface "phy-br215"
                type: patch
                options: {peer="int-br215"}
        Port "br215-ext"
            Interface "br215-ext"
                type: patch
                options: {peer="ext-vlan-215"}
    ovs_version: "2.5.1"
# ovs-appctl bond/show
---- bond0 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 2426 ms
lacp_status: negotiated
active slave mac: 00:8c:fa:eb:2b:74(p3p1)
slave p3p1: enabled
    active slave
    may_enable: true
    hash 140: 154 kB load
slave p3p2: enabled
    may_enable: true
    hash 199: 69 kB load
    hash 220: 40 kB load
    hash 234: 21 kB load
# ovs-appctl lacp/show
---- bond0 ----
    status: active negotiated
    sys_id: 00:8c:fa:eb:2b:74
    sys_priority: 65534
    aggregation key: 9
    lacp_time: slow
slave: p3p1: current attached
    port_id: 9
    port_priority: 65535
    may_enable: true
    actor sys_id: 00:8c:fa:eb:2b:74
    actor sys_priority: 65534
    actor port_id: 9
    actor port_priority: 65535
    actor key: 9
    actor state: activity aggregation synchronized collecting distributing
    partner sys_id: 02:1c:73:87:60:cd
    partner sys_priority: 32768
    partner port_id: 52
    partner port_priority: 32768
    partner key: 52
    partner state: activity aggregation synchronized collecting distributing
slave: p3p2: current attached
    port_id: 10
    port_priority: 65535
    may_enable: true
    actor sys_id: 00:8c:fa:eb:2b:74
    actor sys_priority: 65534
    actor port_id: 10
    actor port_priority: 65535
    actor key: 9
    actor state: activity aggregation synchronized collecting distributing
    partner sys_id: 02:1c:73:87:60:cd
    partner sys_priority: 32768
    partner port_id: 32820
    partner port_priority: 32768
    partner key: 52
    partner state: activity aggregation synchronized collecting distributing
The server is connected to a Nexus 3000 switch with vPC enabled; the bond is configured with LACP in balance-slb mode. mgmt0 has the HV’s management IP assigned to it. We create the br<vlan> bridges and add the patch ports between br-ext and br<vlan>. The Neutron openvswitch agent configures br-int and adds the patch ports between br<vlan> and br-int, along with creating any tap devices.
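For anyone trying to reproduce the layout, the per-VLAN bridge and its patch-port pair toward br-ext can be expressed as ovs-vsctl commands. The Python sketch below only builds and prints the command lines; it is reconstructed from the ovs-vsctl output above and is not our actual provisioning script. The function name and the use of `--may-exist` are assumptions.

```python
def vlan_bridge_commands(vlan, ext_bridge="br-ext"):
    """Build ovs-vsctl argument lists for br<vlan> and its patch pair to ext_bridge."""
    bridge = f"br{vlan}"
    ext_port = f"ext-vlan-{vlan}"   # patch port on br-ext, tagged with the VLAN
    br_port = f"br{vlan}-ext"       # peer patch port on br<vlan>
    return [
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        ["ovs-vsctl", "--may-exist", "add-port", ext_bridge, ext_port, f"tag={vlan}",
         "--", "set", "Interface", ext_port, "type=patch", f"options:peer={br_port}"],
        ["ovs-vsctl", "--may-exist", "add-port", bridge, br_port,
         "--", "set", "Interface", br_port, "type=patch", f"options:peer={ext_port}"],
    ]

# Print the commands instead of running them, so they can be reviewed first.
for cmd in vlan_bridge_commands(215):
    print(" ".join(cmd))
```

The patch ports between br<vlan> and br-int (phy-br215 and int-br215 in the output above) are left out on purpose, since the Neutron agent creates those.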
The configured OpenFlow entries for each bridge are as follows:
# ovs-ofctl dump-flows br-ext
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=713896.614s, table=0, n_packets=1369078301, n_bytes=130805436786, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
cookie=0xb367eed8ac0e9e7d, duration=713933.475s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=2,icmp_type=136 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713932.943s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=3,icmp_type=136 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713929.414s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=5,icmp_type=136 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713928.888s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=4,icmp_type=136 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713933.280s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,arp,in_port=2 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713932.660s, table=0, n_packets=149398, n_bytes=6274716, idle_age=4, hard_age=65534, priority=10,arp,in_port=3 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713929.218s, table=0, n_packets=102577, n_bytes=4308234, idle_age=7, hard_age=65534, priority=10,arp,in_port=5 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713928.620s, table=0, n_packets=61321, n_bytes=2575482, idle_age=8, hard_age=65534, priority=10,arp,in_port=4 actions=resubmit(,24)
cookie=0xb367eed8ac0e9e7d, duration=713935.656s, table=0, n_packets=1274428312, n_bytes=105873932966, idle_age=0, hard_age=65534, priority=3,in_port=1,vlan_tci=0x0000 actions=mod_vlan_vid:1,NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713945.070s, table=0, n_packets=7817, n_bytes=707680, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
cookie=0xb367eed8ac0e9e7d, duration=713945.999s, table=0, n_packets=82510417, n_bytes=17955154731, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713945.936s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0xb367eed8ac0e9e7d, duration=713933.544s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=2,icmp_type=136,nd_target=fe80::f816:3eff:fe49:4dff actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713933.009s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=3,icmp_type=136,nd_target=fe80::f816:3eff:fec7:82b9 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713929.482s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=5,icmp_type=136,nd_target=fe80::f816:3eff:fe07:d92e actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713928.951s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=4,icmp_type=136,nd_target=fe80::f816:3eff:fe17:9919 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713933.410s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.87.153 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713933.344s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.52.87 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713932.877s, table=24, n_packets=149394, n_bytes=6274548, idle_age=4, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.53.163 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713932.807s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.208 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713932.728s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.209 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713929.349s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.85.218 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713929.284s, table=24, n_packets=102573, n_bytes=4308066, idle_age=7, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.53.86 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713928.817s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.87.99 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713928.752s, table=24, n_packets=61317, n_bytes=2575314, idle_age=8, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.53.197 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713928.686s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=198.71.248.104 actions=NORMAL
cookie=0xb367eed8ac0e9e7d, duration=713945.871s, table=24, n_packets=16, n_bytes=672, idle_age=65534, hard_age=65534, priority=0 actions=drop
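Most of the br-int entries have never matched a packet, so it can help to filter the dump down to flows with non-zero counters. This is a minimal Python sketch, assuming the dump-flows line format shown above; `active_flows` is a hypothetical helper name.

```python
import re

# Pulls table, packet count, match (starting at priority=) and actions from one line.
FLOW_RE = re.compile(
    r"table=(?P<table>\d+), n_packets=(?P<n_packets>\d+),.*?"
    r"(?P<match>priority=\S+) actions=(?P<actions>\S+)"
)

def active_flows(lines):
    """Return (table, n_packets, match, actions) for flows that have matched packets."""
    result = []
    for line in lines:
        m = FLOW_RE.search(line)
        if m and int(m.group("n_packets")) > 0:
            result.append((int(m.group("table")), int(m.group("n_packets")),
                           m.group("match"), m.group("actions")))
    return result

sample = [
    "cookie=0xb367eed8ac0e9e7d, duration=713933.280s, table=0, n_packets=0, n_bytes=0, "
    "idle_age=65534, hard_age=65534, priority=10,arp,in_port=2 actions=resubmit(,24)",
    "cookie=0xb367eed8ac0e9e7d, duration=713932.660s, table=0, n_packets=149398, "
    "n_bytes=6274716, idle_age=4, hard_age=65534, priority=10,arp,in_port=3 "
    "actions=resubmit(,24)",
]

for table, n_packets, match, actions in active_flows(sample):
    print(f"table {table}: {n_packets} packets, {match} -> {actions}")
```

Comparing the filtered output from two runs a few seconds apart would show which flows are actually carrying traffic during a flapping event.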
Have any changes been made to LACP, or to how OVS handles NORMAL flows, MAC learning, and flooding, that would cause it to flood a packet back out to the switch from which it was received? That’s the only thing that we can think of that could be causing this.
___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy