[ovs-discuss] OVS 2.3.3 to OVS 2.5.1 Upgrade - now seeing random connectivity issues

Kris G. Lindgren klindgren at godaddy.com
Fri Feb 24 01:08:57 UTC 2017


Hello all,

We are trying to track down a problem that started after a recent OVS update in our OpenStack environment.  We updated from OVS 2.3.3 to OVS 2.5.1, and since then servers and VMs have been dropping off the network.  On the switches we see a large number of mac_move notifications, sometimes up to 26k per second, which causes the switch to go into defense mode and disable MAC learning.  Most of the time we see only a few MAC moves per minute, but we should be seeing exactly zero.  Our networking team believes that either there is a temporary loop in the network or the HVs are somehow forwarding broadcast packets that they shouldn’t be.  The only thing that we see around that time is the following:
2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22
2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19
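
To quantify the flapping from a saved switch log, a minimal Python sketch along these lines could tally the notifications per host and per second.  The regular expression is written against the two log lines quoted above only; the function name and the assumption that every line follows that exact layout are mine, not something confirmed for other NX-OS releases.

```python
import re
from collections import Counter

# Pattern derived from the two MAC_MOVE_NOTIFICATION lines quoted above.
# Assumption: other log lines of this type share the same layout.
MAC_MOVE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<switch>\S+) "
    r"%FWM-6-MAC_MOVE_NOTIFICATION: Host (?P<mac>[0-9a-f.]+) "
    r"in vlan (?P<vlan>\d+) is flapping between port (?P<src>\S+) "
    r"and port (?P<dst>\S+)"
)


def count_mac_moves(lines):
    """Return (moves per (mac, vlan), moves per one-second timestamp)."""
    per_host = Counter()
    per_second = Counter()
    for line in lines:
        m = MAC_MOVE_RE.match(line.strip())
        if m is None:
            continue  # not a MAC move notification
        per_host[(m["mac"], int(m["vlan"]))] += 1
        per_second[m["ts"]] += 1
    return per_host, per_second


sample = [
    "2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22",
    "2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19",
]
per_host, per_second = count_mac_moves(sample)
print(per_host.most_common(5))    # hosts that move most often
print(max(per_second.values()))   # peak notifications in any one second
```

Run against a full log, the per-second peak would show how close a burst gets to the 26k per second figure, and the per-host counts would show whether the moves concentrate on a few MACs or are spread across many.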

TCPDUMPS:
12:11:20.374794 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.374941 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.376145 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:21.374628 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:21.375057 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:22.374578 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
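
In the capture above, the first two ARP requests are identical and arrive well under a millisecond apart.  A small Python sketch such as the following could flag that kind of repeat in a longer capture.  The 1 ms window and the function name are arbitrary choices of mine.  A repeat is only a hint: it may mean the frame came back from the network, or simply that the capture point sees the same frame on two interfaces.

```python
from datetime import datetime, timedelta


def find_duplicates(lines, window=timedelta(milliseconds=1)):
    """Return (first_ts, second_ts, gap) for identical frames seen within `window`.

    Each line is expected to look like tcpdump output: a HH:MM:SS.ffffff
    timestamp, a space, then the decoded frame.
    """
    last_seen = {}  # frame text -> (parsed timestamp, timestamp text)
    duplicates = []
    for line in lines:
        ts_text, _, frame = line.strip().partition(" ")
        try:
            ts = datetime.strptime(ts_text, "%H:%M:%S.%f")
        except ValueError:
            continue  # not a tcpdump packet line
        previous = last_seen.get(frame)
        if previous is not None and ts - previous[0] <= window:
            duplicates.append((previous[1], ts_text, ts - previous[0]))
        last_seen[frame] = (ts, ts_text)
    return duplicates


ARP_REQUEST = (
    "fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), "
    "length 64: vlan 413, p 0, ethertype ARP, "
    "Request who-has 10.198.39.254 tell 10.198.38.178, length 46"
)
capture = [
    "12:11:20.374794 " + ARP_REQUEST,
    "12:11:20.374941 " + ARP_REQUEST,
    "12:11:21.374628 " + ARP_REQUEST,
    "12:11:22.374578 " + ARP_REQUEST,
]
duplicates = find_duplicates(capture)
for first, second, gap in duplicates:
    print(first, second, gap)
```

For the four requests above, only the pair at 12:11:20.374794 and 12:11:20.374941 falls inside the window; the later requests are the normal one-second ARP retries.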

Output of ovs-vsctl:
ac83a7ff-0157-437c-bfba-8c038ec77c74
    Bridge br-ext
        Port br-ext
            Interface br-ext
                type: internal
        Port "bond0"
            Interface "p3p1"
            Interface "p3p2"
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "ext-vlan-215"
            tag: 215
            Interface "ext-vlan-215"
                type: patch
                options: {peer="br215-ext"}
    Bridge br-int
        fail_mode: secure
        Port "int-br215"
            Interface "int-br215"
                type: patch
                options: {peer="phy-br215"}
        Port "qvo99ae272d-f8"
            tag: 1
            Interface "qvo99ae272d-f8"
        Port "qvo1d5492c0-df"
            tag: 1
            Interface "qvo1d5492c0-df"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo6b7f3219-90"
            tag: 1
            Interface "qvo6b7f3219-90"
        Port "qvo3b4f81ed-f4"
            tag: 1
            Interface "qvo3b4f81ed-f4"
    Bridge "br215"
        Port "br215"
            Interface "br215"
                type: internal
        Port "phy-br215"
            Interface "phy-br215"
                type: patch
                options: {peer="int-br215"}
        Port "br215-ext"
            Interface "br215-ext"
                type: patch
                options: {peer="ext-vlan-215"}
    ovs_version: "2.5.1"

# ovs-appctl bond/show
---- bond0 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 2426 ms
lacp_status: negotiated
active slave mac: 00:8c:fa:eb:2b:74(p3p1)

slave p3p1: enabled
                active slave
                may_enable: true
                hash 140: 154 kB load

slave p3p2: enabled
                may_enable: true
                hash 199: 69 kB load
                hash 220: 40 kB load
                hash 234: 21 kB load

# ovs-appctl lacp/show
---- bond0 ----
                status: active negotiated
                sys_id: 00:8c:fa:eb:2b:74
                sys_priority: 65534
                aggregation key: 9
                lacp_time: slow

slave: p3p1: current attached
                port_id: 9
                port_priority: 65535
                may_enable: true

                actor sys_id: 00:8c:fa:eb:2b:74
                actor sys_priority: 65534
                actor port_id: 9
                actor port_priority: 65535
                actor key: 9
                actor state: activity aggregation synchronized collecting distributing

                partner sys_id: 02:1c:73:87:60:cd
                partner sys_priority: 32768
                partner port_id: 52
                partner port_priority: 32768
                partner key: 52
                partner state: activity aggregation synchronized collecting distributing

slave: p3p2: current attached
                port_id: 10
                port_priority: 65535
                may_enable: true

                actor sys_id: 00:8c:fa:eb:2b:74
                actor sys_priority: 65534
                actor port_id: 10
                actor port_priority: 65535
                actor key: 9
                actor state: activity aggregation synchronized collecting distributing

                partner sys_id: 02:1c:73:87:60:cd
                partner sys_priority: 32768
                partner port_id: 32820
                partner port_priority: 32768
                partner key: 52
                partner state: activity aggregation synchronized collecting distributing

The server is connected to a Nexus 3000 switch with vPC enabled, and the bond is configured for LACP in balance-slb mode.  mgmt0 has the HV’s management IP assigned to it.  We create the br<vlan> bridges and add the patch ports between br-ext and br<vlan>.  The Neutron openvswitch agent configures br-int and adds the patch ports between br<vlan> and br-int, along with creating any tap devices.
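
For anyone wanting to reproduce the per-VLAN layout, the following Python sketch prints ovs-vsctl commands that would create a br<vlan> bridge and its patch-port pair to br-ext, using the names visible in the ovs-vsctl output above.  This is an assumption about an equivalent setup, not the exact commands we run; the helper name is hypothetical, and the patch ports to br-int are left out because the Neutron agent adds those.

```python
def patch_pair_commands(vlan, ext_bridge="br-ext"):
    """Build ovs-vsctl commands for one br<vlan> bridge patched to ext_bridge.

    Hypothetical helper: the port names follow the naming seen in the
    `ovs-vsctl show` output (ext-vlan-<vlan> on br-ext, br<vlan>-ext on
    br<vlan>).
    """
    vlan_bridge = f"br{vlan}"
    ext_side = f"ext-vlan-{vlan}"       # access port on br-ext, tagged with the VLAN
    vlan_side = f"{vlan_bridge}-ext"    # peer port on the per-VLAN bridge
    return [
        f"ovs-vsctl --may-exist add-br {vlan_bridge}",
        f"ovs-vsctl --may-exist add-port {ext_bridge} {ext_side} tag={vlan} "
        f"-- set Interface {ext_side} type=patch options:peer={vlan_side}",
        f"ovs-vsctl --may-exist add-port {vlan_bridge} {vlan_side} "
        f"-- set Interface {vlan_side} type=patch options:peer={ext_side}",
    ]


commands = patch_pair_commands(215)
for command in commands:
    print(command)
```

The sketch only prints the commands so they can be reviewed before being run on a test host.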

The configured openflow entries for each bridge are as follows:
# ovs-ofctl dump-flows br-ext
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=713896.614s, table=0, n_packets=1369078301, n_bytes=130805436786, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xb367eed8ac0e9e7d, duration=713933.475s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=2,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.943s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=3,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.414s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=5,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.888s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=4,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713933.280s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,arp,in_port=2 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.660s, table=0, n_packets=149398, n_bytes=6274716, idle_age=4, hard_age=65534, priority=10,arp,in_port=3 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.218s, table=0, n_packets=102577, n_bytes=4308234, idle_age=7, hard_age=65534, priority=10,arp,in_port=5 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.620s, table=0, n_packets=61321, n_bytes=2575482, idle_age=8, hard_age=65534, priority=10,arp,in_port=4 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713935.656s, table=0, n_packets=1274428312, n_bytes=105873932966, idle_age=0, hard_age=65534, priority=3,in_port=1,vlan_tci=0x0000 actions=mod_vlan_vid:1,NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.070s, table=0, n_packets=7817, n_bytes=707680, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713945.999s, table=0, n_packets=82510417, n_bytes=17955154731, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.936s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713933.544s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=2,icmp_type=136,nd_target=fe80::f816:3eff:fe49:4dff actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.009s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=3,icmp_type=136,nd_target=fe80::f816:3eff:fec7:82b9 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.482s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=5,icmp_type=136,nd_target=fe80::f816:3eff:fe07:d92e actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.951s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=4,icmp_type=136,nd_target=fe80::f816:3eff:fe17:9919 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.410s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.87.153 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.344s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.52.87 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.877s, table=24, n_packets=149394, n_bytes=6274548, idle_age=4, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.53.163 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.807s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.208 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.728s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.209 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.349s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.85.218 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.284s, table=24, n_packets=102573, n_bytes=4308066, idle_age=7, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.53.86 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.817s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.87.99 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.752s, table=24, n_packets=61317, n_bytes=2575314, idle_age=8, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.53.197 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.686s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=198.71.248.104 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.871s, table=24, n_packets=16, n_bytes=672, idle_age=65534, hard_age=65534, priority=0 actions=drop
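
Since several of the br-int rules drop traffic, it can help to list only the drop rules that have actually matched packets.  The Python sketch below does that for `ovs-ofctl dump-flows` text.  The regular expression is fitted to the output format shown above and is an assumption for other OVS versions; the function name is mine.

```python
import re

# Fitted to the dump-flows lines above: "table=N, n_packets=M, ... <match> actions=<actions>".
FLOW_RE = re.compile(
    r"table=(?P<table>\d+), n_packets=(?P<n_packets>\d+),"
    r".* (?P<match>\S+) actions=(?P<actions>\S+)"
)


def drop_flows_with_hits(lines):
    """Return (table, match, n_packets) for drop flows with a nonzero packet count."""
    hits = []
    for line in lines:
        m = FLOW_RE.search(line)
        if m is None:
            continue  # header line such as "NXST_FLOW reply ..."
        if m["actions"] == "drop" and int(m["n_packets"]) > 0:
            hits.append((int(m["table"]), m["match"], int(m["n_packets"])))
    return hits


flows = [
    "NXST_FLOW reply (xid=0x4):",
    " cookie=0xb367eed8ac0e9e7d, duration=713945.070s, table=0, n_packets=7817, n_bytes=707680, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop",
    " cookie=0xb367eed8ac0e9e7d, duration=713945.999s, table=0, n_packets=82510417, n_bytes=17955154731, idle_age=0, hard_age=65534, priority=0 actions=NORMAL",
    " cookie=0xb367eed8ac0e9e7d, duration=713945.936s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop",
    " cookie=0xb367eed8ac0e9e7d, duration=713945.871s, table=24, n_packets=16, n_bytes=672, idle_age=65534, hard_age=65534, priority=0 actions=drop",
]
hits = drop_flows_with_hits(flows)
for table, match, n_packets in hits:
    print(f"table={table} {match} n_packets={n_packets}")
```

On the flows above this leaves two rules: the priority=2 drop on in_port=1 in table 0 and the priority=0 drop in table 24.  Comparing their counters before and after a MAC move burst would show whether either rule is involved.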

Have any changes been made to LACP or to the handling of OVS NORMAL flows, MAC learning, or flooding that would cause OVS to flood a packet back out to the switch from which it was received?  That’s the only thing that we can think of that could be causing this.


___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy
