[ovs-discuss] quad port X710 rNDC (Dell) make KVM host br0 OVS (2.5.0) port lose connection

Flavio Leitner fbl at sysclose.org
Mon Sep 4 22:30:02 UTC 2017


On Thu, 31 Aug 2017 07:12:43 +0000
"Jayakumar, Muthurajan" <muthurajan.jayakumar at intel.com> wrote:

> Dear team,
> Has anyone seen a similar issue? Any suggestions would be much appreciated.
> 
> Following is the observation:
> 
> quad port X710 rNDC (Dell) make KVM host br0 OVS (2.5.0) port lose connection
> 
> Background:
> 
> We are introducing the quad port Intel X710 rNDC on all Dell 14G platforms. We bond all four ports eth0-eth3 of the X710 (i40e driver 2.0.23, FW 6.00) as the br0 uplink bond of the KVM OVS, and we have observed periodic network connection loss (ping unreachable) on the br0 interface.
> 
> 
> Attached is one node's OVS config & network port info. The AHV KVM host is:
> CentOS release 6.8 (Final)
> 4.4.26-1.el6.nutanix.20160925.83.x86_64
> From ovs-vsctl, eth0 is active OVS upstream port, eth1 is standby OVS port.
>     Bridge "br0"
>         Port "br0"
>             Interface "br0"
>                 type: internal
>         Port "br0-dhcp"
>             Interface "br0-dhcp"
>                 type: vxlan
>                 options: {key="1", remote_ip="10.211.56.93"}
>         Port "br0-up"
>             Interface "eth1"
>             Interface "eth3"
>             Interface "eth0"
>             Interface "eth2"
>         Port "tap0"
>             tag: 0
>             Interface "tap0"
>         Port "vnet0"
>             Interface "vnet0"
>         Port "br0-arp"
>             Interface "br0-arp"
>                 type: vxlan
>                 options: {key="1", remote_ip="192.168.5.2"}
>     ovs_version: "2.5.0"
> ---- br0-up ----
> bond_mode: active-backup
> bond may use recirculation: no, Recirc-ID : -1
> bond-hash-basis: 0
> updelay: 0 ms
> downdelay: 0 ms
> lacp_status: off
> active slave mac: 24:6e:96:47:6d:0c(eth1)
> slave eth0: enabled
>     may_enable: true
> slave eth1: enabled
>     active slave
>     may_enable: true
> slave eth2: disabled
>     may_enable: false
> slave eth3: disabled
>     may_enable: false
> 
> However, eth1 (port 1) carries much more tx/rx traffic than eth0 (port 3) per ovs-ofctl dump. Why?

Because eth1 is the active slave, so it handles all the TX/RX for
the br0-up bond port.
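To confirm this, you can inspect the bond state and even switch the active
slave to see the counters move. A minimal sketch, assuming the bond port is
named br0-up as in the config above (the eth0 choice is just an example):

```shell
# Show bond state: mode, active slave, per-slave enable status
ovs-appctl bond/show br0-up

# Optionally force eth0 to become the active slave, then re-check counters
ovs-appctl bond/set-active-slave br0-up eth0
ovs-ofctl dump-ports br0 eth0
```

In active-backup mode only the active slave should show meaningful traffic;
the others stay idle until a failover.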


> And pkt drop is on br0 uplink bond port & tap0 VM vnic port:
> ovs-ofctl dump-ports-desc br0
> -----------------------------
> OFPST_PORT_DESC reply (xid=0x2):
> 1(eth1): addr:24:6e:96:47:6d:0c config: 0 state: 0 current: 10GB-FD advertised: FIBER supported: 10GB-FD FIBER AUTO_PAUSE speed: 10000 Mbps now, 10000 Mbps max
> 2(eth3): addr:24:6e:96:47:6d:10 config: 0 state: LINK_DOWN advertised: 1GB-FD 10GB-FD AUTO_NEG supported: 1GB-FD 10GB-FD AUTO_NEG AUTO_PAUSE speed: 0 Mbps now, 10000 Mbps max
> 3(eth0): addr:24:6e:96:47:6d:0a config: 0 state: 0 current: 10GB-FD advertised: FIBER supported: 10GB-FD FIBER AUTO_PAUSE speed: 10000 Mbps now, 10000 Mbps max
> 4(eth2): addr:24:6e:96:47:6d:0e config: 0 state: LINK_DOWN advertised: 1GB-FD 10GB-FD AUTO_NEG supported: 1GB-FD 10GB-FD AUTO_NEG AUTO_PAUSE speed: 0 Mbps now, 10000 Mbps max
> 5(vnet0): addr:fe:6b:8d:80:5c:a8 config: 0 state: 0 current: 10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max
> 6(br0-arp): addr:a6:3e:f0:db:76:c6 config: NO_FLOOD state: 0 speed: 0 Mbps now, 0 Mbps max
> 7(br0-dhcp): addr:62:4d:cc:2f:33:b4 config: NO_FLOOD state: 0 speed: 0 Mbps now, 0 Mbps max
> 37(tap0): addr:4a:37:5e:99:48:b7 config: 0 state: 0 current: 10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max
> LOCAL(br0): addr:24:6e:96:47:6d:0a config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max
> ovs-ofctl dump-ports br0
> ------------------------
> OFPST_PORT reply (xid=0x2): 9 ports
> port LOCAL: rx pkts=2908711, bytes=942823314, drop=3031, errs=0, frame=0, over=0, crc=0 tx pkts=2527350, bytes=909830850, drop=0, errs=0, coll=0

This is br0. Is it up or down? If it's down, broadcasts on that port
will be dropped and accounted for.
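A quick sketch of how to check that, assuming the internal interface is
named br0 as shown in the config:

```shell
# Check the administrative and operational state of the br0 internal port
ip link show br0        # look for the UP flag / "state UP"

# If it is down, bringing it up may clear the drop counter growth
# (verify this fits your network setup before applying)
ip link set br0 up
```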


> port 37: rx pkts=7, bytes=412, drop=0, errs=0, frame=0, over=0, crc=0 tx pkts=16, bytes=1666, drop=50414, errs=0, coll=0

Hard to tell without more info, such as the flow table and actual traffic.
You might run tcpdump on the host and in the guest to see more.
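A sketch of the suggested captures; the interface names assume the config
shown above, and the guest-side interface name is guest-specific:

```shell
# On the host: capture on the active uplink slave and on the VM's tap port
tcpdump -ni eth1 -c 100 icmp
tcpdump -ni tap0 -c 100 icmp

# Inside the guest, capture on its NIC (name varies per guest), e.g.:
#   tcpdump -ni eth0 -c 100 icmp

# Dump the OpenFlow table to correlate the tap0 tx drops with flow actions
ovs-ofctl dump-flows br0
```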

-- 
Flavio



More information about the discuss mailing list