[ovs-discuss] Bonding interface stops working. Link down.

Сергей Фадеев fadeev at novotelecom.ru
Fri Aug 9 11:26:59 UTC 2019


Hello. 

Description of problem:
For no apparent reason the bonding interface stops working. 
Afte "systemctl start openvswitch-switch.service" connection is being restored 

The problem is repeated on different hosts with OS: Ubuntu 18.04.*

The problem is repeated on the next versions of openvswitch:
1. openvswitch-common/openvswitch-switch - 2.9.2-0ubuntu0.18.04.3 
2. openvswitch-common/openvswitch-switch/libopenvswitch  - 2.9.5-1_amd64.deb
3. openvswitch-common/openvswitch-switch/libopenvswitch - 2.10.2-1_amd64.deb

Repeatable on the following BONDing configurations: 4x1G, 3x1G, 2x1G, 2x10G

Bug report in the attachment

The log file /var/log/openvswitch/ovsdb-server.log is empty. Records appear only after restart 
2019-08-07T23:25:01.904Z|00008|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2019-08-08T16:52:31.801Z|00003|vlog(monitor)|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2019-08-08T16:52:31.801Z|00004|daemon_unix(monitor)|INFO|pid 5285 died, exit status 0, exiting
2019-08-08T16:52:32.028Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2019-08-08T16:52:32.034Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.10.2
2019-08-08T16:52:32.223Z|00003|jsonrpc|WARN|unix#4: receive error: Connection reset by peer
2019-08-08T16:52:32.223Z|00004|reconnect|WARN|unix#4: connection dropped (Connection reset by peer)
2019-08-08T16:52:42.041Z|00005|memory|INFO|6628 kB peak resident set size after 10.0 seconds
2019-08-08T16:52:42.041Z|00006|memory|INFO|cells:2929 json-caches:1 monitors:2 sessions:1
2019-08-08T23:25:01.484Z|00007|vlog|INFO|closing log file

File contents /var/log/openvswitch/ovs-ctl.log
Thu Aug  8 16:52:27 UTC 2019:stop
2019-08-08T16:52:28Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Alarm clock
2019-08-08T16:52:29Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Alarm clock
 * Exiting ovs-vswitchd (5303)
 * Killing ovs-vswitchd (5303)
 * Exiting ovsdb-server (5285)
Thu Aug  8 16:52:31 UTC 2019:load-kmod
Thu Aug  8 16:52:31 UTC 2019:start --system-id=random
 * Starting ovsdb-server
 * Configuring Open vSwitch system IDs
 * Starting ovs-vswitchd
 * Enabling remote OVSDB managers


File contents /var/log/openvswitch/ovs-vswitchd.log
2019-08-08T16:51:14.093Z|17076|bond|INFO|bond bond1: shift 6kB of load (with hash 3) from eno4 to eno3 (now carrying 1473kB and 1368kB load, respectively)
2019-08-08T16:51:14.093Z|17077|bond|INFO|bond bond1: shift 6kB of load (with hash 5) from eno4 to eno3 (now carrying 1466kB and 1375kB load, respectively)
2019-08-08T16:51:18.396Z|00001|ovs_rcu(urcu5)|WARN|blocked 1000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:19.395Z|00002|ovs_rcu(urcu5)|WARN|blocked 2000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:21.395Z|00003|ovs_rcu(urcu5)|WARN|blocked 4000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:25.395Z|00004|ovs_rcu(urcu5)|WARN|blocked 8000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:33.395Z|00005|ovs_rcu(urcu5)|WARN|blocked 16000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:34.030Z|17078|ovs_rcu|WARN|blocked 1000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:35.030Z|17079|ovs_rcu|WARN|blocked 2001 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:37.030Z|17080|ovs_rcu|WARN|blocked 4000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:41.030Z|17081|ovs_rcu|WARN|blocked 8000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:49.030Z|17082|ovs_rcu|WARN|blocked 16000 ms waiting for revalidator32 to quiesce
2019-08-08T16:51:49.396Z|00006|ovs_rcu(urcu5)|WARN|blocked 32001 ms waiting for revalidator32 to quiesce
2019-08-08T16:52:05.029Z|17083|ovs_rcu|WARN|blocked 32000 ms waiting for revalidator32 to quiesce
2019-08-08T16:52:21.396Z|00007|ovs_rcu(urcu5)|WARN|blocked 64000 ms waiting for revalidator32 to quiesce
2019-08-08T16:52:31.690Z|17084|fatal_signal|WARN|terminating with signal 15 (Terminated)
2019-08-08T16:52:31.736Z|00003|vlog(monitor)|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2019-08-08T16:52:31.736Z|00004|daemon_unix(monitor)|INFO|pid 5303 died, killed (Terminated), exiting
2019-08-08T16:52:32.061Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2019-08-08T16:52:32.064Z|00002|ovs_numa|INFO|Discovered 8 CPU cores on NUMA node 0
2019-08-08T16:52:32.064Z|00003|ovs_numa|INFO|Discovered 8 CPU cores on NUMA node 1
2019-08-08T16:52:32.064Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 16 CPU cores
2019-08-08T16:52:32.064Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2019-08-08T16:52:32.064Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2019-08-08T16:52:32.077Z|00007|ofproto_dpif|INFO|system at ovs-system: Datapath supports recirculation
2019-08-08T16:52:32.077Z|00008|ofproto_dpif|INFO|system at ovs-system: VLAN header stack length probed as 2
2019-08-08T16:52:32.078Z|00009|ofproto_dpif|INFO|system at ovs-system: MPLS label stack length probed as 1
2019-08-08T16:52:32.078Z|00010|ofproto_dpif|INFO|system at ovs-system: Datapath supports truncate action
2019-08-08T16:52:32.078Z|00011|ofproto_dpif|INFO|system at ovs-system: Datapath supports unique flow ids
2019-08-08T16:52:32.078Z|00012|ofproto_dpif|INFO|system at ovs-system: Datapath does not support clone action
2019-08-08T16:52:32.078Z|00013|ofproto_dpif|INFO|system at ovs-system: Max sample nesting level probed as 10
2019-08-08T16:52:32.078Z|00014|ofproto_dpif|INFO|system at ovs-system: Datapath supports eventmask in conntrack action
2019-08-08T16:52:32.078Z|00015|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_clear action
2019-08-08T16:52:32.078Z|00016|ofproto_dpif|INFO|system at ovs-system: Max dp_hash algorithm probed to be 0
2019-08-08T16:52:32.078Z|00017|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_state
2019-08-08T16:52:32.078Z|00018|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_zone
2019-08-08T16:52:32.078Z|00019|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_mark
2019-08-08T16:52:32.078Z|00020|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_label
2019-08-08T16:52:32.078Z|00021|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_state_nat
2019-08-08T16:52:32.078Z|00022|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_orig_tuple
2019-08-08T16:52:32.078Z|00023|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_orig_tuple6
2019-08-08T16:52:32.094Z|00001|ofproto_dpif_upcall(handler2)|INFO|received packet on unassociated datapath port 11
2019-08-08T16:52:32.094Z|00001|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 11
2019-08-08T16:52:32.094Z|00002|ofproto_dpif_upcall(handler2)|INFO|received packet on unassociated datapath port 44
2019-08-08T16:52:32.097Z|00001|ofproto_dpif_upcall(handler3)|INFO|received packet on unassociated datapath port 11
2019-08-08T16:52:32.101Z|00002|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 11
2019-08-08T16:52:32.142Z|00024|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2019-08-08T16:52:32.142Z|00025|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2019-08-08T16:52:32.146Z|00026|bridge|INFO|bridge ovsbr0: added interface one-925-1 on port 170


Log of record from a switch
{date} {name} lacpd[1269]: LACPD_TIMEOUT: ge-0/0/7: lacp current while timer expired current Receive State: CURRENT
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/7 - ATTACHED state - acting as standby link
{date} {name} lacpd[1269]: LACPD_TIMEOUT: ge-0/0/0: lacp current while timer expired current Receive State: CURRENT
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/0 - ATTACHED state - acting as standby link
{date} {name} lacpd[1269]: LACPD_TIMEOUT: ge-0/0/3: lacp current while timer expired current Receive State: CURRENT
{date} {name}/kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/3 - ATTACHED state - acting as standby link
{date} {name} lacpd[1269]: LACPD_TIMEOUT: ge-0/0/4: lacp current while timer expired current Receive State: CURRENT
{date} {name} /kernel: ae_bundlestate_ifd_change: bundle ae9: bundle IFD minimum links not met 0 < 1
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/4 - ATTACHED state - acting as standby link
{date} {name} lacpd[1269]: LACP_INTF_DOWN: ae9: Interface marked down due to lacp timeout on member ge-0/0/4
{date} {name} mib2d[1256]: SNMP_TRAP_LINK_DOWN: ifIndex 906, ifAdminStatus up(1), ifOperStatus down(2), ifName ae9
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/7 - DETACHED state - will not carry traffic
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/0 - DETACHED state - will not carry traffic
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/3 - DETACHED state - will not carry traffic
{date} {name} /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd ge-0/0/4 - DETACHED state - will not carry traffic

Seting  bonding on switch 
EX3200-48T
set interfaces ae11 flexible-vlan-tagging
set interfaces ae11 encapsulation flexible-ethernet-services
set interfaces ae11 aggregated-ether-options link-speed 1g
set interfaces ae11 aggregated-ether-options lacp active

Seting opevswitch 
ovs-vsctl add-br ovsbr0
ovs-vsctl add-bond ovsbr0 bond1 eno1 eno2 eno3 eno4
ovs-vsctl set port bond1 lacp=active
ovs-vsctl set port bond1 bond_mode=balance-tcp
ovs-vsctl set port ovsbr0 tag=116


Sincerely, Fadeev Sergey.
Engineer
Engineering Department
LLC Novotelecom, tel: (383) 209-00-00
Int. tel .: 25462
t.me/barte1by
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug-report-20190622135242.tar.gz
Type: application/x-compressed-tar
Size: 7078208 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20190809/916484bc/attachment-0001.bin>


More information about the discuss mailing list