[ovs-discuss] ovs-vswitchd cpu always 100%, slave's change_seq value can't be updated.

Fri Feb 17 10:27:29 UTC 2017

I use openvswitch-2.3.1. After my server has been running for several
months, ovs-vswitchd runs at 100% CPU.

ovs-vswitchd.log:
2017-02-16T19:14:41.899Z|91933|poll_loop|INFO|wakeup due to 0-ms timeout at
lib/seq.c:179 (100% CPU usage)
2017-02-16T19:14:47.899Z|91934|poll_loop|INFO|Dropped 258113 log messages
in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2017-02-16T19:14:47.899Z|91935|poll_loop|INFO|wakeup due to 0-ms timeout at
lib/seq.c:179 (100% CPU usage)

I did some research:
1. When I remove the bond device (eth1, eth3), the CPU usage goes to 0%.
2. When I add the bond device (eth1, eth3) again with "ovs-vsctl add-bond
br-bond1 bond1 eth1 eth3", the CPU usage goes back to 100%.

So I believe the bond device causes this problem.

Then I attached gdb to the running process. After some more research, I
wrote this gdb script:

b ofproto/bond.c:655
commands
print seq_read(connectivity_seq_get())
print slave->change_seq
c
end
c

The code around line 655 of ofproto/bond.c:
653   HMAP_FOR_EACH (slave, hmap_node, &bond->slaves) {
654       bond_link_status_update(slave);
655       slave->change_seq = seq_read(connectivity_seq_get());
656   }

But the GDB output is strange. For example ($1 and $3 are
seq_read(connectivity_seq_get()), $2 and $4 are slave->change_seq):
$1 = 24375903
$2 = 560359

$3 = 24375903
$4 = 560359

It looks as if "slave->change_seq = seq_read(connectivity_seq_get());"
never stores the value of seq_read(connectivity_seq_get()) in
slave->change_seq.

If this is true, the CPU will always run at 100%, because the seq_wait call
will always make the poll loop in the main function return immediately.
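
To make that reasoning concrete, here is a minimal toy model in C. It is
not the OVS code: toy_seq, toy_seq_read, toy_seq_wait_wakes_now and
toy_count_busy_wakeups are hypothetical names made up for illustration, and
the model assumes that seq_wait wakes the poll loop immediately whenever the
saved value differs from the current sequence number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an OVS sequence: a counter that only grows on change. */
struct toy_seq {
    uint64_t value;
};

/* Toy counterpart of seq_read(): return the current sequence number. */
static uint64_t toy_seq_read(const struct toy_seq *seq)
{
    return seq->value;
}

/* Toy counterpart of seq_wait() (assumed behaviour): returns true when the
 * saved value already differs from the current one, meaning the next poll
 * would wake up with a 0-ms timeout instead of sleeping. */
static bool toy_seq_wait_wakes_now(const struct toy_seq *seq, uint64_t saved)
{
    return toy_seq_read(seq) != saved;
}

/* Count how many of 'iterations' main-loop passes wake up immediately.
 * With 'refresh' set, the saved value is re-read on every pass, which is
 * what line 655 is meant to do.  Without it, the saved value stays stale. */
static int toy_count_busy_wakeups(const struct toy_seq *seq, uint64_t saved,
                                  bool refresh, int iterations)
{
    int busy = 0;

    assert(iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        if (refresh) {
            saved = toy_seq_read(seq);
        }
        if (toy_seq_wait_wakes_now(seq, saved)) {
            busy++;
        }
    }
    return busy;
}
```

With the values from my gdb session (current sequence 24375903, saved value
560359), a stale saved value makes every pass wake immediately, while a
refreshed one lets every pass sleep. That matches the log only if the real
seq_wait behaves the way this sketch assumes.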

Can this really happen, or is my gdb script wrong?