[ovs-discuss] OVS under load causes TCP connection stalls on the lo interface

Andreas Schultz aschultz at tpip.net
Fri Sep 19 11:40:44 UTC 2014


Hi,

I have been observing strange TCP connection aborts on an OVS host lately. The
TCP connections are all localhost-only, so no external components can be blamed.
tcpdump shows that TCP ACKs were missing, and there might have been some data
corruption as well (hard to tell without a proper decoder).

The ovs instance is not configured to touch lo.

After lots of debugging I have been able to find a correlation with an OVS
instance on that host. To reproduce the issue I run netperf on lo like this:

# netperf -l 600 -D 1,second -H localhost

This reports a steady throughput of 48527.77 10^6bits/s on lo. Then I push load
through OVS. My OF controller creates one flow rule per TCP connection going
through the switch. With about 100 new connections per second, this loads
the 8 cores to about 50% each. At some random point (mostly within the first
10 seconds of the test), CPU load drops to zero and netperf stalls.
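
To pin down when the stall begins, the netperf interim output can be saved and
scanned afterwards. The following is only a sketch: find_stalls is a
hypothetical helper name, the two thresholds are arbitrary, and the interim
line format it parses is an assumption that should be checked against the
netperf version in use.

```shell
# Scan saved netperf demo-mode (-D) output for intervals that look like a stall.
# Assumed line format (verify against your netperf version):
#   Interim result: 48527.77 10^6bits/s over 1.000 seconds ending at 1411126844.123
# find_stalls is a hypothetical helper name, not part of netperf.
find_stalls() {
    min_rate="${1:-1000}"   # assumed threshold, in the units netperf prints
    max_secs="${2:-2}"      # assumed longest acceptable reporting interval
    awk -v min="$min_rate" -v max="$max_secs" '
        # Only look at interim result lines; ignore headers and the final summary.
        $1 == "Interim" && $2 == "result:" {
            rate = $3 + 0   # throughput for this interval
            secs = $6 + 0   # length of this interval in seconds
            if (rate < min || secs > max)
                printf "suspect interval: %s %s over %s s, ending at %s\n", $3, $4, $6, $10
        }'
}
```

It could be used by redirecting the netperf run above into a file, e.g.
netperf.log, and then running "find_stalls 1000 2 < netperf.log". A hard stall
that never produces another interim line will not be flagged; it only shows up
as the log ending early.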

The kernel begins to spill out messages like this:

grep : 1433 callbacks suppressed

With systemtap, I have traced this message to ip_finish_output2 in
net/ipv4/ip_output.c. The skb's at that point have a destination IP
of 0.0.0.0.
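
Since the message is rate limited, the number of suppressed callbacks gives a
rough idea of how many packets hit that path. A small sketch for adding up the
counts from the kernel log follows; sum_suppressed is a hypothetical helper
name, and it assumes each message contains the text "<N> callbacks suppressed"
as in the line quoted above.

```shell
# Sum the "<N> callbacks suppressed" counts in kernel log text read from stdin.
# sum_suppressed is a hypothetical helper name.
sum_suppressed() {
    awk '
        /callbacks suppressed/ {
            # Find the word "callbacks" and add the integer right before it.
            for (i = 2; i <= NF; i++)
                if ($i == "callbacks" && $(i-1) ~ /^[0-9]+$/)
                    total += $(i-1)
        }
        END { print total + 0 }'
}
```

For example, "dmesg | sum_suppressed" before and after a test run, with the
difference between the two values taken as the count for that run.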

Combinations tested:

openvswitch-1.11 on Linux 3.8.13
openvswitch-2.3.0 on Linux 3.14.19
openvswitch-git (2654cc338bfb413a6295078e3a7a8e1d4f67cbcc) on Linux 3.14.19

It seems that under this type of load, openvswitch kills traffic through lo.

Any ideas on what to try next?

Andreas
-- 
Dipl. Inform.
Andreas Schultz


