[ovs-discuss] ovs performance on 'worst case scenario' with ovs-vswitchd up to 100%
George Shuklin
george.shuklin at gmail.com
Tue Mar 19 03:50:32 UTC 2013
Good day.
I have done some research on the performance of a few software bridges, and
here are some thoughts.
Hyper-V and ESXi contain software bridges that work fine at speeds up to
10G regardless of traffic type.
The native Linux bridge tools have no issues with 'bad flows', but top out
at about 50k pps per CPU core.
OVS shows excellent performance on low-flow high-volume traffic like
'iperf' or a few TCP flows: I saw at least 9.1 Gbit/s, and it was limited by
Xen's netback, not by OVS.
But when the traffic changes its shape from 'a few huge flows' to 'many small
flows', the situation changes drastically.
The following line (mangled slightly against script kiddies):
hping3 -i 500 -s 2 -p ++1 --udp target
causes about 80% CPU utilization by ovs-vswitchd on the host receiving the traffic.
Reducing the interval to lower values causes complete denial of service at
an overall traffic volume of just about 15-20 Mb/s.
Checked with OVS 1.0, 1.4.3, and the latest 1.9 - same behavior (and 1.0 adds
some memory leaks).
I did some research and found the reason for this kind of behavior: when a
new flow comes in, it is forwarded by the kernel to ovs-vswitchd, which slowly
analyzes it and sends data back to the kernel as 'kernel flows'.
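The mechanism above can be sketched as a rough model (illustrative only, not actual OVS code): the kernel flow table behaves as an exact-match cache keyed on the full header tuple, including the UDP ports, so a scan that changes the ports on every packet never hits the cache and every packet takes the slow path.

```python
# Rough model of the OVS 1.x fast path (illustrative, not real OVS code).
# The kernel flow table is an exact-match cache keyed on the full tuple,
# including L4 ports, so every distinct port pair is a separate flow.

def classify(packets):
    kernel_flows = {}   # exact-match cache: tuple -> action
    upcalls = 0         # packets punted to ovs-vswitchd in userspace
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
        if key not in kernel_flows:
            upcalls += 1                 # miss: slow path via ovs-vswitchd
            kernel_flows[key] = "drop"   # userspace installs an exact flow
    return upcalls, len(kernel_flows)

# hping3-style scan: source/destination port change on every packet
scan = [{"src": "offender", "dst": "victim", "proto": 17,
         "sport": p, "dport": p} for p in range(1, 1001)]
print(classify(scan))   # every single packet is a miss: (1000, 1000)
```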
I thought this should be a problem only for the 'normal' rule, but the
following set of rules actually causes the same effect:
ovs-ofctl add-flow xenbr0 'arp action=normal priority=30'
ovs-ofctl add-flow xenbr0 'action=drop priority=20'
As you can see, those rules say 'drop all traffic except ARP'. And even
with that ruleset the 'hping' line causes huge CPU load on ovs-vswitchd.
I checked with ovs-dpctl dump-flows xenbr0 and got the following lines:
in_port(1),eth(src=88:e0:f3:b6:47:f0,dst=56:de:7d:66:ad:28),eth_type(0x0800),ipv4(src=offender,dst=victim,proto=17,tos=0,ttl=63,frag=no),udp(src=54882,dst=54882),
packets:0, bytes:0, used:never, actions:drop
in_port(1),eth(src=88:e0:f3:b6:47:f0,dst=56:de:7d:66:ad:28),eth_type(0x0800),ipv4(src=offender,dst=victim,proto=17,tos=0,ttl=63,frag=no),udp(src=48424,dst=48424),
packets:0, bytes:0, used:never, actions:drop
...
thousands of them.
I clearly see the kernel asking ovs-vswitchd about every packet (because
-p ++1 and -s 2 change the source and destination port on every UDP
packet).
And ovs-vswitchd replies to every 'sample' packet with an exact rule
matching a very deep set of headers, down to the L4 source and destination
ports.
Why can't ovs-vswitchd feed the kernel a very simple 'drop all' kernel
flow after some smart and exact rules for ARP?
Why is UDP parsed so deeply?
Is there any way to feed the kernel module fast generic rules without
passing every new flow to userspace?
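The kind of generic kernel rule asked about above can be sketched as a toy wildcard table (hypothetical model; newer OVS releases move in this direction with 'megaflow' entries that leave unexamined fields wildcarded): a single entry that masks out the UDP ports would absorb the whole scan, so only the first packet would need a userspace round trip.

```python
# Toy wildcard table (illustrative): one entry that ignores the UDP ports
# matches every packet of the scan, instead of thousands of exact flows.

WILDCARD = None  # field value meaning "match anything"

def matches(rule, pkt):
    return all(v is WILDCARD or pkt[f] == v for f, v in rule.items())

# One wildcarded drop rule instead of one exact flow per port pair
drop_udp = {"src": "offender", "dst": "victim", "proto": 17,
            "sport": WILDCARD, "dport": WILDCARD}

scan = [{"src": "offender", "dst": "victim", "proto": 17,
         "sport": p, "dport": p} for p in range(1, 1001)]
print(all(matches(drop_udp, pkt) for pkt in scan))  # True: one rule covers all
```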
This is not a theoretical question. I have a few clients serving small
static scripts to a very large audience; the average TCP session is just
about 300-400 bytes long, but they serve about 3-6k connections per
second. And that really brings OVS to its knees.
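Some rough arithmetic on those numbers (my own estimate, with an assumed flow idle timeout): every new TCP connection is a distinct 5-tuple, hence one upcall to ovs-vswitchd and one kernel flow install, so the quoted connection rate translates directly into thousands of userspace round trips per second and tens of thousands of resident kernel flows.

```python
# Back-of-envelope estimate: each new connection is a distinct 5-tuple,
# so it triggers one upcall and one kernel flow install.
conns_per_sec = 6000              # upper end of the quoted 3-6k connections/s
upcalls_per_sec = conns_per_sec   # one slow-path miss per new flow
flow_lifetime_s = 10              # assumed idle timeout before flow eviction
resident_flows = conns_per_sec * flow_lifetime_s
print(upcalls_per_sec, resident_flows)  # 6000 60000
```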
... And I'm searching for any solution to reduce the CPU utilization caused
by the endless interaction between the openvswitch kernel module and ovs-vswitchd.
Any help would be appreciated.