[ovs-discuss] High CPU Usage by ovs-vswitchd and resulting packet loss

Oliver Francke Oliver.Francke at filoo.de
Wed Jun 6 09:52:52 UTC 2012


Hi Kaushal,

thanks for your first impressions. My next change window is in two days; 
I will put the current version on one of our 5 nodes.
I'll also set up a small script that monitors memory usage, CPU load, 
number of flows and so on...
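
Something along these lines (just a rough sketch: the interval, log path 
and bridge name are arbitrary, and it assumes it runs as root with 
ovs-dpctl in the PATH):

#!/bin/sh
# Rough monitoring sketch: every 10 seconds append a timestamped line
# with the 1-minute load average, ovs-vswitchd RSS and kernel flow count.
BRIDGE=br0
LOG=/var/log/ovs-monitor.log

while true; do
    TS=$(date +%s)
    LOAD=$(cut -d' ' -f1 /proc/loadavg)
    MEM=$(ps -o rss= -C ovs-vswitchd | awk '{sum+=$1} END {print sum}')
    FLOWS=$(ovs-dpctl dump-flows "$BRIDGE" | wc -l)
    echo "$TS load=$LOAD vswitchd_rss_kb=$MEM flows=$FLOWS" >> "$LOG"
    sleep 10
done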

@Justin: Any other recommendations?

If it's worth it, I could start a new thread, but while we are talking 
about high CPU load: how do you all handle things like SYN-flood attacks?

Thanks in advance,

Oliver.


On 06/06/2012 09:09 AM, Kaushal Shubhank wrote:
> Hi Justin, Oliver,
>
> So I switched to the newer version around 11 hrs ago. Here are some 
> observations:
>
> 1. The number of flows has come down to a couple of thousand (from 
> 12-15k). However, we will let the setup run for one whole day, through 
> peak and lean times, and then count the flows again.
>
> 2. A smaller percentage of the flows have low packet counts. I have 
> attached a dump for reference.
>
> 3. The CPU usage is still around the same, which means we still see 
> misses in the kernel flow table.
>
> $ sudo ovs-dpctl dump-flows br0 | grep -e "packets:[0123]," | wc -l
> 764
> $ sudo ovs-dpctl show
> system@br0:
> lookups: hit:117426873 missed:87741549 lost:0
> flows: 2145
> port 0: br0 (internal)
> port 1: eth3
> port 2: eth4
>
> - Kaushal
>
> On Tue, Jun 5, 2012 at 12:49 PM, Kaushal Shubhank <kshubhank at gmail.com> wrote:
>
>     We will certainly try the 1.7.0 version. Since this is production,
>     we will only be able to try it during off-peak hours. We will
>     update you with the results as soon as possible.
>
>     Thanks a lot, and we look forward to contributing to the project
>     in any way possible.
>
>     Kaushal
>
>
>     On Tue, Jun 5, 2012 at 12:36 PM, Justin Pettit <jpettit at nicira.com> wrote:
>
>         Of your nearly 12,000 flows, over 10,000 had fewer than four
>         packets:
>
>         [jpettit@timber-2 Desktop] grep -e "packets:[0123]," live_flows_20120604 | wc -l
>           10143
>
>         Short-lived flows are really difficult for OVS, since there's
>         a lot of overhead in setting up and maintaining the kernel
>         flow table.  We made *substantial* improvements for handling
>         just this scenario in the forthcoming 1.7.0 release.  The code
>         should be stable, but it hasn't gone through a full QA
>         regression.  However, if you're willing to give it a shot, you
>         can download a snapshot of the tip of the 1.7 branch:
>
>         http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=snapshot;h=04a67c083458784d1fed689bcb7ed904026d2352;sf=tgz
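>
>         If it helps, building from that snapshot follows the usual
>         autotools flow; roughly the following, as a sketch (the
>         kernel-module configure flag is --with-linux on recent trees,
>         --with-l26 on older ones):
>
>         tar xzf openvswitch-*.tar.gz
>         cd openvswitch-*/
>         ./boot.sh
>         ./configure --with-linux=/lib/modules/$(uname -r)/build
>         make
>         sudo make install
>         sudo make modules_install   # installs the kernel module
>         # then reload the openvswitch module and restart the daemons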
>
>         We've only been able to test it with generated traffic, so
>         seeing how much it improves performance with real traffic
>         would be invaluable.  If you're able to give it a try and let
>         us know, we'd really appreciate it.
>
>         --Justin
>
>
>         On Jun 4, 2012, at 11:39 PM, Kaushal Shubhank wrote:
>
>         > Hi Justin,
>         >
>         > This is how the connections are made, so I guess eth3 and
>         eth4 are not in the same network segment.
>         > Router--->eth4==eth3--->switch
>         >
>         > We tried with an eviction threshold of 10000, but were
>         seeing high packet loss. I am pasting a few kernel flows
>         (ovs-dpctl dump-flows) here, and attaching the whole dump (11k
>         flows). I don't see any pattern. About 800 of the 11k flows
>         were the port 80 filtering flows, which means the remaining
>         flows are just non-port-80 packets that we forward from eth3
>         to eth4 or vice versa.
>         >
>         > If there is any way to reduce those (11k - 800) flows, we
>         could reduce CPU usage.
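>         >
>         > For a rough count of the port-80 entries in that dump, the
>         > tcp() port fields can simply be grepped (the file name here
>         > is a placeholder for the attached dump):
>         >
>         > $ grep -c -E 'tcp\(src=80,|dst=80\)' flows_dump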
>         >
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.195,dst=1.2.138.199,proto=17,tos=0,ttl=127,frag
>         > =no),udp(src=62294,dst=16464), packets:1, bytes:60,
>         used:3.170s, actions:2
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=94.194.158.115,dst=110.172.18.250,proto=6,tos=0,ttl=22,frag
>         > =no),tcp(src=62760,dst=47868), packets:0, bytes:0,
>         used:never, actions:1
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.134,dst=209.85.148.139,proto=6,tos=0,ttl=126,frag=no),tcp(src=64741,dst=80),
>         packets:1, bytes:60, used:2.850s,
>         actions:set(eth(src=00:15:17:44:03:6e,dst=00:e0:ed:15:24:4a)),0
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.137,dst=219.90.100.27,proto=6,tos=0,ttl=127,frag=no),tcp(src=49504,dst=12758),
>         packets:67603, bytes:4060369, used:0.360s, actions:2
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=189.63.179.72,dst=203.188.231.195,proto=17,tos=0,ttl=110,frag=no),udp(src=60414,dst=16464),
>         packets:1, bytes:60, used:0.620s, actions:1
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=213.57.230.226,dst=110.172.18.8,proto=17,tos=0,ttl=101,frag=no),udp(src=59274,dst=24844),
>         packets:0, bytes:0, used:never, actions:1
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=195.35.128.105,dst=110.172.18.250,proto=6,tos=0,ttl=15,frag=no),tcp(src=54303,dst=47868),
>         packets:3, bytes:222, used:5.300s, actions:2
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.154,dst=76.186.139.105,proto=6,tos=0,ttl=126,frag=no),tcp(src=10369,dst=61585),
>         packets:1, bytes:60, used:0.290s, actions:2
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=78.92.118.9,dst=110.172.18.80,proto=17,tos=0,ttl=23,frag=no),udp(src=44779,dst=59357),
>         packets:0, bytes:0, used:never, actions:2
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=89.216.130.134,dst=203.188.231.206,proto=17,tos=0,ttl=33,frag=no),udp(src=52342,dst=30291),
>         packets:0, bytes:0, used:never, actions:1
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=76.226.72.157,dst=110.172.18.250,proto=6,tos=0,ttl=36,frag=no),tcp(src=46637,dst=47868),
>         packets:2, bytes:148, used:2.730s, actions:1
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=89.211.162.95,dst=110.172.18.80,proto=17,tos=0,ttl=92,frag=no),udp(src=19442,dst=59357),
>         packets:0, bytes:0, used:never, actions:2
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=86.179.231.157,dst=110.172.18.11,proto=17,tos=0,ttl=109,frag=no),udp(src=58240,dst=23813),
>         packets:7, bytes:1181, used:1.700s, actions:1
>         >
>         in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=72.201.71.66,dst=203.188.231.195,proto=17,tos=0,ttl=115,frag=no),udp(src=1025,dst=16464),
>         packets:1, bytes:60, used:2.620s, actions:1
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=95.165.107.21,dst=110.172.18.80,proto=17,tos=0,ttl=96,frag=no),udp(src=49400,dst=59357),
>         packets:1, bytes:72, used:3.360s, actions:2
>         >
>         in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.203,dst=212.96.161.246,proto=6,tos=0,ttl=127,frag=no),tcp(src=49172,dst=80),
>         packets:2, bytes:735, used:0.240s,
>         actions:set(eth(src=00:15:17:44:03:6e,dst=00:e0:ed:15:24:4a)),0
>         >
>         in_port(0),eth(src=00:e0:ed:15:24:4a,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.54,dst=111.119.15.31,proto=6,tos=0,ttl=64,frag=no),tcp(src=47463,dst=80),
>         packets:6, bytes:928, used:4.440s, actions:2
>         >
>         > Thanks,
>         > Kaushal
>         >
>         > On Tue, Jun 5, 2012 at 11:29 AM, Justin Pettit
>         <jpettit at nicira.com> wrote:
>         > Are eth3 and eth4 on the same network segment?  If so, I'd
>         guess you've introduced a loop.
>         >
>         > I wouldn't recommend setting your eviction threshold so
>         high, since OVS is going to have to do a lot of work to
>         maintain so many kernel flows.  I wouldn't go above tens of
>         thousands of flows.  What do your kernel flows look like?  You
>         have too many to post here, but maybe you can provide a
>         sampling of a couple hundred.  Do you see any patterns?
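>         >
>         > (For reference, in these versions the threshold is a bridge
>         > other-config key, so lowering it would look roughly like the
>         > line below; pick whatever value you're comfortable with:)
>         >
>         > $ ovs-vsctl set bridge br0 other-config:flow-eviction-threshold=10000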
>         >
>         > --Justin
>         >
>         >
>         > On Jun 4, 2012, at 10:40 PM, Kaushal Shubhank wrote:
>         >
>         > > Hello,
>         > >
>         > > We have a simple setup in which a server running a
>         transparent proxy needs to intercept HTTP port 80 traffic. We
>         have installed Open vSwitch (1.4.1) on the same server (running
>         Ubuntu natty, 2.6.38-12-server, 64-bit) to feed the proxy with
>         the corresponding packets while bridging all other traffic.
>         The functionality works properly, but the CPU usage is quite
>         high (~30% for 20 Mbps of traffic). The total load we need to
>         handle in production is around 350 Mbps, and as soon as we
>         plug it in, the CPU usage shoots up to 100% (on a quad-core
>         Intel(R) Xeon(R) CPU E5420 @ 2.50GHz), even when we only allow
>         all packets to flow through br0. Packet loss also starts to
>         occur.
>         > >
>         > > After reading similar discussions in previous threads, I
>         enabled STP on the bridge and increased the
>         flow-eviction-threshold to "1000000". The CPU load is still
>         high due to misses in the kernel flow table. I have defined
>         only the following flows:
>         > >
>         > > $ ovs-ofctl dump-flows br0
>         > >
>         > > NXST_FLOW reply (xid=0x4):
>         > >  cookie=0x0, duration=80105.621s, table=0,
>         n_packets=61978784, n_bytes=7438892513,
>         priority=100,tcp,in_port=1,tp_dst=80
>         actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL
>         > >  cookie=0x0, duration=80105.501s, table=0,
>         n_packets=49343241, n_bytes=113922939324,
>         priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_src=80
>         actions=output:1
>         > >  cookie=0x0, duration=518332.577s, table=0,
>         n_packets=3052099665, n_bytes=2041603012562, priority=0
>         actions=NORMAL
>         > >  cookie=0x0, duration=80105.586s, table=0,
>         n_packets=46209782, n_bytes=109671221356,
>         priority=100,tcp,in_port=2,tp_src=80
>         actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL
>         > >  cookie=0x0, duration=80105.601s, table=0,
>         n_packets=40389137, n_bytes=5660094662,
>         priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_dst=80
>         actions=output:2
>         > >
>         > > where 00:e0:ed:15:24:4a is br0's MAC address
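>         > >
>         > > (For completeness, those entries correspond to add-flow
>         > > commands roughly like the following, reconstructed from
>         > > the dump above:)
>         > >
>         > > $ ovs-ofctl add-flow br0 "priority=100,tcp,in_port=1,tp_dst=80,actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL"
>         > > $ ovs-ofctl add-flow br0 "priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_src=80,actions=output:1"
>         > > $ ovs-ofctl add-flow br0 "priority=100,tcp,in_port=2,tp_src=80,actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL"
>         > > $ ovs-ofctl add-flow br0 "priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_dst=80,actions=output:2"
>         > > $ ovs-ofctl add-flow br0 "priority=0,actions=NORMAL"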
>         > >
>         > > $ ovs-dpctl show
>         > >
>         > > system@br0:
>         > >       lookups: hit:3105457869 missed:792488043 lost:903955
>         {these lost packets occurred under the 350 Mbps load and do
>         not change at 20 Mbps}
>         > >       flows: 12251
>         > >       port 0: br0 (internal)
>         > >       port 1: eth3
>         > >       port 2: eth4
>         > >
>         > > As far as we understand, the missed packets here cause a
>         context switch to user space and increase CPU usage. Let me
>         know if any other detail about the setup is required.
>         > >
>         > > Is there anything else we can do to reduce CPU usage?
>         > > Can the flows above be improved in some way?
>         > > Is there any other configuration for deployment in
>         production that we missed?
>         > >
>         > > Regards,
>         > > Kaushal
>         > > _______________________________________________
>         > > discuss mailing list
>         > > discuss at openvswitch.org
>         > > http://openvswitch.org/mailman/listinfo/discuss
>         >
>         >
>         > <flows.tgz>
>
>
>
>
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss


-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
