[ovs-discuss] High CPU Usage by ovs-vswitchd and resulting packet loss
Oliver Francke
Oliver.Francke at filoo.de
Wed Jun 6 09:52:52 UTC 2012
Hi Kaushal,
thanks for your first impressions. My next change window is in two days;
I will put the current version on one of our 5 nodes then.
I'll set up a small script that monitors memory usage, CPU load, number of
flows, etc.
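Something along these lines (untested; the log path and interval are just
placeholders):

    #!/bin/sh
    # Rough monitoring sketch: log CPU/memory of ovs-vswitchd and the
    # kernel datapath counters once a minute.
    while true; do
        date
        top -b -n 1 | grep ovs-vswitchd            # daemon CPU and memory
        ovs-dpctl show | grep -E 'lookups|flows'   # hit/miss counters, flow count
        sleep 60
    done >> /var/log/ovs-monitor.log
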
@Justin: Any other recommendations?
If it's worthwhile I could start a new thread, but while we're on the
topic of high CPU load: how do you all handle things like SYN-flood
attacks and the like?
Thnx in advance,
Oliver.
On 06/06/2012 09:09 AM, Kaushal Shubhank wrote:
> Hi Justin, Oliver,
>
> So I switched to the newer version around 11 hrs ago. Here are some
> observations:
>
> 1. The number of flows has come down to a couple of thousand (from
> 12-15k). However, we may wait for the setup to run for one whole day, to
> see it through peak and lean times, and then count the flows again.
>
> 2. A smaller percentage of the flows now have very low packet counts. I
> have attached a dump for reference.
>
> 3. The CPU usage is still about the same, which means we are still
> seeing misses in the kernel flow table.
>
> $ sudo ovs-dpctl dump-flows br0 | grep -e "packets:[0123]," | wc -l
> 764
> $ sudo ovs-dpctl show
> system@br0:
> lookups: hit:117426873 missed:87741549 lost:0
> flows: 2145
> port 0: br0 (internal)
> port 1: eth3
> port 2: eth4
>
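> For reference, the miss rate from the counters above works out to
> roughly 43%: 87741549 / (117426873 + 87741549). A rough, untested
> one-liner to keep an eye on it:
>
> $ sudo ovs-dpctl show | awk '/lookups/ {
>       for (i = 1; i <= NF; i++) {
>           if ($i ~ /^hit:/)    hit  = substr($i, 5)
>           if ($i ~ /^missed:/) miss = substr($i, 8)
>       }
>       printf "miss rate: %.1f%%\n", 100 * miss / (hit + miss)
>   }'
>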
> - Kaushal
>
> On Tue, Jun 5, 2012 at 12:49 PM, Kaushal Shubhank <kshubhank at gmail.com> wrote:
>
> We will certainly try the 1.7.0 version. Since this is production, we
> will have to try it during off-peak hours. We will update you with the
> results as soon as possible.
>
> Thanks a lot, and we look forward to contributing to the project in any
> way possible.
>
> Kaushal
>
>
> On Tue, Jun 5, 2012 at 12:36 PM, Justin Pettit <jpettit at nicira.com> wrote:
>
> Of your nearly 12,000 flows, over 10,000 had fewer than four
> packets:
>
> [jpettit@timber-2 Desktop] grep -e "packets:[0123]," live_flows_20120604 | wc -l
> 10143
>
> Short-lived flows are really difficult for OVS, since there's
> a lot of overhead in setting up and maintaining the kernel
> flow table. We made *substantial* improvements for handling
> just this scenario in the forthcoming 1.7.0 release. The code
> should be stable, but it hasn't gone through a full QA
> regression. However, if you're willing to give it a shot, you
> can download a snapshot of the tip of the 1.7 branch:
>
> http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=snapshot;h=04a67c083458784d1fed689bcb7ed904026d2352;sf=tgz
>
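> In case it helps, a git snapshot usually builds roughly like this (see
> the INSTALL instructions in the tree for the authoritative steps; the
> tarball and directory names below are assumptions, and autoconf,
> automake, and libtool are needed for boot.sh):
>
> $ tar xzf openvswitch-<snapshot>.tar.gz
> $ cd openvswitch-<snapshot>
> $ ./boot.sh
> $ ./configure --with-linux=/lib/modules/$(uname -r)/build
> $ make && sudo make install
>
> Then reload the datapath kernel module built under datapath/linux/ and
> restart ovs-vswitchd.
>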
> We've only been able to test it with generated traffic, so
> seeing how much it improves performance with real traffic
> would be invaluable. If you're able to give it a try and let
> us know, we'd really appreciate it.
>
> --Justin
>
>
> On Jun 4, 2012, at 11:39 PM, Kaushal Shubhank wrote:
>
> > Hi Justin,
> >
> > This is how the connections are made, so I guess eth3 and
> eth4 are not in the same network segment.
> > Router--->eth4==eth3--->switch
> >
> > We tried with an eviction threshold of 10000, but were seeing high
> > packet loss. I am pasting a few kernel flows (ovs-dpctl dump-flows)
> > here, and attaching the whole dump (11k flows). I don't see any
> > pattern. Only around 800 of the 11k flows were the port-80 filtering
> > flows, which means the rest are non-port-80 packets that we simply
> > forward from eth3 to eth4 or vice versa.
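> >
> > A rough way to count the port-80 flows in such a dump (the pattern is
> > only approximate):
> >
> > $ sudo ovs-dpctl dump-flows br0 | grep -cE "tcp\(src=80,|dst=80\)"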
> >
> > If there is any way to reduce those (11k - 800) flows, we could
> > reduce CPU usage.
> >
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.195,dst=1.2.138.199,proto=17,tos=0,ttl=127,frag
> > =no),udp(src=62294,dst=16464), packets:1, bytes:60,
> used:3.170s, actions:2
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=94.194.158.115,dst=110.172.18.250,proto=6,tos=0,ttl=22,frag
> > =no),tcp(src=62760,dst=47868), packets:0, bytes:0,
> used:never, actions:1
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.134,dst=209.85.148.139,proto=6,tos=0,ttl=126,frag=no),tcp(src=64741,dst=80),
> packets:1, bytes:60, used:2.850s,
> actions:set(eth(src=00:15:17:44:03:6e,dst=00:e0:ed:15:24:4a)),0
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.137,dst=219.90.100.27,proto=6,tos=0,ttl=127,frag=no),tcp(src=49504,dst=12758),
> packets:67603, bytes:4060369, used:0.360s, actions:2
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=189.63.179.72,dst=203.188.231.195,proto=17,tos=0,ttl=110,frag=no),udp(src=60414,dst=16464),
> packets:1, bytes:60, used:0.620s, actions:1
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=213.57.230.226,dst=110.172.18.8,proto=17,tos=0,ttl=101,frag=no),udp(src=59274,dst=24844),
> packets:0, bytes:0, used:never, actions:1
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=195.35.128.105,dst=110.172.18.250,proto=6,tos=0,ttl=15,frag=no),tcp(src=54303,dst=47868),
> packets:3, bytes:222, used:5.300s, actions:2
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.154,dst=76.186.139.105,proto=6,tos=0,ttl=126,frag=no),tcp(src=10369,dst=61585),
> packets:1, bytes:60, used:0.290s, actions:2
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=78.92.118.9,dst=110.172.18.80,proto=17,tos=0,ttl=23,frag=no),udp(src=44779,dst=59357),
> packets:0, bytes:0, used:never, actions:2
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=89.216.130.134,dst=203.188.231.206,proto=17,tos=0,ttl=33,frag=no),udp(src=52342,dst=30291),
> packets:0, bytes:0, used:never, actions:1
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=76.226.72.157,dst=110.172.18.250,proto=6,tos=0,ttl=36,frag=no),tcp(src=46637,dst=47868),
> packets:2, bytes:148, used:2.730s, actions:1
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=89.211.162.95,dst=110.172.18.80,proto=17,tos=0,ttl=92,frag=no),udp(src=19442,dst=59357),
> packets:0, bytes:0, used:never, actions:2
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=86.179.231.157,dst=110.172.18.11,proto=17,tos=0,ttl=109,frag=no),udp(src=58240,dst=23813),
> packets:7, bytes:1181, used:1.700s, actions:1
> >
> in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=72.201.71.66,dst=203.188.231.195,proto=17,tos=0,ttl=115,frag=no),udp(src=1025,dst=16464),
> packets:1, bytes:60, used:2.620s, actions:1
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=95.165.107.21,dst=110.172.18.80,proto=17,tos=0,ttl=96,frag=no),udp(src=49400,dst=59357),
> packets:1, bytes:72, used:3.360s, actions:2
> >
> in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.203,dst=212.96.161.246,proto=6,tos=0,ttl=127,frag=no),tcp(src=49172,dst=80),
> packets:2, bytes:735, used:0.240s,
> actions:set(eth(src=00:15:17:44:03:6e,dst=00:e0:ed:15:24:4a)),0
> >
> in_port(0),eth(src=00:e0:ed:15:24:4a,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.54,dst=111.119.15.31,proto=6,tos=0,ttl=64,frag=no),tcp(src=47463,dst=80),
> packets:6, bytes:928, used:4.440s, actions:2
> >
> > Thanks,
> > Kaushal
> >
> > On Tue, Jun 5, 2012 at 11:29 AM, Justin Pettit <jpettit at nicira.com> wrote:
> > Are eth3 and eth4 on the same network segment? If so, I'd
> guess you've introduced a loop.
> >
> > I wouldn't recommend setting your eviction threshold so high, since
> > OVS is going to have to do a lot of work to maintain that many kernel
> > flows; I wouldn't go above tens of thousands of flows. What do your
> > kernel flows look like? You have too many to post here, but maybe you
> > can provide a sampling of a couple hundred. Do you see any patterns?
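> >
> > For reference, the knob in question is set on the bridge record, along
> > these lines (the value here is just an example):
> >
> > $ ovs-vsctl set bridge br0 other-config:flow-eviction-threshold=10000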
> >
> > --Justin
> >
> >
> > On Jun 4, 2012, at 10:40 PM, Kaushal Shubhank wrote:
> >
> > > Hello,
> > >
> > > We have a simple setup in which a server running a transparent proxy
> > > needs to intercept HTTP port-80 traffic. We have installed Open vSwitch
> > > (1.4.1) on the same server (running Ubuntu Natty, 2.6.38-12-server,
> > > 64-bit) to feed the proxy the matching packets while bridging all other
> > > traffic. The functionality works properly, but the CPU usage is quite
> > > high (~30% for 20 Mbps of traffic). The total load we need to handle in
> > > production is around 350 Mbps, and as soon as we plug in, the CPU usage
> > > shoots up to 100% (on a quad-core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz),
> > > even when br0 is only passing all packets straight through. Packet loss
> > > also starts to occur.
> > >
> > > After reading similar discussions in previous threads, I enabled STP on
> > > the bridge and increased the flow-eviction-threshold to "1000000". The
> > > CPU load is still high due to misses in the kernel flow table. I have
> > > defined only the following flows:
> > >
> > > $ ovs-ofctl dump-flows br0
> > >
> > > NXST_FLOW reply (xid=0x4):
> > > cookie=0x0, duration=80105.621s, table=0,
> n_packets=61978784, n_bytes=7438892513,
> priority=100,tcp,in_port=1,tp_dst=80
> actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL
> > > cookie=0x0, duration=80105.501s, table=0,
> n_packets=49343241, n_bytes=113922939324,
> priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_src=80
> actions=output:1
> > > cookie=0x0, duration=518332.577s, table=0,
> n_packets=3052099665, n_bytes=2041603012562, priority=0
> actions=NORMAL
> > > cookie=0x0, duration=80105.586s, table=0,
> n_packets=46209782, n_bytes=109671221356,
> priority=100,tcp,in_port=2,tp_src=80
> actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL
> > > cookie=0x0, duration=80105.601s, table=0,
> n_packets=40389137, n_bytes=5660094662,
> priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_dst=80
> actions=output:2
> > >
> > > where 00:e0:ed:15:24:4a is br0's MAC address
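> > >
> > > (Equivalent add-flow commands, for reference; an untested transcription
> > > of the table above:)
> > >
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,in_port=1,tp_dst=80,actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL"
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_src=80,actions=output:1"
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,in_port=2,tp_src=80,actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL"
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_dst=80,actions=output:2"
> > > $ ovs-ofctl add-flow br0 "priority=0,actions=NORMAL"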
> > >
> > > $ ovs-dpctl show
> > >
> > > system@br0:
> > > lookups: hit:3105457869 missed:792488043 lost:903955
> > > {these lost packets came with the 350 Mbps load and do not change
> > > at 20 Mbps}
> > > flows: 12251
> > > port 0: br0 (internal)
> > > port 1: eth3
> > > port 2: eth4
> > >
> > > As far as we understand, these missed packets cause a switch to
> > > userspace and increase CPU usage. Let me know if any other detail
> > > about the setup is required.
> > >
> > > Is there anything else we can do to reduce CPU usage?
> > > Can the flows above be improved in some way?
> > > Is there any other configuration for deployment in
> production that we missed?
> > >
> > > Regards,
> > > Kaushal
> > > _______________________________________________
> > > discuss mailing list
> > > discuss at openvswitch.org <mailto:discuss at openvswitch.org>
> > > http://openvswitch.org/mailman/listinfo/discuss
> >
> >
> > <flows.tgz>
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss
--
Oliver Francke
filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh
Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz
Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh