[ovs-discuss] High CPU Usage by ovs-vswitchd and resulting packet loss

Kaushal Shubhank kshubhank at gmail.com
Tue Jun 5 07:19:37 UTC 2012


We will certainly try the 1.7.0 version. Since this is a production
system, we will only be able to try it during off-peak hours. We will
update you with the results as soon as possible.

Thanks a lot, and we look forward to contributing to the project in any
way possible.

Kaushal

On Tue, Jun 5, 2012 at 12:36 PM, Justin Pettit <jpettit at nicira.com> wrote:

> Of your nearly 12,000 flows, over 10,000 had fewer than four packets:
>
> [jpettit at timber-2 Desktop] grep -e "packets:[0123]," live_flows_20120604 | wc -l
>   10143
>
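> (The same count can also be taken straight from the live kernel flow
> table; this is a sketch that assumes the datapath is named br0, as in the
> "ovs-dpctl show" output quoted below:)
>
>   $ ovs-dpctl dump-flows br0 | grep -c -e "packets:[0123],"
>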
> Short-lived flows are really difficult for OVS, since there's a lot of
> overhead in setting up and maintaining the kernel flow table.  We made
> *substantial* improvements for handling just this scenario in the
> forthcoming 1.7.0 release.  The code should be stable, but it hasn't gone
> through a full QA regression.  However, if you're willing to give it a
> shot, you can download a snapshot of the tip of the 1.7 branch:
>
>
> http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=snapshot;h=04a67c083458784d1fed689bcb7ed904026d2352;sf=tgz
>
> We've only been able to test it with generated traffic, so seeing how much
> it improves performance with real traffic would be invaluable.  If you're
> able to give it a try and let us know, we'd really appreciate it.
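>
> (Building from a git snapshot like that should follow the usual steps in
> the INSTALL.Linux file shipped in the tree; roughly the sketch below,
> rebuilding the kernel module against your running kernel. Defer to the
> INSTALL file for the authoritative details:)
>
>   $ ./boot.sh
>   $ ./configure --with-linux=/lib/modules/`uname -r`/build
>   $ make && sudo make install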
>
> --Justin
>
>
> On Jun 4, 2012, at 11:39 PM, Kaushal Shubhank wrote:
>
> > Hi Justin,
> >
> > This is how the connections are made, so I guess eth3 and eth4 are not
> > in the same network segment:
> > Router--->eth4==eth3--->switch
> >
> > We tried with an eviction threshold of 10000, but we were still seeing
> > heavy packet loss. I am pasting a few kernel flows (ovs-dpctl dump-flows)
> > here, and attaching the whole dump (11k flows). I don't see any pattern.
> > Only around 800 of the 11k flows were the port 80 filtering flows, which
> > means the rest were non-port-80 packets that we simply forward from eth3
> > to eth4 or vice-versa.
> >
> > If there were some way to reduce those (11k - 800) flows, we could
> > reduce CPU usage.
> >
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.195,dst=1.2.138.199,proto=17,tos=0,ttl=127,frag=no),udp(src=62294,dst=16464), packets:1, bytes:60, used:3.170s, actions:2
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=94.194.158.115,dst=110.172.18.250,proto=6,tos=0,ttl=22,frag=no),tcp(src=62760,dst=47868), packets:0, bytes:0, used:never, actions:1
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.134,dst=209.85.148.139,proto=6,tos=0,ttl=126,frag=no),tcp(src=64741,dst=80), packets:1, bytes:60, used:2.850s, actions:set(eth(src=00:15:17:44:03:6e,dst=00:e0:ed:15:24:4a)),0
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.137,dst=219.90.100.27,proto=6,tos=0,ttl=127,frag=no),tcp(src=49504,dst=12758), packets:67603, bytes:4060369, used:0.360s, actions:2
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=189.63.179.72,dst=203.188.231.195,proto=17,tos=0,ttl=110,frag=no),udp(src=60414,dst=16464), packets:1, bytes:60, used:0.620s, actions:1
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=213.57.230.226,dst=110.172.18.8,proto=17,tos=0,ttl=101,frag=no),udp(src=59274,dst=24844), packets:0, bytes:0, used:never, actions:1
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=195.35.128.105,dst=110.172.18.250,proto=6,tos=0,ttl=15,frag=no),tcp(src=54303,dst=47868), packets:3, bytes:222, used:5.300s, actions:2
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.154,dst=76.186.139.105,proto=6,tos=0,ttl=126,frag=no),tcp(src=10369,dst=61585), packets:1, bytes:60, used:0.290s, actions:2
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=78.92.118.9,dst=110.172.18.80,proto=17,tos=0,ttl=23,frag=no),udp(src=44779,dst=59357), packets:0, bytes:0, used:never, actions:2
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=89.216.130.134,dst=203.188.231.206,proto=17,tos=0,ttl=33,frag=no),udp(src=52342,dst=30291), packets:0, bytes:0, used:never, actions:1
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=76.226.72.157,dst=110.172.18.250,proto=6,tos=0,ttl=36,frag=no),tcp(src=46637,dst=47868), packets:2, bytes:148, used:2.730s, actions:1
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=89.211.162.95,dst=110.172.18.80,proto=17,tos=0,ttl=92,frag=no),udp(src=19442,dst=59357), packets:0, bytes:0, used:never, actions:2
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=86.179.231.157,dst=110.172.18.11,proto=17,tos=0,ttl=109,frag=no),udp(src=58240,dst=23813), packets:7, bytes:1181, used:1.700s, actions:1
> >
> > in_port(2),eth(src=e8:b7:48:42:5b:09,dst=00:15:17:44:03:6e),eth_type(0x0800),ipv4(src=72.201.71.66,dst=203.188.231.195,proto=17,tos=0,ttl=115,frag=no),udp(src=1025,dst=16464), packets:1, bytes:60, used:2.620s, actions:1
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=95.165.107.21,dst=110.172.18.80,proto=17,tos=0,ttl=96,frag=no),udp(src=49400,dst=59357), packets:1, bytes:72, used:3.360s, actions:2
> >
> > in_port(1),eth(src=00:15:17:44:03:6e,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=110.172.18.203,dst=212.96.161.246,proto=6,tos=0,ttl=127,frag=no),tcp(src=49172,dst=80), packets:2, bytes:735, used:0.240s, actions:set(eth(src=00:15:17:44:03:6e,dst=00:e0:ed:15:24:4a)),0
> >
> > in_port(0),eth(src=00:e0:ed:15:24:4a,dst=e8:b7:48:42:5b:09),eth_type(0x0800),ipv4(src=203.188.231.54,dst=111.119.15.31,proto=6,tos=0,ttl=64,frag=no),tcp(src=47463,dst=80), packets:6, bytes:928, used:4.440s, actions:2
> >
> > Thanks,
> > Kaushal
> >
> > On Tue, Jun 5, 2012 at 11:29 AM, Justin Pettit <jpettit at nicira.com>
> wrote:
> > Are eth3 and eth4 on the same network segment?  If so, I'd guess you've
> introduced a loop.
> >
> > I wouldn't recommend setting your eviction threshold so high, since OVS
> > is going to have to do a lot of work to maintain so many kernel flows.  I
> > wouldn't go above tens of thousands of flows.  What do your kernel flows
> > look like?  You have too many to post here, but maybe you can provide a
> > sampling of a couple hundred.  Do you see any patterns?
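> >
> > (For a quick sample, something along these lines should work; this
> > assumes the datapath is named br0, and the output file name is arbitrary:)
> >
> >   $ ovs-dpctl dump-flows br0 | head -n 200 > flow_sample.txt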
> >
> > --Justin
> >
> >
> > On Jun 4, 2012, at 10:40 PM, Kaushal Shubhank wrote:
> >
> > > Hello,
> > >
> > > We have a simple setup in which a server running a transparent proxy
> > > needs to intercept HTTP port 80 traffic. We have installed Open vSwitch
> > > (1.4.1) on the same server (running Ubuntu natty, 2.6.38-12-server,
> > > 64-bit) to feed the proxy with the matching packets while bridging all
> > > other traffic. The functionality works properly, but the CPU usage is
> > > quite high (~30% for 20 Mbps of traffic). The total load we need to
> > > deploy under is around 350 Mbps, and as soon as we plug that in, the
> > > CPU usage shoots up to 100% (on a quad-core Intel(R) Xeon(R) CPU E5420
> > > @ 2.50GHz), even when we simply let all packets flow through br0.
> > > Packet loss also starts to occur.
> > >
> > > After reading similar discussions in previous threads, I enabled STP on
> > > the bridge and increased the flow-eviction-threshold to "1000000".
> > > Still, the CPU load is high due to misses in the kernel flow table. I
> > > have defined only the following flows:
> > >
> > > $ ovs-ofctl dump-flows br0
> > >
> > > NXST_FLOW reply (xid=0x4):
> > >  cookie=0x0, duration=80105.621s, table=0, n_packets=61978784, n_bytes=7438892513, priority=100,tcp,in_port=1,tp_dst=80 actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL
> > >  cookie=0x0, duration=80105.501s, table=0, n_packets=49343241, n_bytes=113922939324, priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_src=80 actions=output:1
> > >  cookie=0x0, duration=518332.577s, table=0, n_packets=3052099665, n_bytes=2041603012562, priority=0 actions=NORMAL
> > >  cookie=0x0, duration=80105.586s, table=0, n_packets=46209782, n_bytes=109671221356, priority=100,tcp,in_port=2,tp_src=80 actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL
> > >  cookie=0x0, duration=80105.601s, table=0, n_packets=40389137, n_bytes=5660094662, priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_dst=80 actions=output:2
> > >
> > > where 00:e0:ed:15:24:4a is br0's MAC address
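> > >
> > > (For reference, this configuration was applied roughly as follows. The
> > > exact invocations are a sketch that assumes the standard stp_enable and
> > > other-config:flow-eviction-threshold settings, with the flows matching
> > > the dump above:)
> > >
> > > $ ovs-vsctl set bridge br0 stp_enable=true
> > > $ ovs-vsctl set bridge br0 other-config:flow-eviction-threshold=1000000
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,in_port=1,tp_dst=80,actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL"
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_src=80,actions=output:1"
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,in_port=2,tp_src=80,actions=mod_dl_dst:00:e0:ed:15:24:4a,LOCAL"
> > > $ ovs-ofctl add-flow br0 "priority=100,tcp,dl_src=00:e0:ed:15:24:4a,tp_dst=80,actions=output:2"
> > > $ ovs-ofctl add-flow br0 "priority=0,actions=NORMAL"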
> > >
> > > $ ovs-dpctl show
> > >
> > > system at br0:
> > >       lookups: hit:3105457869 missed:792488043 lost:903955 {these lost
> > >       packets came with the 350 Mbps load and do not change at 20 Mbps}
> > >       flows: 12251
> > >       port 0: br0 (internal)
> > >       port 1: eth3
> > >       port 2: eth4
> > >
> > > As far as we understand, each missed packet here causes a context
> > > switch to userspace and increases CPU usage. Let me know if any other
> > > detail about the setup is required.
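> > >
> > > (For scale: of the 3105457869 + 792488043 = 3897945912 total lookups,
> > > the 792488043 misses are roughly 20%, i.e. about one packet in five is
> > > being punted to ovs-vswitchd in userspace.)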
> > >
> > > Is there anything else we can do to reduce CPU usage?
> > > Can the flows above be improved in some way?
> > > Is there any other configuration for deployment in production that we
> missed?
> > >
> > > Regards,
> > > Kaushal
> > > _______________________________________________
> > > discuss mailing list
> > > discuss at openvswitch.org
> > > http://openvswitch.org/mailman/listinfo/discuss
> >
> >
> > <flows.tgz>
>
>