[ovs-discuss] High CPU Usage by ovs-vswitchd and resulting packet loss

Oliver Francke Oliver.Francke at filoo.de
Thu Jun 7 10:22:18 UTC 2012


Hi Justin,

thanks for the explanations.

Here is an excerpt from a scenario in which the CPU load goes up; note that within our network the "lost" figures don't normally change:

20120607-114420: lookups: hit:21149788628 missed:12736368714 lost:210746961 flows: 3280 SHORT_FLOWS=2451, TOP=mem: 19m cpu: 39
20120607-114425: lookups: hit:21149799408 missed:12736372737 lost:210746961 flows: 4654 SHORT_FLOWS=3831, TOP=mem: 19m cpu: 45
20120607-114430: lookups: hit:21149810014 missed:12736374681 lost:210746961 flows: 2769 SHORT_FLOWS=1907, TOP=mem: 19m cpu: 18
20120607-114435: lookups: hit:21149821758 missed:12736378269 lost:210746961 flows: 4160 SHORT_FLOWS=3231, TOP=mem: 19m cpu: 49
20120607-114440: lookups: hit:21149831635 missed:12736381096 lost:210746961 flows: 3871 SHORT_FLOWS=2995, TOP=mem: 19m cpu: 18
20120607-114445: lookups: hit:21149842697 missed:12736384730 lost:210746961 flows: 4099 SHORT_FLOWS=3241, TOP=mem: 19m cpu: 35
20120607-114450: lookups: hit:21149853788 missed:12736387554 lost:210746961 flows: 3769 SHORT_FLOWS=2944, TOP=mem: 19m cpu: 12
20120607-114455: lookups: hit:21149862246 missed:12736389740 lost:210746961 flows: 2589 SHORT_FLOWS=1809, TOP=mem: 19m cpu: 18
20120607-114500: lookups: hit:21149875807 missed:12736392636 lost:210746961 flows: 3311 SHORT_FLOWS=2489, TOP=mem: 19m cpu: 29
20120607-114505: lookups: hit:21149893590 missed:12736396998 lost:210746961 flows: 5187 SHORT_FLOWS=4066, TOP=mem: 19m cpu: 53
20120607-114510: lookups: hit:21149904797 missed:12736402095 lost:210746961 flows: 6230 SHORT_FLOWS=5171, TOP=mem: 19m cpu: 37
20120607-114515: lookups: hit:21149915723 missed:12736407377 lost:210746961 flows: 6054 SHORT_FLOWS=4950, TOP=mem: 19m cpu: 45
20120607-114520: lookups: hit:21149928325 missed:12736412748 lost:210746961 flows: 6422 SHORT_FLOWS=5326, TOP=mem: 19m cpu: 31
20120607-114525: lookups: hit:21149938705 missed:12736415973 lost:210746961 flows: 4072 SHORT_FLOWS=2993, TOP=mem: 19m cpu: 43
20120607-114530: lookups: hit:21149949606 missed:12736422759 lost:210746961 flows: 7633 SHORT_FLOWS=6338, TOP=mem: 19m cpu: 94
20120607-114535: lookups: hit:21149964017 missed:12736452506 lost:210746961 flows: 11739 SHORT_FLOWS=10993, TOP=mem: 19m cpu: 96
20120607-114540: lookups: hit:21149976648 missed:12736480881 lost:210746961 flows: 15925 SHORT_FLOWS=15143, TOP=mem: 19m cpu: 98
20120607-114545: lookups: hit:21149988896 missed:12736508350 lost:210746961 flows: 13592 SHORT_FLOWS=12888, TOP=mem: 19m cpu: 98
20120607-114550: lookups: hit:21150002168 missed:12736538481 lost:210746961 flows: 15581 SHORT_FLOWS=14800, TOP=mem: 19m cpu: 100
20120607-114555: lookups: hit:21150016018 missed:12736566873 lost:210746961 flows: 11541 SHORT_FLOWS=10865, TOP=mem: 19m cpu: 98
20120607-114600: lookups: hit:21150029226 missed:12736594616 lost:210746961 flows: 14313 SHORT_FLOWS=15555, TOP=mem: 19m cpu: 100
20120607-114605: lookups: hit:21150049470 missed:12736623781 lost:210746961 flows: 14113 SHORT_FLOWS=13341, TOP=mem: 19m cpu: 100
20120607-114610: lookups: hit:21150061782 missed:12736651311 lost:210746961 flows: 13490 SHORT_FLOWS=12613, TOP=mem: 19m cpu: 99
20120607-114615: lookups: hit:21150074821 missed:12736677656 lost:210746961 flows: 12209 SHORT_FLOWS=11518, TOP=mem: 19m cpu: 97
20120607-114620: lookups: hit:21150087942 missed:12736704949 lost:210746961 flows: 11863 SHORT_FLOWS=11182, TOP=mem: 19m cpu: 84
20120607-114625: lookups: hit:21150101016 missed:12736731540 lost:210746961 flows: 11214 SHORT_FLOWS=10475, TOP=mem: 19m cpu: 97
20120607-114630: lookups: hit:21150114324 missed:12736758289 lost:210746961 flows: 10456 SHORT_FLOWS=10931, TOP=mem: 19m cpu: 98
20120607-114635: lookups: hit:21150128318 missed:12736785776 lost:210746961 flows: 11338 SHORT_FLOWS=10645, TOP=mem: 19m cpu: 98

This lasts for a couple of minutes, then drops back down again. Nothing critical so far.
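To make the pattern in the excerpt easier to see, here is a small hypothetical helper (not part of the thread) that parses these monitoring lines and computes the per-interval miss rate, which spikes together with the CPU usage while "lost" stays flat:

```python
# Hypothetical parser for the monitoring lines above (format inferred from the
# excerpt): extract the "missed" counter and report misses/sec per interval.
import re

LINE_RE = re.compile(
    r"(?P<ts>\d{8}-\d{6}): lookups: hit:(?P<hit>\d+) missed:(?P<missed>\d+) "
    r"lost:(?P<lost>\d+) flows: (?P<flows>\d+) SHORT_FLOWS=(?P<short>\d+), "
    r"TOP=mem: \S+ cpu: (?P<cpu>\d+)"
)

def miss_rates(lines, interval_s=5):
    """Yield (timestamp, misses/sec, cpu%) for each consecutive pair of samples."""
    prev_missed = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        missed = int(m.group("missed"))
        if prev_missed is not None:
            yield m.group("ts"), (missed - prev_missed) / interval_s, int(m.group("cpu"))
        prev_missed = missed

sample = [
    "20120607-114525: lookups: hit:21149938705 missed:12736415973 lost:210746961 flows: 4072 SHORT_FLOWS=2993, TOP=mem: 19m cpu: 43",
    "20120607-114530: lookups: hit:21149949606 missed:12736422759 lost:210746961 flows: 7633 SHORT_FLOWS=6338, TOP=mem: 19m cpu: 94",
]
for ts, rate, cpu in miss_rates(sample):
    print(ts, rate, cpu)
```

Run over the full excerpt, the miss rate jumps from roughly 500-1000/s to over 5000/s at 11:45:30, exactly when CPU pegs near 100%.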

Regards,

Oliver.

On 07.06.2012 at 09:52, Justin Pettit wrote:

> On Jun 6, 2012, at 2:52 AM, Oliver Francke wrote:
> 
>> @Justin: Any other recommendations?
> 
> Are you also having many short-lived flows?  If you're in the range I mentioned in my response to Kaushal (roughly 120,000 flow setups per second), then the forthcoming 1.7.0 release may be enough for you.
> 
>> If it's worth, I could try to start a new thread, but talking about high CPU-load, how do you all handle something like SYN-FLOOD attacks and stuff like that?
> 
> Each datapath has 16 queues that connect the kernel to userspace.  We assign each port to one of those queues, which helps prevent a port from starving the other ports.  Our use case is to prevent one VM from starving out the others.  In Kaushal's case, he's using OVS more like a bump-in-the-wire than a vswitch, meaning that he's not concerned with a bad actor at the port level.
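The port-to-queue mapping Justin describes can be sketched as follows; this is a toy illustration, not the actual OVS code, and the modulo hash stands in for whatever mapping the real datapath uses:

```python
# Hypothetical sketch (not OVS source): each port is assigned to one of 16
# kernel-to-userspace upcall queues. Two ports that map to the same queue
# "collide": a flood of flow misses on one can delay flow setups on the other,
# which is the coarseness Justin mentions below.
N_QUEUES = 16

def queue_for_port(port_no: int) -> int:
    # Toy hash: the real implementation may differ.
    return port_no % N_QUEUES

# Under this toy mapping, ports 3 and 19 collide on the same queue:
print(queue_for_port(3), queue_for_port(19))
```

The isolation is therefore per-queue, not per-port: with more than 16 active ports, collisions are unavoidable, which motivates the finer-grained control discussed in the next paragraph.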
> 
> We've got a couple of people traveling this week, but when they get back, I plan to discuss how we may be able to provide finer-grained control over flow setups for vswitch deployments, since our current approach is rather coarse and can lead to queue collisions.  I've also written Kaushal off-line to see if I can get more information about his situation.
> 
> --Justin
> 
> 
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss
