[ovs-discuss] About CPU usage and packet size

Justin Pettit jpettit@nicira.com
Tue Jun 14 06:18:23 UTC 2011


On Jun 13, 2011, at 10:23 PM, Victor T. wrote:

> So I would like to ask more specifically about Open vSwitch's CPU usage with different packet sizes in a flow.
> 
> My test setup consists of 3 PCs, where PC 1 sends data to PC 3 through PC 2 (Open vSwitch, NOX). The traffic was generated with iperf UDP.
> 
> I tested with datagram sizes of 100 bytes and 1470 bytes. The difference I've noticed is that with 50 Mb/s of bandwidth and 100-byte datagrams, the CPU usage rises and stays high for a few seconds, then drops back to "normal", even with the flow still passing through.
> 
> With the same bandwidth and 1470-byte datagrams, the CPU usage barely rose, and after a few seconds you could hardly tell the flow was there...
> 
> Is this normal?

As I mentioned on the OpenFlow list, my guess is that your small packets are arriving at such a high rate that many of them are sent up to userspace (ovs-vswitchd) before the kernel flow entries can be added.  The CPU drop probably occurs once the flows are in the kernel.  I suspect that if you ran the same test with TCP, you wouldn't see the same CPU spike, since the packets won't start blasting until the 3-way handshake has completed.
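
If you want to compare the two cases directly, something like the following iperf invocations should work (a rough sketch assuming the classic iperf v2 flags; "pc3" is a placeholder for your receiver's hostname):

-=-=-=-=-=-=-=-=-
# On PC 3 (receiver): start a UDP server.
iperf -s -u

# On PC 1 (sender): 50 Mb/s of 100-byte UDP datagrams for 30 seconds.
iperf -c pc3 -u -b 50M -l 100 -t 30

# Same offered load with 1470-byte datagrams.
iperf -c pc3 -u -b 50M -l 1470 -t 30

# TCP comparison: the kernel flow entry gets set up on the
# 3-way handshake packets, before the flow reaches full rate.
iperf -s               # on PC 3
iperf -c pc3 -t 30     # on PC 1
-=-=-=-=-=-=-=-=-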

Section 4 of this paper discusses OVS's design and some of its implications:

	http://openvswitch.org/papers/hotnets2009.pdf

> I was trying to measure how much CPU Open vSwitch consumes for each type of flow; do you think it's possible?


You can help determine whether the problem is due to packets being sent to userspace by running the "ovs-dpctl show" command while testing with the different datagram sizes.  For example:

-=-=-=-=-=-=-=-=-
[root@stumpjump ~]# ovs-dpctl show
system@xenbr0:
	lookups: frags:0, hit:116519019, missed:1663188, lost:213
	port 0: xenbr0 (internal)
	port 1: eth0
	port 2: vif20.1
-=-=-=-=-=-=-=-=-

The "hit" field counts the number of packets that matched a kernel flow entry.  The "missed" field counts the number of packets that didn't match a kernel flow entry and were sent to userspace.  The "lost" field counts the number of packets that were dropped due to the "miss" queue being full.  If the "missed" or "lost" count are particularly high, this is probably the source of the CPU spike.  In real network traffic, flows don't usually start blasting at full rate immediately, so we haven't seen the design be too much of an issue.

--Justin