[ovs-dev] [PATCH] RFC: Pass more packet and flow key info to userspace.

Rajahalme, Jarno (NSN - FI/Espoo) jarno.rajahalme at nsn.com
Tue Jan 29 15:10:40 UTC 2013


On Jan 24, 2013, at 19:41 , ext Jesse Gross wrote:

> On Thu, Jan 24, 2013 at 7:34 AM, Jarno Rajahalme
> <jarno.rajahalme at nsn.com> wrote:
>> 
>> On Jan 23, 2013, at 19:30 , ext Jesse Gross wrote:
>> 
>>> On Tue, Jan 22, 2013 at 9:48 PM, Jarno Rajahalme
>>> <jarno.rajahalme at nsn.com> wrote:
>>>> Add OVS_PACKET_ATTR_KEY_INFO to relieve userspace from re-computing
>>>> data already computed within the kernel datapath.  In the typical
>>>> case of an upcall with perfect key fitness between kernel and
>>>> userspace this eliminates flow_extract() and flow_hash() calls in
>>>> handle_miss_upcalls().
>>>> 
>>>> Additional bookkeeping within the kernel datapath is minimal.
>>>> Kernel flow insertion also saves one flow key hash computation.
>>>> 
>>>> Removed setting the packet's l7 pointer for ICMP packets, as this was
>>>> never used.
>>>> 
>>>> Signed-off-by: Jarno Rajahalme <jarno.rajahalme at nsn.com>
>>>> ---
>>>> 
>>>> This likely requires some discussion, but it took a while for me to
>>>> understand why each packet miss upcall would require flow_extract()
>>>> right after the flow key has been obtained from odp attributes.
>>> 
>>> Do you have any performance numbers to share?  Since this is an
>>> optimization it's important to understand if the benefit is worth the
>>> extra complexity.
>> 
>> Not yet, but I would be happy to. Any hints on the best way of obtaining
>> meaningful numbers for something like this?
> 
> This is a flow setup optimization, so usually something like netperf
> TCP_CRR would be a good way to stress that.
> 
> However, Ben mentioned to me that he had tried eliminating the
> flow_extract() call from userspace in the past and the results were
> disappointing.

I made a simple test where there is only one flow entry, "in_port=LOCAL actions=drop", and only the local port is configured. One process sends UDP packets with different source/destination port combinations in a loop (a sketch of the generator follows the port counters below), while OVS tries to cope with the load. During the test both processes run at close to 100% CPU utilization in a virtual machine on a dual-core laptop. On each round 10,100,000 packets were generated:

OFPST_PORT reply (xid=0x2): 1 ports
  port LOCAL: rx pkts=10100006, bytes=464600468, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
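
For reference, the generator is essentially a loop of the following form. This is a minimal sketch with error handling omitted; the destination address and the port ranges (chosen here to give 10,100,000 combinations) are illustrative assumptions, not the exact test program:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char payload[] = "flow-setup-test";
    struct sockaddr_in src, dst;

    memset(&src, 0, sizeof src);
    memset(&dst, 0, sizeof dst);
    src.sin_family = AF_INET;
    dst.sin_family = AF_INET;
    /* Destination reached via the OVS bridge's local interface, so the
     * packets enter the datapath on port LOCAL (address is an assumption). */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

    /* 10100 source ports x 1000 destination ports = 10,100,000 packets. */
    for (int sport = 10000; sport < 20100; sport++) {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        src.sin_port = htons(sport);
        bind(s, (struct sockaddr *) &src, sizeof src);

        for (int dport = 30000; dport < 31000; dport++) {
            dst.sin_port = htons(dport);
            /* Each new (sport, dport) pair misses the exact-match kernel
             * flow table and forces a miss upcall to userspace. */
            sendto(s, payload, sizeof payload, 0,
                   (struct sockaddr *) &dst, sizeof dst);
        }
        close(s);
    }
    return 0;
}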

With current master, on average 19.35% of the packets get processed by the flow:

Round 1:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=29.124s, table=0, n_packets=1959794, n_bytes=90150548, idle_age=4, in_port=LOCAL actions=drop

Round 2:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=63.534s, table=0, n_packets=1932785, n_bytes=88908158, idle_age=37, in_port=LOCAL actions=drop

Round 3:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=33.394s, table=0, n_packets=1972389, n_bytes=90729894, idle_age=8, in_port=LOCAL actions=drop


With the proposed change, on average 20.2% of the packets get processed by the flow:

Round 4:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=31.96s, table=0, n_packets=2042759, n_bytes=93966914, idle_age=4, in_port=LOCAL actions=drop

Round 5:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=38.6s, table=0, n_packets=2040224, n_bytes=93850372, idle_age=8, in_port=LOCAL actions=drop

Round 6:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=35.661s, table=0, n_packets=2039595, n_bytes=93821418, idle_age=3, in_port=LOCAL actions=drop


So there is a consistent benefit, but it is not very large: roughly a 4% higher flow setup rate in this test. It seems that flow_extract() and flow_hash() account for only a small portion of the CPU time OVS spends on flow setup.

  Jarno



