[ovs-dev] Userspace Netlink MMAP status

Thomas Graf tgraf at redhat.com
Tue May 6 10:18:20 UTC 2014

On 05/01/2014 06:09 PM, Zoltan Kiss wrote:
> On 29/04/14 17:36, Thomas Graf wrote:
>> On Tue, Apr 29, 2014 at 05:17:07PM +0100, Zoltan Kiss wrote:
>>> On 23/04/14 22:56, Thomas Graf wrote:
>>>> On 04/23/2014 10:12 PM, Ethan Jackson wrote:
>>>>> The problem has actually gotten worse since we've gotten rid of the
>>>>> dispatcher thread.  Now each thread has its own channel per port.
>>>>> I wonder if the right approach is to simply ditch the per-port
>>>>> fairness in the case where mmap netlink is enabled.  I.E. we simply
>>>>> have one channel per thread and call it a day.
>>>>> Anyways, I don't have a lot of context on this thread, so take
>>>>> everything above with a grain of salt.
>>>> I agree with Ethan's statement. Even with a reduced frame size the cost
>>>> of an individual ring buffer per port is likely still too large and
>>>> we lose the benefit of zerocopy for large packets which are typically
>>>> the expensive packets.
>>> My expectation is that such large packets shouldn't go to the
>>> userspace very often, as ideally the TCP handshake packets already
>>> established the flow. Do you have a use case where this is not true?
>> The common use case is a flow expiring during the lifetime of a TCP
>> connection. It will result in multiple data packets being sent upwards.
>> It's much less likely in the megaflows era though.
>>>> As we extend the GSO path into the upcall and make use of the new DPDK
>>>> style ofpbuf to avoid the memcpy() for the mmap case
>>> Can you elaborate a bit more on this?
>> The current upcall code does segmentation which is not required and is
>> expensive for the above mentioned case. A single 64K GSO packet will
>> automatically result in up to 50 upcalls.
>> Also, right now, the first thing we do in the mmap case is copy the
>> buffer into an ofpbuf. This is not required at all and the copy is
>> expensive. Instead, we should make use of the shared memory just like
>> in the DPDK case and only release the buffer after the packet has been
>> fully processed.
> So you suggest userspace should directly access the linear buffer and
> the frags, instead of copying them into the shared buffer?

That would be step three. The first intermediate step I suggest is to
have ofpbuf point to the shared buffer, just like the DPDK case does,
instead of allocating new space for the ofpbuf data.
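
To make the intermediate step concrete, here is a rough sketch of a
buffer that either owns a private copy or merely points into a shared
ring frame. All names (pkt_buf, ring_frame, pkt_buf_use_frame, and so
on) are hypothetical and are not the real ofpbuf or nlmmap API; the
frame size and the in_use flag are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; not the real ofpbuf or netlink mmap structures. */
enum buf_source {
    BUF_MALLOC,   /* data was malloc()ed and is owned by the buffer */
    BUF_SHARED    /* data points into a shared ring frame, not owned */
};

struct ring_frame {
    int in_use;                   /* 1 while userspace holds the frame */
    unsigned char payload[2048];  /* assumed frame size */
};

struct pkt_buf {
    void *data;
    size_t size;
    enum buf_source source;
    struct ring_frame *frame;     /* only set for BUF_SHARED */
};

/* Copying variant: one malloc() plus one memcpy() per upcall, but the
 * frame can be handed back to the ring immediately. */
int pkt_buf_copy_from_frame(struct pkt_buf *b, struct ring_frame *f,
                            size_t len)
{
    if (len > sizeof f->payload) {
        return -1;
    }
    b->data = malloc(len ? len : 1);
    if (!b->data) {
        return -1;
    }
    memcpy(b->data, f->payload, len);
    b->size = len;
    b->source = BUF_MALLOC;
    b->frame = NULL;
    f->in_use = 0;                /* frame is free again right away */
    return 0;
}

/* Zero-copy variant: the buffer only points at the frame, so the frame
 * stays in use until the packet has been fully processed. */
int pkt_buf_use_frame(struct pkt_buf *b, struct ring_frame *f, size_t len)
{
    if (len > sizeof f->payload) {
        return -1;
    }
    b->data = f->payload;
    b->size = len;
    b->source = BUF_SHARED;
    b->frame = f;
    return 0;
}

/* Release either kind of buffer once processing is done. */
void pkt_buf_release(struct pkt_buf *b)
{
    if (b->source == BUF_MALLOC) {
        free(b->data);
    } else if (b->frame) {
        b->frame->in_use = 0;     /* hand the frame back to the ring */
    }
    b->data = NULL;
    b->size = 0;
    b->frame = NULL;
}
```

The trade-off in this sketch is that the zero-copy variant holds the
ring frame for longer, so the ring has to be sized for the number of
packets that can be in flight at once.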

An API that would allow nlmmap to refer to the DMA buffer directly does
not exist yet but is definitely desirable. With such an API, the cost of
an upcall would be reduced to the cost of a context switch, which can be
further reduced by pushing batches or 64K GSO frames.
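
As a back-of-the-envelope check on the GSO point, the number of upcalls
caused by segmenting one large packet is a ceiling division of the
packet length by the per-segment payload. The helper name and the
payload value of 1448 bytes used below are assumptions for
illustration, not values taken from the datapath code.

```c
#include <assert.h>
#include <stddef.h>

/* Number of upcalls if a GSO packet of gso_len bytes is split into
 * segments carrying at most seg_payload bytes each (ceiling division).
 * Hypothetical helper, not part of the OVS code base. */
size_t upcalls_when_segmented(size_t gso_len, size_t seg_payload)
{
    assert(seg_payload > 0);
    return (gso_len + seg_payload - 1) / seg_payload;
}
```

With an assumed payload of 1448 bytes per segment,
upcalls_when_segmented(65536, 1448) gives 46, which is in line with the
"up to 50" figure quoted above; passing the frame up unsegmented would
make that a single upcall.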
