[ovs-discuss] Flow miss/Packet order question

Jesse Gross jesse at nicira.com
Thu Oct 3 01:20:39 UTC 2013


On Wed, Oct 2, 2013 at 4:49 AM, Dmitry Fleytman <dfleytma at redhat.com> wrote:
>
> On Apr 30, 2012, at 20:15 PM, Ben Pfaff <blp at nicira.com> wrote:
>
>> I think that your explanation stems from a misunderstanding.  Yes, if
>> an OpenFlow controller uses a reactive model, then it cannot avoid the
>> problem.  However, I think that Joji is raising a different issue, one
>> that is an implementation detail within Open vSwitch and that
>> controllers have no power to avoid.
>>
>> Let me explain in detail.  When a packet arrives for which there is no
>> kernel flow, the kernel sends it to userspace.  Userspace sends the
>> packet and sets up a kernel flow.  In the meantime, more packets might
>> have arrived and been queued to userspace.  Userspace will send these
>> packets, but any packets that arrive after the kernel flow is set up
>> will be forwarded directly by the kernel before those queued to
>> userspace go out.
>>
>
>
> This is exactly the problem we face while working toward certification of the KVM paravirtualized network driver for Windows (NetKVM).
> A few of the automated tests send bursts of packets and expect the same packets in the same order on the other side.
>
> We have POC patches (pretty dirty) that solve the problem (below). The idea is simple: when the datapath makes an upcall, it queues packets in the kernel until user mode completes processing and downloads a new flow. Queueing per datapath looks like overkill, since queueing per vport would be enough, but it was easier to implement this way and it proves the concept as well. Still, there is an obvious performance and scaling impact, so other ideas are very welcome.
>
> What do you think? Should we go for this solution and prepare clean patches for submission?

I think in order to fully solve the problem you actually need to queue
per flow, rather than per port or per datapath. Otherwise, you end up
serializing to one flow setup at a time, which is probably a bigger
problem in practice than the one this is trying to solve.
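
To make the ordering issue and the per-flow idea concrete, here is a
minimal toy simulation. It is only a sketch and is not Open vSwitch
code: ToyDatapath, receive, userspace_step, and run are hypothetical
names, and the model assumes a single flow "A" and a userspace that
handles one upcall at a time.

```python
from collections import deque


class ToyDatapath:
    """Toy model of a kernel flow table with a userspace slow path.

    Hypothetical illustration only; none of these names exist in OVS.
    """

    def __init__(self, hold_per_flow):
        self.hold_per_flow = hold_per_flow
        self.flows = set()      # flows already installed in the "kernel"
        self.upcalls = deque()  # packets queued to "userspace"
        self.pending = {}       # flow -> packets held while setup is in flight
        self.wire = []          # packets in the order they leave the switch

    def receive(self, flow, seq):
        if flow in self.flows:
            # Fast path: the kernel forwards the packet directly.
            self.wire.append((flow, seq))
        elif self.hold_per_flow:
            if flow not in self.pending:
                # First miss for this flow: start setup with one upcall.
                self.pending[flow] = deque()
                self.upcalls.append((flow, seq))
            else:
                # Setup already in flight: hold the packet, keyed by flow,
                # so packets of other flows are not held behind it.
                self.pending[flow].append((flow, seq))
        else:
            # Every miss goes to userspace, as in the scenario Ben describes.
            self.upcalls.append((flow, seq))

    def userspace_step(self):
        """Handle one upcall: send the packet, then install the flow."""
        flow, seq = self.upcalls.popleft()
        self.wire.append((flow, seq))
        self.flows.add(flow)
        # Release anything held for this flow, in arrival order.
        for pkt in self.pending.pop(flow, ()):
            self.wire.append(pkt)


def run(hold_per_flow):
    dp = ToyDatapath(hold_per_flow)
    for seq in (1, 2, 3):   # burst arrives before any flow exists
        dp.receive("A", seq)
    dp.userspace_step()     # packet 1 is sent and flow A is installed
    dp.receive("A", 4)      # arrives after the flow is installed
    while dp.upcalls:       # userspace drains whatever is still queued
        dp.userspace_step()
    return [seq for _, seq in dp.wire]


print(run(hold_per_flow=False))  # [1, 4, 2, 3]
print(run(hold_per_flow=True))   # [1, 2, 3, 4]
```

Without holding, packet 4 takes the fast path ahead of packets 2 and 3,
which are still queued to userspace. With the held packets keyed by
flow, only the flow being set up waits; a real implementation would
also need to bound the queue and handle a setup that never completes,
which this sketch ignores.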

It's not entirely clear to me that the general solution is really
worth the extra complexity for situations beyond the WHQL test, so
there might be a band-aid for that particular problem. Do you know
exactly what the test is doing?


