[ovs-dev] knowing when a kernel flow is really deleted

Ben Pfaff blp at nicira.com
Thu Dec 15 18:48:20 UTC 2011


On Wed, Dec 14, 2011 at 06:34:04PM -0800, Jesse Gross wrote:
> On Wed, Dec 14, 2011 at 4:12 PM, Ben Pfaff <blp at nicira.com> wrote:
> > Why do we care?  Current userspace isn't really affected.  At most, we
> > get flow statistics wrong by one packet (I guess multiple packets is
> > theoretically possible?), if and when this happens.  The place where it
> > keeps coming up is in situations where we want userspace to be able to
> > associate metadata with a packet sent from kernel to user.  This came up
> > in August regarding sflow:
> >         http://openvswitch.org/pipermail/dev/2011-August/010384.html
> 
> In theory the number of packets that you lose is unbounded if traffic
> suddenly spikes, causing the CPU to hold onto the flow forever.  I
> don't think it really matters and it is possible to do better than we
> do today, though.

I agree that memory freeing is deferred to the next grace period,
which can effectively be "forever".  But I think that the number of
packets whose statistics can be lost is limited by the time that it
takes for a memory write to the flow table to become visible to other
CPUs.  I think that would normally be a much shorter duration; my
guess is that it would be hard for more than one packet to get "lost"
that way.

> >        3. Somehow actually eliminate the problem with deleting flows,
> >           so that when userspace receives the response to the flow
> >           deletion we know that no more packets can go through the
> >           flow.  I don't know how to do this efficiently.
> 
> I'm not sure that this is really a question of efficiency, so much as
> it is complexity.  Basically you have to make userspace able to
> tolerate blocking while the flow is deleted and then use
> synchronize_rcu when removing flows.  Presumably this would mean that
> you need some kind of worker threads.

synchronize_rcu() is the obvious solution, but the efficiency I was
worried about is being able to delete flows at a reasonable pace.
Wouldn't using synchronize_rcu() throttle flow deletion down to an
unacceptable rate?  I remember from a long time ago
that synchronize_rcu() can be ridiculously slow.  Oh yeah, here's the
log message (which might be so old that it's not in the current OVS
repo, not sure).  It's referring to testing that we did on whatever
was the current XenServer version at the time, with a 2.6.18-based
kernel:

----------------------------------------------------------------------
From 876d3efc253cb6507ef47c2fa0f267a11722d8a9 Mon Sep 17 00:00:00 2001
From: Ben Pfaff <blp at nicira.com>
Date: Mon, 16 Mar 2009 11:37:34 -0700
Subject: [PATCH] datapath: Speed up ioctl fast paths.

synchronize_rcu() was causing some common datapath ioctls to take up to
approx. 1 second (!) in some cases, which was killing our performance.
Use call_rcu() instead.
---
 datapath/datapath.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)
----------------------------------------------------------------------
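For reference, here is a minimal sketch of the call_rcu() pattern that
patch switched to.  The structure and function names are illustrative,
not the actual datapath code; the point is that the delete path only
unlinks the flow and schedules the free, so it never blocks waiting for
a grace period:

----------------------------------------------------------------------
#include <linux/kernel.h>
#include <linux/rculist.h>
#include <linux/slab.h>

/* Illustrative flow structure; not the real datapath definitions. */
struct sw_flow {
        struct hlist_node node;   /* Linkage in the flow table. */
        struct rcu_head rcu;      /* For deferred freeing. */
        /* ... key, actions, stats ... */
};

static void flow_free_rcu(struct rcu_head *rcu)
{
        struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);

        kfree(flow);              /* Runs after the grace period ends. */
}

static void flow_delete(struct sw_flow *flow)
{
        hlist_del_rcu(&flow->node);          /* Unlink; readers may
                                              * still see it until the
                                              * grace period ends. */
        call_rcu(&flow->rcu, flow_free_rcu); /* Returns immediately. */
}
----------------------------------------------------------------------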

> >        4. Have the RCU callback for flow deletion send the Netlink
> >           broadcast message that tells userspace that the flow is gone.
> >           The Netlink client that sent the actual deletion request
> >           would still get a synchronous response, but the broadcast
> >           would be delayed until the flow was really gone.  I think
> >           this might be practical, but I don't know the exact
> >           restrictions on RCU callbacks.
> 
> I think that should be OK.  RCU is just operating in softirq context
> so the restrictions aren't too severe.  It will make sending the
> messages more likely to fail because they have to be allocated in
> atomic context.

Makes sense.  Your analysis matches my own educated guesses, thanks.
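
A rough sketch of what option 4 might look like, building on the
illustrative struct sw_flow above; ovs_flow_notify() here is a
hypothetical helper standing in for whatever builds and multicasts the
Netlink message, so the real code would differ.  Since the callback
runs in softirq context, the skb has to be allocated with GFP_ATOMIC,
which is exactly why the broadcast can fail under memory pressure as
you note:

----------------------------------------------------------------------
#include <linux/kernel.h>
#include <linux/rculist.h>
#include <linux/slab.h>
#include <net/netlink.h>

/* Hypothetical helper that fills in and multicasts the "flow deleted"
 * message; it stands in for the real Netlink plumbing. */
void ovs_flow_notify(struct sk_buff *skb, struct sw_flow *flow);

static void flow_free_and_notify_rcu(struct rcu_head *rcu)
{
        struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);
        struct sk_buff *skb;

        /* Softirq context: must not sleep, so GFP_ATOMIC; if the
         * allocation fails, the broadcast is silently dropped. */
        skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
        if (skb)
                ovs_flow_notify(skb, flow);

        kfree(flow);
}
----------------------------------------------------------------------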


