[ovs-dev] knowing when a kernel flow is really deleted

Jesse Gross jesse at nicira.com
Fri Dec 16 00:49:57 UTC 2011


On Thu, Dec 15, 2011 at 3:46 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Thu, Dec 15, 2011 at 03:24:38PM -0800, Jesse Gross wrote:
>> On Thu, Dec 15, 2011 at 1:34 PM, Ben Pfaff <blp at nicira.com> wrote:
>> > On Thu, Dec 15, 2011 at 01:00:00PM -0800, Jesse Gross wrote:
>> >> On Thu, Dec 15, 2011 at 12:34 PM, Ben Pfaff <blp at nicira.com> wrote:
>> >> > A workaround would be to call synchronize_rcu() and send the genl
>> >> > reply from some context that doesn't hold genl_lock, but that's hardly
> elegant.  Also it means that the real reply would come after the one
>> >> > generated automatically by af_netlink.c when NLM_F_ACK is used, which
>> >> > violates the normal rules.
>> >>
>> >> Isn't that almost exactly the same as sending the message from the RCU
>> >> callback (including the Netlink ordering issue)?
>> >
>> > In the "send from RCU callback" case, I intended that the normal
>> > Netlink reply would be sent synchronously just after deletion, just as
>> > it is now.  The ordering would therefore stay just as it is now.  Only
>> > the broadcast reply would be deferred.
>>
>> What do you consider the "normal Netlink reply"?  There are basically
>> 3 different pieces:
>>
>>  * Multicast flow information to the relevant group (excluding the
>> requester if NLM_F_ECHO is specified).
>>  * Unicast flow information to the requester if NLM_F_ECHO is specified.
>>  * Unicast Netlink acknowledgement if NLM_F_ACK is specified or there
>> was an error.
>>
>> Are you talking about doing the last two as-is and moving the
>> multicast to the flow reclamation (but actually sending it to everyone
>> at that time)?
>
> Yes, that's roughly what I had in mind, and perhaps exactly, depending
> on what "but actually sending it everyone" means.  The idea is that
> the immediate, synchronous reply for NLM_F_ECHO would give approximate
> statistics, the later multicast would give exact statistics, and
> receiving the multicast could be used as an indicator that any upcalls
> generated by the flow have already been queued to userspace.
>
> I assumed one socket would be used for sending the request and
> receiving the unicast reply and another socket would be used for
> receiving the multicasts, since we'd want to get both separately.

I think we're talking about the same thing now, although I would be
somewhat uncomfortable with this, as it seems like a pretty
non-standard use of Netlink where the unicast and multicast replies
carry different values and can be inferred to mean different things.
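
Concretely, the userspace side of that two-socket arrangement might
look something like the sketch below (the multicast group id here is
a placeholder that would really come from a CTRL_CMD_GETFAMILY
lookup, and building the actual flow delete request is elided):

    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>

    #ifndef SOL_NETLINK
    #define SOL_NETLINK 270
    #endif

    #define FLOW_MCGROUP_ID 1   /* placeholder; resolve via genl ctrl */

    int main(void)
    {
        struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
        int req_sk = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        int mc_sk  = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        int group  = FLOW_MCGROUP_ID;
        char buf[4096];

        bind(req_sk, (struct sockaddr *) &addr, sizeof addr);
        bind(mc_sk,  (struct sockaddr *) &addr, sizeof addr);

        /* Join the flow multicast group on the second socket only. */
        setsockopt(mc_sk, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                   &group, sizeof group);

        /* ... build and send the flow delete request with
         * NLM_F_ECHO | NLM_F_ACK on req_sk (elided) ... */

        /* Unicast reply: arrives synchronously, approximate stats. */
        recv(req_sk, buf, sizeof buf, 0);

        /* Multicast notification: exact stats, and a signal that any
         * upcalls from the flow have already been queued. */
        recv(mc_sk, buf, sizeof buf, 0);

        close(req_sk);
        close(mc_sk);
        return 0;
    }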

>> > The "use synchronize_rcu() on the side then send the reply" case would
>> > change the message ordering.
>>
>> I guess I don't see why you couldn't use exactly the same strategy as
>> you chose above.
>
> I assumed that an intended advantage of using synchronize_rcu() was
> that all three forms

I think you could do it either way in both cases - as I said, the two
approaches seem pretty much equivalent to me.
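
Either way, the deferred notification on the kernel side would
presumably look something like this (a sketch only:
flow_build_notification(), flow_free(), and FLOW_MCGROUP are made-up
names, and it assumes sw_flow's existing rcu_head can be reused):

    #include <linux/rcupdate.h>
    #include <net/genetlink.h>

    static void flow_deferred_notify(struct rcu_head *rcu)
    {
        struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);
        struct sk_buff *skb;

        /* The grace period has ended: no packet can still be using
         * the flow, so its stats are final and any upcalls it
         * generated have already been queued to userspace.
         * GFP_ATOMIC because RCU callbacks run in softirq context. */
        skb = flow_build_notification(flow, GFP_ATOMIC);
        if (skb)
            genlmsg_multicast(skb, 0, FLOW_MCGROUP, GFP_ATOMIC);
        flow_free(flow);
    }

    /* In the flow delete handler, after removing the flow from
     * the table: */
    call_rcu(&flow->rcu, flow_deferred_notify);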

>> > Could we just use the existing spinlock in sw_flow plus the 'dead'
>> > variable to ensure that no packets go through a deleted flow after
>> > it's deleted?  On the delete side, take the spinlock and set 'dead'.
>> > On the fast path, take the spinlock and check 'dead' before executing
>> > the actions, release the spinlock after executing the actions.  We
>> > already have to take the spinlock for every packet anyway to update
>> > the stats, so it's not an additional cost.  I doubt that there's any
>> > parallel work for a given flow anyway (that would imply packet
>> > reordering).  We would have to avoid recursive locking somehow.
>>
>> Hmm, I think that would probably work.  It's slightly convoluted but
>> certainly much less so than the alternatives.  How would you get
>> recursive locking?
>
> Flow loops e.g. through patch ports was what came to mind.

We don't execute any actions while the spin lock is held and trips
through a patch port will appear as different flows, so I don't think
that there is a possibility of recursive locking.  I think you
basically do (sketched in rough C after the two lists):

For flow processing:

 * Extract and lookup flow
 * Take spin lock
 * Check to see if flow is dead
 * Else update stats
 * Release spin lock
 * If dead, act like you didn't find the flow
 * Else execute actions

For flow deletion:

 * Take spin lock
 * Read stats
 * Mark flow dead
 * Release spin lock


