[ovs-dev] [RFC 2/2] datapath: Cache netlink flow key, mask on flow_put.

Thomas Graf tgraf at noironetworks.com
Fri Jul 11 09:03:47 UTC 2014


On 07/11/14 at 11:22am, Joe Stringer wrote:
> Thanks for the review,
> 
> On 10 July 2014 21:13, Thomas Graf <tgraf at noironetworks.com> wrote:
> 
> > If I understand the code correctly the gain is only visible on
> > consecutive dumps of the same flow. How about constructing the cache
> > when you require it for the first time? That avoids the cost on flow
> > setup.
> 
> 
> Correct. My main concern with doing it the first time it is required is
> how it's synchronised. Flow dump is RCU-protected. I don't really know
> what the thread-safety requirements are for this code, or what options
> are available to address this. Is it viable to check that the pointer is
> NULL, allocate the cache, perform an atomic swap, and free the
> (potentially NULL) returned pointer?

Right, you would need cmpxchg plus RCU protection, or take ovs_lock().
Provided the Netlink buffer is sized large enough, the cost of taking the
lock once for a large number of flows may be amortized compared to the
expensive translation. A benchmark would help.

Another side note:
Looking at the code, OVS does not yet handle NLM_F_DUMP_INTR in user space,
and the kernel dump does not yet call genl_dump_check_consistent() to set
the flag. So even without your patch, if a dump requires more than one
Netlink message, the RCU critical section is left between messages; a flow
insertion or deletion can then occur in the middle of the dump and yield a
slightly incorrect set of flows.

> The alternative that I mentioned below the commit message is that we
> directly copy the key/mask from the original netlink message, and adjust
> the mask for any fields that have wildcarded bits where the kernel only
> supports full masking. This should be cheaper than converting to sw_flow
> and back again as this patch does currently. Furthermore, it shouldn't
> require locking, so the critical section can remain the same size as it
> is currently.

Right, though that assumption needs verification.
