[ovs-dev] [PATCH 03/11] datapath: Fold key_len and hash into sw_flow_key.

Thu Feb 21 02:32:21 UTC 2013

On Wed, Feb 20, 2013 at 1:30 AM, Rajahalme, Jarno (NSN - FI/Espoo)
<jarno.rajahalme at nsn.com> wrote:
>
> On Feb 19, 2013, at 21:38 , ext Jesse Gross wrote:
>
>> On Mon, Feb 18, 2013 at 12:13 AM, Rajahalme, Jarno (NSN - FI/Espoo)
>> <jarno.rajahalme at nsn.com> wrote:
>>>
>>> I'd be happy to break it apart.
>>>
>>> In datapath.c ovs_flow_cmd_build_info(), there is first a call to ovs_flow_tbl_lookup(), which currently computes the hash. In the case of a new flow this is followed by ovs_flow_tbl_insert(), which currently recomputes the same hash. This can be avoided if hash is made a part of the key itself.
>>
>> OK, I see now.  In the numbers that you provided before, the test did
>> not stress the kernel flow setup path (although the performance still
>> seemed to improve after this patch, which seems odd to me).  Do you
>> know if this helps independently (and also how such a situation might
>> arise)?
>>
>
> Based on the sporadic coverage logs I concluded that most of the time my test will set up flows without facets. However, at least initially there should be a period when the governor has not yet been started, during which facets and kernel flows should be set up. So I think that the benefit seen is because of the reduced hash computations and maybe in part by reduced parameter passing(?).
>
> So it should have a benefit by it's own, I just haven't been able to construct a test case for that. For some reason I can not get ovs-benchmark to get ovs-vswitchd anywhere above single digit CPU use in my dev setup...

I agree that your test should not be setting up facets (and therefore
exercising this code path).  I'm pretty reluctant to apply this on the
basis of being a performance optimization without harder data simply
because in my experience speculation about performance results for
these types of things is very often wrong.

>>> In a later patch (which may be more controversial), I have proposed to expose the kernel computed key hash to the userspace (think "cookie"), which is given back to the kernel only when the key is passed back as-is. In this case additional key hash computations can be avoided in kernel flow set-up.
>>
>> My hope was that if this had a benefit on it's own then it could be
>> applied independently.  Otherwise, we'll have to wait to see what
>> conclusion we come to on the rest of the patches.
>
> This should be applicable by itself. IMO it is also a bit cleaner, conceptually, apart from any performance benefit :-)

struct sw_flow_key is pretty big already so I'd like to avoid
expanding it without a compelling reason.  Over the long term, I'd
like to find ways to reduce its size, pack it neatly into cachelines,
keep it on the right NUMA node, etc.

I'm actually not philosophically opposed to this, I'm just not really
convinced of the benefits at the moment.