[ovs-dev] [PATCH v14 5/5] dpif-netdev: Add specialized generic scalar functions

Stokes, Ian ian.stokes at intel.com
Fri Jul 19 08:42:57 UTC 2019


> On 18.07.2019 16:03, Harry van Haaren wrote:
> > This commit adds a number of specialized functions that handle
> > common miniflow fingerprints. This enables compiler optimization,
> > resulting in higher performance. Below is a quick description of
> > how this optimization actually works:
> >
> > "Specialized functions" are "instances" of the generic implementation,
> > but the compiler is given extra context when compiling. In the case of
> > iterating miniflow datastructures, the most interesting value to enable
> > compile time optimizations is the loop trip count per unit.
> >
> > In order to create a specialized function, there is a generic
> > implementation, which uses a for() loop without the compiler knowing
> > the loop trip count at compile time. The loop trip count is passed in
> > as an argument to the function:
> >
> > uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count)
> > {
> >     for(uint32_t i = 0; i < loop_count; i++)
> >         // do work
> > }
> >
> > In order to "specialize" the function, we call the generic
> > implementation with hard-coded numbers - these are compile time
> > constants!
> >
> > uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count)
> > {
> >     // use hard coded constant for compile-time constant-propagation
> >     return miniflow_impl_generic(mf, 5);
> > }
> >
> > Given the compiler is aware of the loop trip count at compile time,
> > it can perform an optimization known as "constant propagation". Combined
> > with inlining of the miniflow_impl_generic() function, the compiler is
> > now enabled to *compile time* unroll the loop 5x, and produce "flat"
> > code.
> >
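
The specialization idea described above can be sketched in standalone C.
This is a minimal illustration and not the code from this patch: the
names example_mf, example_impl_generic() and example_impl_loop5() are
hypothetical, and the loop body is a placeholder for the real per-block
work. Whether the compiler actually unrolls the loop depends on the
compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a miniflow: a plain array of 64-bit blocks. */
struct example_mf {
    const uint64_t *blocks;
};

/* Generic implementation: the trip count is a runtime argument, so the
 * compiler has to emit a real loop when this is called with a variable. */
static inline uint32_t
example_impl_generic(const struct example_mf *mf, uint32_t loop_count)
{
    uint32_t hash = 0;

    assert(mf != NULL);
    for (uint32_t i = 0; i < loop_count; i++) {
        /* Placeholder work: fold each block into a running value. */
        hash = hash * 31 + (uint32_t) mf->blocks[i];
    }
    return hash;
}

/* Specialized instance: the literal 5 is a compile-time constant. Once the
 * generic function is inlined here, the compiler is free to propagate the
 * constant and unroll the loop. */
static uint32_t
example_impl_loop5(const struct example_mf *mf)
{
    return example_impl_generic(mf, 5);
}
```

Both functions return the same result for a five-block input; only the
generated code differs.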
> > The last step to using the specialized functions is to utilize a
> > function-pointer to choose the specialized (or generic) implementation.
> > The selection of the function pointer is performed at subtable creation
> > time, when the miniflow fingerprint of the subtable is known. This
> > technique is known as "multiple dispatch" in some literature, as it uses
> > multiple items of information (miniflow bit counts) to select the
> > dispatch function.
> >
> > By pointing the function pointer at the optimized implementation, OvS
> > benefits from the compile time optimizations at runtime.
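
The function-pointer selection described above can be sketched as
follows. This is an illustration under stated assumptions, not the patch
itself: example_subtable, example_probe() and the example_lookup_*
functions are hypothetical names, and summing blocks stands in for the
real lookup work. Only the overall shape (probe at creation time, fall
back to the generic function) follows the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct example_subtable;

/* Common prototype shared by the generic and specialized lookups. */
typedef uint32_t (*example_lookup_func)(const struct example_subtable *st,
                                        const uint64_t *blocks);

/* Hypothetical subtable: bit counts per unit plus the chosen lookup. */
struct example_subtable {
    uint32_t u0_bits;
    uint32_t u1_bits;
    example_lookup_func lookup;
};

/* Shared implementation; u0 and u1 are constants in specialized callers. */
static inline uint32_t
example_lookup_impl(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    uint32_t sum = 0;

    assert(blocks != NULL);
    for (uint32_t i = 0; i < u0 + u1; i++) {
        sum += (uint32_t) blocks[i];
    }
    return sum;
}

/* Generic lookup: reads the counts from the subtable at runtime. */
static uint32_t
example_lookup_generic(const struct example_subtable *st,
                       const uint64_t *blocks)
{
    return example_lookup_impl(blocks, st->u0_bits, st->u1_bits);
}

/* Specialized lookups: counts are hard-coded compile-time constants. */
static uint32_t
example_lookup_u0w5_u1w1(const struct example_subtable *st,
                         const uint64_t *blocks)
{
    (void) st;
    return example_lookup_impl(blocks, 5, 1);
}

static uint32_t
example_lookup_u0w4_u1w1(const struct example_subtable *st,
                         const uint64_t *blocks)
{
    (void) st;
    return example_lookup_impl(blocks, 4, 1);
}

/* Returns a specialized lookup for the fingerprint, or NULL if none. */
static example_lookup_func
example_probe(uint32_t u0, uint32_t u1)
{
    if (u0 == 5 && u1 == 1) {
        return example_lookup_u0w5_u1w1;
    }
    if (u0 == 4 && u1 == 1) {
        return example_lookup_u0w4_u1w1;
    }
    return NULL;
}

/* Selection happens once, at creation time, when the counts are known. */
static void
example_subtable_init(struct example_subtable *st, uint32_t u0, uint32_t u1)
{
    example_lookup_func f = example_probe(u0, u1);

    st->u0_bits = u0;
    st->u1_bits = u1;
    st->lookup = f ? f : example_lookup_generic;
}
```

A caller only ever invokes st->lookup(st, blocks), so the per-packet path
does not need to know which variant was selected.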
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> > Tested-by: Malvika Gupta <malvika.gupta at arm.com>
> >
> > ---
> >
> > v13:
> > - Update macros to new lookup function prototype without blocks scratch
> >
> > v12:
> > - Fix typo (Ian)
> > - Fix missing . after comments (Ian)
> > - Improve return value comments for probe function (Ian)
> > - Spaces after number in optimized lookup declarations (Ilya)
> > - Make VLOG level debug instead of info (Ilya)
> >
> > v11:
> > - Use MACROs to declare and check optimized functions (Ilya)
> > - Use capital letter for commit message (Ilya)
> > - Rebase onto latest patchset changes
> > - Added NEWS entry for data-path subtable specialization (Ian/Harry)
> > - Checkpatch notes an "incorrect bracketing" in the MACROs, however I
> >   didn't find a solution that it does like.
> >
> > v10:
> > - Rebase changes from previous patches.
> > - Remove "restrict" keyword as windows CI failed, see here for details:
> >   https://ci.appveyor.com/project/istokes/ovs-q8fvv/builds/24398228
> >
> > v8:
> > - Rework to use blocks_cache from the dpcls instance, to avoid variable
> >   length arrays in the data-path.
> > ---
> >  NEWS                             |  4 +++
> >  lib/dpif-netdev-lookup-generic.c | 49 ++++++++++++++++++++++++++++++++
> >  lib/dpif-netdev-private.h        |  8 ++++++
> >  lib/dpif-netdev.c                |  9 ++++--
> >  4 files changed, 68 insertions(+), 2 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 81130e667..4cfffb1bc 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -34,6 +34,10 @@ Post-v2.11.0
> >       * 'ovs-appctl exit' now implies cleanup of non-internal ports in userspace
> >         datapath regardless of '--cleanup' option. Use '--cleanup' to remove
> >         internal ports too.
> > +     * Datapath classifier code refactored to enable function pointers to select
> > +       the lookup implementation at runtime. This enables specialization of
> > +       specific subtables based on the miniflow attributes, enhancing the
> > +       performance of the subtable search.
> >     - OVSDB:
> >       * OVSDB clients can now resynchronize with clustered servers much more
> >         quickly after a brief disconnection, saving bandwidth and CPU time.
> > diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
> > index 4dcf4640e..10ac41d8e 100644
> > --- a/lib/dpif-netdev-lookup-generic.c
> > +++ b/lib/dpif-netdev-lookup-generic.c
> > @@ -260,7 +260,56 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
> >                                const struct netdev_flow_key *keys[],
> >                                struct dpcls_rule **rules)
> >  {
> > +    /* Here the runtime subtable->mf_bits counts are used, which forces the
> > +     * compiler to iterate normal for() loops. Due to this limitation in the
> > +     * compiler's available optimizations, this function has lower performance
> > +     * than the below specialized functions.
> > +     */
> >      return lookup_generic_impl(subtable, keys_map, keys, rules,
> >                                 subtable->mf_bits_set_unit0,
> >                                 subtable->mf_bits_set_unit1);
> >  }
> > +
> > +/* Expand out specialized functions with U0 and U1 bit attributes. */
> > +#define DECLARE_OPTIMIZED_LOOKUP_FUNCTION(U0, U1)                             \
> > +    static uint32_t                                                           \
> > +    dpcls_subtable_lookup_mf_u0w##U0##_u1w##U1(                               \
> > +                                         struct dpcls_subtable *subtable,     \
> > +                                         uint32_t keys_map,                   \
> > +                                         const struct netdev_flow_key *keys[],\
> > +                                         struct dpcls_rule **rules)           \
> > +    {                                                                         \
> > +        return lookup_generic_impl(subtable, keys_map, keys, rules, U0, U1);  \
> > +    }                                                                         \
> > +
> > +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
> > +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
> > +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
> > +
> > +/* Check if a specialized function is valid for the required subtable. */
> > +#define CHECK_LOOKUP_FUNCTION(U0,U1)                                          \
> 
> Ian, could you, please, add a space after a comma here before applying the
> patch?

Sure, no problem.

Ian

