[ovs-dev] [PATCH v14 5/5] dpif-netdev: Add specialized generic scalar functions

Ilya Maximets i.maximets at samsung.com
Fri Jul 19 08:41:26 UTC 2019


On 18.07.2019 16:03, Harry van Haaren wrote:
> This commit adds a number of specialized functions, that handle
> common miniflow fingerprints. This enables compiler optimization,
> resulting in higher performance. Below a quick description of
> how this optimization actually works;
> 
> "Specialized functions" are "instances" of the generic implementation,
> but the compiler is given extra context when compiling. In the case of
> iterating miniflow datastructures, the most interesting value to enable
> compile time optimizations is the loop trip count per unit.
> 
> In order to create a specialized function, there is a generic implementation,
> which uses a for() loop without the compiler knowing the loop trip count at
> compile time. The loop trip count is passed in as an argument to the function:
> 
> uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count)
> {
>     for(uint32_t i = 0; i < loop_count; i++)
>         // do work
> }
> 
> In order to "specialize" the function, we call the generic implementation
> with hard-coded numbers - these are compile time constants!
> 
> uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count)
> {
>     // use hard coded constant for compile-time constant-propogation
>     return miniflow_impl_generic(mf, 5);
> }
> 
> Given the compiler is aware of the loop trip count at compile time,
> it can perform an optimization known as "constant propogation". Combined
> with inlining of the miniflow_impl_generic() function, the compiler is
> now enabled to *compile time* unroll the loop 5x, and produce "flat" code.
> 
> The last step to using the specialized functions is to utilize a
> function-pointer to choose the specialized (or generic) implementation.
> The selection of the function pointer is performed at subtable creation
> time, when miniflow fingerprint of the subtable is known. This technique
> is known as "multiple dispatch" in some literature, as it uses multiple
> items of information (miniflow bit counts) to select the dispatch function.
> 
> By pointing the function pointer at the optimized implementation, OvS
> benefits from the compile time optimizations at runtime.
> 
> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> Tested-by: Malvika Gupta <malvika.gupta at arm.com>
> 
> ---
> 
> v13:
> - Update macros to new lookup function prototype without blocks scratch
> 
> v12:
> - Fix typo (Ian)
> - Fix missing . after comments (Ian)
> - Improve return value comments for probe function (Ian)
> - Spaces after number in optimized lookup declarations (Ilya)
> - Make VLOG level debug instead of info (Ilya)
> 
> v11:
> - Use MACROs to declare and check optimized functions (Ilya)
> - Use captial letter for commit message (Ilya)
> - Rebase onto latest patchset changes
> - Added NEWS entry for data-path subtable specialization (Ian/Harry)
> - Checkpatch notes an "incorrect bracketing" in the MACROs, however I
>   didn't find a solution that it does like.
> 
> v10:
> - Rebase changes from previous patches.
> - Remove "restrict" keyword as windows CI failed, see here for details:
>   https://ci.appveyor.com/project/istokes/ovs-q8fvv/builds/24398228
> 
> v8:
> - Rework to use blocks_cache from the dpcls instance, to avoid variable
>   lenght arrays in the data-path.
> ---
>  NEWS                             |  4 +++
>  lib/dpif-netdev-lookup-generic.c | 49 ++++++++++++++++++++++++++++++++
>  lib/dpif-netdev-private.h        |  8 ++++++
>  lib/dpif-netdev.c                |  9 ++++--
>  4 files changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 81130e667..4cfffb1bc 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -34,6 +34,10 @@ Post-v2.11.0
>       * 'ovs-appctl exit' now implies cleanup of non-internal ports in userspace
>         datapath regardless of '--cleanup' option. Use '--cleanup' to remove
>         internal ports too.
> +     * Datapath classifer code refactored to enable function pointers to select
> +       the lookup implementation at runtime. This enables specialization of
> +       specific subtables based on the miniflow attributes, enhancing the
> +       performance of the subtable search.
>     - OVSDB:
>       * OVSDB clients can now resynchronize with clustered servers much more
>         quickly after a brief disconnection, saving bandwidth and CPU time.
> diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
> index 4dcf4640e..10ac41d8e 100644
> --- a/lib/dpif-netdev-lookup-generic.c
> +++ b/lib/dpif-netdev-lookup-generic.c
> @@ -260,7 +260,56 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
>                                const struct netdev_flow_key *keys[],
>                                struct dpcls_rule **rules)
>  {
> +    /* Here the runtime subtable->mf_bits counts are used, which forces the
> +     * compiler to iterate normal for() loops. Due to this limitation in the
> +     * compilers available optimizations, this function has lower performance
> +     * than the below specialized functions.
> +     */
>      return lookup_generic_impl(subtable, keys_map, keys, rules,
>                                 subtable->mf_bits_set_unit0,
>                                 subtable->mf_bits_set_unit1);
>  }
> +
> +/* Expand out specialized functions with U0 and U1 bit attributes. */
> +#define DECLARE_OPTIMIZED_LOOKUP_FUNCTION(U0, U1)                             \
> +    static uint32_t                                                           \
> +    dpcls_subtable_lookup_mf_u0w##U0##_u1w##U1(                               \
> +                                         struct dpcls_subtable *subtable,     \
> +                                         uint32_t keys_map,                   \
> +                                         const struct netdev_flow_key *keys[],\
> +                                         struct dpcls_rule **rules)           \
> +    {                                                                         \
> +        return lookup_generic_impl(subtable, keys_map, keys, rules, U0, U1);  \
> +    }                                                                         \
> +
> +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
> +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
> +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
> +
> +/* Check if a specialized function is valid for the required subtable. */
> +#define CHECK_LOOKUP_FUNCTION(U0,U1)                                          \

Ian, could you, please, add a space after a comma here before applying the patch?


More information about the dev mailing list