[ovs-dev] [PATCH v14 5/5] dpif-netdev: Add specialized generic scalar functions
Ilya Maximets
i.maximets at samsung.com
Fri Jul 19 08:41:26 UTC 2019
On 18.07.2019 16:03, Harry van Haaren wrote:
> This commit adds a number of specialized functions, that handle
> common miniflow fingerprints. This enables compiler optimization,
> resulting in higher performance. Below a quick description of
> how this optimization actually works;
>
> "Specialized functions" are "instances" of the generic implementation,
> but the compiler is given extra context when compiling. In the case of
> iterating miniflow datastructures, the most interesting value to enable
> compile time optimizations is the loop trip count per unit.
>
> In order to create a specialized function, there is a generic implementation,
> which uses a for() loop without the compiler knowing the loop trip count at
> compile time. The loop trip count is passed in as an argument to the function:
>
> uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count)
> {
> for(uint32_t i = 0; i < loop_count; i++)
> // do work
> }
>
> In order to "specialize" the function, we call the generic implementation
> with hard-coded numbers - these are compile time constants!
>
> uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count)
> {
> // use hard coded constant for compile-time constant-propogation
> return miniflow_impl_generic(mf, 5);
> }
>
> Given the compiler is aware of the loop trip count at compile time,
> it can perform an optimization known as "constant propogation". Combined
> with inlining of the miniflow_impl_generic() function, the compiler is
> now enabled to *compile time* unroll the loop 5x, and produce "flat" code.
>
> The last step to using the specialized functions is to utilize a
> function-pointer to choose the specialized (or generic) implementation.
> The selection of the function pointer is performed at subtable creation
> time, when miniflow fingerprint of the subtable is known. This technique
> is known as "multiple dispatch" in some literature, as it uses multiple
> items of information (miniflow bit counts) to select the dispatch function.
>
> By pointing the function pointer at the optimized implementation, OvS
> benefits from the compile time optimizations at runtime.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> Tested-by: Malvika Gupta <malvika.gupta at arm.com>
>
> ---
>
> v13:
> - Update macros to new lookup function prototype without blocks scratch
>
> v12:
> - Fix typo (Ian)
> - Fix missing . after comments (Ian)
> - Improve return value comments for probe function (Ian)
> - Spaces after number in optimized lookup declarations (Ilya)
> - Make VLOG level debug instead of info (Ilya)
>
> v11:
> - Use MACROs to declare and check optimized functions (Ilya)
> - Use captial letter for commit message (Ilya)
> - Rebase onto latest patchset changes
> - Added NEWS entry for data-path subtable specialization (Ian/Harry)
> - Checkpatch notes an "incorrect bracketing" in the MACROs, however I
> didn't find a solution that it does like.
>
> v10:
> - Rebase changes from previous patches.
> - Remove "restrict" keyword as windows CI failed, see here for details:
> https://ci.appveyor.com/project/istokes/ovs-q8fvv/builds/24398228
>
> v8:
> - Rework to use blocks_cache from the dpcls instance, to avoid variable
> lenght arrays in the data-path.
> ---
> NEWS | 4 +++
> lib/dpif-netdev-lookup-generic.c | 49 ++++++++++++++++++++++++++++++++
> lib/dpif-netdev-private.h | 8 ++++++
> lib/dpif-netdev.c | 9 ++++--
> 4 files changed, 68 insertions(+), 2 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 81130e667..4cfffb1bc 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -34,6 +34,10 @@ Post-v2.11.0
> * 'ovs-appctl exit' now implies cleanup of non-internal ports in userspace
> datapath regardless of '--cleanup' option. Use '--cleanup' to remove
> internal ports too.
> + * Datapath classifer code refactored to enable function pointers to select
> + the lookup implementation at runtime. This enables specialization of
> + specific subtables based on the miniflow attributes, enhancing the
> + performance of the subtable search.
> - OVSDB:
> * OVSDB clients can now resynchronize with clustered servers much more
> quickly after a brief disconnection, saving bandwidth and CPU time.
> diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
> index 4dcf4640e..10ac41d8e 100644
> --- a/lib/dpif-netdev-lookup-generic.c
> +++ b/lib/dpif-netdev-lookup-generic.c
> @@ -260,7 +260,56 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
> const struct netdev_flow_key *keys[],
> struct dpcls_rule **rules)
> {
> + /* Here the runtime subtable->mf_bits counts are used, which forces the
> + * compiler to iterate normal for() loops. Due to this limitation in the
> + * compilers available optimizations, this function has lower performance
> + * than the below specialized functions.
> + */
> return lookup_generic_impl(subtable, keys_map, keys, rules,
> subtable->mf_bits_set_unit0,
> subtable->mf_bits_set_unit1);
> }
> +
> +/* Expand out specialized functions with U0 and U1 bit attributes. */
> +#define DECLARE_OPTIMIZED_LOOKUP_FUNCTION(U0, U1) \
> + static uint32_t \
> + dpcls_subtable_lookup_mf_u0w##U0##_u1w##U1( \
> + struct dpcls_subtable *subtable, \
> + uint32_t keys_map, \
> + const struct netdev_flow_key *keys[],\
> + struct dpcls_rule **rules) \
> + { \
> + return lookup_generic_impl(subtable, keys_map, keys, rules, U0, U1); \
> + } \
> +
> +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
> +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
> +DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
> +
> +/* Check if a specialized function is valid for the required subtable. */
> +#define CHECK_LOOKUP_FUNCTION(U0,U1) \
Ian, could you, please, add a space after a comma here before applying the patch?
More information about the dev
mailing list