[ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.

Stokes, Ian ian.stokes at intel.com
Tue Mar 13 10:37:15 UTC 2018


> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> bounces at openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, January 12, 2018 5:41 PM
> To: dev at openvswitch.org
> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
> 
> This commit introduces prefetch variants by using the GCC built-in
> prefetch function.
> 
> The prefetch variants give the user finer control over the data
> caching strategy, in order to increase cache efficiency and minimize
> cache pollution. Data reference patterns can be classified into:
> 
>  - Non-temporal (NT) - Data that is referenced once and not reused in
>                        the immediate future.
>  - Temporal          - Data that will be used again soon.
> 
> The macro variants are useful where:
>  - Memory access patterns are predictable.
>  - The execution pipeline can stall if data isn't available.
>  - Loops are time-consuming.
> 
> For example:
> 
>   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>     - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>     - __builtin_prefetch(addr, 0, 1)
>     - Prefetch data into the L3 cache for read-only use.
> 
>   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>     - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>     - __builtin_prefetch(addr, 1, 3)
>     - Prefetch data into all caches in anticipation of a write. In doing
>       so it invalidates other cached copies so as to gain 'exclusive'
>       access.
> 
>   OVS_PREFETCH(addr)
>     - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>     - __builtin_prefetch(addr, 0, 3)
>     - Prefetch data into all caches in anticipation of a read, where the
>       data will be used again soon (HTR - High Temporal Read).
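To make the examples above concrete, a minimal self-contained sketch (not part of the patch) that issues the OPCH_HTR (0, 3) and OPCH_HTW (1, 3) hints a few elements ahead inside a loop. It calls __builtin_prefetch() directly rather than the OVS macros; PREFETCH_DISTANCE and sum_and_copy() are hypothetical names, the distance of 8 is an arbitrary placeholder that would have to be measured per workload and CPU, and static_assert assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements.  A useful value depends on
 * the workload and the CPU and has to be measured. */
#define PREFETCH_DISTANCE 8

static_assert(PREFETCH_DISTANCE > 0, "prefetch distance must be positive");

/* Returns the sum of src[0..n-1] and stores 2 * src[i] into dst[i].
 * Hints are issued PREFETCH_DISTANCE elements ahead: (0, 3) matches the
 * OPCH_HTR row (read, high temporal locality) and (1, 3) the OPCH_HTW row
 * (write, high temporal locality).  The bounds check keeps every computed
 * address inside the arrays. */
static long
sum_and_copy(const int *src, int *dst, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&src[i + PREFETCH_DISTANCE], 0, 3);
            __builtin_prefetch(&dst[i + PREFETCH_DISTANCE], 1, 3);
        }
#endif
        total += src[i];
        dst[i] = 2 * src[i];
    }
    return total;
}
```

With src[i] = i for 100 elements, sum_and_copy() returns 4950 and leaves dst[99] == 198; the hints can only change timing, never the result.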
> 
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> ---
>  include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 139 insertions(+), 8 deletions(-)
> 
> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
> index c7cb930..94bb24d 100644
> --- a/include/openvswitch/compiler.h
> +++ b/include/openvswitch/compiler.h
> @@ -222,18 +222,149 @@
>      static void f(void)
>  #endif
> 
> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
> - * line containing the given address to a CPU cache.
> - * OVS_PREFETCH_WRITE() should be used when the memory is going to be
> - * written to.  Depending on the target CPU, this can generate the same
> - * instruction as OVS_PREFETCH(), or bring the data into the cache in an
> - * exclusive state. */
>  #if __GNUC__
> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
> +enum cache_locality {
> +    NON_TEMPORAL_LOCALITY,
> +    LOW_TEMPORAL_LOCALITY,
> +    MODERATE_TEMPORAL_LOCALITY,
> +    HIGH_TEMPORAL_LOCALITY
> +};
> +
> +enum cache_rw {
> +    PREFETCH_READ,
> +    PREFETCH_WRITE
> +};
> +
> +/* The prefetch variants give the user finer control over the data
> + * caching strategy, in order to increase cache efficiency and minimize
> + * cache pollution. Data reference patterns can be classified into:
> + *
> + *   Non-temporal (NT) - Data that is referenced once and not reused in
> + *                       the immediate future.
> + *   Temporal          - Data that will be used again soon.
> + *
> + * The macro variants are useful where:
> + *   o Memory access patterns are predictable.
> + *   o The execution pipeline can stall if data isn't available.
> + *   o Loops are time-consuming.
> + *
> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
> + * line containing the given address into a CPU cache. The second argument,
> + * OPCH_XXR or OPCH_XXW, hints whether the prefetched data is going to be
> + * read or written by the core.
> + *
> + * Example Usage:
> + *
> + *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
> + *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
> + *       - __builtin_prefetch(addr, 0, 1)
> + *       - Prefetch data into the L3 cache for read-only use.
> + *
> + *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
> + *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
> + *       - __builtin_prefetch(addr, 1, 3)
> + *       - Prefetch data into all caches in anticipation of a write. In
> + *         doing so it invalidates other cached copies so as to gain
> + *         'exclusive' access.
> + *
> + *   OVS_PREFETCH(addr)
> + *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
> + *       - __builtin_prefetch(addr, 0, 3)
> + *       - Prefetch data into all caches in anticipation of a read, where
> + *         the data will be used again soon (HTR - High Temporal Read).
> + *
> + * Implementation details of prefetch hint instructions may vary across
> + * different processors and microarchitectures.

Herein lies a potential problem: have you tested this on systems that interpret the prefetch hints differently? What about systems that don't support them?

In some cases OVS will be compiled on one system but deployed on another, and the two might not be the same HW platform. What happens in that case?

Will it behave as expected, i.e. in a similar fashion to how prefetch currently behaves?
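One point that may help frame the answer: according to the GCC documentation, __builtin_prefetch() is only a hint. It does not fault on an invalid address, and when the compilation target has no data prefetch support no prefetch code is generated (any side effects of the address expression are still evaluated). That documentation covers the compile target rather than the CPU the binary later runs on, so the build-vs-deploy case still needs testing, but a mismatch should show up as a performance difference rather than a functional one. A minimal sketch under that assumption, using a hypothetical SKETCH_PREFETCH() wrapper that mirrors the patch's #if __GNUC__ / #else split, computes the same result with the hint enabled or disabled:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper mirroring the patch's "#if __GNUC__ ... #else"
 * split: GCC and Clang get the builtin, other compilers get a no-op that
 * still evaluates the address expression. */
#if defined(__GNUC__)
#define SKETCH_PREFETCH(addr, rw, locality) \
    __builtin_prefetch((addr), (rw), (locality))
#else
#define SKETCH_PREFETCH(addr, rw, locality) ((void) (addr))
#endif

struct node {
    int value;
    struct node *next;
};

/* Links nodes[0..n-1] into a list holding the values 1..n. */
static void
build_list(struct node *nodes, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = (int) i + 1;
        nodes[i].next = i + 1 < n ? &nodes[i + 1] : NULL;
    }
}

/* Sums the list.  With 'use_hint' set, the next node is prefetched
 * (read, high temporal locality) before the current one is processed;
 * the hint may change timing but not the returned value. */
static long
sum_list(const struct node *n, int use_hint)
{
    long total = 0;

    for (; n; n = n->next) {
        if (use_hint && n->next) {
            SKETCH_PREFETCH(n->next, 0, 3);
        }
        total += n->value;
    }
    return total;
}
```

For a 10-node list, sum_list() returns 55 whether use_hint is 0 or 1.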

> + *
> + * OPCH_NTW, OPCH_LTW and OPCH_MTW use the prefetchwt1 instruction and
> + * OPCH_HTW uses the prefetchw instruction, when available. Refer to the
> + * documentation on how to enable the prefetchwt1 instruction.

Just to clarify, is it the HW documentation for the user's setup that they must refer to?
Are there any extra setup steps for compilers etc. to enable these instructions?

I would expect something like this to be added to the OVS docs.
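As a possible starting point for such documentation: the instruction is chosen at compile time, so what matters is which ISA extensions the compiler was told it may use. The sketch below assumes GCC on x86, where, to my understanding, `-mprfchw` and `-mprefetchwt1` (or an `-march` that implies them) predefine `__PRFCHW__` and `__PREFETCHWT1__`; running `gcc -dM -E - < /dev/null` with the same flags lists what a given toolchain actually defines, and support for these options may differ between compiler versions. The helper names and bit values are hypothetical, and on other compilers or architectures both bits simply stay clear.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit names for the write-prefetch instructions the compiler
 * was allowed to emit in this translation unit. */
enum {
    SKETCH_HAS_PRFCHW = 1 << 0,     /* prefetchw */
    SKETCH_HAS_PREFETCHWT1 = 1 << 1 /* prefetchwt1 */
};

static_assert((SKETCH_HAS_PRFCHW & SKETCH_HAS_PREFETCHWT1) == 0,
              "feature bits must not overlap");

/* Reports build-time choices only: the result describes how this object
 * file was compiled, not the CPU it eventually runs on. */
static int
build_time_write_prefetch_isa(void)
{
    int isa = 0;

#if defined(__PRFCHW__)
    isa |= SKETCH_HAS_PRFCHW;
#endif
#if defined(__PREFETCHWT1__)
    isa |= SKETCH_HAS_PREFETCHWT1;
#endif
    return isa;
}

static void
print_build_time_write_prefetch_isa(FILE *out)
{
    int isa = build_time_write_prefetch_isa();

    fprintf(out, "prefetchw: %s, prefetchwt1: %s\n",
            (isa & SKETCH_HAS_PRFCHW) ? "enabled" : "not enabled",
            (isa & SKETCH_HAS_PREFETCHWT1) ? "enabled" : "not enabled");
}
```

Copying the resulting binary to a different host does not change what it prints, which is exactly the compile-on-one-system, deploy-on-another situation described above.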

> + *
> + * PREFETCH HINT    Instruction     GCC builtin function
> + * -------------------------------------------------------
> + *   OPCH_NTR       prefetchnta  __builtin_prefetch(a, 0, 0)
> + *   OPCH_LTR       prefetcht2   __builtin_prefetch(a, 0, 1)
> + *   OPCH_MTR       prefetcht1   __builtin_prefetch(a, 0, 2)
> + *   OPCH_HTR       prefetcht0   __builtin_prefetch(a, 0, 3)
> + *
> + *   OPCH_NTW       prefetchwt1  __builtin_prefetch(a, 1, 0)
> + *   OPCH_LTW       prefetchwt1  __builtin_prefetch(a, 1, 1)
> + *   OPCH_MTW       prefetchwt1  __builtin_prefetch(a, 1, 2)
> + *   OPCH_HTW       prefetchw    __builtin_prefetch(a, 1, 3)
> + *
> + */
> +#define OVS_PREFETCH_CACHE_HINT                                               \
> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,                      \
> +         "Fetch data to non-temporal cache close to processor "               \
> +         "to minimize cache pollution")                                       \
> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,                      \
> +         "Fetch data to L2 and L3 cache")                                     \
> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,                 \
> +         "Fetch data to L2 and L3 caches, same as LTR on "                    \
> +         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures")      \
> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,                     \
> +         "Fetch data into all cache levels L1, L2 and L3")                    \
> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,                     \
> +         "Fetch data to L2 and L3 cache in exclusive state "                  \
> +         "in anticipation of write")                                          \
> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,                     \
> +         "Fetch data to L2 and L3 cache in exclusive state")                  \
> +    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY,                \
> +         "Fetch data into L2 and L3 caches in exclusive state")               \
> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,                    \
> +         "Fetch data into all cache levels in exclusive state")
> +
> +/* Indexes for cache prefetch types. */
> +enum {
> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
> +    OVS_PREFETCH_CACHE_HINT
> +#undef OPCH
> +};
> +
> +/* Cache prefetch types. */
> +enum ovs_prefetch_type {
> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
> +    OVS_PREFETCH_CACHE_HINT
> +#undef OPCH
> +};
> +
> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                           \

Checkpatch caught the following:

ERROR: Improper whitespace around control block
#164 FILE: include/openvswitch/compiler.h:331:
#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                           \

Lines checked: 204, Warnings: 0, Errors: 1

> +{                                                                             \
> +    case OPCH_NTR:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY);     \
> +        break;                                                                \
> +    case OPCH_LTR:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY);     \
> +        break;                                                                \
> +    case OPCH_MTR:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_READ,                             \
> +                           MODERATE_TEMPORAL_LOCALITY);                       \
> +        break;                                                                \
> +    case OPCH_HTR:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY);    \
> +        break;                                                                \
> +    case OPCH_NTW:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY);    \
> +        break;                                                                \
> +    case OPCH_LTW:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY);    \
> +        break;                                                                \
> +    case OPCH_MTW:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_WRITE,                            \
> +                           MODERATE_TEMPORAL_LOCALITY);                       \
> +        break;                                                                \
> +    case OPCH_HTW:                                                            \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY);   \
> +        break;                                                                \
> +}
> +
> +/* Retain this for backward compatibility. */
> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>  #else
>  #define OVS_PREFETCH(addr)
>  #define OVS_PREFETCH_WRITE(addr)
> +#define OVS_PREFETCH_CACHE(addr, OP)
>  #endif
> 
>  /* Build assertions.
> --
> 2.4.11
> 
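For reference, the X-macro pattern used for OVS_PREFETCH_CACHE_HINT in the patch expands once into consecutive _INDEX values and once into one bit per hint. A reduced, hypothetical copy (SKETCH_HINTS, HINT() and the H_* names are illustrative only) makes the resulting values easy to check, and a third expansion shows how the same list could also generate the (rw, locality) pairs that the patch's switch spells out case by case. static_assert assumes a C11 compiler.

```c
#include <assert.h>

/* Reduced, hypothetical copy of OVS_PREFETCH_CACHE_HINT: each entry
 * carries only the builtin's (rw, locality) arguments. */
#define SKETCH_HINTS       \
    HINT(H_NTR, 0, 0)      \
    HINT(H_LTR, 0, 1)      \
    HINT(H_MTR, 0, 2)      \
    HINT(H_HTR, 0, 3)      \
    HINT(H_NTW, 1, 0)      \
    HINT(H_LTW, 1, 1)      \
    HINT(H_MTW, 1, 2)      \
    HINT(H_HTW, 1, 3)

/* First expansion: consecutive indexes H_NTR_INDEX (0) .. H_HTW_INDEX (7),
 * followed by a count. */
enum {
#define HINT(ENUM, RW, LOCALITY) ENUM##_INDEX,
    SKETCH_HINTS
#undef HINT
    H_N_HINTS
};

/* Second expansion: one bit per hint, as in enum ovs_prefetch_type. */
enum sketch_hint {
#define HINT(ENUM, RW, LOCALITY) ENUM = 1 << ENUM##_INDEX,
    SKETCH_HINTS
#undef HINT
};

/* Third expansion: the (rw, locality) pair for each hint, indexed by
 * ENUM##_INDEX. */
static const struct {
    int rw;
    int locality;
} hint_args[] = {
#define HINT(ENUM, RW, LOCALITY) { RW, LOCALITY },
    SKETCH_HINTS
#undef HINT
};

static_assert(H_HTW == 1 << 7, "eight hints occupy bits 0 through 7");
```

Here H_NTR is 1, H_LTR is 2 and H_HTW is 128, and hint_args[H_HTW_INDEX] holds { 1, 3 }.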

