[ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
Bodireddy, Bhanuprakash
bhanuprakash.bodireddy at intel.com
Tue Mar 13 14:52:48 UTC 2018
>
>> -----Original Message-----
>> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
>> Sent: Friday, January 12, 2018 5:41 PM
>> To: dev at openvswitch.org
>> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
>>
>> This commit introduces prefetch variants by using the GCC built-in
>> prefetch function.
>>
>> The prefetch variants give the user better control over the data
>> caching strategy, in order to increase cache efficiency and minimize
>> cache pollution. Data reference patterns can be classified into:
>>
>> - Non-temporal (NT) - Data that is referenced once and not reused in
>> the immediate future.
>> - Temporal - Data that will be used again soon.
>>
>> The macro variants can be used where there are:
>> - Predictable memory access patterns.
>> - Execution pipeline stalls if data isn't available.
>> - Time-consuming loops.
>>
>> For example:
>>
>> OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>> - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>> - __builtin_prefetch(addr, 0, 1)
>> - Prefetch data in to L3 cache for readonly purpose.
>>
>> OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>> - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>> - __builtin_prefetch(addr, 1, 3)
>> - Prefetch data in to all caches in anticipation of write. In doing
>> so it invalidates other cached copies so as to gain 'exclusive'
>> access.
>>
>> OVS_PREFETCH(addr)
>> - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>> - __builtin_prefetch(addr, 0, 3)
>> - Prefetch data in to all caches in anticipation of read and that
>> data will be used again soon (HTR - High Temporal Read).
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> <bhanuprakash.bodireddy at intel.com>
>> ---
>> include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 139 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
>> index c7cb930..94bb24d 100644
>> --- a/include/openvswitch/compiler.h
>> +++ b/include/openvswitch/compiler.h
>> @@ -222,18 +222,149 @@
>> static void f(void)
>> #endif
>>
>> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
>> - * line containing the given address to a CPU cache.
>> - * OVS_PREFETCH_WRITE() should be used when the memory is going to be
>> - * written to. Depending on the target CPU, this can generate the same
>> - * instruction as OVS_PREFETCH(), or bring the data into the cache in an
>> - * exclusive state. */
>> #if __GNUC__
>> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
>> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
>> +enum cache_locality {
>> + NON_TEMPORAL_LOCALITY,
>> + LOW_TEMPORAL_LOCALITY,
>> + MODERATE_TEMPORAL_LOCALITY,
>> + HIGH_TEMPORAL_LOCALITY
>> +};
>> +
>> +enum cache_rw {
>> + PREFETCH_READ,
>> + PREFETCH_WRITE
>> +};
>> +
>> +/* The prefetch variants gives the user better control on designing data
>> + * caching strategy in order to increase cache efficiency and minimize
>> + * cache pollution. Data reference patterns here can be classified in to
>> + *
>> + * Non-temporal(NT) - Data that is referenced once and not reused in
>> + * immediate future.
>> + * Temporal - Data will be used again soon.
>> + *
>> + * The Macro variants can be used where there are
>> + * o Predictable memory access patterns.
>> + * o Execution pipeline can stall if data isn't available.
>> + * o Time consuming loops.
>> + *
>> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
>> + * line containing the given address to a CPU cache. The second argument
>> + * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is going
>> + * to be read or written to by core.
>> + *
>> + * Example Usage:
>> + *
>> + * OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>> + * - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>> + * - __builtin_prefetch(addr, 0, 1)
>> + * - Prefetch data in to L3 cache for readonly purpose.
>> + *
>> + * OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>> + * - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>> + * - __builtin_prefetch(addr, 1, 3)
>> + * - Prefetch data in to all caches in anticipation of write. In doing
>> + * so it invalidates other cached copies so as to gain 'exclusive'
>> + * access.
>> + *
>> + * OVS_PREFETCH(addr)
>> + * - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>> + * - __builtin_prefetch(addr, 0, 3)
>> + * - Prefetch data in to all caches in anticipation of read and that
>> + * data will be used again soon (HTR - High Temporal Read).
>> + *
>> + * Implementation details of prefetch hint instructions may vary across
>> + * different processors and microarchitectures.
>
>Herein lies a potential problem, have you tested this on systems that have
>different interpretations of the prefetch hints? What about systems that
>don't support it?
[BHANU]
I have tested it on several Intel microarchitectures (Haswell, Broadwell, Skylake).
I understand that you are concerned about the ARM platform. ARM does support prefetch variants, and they provide the same functionality as on x86_64.
For example, consider the code snippet below, compiled on ARM64 with gcc 5.4:
void pref(void *p) {
    __builtin_prefetch(p,0,0);
    __builtin_prefetch(p,0,1);
    __builtin_prefetch(p,0,2);
    __builtin_prefetch(p,0,3);
    __builtin_prefetch(p,1,0);
    __builtin_prefetch(p,1,1);
    __builtin_prefetch(p,1,2);
    __builtin_prefetch(p,1,3);
}
On ARM64 (gcc 5.4):
pref:
prfm PLDL1STRM, [x0]
prfm PLDL3KEEP, [x0]
prfm PLDL2KEEP, [x0]
prfm PLDL1KEEP, [x0]
prfm PSTL1STRM, [x0]
prfm PSTL3KEEP, [x0]
prfm PSTL2KEEP, [x0]
prfm PSTL1KEEP, [x0]
ret
For instruction details, see: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/PRFM_imm.html
The best way to check different platforms and compiler versions is to use https://gcc.godbolt.org/
>
>In some cases OVS will be compiled on one system but then deployed on
>another, they might not be the same HW platform. What happens in that
>case?
If the target doesn't support a given prefetch, it is likely to be a NOP on that platform and should not cause any application crashes or performance penalties.
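
To illustrate why a prefetch is only a hint, here is a minimal sketch (not part of the patch). DEMO_PREFETCH, struct demo_node and demo_list_sum() are hypothetical names used only for this example. The macro expands to __builtin_prefetch() on GCC-compatible compilers and to nothing elsewhere, in the same spirit as the #else branch in compiler.h. On the last element node->next is NULL; the GCC documentation states that the builtin does not fault on an invalid address as long as the address expression itself is valid. This sketch does not prove anything about a specific target, it only shows that the result does not depend on the hint.

```c
#include <stddef.h>

/* Hypothetical wrapper, for illustration only: a read prefetch with
 * high temporal locality where the builtin exists, a no-op elsewhere. */
#if defined(__GNUC__)
#define DEMO_PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#define DEMO_PREFETCH(addr) ((void) 0)
#endif

struct demo_node {
    struct demo_node *next;
    long value;
};

/* Sums a singly linked list, hinting the next node one step ahead. */
static long
demo_list_sum(const struct demo_node *node)
{
    long sum = 0;

    while (node) {
        /* node->next is NULL on the last element.  The expression is
         * valid, so issuing the hint is safe even though the address
         * cannot be dereferenced. */
        DEMO_PREFETCH(node->next);
        sum += node->value;
        node = node->next;
    }
    return sum;
}
```

Calling demo_list_sum() on a three-node list gives the same sum whether or not the macro expands to a real prefetch.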
>
>Will it behave as expected i.e. similar fashion to how prefetch currently
>behaves?
Yes.
>
>> + *
>> + * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and OPCH_HTW
>> + * uses prefetchw instruction when available. Refer Documentation on how
>> + * to enable prefetchwt1 instruction.
>
>Just to clarify, Is it HW documentation for a user's setup they must refer to?
[BHANU]
Nope, I meant the OvS Documentation in this patch.
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343101.html
>Are there any extra setup steps for compilers etc. for these instructions?
[BHANU] True, this is described in the documentation at the link above.
>
>I would expect something like this to be added to the OVS docs.
>
>> + *
>> + * PREFETCH HINT Instruction GCC builtin function
>> + * -------------------------------------------------------
>> + * OPCH_NTR prefetchnta __builtin_prefetch(a, 0, 0)
>> + * OPCH_LTR prefetcht2 __builtin_prefetch(a, 0, 1)
>> + * OPCH_MTR prefetcht1 __builtin_prefetch(a, 0, 2)
>> + * OPCH_HTR prefetcht0 __builtin_prefetch(a, 0, 3)
>> + *
>> + * OPCH_NTW prefetchwt1 __builtin_prefetch(a, 1, 0)
>> + * OPCH_LTW prefetchwt1 __builtin_prefetch(a, 1, 1)
>> + * OPCH_MTW prefetchwt1 __builtin_prefetch(a, 1, 2)
>> + * OPCH_HTW prefetchw __builtin_prefetch(a, 1, 3)
>> + *
>> + * */
>> +#define OVS_PREFETCH_CACHE_HINT \
>> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY, \
>> +         "Fetch data to non-temporal cache close to processor" \
>> +         "to minimize cache pollution") \
>> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 cache") \
>> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 caches, same as LTR on" \
>> +         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures") \
>> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY, \
>> +         "Fetch data in to all cache levels L1, L2 and L3") \
>> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 cache in exclusive state" \
>> +         "in anticipation of write") \
>> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 cache in exclusive state") \
>> +    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY, \
>> +         "Fetch data in to L2 and L3 caches in exclusive state") \
>> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY, \
>> +         "Fetch data in to all cache levels in exclusive state")
>> +
>> +/* Indexes for cache prefetch types. */
>> +enum {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
>> + OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +/* Cache prefetch types. */
>> +enum ovs_prefetch_type {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
>> + OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \
>
>Checkpatch caught the following:
>
>ERROR: Improper whitespace around control block
>#164 FILE: include/openvswitch/compiler.h:331:
>#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \
>
>Lines checked: 204, Warnings: 0, Errors: 1
[BHANU]
I will fix this.
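
For reference, a reduced sketch of the same pattern with the whitespace corrected to "switch (TYPE)". This is not the next revision of the patch: DEMO_HINTS, DEMO_LTR, DEMO_HTW, DEMO_PREFETCH_CACHE and demo_store() are hypothetical names, the hint list is cut down to two entries, and the do { } while (0) wrapper is an assumption added here so that the macro behaves as a single statement. The switch with one case per hint is kept because __builtin_prefetch() requires compile-time constants for its second and third arguments.

```c
/* Hypothetical two-entry hint list; the list in the patch has eight. */
#define DEMO_HINTS                \
    DEMO_HINT(DEMO_LTR, 0, 1)     \
    DEMO_HINT(DEMO_HTW, 1, 3)

/* First expansion: consecutive indexes 0, 1, ... */
enum {
#define DEMO_HINT(ENUM, RW, LOCALITY) ENUM##_INDEX,
    DEMO_HINTS
#undef DEMO_HINT
};

/* Second expansion: one bit per hint, derived from the index. */
enum demo_prefetch_type {
#define DEMO_HINT(ENUM, RW, LOCALITY) ENUM = 1 << ENUM##_INDEX,
    DEMO_HINTS
#undef DEMO_HINT
};

#if defined(__GNUC__)
/* One case per hint, each with literal rw/locality arguments. */
#define DEMO_PREFETCH_CACHE(addr, TYPE)            \
    do {                                           \
        switch (TYPE) {                            \
        case DEMO_LTR:                             \
            __builtin_prefetch((addr), 0, 1);      \
            break;                                 \
        case DEMO_HTW:                             \
            __builtin_prefetch((addr), 1, 3);      \
            break;                                 \
        }                                          \
    } while (0)
#else
#define DEMO_PREFETCH_CACHE(addr, TYPE) ((void) 0)
#endif

/* Example caller: hint an upcoming write, then perform it. */
static int
demo_store(int *slot, int value)
{
    DEMO_PREFETCH_CACHE(slot, DEMO_HTW);
    *slot = value;
    return *slot;
}
```

With a constant TYPE the compiler can fold the switch down to the single matching prefetch, so the extra cases should not cost anything at run time in optimized builds.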
>> +{ \
>> +    case OPCH_NTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_LTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_MTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, \
>> +                           MODERATE_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_HTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_NTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_LTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_MTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, \
>> +                           MODERATE_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_HTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY); \
>> +        break; \
>> +}
>> +
>> +/* Retain this for backward compatibility. */
>> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
>> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>> #else
>> #define OVS_PREFETCH(addr)
>> #define OVS_PREFETCH_WRITE(addr)
>> +#define OVS_PREFETCH_CACHE(addr, OP)
>> #endif
>>
>> /* Build assertions.
>> --
>> 2.4.11
>>