[ovs-dev] [PATCH RFC 2/5] configure: Include -mprefetchwt1 explicitly.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Thu Dec 7 19:46:19 UTC 2017


>> >> If CPU just skips this instruction we will lost all the prefetching
>> >> optimizations because all the calls will be replaced by non-existent
>> >> 'prefetchwt1'.
>> >
>> > [Bhanu] I would be worried if core generates an exception treating
>> > it as illegal instruction. Instead pipeline units treat this as NOP
>> > if it doesn't support it.
>> > So the micro optimizations doesn't really do any thing on the processors
>> > that doesn't support it.
>>
>> This could be an issue. If someday we'll have real performance
>> optimization based on OPCH_HTW prefetch, we will have prefetchwt1 on
>> system that supports it and NOP on others even if they have usual
>> prefetchw which could provide performance improvement too.

[Bhanu] Adding the information below for future reference (I am going to point to this thread in the commit log).

On systems that have *only* prefetchw and no prefetchwt1 instruction:
     OPCH_LTW   -   prefetchw
     OPCH_MTW   -   prefetchw
     OPCH_HTW   -   prefetchw
     OPCH_NTW   -   prefetchw

On systems that support both prefetchw and prefetchwt1:
     OPCH_LTW   -   prefetchwt1
     OPCH_MTW   -   prefetchwt1
     OPCH_HTW   -   prefetchw
     OPCH_NTW   -   prefetchwt1

So OPCH_HTW would always be prefetchw, and LTW/MTW/NTW might turn into NOPs on processors that support prefetchw alone
(when compiled with CFLAGS = -march=native -mprefetchwt1).
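
For reference, the two tables above can be modelled in a few lines of C. This is only a sketch under assumptions, not the code from the patch series: the names hint, insn, pick_insn, build_insn and prefetch_for_write_htw are hypothetical, and the OPCH_* hints are assumed to correspond to the 0..3 locality argument of GCC's __builtin_prefetch (NTW=0, LTW=1, MTW=2, HTW=3). As far as I know, GCC predefines __PRFCHW__ under -mprfchw and __PREFETCHWT1__ under -mprefetchwt1; treat that as an assumption to verify against your compiler.

```c
#include <stdbool.h>

/* Assumption: OPCH_* hints map onto GCC's locality argument 0..3. */
enum hint { HINT_NTW = 0, HINT_LTW = 1, HINT_MTW = 2, HINT_HTW = 3 };

enum insn { INSN_FALLBACK, INSN_PREFETCHW, INSN_PREFETCHWT1 };

/* Pure model of the two tables: which write-prefetch instruction is
 * expected for a given hint and a given set of enabled ISA features. */
static enum insn
pick_insn(bool has_prfchw, bool has_prefetchwt1, enum hint h)
{
    if (!has_prfchw) {
        return INSN_FALLBACK;       /* Case not covered by the tables. */
    }
    if (has_prefetchwt1 && h != HINT_HTW) {
        return INSN_PREFETCHWT1;    /* LTW, MTW and NTW. */
    }
    return INSN_PREFETCHW;          /* HTW, or prefetchw-only builds. */
}

/* Same question for the current build, based on the predefined macros
 * that CFLAGS such as -march=native or -mprefetchwt1 are assumed to set. */
static enum insn
build_insn(enum hint h)
{
#ifdef __PRFCHW__
    const bool prfchw = true;
#else
    const bool prfchw = false;
#endif
#ifdef __PREFETCHWT1__
    const bool prefetchwt1 = true;
#else
    const bool prefetchwt1 = false;
#endif
    return pick_insn(prfchw, prefetchwt1, h);
}

/* An actual write prefetch with the highest locality (the HTW case).
 * The rw and locality arguments must be compile-time constants. */
static void
prefetch_for_write_htw(const void *p)
{
    __builtin_prefetch(p, 1, 3);
}
```

Note that build_insn() only reflects what the compiler was told via CFLAGS; it says nothing about the CPU the binary eventually runs on, which is exactly the concern raised in this thread.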

>>
>> As I understand, checking of '-mprefetchwt1' is equal to checking
>> compiler version. It doesn't check anything about supporting of this
>> instruction in CPU.
>> This could end up with non-working performance optimizations and even
>> degradation on systems that supports usual prefetches but not
>> prefetchwt1 (useless NOPs degrades performance if they are on a hot
>> path).
>>
>> IMHO, This compiler option should be passed only if CPU really supports it.
>> I guess, the maximum that we can do is add a note into performance
>> optimization guide that '-mprefetchwt1' could be passed via CFLAGS if
>> user sure that it supported by target CPU.
>
>That is my thinking as well. The people/organizations building OVS packages
>for deployment have the responsibility to specify the minimum requirements
>on the target architecture and feed that into the compiler using CFLAGS. That
>may well be leaning towards the lower end of capabilities to maximize
>compatibility and sacrifice some performance on high-end CPUs.
>
>The specialized prefetch macros should be mapped to the best available
>target instructions by the compiler and/or conditional compile directives
>based on the CFLAGS architecture settings.
>
>We would gather all these target-specific compiler optimization guidelines in
>the advanced DPDK documentation of OVS.
>
>Of course developers or benchmark testers are free to use -march=native or
>similar at their discretion in their local test beds for best possible performance.

If the general view is to get rid of this flag at compilation and only document it, I am happy with that and can update the documentation.
But I still think we are being too defensive here; with a few NOPs, the performance impact isn't even noticeable.

- Bhanuprakash.

