[ovs-dev] [PATCH RFC 2/5] configure: Include -mprefetchwt1 explicitly.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Tue Dec 5 16:19:08 UTC 2017


[...]
>int main()
>{
>        int c;
>
>        __builtin_prefetch(&c, 1, 1);
>        c = 8;
>
>        return c;
>}
>
>on my old Ivy Bridge i7-3770 CPU. It does not support even 'prefetchw':
>
>      PREFETCHWT1                              = false
>      3DNow! PREFETCH/PREFETCHW instructions = false
>
>Results:

[Bhanu] I found https://gcc.godbolt.org/ the other day, and it's handy for generating code for different targets and compilers.

>$ gcc 1.c
>$ objdump -S ./a.out | grep prefetch -A2 -B2
>  40055b:       31 c0                   xor    %eax,%eax
>  40055d:       48 8d 45 f4             lea    -0xc(%rbp),%rax
>  400561:       0f 18 18                prefetcht2 (%rax)
>  400564:       c7 45 f4 08 00 00 00    movl   $0x8,-0xc(%rbp)
>  40056b:       8b 45 f4                mov    -0xc(%rbp),%eax

[Bhanu] This is expected; the compiler generates prefetcht2.

>
>$ gcc 1.c -march=native
>$ objdump -S ./a.out | grep prefetch -A2 -B2
>  40055b:       31 c0                   xor    %eax,%eax
>  40055d:       48 8d 45 f4             lea    -0xc(%rbp),%rax
>  400561:       0f 18 18                prefetcht2 (%rax)
>  400564:       c7 45 f4 08 00 00 00    movl   $0x8,-0xc(%rbp)
>  40056b:       8b 45 f4                mov    -0xc(%rbp),%eax

[Bhanu] Though -march=native is specified, the processor doesn't have prefetchwt1, so the compiler still generates prefetcht2.

>$ gcc 1.c -march=native -mprefetchwt1
>$ objdump -S ./a.out | grep prefetch -A2 -B2
>  40055b:       31 c0                   xor    %eax,%eax
>  40055d:       48 8d 45 f4             lea    -0xc(%rbp),%rax
>  400561:       0f 0d 10                prefetchwt1 (%rax)
>  400564:       c7 45 f4 08 00 00 00    movl   $0x8,-0xc(%rbp)
>  40056b:       8b 45 f4                mov    -0xc(%rbp),%eax

[Bhanu] The compiler inserts the prefetchwt1 instruction, as we asked it to.

>
>So, it inserts this instruction even if I have no such instruction in CPU.

[Bhanu]
Though the compiler generates this instruction, it isn't available on the processor, so it just becomes a multi-byte no-operation (NOP).
On processors that don't have prefetchw (Intel) or the 3DNow! feature (AMD), it decodes into a NOP.
http://ref.x86asm.net/coder64.html#x0F0D
	- Click on '0D' in the two-byte opcode index (16.  0F0D NOP).
	- More information on this can be found in the Intel Software Developer's Manual (Combined Volumes).

>More interesting is that program still works without any issues.
>I assume that CPU just skips that instruction or executes something else.

[Bhanu] This is mostly what is expected. On processors that support prefetchwt1 it executes, and on others it just becomes a NOP.

>
>So, it's really strange and it's unclear what CPU really executes in case where
>we have 'prefetchwt1' in code but not supported by CPU.

[Bhanu] It's decoded into a NOP, maybe by the pipeline decoding units.

>
>If CPU just skips this instruction we will lost all the prefetching optimizations
>because all the calls will be replaced by non-existent 'prefetchwt1'.

[Bhanu] I would be worried if the core generated an exception, treating it as an illegal instruction. Instead, the pipeline units treat it as a NOP if the processor doesn't support it.
So the micro-optimizations don't really do anything on processors that don't support it.

>
>How can we be sure that 'prefetchwt1' was really executed?

[Bhanu] I don't know how we can see this unless we can peek into the instruction queues and decoders of the pipeline :(.

- Bhanuprakash.

