[ovs-dev] [PATCH 1/3] lib/ovs-atomic-i586: Faster 64-bit atomics on 32-bit builds with SSE.

Jarno Rajahalme jrajahalme at nicira.com
Thu Oct 2 16:14:42 UTC 2014


On Oct 1, 2014, at 4:38 PM, Jarno Rajahalme <jrajahalme at nicira.com> wrote:

> 
> On Sep 26, 2014, at 11:20 AM, Ben Pfaff <blp at nicira.com> wrote:
> 
>> On Wed, Sep 24, 2014 at 11:24:00AM -0700, Jarno Rajahalme wrote:
>>> Aligned 64-bit memory accesses in i586 are atomic.  By using an SSE
>>> register we can make such memory accesses in one instruction without
>>> bus-locking.  Need to compile with -msse to enable this feature.
>>> 
>>> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
>> 
>> I guess that ovs-atomic-i586 must be aimed at older versions of
>> XenServer, which always run on 64-bit capable processors but in 32-bit
>> mode.  That means that we can always build with -msse for XenServer.
>> Should we patch xenserver/openvswitch-xen.spec to do that? 
>> 
> 
> Yes, I think we should do that. Maybe you are familiar with that file already, so…
> 

All 64-bit capable CPUs have SSE2, so we had better make it -msse2.
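
For readers who do not have the patch at hand, here is a minimal sketch of the idea, not the code that was committed. It moves an aligned 64-bit value through an SSE register with a single MOVQ, so no LOCK prefix is needed. The names sketch_read64() and sketch_store64() are made up for illustration, the asm is GCC/Clang syntax, and the #else branch (the compiler's __atomic builtins) is an assumption added so the sketch also builds where SSE2 is not enabled.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical helper: read 8 bytes with one MOVQ through XMM0.
 * The access is only atomic if 'src' is 8-byte aligned. */
static inline uint64_t
sketch_read64(const volatile uint64_t *src)
{
    uint64_t value;

    assert(((uintptr_t) src & 7) == 0);
    __asm__ ("movq %1, %%xmm0\n\t"      /* single 64-bit load from *src */
             "movq %%xmm0, %0"          /* spill to a local, not shared */
             : "=m" (value)
             : "m" (*src)
             : "xmm0");
    return value;
}

/* Hypothetical helper: write 8 bytes with one MOVQ from XMM0. */
static inline void
sketch_store64(volatile uint64_t *dst, uint64_t value)
{
    assert(((uintptr_t) dst & 7) == 0);
    __asm__ ("movq %1, %%xmm0\n\t"      /* load the local value */
             "movq %%xmm0, %0"          /* single 64-bit store to *dst */
             : "=m" (*dst)
             : "m" (value)
             : "xmm0");
}
#else
/* Assumed fallback for builds without SSE2: compiler builtins. */
static inline uint64_t
sketch_read64(const volatile uint64_t *src)
{
    return __atomic_load_n(src, __ATOMIC_RELAXED);
}

static inline void
sketch_store64(volatile uint64_t *dst, uint64_t value)
{
    __atomic_store_n(dst, value, __ATOMIC_RELAXED);
}
#endif
```

On a 32-bit build the SSE branch is only compiled in when the compiler is told SSE2 is available, which is why the build flag matters. The sketch provides no memory ordering beyond what the hardware gives plain loads and stores.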

>> The non-SSE code in atomic_read_8__() is very clever.  I am not sure
>> that I would have thought of using the existing value in EBX:ECX as
>> the value to write as well.  It works around the PIC issue very well,
>> without needing any extra code.
>> 
> 
> That cleverness I must have borrowed from somewhere else.
> 
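
The trick Ben describes can be sketched as follows. This is an illustration under stated assumptions, not the committed atomic_read_8__(): the name sketch_read64_cx8() is hypothetical, the asm is GCC/Clang syntax, and the #else branch is an assumed fallback for non-x86 builds. CMPXCHG8B compares EDX:EAX with memory and, on a match, writes ECX:EBX back. Copying whatever happens to be in ECX:EBX into EDX:EAX first means a match rewrites the value already stored, and a mismatch loads the current value into EDX:EAX. Either way EDX:EAX ends up holding the memory contents, and EBX (the PIC register on i386) is only read, never modified.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical helper: atomic 64-bit read via LOCK CMPXCHG8B.
 * The location must be writable, since a matching compare stores
 * the (identical) value back. */
static inline uint64_t
sketch_read64_cx8(volatile uint64_t *src)
{
    uint32_t lo, hi;

    __asm__ ("movl %%ebx, %%eax\n\t"    /* expected low  = replacement low  */
             "movl %%ecx, %%edx\n\t"    /* expected high = replacement high */
             "lock; cmpxchg8b %2"       /* EDX:EAX now equals *src */
             : "=&a" (lo), "=&d" (hi), "+m" (*src)
             :
             : "cc");
    return ((uint64_t) hi << 32) | lo;
}
#else
/* Assumed fallback for other architectures: compiler builtin. */
static inline uint64_t
sketch_read64_cx8(volatile uint64_t *src)
{
    return __atomic_load_n(src, __ATOMIC_SEQ_CST);
}
#endif
```

The early-clobber markers on EAX and EDX keep the compiler from using those registers to address *src, since they are overwritten before the memory operand is accessed.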
>> I am not sure why the asm statements for reading atomic variables are
>> volatile.  I don't think they have any side effects.
>> 
> 
> GCC manual:
> 
> "6.42.2.1 Volatile
> 
> GCC's optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands are implicitly volatile."
> 
> 
> Reading an atomic variable in a loop may return a different value each time, even when the input operand (an address) is the same, because another thread may be writing to the same variable, so the optimizations mentioned above should be disabled. Or do you think that the fact that the pointer itself is defined as volatile is enough?
> 

I added some more testing for this and removed the volatile qualifiers from the atomic read asm statements.

>> Acked-by: Ben Pfaff <blp at nicira.com>

Pushed to master,

  Jarno

> 
> Thanks!
> 



