[ovs-dev] [PATCH 1/3] lib/ovs-atomic-i586: Faster 64-bit atomics on 32-bit builds with SSE.

Ben Pfaff blp at nicira.com
Fri Oct 3 22:05:26 UTC 2014


On Thu, Oct 02, 2014 at 09:14:42AM -0700, Jarno Rajahalme wrote:
> 
> On Oct 1, 2014, at 4:38 PM, Jarno Rajahalme <jrajahalme at nicira.com> wrote:
> 
> > 
> > On Sep 26, 2014, at 11:20 AM, Ben Pfaff <blp at nicira.com> wrote:
> > 
> >> On Wed, Sep 24, 2014 at 11:24:00AM -0700, Jarno Rajahalme wrote:
> >>> Aligned 64-bit memory accesses in i586 are atomic.  By using an SSE
> >>> register we can make such memory accesses in one instruction without
> >>> bus-locking.  Need to compile with -msse to enable this feature.
> >>> 
> >>> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
> >> 
> >> I guess that ovs-atomic-i586 must be aimed at older versions of
> >> XenServer, which always run on 64-bit capable processors but in 32-bit
> >> mode.  That means that we can always build with -msse for XenServer.
> >> Should we patch xenserver/openvswitch-xen.spec to do that? 
> >> 
> > 
> > Yes, I think we should do that. Maybe you are familiar with that file already, so?
> > 
> 
> 64-bit capable CPUs have sse2, so better make it -msse2.

OK, I'll work on a patch.

> >> The non-SSE code in atomic_read_8__() is very clever.  I am not sure
> >> that I would have thought of using the existing value in EBX:ECX as
> >> the value to write as well.  It works around the PIC issue very well,
> >> without needing any extra code.
> >> 
> > 
> > That cleverness I must have borrowed from somewhere else.
> > 
> >> I am not sure why the asm statements for reading atomic variables are
> >> volatile.  I don't think they have any side effects.
> >> 
> > 
> > GCC manual:
> > 
> > "6.42.2.1 Volatile
> > 
> > GCC's optimizers sometimes discard asm statements if they
> > determine there is no need for the output variables. Also, the
> > optimizers may move code out of loops if they believe that the
> > code will always return the same result (i.e. none of its input
> > values change between calls). Using the volatile qualifier
> > disables these optimizations. asm statements that have no output
> > operands are implicitly volatile."
> > 
> > 
> > Reading an atomic variable in a loop may return a different value,
> > even when the input operand (an address) is the same, as another
> > thread may be writing to the same variable, so the optimizations
> > mentioned above should be disabled. Or do you think that the fact
> > that the pointer itself is defined as volatile is enough?
> 
> I added some more testing for this and removed the volatiles from
> atomic read asm lines.

Hmm.  I should have replied more quickly.  I had this idea that
volatile only related to side effects, but your rationale for using
volatile makes sense to me.  I guess that based on your testing you
are confident that volatile is not needed after all?
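
To make the question concrete, here is a minimal sketch of the kind of
read under discussion. It is not the code from the patch: the name
sketch_read_u64() is hypothetical, and the non-SSE branch simply falls
back to the compiler's __atomic_load_n builtin instead of the
cmpxchg8b sequence. It assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): read an aligned 64-bit value
 * with a single SSE load where SSE2 is available. */
static inline uint64_t
sketch_read_u64(const uint64_t *src)
{
    uint64_t dst;

    /* The single-instruction access is only atomic for aligned data. */
    assert(((uintptr_t) src & 7) == 0);

#if (defined(__i386__) || defined(__x86_64__)) && defined(__SSE2__)
    /* The only input is the memory operand at a loop-invariant address.
     * Per the GCC manual text quoted above, "volatile" disables discarding
     * this asm or moving it out of a loop. */
    __asm__ __volatile__("movq %1, %0" : "=x" (dst) : "m" (*src));
#else
    /* Assumed fallback for other targets: the compiler's own atomic load. */
    dst = __atomic_load_n(src, __ATOMIC_RELAXED);
#endif
    return dst;
}
```

The case I have in mind is a caller spinning on
"while (sketch_read_u64(&flag) == 0)" while another thread stores to
flag, with the asm not marked volatile.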

Thanks,

Ben.


