[ovs-dev] [PATCH] ovs-atomics: Add atomic support for Windows.

Ben Pfaff blp at nicira.com
Thu Aug 28 20:01:19 UTC 2014


On Thu, Aug 28, 2014 at 09:57:42AM -0700, Gurucharan Shetty wrote:
> Before this change (i.e., with pthread locks for atomics on Windows),
> the benchmark for cmap and hmap was as follows:
> 
> $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert:  61070 ms
> cmap iterate:  2750 ms
> cmap search:  14238 ms
> cmap destroy:  8354 ms
> 
> hmap insert:   1701 ms
> hmap iterate:   985 ms
> hmap search:   3755 ms
> hmap destroy:  1052 ms
> 
> After this change, the benchmark is as follows:
> $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert:   3666 ms
> cmap iterate:   365 ms
> cmap search:   2016 ms
> cmap destroy:  1331 ms
> 
> hmap insert:   1495 ms
> hmap iterate:  1026 ms
> hmap search:   4167 ms
> hmap destroy:  1046 ms
> 
> So there is clearly a big improvement for cmap.
> 
> But the corresponding test on Linux (with gcc 4.6) yields the following:
> 
> ./tests/ovstest test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert:   3917 ms
> cmap iterate:   355 ms
> cmap search:    871 ms
> cmap destroy:  1158 ms
> 
> hmap insert:   1988 ms
> hmap iterate:  1005 ms
> hmap search:   5428 ms
> hmap destroy:   980 ms
> 
> So for this particular test, except for "cmap search", Windows and
> Linux have similar performance. Windows is around 2.3x slower in "cmap search"
> compared to Linux. This has to be investigated.

Either way it's a great improvement.  Thank you!

This comment needs an update:
> +/* This header implements atomic operation primitives using pthreads. */
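Maybe something like this instead (just a suggestion, based on what the
file now does):

/* This header implements atomic operation primitives for MSVC on Windows
 * using "volatile" semantics and the Interlocked* family of functions. */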

We usually format long comments with a line of * down the left side:
> +/* From msdn documentation: With Visual Studio 2003, volatile to volatile
> +references are ordered; the compiler will not re-order volatile variable
> +access. With Visual Studio 2005, the compiler also uses acquire semantics
> +for read operations on volatile variables and release semantics for write
> +operations on volatile variables (when supported by the CPU). */
> +#define ATOMIC(TYPE) TYPE volatile
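that is, something like:

/* From MSDN documentation: With Visual Studio 2003, volatile to volatile
 * references are ordered; the compiler will not re-order volatile variable
 * access. With Visual Studio 2005, the compiler also uses acquire semantics
 * for read operations on volatile variables and release semantics for write
 * operations on volatile variables (when supported by the CPU). */
#define ATOMIC(TYPE) TYPE volatile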

I see several funny casts with an extra space in them, e.g.
> +                               (int32_t ) SRC);                         \
would normally be
> +                               (int32_t) SRC);                          \

I'd normally expect 64-bit reads and writes to be atomic when we're
building for x86-64:
> +/* 64 bit reads and write are not atomic on x86.
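Perhaps something along these lines (untested sketch; I'm assuming the
MSVC-predefined _WIN64 macro for 64-bit targets):

#ifdef _WIN64
/* On x86-64, an aligned 64-bit read is atomic, and per the MSDN quote
 * above MSVC gives volatile reads acquire semantics, so a plain volatile
 * read suffices. */
#define atomic_read64(SRC, DST, ORDER) \
    *(DST) = *(SRC);
#else
/* On 32-bit x86, a 64-bit access can tear, so fall back to an
 * interlocked operation to read atomically. */
#define atomic_read64(SRC, DST, ORDER) \
    *(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);
#endif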

I think that the *s here should be outside the parentheses, so that
DST is fully parenthesized, e.g. *(DST) instead of (*DST).  Similarly
for atomic_read_explicit():
> +#define atomic_store_explicit(DST, SRC, ORDER)                           \
> +    if (sizeof (*DST) == 1) {                                            \
> +        atomic_storeX(8, DST, SRC, ORDER)                                \
> +    } else if (sizeof (*DST) == 2) {                                     \
> +        atomic_storeX(16, DST, SRC, ORDER)                               \
> +    } else if (sizeof (*DST) == 4) {                                     \
> +        atomic_store32(DST, SRC, ORDER)                                  \
> +    } else if (sizeof (*DST) == 8) {                                     \
> +        atomic_store64(DST, SRC, ORDER)                                  \
> +    }
Also I wonder whether we should add a final "else { abort(); }" just in
case?  Or "BUILD_ASSERT(sizeof *(DST) == 1 || sizeof *(DST) == 2 ||
...);"?
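Combining both suggestions, it might look like this (untested):

#define atomic_store_explicit(DST, SRC, ORDER)                           \
    if (sizeof *(DST) == 1) {                                            \
        atomic_storeX(8, DST, SRC, ORDER)                                \
    } else if (sizeof *(DST) == 2) {                                     \
        atomic_storeX(16, DST, SRC, ORDER)                               \
    } else if (sizeof *(DST) == 4) {                                     \
        atomic_store32(DST, SRC, ORDER)                                  \
    } else if (sizeof *(DST) == 8) {                                     \
        atomic_store64(DST, SRC, ORDER)                                  \
    } else {                                                             \
        abort();                                                         \
    }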

I think that many of the macros should more carefully parenthesize
their argument expansions.  Here are three examples but I see others:
> +#define atomic_read32(SRC, DST, ORDER)                        \
> +    if (ORDER == memory_order_seq_cst) {                      \
> +        *DST = InterlockedOr((int32_t volatile *) SRC, 0);    \
> +    } else {                                                  \
> +        *DST = *SRC;                                          \
> +    }
> +
> +/* 64 bit reads and writes are not atomic on x86. */
> +#define atomic_read64(SRC, DST, ORDER)                        \
> +    *DST = InterlockedOr64((int64_t volatile *) SRC, 0);
> +
> +/* For 8 and 16 bit variations. */
> +#define atomic_readX(X, SRC, DST, ORDER)                                   \
> +    if (ORDER == memory_order_seq_cst) {                                   \
> +        *DST = InterlockedOr##X((int##X##_t volatile *) SRC, 0);           \
> +    } else {                                                               \
> +        *DST = *SRC;                                                       \
> +    }
> +
> +#define atomic_read(SRC, DST)                               \
> +        atomic_read_explicit(SRC, DST, memory_order_seq_cst)
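For example, with atomic_read32() as written, a call like
atomic_read32(p + 1, &x, order) expands SRC to
"(int32_t volatile *) p + 1", which casts p before adding, so the macro
reads from the wrong address.  Parenthesizing the expansions avoids that:

#define atomic_read32(SRC, DST, ORDER)                          \
    if ((ORDER) == memory_order_seq_cst) {                      \
        *(DST) = InterlockedOr((int32_t volatile *) (SRC), 0);  \
    } else {                                                    \
        *(DST) = *(SRC);                                        \
    }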

I am not sure that the stairstepping below is necessary:
> +#define atomic_compare_exchange_strong_explicit(DST, EXP, SRC, ORD1, ORD2)    \
> +    (sizeof (*DST) == 1                                                       \
> +     ? atomic_compare_exchange8((int8_t volatile *) DST, (int8_t *) EXP,      \
> +                                (int8_t ) SRC)                                \
> +     : (sizeof (*DST) == 2                                                    \
> +        ? atomic_compare_exchange16((int16_t volatile *) DST, (int16_t *) EXP,\
> +                                     (int16_t ) SRC)                          \
> +        : (sizeof (*DST) == 4                                                 \
> +           ? atomic_compare_exchange32((int32_t volatile *) DST,              \
> +                                       (int32_t *) EXP, (int32_t ) SRC)       \
> +           : (sizeof (*DST) == 8                                              \
> +              ? atomic_compare_exchange64((int64_t volatile *) DST,           \
> +                                           (int64_t *) EXP, (int64_t ) SRC)   \
> +              : ovs_fatal(0,                                                  \
> +                          "atomic operation with size greater than 8 bytes"), \
> +                atomic_compare_unreachable()))))

I think that it can be written as:
#define atomic_compare_exchange_strong_explicit(DST, EXP, SRC, ORD1, ORD2) \
    (sizeof (*DST) == 1                                                 \
     ? atomic_compare_exchange8((int8_t volatile *) DST, (int8_t *) EXP, \
                                (int8_t) SRC)                           \
     : sizeof (*DST) == 2                                               \
     ? atomic_compare_exchange16((int16_t volatile *) DST, (int16_t *) EXP, \
                                 (int16_t) SRC)                         \
     : sizeof (*DST) == 4                                               \
     ? atomic_compare_exchange32((int32_t volatile *) DST,              \
                                 (int32_t *) EXP, (int32_t) SRC)        \
     : sizeof (*DST) == 8                                               \
     ? atomic_compare_exchange64((int64_t volatile *) DST,              \
                                 (int64_t *) EXP, (int64_t) SRC)        \
     : (ovs_fatal(0, "atomic operation with size greater than 8 bytes"), \
        atomic_compare_unreachable()))


