[ovs-dev] [PATCH] ovs-atomics: Add atomic support for Windows.
Ben Pfaff
blp at nicira.com
Thu Aug 28 20:01:19 UTC 2014
On Thu, Aug 28, 2014 at 09:57:42AM -0700, Gurucharan Shetty wrote:
> Before this change (i.e., with pthread locks for atomics on Windows),
> the benchmark for cmap and hmap was as follows:
>
> $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert: 61070 ms
> cmap iterate: 2750 ms
> cmap search: 14238 ms
> cmap destroy: 8354 ms
>
> hmap insert: 1701 ms
> hmap iterate: 985 ms
> hmap search: 3755 ms
> hmap destroy: 1052 ms
>
> After this change, the benchmark is as follows:
> $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert: 3666 ms
> cmap iterate: 365 ms
> cmap search: 2016 ms
> cmap destroy: 1331 ms
>
> hmap insert: 1495 ms
> hmap iterate: 1026 ms
> hmap search: 4167 ms
> hmap destroy: 1046 ms
>
> So there is clearly a big improvement for cmap.
>
> But the corresponding test on Linux (with gcc 4.6) yields the following:
>
> ./tests/ovstest test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert: 3917 ms
> cmap iterate: 355 ms
> cmap search: 871 ms
> cmap destroy: 1158 ms
>
> hmap insert: 1988 ms
> hmap iterate: 1005 ms
> hmap search: 5428 ms
> hmap destroy: 980 ms
>
> So for this particular test, except for "cmap search", Windows and
> Linux have similar performance. Windows is around 2.5x slower in "cmap search"
> compared to Linux. This has to be investigated.
Either way, it's a great improvement.  Thank you!
This comment needs an update:
> +/* This header implements atomic operation primitives using pthreads. */
We usually format long comments with a line of * down the left side:
> +/* From msdn documentation: With Visual Studio 2003, volatile to volatile
> +references are ordered; the compiler will not re-order volatile variable
> +access. With Visual Studio 2005, the compiler also uses acquire semantics
> +for read operations on volatile variables and release semantics for write
> +operations on volatile variables (when supported by the CPU). */
> +#define ATOMIC(TYPE) TYPE volatile
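Applied to the comment above, that style would look something like this
(same text, just reflowed with the leading *s):

```c
/* From msdn documentation: With Visual Studio 2003, volatile to volatile
 * references are ordered; the compiler will not re-order volatile variable
 * access.  With Visual Studio 2005, the compiler also uses acquire semantics
 * for read operations on volatile variables and release semantics for write
 * operations on volatile variables (when supported by the CPU). */
#define ATOMIC(TYPE) TYPE volatile
```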
I see several funny casts with an extra space in them, e.g.
> + (int32_t ) SRC); \
would normally be
> + (int32_t) SRC); \
I'd normally expect 64-bit reads and writes to be atomic when we're
building for x86-64:
> +/* 64 bit reads and write are not atomic on x86.
I think that the *s here should be outside the parentheses, so that
DST is fully parenthesized, e.g. *(DST) instead of (*DST). Similarly
for atomic_read_explicit():
> +#define atomic_store_explicit(DST, SRC, ORDER) \
> + if (sizeof (*DST) == 1) { \
> + atomic_storeX(8, DST, SRC, ORDER) \
> + } else if (sizeof (*DST) == 2) { \
> + atomic_storeX(16, DST, SRC, ORDER) \
> + } else if (sizeof (*DST) == 4) { \
> + atomic_store32(DST, SRC, ORDER) \
> + } else if (sizeof (*DST) == 8) { \
> + atomic_store64(DST, SRC, ORDER) \
> + }
Also I wonder whether we should add a final "else { abort(); }" just in
case? Or "BUILD_ASSERT(sizeof *(DST) == 1 || sizeof *(DST) == 2 ||
...);"?
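To sketch what I mean (plain C stand-ins here, not the real Interlocked
intrinsics; atomic_storeX() below is just an invented placeholder for the
patch's helpers): every expansion of DST and SRC gets parenthesized, and a
final "else" catches unsupported sizes at run time:

```c
#include <stdint.h>
#include <stdlib.h>

/* Placeholder for the patch's Interlocked-based store helpers: a plain
 * volatile store through a pointer of the right width. */
#define atomic_storeX(X, DST, SRC)                              \
    (*(int##X##_t volatile *) (DST) = (int##X##_t) (SRC))

/* Every use of DST and SRC is parenthesized, and the trailing "else"
 * aborts on sizes the dispatch does not handle.  (A BUILD_ASSERT on
 * sizeof *(DST) would catch those at compile time instead.) */
#define atomic_store_explicit(DST, SRC)         \
    if (sizeof *(DST) == 1) {                   \
        atomic_storeX(8, DST, SRC);             \
    } else if (sizeof *(DST) == 2) {            \
        atomic_storeX(16, DST, SRC);            \
    } else if (sizeof *(DST) == 4) {            \
        atomic_storeX(32, DST, SRC);            \
    } else if (sizeof *(DST) == 8) {            \
        atomic_storeX(64, DST, SRC);            \
    } else {                                    \
        abort();                                \
    }
```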
I think that many of the macros should more carefully parenthesize
their argument expansions. Here are three examples but I see others:
> +#define atomic_read32(SRC, DST, ORDER) \
> + if (ORDER == memory_order_seq_cst) { \
> + *DST = InterlockedOr((int32_t volatile *) SRC, 0); \
> + } else { \
> + *DST = *SRC; \
> + }
> +
> +/* 64 bit reads and writes are not atomic on x86. */
> +#define atomic_read64(SRC, DST, ORDER) \
> + *DST = InterlockedOr64((int64_t volatile *) SRC, 0);
> +
> +/* For 8 and 16 bit variations. */
> +#define atomic_readX(X, SRC, DST, ORDER) \
> + if (ORDER == memory_order_seq_cst) { \
> + *DST = InterlockedOr##X((int##X##_t volatile *) SRC, 0); \
> + } else { \
> + *DST = *SRC; \
> + }
> +
> +#define atomic_read(SRC, DST) \
> + atomic_read_explicit(SRC, DST, memory_order_seq_cst)
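For instance, atomic_read32() with the arguments fully parenthesized would
look roughly like this (again a sketch: the InterlockedOr() call is replaced
by a plain volatile read, and the memory_order enum here is a local stand-in):

```c
#include <stdint.h>

/* Stand-in for the subset of memory orders the sketch needs. */
enum memory_order { memory_order_relaxed, memory_order_seq_cst };

/* Every expansion of SRC, DST, and ORDER is parenthesized, so an argument
 * such as "array + i" keeps its meaning.  The seq_cst branch stands in for
 * the InterlockedOr()-based read in the patch. */
#define atomic_read32(SRC, DST, ORDER)                  \
    if ((ORDER) == memory_order_seq_cst) {              \
        *(DST) = *(int32_t volatile *) (SRC);           \
    } else {                                            \
        *(DST) = *(SRC);                                \
    }
```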
I am not sure that the stairstepping below is necessary:
> +#define atomic_compare_exchange_strong_explicit(DST, EXP, SRC, ORD1, ORD2) \
> + (sizeof (*DST) == 1 \
> + ? atomic_compare_exchange8((int8_t volatile *) DST, (int8_t *) EXP, \
> + (int8_t ) SRC) \
> + : (sizeof (*DST) == 2 \
> + ? atomic_compare_exchange16((int16_t volatile *) DST, (int16_t *) EXP,\
> + (int16_t ) SRC) \
> + : (sizeof (*DST) == 4 \
> + ? atomic_compare_exchange32((int32_t volatile *) DST, \
> + (int32_t *) EXP, (int32_t ) SRC) \
> + : (sizeof (*DST) == 8 \
> + ? atomic_compare_exchange64((int64_t volatile *) DST, \
> + (int64_t *) EXP, (int64_t ) SRC) \
> + : ovs_fatal(0, \
> + "atomic operation with size greater than 8 bytes"), \
> + atomic_compare_unreachable()))))
I think that it can be written as:
#define atomic_compare_exchange_strong_explicit(DST, EXP, SRC, ORD1, ORD2) \
(sizeof (*DST) == 1 \
? atomic_compare_exchange8((int8_t volatile *) DST, (int8_t *) EXP, \
(int8_t ) SRC) \
: sizeof (*DST) == 2 \
? atomic_compare_exchange16((int16_t volatile *) DST, (int16_t *) EXP, \
(int16_t ) SRC) \
: sizeof (*DST) == 4 \
? atomic_compare_exchange32((int32_t volatile *) DST, \
(int32_t *) EXP, (int32_t ) SRC) \
: sizeof (*DST) == 8 \
? atomic_compare_exchange64((int64_t volatile *) DST, \
(int64_t *) EXP, (int64_t ) SRC) \
: (ovs_fatal(0, "atomic operation with size greater than 8 bytes"), \
atomic_compare_unreachable()))
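The flat form works because ?: associates right-to-left, so each "sizeof"
test simply falls through to the next without nested parentheses.  A minimal
compilable illustration of the same dispatch pattern (size_class() is an
invented helper, not from the patch):

```c
#include <stdint.h>
#include <stdlib.h>

/* A flattened ?: chain needs no nesting: when a sizeof test fails,
 * evaluation falls through to the next arm, and the final arm aborts
 * on sizes the dispatch does not handle. */
#define size_class(OBJ)                         \
    (sizeof (OBJ) == 1 ? 8                      \
     : sizeof (OBJ) == 2 ? 16                   \
     : sizeof (OBJ) == 4 ? 32                   \
     : sizeof (OBJ) == 8 ? 64                   \
     : (abort(), 0))
```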