[ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."
Ilya Maximets
i.maximets at samsung.com
Tue Nov 28 13:22:46 UTC 2017
Hello, Ben.
Thanks for looking at this.
On 28.11.2017 01:54, Ben Pfaff wrote:
> On Mon, Nov 27, 2017 at 02:59:29PM +0300, Ilya Maximets wrote:
>>> I also verified the other case when posix_memalign isn't available and even in that case
>>> it returns the address aligned on CACHE_LINE_SIZE boundary. I will send out a patch to use
>>> xzalloc_cacheline for allocating the memory.
>>
>> I don't know how you tested this, because it is impossible:
>>
>> 1. OVS allocates some memory:
>> base = xmalloc(...);
>>
>> 2. Rounds it up to the cache line start:
>> payload = (void **) ROUND_UP((uintptr_t) base, CACHE_LINE_SIZE);
>>
>> 3. Returns the pointer increased by 8 bytes:
>> return (char *) payload + MEM_ALIGN;
>>
>> So, unless you redefined MEM_ALIGN to zero, you will never get an aligned
>> memory address from xmalloc_cacheline() on a system without posix_memalign().
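(To illustrate: a self-contained sketch of the non-posix_memalign path,
with CACHE_LINE_SIZE and MEM_ALIGN assumed to be 64 and 8, as in
lib/util.c on x86.  The printed offset is always 8, never 0:)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64                                /* Assumed. */
#define MEM_ALIGN 8                                       /* Assumed. */
#define ROUND_UP(X, Y) ((((X) + (Y) - 1) / (Y)) * (Y))

int
main(void)
{
    size_t size = 100;          /* Arbitrary user request. */

    /* 1. OVS allocates some memory. */
    void *base = malloc((CACHE_LINE_SIZE - 1)
                        + ROUND_UP(MEM_ALIGN + size, CACHE_LINE_SIZE));

    /* 2. Rounds it up to the cache line start. */
    char *payload = (char *) ROUND_UP((uintptr_t) base, CACHE_LINE_SIZE);

    /* 3. Returns the pointer increased by MEM_ALIGN bytes, i.e. the
     *    result is always 8 bytes past a cache line boundary. */
    char *ret = payload + MEM_ALIGN;

    printf("offset within cache line: %" PRIuPTR "\n",
           (uintptr_t) ret % CACHE_LINE_SIZE);
    free(base);
    return 0;
}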
>
> Maybe we should make the non-HAVE_POSIX_MEMALIGN case better.  Something
> like this (compile tested only):
That's interesting, but the question here is a bit different.
We could certainly apply 5-10 more patches to fix the unaligned memory
allocation, the padding, the holes that appear with different cache line
sizes / mutex sizes, and whatever else we forgot to mention, but *do we
need this*?  Why are we making such a big effort to keep this working
as intended?
Padding/alignment of struct dp_netdev_pmd_thread:
- Cons:
  * Code complication.
  * Worse grouping of structure members
    (compared with what we could have without padding/alignment restrictions).
  * Poor extensibility: even a small modification of the structure may require
    regrouping and rearranging many other members (see the sketch below).
    For example:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html
  * No measurable performance impact. (At least, all the results I have seen
    show close performance between the 'with' and 'without' padding/alignment
    cases.)
  * Broken now: a few additional patches are required to make it work as
    intended (aligned memory allocation).
  * Targets x86 optimization only. It ignores different cache line sizes and
    the sizes of system/libc-defined structures on other platforms. (A few
    more patches would be required to fix this.)
- Pros:
  * None found.
I consider the previously applied structure refactoring to be over-engineering
with no positive impact.
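(To make the extensibility point above concrete: a minimal sketch, not
the real dp_netdev_pmd_thread, of the cache-line grouping style that the
refactoring introduced, built on the PADDED_MEMBERS macro from
lib/util.h.  The member names here are made up:)

#include <stdint.h>

#define CACHE_LINE_SIZE 64                       /* Assumed x86 value. */
#define ROUND_UP(X, Y) ((((X) + (Y) - 1) / (Y)) * (Y))
#define PADDED_MEMBERS(UNIT, MEMBERS)                               \
    union {                                                         \
        struct { MEMBERS };                                         \
        uint8_t pad[ROUND_UP(sizeof(struct { MEMBERS }), UNIT)];    \
    }

struct example_pmd {
    /* Group 1: 64 bytes, exactly fills one cache line. */
    PADDED_MEMBERS(CACHE_LINE_SIZE,
        uint64_t stats[8];
    );
    /* Group 2: 56 + 4 bytes, so 4 bytes of padding are wasted. */
    PADDED_MEMBERS(CACHE_LINE_SIZE,
        void *ptrs[7];
        uint32_t counter;
    );
};

Adding a single uint64_t to the first group spills it onto a second
cache line, so the members after it must be regrouped by hand; and the
"right" grouping changes again on a platform with 128-byte cache lines
or a different sizeof(struct ovs_mutex).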
---
Besides that, looking at the code below:
Do we care about false sharing on alloc/free operations?  That is one
difference between the old and new implementations: the new one will read
and write bytes that may sit on a shared cache line on alloc/free.
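(Concretely, assuming CACHE_LINE_SIZE of 64 and the layout from the
patch below, a hypothetical helper showing which cache line the
back-pointer lands on:)

#include <stdint.h>

#define CACHE_LINE_SIZE 64                                /* Assumed. */

/* Start of the cache line holding the back-pointer that the proposed
 * xmalloc_cacheline() writes (*q = p) and free_cacheline() reads. */
static uintptr_t
hdr_line(void *r)
{
    uintptr_t q = (uintptr_t) r - sizeof(void *);

    return q & ~(uintptr_t) (CACHE_LINE_SIZE - 1);
}

Since 'r' is cache-aligned, hdr_line(r) is the line just before the
dedicated payload lines.  That line is header padding, and its first
bytes may belong to a neighbouring malloc() block, so the write on
alloc and the read on free can touch a cache line shared with another
thread's data.  The old code kept the back-pointer at the start of the
payload's own dedicated line.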
Best regards, Ilya Maximets.
>
> diff --git a/lib/util.c b/lib/util.c
> index 9e6edd27ae4c..137091a3cd4f 100644
> --- a/lib/util.c
> +++ b/lib/util.c
> @@ -196,15 +196,9 @@ x2nrealloc(void *p, size_t *n, size_t s)
> return xrealloc(p, *n * s);
> }
>
> -/* The desired minimum alignment for an allocated block of memory. */
> -#define MEM_ALIGN MAX(sizeof(void *), 8)
> -BUILD_ASSERT_DECL(IS_POW2(MEM_ALIGN));
> -BUILD_ASSERT_DECL(CACHE_LINE_SIZE >= MEM_ALIGN);
> -
> -/* Allocates and returns 'size' bytes of memory in dedicated cache lines. That
> - * is, the memory block returned will not share a cache line with other data,
> - * avoiding "false sharing". (The memory returned will not be at the start of
> - * a cache line, though, so don't assume such alignment.)
> +/* Allocates and returns 'size' bytes of memory aligned to a cache line and in
> + * dedicated cache lines. That is, the memory block returned will not share a
> + * cache line with other data, avoiding "false sharing".
> *
> * Use free_cacheline() to free the returned memory block. */
> void *
> @@ -221,28 +215,37 @@ xmalloc_cacheline(size_t size)
> }
> return p;
> #else
> - void **payload;
> - void *base;
> -
> /* Allocate room for:
> *
> - * - Up to CACHE_LINE_SIZE - 1 bytes before the payload, so that the
> - * start of the payload doesn't potentially share a cache line.
> + * - Header padding: Up to CACHE_LINE_SIZE - 1 bytes, to allow the
> + * pointer to be aligned exactly sizeof(void *) bytes before the
> + * beginning of a cache line.
> *
> - * - A payload consisting of a void *, followed by padding out to
> - * MEM_ALIGN bytes, followed by 'size' bytes of user data.
> + * - Pointer: A pointer to the start of the header padding, to allow us
> + * to free() the block later.
> *
> - * - Space following the payload up to the end of the cache line, so
> - * that the end of the payload doesn't potentially share a cache line
> - * with some following block. */
> - base = xmalloc((CACHE_LINE_SIZE - 1)
> - + ROUND_UP(MEM_ALIGN + size, CACHE_LINE_SIZE));
> -
> - /* Locate the payload and store a pointer to the base at the beginning. */
> - payload = (void **) ROUND_UP((uintptr_t) base, CACHE_LINE_SIZE);
> - *payload = base;
> -
> - return (char *) payload + MEM_ALIGN;
> + * - User data: 'size' bytes.
> + *
> + * - Trailer padding: Enough to bring the user data up to a cache line
> + * multiple.
> + *
> + * +---------------+---------+------------------------+---------+
> + * | header | pointer | user data | trailer |
> + * +---------------+---------+------------------------+---------+
> + * ^ ^ ^
> + * | | |
> + * p q r
> + *
> + */
> + void *p = xmalloc((CACHE_LINE_SIZE - 1)
> + + sizeof(void *)
> + + ROUND_UP(size, CACHE_LINE_SIZE));
> + bool runt = PAD_SIZE((uintptr_t) p, CACHE_LINE_SIZE) < sizeof(void *);
> + void *r = (void *) ROUND_UP((uintptr_t) p + (runt ? CACHE_LINE_SIZE : 0),
> + CACHE_LINE_SIZE);
> + void **q = (void **) r - 1;
> + *q = p;
> + return r;
> #endif
> }
>
> @@ -265,7 +268,8 @@ free_cacheline(void *p)
> free(p);
> #else
> if (p) {
> - free(*(void **) ((uintptr_t) p - MEM_ALIGN));
> + void **q = (void **) p - 1;
> + free(*q);
> }
> #endif
> }
>
>
>