[ovs-dev] [cmap v2 1/2] cmap: New module for cuckoo hash table.
Jarno Rajahalme
jrajahalme at nicira.com
Tue May 20 21:13:58 UTC 2014
On May 20, 2014, at 2:03 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Tue, May 20, 2014 at 01:22:45PM -0700, Jarno Rajahalme wrote:
>>
>> On May 20, 2014, at 9:48 AM, Ben Pfaff <blp at nicira.com> wrote:
>>
>>> On Tue, May 20, 2014 at 09:12:23AM -0700, Ben Pfaff wrote:
>>>> On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
>>>>>> + if (b->nodes[slot] == node) {
>>>>>> + cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
>>>>>
>>>>> "hash" is not changing here, so we could just set the node pointer:
>>>>>
>>>>> b->nodes[slot] = cmap_node_next_protected(node);
>>>>>
>>>>> Btw, what is the rationale that the nodes pointers are not RCU
>>>>> pointers? If they were, it would feel possible to combine this special
>>>>> case with the loop below.
>>>>
>>>> Good points. I'll work on that for a v3.
>>>
>>> After thinking a little further, I am not sure that it would become
>>> possible to combine them, because I think that the cases are a little
>>> different:
>>>
>>> * If we are removing the last node with a hash, which is usually
>>> the case if node == b->nodes[slot], then we want to make sure
>>> that, from the viewpoint of any reader, the change (that is, to
>>> ->hashes[] and ->nodes[]) is atomic, by
>>> incrementing the counter around the change. I am not
>>> absolutely certain that this is required, but the cost is
>>> minimal so, lacking confidence, I prefer to do it.
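The writer-side protocol described above (bump the counter around the paired ->hashes[]/->nodes[] update so readers can detect an in-flight change) can be sketched with standard C11 atomics. The names, struct layout, and fences here are illustrative assumptions, not the actual cmap_set_bucket():

```c
#include <stdatomic.h>
#include <stdint.h>

#define CMAP_K 4                /* Illustrative slot count per bucket. */

struct cmap_node;               /* Opaque for this sketch. */

struct cmap_bucket {
    _Atomic uint32_t counter;   /* Odd while an update is in flight. */
    uint32_t hashes[CMAP_K];
    struct cmap_node *nodes[CMAP_K];
};

/* Hypothetical writer-side sketch, not the actual cmap code: bump the
 * counter to an odd value before touching the slot, and back to an even
 * value after, so a reader that sees the same even counter before and
 * after its reads knows that hashes[] and nodes[] were stable. */
static void
set_bucket_sketch(struct cmap_bucket *b, int slot,
                  struct cmap_node *node, uint32_t hash)
{
    uint32_t c = atomic_load_explicit(&b->counter, memory_order_relaxed);

    atomic_store_explicit(&b->counter, c + 1, memory_order_relaxed);
    /* Store-store ordering: the odd counter must become visible before
     * the slot writes below. */
    atomic_thread_fence(memory_order_release);
    b->hashes[slot] = hash;
    b->nodes[slot] = node;
    /* Release store: the slot writes become visible no later than the
     * counter turning even again. */
    atomic_store_explicit(&b->counter, c + 2, memory_order_release);
}
```

A writer is assumed to hold the cmap's mutation lock, so only ordering with respect to concurrent readers matters here.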
>>
>> My point was that the hash need not be changed, if the dup insertion
>> code is also changed to not care about the node value (see the other
>> comment I just sent).
>
> OK. Like this?
>
> diff --git a/lib/cmap.c b/lib/cmap.c
> index 30a6e2d..d291ec5 100644
> --- a/lib/cmap.c
> +++ b/lib/cmap.c
> @@ -648,7 +648,8 @@ cmap_remove__(struct cmap_impl *impl, struct cmap_node *node,
>      }
> 
>      if (b->nodes[slot] == node) {
> -        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> +        b->nodes[slot] = cmap_node_next_protected(node);
> +        atomic_thread_fence(memory_order_release);
Yes. Would an ovsrcu_set() include the release barrier, if nodes[] were RCU pointers? (This relates to my earlier question about that.)
>      } else {
>          struct cmap_node *iter = b->nodes[slot];
>          for (;;) {
>
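For reference, an RCU-style pointer store is typically a release store, which would fold the plain assignment and the explicit fence in the diff above into a single operation. This is a sketch of that idea with a hypothetical helper name, not the actual ovsrcu_set():

```c
#include <stdatomic.h>

struct cmap_node {
    struct cmap_node *next;
};

/* Hypothetical sketch of the semantics an RCU pointer store usually
 * has: the release store combines the plain assignment and the
 * separate atomic_thread_fence(memory_order_release) above into one
 * operation, so prior writes are published before the new pointer. */
static inline void
rcu_set_sketch(struct cmap_node *_Atomic *slot, struct cmap_node *next)
{
    atomic_store_explicit(slot, next, memory_order_release);
}
```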
>>>
>>> * Otherwise, we are shortening a linked list, but not
>>> eliminating its slot, which does not affect readers in the
>>> same way, so an ordinary RCU store should suffice.
>>>
>>> What I'm increasingly uncertain about is whether cmap_find() is correct.
>>> The intention is that the atomic reads of the counters before and after
>>> checking the nodes and the hashes should ensure that the cache lines
>>> occupied by the buckets are stable. I think that's going to be true in
>>> practice, with current compiler technology. But I am not sure that the
>>> atomic_reads on the counters actually mean that the node and hash reads
>>> can't be moved outside the counter reads. If not, though, making the
>>> node reads atomic (via RCU) and even the hash reads atomic (by making
>>> them atomic_uint32s) wouldn't help. I think that the only thing that
>>> would help would be adding explicit acquire and release barriers. That
>>> might actually, in conjunction with good comments, be clearer than what
>>> we have now.
>>>
>>> What do you think?
>>
>> atomic_read() implies memory_order_seq_cst, which is stronger than
>> memory_order_acquire. A memory_order_acquire read guarantees that no
>> memory operations after the atomic_read are moved before it, so we
>> read the data only after reading an even counter. When we re-read the
>> counter to verify that it has not changed, an acquire barrier would
>> guarantee that no memory operations after the check are moved before
>> it, but memory operations before it could still be moved after it. So
>> the check needs a release barrier, even though we are only reading, to
>> guarantee that memory operations before the check are not moved after
>> it. The memory_order_seq_cst implied by atomic_read() does that, but
>> it is too strong; memory_order_acq_rel should suffice, or even
>> memory_order_acquire for read_even_counter() and memory_order_release
>> for a "check_counter()". Does that make sense?
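The read protocol described above can be sketched as a seqlock-style reader in standard C11. All names here are illustrative, not the actual cmap_find(). One caveat: plain C11 loads cannot carry release semantics, so the barrier at the check is written as a standalone fence; in C11 fence terms, keeping the earlier data reads from sinking below the counter re-read is an acquire fence over those loads, even though the thread above describes it as a release barrier:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative seqlock-style reader; hypothetical names and layout. */
struct bucket {
    _Atomic uint32_t counter;   /* Odd while a writer is mid-update. */
    _Atomic uint32_t data;      /* Stand-in for hashes[]/nodes[]. */
};

static uint32_t
read_even_counter(struct bucket *b)
{
    uint32_t c;

    do {
        /* Acquire: the data reads below cannot move above this load. */
        c = atomic_load_explicit(&b->counter, memory_order_acquire);
    } while (c & 1);            /* Writer in progress: spin. */
    return c;
}

static bool
counter_changed(struct bucket *b, uint32_t c)
{
    /* Fence: the data reads above cannot move below the re-read. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&b->counter, memory_order_relaxed) != c;
}

static uint32_t
read_data(struct bucket *b)
{
    uint32_t c, value;

    do {
        c = read_even_counter(b);
        value = atomic_load_explicit(&b->data, memory_order_relaxed);
    } while (counter_changed(b, c));   /* Retry if a writer intervened. */
    return value;
}
```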
>
> Yes. Like this?
>
> diff --git a/lib/cmap.c b/lib/cmap.c
> index 30a6e2d..d291ec5 100644
> --- a/lib/cmap.c
> +++ b/lib/cmap.c
> @@ -245,11 +245,11 @@ cmap_is_empty(const struct cmap *cmap)
>  }
> 
>  static uint32_t
> -read_counter(struct cmap_bucket *bucket)
> +read_counter(struct cmap_bucket *bucket, memory_order order)
>  {
>      uint32_t counter;
> 
> -    atomic_read(&bucket->counter, &counter);
> +    atomic_read_explicit(&bucket->counter, &counter, order);
>      return counter;
>  }
> 
> @@ -259,7 +259,7 @@ read_even_counter(struct cmap_bucket *bucket)
>      uint32_t counter;
> 
>      do {
> -        counter = read_counter(bucket);
> +        counter = read_counter(bucket, memory_order_acquire);
>      } while (OVS_UNLIKELY(counter & 1));
> 
>      return counter;
> @@ -291,7 +291,7 @@ retry:
>      for (i = 0; i < CMAP_K; i++) {
>          struct cmap_node *node = b1->nodes[i];
>          if (node && b1->hashes[i] == hash) {
> -            if (OVS_UNLIKELY(read_counter(b1) != c1)) {
> +            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
I would find the code more balanced if read_even_counter() were paired with a “read_same_counter()”, rather than a raw read_counter(). More specifically, the explicit memory order should be either visible or hidden behind a utility function the same way in both cases.
>                  goto retry;
>              }
>              return node;
> @@ -303,15 +303,15 @@ retry:
>      for (i = 0; i < CMAP_K; i++) {
>          struct cmap_node *node = b2->nodes[i];
>          if (node && b2->hashes[i] == hash) {
> -            if (OVS_UNLIKELY(read_counter(b2) != c2)) {
> +            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
>                  goto retry;
>              }
>              return node;
>          }
>      }
> 
> -    if (OVS_UNLIKELY(read_counter(b1) != c1) ||
> -        OVS_UNLIKELY(read_counter(b2) != c2)) {
> +    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
> +        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
>          goto retry;
>      }
>      return NULL;
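The pairing suggested above could look like this sketch. The name read_same_counter() is hypothetical, following the quoted diff's read_counter() shape; note that the diff passes memory_order_release through OVS's own atomic_read_explicit() wrapper, whereas plain C11 atomic_load_explicit() does not accept release on a load, so this standalone version uses acquire:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct cmap_bucket {
    _Atomic uint32_t counter;
};

static uint32_t
read_counter(struct cmap_bucket *bucket, memory_order order)
{
    return atomic_load_explicit(&bucket->counter, order);
}

/* Hypothetical "read_same_counter()": re-reads the counter with the
 * ordering the check requires, so callers never name a memory order,
 * symmetric with how read_even_counter() hides it on the first read. */
static bool
read_same_counter(struct cmap_bucket *bucket, uint32_t c)
{
    return read_counter(bucket, memory_order_acquire) == c;
}
```

With such a helper, the checks in cmap_find() would read `if (OVS_UNLIKELY(!read_same_counter(b1, c1))) { goto retry; }`, keeping the barrier choice in one place.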
>
> I realize that all these incrementals are a big mess. I've pushed the
> overall series to my "ovs-reviews" repo at
> https://github.com/blp/ovs-reviews in the "cmap" branch. I'll happily
> repost a v3 if that is easiest for you.
I’d rather you tell me that you have addressed the questions I raised and push it :-) I think it would be best at this point to get this in and gain some experience with it. We can then enhance it as we see opportunities.
Jarno