[ovs-dev] [cmap v2 1/2] cmap: New module for cuckoo hash table.

Ben Pfaff blp at nicira.com
Tue May 20 21:03:41 UTC 2014


On Tue, May 20, 2014 at 01:22:45PM -0700, Jarno Rajahalme wrote:
> 
> On May 20, 2014, at 9:48 AM, Ben Pfaff <blp at nicira.com> wrote:
> 
> > On Tue, May 20, 2014 at 09:12:23AM -0700, Ben Pfaff wrote:
> >> On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
> >>>> +    if (b->nodes[slot] == node) {
> >>>> +        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> >>> 
> >>> ?hash? is not changing here, so could just set the nodes:
> >>> 
> >>>        b->nodes[slot] = cmap_node_next_protected(node);
> >>> 
> >>> Btw, what is the rationale that the nodes pointers are not RCU
> >>> pointers? If they were, it would feel possible to combine this special
> >>> case with the loop below.
> >> 
> >> Good points.  I'll work on that for a v3.
> > 
> > After thinking a little further, I am not sure that it would become
> > possible to combine them, because I think that the cases are a little
> > different:
> > 
> >        * If we are removing the last node with a hash, which is usually
> >          the case if node == b->nodes[slot], then we want to make sure
> >          that from the viewpoint of any reader that this is atomic
> >          (that is, the change to ->hashes[] and ->nodes[]), by
> >          incrementing the counter around the change.  I am not
> >          absolutely certain that this is required, but the cost is
> >          minimal so, lacking confidence, I prefer to do it.
> 
> My point was that the hash need not be changed, if the dup insertion
> code is also changed to not care about the node value (see the other
> comment I just sent).

OK.  Like this?

diff --git a/lib/cmap.c b/lib/cmap.c
index 30a6e2d..d291ec5 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -648,7 +648,8 @@ cmap_remove__(struct cmap_impl *impl, struct cmap_node *node,
     }
 
     if (b->nodes[slot] == node) {
-        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
+        b->nodes[slot] = cmap_node_next_protected(node);
+        atomic_thread_fence(memory_order_release);
     } else {
         struct cmap_node *iter = b->nodes[slot];
         for (;;) {

> > 
> >        * Otherwise, we are shortening a linked list, but not
> >          eliminating its slot, which does not affect readers in the
> >          same way, so an ordinary RCU store should suffice.
> > 
> > What I'm increasingly uncertain about is whether cmap_find() is correct.
> > The intention is that the atomic reads of the counters before and after
> > checking the nodes and the hashes should ensure that the cache lines
> > occupied by the buckets are stable.  I think that's going to be true in
> > practice, with current compiler technology.  But I am not sure that the
> > atomic_reads on the counters actually means that the node and hash reads
> > can't be moved outside the counter reads.  If not, though, making the
> > node reads atomic (via RCU) and even the hash reads atomic (by making
> > them atomic_uint32s) wouldn't help.  I think that the only thing that
> > would help would be adding explicit acquire and release barriers.  That
> > might actually, in conjunction with good comments, be clearer than what
> > we have now.
> > 
> > What do you think?
> 
> atomic_read() implies memory_order_seq_cst, which is stronger than
> memory_order_acquire. An memory_order_acquire should guarantee that
> no memory operations after the atomic_read are moved before it, so
> we read the data only after reading an even counter. When we re-read
> the counter to verify it has not changed, an acquire barrier would
> guarantee no memory operations after the check are moved before it,
> but it would be possible for the memory operations before it to be
> moved after it. So the check needs a release barrier, even if we are
> only reading, to guarantee that memory operations before the check
> are not moved after it. The memory_order_seq_cst implied by
> atomic_read() does that, but is too strong, a memory_order_acq_rel
> should suffice, or even memory_order_acquire for the
> read_even_counter, and a memory_order_release for a
> ?check_counter()?. Makes sense?

Yes.  Like this?

diff --git a/lib/cmap.c b/lib/cmap.c
index 30a6e2d..d291ec5 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -245,11 +245,11 @@ cmap_is_empty(const struct cmap *cmap)
 }
 
 static uint32_t
-read_counter(struct cmap_bucket *bucket)
+read_counter(struct cmap_bucket *bucket, memory_order order)
 {
     uint32_t counter;
 
-    atomic_read(&bucket->counter, &counter);
+    atomic_read_explicit(&bucket->counter, &counter, order);
     return counter;
 }
 
@@ -259,7 +259,7 @@ read_even_counter(struct cmap_bucket *bucket)
     uint32_t counter;
 
     do {
-        counter = read_counter(bucket);
+        counter = read_counter(bucket, memory_order_acquire);
     } while (OVS_UNLIKELY(counter & 1));
 
     return counter;
@@ -291,7 +291,7 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b1->nodes[i];
         if (node && b1->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b1) != c1)) {
+            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
                 goto retry;
             }
             return node;
@@ -303,15 +303,15 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b2->nodes[i];
         if (node && b2->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b2) != c2)) {
+            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
                 goto retry;
             }
             return node;
         }
     }
 
-    if (OVS_UNLIKELY(read_counter(b1) != c1) ||
-        OVS_UNLIKELY(read_counter(b2) != c2)) {
+    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
+        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
         goto retry;
     }
     return NULL;

I realize that all these incrementals are a big mess.  I've pushed the
overall series to my "ovs-reviews" repo at
https://github.com/blp/ovs-reviews in the "cmap" branch.  I'll happily
repost a v3 if that is easiest for you.



More information about the dev mailing list