[ovs-dev] [cmap v2 1/2] cmap: New module for cuckoo hash table.

Ben Pfaff blp at nicira.com
Tue May 20 23:56:47 UTC 2014


On Tue, May 20, 2014 at 02:13:58PM -0700, Jarno Rajahalme wrote:
> 
> On May 20, 2014, at 2:03 PM, Ben Pfaff <blp at nicira.com> wrote:
> 
> > On Tue, May 20, 2014 at 01:22:45PM -0700, Jarno Rajahalme wrote:
> >> 
> >> On May 20, 2014, at 9:48 AM, Ben Pfaff <blp at nicira.com> wrote:
> >> 
> >>> On Tue, May 20, 2014 at 09:12:23AM -0700, Ben Pfaff wrote:
> >>>> On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
> >>>>>> +    if (b->nodes[slot] == node) {
> >>>>>> +        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> >>>>> 
> >>>>> ?hash? is not changing here, so could just set the nodes:
> >>>>> 
> >>>>>       b->nodes[slot] = cmap_node_next_protected(node);
> >>>>> 
> >>>>> Btw, what is the rationale that the nodes pointers are not RCU
> >>>>> pointers? If they were, it would feel possible to combine this special
> >>>>> case with the loop below.
> >>>> 
> >>>> Good points.  I'll work on that for a v3.
> >>> 
> >>> After thinking a little further, I am not sure that it would become
> >>> possible to combine them, because I think that the cases are a little
> >>> different:
> >>> 
> >>>       * If we are removing the last node with a hash, which is usually
> >>>         the case if node == b->nodes[slot], then we want to make sure
> >>>         that from the viewpoint of any reader that this is atomic
> >>>         (that is, the change to ->hashes[] and ->nodes[]), by
> >>>         incrementing the counter around the change.  I am not
> >>>         absolutely certain that this is required, but the cost is
> >>>         minimal so, lacking confidence, I prefer to do it.
> >> 
> >> My point was that the hash need not be changed, if the dup insertion
> >> code is also changed to not care about the node value (see the other
> >> comment I just sent).
> > 
> > OK.  Like this?
> > 
> > diff --git a/lib/cmap.c b/lib/cmap.c
> > index 30a6e2d..d291ec5 100644
> > --- a/lib/cmap.c
> > +++ b/lib/cmap.c
> > @@ -648,7 +648,8 @@ cmap_remove__(struct cmap_impl *impl, struct cmap_node *node,
> >     }
> > 
> >     if (b->nodes[slot] == node) {
> > -        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> > +        b->nodes[slot] = cmap_node_next_protected(node);
> > +        atomic_thread_fence(memory_order_release);
> 
> Yes. Would an ovsrcu_set() include the release barrier, if nodes[]
> were RCU pointers? (This relating to my earlier question about

Yes.

Maybe we really should make nodes[] into RCU pointers.

> >     } else {
> >         struct cmap_node *iter = b->nodes[slot];
> >         for (;;) {
> > 
> >>> 
> >>>       * Otherwise, we are shortening a linked list, but not
> >>>         eliminating its slot, which does not affect readers in the
> >>>         same way, so an ordinary RCU store should suffice.
> >>> 
> >>> What I'm increasingly uncertain about is whether cmap_find() is correct.
> >>> The intention is that the atomic reads of the counters before and after
> >>> checking the nodes and the hashes should ensure that the cache lines
> >>> occupied by the buckets are stable.  I think that's going to be true in
> >>> practice, with current compiler technology.  But I am not sure that the
> >>> atomic_reads on the counters actually means that the node and hash reads
> >>> can't be moved outside the counter reads.  If not, though, making the
> >>> node reads atomic (via RCU) and even the hash reads atomic (by making
> >>> them atomic_uint32s) wouldn't help.  I think that the only thing that
> >>> would help would be adding explicit acquire and release barriers.  That
> >>> might actually, in conjunction with good comments, be clearer than what
> >>> we have now.
> >>> 
> >>> What do you think?
> >> 
> >> atomic_read() implies memory_order_seq_cst, which is stronger than
> >> memory_order_acquire. An memory_order_acquire should guarantee that
> >> no memory operations after the atomic_read are moved before it, so
> >> we read the data only after reading an even counter. When we re-read
> >> the counter to verify it has not changed, an acquire barrier would
> >> guarantee no memory operations after the check are moved before it,
> >> but it would be possible for the memory operations before it to be
> >> moved after it. So the check needs a release barrier, even if we are
> >> only reading, to guarantee that memory operations before the check
> >> are not moved after it. The memory_order_seq_cst implied by
> >> atomic_read() does that, but is too strong, a memory_order_acq_rel
> >> should suffice, or even memory_order_acquire for the
> >> read_even_counter, and a memory_order_release for a
> >> ?check_counter()?. Makes sense?
> > 
> > Yes.  Like this?
> > 
> > diff --git a/lib/cmap.c b/lib/cmap.c
> > index 30a6e2d..d291ec5 100644
> > --- a/lib/cmap.c
> > +++ b/lib/cmap.c
> > @@ -245,11 +245,11 @@ cmap_is_empty(const struct cmap *cmap)
> > }
> > 
> > static uint32_t
> > -read_counter(struct cmap_bucket *bucket)
> > +read_counter(struct cmap_bucket *bucket, memory_order order)
> > {
> >     uint32_t counter;
> > 
> > -    atomic_read(&bucket->counter, &counter);
> > +    atomic_read_explicit(&bucket->counter, &counter, order);
> >     return counter;
> > }
> > 
> > @@ -259,7 +259,7 @@ read_even_counter(struct cmap_bucket *bucket)
> >     uint32_t counter;
> > 
> >     do {
> > -        counter = read_counter(bucket);
> > +        counter = read_counter(bucket, memory_order_acquire);
> >     } while (OVS_UNLIKELY(counter & 1));
> > 
> >     return counter;
> > @@ -291,7 +291,7 @@ retry:
> >     for (i = 0; i < CMAP_K; i++) {
> >         struct cmap_node *node = b1->nodes[i];
> >         if (node && b1->hashes[i] == hash) {
> > -            if (OVS_UNLIKELY(read_counter(b1) != c1)) {
> > +            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
> 
> I would find the code more balanced if a read_even_counter() was
> paired with a ?read_same_counter()?, rather than a raw
> read_counter. To be more specific, the explicit barrier should be
> visible or hidden by an utility function the same way in both cases.

Fair enough.  I folded this in:

diff --git a/lib/cmap.c b/lib/cmap.c
index dcb82fc..bd1d516 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -265,6 +265,12 @@ read_even_counter(struct cmap_bucket *bucket)
     return counter;
 }
 
+static bool
+counter_changed(const struct cmap_bucket *b, uint32_t c)
+{
+    return OVS_UNLIKELY(read_counter(b, memory_order_release) != c);
+}
+
 /* Searches 'cmap' for an element with the specified 'hash'.  If one or more is
  * found, returns a pointer to the first one, otherwise a null pointer.  All of
  * the nodes on the returned list are guaranteed to have exactly the given
@@ -291,7 +297,7 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b1->nodes[i];
         if (node && b1->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
+            if (counter_changed(b1, c1)) {
                 goto retry;
             }
             return node;
@@ -303,15 +309,14 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b2->nodes[i];
         if (node && b2->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
+            if (counter_changed(b2, c2)) {
                 goto retry;
             }
             return node;
         }
     }
 
-    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
-        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
+    if (counter_changed(b1, c1) || counter_changed(b2, c2)) {
         goto retry;
     }
     return NULL;

> >                 goto retry;
> >             }
> >             return node;
> > @@ -303,15 +303,15 @@ retry:
> >     for (i = 0; i < CMAP_K; i++) {
> >         struct cmap_node *node = b2->nodes[i];
> >         if (node && b2->hashes[i] == hash) {
> > -            if (OVS_UNLIKELY(read_counter(b2) != c2)) {
> > +            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
> >                 goto retry;
> >             }
> >             return node;
> >         }
> >     }
> > 
> > -    if (OVS_UNLIKELY(read_counter(b1) != c1) ||
> > -        OVS_UNLIKELY(read_counter(b2) != c2)) {
> > +    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
> > +        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
> >         goto retry;
> >     }
> >     return NULL;
> > 
> > I realize that all these incrementals are a big mess.  I've pushed the
> > overall series to my "ovs-reviews" repo at
> > https://github.com/blp/ovs-reviews in the "cmap" branch.  I'll happily
> > repost a v3 if that is easiest for you.
> 
> I?d rather you tell me that you have addresses the questions I made
> and push it :-) I think it would be best at this point get this in
> and gain some experience with it. We can then enhance it as we see
> opportunities for it.

Thanks.

I pushed this patch.  We can fine-tune it (e.g. making nodes[] RCU
pointers) later.

I didn't push patch 2 yet since you pointed out a bug.



More information about the dev mailing list