[ovs-dev] [cmap v2 1/2] cmap: New module for cuckoo hash table.

Ben Pfaff blp at nicira.com
Tue May 20 16:12:23 UTC 2014


On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
> Acked-by: Jarno Rajahalme <jrajahalme at nicira.com>

Thanks.

> Some comments below, though,

Responses below.

(It would be easier to spot your comments if you trimmed more
carefully.)

> On May 8, 2014, at 4:50 PM, Ben Pfaff <blp at nicira.com> wrote:
> > +/* The implementation of a concurrent hash map. */
> > +struct cmap_impl {
> > +    unsigned int n;             /* Number of in-use elements. */
> > +    unsigned int max_n;         /* Max elements before enlarging. */
> > +    unsigned int max_load;      /* Max load as fraction of UINT32_MAX. */
> 
> This seems to be a constant, or is this changeable through the API? I
> see that space saving is not an issue here, so maybe this does not
> matter.

You're right, it is now a constant.  The previously posted version had a
user-controllable max load.  I removed that for this version but forgot
to remove this and a few other bits that were necessary to make it
user-controllable.

I folded in the following incremental:

diff --git a/lib/cmap.c b/lib/cmap.c
index c974a13..91c7d04 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -147,18 +147,17 @@ BUILD_ASSERT_DECL(sizeof(struct cmap_bucket) == CACHE_LINE_SIZE);
 /* Default maximum load factor (as a fraction of UINT32_MAX + 1) before
  * enlarging a cmap.  Reasonable values lie between about 75% and 93%.  Smaller
  * values waste memory; larger values increase the average insertion time. */
-#define CMAP_DEFAULT_MAX_LOAD ((uint32_t) (UINT32_MAX * .85))
+#define CMAP_MAX_LOAD ((uint32_t) (UINT32_MAX * .85))
 
 /* The implementation of a concurrent hash map. */
 struct cmap_impl {
     unsigned int n;             /* Number of in-use elements. */
     unsigned int max_n;         /* Max elements before enlarging. */
-    unsigned int max_load;      /* Max load as fraction of UINT32_MAX. */
     uint32_t mask;              /* Number of 'buckets', minus one. */
     uint32_t basis;             /* Basis for rehashing client's hash values. */
 
     /* Padding to make cmap_impl exactly one cache line long. */
-    uint8_t pad[CACHE_LINE_SIZE - sizeof(unsigned int) * 5];
+    uint8_t pad[CACHE_LINE_SIZE - sizeof(unsigned int) * 4];
 
     struct cmap_bucket buckets[];
 };
@@ -190,13 +189,13 @@ cmap_get_impl(const struct cmap *cmap)
 }
 
 static uint32_t
-calc_max_n(uint32_t mask, uint32_t max_load)
+calc_max_n(uint32_t mask)
 {
-    return ((uint64_t) (mask + 1) * CMAP_K * max_load) >> 32;
+    return ((uint64_t) (mask + 1) * CMAP_K * CMAP_MAX_LOAD) >> 32;
 }
 
 static struct cmap_impl *
-cmap_impl_create(uint32_t mask, uint32_t max_load)
+cmap_impl_create(uint32_t mask)
 {
     struct cmap_impl *impl;
 
@@ -205,8 +204,7 @@ cmap_impl_create(uint32_t mask, uint32_t max_load)
     impl = xzalloc_cacheline(sizeof *impl
                              + (mask + 1) * sizeof *impl->buckets);
     impl->n = 0;
-    impl->max_n = calc_max_n(mask, max_load);
-    impl->max_load = max_load;
+    impl->max_n = calc_max_n(mask);
     impl->mask = mask;
     impl->basis = random_uint32();
 
@@ -217,7 +215,7 @@ cmap_impl_create(uint32_t mask, uint32_t max_load)
 void
 cmap_init(struct cmap *cmap)
 {
-    ovsrcu_set(&cmap->impl, cmap_impl_create(0, CMAP_DEFAULT_MAX_LOAD));
+    ovsrcu_set(&cmap->impl, cmap_impl_create(0));
 }
 
 /* Destroys 'cmap'.
@@ -687,7 +685,7 @@ cmap_rehash(struct cmap *cmap, uint32_t mask)
     struct cmap_impl *old = cmap_get_impl(cmap);
     struct cmap_impl *new;
 
-    new = cmap_impl_create(mask, old->max_load);
+    new = cmap_impl_create(mask);
     ovs_assert(old->n < new->max_n);
 
     while (!cmap_try_rehash(old, new)) {


> > +/* Returns the rehashed value for 'hash' within 'impl'.  (See also "Hash
> > + * Functions" at the top of this file.) */
> > +static uint32_t
> > +rehash(const struct cmap_impl *impl, uint32_t hash)
> > +{
> > +    return mhash_finish(impl->basis, hash);
> 
> 
> I thought about this more in terms of "finishing" the user-provided
> (unfinished) hash with the current (temporally constant, for lack of a
> better way of putting it) "finisher" value. This would yield:
> 
>  return mhash_finish(hash, impl->basis);
> 
> This should not make a real difference, though.

I guess that you're right.  As you say, it shouldn't really matter,
though, and since I did my performance testing with this approach, I
think I'll leave it that way.
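
For what it's worth, both argument orders end up in MurmurHash3's
finalizer.  If mhash_finish() just XORs its two arguments together
before running the mix below (I'd have to double-check lib/hash.h to
be sure), the two orders are in fact identical:

    /* MurmurHash3's 32-bit finalizer ("fmix32"); the constants are from
     * the published reference implementation. */
    static inline uint32_t
    fmix32(uint32_t h)
    {
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }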

> > +}
> > +
> > +static struct cmap_impl *
> > +cmap_get_impl(const struct cmap *cmap)
> > +{
> > +    return ovsrcu_get(struct cmap_impl *, &cmap->impl);
> > +}
> > +
> > +static uint32_t
> > +calc_max_n(uint32_t mask, uint32_t max_load)
> 
> As said, 'max_load' seems to be constant.

Fixed, see above.

> > +static bool
> > +cmap_insert_dup(struct cmap_node *new_node, uint32_t hash,
> > +                struct cmap_bucket *b)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < CMAP_K; i++) {
> > +        struct cmap_node *node = b->nodes[i];
> > +        if (b->hashes[i] == hash && node) {
> > +            for (;;) {
> > +                struct cmap_node *next = cmap_node_next_protected(node);
> > +                if (!next) {
> > +                    ovsrcu_set(&node->next, new_node);
> > +                    return true;
> > +                }
> > +                node = next;
> > +            }
> 
> It would be faster to add the dup to the beginning of the list instead:
> 
>     for (i = 0; i < CMAP_K; i++) {
>         if (b->hashes[i] == hash) {
>             ovsrcu_set(&new_node->next, b->nodes[i]);
>             b->nodes[i] = new_node;
>             return true;
>         }
>     }
> 
> Syncing via the counter should be unnecessary, as the hash value was
> already set.

That's a good point about the counter.  That's the reason that I
switched to this method in v2.

I'll work on this for a v3. 
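
Roughly what I have in mind for the v3 prepend (a sketch only; it keeps
the null check on the node, since an empty slot's stale hash could
otherwise match):

    for (i = 0; i < CMAP_K; i++) {
        struct cmap_node *node = b->nodes[i];

        if (b->hashes[i] == hash && node) {
            /* Publish new_node's successor before new_node itself, so
             * that a reader that sees new_node also sees a valid list. */
            ovsrcu_set(&new_node->next, node);
            b->nodes[i] = new_node;
            return true;
        }
    }
    return false;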

> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/* Searches 'b' for an empty slot.  If successful, stores 'node' and 'hash' in
> > + * the slot and returns true.  Otherwise, returns false. */
> > +static bool
> > +cmap_insert_bucket(struct cmap_node *node, uint32_t hash,
> > +                   struct cmap_bucket *b)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < CMAP_K; i++) {
> > +        if (!b->nodes[i]) {
> > +            cmap_set_bucket(b, i, node, hash);
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/* Returns the other bucket that b->nodes[slot] could occupy in 'impl'.  (This
> > + * might be the same as 'b'.) */
> > +static struct cmap_bucket *
> > +other_bucket(struct cmap_impl *impl, struct cmap_bucket *b, int slot)
> > +{
> > +    uint32_t h1 = rehash(impl, b->hashes[slot]);
> > +    uint32_t h2 = other_hash(h1);
> > +    uint32_t b_idx = b - impl->buckets;
> > +    uint32_t other_h = (h1 & impl->mask) == b_idx ? h2 : h1;
> > +
> > +    return &impl->buckets[other_h & impl->mask];
> > +}
> > +
> > +/* 'new_node' is to be inserted into 'impl', but both candidate buckets 'b1'
> > + * and 'b2' are full.  This function attempts to rearrange buckets within
> > + * 'impl' to make room for 'new_node'.
> > + *
> > + * The implementation is a general-purpose breadth-first search.  At first
> > + * glance, this is more complex than a random walk through 'impl' (suggested by
> > + * some references), but random walks have a tendency to loop back through a
> > + * single bucket.  We have to move nodes backward along the path that we find,
> > + * so that no node actually disappears from the hash table, which means a
> > + * random walk would have to be careful to deal with loops.  By contrast, a
> > + * successful breadth-first search always finds a *shortest* path through the
> > + * hash table, and a shortest path will never contain loops, so it avoids that
> > + * problem entirely.
> > + */
> > +static bool
> > +cmap_insert_bfs(struct cmap_impl *impl, struct cmap_node *new_node,
> > +                uint32_t hash, struct cmap_bucket *b1, struct cmap_bucket *b2)
> > +{
> > +    enum { MAX_PATH = 4 };
> > +
> > +    /* A path from 'start' to 'end' via the 'n' steps in 'slots[]'.
> > +     *
> > +     * One can follow the path via:
> > +     *
> > +     *     struct cmap_bucket *b;
> > +     *     int i;
> > +     *
> > +     *     b = path->start;
> > +     *     for (i = 0; i < path->n; i++) {
> > +     *         b = other_bucket(impl, b, path->slots[i]);
> > +     *     }
> > +     *     ovs_assert(b == path->end);
> > +     */
> > +    struct cmap_path {
> > +        struct cmap_bucket *start; /* First bucket along the path. */
> > +        struct cmap_bucket *end;   /* Last bucket on the path. */
> > +        uint8_t slots[MAX_PATH];   /* Slots used for each hop. */
> > +        int n;                     /* Number of slots[]. */
> > +    };
> > +
> > +    enum { MAX_QUEUE = 500 };
> 
> Is MAX_QUEUE a function of MAX_PATH and CMAP_K? (7^4 == 2401, 5^4 == 625)

I added a comment:

    /* We need to limit the amount of work we do trying to find a path.  It
     * might actually be impossible to rearrange the cmap, and after some time
     * it is likely to be easier to rehash the entire cmap.
     *
     * This value of MAX_QUEUE is an arbitrary limit suggested by one of the
     * references.  Empirically, it seems to work OK. */

I couldn't quickly find the particular reference that suggested this,
but I remember seeing 500 suggested in one of them.  It seems to work
OK: you can see that rehashing is rare if you instrument test-cmap to
print something when it rehashes.
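
Something along these lines dropped into cmap_rehash() does the trick
(hypothetical instrumentation only, not part of the patch; it needs
<stdio.h> and <inttypes.h>):

    static unsigned int n_rehashes;   /* Not thread-safe; test use only. */

    printf("cmap: rehash #%u to %"PRIu32" buckets\n",
           ++n_rehashes, mask + 1);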
 
> > +    struct cmap_path queue[MAX_QUEUE];
> > +    int head = 0;
> > +    int tail = 0;
> > +
> > +    /* Add 'b1' and 'b2' as starting points for the search. */
> > +    queue[head].start = b1;
> > +    queue[head].end = b1;
> > +    queue[head].n = 0;
> > +    head++;
> > +    if (b1 != b2) {
> > +        queue[head].start = b2;
> > +        queue[head].end = b2;
> > +        queue[head].n = 0;
> > +        head++;
> > +    }
> > +
> > +    while (tail < head) {
> 
> I feel that "head" and "tail" are reversed here. A (FIFO) queue is
> serviced at head, and new entries are added to the tail. We're done
> when the head reaches the tail.

I have always informally reasoned that the tail follows behind the head,
as the tail of an animal follows behind the head of the animal.  I did
not know that there was disagreement on this.  According to Wikipedia,
there are reasonable people on both sides of the discussion:
        http://en.wikipedia.org/wiki/FIFO#Head_or_tail_first

Since OVS already contains other queues that follow the convention used
here (see e.g. lib/byteq.h), I prefer to be consistent and leave it as
is.
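
In other words, the convention here (matching lib/byteq.h) is:

    queue[head++] = ...;        /* Enqueue new entries at the head... */
    ... = queue[tail++];        /* ...and service old ones at the tail. */
    /* The queue is empty when tail == head. */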

> > +            if (path->n < MAX_PATH && head < MAX_QUEUE) {
> 
> Was it so that the v1 queue was circular, and you decided to drop that
> for clarity? I might remember it wrong as well.

Yes, the v1 queue was circular.

I reasoned that, at most, a couple of items would wrap around
circularly, so I decided that it was slightly simpler to just stop when
we would run out of room.
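
A circular version looks roughly like this (v1 from memory, so treat it
as a sketch): indexing modulo MAX_QUEUE lets 'head' reuse slots that
'tail' has already consumed, at the cost of a fullness check:

    queue[head++ % MAX_QUEUE] = path;   /* Enqueue; reuses consumed slots. */
    path = queue[tail++ % MAX_QUEUE];   /* Dequeue. */
    /* Full when head - tail == MAX_QUEUE; v2 instead just stops
     * enqueuing once head reaches MAX_QUEUE. */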

> > +static bool
> > +cmap_try_insert(struct cmap_impl *impl, struct cmap_node *node, uint32_t hash)
> > +{
> > +    uint32_t h1 = rehash(impl, hash);
> > +    uint32_t h2 = other_hash(h1);
> > +    struct cmap_bucket *b1 = &impl->buckets[h1 & impl->mask];
> > +    struct cmap_bucket *b2 = &impl->buckets[h2 & impl->mask];
> > +
> 
> The likelihood of b1 == b2 is low enough to not avoid calling the
> insert functions twice for the same bucket?

Yes, that's my reasoning.
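(b1 == b2 requires (h1 & impl->mask) == (h2 & impl->mask), which for a
well-mixed hash happens with probability roughly 1/(mask + 1), so the
occasional duplicated call should be cheaper than checking for it every
time.)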

> > +    if (b->nodes[slot] == node) {
> > +        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> 
> 'hash' is not changing here, so could just set the nodes:
> 
>         b->nodes[slot] = cmap_node_next_protected(node);
> 
> Btw, what is the rationale that the nodes pointers are not RCU
> pointers? If they were, it would feel possible to combine this special
> case with the loop below.

Good points.  I'll work on that for a v3.
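
What I have in mind for v3 is roughly this (a sketch, not final):

    struct cmap_bucket {
        /* ... counter and hashes[] as before ... */
        OVSRCU_TYPE(struct cmap_node *) nodes[CMAP_K];
    };

so that readers use ovsrcu_get() and writers ovsrcu_set() for the slot
pointers just as for the 'next' pointers, which should let the special
case above fold into the loop below it.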

Thanks,

Ben.


