[ovs-dev] [cmap v2 1/2] cmap: New module for cuckoo hash table.
Ben Pfaff
blp at nicira.com
Tue May 20 16:12:23 UTC 2014
On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
> Acked-by: Jarno Rajahalme <jrajahalme at nicira.com>
Thanks.
> Some comments below, though,
Responses below.
(It would be easier to spot your comments if you trimmed more
carefully.)
> On May 8, 2014, at 4:50 PM, Ben Pfaff <blp at nicira.com> wrote:
> > +/* The implementation of a concurrent hash map. */
> > +struct cmap_impl {
> > + unsigned int n; /* Number of in-use elements. */
> > + unsigned int max_n; /* Max elements before enlarging. */
> > + unsigned int max_load; /* Max load as fraction of UINT32_MAX. */
>
> This seems to be a constant, or is this changeable through the API? I
> see that space saving is not an issue here, so maybe this does not
> matter.
You're right, it is now a constant. The previously posted version had a
user-controllable max load. I removed that for this version but forgot
to remove this and a few other bits that were necessary to make it
user-controllable.
I folded in the following incremental:
diff --git a/lib/cmap.c b/lib/cmap.c
index c974a13..91c7d04 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -147,18 +147,17 @@ BUILD_ASSERT_DECL(sizeof(struct cmap_bucket) == CACHE_LINE_SIZE);
/* Default maximum load factor (as a fraction of UINT32_MAX + 1) before
* enlarging a cmap. Reasonable values lie between about 75% and 93%. Smaller
* values waste memory; larger values increase the average insertion time. */
-#define CMAP_DEFAULT_MAX_LOAD ((uint32_t) (UINT32_MAX * .85))
+#define CMAP_MAX_LOAD ((uint32_t) (UINT32_MAX * .85))
/* The implementation of a concurrent hash map. */
struct cmap_impl {
unsigned int n; /* Number of in-use elements. */
unsigned int max_n; /* Max elements before enlarging. */
- unsigned int max_load; /* Max load as fraction of UINT32_MAX. */
uint32_t mask; /* Number of 'buckets', minus one. */
uint32_t basis; /* Basis for rehashing client's hash values. */
/* Padding to make cmap_impl exactly one cache line long. */
- uint8_t pad[CACHE_LINE_SIZE - sizeof(unsigned int) * 5];
+ uint8_t pad[CACHE_LINE_SIZE - sizeof(unsigned int) * 4];
struct cmap_bucket buckets[];
};
@@ -190,13 +189,13 @@ cmap_get_impl(const struct cmap *cmap)
}
static uint32_t
-calc_max_n(uint32_t mask, uint32_t max_load)
+calc_max_n(uint32_t mask)
{
- return ((uint64_t) (mask + 1) * CMAP_K * max_load) >> 32;
+ return ((uint64_t) (mask + 1) * CMAP_K * CMAP_MAX_LOAD) >> 32;
}
static struct cmap_impl *
-cmap_impl_create(uint32_t mask, uint32_t max_load)
+cmap_impl_create(uint32_t mask)
{
struct cmap_impl *impl;
@@ -205,8 +204,7 @@ cmap_impl_create(uint32_t mask, uint32_t max_load)
impl = xzalloc_cacheline(sizeof *impl
+ (mask + 1) * sizeof *impl->buckets);
impl->n = 0;
- impl->max_n = calc_max_n(mask, max_load);
- impl->max_load = max_load;
+ impl->max_n = calc_max_n(mask);
impl->mask = mask;
impl->basis = random_uint32();
@@ -217,7 +215,7 @@ cmap_impl_create(uint32_t mask, uint32_t max_load)
void
cmap_init(struct cmap *cmap)
{
- ovsrcu_set(&cmap->impl, cmap_impl_create(0, CMAP_DEFAULT_MAX_LOAD));
+ ovsrcu_set(&cmap->impl, cmap_impl_create(0));
}
/* Destroys 'cmap'.
@@ -687,7 +685,7 @@ cmap_rehash(struct cmap *cmap, uint32_t mask)
struct cmap_impl *old = cmap_get_impl(cmap);
struct cmap_impl *new;
- new = cmap_impl_create(mask, old->max_load);
+ new = cmap_impl_create(mask);
ovs_assert(old->n < new->max_n);
while (!cmap_try_rehash(old, new)) {
> > +/* Returns the rehashed value for 'hash' within 'impl'. (See also "Hash
> > + * Functions" at the top of this file.) */
> > +static uint32_t
> > +rehash(const struct cmap_impl *impl, uint32_t hash)
> > +{
> > + return mhash_finish(impl->basis, hash);
>
>
> I thought about this more in terms of "finishing" the user-provided
> (unfinished) hash with the current (temporally constant, for lack of a
> better way of putting it) "finisher" value. This would yield:
>
> return mhash_finish(hash, impl->basis);
>
> This should not make a real difference, though.
I guess that you're right. As you say, it shouldn't really matter,
though, and since I did my performance testing with this approach, I
think I'll leave it that way.
> > +}
> > +
> > +static struct cmap_impl *
> > +cmap_get_impl(const struct cmap *cmap)
> > +{
> > + return ovsrcu_get(struct cmap_impl *, &cmap->impl);
> > +}
> > +
> > +static uint32_t
> > +calc_max_n(uint32_t mask, uint32_t max_load)
>
> As said, 'max_load' seems to be constant.
Fixed, see above.
> > +static bool
> > +cmap_insert_dup(struct cmap_node *new_node, uint32_t hash,
> > + struct cmap_bucket *b)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < CMAP_K; i++) {
> > + struct cmap_node *node = b->nodes[i];
> > + if (b->hashes[i] == hash && node) {
> > + for (;;) {
> > + struct cmap_node *next = cmap_node_next_protected(node);
> > + if (!next) {
> > + ovsrcu_set(&node->next, new_node);
> > + return true;
> > + }
> > + node = next;
> > + }
>
> It would be faster to add the dup to the beginning of the list instead:
>
> for (i = 0; i < CMAP_K; i++) {
> if (b->hashes[i] == hash) {
> ovsrcu_set(&new_node->next, b->nodes[i]);
> b->nodes[i] = new_node;
> return true;
> }
> }
>
> Syncing via the counter should be unnecessary, as the hash value was
> already set.
That's a good point about the counter. That's the reason that I
switched to this method in v2.
I'll work on this for a v3.
> > + }
> > + }
> > + return false;
> > +}
> > +
> > +/* Searches 'b' for an empty slot. If successful, stores 'node' and 'hash' in
> > + * the slot and returns true. Otherwise, returns false. */
> > +static bool
> > +cmap_insert_bucket(struct cmap_node *node, uint32_t hash,
> > + struct cmap_bucket *b)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < CMAP_K; i++) {
> > + if (!b->nodes[i]) {
> > + cmap_set_bucket(b, i, node, hash);
> > + return true;
> > + }
> > + }
> > + return false;
> > +}
> > +
> > +/* Returns the other bucket that b->nodes[slot] could occupy in 'impl'. (This
> > + * might be the same as 'b'.) */
> > +static struct cmap_bucket *
> > +other_bucket(struct cmap_impl *impl, struct cmap_bucket *b, int slot)
> > +{
> > + uint32_t h1 = rehash(impl, b->hashes[slot]);
> > + uint32_t h2 = other_hash(h1);
> > + uint32_t b_idx = b - impl->buckets;
> > + uint32_t other_h = (h1 & impl->mask) == b_idx ? h2 : h1;
> > +
> > + return &impl->buckets[other_h & impl->mask];
> > +}
> > +
> > +/* 'new_node' is to be inserted into 'impl', but both candidate buckets 'b1'
> > + * and 'b2' are full. This function attempts to rearrange buckets within
> > + * 'impl' to make room for 'new_node'.
> > + *
> > + * The implementation is a general-purpose breadth-first search. At first
> > + * glance, this is more complex than a random walk through 'impl' (suggested by
> > + * some references), but random walks have a tendency to loop back through a
> > + * single bucket. We have to move nodes backward along the path that we find,
> > + * so that no node actually disappears from the hash table, which means a
> > + * random walk would have to be careful to deal with loops. By contrast, a
> > + * successful breadth-first search always finds a *shortest* path through the
> > + * hash table, and a shortest path will never contain loops, so it avoids that
> > + * problem entirely.
> > + */
> > +static bool
> > +cmap_insert_bfs(struct cmap_impl *impl, struct cmap_node *new_node,
> > + uint32_t hash, struct cmap_bucket *b1, struct cmap_bucket *b2)
> > +{
> > + enum { MAX_PATH = 4 };
> > +
> > + /* A path from 'start' to 'end' via the 'n' steps in 'slots[]'.
> > + *
> > + * One can follow the path via:
> > + *
> > + * struct cmap_bucket *b;
> > + * int i;
> > + *
> > + * b = path->start;
> > + * for (i = 0; i < path->n; i++) {
> > + * b = other_bucket(impl, b, path->slots[i]);
> > + * }
> > + * ovs_assert(b == path->end);
> > + */
> > + struct cmap_path {
> > + struct cmap_bucket *start; /* First bucket along the path. */
> > + struct cmap_bucket *end; /* Last bucket on the path. */
> > + uint8_t slots[MAX_PATH]; /* Slots used for each hop. */
> > + int n; /* Number of slots[]. */
> > + };
> > +
> > + enum { MAX_QUEUE = 500 };
>
> Is MAX_QUEUE a function of the MAX_PATH and CMAP_K? (7^4 == 2401, 5^4 == 625)
I added a comment:
/* We need to limit the amount of work we do trying to find a path. It
* might actually be impossible to rearrange the cmap, and after some time
* it is likely to be easier to rehash the entire cmap.
*
* This value of MAX_QUEUE is an arbitrary limit suggested by one of the
* references. Empirically, it seems to work OK. */
I couldn't quickly find the particular reference that suggested this,
but I remember seeing 500 suggested in one of them. It seems to work
OK: you can see that rehashing is rare if you instrument test-cmap to
print something when it rehashes.
> > + struct cmap_path queue[MAX_QUEUE];
> > + int head = 0;
> > + int tail = 0;
> > +
> > + /* Add 'b1' and 'b2' as starting points for the search. */
> > + queue[head].start = b1;
> > + queue[head].end = b1;
> > + queue[head].n = 0;
> > + head++;
> > + if (b1 != b2) {
> > + queue[head].start = b2;
> > + queue[head].end = b2;
> > + queue[head].n = 0;
> > + head++;
> > + }
> > +
> > + while (tail < head) {
>
> I feel that 'head' and 'tail' are reversed here. A (FIFO) queue is
> serviced at head, and new entries are added to the tail. We're done
> when the head reaches the tail.
I have always informally reasoned that the tail follows behind the head,
as the tail of an animal follows behind the head of the animal. I did
not know that there was disagreement on this. According to Wikipedia,
there are reasonable people on both sides of the discussion:
http://en.wikipedia.org/wiki/FIFO#Head_or_tail_first
Since OVS already contains other queues that follow the convention used
here (see e.g. lib/byteq.h), I prefer to be consistent and leave it as
is.
> > + if (path->n < MAX_PATH && head < MAX_QUEUE) {
>
> Was it so that the v1 queue was circular, and you decided to drop that
> for clarity? I might remember it wrong as well.
Yes, the v1 queue was circular.
I reasoned that, at most, a couple of items would wrap around
circularly, so I decided that it was slightly simpler to just stop when
we would run out of room.
> > +static bool
> > +cmap_try_insert(struct cmap_impl *impl, struct cmap_node *node, uint32_t hash)
> > +{
> > + uint32_t h1 = rehash(impl, hash);
> > + uint32_t h2 = other_hash(h1);
> > + struct cmap_bucket *b1 = &impl->buckets[h1 & impl->mask];
> > + struct cmap_bucket *b2 = &impl->buckets[h2 & impl->mask];
> > +
>
> The likelihood of b1 == b2 is low enough to not avoid calling the
> insert functions twice for the same bucket?
Yes, that's my reasoning.
> > + if (b->nodes[slot] == node) {
> > + cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
>
> ?hash? is not changing here, so could just set the nodes:
>
> b->nodes[slot] = cmap_node_next_protected(node);
>
> Btw, what is the rationale that the nodes pointers are not RCU
> pointers? If they were, it would feel possible to combine this special
> case with the loop below.
Good points. I'll work on that for a v3.
Thanks,
Ben.