[ovs-dev] [RFC Patch] dpif-netdev: Sorted subtable vectors per in_port in dpcls

Jarno Rajahalme jarno at ovn.org
Wed Jun 22 00:32:00 UTC 2016


Thanks for doing this! Some suggestions below.

This no longer applies cleanly, and I also needed to add this incremental to make the testsuite pass:

diff --git a/tests/pmd.at b/tests/pmd.at
index 3d219ff..9b6427e 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -144,10 +144,11 @@ dummy at ovs-dummy: hit:0 missed:0
		p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
])

-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
pmd thread numa_id <cleared> core_id <cleared>:
	emc hits:0
	megaflow hits:0
+	avg. subtable lookups per hit: 0.00
	miss:0
	lost:0
])
@@ -170,10 +171,11 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
recirc_id(0),in_port(1),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
])

-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
pmd thread numa_id <cleared> core_id <cleared>:
	emc hits:19
	megaflow hits:0
+	avg. subtable lookups per hit: 0.00
	miss:1
	lost:0
])

> On Jun 16, 2016, at 6:56 AM, Jan Scheurich <jan.scheurich at ericsson.com> wrote:
> 
> The user-space datapath (dpif-netdev) consists of a first level "exact match cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient and the OVS forwarding performance is determined by the megaflow classifier.
> 
> The megaflow classifier (dpcls) consists of a variable number of hash tables (aka subtables), each containing megaflow entries with the same mask of packet header and metadata fields to match upon. A dpcls lookup matches a given packet against all subtables in sequence until it hits a match. As megaflow cache entries are by construction non-overlapping, the first match is the only match.
> 
> Today the order of the subtables in the dpcls is essentially random, so that on average a dpcls lookup has to visit N/2 subtables for a hit, where N is the total number of subtables. Even though every single hash-table lookup is fast, the performance of the current dpcls degrades when there are many subtables.
> 
> How does the patch address this issue:
> 
> In reality there is often a strong correlation between the ingress port and a small subset of subtables that have hits. The entire megaflow cache typically decomposes nicely into partitions that are hit only by packets entering from a range of similar ports (e.g. traffic from Phy  -> VM vs. traffic from VM -> Phy). 
> 
> Therefore, keeping a separate list of subtables per ingress port, sorted by frequency of hits, reduces the average number of subtables lookups in the dpcls to a minimum, even if the total number of subtables gets large.

Did you consider keeping a separate dpcls instance per input port instead? I.e., have an hmap of dpcls'es using the in_port as the hash key, and create and destroy them on demand. This would avoid the fixed-size array, simplify the code, and make each insert and remove operation a bit cheaper. A miss would typically go through a lower number of subtables, even though the upcall cost would likely hide this benefit in practice. We always exact match in_port (as well as dl_type and recirc_id, see xlate_wc_init()), so this would add no overhead in terms of the overall number of megaflows.

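To make the idea more concrete, here is a rough, untested sketch of the lookup side. The 'classifiers' hmap in the pmd thread struct, the 'in_port' and 'node' members in struct dpcls, and the helper names are all made up for illustration:

/* Untested sketch: one dpcls instance per in_port, kept in an hmap keyed
 * by the port number.  'pmd->classifiers' would be a new 'struct hmap',
 * and 'struct dpcls' would gain 'struct hmap_node node' and
 * 'odp_port_t in_port' members. */
static struct dpcls *
dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd,
                           odp_port_t in_port)
{
    uint32_t hash = hash_port_no(in_port);
    struct dpcls *cls;

    HMAP_FOR_EACH_WITH_HASH (cls, node, hash, &pmd->classifiers) {
        if (cls->in_port == in_port) {
            return cls;
        }
    }
    return NULL;
}

static struct dpcls *
dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd,
                         odp_port_t in_port)
{
    struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);

    if (!cls) {
        /* Create a new classifier for this in_port on demand. */
        cls = xmalloc(sizeof *cls);
        dpcls_init(cls);
        cls->in_port = in_port;
        hmap_insert(&pmd->classifiers, &cls->node, hash_port_no(in_port));
    }
    return cls;
}

Removing the last subtable of such a dpcls could then also remove the dpcls from the hmap and free it.
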
> 
> The patch introduces 32 subtable vectors per dpcls and hashes the ingress port to select the subtable vector. The patch also counts matches per 32 slots in each vector (hashing the subtable pointer to obtain the slot) and sorts the vectors according to match frequency every second.
> 

Also, it should be possible to expand the pvector API to allow using the pvector priorities directly as the counts. I posted a patch for this (https://patchwork.ozlabs.org/patch/638920/), after which you could apply the following incremental. I haven't tested how this performs, though.

 Jarno

From: Jarno Rajahalme <jarno at ovn.org>
Date: Tue, 21 Jun 2016 17:17:55 -0700
Subject: [PATCH] dpif-netdev: Use non-concurrent pvector.
Cc: <jarno at ovn.org>

PMD threads do not need the concurrent version of the pvector.  The
non-concurrent pvector entry priorities can be incremented in place
and periodically sorted.

Signed-off-by: Jarno Rajahalme <jarno at ovn.org>
---
lib/dpif-netdev.c | 68 +++++++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c491110..647bfb7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -151,12 +151,11 @@ struct emc_cache {

#define NUM_SUBTABLE_VECTORS 32
#define NUM_SUBTABLE_SLOTS 32
-#define DPCLS_OPTIMIATION_INTERVAL 1000
+#define DPCLS_OPTIMIZATION_INTERVAL 1000

struct dpcls {
    struct cmap subtables_map;
-    struct pvector subtables[NUM_SUBTABLE_VECTORS];
-    uint64_t hit_cnt[NUM_SUBTABLE_VECTORS][NUM_SUBTABLE_SLOTS];
+    struct pvector_impl *subtables[NUM_SUBTABLE_VECTORS];
    long long int next_optimization;
};

@@ -4371,11 +4370,10 @@ dpcls_init(struct dpcls *cls)
    int i;

    cmap_init(&cls->subtables_map);
-    for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-        pvector_init(&cls->subtables[i]);
+    for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+        cls->subtables[i] = pvector_impl_alloc(PVECTOR_EXTRA_ALLOC);
    }
-    memset(cls->hit_cnt, 0, sizeof cls->hit_cnt);
-    cls->next_optimization = time_msec() + DPCLS_OPTIMIATION_INTERVAL;
+    cls->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
}

static void
@@ -4383,8 +4381,8 @@ dpcls_destroy_subtable(struct dpcls *cls, struct dpcls_subtable *subtable)
{
    int i;

-    for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-        pvector_remove(&cls->subtables[i], subtable);
+    for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+        pvector_impl_remove(cls->subtables[i], subtable);
    }
    cmap_remove(&cls->subtables_map, &subtable->cmap_node,
                subtable->mask.hash);
@@ -4407,8 +4405,8 @@ dpcls_destroy(struct dpcls *cls)
            dpcls_destroy_subtable(cls, subtable);
        }
        cmap_destroy(&cls->subtables_map);
-        for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-            pvector_destroy(&cls->subtables[i]);
+        for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+            free(cls->subtables[i]);
        }
    }
}
@@ -4417,7 +4415,6 @@ static struct dpcls_subtable *
dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
{
    struct dpcls_subtable *subtable;
-    int i;

    /* Need to add one. */
    subtable = xmalloc(sizeof *subtable
@@ -4425,10 +4422,10 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
    cmap_init(&subtable->rules);
    netdev_flow_key_clone(&subtable->mask, mask);
    cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash);
-    for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
+
+    for (int i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
        /* Insert new subtable with priority zero (no hits) */
-        pvector_insert(&cls->subtables[i], subtable, 0);
-        pvector_publish(&cls->subtables[i]);
+        pvector_impl_push_back(&cls->subtables[i], subtable, 0);
    }

    return subtable;
@@ -4456,22 +4453,22 @@ dpcls_try_optimize(struct dpcls *cls)
    long long int now = time_msec();

    if (now > cls->next_optimization) {
-        struct dpcls_subtable *subtable;
-        int port_idx, subtable_idx;
-
-        for (port_idx=0; port_idx<NUM_SUBTABLE_VECTORS; port_idx++) {
-            struct pvector *pvec = &cls->subtables[port_idx];
-            PVECTOR_FOR_EACH (subtable, pvec) {
-                subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
-                pvector_change_priority(pvec, subtable,
-                        cls->hit_cnt[port_idx][subtable_idx]);
+        int port_idx;
+
+        for (port_idx = 0; port_idx < NUM_SUBTABLE_VECTORS; port_idx++) {
+            struct pvector_impl *pvec = cls->subtables[port_idx];
+            struct pvector_entry *entry;
+            int index;
+
+            pvector_impl_sort(pvec);
+
+            PVECTOR_IMPL_FOR_EACH_ENTRY (entry, index, pvec) {
+                entry->priority = 0;
            }
-            pvector_publish(pvec);
        }

        /* Start new measuring interval */
-        memset(cls->hit_cnt, 0, sizeof cls->hit_cnt);
-        cls->next_optimization = now + DPCLS_OPTIMIATION_INTERVAL;
+        cls->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL;
    }
}

@@ -4500,8 +4497,9 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
    if (cmap_remove(&subtable->rules, &rule->cmap_node, rule->flow.hash)
        == 0) {
        dpcls_destroy_subtable(cls, subtable);
-        for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-            pvector_publish(&cls->subtables[i]);
+
+        for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+            pvector_impl_sort(cls->subtables[i]); /* Removes gaps. */
        }
    }
}
@@ -4559,10 +4557,11 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],

    /* Select the subtable vector for the in_port */
    uint32_t port_idx = hash_port_no(port_no) % NUM_SUBTABLE_VECTORS;
-    uint64_t *hit_cnt = cls->hit_cnt[port_idx];
    int lookups_match = 0, subtable_pos = 1;
+    struct pvector_entry *entry;
+    int index;

-    PVECTOR_FOR_EACH (subtable, &cls->subtables[port_idx]) {
+    PVECTOR_IMPL_FOR_EACH_ENTRY (entry, index, cls->subtables[port_idx]) {
        const struct netdev_flow_key *mkeys = keys;
        struct dpcls_rule **mrules = rules;
        map_type remains = 0;
@@ -4570,6 +4569,8 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],

        BUILD_ASSERT_DECL(sizeof remains == sizeof *maps);

+        subtable = entry->ptr;
+
        for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules += MAP_BITS) {
            uint32_t hashes[MAP_BITS];
            const struct cmap_node *nodes[MAP_BITS];
@@ -4594,9 +4595,8 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
                CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
                    if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) {
                        mrules[i] = rule;
-                        /* Update the subtable hit statistics for the in_port vector */
-                        int subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
-                        hit_cnt[subtable_idx]++;
+                        /* Update the subtable hit stats.  */
+                        entry->priority++;
                        lookups_match += subtable_pos;
                        goto next;
                    }
-- 
2.1.4




