[ovs-dev] [PATCH] dpif-netdev: Refactor datapath flow cache
Ferriter, Cian
cian.ferriter at intel.com
Mon Dec 11 16:28:43 UTC 2017
Hi Jan,
This is a very interesting patch with compelling results.
I had a few questions after reading the commit message for this patch:
How did you decide on 1M as the proposed size for the DFC?
Will the size of this DFC be configurable (i.e. map a different number of hash bits)?
I was mostly interested in how the performance of the very basic phy2phy testcase would be affected by this change.
Below are my results from 1-10000 flows, 64B packets, unidirectional with 1 PMD. One OpenFlow rule is installed for each stream of traffic being sent:
Flows    master   DFC+EMC   Gain
         [Mpps]    [Mpps]
--------------------------------
    1     12.34     13.12   6.3%
   10      7.83      8.19   4.5%
  100      4.71      4.78   1.3%
 1000      3.77      3.83   1.5%
10000      3.27      3.51   7.1%
This shows a performance improvement for this testcase as well.
Tested-by: Cian Ferriter <cian.ferriter at intel.com>
Let me know your thoughts on the above questions.
Thanks,
Cian
> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> bounces at openvswitch.org] On Behalf Of Jan Scheurich
> Sent: 20 November 2017 17:33
> To: dev at openvswitch.org
> Subject: [ovs-dev] [PATCH] dpif-netdev: Refactor datapath flow cache
>
> So far the netdev datapath uses an 8K EMC to speed up the lookup of
> frequently used flows by comparing the parsed packet headers against the
> miniflow of a cached flow, using 13 bits of the packet RSS hash as index. The
> EMC is too small for many applications with 100K or more parallel packet
> flows so that EMC threshing actually degrades performance.
> Furthermore, the size of struct miniflow and the flow copying cost prevents
> us from making it much larger.
>
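For readers who have not looked at this code recently: the EMC being replaced here is a small direct lookup keyed by part of the RSS hash. A condensed sketch using the constants the diff below removes (illustrative only; the current code actually probes EM_FLOW_HASH_SEGS = 2 candidate slots):

    /* Sketch: 8K-entry EMC indexed by 13 bits of the packet RSS hash. */
    static inline struct emc_entry *
    emc_entry_get_sketch(struct emc_cache *cache, uint32_t hash)
    {
        return &cache->entries[hash & EM_FLOW_HASH_MASK];   /* 2^13 slots */
    }
    /* A hit additionally requires an exact miniflow match on the entry. */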
> At the same time the lookup cost of the megaflow classifier (DPCLS) is
> increasing as the number of frequently hit subtables grows with the
> complexity of the pipeline and the number of recirculations.
>
> To close the performance gap for many parallel flows, this patch introduces
> the datapath flow cache (DFC) with 1M entries as a lookup stage between EMC
> and DPCLS. It directly maps 20 bits of the RSS hash to a pointer to the last hit
> megaflow entry and performs a masked comparison of the packet flow with
> the megaflow key to confirm the hit. This avoids the costly DPCLS lookup
> even for a very large number of parallel flows with a small memory overhead.
>
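As I read the diff, the DFC lookup condenses to roughly the following (a sketch with an illustrative name, combining dfc_entry_get() and dfc_lookup() from the patch below):

    /* Sketch: direct-mapped array of 2^20 megaflow pointers, indexed by
     * the low 20 bits of the RSS hash. A hit is confirmed by a masked
     * comparison of the packet flow against the megaflow's key/mask. */
    static inline struct dp_netdev_flow *
    dfc_lookup_sketch(struct dfc_cache *cache,
                      const struct netdev_flow_key *key)
    {
        struct dfc_entry *entry = &cache->entries[key->hash & DFC_MASK];
        struct dp_netdev_flow *flow = entry->flow;

        if (flow && !flow->dead && dpcls_rule_matches_key(&flow->cr, key)) {
            return flow;   /* Confirmed hit: the DPCLS lookup is avoided. */
        }
        return NULL;       /* Miss or stale entry: fall back to DPCLS. */
    }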
> Due to the large size of the DFC and the low risk of DFC thrashing, any DPCLS
> hit immediately inserts an entry in the DFC so that subsequent packets are
> sped up. The DFC thus also accelerates short-lived flows.
>
> To further accelerate the lookup of a few elephant flows, every DFC hit
> triggers a probabilistic EMC insertion of the flow. As the DFC entry is already
> in place the default EMC insertion probability can be reduced to
> 1/1000 to minimize EMC thrashing should there still be many fat flows.
> The inverse EMC insertion probability remains configurable.
>
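The promotion itself is just a cheap random test on every DFC hit; with the new default of 1/1000 it sketches out as follows (mirroring emc_probabilistic_insert() in the patch):

    /* Sketch: promote a DFC hit into the EMC with probability ~1/1000. */
    static inline void
    emc_promote_sketch(struct emc_cache *emc,
                       const struct netdev_flow_key *key,
                       struct dp_netdev_flow *flow)
    {
        const uint32_t threshold = UINT32_MAX / 1000;

        if (random_uint32() <= threshold) {
            emc_change_entry(emc_entry_get(emc, key->hash), key, flow);
        }
    }

So a flow carrying thousands of packets is almost certain to be promoted during its lifetime, while short flows are usually served from the DFC alone.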
> The EMC implementation is simplified by removing the possibility to store a
> flow in two slots, as there is no particular reason why two flows should
> systematically collide (the RSS hash is not symmetric).
> The maximum size of the EMC flow key is limited to 256 bytes to reduce the
> memory footprint. This should be sufficient to hold most real-life packet flow
> keys. Larger flows are not installed in the EMC.
>
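The limit appears in the patch as a simple guard at the top of emc_change_entry(); presumably the 248-byte key buffer plus the flow pointer keeps the whole entry within 256 bytes:

    /* Sketch of the guard: keys larger than the fixed buffer are not
     * installed in the EMC; such flows are still cached in the DFC. */
    size_t key_size = offsetof(struct netdev_flow_key, mf) + key->len;
    if (key_size > sizeof(ce->key)) {
        return;
    }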
> The pmd-stats-show command is enhanced to show both EMC and DFC hits
> separately.
>
> The sweep speed for cleaning up obsolete EMC and DFC flow entries and
> freeing dead megaflow entries is increased. With a typical PMD cycle
> duration of 100us under load and checking one DFC entry per cycle, the DFC
> sweep should normally complete within 100s.
>
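The 100s figure follows from the sizes above: 2^20 DFC entries checked at one per ~100us PMD cycle gives 2^20 * 100us, i.e. about 105s for a full pass. The EMC sweep is interleaved every 64th step (DFC_ENTRIES/EMC_ENTRIES = 2^20/2^14 = 64), so the 16K-entry EMC completes its pass in the same time.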
> In PVP performance tests with an L3 pipeline over VXLAN we determined the
> optimal EMC size to be 16K entries to obtain a uniform speedup compared to
> the master branch over the full range of parallel flows. The measurement
> below is for 64-byte packets, and the average number of subtable lookups per
> DPCLS hit in this pipeline is 1.0, i.e. the acceleration already starts for a single
> busy mask. Tests with many visited subtables should show a strong increase
> in the gain from the DFC.
>
> Flows    master   DFC+EMC    Gain
>          [Mpps]    [Mpps]
> ---------------------------------
>      8     4.45      4.62    3.8%
>    100     4.17      4.47    7.2%
>   1000     3.88      4.34   12.0%
>   2000     3.54      4.17   17.8%
>   5000     3.01      3.82   27.0%
>  10000     2.75      3.63   31.9%
>  20000     2.64      3.50   32.8%
>  50000     2.60      3.33   28.1%
> 100000     2.59      3.23   24.7%
> 500000     2.59      3.16   21.9%
>
>
> Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
> ---
>  lib/dpif-netdev.c | 349 ++++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 235 insertions(+), 114 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index db78318..efcf2e9 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -127,19 +127,19 @@ struct netdev_flow_key {
>      uint64_t buf[FLOW_MAX_PACKET_U64S];
>  };
>
> -/* Exact match cache for frequently used flows
> +/* Datapath flow cache (DFC) for frequently used flows
> *
> - * The cache uses a 32-bit hash of the packet (which can be the RSS hash) to
> - * search its entries for a miniflow that matches exactly the miniflow of the
> - * packet. It stores the 'dpcls_rule' (rule) that matches the miniflow.
> + * The cache uses the 32-bit hash of the packet (which can be the RSS hash) to
> + * directly look up a pointer to the matching megaflow. To check for a match
> + * the packet's flow key is compared against the key and mask of the megaflow.
> *
> - * A cache entry holds a reference to its 'dp_netdev_flow'.
> - *
> - * A miniflow with a given hash can be in one of EM_FLOW_HASH_SEGS different
> - * entries. The 32-bit hash is split into EM_FLOW_HASH_SEGS values (each of
> - * them is EM_FLOW_HASH_SHIFT bits wide and the remainder is thrown away). Each
> - * value is the index of a cache entry where the miniflow could be.
> + * For even faster lookup, the most frequently used packet flows are also
> + * inserted into a small exact match cache (EMC). The EMC uses a part of the
> + * packet hash to look up a miniflow that matches exactly the miniflow of the
> + * packet. The matching EMC also returns a reference to the megaflow.
> *
> + * Flows are promoted from the DFC to the EMC through probabilistic insertion
> + * after successful DFC lookup with minor probability to favor elephant flows.
> *
> * Thread-safety
> * =============
> @@ -148,33 +148,38 @@ struct netdev_flow_key {
> * If dp_netdev_input is not called from a pmd thread, a mutex is used.
> */
>
> -#define EM_FLOW_HASH_SHIFT 13
> -#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT)
> -#define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
> -#define EM_FLOW_HASH_SEGS 2
> +#define DFC_MASK_LEN 20
> +#define DFC_ENTRIES (1u << DFC_MASK_LEN)
> +#define DFC_MASK (DFC_ENTRIES - 1)
> +#define EMC_MASK_LEN 14
> +#define EMC_ENTRIES (1u << EMC_MASK_LEN)
> +#define EMC_MASK (EMC_ENTRIES - 1)
>
>  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
> -#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
> +#define DEFAULT_EM_FLOW_INSERT_INV_PROB 1000
> #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
> DEFAULT_EM_FLOW_INSERT_INV_PROB)
>
> struct emc_entry {
> struct dp_netdev_flow *flow;
> - struct netdev_flow_key key; /* key.hash used for emc hash value. */
> + char key[248]; /* Holds struct netdev_flow_key of limited size. */
> };
>
> struct emc_cache {
> - struct emc_entry entries[EM_FLOW_HASH_ENTRIES];
> - int sweep_idx; /* For emc_cache_slow_sweep(). */
> + struct emc_entry entries[EMC_ENTRIES];
> + int sweep_idx;
> +};
> +
> +struct dfc_entry {
> + struct dp_netdev_flow *flow;
> +};
> +
> +struct dfc_cache {
> + struct emc_cache emc_cache;
> + struct dfc_entry entries[DFC_ENTRIES];
> + int sweep_idx;
> };
>
> -/* Iterate in the exact match cache through every entry that might contain a
> - * miniflow with hash 'HASH'. */
> -#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH)                 \
> -    for (uint32_t i__ = 0, srch_hash__ = (HASH);                             \
> -         (CURRENT_ENTRY) = &(EMC)->entries[srch_hash__ & EM_FLOW_HASH_MASK], \
> -         i__ < EM_FLOW_HASH_SEGS;                                            \
> -         i__++, srch_hash__ >>= EM_FLOW_HASH_SHIFT)
>
> /* Simple non-wildcarding single-priority classifier. */
>
> @@ -214,6 +219,8 @@ static bool dpcls_lookup(struct dpcls *cls,
> const struct netdev_flow_key keys[],
> struct dpcls_rule **rules, size_t cnt,
> int *num_lookups_p);
> +static bool dpcls_rule_matches_key(const struct dpcls_rule *rule,
> +                                   const struct netdev_flow_key *target);
>
> /* Set of supported meter flags */
>  #define DP_SUPPORTED_METER_FLAGS_MASK \
> @@ -332,6 +339,7 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
>
> enum dp_stat_type {
> DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
> + DP_STAT_DFC_HIT, /* Packets that had a flow cache hit (dfc). */
> DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
> DP_STAT_MISS, /* Packets that did not match. */
> DP_STAT_LOST, /* Packets not passed up to the client. */
> @@ -565,7 +573,7 @@ struct dp_netdev_pmd_thread {
> * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
> * need to be protected by 'non_pmd_mutex'. Every other instance
> * will only be accessed by its own pmd thread. */
> - OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
> + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache;
> struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */
>
>      /* Queue id used by this pmd thread to send packets on all netdevs if
> @@ -744,46 +752,59 @@ dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
>  static int
> dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
> struct tx_port *tx, long long now);
>
> -static inline bool emc_entry_alive(struct emc_entry *ce);
> +static inline bool dfc_entry_alive(struct dfc_entry *ce);
> static void emc_clear_entry(struct emc_entry *ce);
> +static void dfc_clear_entry(struct dfc_entry *ce);
>
> static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
>
> static void
> -emc_cache_init(struct emc_cache *flow_cache)
> +emc_cache_init(struct emc_cache *emc)
> {
> int i;
>
> - flow_cache->sweep_idx = 0;
> + for (i = 0; i < ARRAY_SIZE(emc->entries); i++) {
> + emc->entries[i].flow = NULL;
> + struct netdev_flow_key *key =
> + (struct netdev_flow_key *) emc->entries[i].key;
> + key->hash = 0;
> + key->len = sizeof(struct miniflow);
> + flowmap_init(&key->mf.map);
> + }
> + emc->sweep_idx = 0;
> +}
> +
> +static void
> +dfc_cache_init(struct dfc_cache *flow_cache)
> +{
> + int i;
> +
> + emc_cache_init(&flow_cache->emc_cache);
> for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
> flow_cache->entries[i].flow = NULL;
> - flow_cache->entries[i].key.hash = 0;
> - flow_cache->entries[i].key.len = sizeof(struct miniflow);
> - flowmap_init(&flow_cache->entries[i].key.mf.map);
> }
> + flow_cache->sweep_idx = 0;
> }
>
> static void
> -emc_cache_uninit(struct emc_cache *flow_cache)
> +emc_cache_uninit(struct emc_cache *emc)
> {
> int i;
>
> - for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
> - emc_clear_entry(&flow_cache->entries[i]);
> + for (i = 0; i < ARRAY_SIZE(emc->entries); i++) {
> + emc_clear_entry(&emc->entries[i]);
> }
> }
>
> -/* Check and clear dead flow references slowly (one entry at each
> - * invocation). */
> static void
> -emc_cache_slow_sweep(struct emc_cache *flow_cache)
> +dfc_cache_uninit(struct dfc_cache *flow_cache)
> {
> -    struct emc_entry *entry = &flow_cache->entries[flow_cache->sweep_idx];
> + int i;
>
> - if (!emc_entry_alive(entry)) {
> - emc_clear_entry(entry);
> + for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
> + dfc_clear_entry(&flow_cache->entries[i]);
> }
> -    flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK;
> + emc_cache_uninit(&flow_cache->emc_cache);
> }
>
>  /* Returns true if 'dpif' is a netdev or dummy dpif, false otherwise. */
> @@ -837,8 +858,8 @@ pmd_info_show_stats(struct ds *reply,
> }
>
> /* Sum of all the matched and not matched packets gives the total. */
> -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> -        + stats[DP_STAT_MISS];
> + total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_DFC_HIT]
> + + stats[DP_STAT_MASKED_HIT] + stats[DP_STAT_MISS];
>
> for (i = 0; i < PMD_N_CYCLES; i++) {
>          if (cycles[i] > pmd->cycles_zero[i]) {
> @@ -862,10 +883,13 @@ pmd_info_show_stats(struct ds *reply,
> ds_put_cstr(reply, ":\n");
>
> ds_put_format(reply,
> - "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> + "\temc hits:%llu\n\tdfc hits:%llu\n"
> + "\tmegaflow hits:%llu\n"
> "\tavg. subtable lookups per hit:%.2f\n"
> "\tmiss:%llu\n\tlost:%llu\n",
> - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> + stats[DP_STAT_EXACT_HIT],
> + stats[DP_STAT_DFC_HIT],
> + stats[DP_STAT_MASKED_HIT],
> stats[DP_STAT_MASKED_HIT] > 0
>                    ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
> : 0,
> @@ -1492,6 +1516,8 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
> stats->n_hit += n;
> atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
> stats->n_hit += n;
> + atomic_read_relaxed(&pmd->stats.n[DP_STAT_DFC_HIT], &n);
> + stats->n_hit += n;
> atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
> stats->n_missed += n;
>      atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> @@ -2111,6 +2137,16 @@ netdev_flow_key_hash_in_mask(const struct netdev_flow_key *key,
>      return hash_finish(hash, (p - miniflow_get_values(&mask->mf)) * 8);
>  }
>
> +/*
> + * Datapath Flow Cache and EMC implementation
> + */
> +
> +static inline struct emc_entry *
> +emc_entry_get(struct emc_cache *emc, const uint32_t hash)
> +{
> +    return &emc->entries[hash & EMC_MASK];
> +}
> +
> static inline bool
> emc_entry_alive(struct emc_entry *ce)
> {
> @@ -2127,9 +2163,15 @@ emc_clear_entry(struct emc_entry *ce)
>  }
>
> static inline void
> -emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
> -                 const struct netdev_flow_key *key)
> +emc_change_entry(struct emc_entry *ce, const struct netdev_flow_key *key,
> +                 struct dp_netdev_flow *flow)
> {
> + /* We only store small enough flows in the EMC. */
> + size_t key_size = offsetof(struct netdev_flow_key, mf) + key->len;
> + if (key_size > sizeof(ce->key)) {
> + return;
> + }
> +
> if (ce->flow != flow) {
> if (ce->flow) {
>              dp_netdev_flow_unref(ce->flow);
> @@ -2141,73 +2183,148 @@ emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
> ce->flow = NULL;
> }
> }
> - if (key) {
> - netdev_flow_key_clone(&ce->key, key);
> - }
> + netdev_flow_key_clone((struct netdev_flow_key *) ce->key, key);
> }
>
> static inline void
> -emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
> - struct dp_netdev_flow *flow)
> +emc_probabilistic_insert(struct emc_cache *emc,
> + const struct netdev_flow_key *key,
> + struct dp_netdev_flow *flow)
> {
> - struct emc_entry *to_be_replaced = NULL;
> - struct emc_entry *current_entry;
> + const uint32_t threshold = UINT32_MAX/1000;
>
> - EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) {
> -        if (netdev_flow_key_equal(&current_entry->key, key)) {
> - /* We found the entry with the 'mf' miniflow */
> - emc_change_entry(current_entry, flow, NULL);
> - return;
> + if (random_uint32() <= threshold) {
> +
> + struct emc_entry *current_entry = emc_entry_get(emc, key->hash);
> + emc_change_entry(current_entry, key, flow);
> + }
> +}
> +
> +static inline struct dp_netdev_flow *
> +emc_lookup(struct emc_cache *emc, const struct netdev_flow_key *key)
> +{
> + struct emc_entry *current_entry = emc_entry_get(emc, key->hash);
> + struct netdev_flow_key *current_key =
> + (struct netdev_flow_key *) current_entry->key;
> +
> + if (current_key->hash == key->hash
> + && emc_entry_alive(current_entry)
> + && netdev_flow_key_equal_mf(current_key, &key->mf)) {
> +
> + /* We found the entry with the 'key->mf' miniflow */
> + return current_entry->flow;
> + }
> + return NULL;
> +}
> +
> +static inline struct dfc_entry *
> +dfc_entry_get(struct dfc_cache *cache, const uint32_t hash)
> +{
> +    return &cache->entries[hash & DFC_MASK];
> +}
> +
> +static inline bool
> +dfc_entry_alive(struct dfc_entry *ce)
> +{
> +    return ce->flow && !ce->flow->dead;
> +}
> +
> +static void
> +dfc_clear_entry(struct dfc_entry *ce)
> +{
> + if (ce->flow) {
> + dp_netdev_flow_unref(ce->flow);
> + ce->flow = NULL;
> + }
> +}
> +
> +static inline void
> +dfc_change_entry(struct dfc_entry *ce, struct dp_netdev_flow *flow)
> +{
> + if (ce->flow != flow) {
> + if (ce->flow) {
> + dp_netdev_flow_unref(ce->flow);
> }
>
> - /* Replacement policy: put the flow in an empty (not alive) entry, or
> - * in the first entry where it can be */
> - if (!to_be_replaced
> - || (emc_entry_alive(to_be_replaced)
> - && !emc_entry_alive(current_entry))
> - || current_entry->key.hash < to_be_replaced->key.hash) {
> - to_be_replaced = current_entry;
> + if (dp_netdev_flow_ref(flow)) {
> + ce->flow = flow;
> + } else {
> + ce->flow = NULL;
> }
> }
> - /* We didn't find the miniflow in the cache.
> - * The 'to_be_replaced' entry is where the new flow will be stored */
> -
> - emc_change_entry(to_be_replaced, flow, key);
> }
>
> static inline void
> -emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
> - const struct netdev_flow_key *key,
> - struct dp_netdev_flow *flow)
> +dfc_insert(struct dp_netdev_pmd_thread *pmd,
> + const struct netdev_flow_key *key,
> + struct dp_netdev_flow *flow)
> {
> - /* Insert an entry into the EMC based on probability value 'min'. By
> - * default the value is UINT32_MAX / 100 which yields an insertion
> - * probability of 1/100 ie. 1% */
> + struct dfc_cache *cache = &pmd->flow_cache;
> + struct dfc_entry *current_entry;
> +
> + current_entry = dfc_entry_get(cache, key->hash);
> + dfc_change_entry(current_entry, flow);
> +}
> +
> +static inline struct dp_netdev_flow *
> +dfc_lookup(struct dfc_cache *cache, const struct netdev_flow_key *key,
> + bool *exact_match)
> +{
> + struct dp_netdev_flow *flow;
>
> - uint32_t min;
> - atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
> + /* Try an EMC lookup first. */
> + flow = emc_lookup(&cache->emc_cache, key);
> + if (flow) {
> + *exact_match = true;
> + return flow;
> + }
> +
> + /* EMC lookup not successful: try DFC lookup. */
> + struct dfc_entry *current_entry = dfc_entry_get(cache, key->hash);
> + flow = current_entry->flow;
>
> - if (min && random_uint32() <= min) {
> - emc_insert(&pmd->flow_cache, key, flow);
> + if (dfc_entry_alive(current_entry) &&
> + dpcls_rule_matches_key(&flow->cr, key)) {
> +
> + /* Found a match in DFC. Insert into EMC for subsequent lookups.
> + * We use probabilistic insertion here so that mainly elephant
> + * flows enter EMC. */
> + emc_probabilistic_insert(&cache->emc_cache, key, flow);
> + *exact_match = false;
> + return flow;
> + } else {
> +
> + /* No match. Need to go to DPCLS lookup. */
> + return NULL;
> }
> }
>
> -static inline struct dp_netdev_flow *
> -emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key)
> +/* Check and clear dead flow references slowly (one entry at each
> + * invocation). */
> +static void
> +emc_slow_sweep(struct emc_cache *emc)
> {
> - struct emc_entry *current_entry;
> + struct emc_entry *entry = &emc->entries[emc->sweep_idx];
>
> - EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) {
> - if (current_entry->key.hash == key->hash
> - && emc_entry_alive(current_entry)
> -            && netdev_flow_key_equal_mf(&current_entry->key, &key->mf)) {
> + if (!emc_entry_alive(entry)) {
> + emc_clear_entry(entry);
> + }
> + emc->sweep_idx = (emc->sweep_idx + 1) & EMC_MASK;
> +}
>
> - /* We found the entry with the 'key->mf' miniflow */
> - return current_entry->flow;
> - }
> +static void
> +dfc_slow_sweep(struct dfc_cache *cache)
> +{
> + /* Sweep the EMC so that both finish in the same time. */
> + if ((cache->sweep_idx & (DFC_ENTRIES/EMC_ENTRIES - 1)) == 0) {
> + emc_slow_sweep(&cache->emc_cache);
> }
>
> - return NULL;
> + struct dfc_entry *entry = &cache->entries[cache->sweep_idx];
> + if (!dfc_entry_alive(entry)) {
> + dfc_clear_entry(entry);
> + }
> + cache->sweep_idx = (cache->sweep_idx + 1) & DFC_MASK;
> }
>
> static struct dp_netdev_flow *
> @@ -4048,7 +4165,7 @@ pmd_thread_main(void *f_)
> ovs_numa_thread_setaffinity_core(pmd->core_id);
> dpdk_set_lcore_id(pmd->core_id);
> poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> - emc_cache_init(&pmd->flow_cache);
> + dfc_cache_init(&pmd->flow_cache);
> reload:
> pmd_alloc_static_tx_qid(pmd);
>
> @@ -4078,17 +4195,16 @@ reload:
> : PMD_CYCLES_IDLE);
> }
>
> - if (lc++ > 1024) {
> - bool reload;
> + dfc_slow_sweep(&pmd->flow_cache);
>
> + if (lc++ > 1024) {
> lc = 0;
>
> coverage_try_clear();
> dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
> - if (!ovsrcu_try_quiesce()) {
> - emc_cache_slow_sweep(&pmd->flow_cache);
> - }
> + ovsrcu_try_quiesce();
>
> + bool reload;
> atomic_read_relaxed(&pmd->reload, &reload);
> if (reload) {
> break;
> @@ -4110,7 +4226,7 @@ reload:
> goto reload;
> }
>
> - emc_cache_uninit(&pmd->flow_cache);
> + dfc_cache_uninit(&pmd->flow_cache);
> free(poll_list);
> pmd_free_cached_ports(pmd);
> return NULL;
> @@ -4544,7 +4660,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> /* init the 'flow_cache' since there is no
> * actual thread created for NON_PMD_CORE_ID. */
> if (core_id == NON_PMD_CORE_ID) {
> - emc_cache_init(&pmd->flow_cache);
> + dfc_cache_init(&pmd->flow_cache);
> pmd_alloc_static_tx_qid(pmd);
> }
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> @@ -4586,7 +4702,7 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
> * but extra cleanup is necessary */
> if (pmd->core_id == NON_PMD_CORE_ID) {
> ovs_mutex_lock(&dp->non_pmd_mutex);
> - emc_cache_uninit(&pmd->flow_cache);
> + dfc_cache_uninit(&pmd->flow_cache);
> pmd_free_cached_ports(pmd);
> pmd_free_static_tx_qid(pmd);
> ovs_mutex_unlock(&dp->non_pmd_mutex);
> @@ -4896,7 +5012,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
> packet_batch_per_flow_update(batch, pkt, mf);
> }
>
> -/* Try to process all ('cnt') the 'packets' using only the exact match cache
> +/* Try to process all ('cnt') the 'packets' using only the PMD flow cache
> * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
> * miniflow is copied into 'keys' and the packet pointer is moved at the
> * beginning of the 'packets' array.
> @@ -4911,19 +5027,20 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
> * will be ignored.
> */
> static inline size_t
> -emc_processing(struct dp_netdev_pmd_thread *pmd,
> +dfc_processing(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets_,
> struct netdev_flow_key *keys,
> struct packet_batch_per_flow batches[], size_t *n_batches,
> bool md_is_valid, odp_port_t port_no)
> {
> - struct emc_cache *flow_cache = &pmd->flow_cache;
> + struct dfc_cache *flow_cache = &pmd->flow_cache;
> struct netdev_flow_key *key = &keys[0];
> - size_t n_missed = 0, n_dropped = 0;
> + size_t n_missed = 0, n_dfc_hit = 0, n_emc_hit = 0;
> struct dp_packet *packet;
> const size_t cnt = dp_packet_batch_size(packets_);
> uint32_t cur_min;
> int i;
> + bool exact_match;
>
> atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>
> @@ -4932,7 +5049,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>
> if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
> dp_packet_delete(packet);
> - n_dropped++;
> continue;
> }
>
> @@ -4948,7 +5064,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> }
> miniflow_extract(packet, &key->mf);
> key->len = 0; /* Not computed yet. */
> - /* If EMC is disabled skip hash computation and emc_lookup */
> + /* If DFC is disabled skip hash computation and DFC lookup */
> if (cur_min) {
> if (!md_is_valid) {
> key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
> @@ -4956,11 +5072,16 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> } else {
> key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
> }
> - flow = emc_lookup(flow_cache, key);
> + flow = dfc_lookup(flow_cache, key, &exact_match);
> } else {
> flow = NULL;
> }
> if (OVS_LIKELY(flow)) {
> + if (exact_match) {
> + n_emc_hit++;
> + } else {
> + n_dfc_hit++;
> + }
>              dp_netdev_queue_batches(packet, flow, &key->mf, batches,
>                                      n_batches);
> } else {
> @@ -4974,8 +5095,8 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> }
> }
>
> - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> - cnt - n_dropped - n_missed);
> + dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, n_emc_hit);
> + dp_netdev_count_packet(pmd, DP_STAT_DFC_HIT, n_dfc_hit);
>
> return dp_packet_batch_size(packets_);
> }
> @@ -5044,7 +5165,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> add_actions->size);
> }
> ovs_mutex_unlock(&pmd->flow_mutex);
> - emc_probabilistic_insert(pmd, key, netdev_flow);
> + dfc_insert(pmd, key, netdev_flow);
> }
> }
>
> @@ -5136,7 +5257,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>
> flow = dp_netdev_flow_cast(rules[i]);
>
> - emc_probabilistic_insert(pmd, &keys[i], flow);
> + dfc_insert(pmd, &keys[i], flow);
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
>                                  n_batches);
> }
>
> @@ -5169,7 +5290,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> odp_port_t in_port;
>
> n_batches = 0;
> - emc_processing(pmd, packets, keys, batches, &n_batches,
> + dfc_processing(pmd, packets, keys, batches, &n_batches,
> md_is_valid, port_no);
> if (!dp_packet_batch_is_empty(packets)) {
> /* Get ingress port from first packet's metadata. */
> @@ -6063,7 +6184,7 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
>
> /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each 1-bit
> * in 'mask' the values in 'key' and 'target' are the same. */
> -static inline bool
> +static bool
> dpcls_rule_matches_key(const struct dpcls_rule *rule,
> const struct netdev_flow_key *target)
> {
> --
> 1.9.1
>
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev