[ovs-dev] [PATCH v2] dpif-netdev: dpcls per in_port with sorted subtables
Fischetti, Antonio
antonio.fischetti at intel.com
Tue Jul 19 15:22:32 UTC 2016
Hi Jan,
I added some comments prefixed with [Antonio F].
Thanks,
Antonio
> -----Original Message-----
> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Jan
> Scheurich
> Sent: Friday, July 15, 2016 5:35 PM
> To: dev at openvswitch.org
> Subject: [ovs-dev] [PATCH v2] dpif-netdev: dpcls per in_port with
> sorted subtables
>
> This turns the previous RFC PATCH "dpif-netdev: dpcls per in_port with
> sorted subtables" into a non-RFC patch v2.
>
> The user-space datapath (dpif-netdev) consists of a first level "exact
> match cache" (EMC) matching on 5-tuples and the normal megaflow
> classifier. With many parallel packet flows (e.g. TCP connections) the
> EMC becomes inefficient and the OVS forwarding performance is
> determined by the megaflow classifier.
>
> The megaflow classifier (dpcls) consists of a variable number of hash
> tables (aka subtables), each containing megaflow entries with the same
> mask of packet header and metadata fields to match upon. A dpcls
> lookup matches a given packet against all subtables in sequence until
> it hits a match. As megaflow cache entries are by construction
> non-overlapping, the first match is the only match.
>
> Today the order of the subtables in the dpcls is essentially random,
> so that on average a dpcls lookup has to visit N/2 subtables for a
> hit, where N is the total number of subtables. Even though every
> single hash-table lookup is fast, the performance of the current dpcls
> degrades when there are many subtables.
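>
> In outline, the per-key lookup works like this (a simplified sketch;
> lookup_one_subtable() is a hypothetical helper, the real batched code
> is dpcls_lookup() in the diff below):
>
>     struct dpcls_subtable *subtable;
>     struct dpcls_rule *rule = NULL;
>     /* Probe the subtables in vector order. Megaflows never overlap,
>      * so the first hit is the only hit and ends the scan; with a
>      * random order this costs N/2 probes per hit on average. */
>     PVECTOR_FOR_EACH (subtable, &cls->subtables) {
>         rule = lookup_one_subtable(subtable, key);
>         if (rule) {
>             break;
>         }
>     }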
>
> How does the patch address this issue:
>
> In reality there is often a strong correlation between the ingress
> port and a small subset of subtables that have hits. The entire
> megaflow cache typically decomposes nicely into partitions that are
> hit only by packets entering from a range of similar ports (e.g.
> traffic from Phy -> VM vs. traffic from VM -> Phy).
>
> Therefore, maintaining a separate dpcls instance per ingress port with
> its subtable vector sorted by frequency of hits reduces the average
> number of subtable lookups in the dpcls to a minimum, even if the
> total number of subtables gets large. This is possible because
> megaflows always have an exact match on in_port, so every megaflow
> belongs to a unique dpcls instance.
>
> For thread safety, the PMD thread needs to block out revalidators
> during the periodic optimization. We use ovs_mutex_trylock() to avoid
> blocking the PMD.
>
> To monitor the effectiveness of the patch we have enhanced the
> ovs-appctl dpif-netdev/pmd-stats-show command with an extra line "avg.
> subtable lookups per hit" to report the average number of subtable
> lookups needed for a megaflow match. Ideally, this should be close to
> 1 and in almost all cases much smaller than N/2.
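>
> The enhanced output then looks like this (shape taken from the
> adjusted pmd.at expectations below):
>
>     pmd thread numa_id <cleared> core_id <cleared>:
>             emc hits:0
>             megaflow hits:0
>             avg. subtable lookups per hit:0.00
>             miss:0
>             lost:0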
>
> I have benchmarked a cloud L3 overlay pipeline with a VXLAN overlay
> mesh. With pure L3 tenant traffic between VMs on different nodes the
> resulting netdev dpcls contains N=4 subtables.
>
> Disabling the EMC, I have measured a baseline performance (in+out) of
> ~1.32 Mpps (64 bytes, 1000 L4 flows). The average number of subtable
> lookups per dpcls match is 2.5.
>
> With the patch the average number of subtable lookups per dpcls match
> is reduced to 1 and the forwarding performance grows by ~30% to 1.72
> Mpps.
>
> As the actual number of subtables will often be higher in reality, we
> can assume that this is at the lower end of the speed-up one can
> expect from this optimization. Just running a parallel ping between
> the VXLAN tunnel endpoints increases the number of subtables and hence
> the average number of subtable lookups from 2.5 to 3.5, with a
> corresponding decrease of throughput to 1.14 Mpps. With the patch, the
> parallel ping has no impact on the average number of subtable lookups
> or on performance. The performance gain is then ~50%.
>
> The main change compared to the previous patch is that instead of
> having a subtable vector per in_port in a single dpcls instance, we
> now have one dpcls instance per ingress port, each with a single
> subtable vector. This is better aligned with the design of the base
> code and also improves the number of subtable lookups in a miss case.
>
> The PMD tests have been adjusted to the additional line in
> pmd-stats-show.
>
> Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
>
>
> Changes in v2:
> - Rebased to master (commit 3041e1fc9638)
> - Take the pmd->flow_mutex during optimization to block out revalidators.
>   Use trylock in order to not block the PMD thread
> - Made in_port an explicit input parameter to fast_path_processing()
> - Fixed coding style issues
>
>
> ---
>
>  lib/dpif-netdev.c | 195 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
>  tests/pmd.at      |   6 +++--
>  2 files changed, 173 insertions(+), 28 deletions(-)
>
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e0107b7..e7eaab0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -149,7 +149,11 @@ struct emc_cache {
> ^L
> /* Simple non-wildcarding single-priority classifier. */
>
> +#define DPCLS_OPTIMIZATION_INTERVAL 1000
[Antonio F] I'd prefer to add a comment specifying whether this is in
sec, msec, ...
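Assuming the interval is in milliseconds - it is added to time_msec()
in dp_netdev_configure_pmd() below - something like:

/* Time in ms between successive optimizations of the dpcls
 * subtable vector */
#define DPCLS_OPTIMIZATION_INTERVAL 1000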
> +
> struct dpcls {
> + struct cmap_node node;
> + odp_port_t in_port;
> struct cmap subtables_map;
> struct pvector subtables;
> };
> @@ -164,12 +168,14 @@ struct dpcls_rule {
>
> static void dpcls_init(struct dpcls *);
> static void dpcls_destroy(struct dpcls *);
> +static void dpcls_optimize(struct dpcls *);
> static void dpcls_insert(struct dpcls *, struct dpcls_rule *,
> const struct netdev_flow_key *mask);
> static void dpcls_remove(struct dpcls *, struct dpcls_rule *);
> -static bool dpcls_lookup(const struct dpcls *cls,
> +static bool dpcls_lookup(struct dpcls *cls,
> const struct netdev_flow_key keys[],
> - struct dpcls_rule **rules, size_t cnt);
> + struct dpcls_rule **rules, size_t cnt,
> + int *num_lookups_p);
> ^L
> /* Datapath based on the network device interface from netdev.h.
> *
> @@ -239,6 +245,7 @@ enum dp_stat_type {
> DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
> DP_STAT_MISS, /* Packets that did not match. */
> DP_STAT_LOST, /* Packets not passed up to the client. */
> + DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table hits */
[Antonio F] Line longer than 80 chars.
> DP_N_STATS
> };
>
> @@ -419,15 +426,19 @@ struct dp_netdev_pmd_thread {
> * will only be accessed by its own pmd thread. */
> struct emc_cache flow_cache;
>
> - /* Classifier and Flow-Table.
> + /* Flow-Table and classifiers
> *
> * Writers of 'flow_table' must take the 'flow_mutex'. Corresponding
> - * changes to 'cls' must be made while still holding the 'flow_mutex'.
> + * changes to 'classifiesr' must be made while still holding the 'flow_mutex'.
[Antonio F] Line longer than 80 chars.
[Antonio F] Trivial typo: 'classifiers'
> */
> struct ovs_mutex flow_mutex;
> - struct dpcls cls;
> struct cmap flow_table OVS_GUARDED; /* Flow table. */
>
> + /* One classifier per in_port polled by the pmd */
> + struct cmap classifiers;
> + /* Periodically sort subtable vectors according to hit frequencies */
> + long long int next_optimization;
> +
> /* Statistics. */
> struct dp_netdev_pmd_stats stats;
>
> @@ -540,6 +551,8 @@ static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
> static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
> static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
> OVS_REQUIRES(pmd->port_mutex);
> +static inline void
> +dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
>
> static inline bool emc_entry_alive(struct emc_entry *ce);
> static void emc_clear_entry(struct emc_entry *ce);
> @@ -659,8 +672,10 @@ pmd_info_show_stats(struct ds *reply,
>
> ds_put_format(reply,
> "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> + "\tavg. subtable lookups per hit:%.2f\n"
> "\tmiss:%llu\n\tlost:%llu\n",
> stats[DP_STAT_EXACT_HIT],
> stats[DP_STAT_MASKED_HIT],
> + stats[DP_STAT_MASKED_HIT] > 0 ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT] : 0,
[Antonio F] Line longer than 80 chars.
> stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
>
> if (total_cycles == 0) {
> @@ -1503,14 +1518,51 @@ dp_netdev_flow_hash(const ovs_u128 *ufid)
> return ufid->u32[0];
> }
>
> +static inline struct dpcls *
> +dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd,
> + odp_port_t in_port)
> +{
> + struct dpcls *cls;
> + uint32_t hash = hash_port_no(in_port);
> + CMAP_FOR_EACH_WITH_HASH (cls, node, hash, &pmd->classifiers) {
> + if (cls->in_port == in_port) {
> + /* Port classifier exists already */
> + return cls;
> + }
> + }
> + return NULL;
> +}
> +
> +static inline struct dpcls *
> +dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd,
> + odp_port_t in_port)
> +{
> + struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> + uint32_t hash = hash_port_no(in_port);
> +
> + if (!cls) {
> + /* Create new classifier for in_port */
> + cls = xmalloc(sizeof(*cls));
> + dpcls_init(cls);
> + cls->in_port = in_port;
> + cmap_insert(&pmd->classifiers, &cls->node, hash);
> + VLOG_DBG("Creating dpcls %p for in_port %d", cls, in_port);
> + }
> + return cls;
> +}
> +
> static void
> dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> struct dp_netdev_flow *flow)
> OVS_REQUIRES(pmd->flow_mutex)
> {
> struct cmap_node *node = CONST_CAST(struct cmap_node *, &flow->node);
> + struct dpcls *cls;
> + odp_port_t in_port = flow->flow.in_port.odp_port;
>
> - dpcls_remove(&pmd->cls, &flow->cr);
> + cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> + ovs_assert(cls != NULL);
> + dpcls_remove(cls, &flow->cr);
> cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> flow->dead = true;
>
> @@ -1857,15 +1909,20 @@ emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key)
> }
>
> static struct dp_netdev_flow *
> -dp_netdev_pmd_lookup_flow(const struct dp_netdev_pmd_thread *pmd,
> - const struct netdev_flow_key *key)
> +dp_netdev_pmd_lookup_flow(struct dp_netdev_pmd_thread *pmd,
> + const struct netdev_flow_key *key,
> + int *lookup_num_p)
> {
> - struct dp_netdev_flow *netdev_flow;
> + struct dpcls *cls;
> struct dpcls_rule *rule;
> + odp_port_t in_port = MINIFLOW_GET_U32(&key->mf, in_port);
> + struct dp_netdev_flow *netdev_flow = NULL;
>
> - dpcls_lookup(&pmd->cls, key, &rule, 1);
> - netdev_flow = dp_netdev_flow_cast(rule);
> -
> + cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> + if (OVS_LIKELY(cls)) {
> + dpcls_lookup(cls, key, &rule, 1, lookup_num_p);
> + netdev_flow = dp_netdev_flow_cast(rule);
> + }
> return netdev_flow;
> }
>
> @@ -2098,6 +2155,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> {
> struct dp_netdev_flow *flow;
> struct netdev_flow_key mask;
> + struct dpcls *cls;
> + odp_port_t in_port = match->flow.in_port.odp_port;
>
> netdev_flow_mask_init(&mask, match);
> /* Make sure wc does not have metadata. */
> @@ -2116,7 +2175,11 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions, actions_len));
>
> netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask);
> - dpcls_insert(&pmd->cls, &flow->cr, &mask);
> +
> + /* Select dpcls for in_port. Relies on in_port to be exact match */
> + ovs_assert(match->wc.masks.in_port.odp_port == UINT32_MAX);
> + cls = dp_netdev_pmd_find_dpcls(pmd, in_port);
> + dpcls_insert(cls, &flow->cr, &mask);
>
> cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node),
> dp_netdev_flow_hash(&flow->ufid));
> @@ -2197,7 +2260,7 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put)
> }
>
> ovs_mutex_lock(&pmd->flow_mutex);
> - netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &key);
> + netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &key, NULL);
> if (!netdev_flow) {
> if (put->flags & DPIF_FP_CREATE) {
> if (cmap_count(&pmd->flow_table) < MAX_FLOWS) {
> @@ -2872,6 +2935,7 @@ reload:
> lc = 0;
>
> coverage_try_clear();
> + dp_netdev_pmd_try_optimize(pmd);
> if (!ovsrcu_try_quiesce()) {
> emc_cache_slow_sweep(&pmd->flow_cache);
> }
> @@ -3033,8 +3097,9 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> ovs_mutex_init(&pmd->cond_mutex);
> ovs_mutex_init(&pmd->flow_mutex);
> ovs_mutex_init(&pmd->port_mutex);
> - dpcls_init(&pmd->cls);
> cmap_init(&pmd->flow_table);
> + cmap_init(&pmd->classifiers);
> + pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
> ovs_list_init(&pmd->poll_list);
> hmap_init(&pmd->tx_ports);
> hmap_init(&pmd->port_cache);
> @@ -3050,10 +3115,16 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> static void
> dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
> {
> + struct dpcls *cls;
> +
> dp_netdev_pmd_flow_flush(pmd);
> - dpcls_destroy(&pmd->cls);
> hmap_destroy(&pmd->port_cache);
> hmap_destroy(&pmd->tx_ports);
> + /* All flows (including their dpcls_rules) have been deleted already */
> + CMAP_FOR_EACH(cls, node, &pmd->classifiers) {
> + dpcls_destroy(cls);
> + }
> + cmap_destroy(&pmd->classifiers);
> cmap_destroy(&pmd->flow_table);
> ovs_mutex_destroy(&pmd->flow_mutex);
> latch_destroy(&pmd->exit_latch);
> @@ -3798,7 +3869,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet,
> * to be locking everyone out of making flow installs. If we
> * move to a per-core classifier, it would be reasonable. */
> ovs_mutex_lock(&pmd->flow_mutex);
> - netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key);
> + netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> if (OVS_LIKELY(!netdev_flow)) {
> netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
> add_actions->data,
> @@ -3814,7 +3885,8 @@ static inline void
> fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets_,
> struct netdev_flow_key *keys,
> - struct packet_batch_per_flow batches[], size_t *n_batches)
> + struct packet_batch_per_flow batches[], size_t *n_batches,
> + odp_port_t in_port)
> {
> int cnt = packets_->count;
> #if !defined(__CHECKER__) && !defined(_WIN32)
> @@ -3824,10 +3896,12 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
> #endif
> struct dp_packet **packets = packets_->packets;
> + struct dpcls *cls;
> struct dpcls_rule *rules[PKT_ARRAY_SIZE];
> struct dp_netdev *dp = pmd->dp;
> struct emc_cache *flow_cache = &pmd->flow_cache;
> int miss_cnt = 0, lost_cnt = 0;
> + int lookup_cnt = 0, add_lookup_cnt;
> bool any_miss;
> size_t i;
>
> @@ -3835,7 +3909,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> /* Key length is needed in all the cases, hash computed on demand. */
> keys[i].len = netdev_flow_key_size(miniflow_n_values(&keys[i].mf));
> }
> - any_miss = !dpcls_lookup(&pmd->cls, keys, rules, cnt);
> + /* Get the classifier for the in_port */
> + cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> + if (OVS_LIKELY(cls)) {
> + any_miss = !dpcls_lookup(cls, keys, rules, cnt, &lookup_cnt);
> + } else {
> + any_miss = true;
> + memset(rules, 0, sizeof(rules));
> + }
> if (OVS_UNLIKELY(any_miss) && !fat_rwlock_tryrdlock(&dp->upcall_rwlock)) {
> uint64_t actions_stub[512 / 8], slow_stub[512 / 8];
> struct ofpbuf actions, put_actions;
> @@ -3853,8 +3934,9 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> /* It's possible that an earlier slow path execution installed
> * a rule covering this flow. In this case, it's a lot cheaper
> * to catch it here than execute a miss. */
> - netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &keys[i]);
> + netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &keys[i], &add_lookup_cnt);
[Antonio F] Line longer than 80 chars.
> if (netdev_flow) {
> + lookup_cnt += add_lookup_cnt;
> rules[i] = &netdev_flow->cr;
> continue;
> }
> @@ -3893,6 +3975,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> }
>
> dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> + dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
[Antonio F] Should we divide lookup_cnt by the number of processed packets?
I mean:
dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt/cnt);
> dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> }
> @@ -3919,13 +4002,16 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
> long long now = time_msec();
> size_t newcnt, n_batches, i;
> + odp_port_t in_port;
>
> n_batches = 0;
> newcnt = emc_processing(pmd, packets, keys, batches, &n_batches,
> md_is_valid, port_no);
> if (OVS_UNLIKELY(newcnt)) {
> packets->count = newcnt;
> - fast_path_processing(pmd, packets, keys, batches, &n_batches);
> + /* Get ingress port from first packet's metadata. */
> + in_port = packets->packets[0]->md.in_port.odp_port;
> + fast_path_processing(pmd, packets, keys, batches, &n_batches, in_port);
> }
>
> for (i = 0; i < n_batches; i++) {
> @@ -3942,14 +4028,14 @@ dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets,
> odp_port_t port_no)
> {
> - dp_netdev_input__(pmd, packets, false, port_no);
> + dp_netdev_input__(pmd, packets, false, port_no);
> }
>
> static void
> dp_netdev_recirculate(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets)
> {
> - dp_netdev_input__(pmd, packets, true, 0);
> + dp_netdev_input__(pmd, packets, true, 0);
> }
>
> struct dp_netdev_execute_aux {
> @@ -4374,6 +4460,8 @@ struct dpcls_subtable {
>
> /* These fields are accessed by readers. */
> struct cmap rules; /* Contains "struct dpcls_rule"s. */
> + uint32_t hit_cnt; /* Number of match hits in subtable in current optimization interval. */
> struct netdev_flow_key mask; /* Wildcards for fields (const). */
> /* 'mask' must be the last field, additional space is allocated here. */
> };
> @@ -4390,6 +4478,7 @@ dpcls_init(struct dpcls *cls)
> static void
> dpcls_destroy_subtable(struct dpcls *cls, struct dpcls_subtable *subtable)
> {
> + VLOG_DBG("Destroying subtable %p for in_port %d", subtable, cls-
> >in_port);
> pvector_remove(&cls->subtables, subtable);
> cmap_remove(&cls->subtables_map, &subtable->cmap_node,
> subtable->mask.hash);
> @@ -4424,10 +4513,13 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
> subtable = xmalloc(sizeof *subtable
> - sizeof subtable->mask.mf + mask->len);
> cmap_init(&subtable->rules);
> + subtable->hit_cnt = 0;
> netdev_flow_key_clone(&subtable->mask, mask);
> cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash);
> + /* Insert new subtable with priority zero (no hits yet) */
> pvector_insert(&cls->subtables, subtable, 0);
> pvector_publish(&cls->subtables);
> + VLOG_DBG("Creating %lu. subtable %p for in_port %d",
> cmap_count(&cls->subtables_map), subtable, cls->in_port);
[Antonio F] Line too long.
>
> return subtable;
> }
> @@ -4446,6 +4538,43 @@ dpcls_find_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
> return dpcls_create_subtable(cls, mask);
> }
>
> +
> +/* Periodically sort the dpcls subtable vectors according to hit counts */
> +static inline void
> +dpcls_optimize(struct dpcls *cls)
[Antonio F] Why not call this function 'dpcls_sort_subtables'
or 'dpcls_upd_subtbl_priority'?
> + OVS_REQUIRES(pmd->flow_mutex)
> +{
> + struct dpcls_subtable *subtable;
> +
> + struct pvector *pvec = &cls->subtables;
> + PVECTOR_FOR_EACH (subtable, pvec) {
> + pvector_change_priority(pvec, subtable, subtable->hit_cnt);
> + subtable->hit_cnt = 0;
> + }
> + pvector_publish(pvec);
> +}
> +
> +static inline void
> +dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd)
> +{
> + struct dpcls *cls;
> + long long int now = time_msec();
> +
> + if (now > pmd->next_optimization) {
> + /* Try to obtain the flow lock to block out revalidator threads.
> + * If not possible, just try next time. */
> + if (ovs_mutex_trylock(&pmd->flow_mutex)) {
> + /* Optimize each classifier */
> + CMAP_FOR_EACH(cls, node, &pmd->classifiers) {
> + dpcls_optimize(cls);
> + }
> + ovs_mutex_unlock(&pmd->flow_mutex);
> + /* Start new measuring interval */
> + pmd->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL;
> + }
> + }
> +}
> +
> /* Insert 'rule' into 'cls'. */
> static void
> dpcls_insert(struct dpcls *cls, struct dpcls_rule *rule,
> @@ -4453,6 +4582,7 @@ dpcls_insert(struct dpcls *cls, struct dpcls_rule *rule,
> {
> struct dpcls_subtable *subtable = dpcls_find_subtable(cls, mask);
>
> + /* Refer to subtable's mask, also for later removal. */
> rule->mask = &subtable->mask;
> cmap_insert(&subtable->rules, &rule->cmap_node, rule->flow.hash);
> }
> @@ -4465,10 +4595,11 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
>
> ovs_assert(rule->mask);
>
> + /* Get subtable from reference in rule->mask. */
> INIT_CONTAINER(subtable, rule->mask, mask);
> -
> if (cmap_remove(&subtable->rules, &rule->cmap_node, rule->flow.hash)
> == 0) {
> + /* Delete empty subtable. */
> dpcls_destroy_subtable(cls, subtable);
> pvector_publish(&cls->subtables);
> }
> @@ -4503,8 +4634,9 @@ dpcls_rule_matches_key(const struct dpcls_rule *rule,
> *
> * Returns true if all flows found a corresponding rule. */
> static bool
> -dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
> - struct dpcls_rule **rules, const size_t cnt)
> +dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
> + struct dpcls_rule **rules, const size_t cnt,
> + int *num_lookups_p)
> {
> /* The batch size 16 was experimentally found faster than 8 or 32. */
> typedef uint16_t map_type;
> @@ -4524,6 +4656,8 @@ dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
> }
> memset(rules, 0, cnt * sizeof *rules);
>
> + int lookups_match = 0, subtable_pos = 1;
> +
> PVECTOR_FOR_EACH (subtable, &cls->subtables) {
> const struct netdev_flow_key *mkeys = keys;
> struct dpcls_rule **mrules = rules;
> @@ -4556,6 +4690,8 @@ dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
> CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
> if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) {
> mrules[i] = rule;
> + subtable->hit_cnt++;
[Antonio F] I think we can be confident enough that hit_cnt does not
overflow. It's a uint32_t var and the ranking refresh happens every
1000 msec. Even if a port is hitting the same subtable at line rate -
say 14 Mpps - that's still well below 2^32-1.
So we don't need to check if hit_cnt overflows, or am I missing something?
If that's the case, maybe a comment here would help?
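For example, something along these lines at the increment:

    /* No overflow check needed: even at 64-byte line rate on 10G
     * (~14.88 Mpps) hitting a single subtable, one 1000 msec interval
     * accumulates ~1.5e7 hits, well below UINT32_MAX (~4.3e9). */
    subtable->hit_cnt++;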
> + lookups_match += subtable_pos;
> goto next;
> }
> }
> @@ -4567,8 +4703,15 @@ dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
> remains |= maps[m];
> }
> if (!remains) {
> + if (num_lookups_p) {
> + *num_lookups_p = lookups_match;
> + }
> return true; /* All found. */
> }
> + subtable_pos++;
> + }
> + if (num_lookups_p) {
> + *num_lookups_p = lookups_match;
> }
> return false; /* Some misses. */
> }
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 3216762..c033047 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -162,10 +162,11 @@ dummy@ovs-dummy: hit:0 missed:0
> p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
> ])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> pmd thread numa_id <cleared> core_id <cleared>:
> emc hits:0
> megaflow hits:0
> + avg. subtable lookups per hit:0.00
> miss:0
> lost:0
> ])
> @@ -188,10 +189,11 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
>
> recirc_id(0),in_port(1),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
> ])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> pmd thread numa_id <cleared> core_id <cleared>:
> emc hits:19
> megaflow hits:0
> + avg. subtable lookups per hit:0.00
> miss:1
> lost:0
> ])
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev