[ovs-dev] [PATCH v8] dpif-netdev: Conditional EMC insert

Kevin Traynor ktraynor at redhat.com
Fri Feb 10 15:57:56 UTC 2017


On 02/10/2017 10:57 AM, Ciara Loftus wrote:
> Unconditional insertion of EMC entries results in EMC thrashing at high
> numbers of parallel flows. When this occurs, the performance of the EMC
> often falls below that of the dpcls classifier, rendering the EMC
> practically useless.
> 
> Instead of unconditionally inserting entries into the EMC when a miss
> occurs, use a 1% probability of insertion. This ensures that the most
> frequent flows have the highest chance of creating an entry in the EMC,
> and the probability of thrashing the EMC is also greatly reduced.
> 
> The probability of insertion is configurable, via the
> other_config:emc-insert-prob option. For example the following command
> increases the insertion probability to 10%.
> 
> ovs-vsctl set Open_vSwitch . other_config:emc-insert-prob=10
> 
> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> Signed-off-by: Georg Schmuecking <georg.schmuecking at ericsson.com>
> Co-authored-by: Georg Schmuecking <georg.schmuecking at ericsson.com>
> ---
> v8:
> - Floating point precision percentiles
> - Moved NEWS entry to post-2.7 section and is no longer in the DPDK
>   specific section.
> v7:
> - Remove further code duplication
> v6:
> - Refactor the code to remove duplication around calls to
>   emc_probabilistic_insert()
> v5:
> - Use percentiles for emc-insert-prob (0-100%)
> - Update docs to reflect the option not exclusive to the DPDK datapath.
> v4:
> - Added Georg to Authors file
> - Set emc-insert-prob=1 for 'PMD - stats' unit test
> - Use read_relaxed on atomic var
> - Correctly allow for 0 and 100% probababilites
> - Cache align new element in dp_netdev struct
> - Revert to default probability if invalid emc-insert-prob set
> - Allow configurability for non-DPDK case
> v3:
> - Use new dpif other_config infrastructure to tidy up how the
>   emc-insert-prob value is passed to dpif-netdev.
> v2:
> - Enable probability configurability via other_config:emc-insert-prob
>   option.
> 
>  AUTHORS.rst                  |  1 +
>  Documentation/howto/dpdk.rst | 19 +++++++++++++++++
>  NEWS                         |  2 ++
>  lib/dpif-netdev.c            | 50 ++++++++++++++++++++++++++++++++++++++++----
>  lib/smap.c                   | 10 +++++++++
>  lib/smap.h                   |  1 +
>  tests/pmd.at                 |  1 +
>  vswitchd/vswitch.xml         | 13 ++++++++++++
>  8 files changed, 93 insertions(+), 4 deletions(-)
> 
> diff --git a/AUTHORS.rst b/AUTHORS.rst
> index b567fcc..3382ad8 100644
> --- a/AUTHORS.rst
> +++ b/AUTHORS.rst
> @@ -382,6 +382,7 @@ Eric Lopez                      elopez at nicira.com
>  Frido Roose                     fr.roose at gmail.com
>  Gaetano Catalli                 gaetano.catalli at gmail.com
>  Gavin Remaley                   gavin_remaley at selinc.com
> +Georg Schmuecking               georg.schmuecking at ericsson.com
>  George Shuklin                  amarao at desunote.ru
>  Gerald Rogers                   gerald.rogers at intel.com
>  Ghanem Bahri                    bahri.ghanem at gmail.com
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d1e6e89..9e0dbeb 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -354,6 +354,25 @@ the `DPDK documentation
>  
>  Note: Not all DPDK virtual PMD drivers have been tested and verified to work.
>  
> +EMC Insertion Probability
> +-------------------------
> +By default 1 in every 100 flows are inserted into the Exact Match Cache (EMC).
> +It is possible to change this insertion probability by setting the
> +``emc-insert-prob`` option::
> +
> +    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-prob=N
> +
> +where:
> +
> +``N``
> +  is a positive floating point number between 0 and 100 representing the
> +  percentage probability of EMC insertion.
> +
> +If ``N`` is set to 100, an insertion will be performed for every flow. If set
> +to 0, no insertions will be performed and the EMC will effectively be disabled.
> +
> +For more information on the EMC refer to :doc:`/intro/install/dpdk` .
> +
>  .. _dpdk-ovs-in-guest:
>  
>  OVS with DPDK Inside VMs
> diff --git a/NEWS b/NEWS
> index aebd99c..5c9e628 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -3,6 +3,8 @@ Post-v2.7.0
>     - Tunnels:
>       * Added support to set packet mark for tunnel endpoint using
>         `egress_pkt_mark` OVSDB option.
> +   - EMC insertion probability is reduced to 1% and is configurable via
> +     the new 'other_config:emc-insert-prob' option.
>  
>  v2.7.0 - xx xxx xxxx
>  ---------------------
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 0be5db5..c8cb9ee 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -144,6 +144,11 @@ struct netdev_flow_key {
>  #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
>  #define EM_FLOW_HASH_SEGS 2
>  
> +/* Percentage probability of calling emc_insert */
> +#define DEFAULT_EM_FLOW_INSERT_PROB 1
> +#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                        \
> +                                    (100 / DEFAULT_EM_FLOW_INSERT_PROB))
> +
>  struct emc_entry {
>      struct dp_netdev_flow *flow;
>      struct netdev_flow_key key;   /* key.hash used for emc hash value. */
> @@ -254,6 +259,9 @@ struct dp_netdev {
>      uint64_t last_tnl_conf_seq;
>  
>      struct conntrack conntrack;
> +
> +    /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/
> +    OVS_ALIGNED_VAR(CACHE_LINE_SIZE)atomic_uint32_t emc_insert_min;
>  };
>  
>  static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
> @@ -1066,6 +1074,8 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
>  
>      conntrack_init(&dp->conntrack);
>  
> +    atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
> +
>      cmap_init(&dp->poll_threads);
>      ovs_mutex_init_recursive(&dp->non_pmd_mutex);
>      ovsthread_key_create(&dp->per_pmd_key, NULL);
> @@ -1943,6 +1953,27 @@ emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
>      emc_change_entry(to_be_replaced, flow, key);
>  }
>  
> +static inline void
> +emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
> +                         const struct netdev_flow_key *key,
> +                         struct dp_netdev_flow *flow)
> +{
> +    /* Insert an entry into the EMC based on probability value 'min'. By
> +     * default the value is UINT32_MAX / 100/1 which yields an insertion
> +     * probability of 1/100. */
> +
> +    uint32_t min;
> +    atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
> +
> +#ifdef DPDK_NETDEV
> +    if (min && (key->hash ^ (uint32_t)pmd->last_cycles) <= min) {
> +#else
> +    if (min && (key->hash ^ random_uint32()) <= min) {
> +#endif
> +        emc_insert(&pmd->flow_cache, key, flow);
> +    }
> +}
> +
>  static inline struct dp_netdev_flow *
>  emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key)
>  {
> @@ -2731,6 +2762,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>  {
>      struct dp_netdev *dp = get_dp_netdev(dpif);
>      const char *cmask = smap_get(other_config, "pmd-cpu-mask");
> +    float insert_prob = smap_get_float(other_config, "emc-insert-prob", -1);
>  
>      if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
>          free(dp->pmd_cmask);
> @@ -2738,6 +2770,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          dp_netdev_request_reconfigure(dp);
>      }
>  
> +    if (insert_prob >= 0 && insert_prob <= 100) {
> +        uint32_t insert_min = insert_prob == 0 ? 0 :
> +                                  UINT32_MAX / (uint32_t)(100 / insert_prob);

The calculation here is not correct - every emc-insert-prob from 51-100
will give the same insert_min, 50-34 etc.

Aside from this, I don't think it's user friendly to need floating point
numbers. If someone wants to put an entry in the emc (on avg.) every
2000th packet, then let them set 2000 in the db, not 0.05. We'd just
need to document it 1/N as I think it was in the original patches and
maybe update some of the wording around 'probability' etc.

> +        uint32_t cur_min;
> +        atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
> +        if (insert_min != cur_min) {
> +            atomic_store_relaxed(&dp->emc_insert_min, insert_min);
> +        }
> +    } else {
> +        atomic_store_relaxed(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
> +    }
> +
>      return 0;
>  }
>  
> @@ -4193,8 +4237,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet,
>                                               add_actions->size);
>          }
>          ovs_mutex_unlock(&pmd->flow_mutex);
> -
> -        emc_insert(&pmd->flow_cache, key, netdev_flow);
> +        emc_probabilistic_insert(pmd, key, netdev_flow);
>      }
>  }
>  
> @@ -4217,7 +4260,6 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      struct dpcls *cls;
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
> -    struct emc_cache *flow_cache = &pmd->flow_cache;
>      int miss_cnt = 0, lost_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
> @@ -4288,7 +4330,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>  
>          flow = dp_netdev_flow_cast(rules[i]);
>  
> -        emc_insert(flow_cache, &keys[i], flow);
> +        emc_probabilistic_insert(pmd, &keys[i], flow);
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
>  
> diff --git a/lib/smap.c b/lib/smap.c
> index 0a5e54a..406a1eb 100644
> --- a/lib/smap.c
> +++ b/lib/smap.c
> @@ -231,6 +231,16 @@ smap_get_int(const struct smap *smap, const char *key, int def)
>      return value ? atoi(value) : def;
>  }
>  
> +/* Gets the value associated with 'key' in 'smap' and converts it to a float
> + * using atof().  If 'key' is not in 'smap', returns 'def'. */
> +float
> +smap_get_float(const struct smap *smap, const char *key, float def)
> +{
> +    const char *value = smap_get(smap, key);
> +
> +    return value ? atof(value) : def;
> +}
> +
>  /* Gets the value associated with 'key' in 'smap' and converts it to an int
>   * using strtoull().  If 'key' is not in 'smap', returns 'def'. */
>  unsigned long long int
> diff --git a/lib/smap.h b/lib/smap.h
> index edf591c..77afa7e 100644
> --- a/lib/smap.h
> +++ b/lib/smap.h
> @@ -98,6 +98,7 @@ const char *smap_get_def(const struct smap *, const char *key,
>  struct smap_node *smap_get_node(const struct smap *, const char *);
>  bool smap_get_bool(const struct smap *smap, const char *key, bool def);
>  int smap_get_int(const struct smap *smap, const char *key, int def);
> +float smap_get_float(const struct smap *smap, const char *key, float def);
>  unsigned long long int smap_get_ullong(const struct smap *, const char *key,
>                                         unsigned long long def);
>  bool smap_get_uuid(const struct smap *, const char *key, struct uuid *);
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 35dbcfe..b3132ff 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -158,6 +158,7 @@ CHECK_PMD_THREADS_CREATED()
>  
>  AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
>  AT_CHECK([ovs-ofctl add-flow br0 action=normal])
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:emc-insert-prob=100])
>  
>  sleep 1
>  
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 146a816..0865a6d 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -335,6 +335,19 @@
>            datapaths.
>          </p>
>        </column>
> +
> +      <column name="other_config" key="emc-insert-prob"
> +              type='{"type": "real", "minReal": 0, "maxReal": 100}'>
> +        <p>
> +          Specifies the percentage probability (0-100%) of a flow being
> +          inserted into the Exact Match Cache (EMC). A value of 100 will result
> +          in an insertion for every flow whereas a value of zero will result in
> +          no insertions and essentially disable the EMC.
> +        </p>
> +        <p>
> +          Defaults to 1 ie. there is 1% chance of EMC insertion.
> +        </p>
> +      </column>
>      </group>
>  
>      <group title="Status">
> 



More information about the dev mailing list