[ovs-dev] [PATCH v6 2/4] dpif-netdev: Add parameters to configure PMD auto load balance.

Stokes, Ian ian.stokes at intel.com
Thu Jan 14 19:49:31 UTC 2021


> From: Christophe Fontaine <cfontain at redhat.com>
> 
> Two important parts of how PMD auto load balance operates are how
> loaded a core needs to be and how much improvement is estimated
> before a PMD auto load balance can trigger.
> 
> Previously they were hardcoded to 95% loaded and 25% variance
> improvement.
> 
> These default values may not be suitable for all use cases and
> we may want to use a more (or less) aggressive rebalance, either
> on the pmd load threshold or on the minimum variance improvement
> threshold.
> 
> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
> "pmd-auto-lb-improvement" parameters are added to override the
> defaults.
> 
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
> 

Thanks for the patch Kevin,

Have tested over the last few days and it looks good to me. The only note I would have is
in addition to updating the vswitch.xml, I would think it was worth noting the two new parameters
in the vswitch documentation also (possibly giving an example of their usage).

https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#automatic-assignment-of-port-rx-queue-to-pmd-threads-experimental

Right now the documentation just refers to the default 95% threshold and 25% improvement.

So even an additional note to specify how the threshold and improvements can be changed would suffice I would think.

This doesn't have to block merge, but would appreciate a follow up patch to improve the docs soon.

Minor comment for discussion below.

> Signed-off-by: Christophe Fontaine <cfontain at redhat.com>
> Co-Authored-by: Kevin Traynor <ktraynor at redhat.com>
> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> Acked-by: David Marchand <david.marchand at redhat.com>
> ---
>  NEWS                 |  1 +
>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-----
> --
>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
>  3 files changed, 74 insertions(+), 10 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 7e291a180..7994e434d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,4 +17,5 @@ Post-v2.14.0
>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>         that allows configuring the MAC address of a VF representor.
> +     * Add parameters to configure PMD auto load balance behaviour.
>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
>       as the DNS resolver's (unbound) configuration file.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d19d5bbff..bf2112815 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> 
>  /* Auto Load Balancing Defaults */
> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
> -#define ALB_PMD_LOAD_THRESHOLD           95
> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> +#define ALB_IMPROVEMENT_THRESHOLD    25
> +#define ALB_LOAD_THRESHOLD           95
> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
>  #define MIN_TO_MSEC                  60000
> 
> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
>      uint64_t rebalance_intvl;
>      uint64_t rebalance_poll_timer;
> +    uint8_t rebalance_improve_thresh;
> +    atomic_uint8_t rebalance_load_thresh;
>  };
> 
> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>      struct dp_netdev_pmd_thread *pmd;
>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
> +    uint8_t rebalance_load_thresh;
> 
>      bool enable_alb = false;
> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>          pmd_alb->is_enabled = enable_alb;
>          if (pmd_alb->is_enabled) {
> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> +                                &rebalance_load_thresh);
>              VLOG_INFO("PMD auto load balance is enabled "
> -                      "interval %"PRIu64" mins",
> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
> +                      "interval %"PRIu64" mins, "
> +                      "pmd load threshold %"PRIu8"%%, "
> +                      "improvement threshold %"PRIu8"%%",
> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
> +                       rebalance_load_thresh,
> +                       pmd_alb->rebalance_improve_thresh);
> +
>          } else {
>              pmd_alb->rebalance_poll_timer = 0;
> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>      uint64_t rebalance_intvl;
> +    uint8_t rebalance_load, cur_rebalance_load;
> +    uint8_t rebalance_improve;
> 
>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> 
>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-
> interval",
> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
> +                                   ALB_REBALANCE_INTERVAL);
> 
>      /* Input is in min, convert it to msec. */
> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
>      }
> 
> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-
> improvement",
> +                                     ALB_IMPROVEMENT_THRESHOLD);
> +    if (rebalance_improve > 100) {
> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
> +    }
> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
> +                  "%"PRIu8"%%\n", rebalance_improve);
> +    }
> +
> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
> +                             ALB_LOAD_THRESHOLD);
> +    if (rebalance_load > 100) {
> +        rebalance_load = ALB_LOAD_THRESHOLD;
> +    }
> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> &cur_rebalance_load);
> +    if (rebalance_load != cur_rebalance_load) {
> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
> +                             rebalance_load);
> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
> +                "%"PRIu8"%%\n", rebalance_load);
> +    }
>      set_pmd_auto_lb(dp);
>      return 0;
> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>                  ((curr_variance - new_variance) * 100) / curr_variance;
>          }
> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
>              ret = false;
>          }
> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
>          uint64_t curr_tsc;
> +        uint8_t rebalance_load_trigger;
>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
>          if (pmd_alb->is_enabled && !pmd->isolated
> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
>              }
> 
> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> +                                &rebalance_load_trigger);
> +            if (pmd_load >= rebalance_load_trigger) {
>                  atomic_count_inc(&pmd->pmd_overloaded);
>              } else {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 89a876796..76d528ea7 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -654,6 +654,6 @@
>          <p>
>           Configures PMD Auto Load Balancing that allows automatic assignment
> of
> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> -         > 95%).
> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
>          </p>
>          <p>
> @@ -691,4 +691,29 @@
>          </p>
>        </column>
> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the minimum pmd load threshold (% of used cycled) of
> +         any non-isolated pmds when a PMD auto load balance may be
> triggered.
> +        </p>
> +        <p>
> +         The default value is <code>95%</code>.
> +        </p>
> +      </column>
> +      <column name="other_config" key="pmd-auto-lb-improvement"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the minimum evaluated % improvement in load distribution
> +         across the non-isolated pmds that will allow a PMD auto load balance
> +         to occur.
> +        </p>
> +        <p>
> +         Note, setting this parameter to 0 will always allow an auto load
> +         balance to occur regardless of estimated improvement or not.

Is there any reason you would want the 0 option above? Surely it would just become disruptive to the deployment.

I know you could argue an enforced minimum of 1% would not make much of a difference, but at least it would not re-distribute regardless of a gain?
Just curious as to the thinking behind it.


Regards
Ian

> +        </p>
> +        <p>
> +         The default value is <code>25%</code>.
> +        </p>
> +      </column>
>        <column name="other_config" key="userspace-tso-enable"
>                type='{"type": "boolean"}'>
> --
> 2.26.2



More information about the dev mailing list