[ovs-dev] [PATCH v6 2/4] dpif-netdev: Add parameters to configure PMD auto load balance.
Stokes, Ian
ian.stokes at intel.com
Thu Jan 14 19:49:31 UTC 2021
> From: Christophe Fontaine <cfontain at redhat.com>
>
> Two important parts of how PMD auto load balance operates are how
> loaded a core needs to be and how much improvement is estimated
> before a PMD auto load balance can trigger.
>
> Previously they were hardcoded to 95% loaded and 25% variance
> improvement.
>
> These default values may not be suitable for all use cases and
> we may want to use a more (or less) aggressive rebalance, either
> on the pmd load threshold or on the minimum variance improvement
> threshold.
>
> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
> "pmd-auto-lb-improvement" parameters are added to override the
> defaults.
>
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
>
Thanks for the patch Kevin,
Have tested over the last few days and it looks good to me. The only note I would have is
in addition to updating the vswitch.xml, I would think it was worth noting the two new parameters
in the vswitch documentation also (possibly giving an example of their usage).
https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#automatic-assignment-of-port-rx-queue-to-pmd-threads-experimental
Right now the documentation just refers to the default 95% threshold and 25% improvement.
So even an additional note to specify how the threshold and improvements can be changed would suffice I would think.
This doesn't have to block merge, but would appreciate a follow up patch to improve the docs soon.
Minor comment for discussion below.
> Signed-off-by: Christophe Fontaine <cfontain at redhat.com>
> Co-Authored-by: Kevin Traynor <ktraynor at redhat.com>
> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> Acked-by: David Marchand <david.marchand at redhat.com>
> ---
> NEWS | 1 +
> lib/dpif-netdev.c | 54 +++++++++++++++++++++++++++++++++++++-----
> --
> vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
> 3 files changed, 74 insertions(+), 10 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 7e291a180..7994e434d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,4 +17,5 @@ Post-v2.14.0
> * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
> that allows configuring the MAC address of a VF representor.
> + * Add parameters to configure PMD auto load balance behaviour.
> - The environment variable OVS_UNBOUND_CONF, if set, is now used
> as the DNS resolver's (unbound) configuration file.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d19d5bbff..bf2112815 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>
> /* Auto Load Balancing Defaults */
> -#define ALB_ACCEPTABLE_IMPROVEMENT 25
> -#define ALB_PMD_LOAD_THRESHOLD 95
> -#define ALB_PMD_REBALANCE_POLL_INTERVAL 1 /* 1 Min */
> +#define ALB_IMPROVEMENT_THRESHOLD 25
> +#define ALB_LOAD_THRESHOLD 95
> +#define ALB_REBALANCE_INTERVAL 1 /* 1 Min */
> #define MIN_TO_MSEC 60000
>
> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
> uint64_t rebalance_intvl;
> uint64_t rebalance_poll_timer;
> + uint8_t rebalance_improve_thresh;
> + atomic_uint8_t rebalance_load_thresh;
> };
>
> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
> struct dp_netdev_pmd_thread *pmd;
> struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
> + uint8_t rebalance_load_thresh;
>
> bool enable_alb = false;
> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
> pmd_alb->is_enabled = enable_alb;
> if (pmd_alb->is_enabled) {
> + atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> + &rebalance_load_thresh);
> VLOG_INFO("PMD auto load balance is enabled "
> - "interval %"PRIu64" mins",
> - pmd_alb->rebalance_intvl / MIN_TO_MSEC);
> + "interval %"PRIu64" mins, "
> + "pmd load threshold %"PRIu8"%%, "
> + "improvement threshold %"PRIu8"%%",
> + pmd_alb->rebalance_intvl / MIN_TO_MSEC,
> + rebalance_load_thresh,
> + pmd_alb->rebalance_improve_thresh);
> +
> } else {
> pmd_alb->rebalance_poll_timer = 0;
> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> uint32_t tx_flush_interval, cur_tx_flush_interval;
> uint64_t rebalance_intvl;
> + uint8_t rebalance_load, cur_rebalance_load;
> + uint8_t rebalance_improve;
>
> tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
>
> rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-
> interval",
> - ALB_PMD_REBALANCE_POLL_INTERVAL);
> + ALB_REBALANCE_INTERVAL);
>
> /* Input is in min, convert it to msec. */
> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> }
>
> + rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-
> improvement",
> + ALB_IMPROVEMENT_THRESHOLD);
> + if (rebalance_improve > 100) {
> + rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
> + }
> + if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
> + pmd_alb->rebalance_improve_thresh = rebalance_improve;
> + VLOG_INFO("PMD auto load balance improvement threshold set to "
> + "%"PRIu8"%%\n", rebalance_improve);
> + }
> +
> + rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
> + ALB_LOAD_THRESHOLD);
> + if (rebalance_load > 100) {
> + rebalance_load = ALB_LOAD_THRESHOLD;
> + }
> + atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> &cur_rebalance_load);
> + if (rebalance_load != cur_rebalance_load) {
> + atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
> + rebalance_load);
> + VLOG_INFO("PMD auto load balance pmd load threshold set to "
> + "%"PRIu8"%%\n", rebalance_load);
> + }
> set_pmd_auto_lb(dp);
> return 0;
> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
> ((curr_variance - new_variance) * 100) / curr_variance;
> }
> - if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> + if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
> ret = false;
> }
> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
> if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
> uint64_t curr_tsc;
> + uint8_t rebalance_load_trigger;
> struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
> if (pmd_alb->is_enabled && !pmd->isolated
> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
> }
>
> - if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> + atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> + &rebalance_load_trigger);
> + if (pmd_load >= rebalance_load_trigger) {
> atomic_count_inc(&pmd->pmd_overloaded);
> } else {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 89a876796..76d528ea7 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -654,6 +654,6 @@
> <p>
> Configures PMD Auto Load Balancing that allows automatic assignment
> of
> - RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> - > 95%).
> + RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
> + cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
> </p>
> <p>
> @@ -691,4 +691,29 @@
> </p>
> </column>
> + <column name="other_config" key="pmd-auto-lb-pmd-load"
> + type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> + <p>
> + Specifies the minimum pmd load threshold (% of used cycled) of
> + any non-isolated pmds when a PMD auto load balance may be
> triggered.
> + </p>
> + <p>
> + The default value is <code>95%</code>.
> + </p>
> + </column>
> + <column name="other_config" key="pmd-auto-lb-improvement"
> + type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> + <p>
> + Specifies the minimum evaluated % improvement in load distribution
> + across the non-isolated pmds that will allow a PMD auto load balance
> + to occur.
> + </p>
> + <p>
> + Note, setting this parameter to 0 will always allow an auto load
> + balance to occur regardless of estimated improvement or not.
Is there any reason you would want the 0 option above? Surely it would just become disruptive to the deployment.
I know you could argue an enforced minimum of 1% would not make much of a difference, but at least it would not re-distribute regardless of a gain?
Just curious as to the thinking behind it.
Regards
Ian
> + </p>
> + <p>
> + The default value is <code>25%</code>.
> + </p>
> + </column>
> <column name="other_config" key="userspace-tso-enable"
> type='{"type": "boolean"}'>
> --
> 2.26.2
More information about the dev
mailing list