[ovs-dev] [PATCH v3] Upcall/Slowpath rate limiter for OVS

Jan Scheurich jan.scheurich at ericsson.com
Tue May 8 23:56:30 UTC 2018


Hi Manu,

Thanks for working on this. Two general comments:

1. Is there a chance to add unit test cases for this feature? I know it might be difficult due to its real-time character, but perhaps it is feasible with very low parameter values?

2. I believe the number of RL-dropped packets must be accounted for in function dpif_netdev_get_stats() as stats->n_missed, otherwise the overall number of reported packets may not match the total number of packets processed.
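
A minimal sketch of what I have in mind (untested, and assuming the counter is renamed to PMD_STAT_RL_DROP as suggested further down):

    /* In dpif_netdev_get_stats(): count rate-limiter drops towards the
     * reported misses so that hit + missed + lost still adds up to the
     * number of packets received. */
    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
        stats->n_flows += cmap_count(&pmd->flow_table);
        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
        stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT];
        stats->n_missed += pmd_stats[PMD_STAT_MISS];
        stats->n_missed += pmd_stats[PMD_STAT_RL_DROP];  /* new */
        stats->n_lost += pmd_stats[PMD_STAT_LOST];
    }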

Other comments in-line.

Regards, Jan

> From: Manohar Krishnappa Chidambaraswamy
> Sent: Monday, 07 May, 2018 12:45
> 
> Hi
> 
> Rebased to master and adapted to the new dpif-netdev-perf counters.
> As explained in v2 thread, OFPM_SLOWPATH meters cannot be used as is
> for rate-limiting upcalls, hence reverted back to the simpler method
> using token bucket.

I guess the question was not whether to use meter actions in the datapath to implement the upcall rate limiter in dpif-netdev, but whether to allow configuring the upcall rate limiter through an OpenFlow Meter Mod command for the special pre-defined meter OFPM_SLOWPATH. In any case, the decision was to drop that idea.

> 
> Could you please review this patch?
> 
> Thanx
> Manu
> 
> v2: https://patchwork.ozlabs.org/patch/860687/
> v1: https://patchwork.ozlabs.org/patch/836737/

Please add the list of main changes between the current and the previous version in the next revision of the patch. Put it below the '---' separator so that it is not part of the commit message.

> 
> Signed-off-by: Manohar K C
> <manohar.krishnappa.chidambaraswamy at ericsson.com>
> CC: Jan Scheurich <jan.scheurich at ericsson.com>
> ---
>  Documentation/howto/dpdk.rst | 21 +++++++++++
>  lib/dpif-netdev-perf.h       |  1 +
>  lib/dpif-netdev.c            | 83 ++++++++++++++++++++++++++++++++++++++++----
>  vswitchd/vswitch.xml         | 47 +++++++++++++++++++++++++
>  4 files changed, 146 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 79b626c..bd1eaac 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -739,3 +739,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
>     Check traffic on multiple queues::
> 
>         $ cat /proc/interrupts | grep virtio
> +
> +Upcall rate limiting
> +--------------------
> +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
> +There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` which
> +take effect when global enable knob ``upcall-rl`` is set to true.

Please explain why upcall rate limiting may be relevant in the context of the DPDK datapath (upcalls are executed in the context of the PMD threads and affect the datapath forwarding capacity). It is perhaps worth noting here that the rate limiting is applied independently per PMD and is not a global limit.

Replace "knob" by "configuration parameter"  and put the command to enable rate limiting:
 $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
before the commands to tune the token bucket parameters. Mention the default parameter values?
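
Just as a suggestion, the introduction of the section could read along these lines (wording is of course up to you):

  Upcalls in the netdev datapath are executed in the context of the polling
  thread (PMD or non-PMD), so every upcall consumes part of that thread's
  cycle budget. A burst of upcalls can therefore reduce the forwarding
  capacity of the PMD and cause packet drops even for established flows.
  Upcall rate limiting bounds this impact by policing the number of upcalls
  admitted per polling thread. The limit applies independently per PMD
  thread; it is not a global limit.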

> +
> +Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
> +example::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
> +
> +Upcall burst should be set using ``upcall-burst`` in packets-per-sec. For
> +example::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
> +
> +Upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
> +example::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true

> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 5993c25..189213c 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -64,6 +64,7 @@ enum pmd_stat_type {
>                               * recirculation. */
>      PMD_STAT_SENT_PKTS,     /* Packets that have been sent. */
>      PMD_STAT_SENT_BATCHES,  /* Number of batches sent. */
> +    PMD_STAT_RATELIMIT_DROP,/* Packets dropped due to upcall policer. */

Name this PMD_STAT_RL_DROP and move it up in the list after PMD_STAT_LOST. Add a space before the comment.
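
I.e. something like this (the comment wording is only indicative):

    PMD_STAT_MISS,          /* Packets that did not match (successful upcall). */
    PMD_STAT_LOST,          /* Packets that did not match (failed upcall). */
    PMD_STAT_RL_DROP,       /* Packets dropped by the upcall rate limiter. */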

>      PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
>      PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
>      PMD_N_STATS
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index be31fd0..eebab89 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -101,6 +101,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
> 
>  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> 
> +/* Upcall rate-limit parameters */
> +static bool upcall_ratelimit;
> +static unsigned int upcall_rate;
> +static unsigned int upcall_burst;

I suggest storing these datapath config parameters in struct dp_netdev, analogous to emc_insert_min and tx_flush_interval. Use "atomic" reads when accessing the parameter values in dp_netdev.
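
Roughly like this (untested sketch, names are only suggestions):

    /* In struct dp_netdev, next to emc_insert_min and tx_flush_interval: */
    atomic_bool upcall_ratelimit;    /* Upcall rate limiting enabled? */
    atomic_uint32_t upcall_rate;     /* Sustained rate in upcalls per second. */
    atomic_uint32_t upcall_burst;    /* Maximum burst in upcalls. */

    /* When (re-)initializing a PMD's token bucket, read the parameters
     * atomically: */
    uint32_t rate, burst;
    atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
    atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
    token_bucket_init(&pmd->upcall_tb, rate, burst);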

> +
> +/* These default values may be tuned based on upcall performance */
> +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
> +#define UPCALL_RATE_DEFAULT      1000  /* pps */
> +#define UPCALL_BURST_DEFAULT     1000  /* pps */

The rate is in "upcalls per second". The burst size is not in PPS but in "upcalls".

Have you tested these default parameters? What is the performance degradation if a PMD handles 1000 upcalls per second? I have seen typical upcall durations of 75K cycles, so that would be 75 million cycles, or 30 ms per second at a 2.5 GHz clock, spent processing upcalls (not including any degradation due to the increased risk of lock contention in the slow path if multiple PMDs have similar upcall rates). So the minimum impact should be 3%, probably more.

With a burst size of 1000 upcalls we have already seen packet drops in earlier tests when running at forwarding rates close to the limit.

Without having tested this specifically, I'd consider a rate of 500 upcalls per second and a burst size of 500 upcalls slightly more appropriate as default values.

> +
>  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
>                                       | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
>                                       | CS_SRC_NAT | CS_DST_NAT)
> @@ -615,6 +625,9 @@ struct dp_netdev_pmd_thread {
>      /* Keep track of detailed PMD performance statistics. */
>      struct pmd_perf_stats perf_stats;
> 
> +    /* Policer to rate limit slow-path */
> +    struct token_bucket upcall_tb;
> +
>      /* Set to true if the pmd thread needs to be reloaded. */
>      bool need_reload;
>  };
> @@ -856,12 +869,13 @@ pmd_info_show_stats(struct ds *reply,
>                    "\tavg. subtable lookups per megaflow hit: %.02f\n"
>                    "\tmiss with success upcall: %"PRIu64"\n"
>                    "\tmiss with failed upcall: %"PRIu64"\n"
> +                  "\tmiss with ratelimit dropped upcall: %"PRIu64"\n"

"miss with upcall dropped by rate limiter" or "miss with rate-limited upcall"?

>                    "\tavg. packets per output batch: %.02f\n",
>                    total_packets, stats[PMD_STAT_RECIRC],
>                    passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
>                    stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
>                    stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
> -                  packets_per_batch);
> +                  stats[PMD_STAT_RATELIMIT_DROP], packets_per_batch);
> 
>      if (total_cycles == 0) {
>          return;
> @@ -2987,6 +3001,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>                          DEFAULT_EM_FLOW_INSERT_INV_PROB);
>      uint32_t insert_min, cur_min;
>      uint32_t tx_flush_interval, cur_tx_flush_interval;
> +    unsigned int rate, burst;
> +    bool ratelimit;
> 
>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
>                                       DEFAULT_TX_FLUSH_INTERVAL);
> @@ -3021,6 +3037,32 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          }
>      }
> 
> +    /* Handle upcall policer params */
> +    ratelimit = smap_get_bool(other_config, "upcall-rl",
> +                              UPCALL_RATELIMIT_DEFAULT);
> +    rate = smap_get_int(other_config, "upcall-rate",
> +                        UPCALL_RATE_DEFAULT);
> +    burst = smap_get_int(other_config, "upcall-burst",
> +                         UPCALL_BURST_DEFAULT);
> +
> +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
> +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
> +                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
> +                  rate, burst);

Better: "rate limiter parameters"? For all other config parameters only the new value is logged here.

> +
> +        upcall_rate = rate;
> +        upcall_burst = burst;
> +
> +        dp_netdev_request_reconfigure(dp);

Reconfiguring the datapath is a heavy operation with potential traffic impact, as all PMDs must be stopped. The other parameters emc_insert_min and tx_flush_interval are updated without stopping the PMDs. But I guess the token bucket per PMD must be re-initialized atomically while the PMD is not polling.

Furthermore, I think it is not sufficient to signal dp_netdev_request_reconfigure(). In reconfigure_datapath() you would also have to ensure that all PMDs are actually reloaded at least once when any of the token bucket parameters are changed. It would be sufficient to simply mark all PMDs with pmd->need_reload at the beginning of the function when needed. The first reload_affected_pmds() would then bring the RL config change into effect.
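
Sketch (untested) of what I mean, assuming some flag that records whether the rate limiter parameters have changed:

    /* At the start of reconfigure_datapath(): force a reload of all PMD
     * threads so that they re-initialize their token bucket with the new
     * parameters. */
    if (upcall_params_changed) {                 /* hypothetical flag */
        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
            if (pmd->core_id != NON_PMD_CORE_ID) {
                pmd->need_reload = true;
            }
        }
    }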

The non-PMD thread requires special handling. But since reconfigure_datapath() is executed in the main thread, there is no need for a reload there.

> +    }
> +
> +    if (ratelimit != upcall_ratelimit) {
> +        upcall_ratelimit = ratelimit;
> +
> +        VLOG_INFO("Upcall ratelimit changed to %s\n",
> +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
> +    }

Better "Upcall rate limiter enabled|disabled"? I think it is worth to also log the current parameter values.

> +
>      return 0;
>  }
> 
> @@ -3932,6 +3974,12 @@ dpif_netdev_run(struct dpif *dpif)
>      ovs_mutex_lock(&dp->port_mutex);
>      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
>      if (non_pmd) {
> +        /* Reconfig the upcall policer if params have changed */
> +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
> +            (upcall_burst != non_pmd->upcall_tb.burst)) {
> +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
> +        }
> +

The update of the non-PMD token bucket parameters should also be done as part of reconfigure_datapath(), not here in the dpif_netdev_run() forwarding code of the main thread.
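
I.e. something along these lines in reconfigure_datapath() (sketch only):

    /* Refresh the non-PMD thread's token bucket from the main thread; no
     * reload is required for it. */
    struct dp_netdev_pmd_thread *non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
    if (non_pmd) {
        token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
        dp_netdev_pmd_unref(non_pmd);
    }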

>          ovs_mutex_lock(&dp->non_pmd_mutex);
>          HMAP_FOR_EACH (port, node, &dp->ports) {
>              if (!netdev_is_pmd(port->netdev)) {
> @@ -4135,6 +4183,9 @@ reload:
>          lc = UINT_MAX;
>      }
> 
> +    /* Initialize upcall policer token bucket with configured params */
> +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> +
>      pmd->intrvl_tsc_prev = 0;
>      atomic_store_relaxed(&pmd->intrvl_cycles, 0);
>      cycles_counter_update(s);
> @@ -4627,6 +4678,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>          emc_cache_init(&pmd->flow_cache);
>          pmd_alloc_static_tx_qid(pmd);
>      }
> +
> +    /* Initialize upcall policer token bucket with configured params */
> +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> +
>      pmd_perf_stats_init(&pmd->perf_stats);
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
>                  hash_int(core_id, 0));
> @@ -5145,6 +5200,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
>      int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
> +    int upcall_rl_drop_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
> 
> @@ -5185,13 +5241,26 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>                  continue;
>              }
> 
> -            int error = handle_packet_upcall(pmd, packet, &keys[i],
> -                                             &actions, &put_actions);
> +            /*
> +             * Grab a token from the upcall policer to enter slowpath. If token
> +             * is not available, drop and account the packet. This is to
> +             * rate-limit packets getting into slowpath.
> +             */
> +            if (upcall_ratelimit &&
> +                !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
> +                dp_packet_delete(packet);
> 
> -            if (OVS_UNLIKELY(error)) {
> -                upcall_fail_cnt++;
> +                upcall_rl_drop_cnt++;
>              } else {
> -                upcall_ok_cnt++;
> +
> +                int error = handle_packet_upcall(pmd, packet, &keys[i],
> +                                                 &actions, &put_actions);
> +
> +                if (OVS_UNLIKELY(error)) {
> +                    upcall_fail_cnt++;
> +                } else {
> +                    upcall_ok_cnt++;
> +                }
>              }
>          }
> 
> @@ -5228,6 +5297,8 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>                              upcall_ok_cnt);
>      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
>                              upcall_fail_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RATELIMIT_DROP,
> +                            upcall_rl_drop_cnt);
>  }
> 
>  /* Packets enter the datapath from a port (or from recirculation) here.
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 9c2a826..e0098ad 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -428,6 +428,53 @@
>          </p>
>        </column>
> 
> +      <column name="other_config" key="upcall-rl"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to enable upcall rate-limiting.
> +          The upcall parameters like rate and burst will be ignored, if this is
> +          not set.
> +        </p>
> +        <p>
> +          The default value is <code>false</code> and upcall rate-limiting will
> +          be disabled.
> +        </p>
> +      </column>
> +
> +      <column name="other_config" key="upcall-rate"
> +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> +        <p>
> +          Specifies the rate of upcalls in packets-per-second that is to be
> +          allowed. For example, if the value is 10000, then those many upcalls
> +          (for packets) are allowed per second in each of the packet polling
> +          thread (PMD or non-PMD).
> +        </p>
> +        <p>
> +          A value of <code>0</code> means, no upcalls would be allowed i.e,
> +          upcall will be disabled. This is mainly for debugging.
> +        </p>
> +        <p>
> +          The default value is 1000.
> +        </p>

Proposal:
Specifies the sustained rate of upcalls in upcalls per second that are admitted
per packet-polling thread (PMD or non-PMD) in the netdev datapath when 
upcall rate limiting is enabled. The default value is 500 upcalls per second.

Setting both upcall-rate and upcall-burst to <code>0</code> disables upcalls
in the netdev datapath. This can be used for debugging purposes.

> +      </column>
> +
> +      <column name="other_config" key="upcall-burst"
> +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> +        <p>
> +          Specifies the maximum burst of upcalls in packets-per-second that is
> +          to be allowed. For example, if the value is 15000, then a maximum
> +          burst of 15000 upcalls (for packets) are allowed per second in each
> +          of the packet polling thread (PMD or non-PMD).
> +        </p>
> +        <p>
> +          A value of <code>0</code> means, no upcalls would be allowed i.e,
> +          upcall will be disabled. This is mainly for debugging.
> +        </p>
> +        <p>
> +          The default value is 1000.
> +        </p>

Proposal:
When upcall rate limiting is enabled, this specifies the maximum burst of
upcalls that are admitted per packet-polling thread (PMD or non-PMD) in
the netdev datapath, independent of the packet arrival rate.
The default value is a burst of 500 upcalls.

Setting both upcall-rate and upcall-burst to <code>0</code> disables upcalls
in the netdev datapath. This can be used for debugging purposes.

> +      </column>
> +
>        <column name="other_config" key="vlan-limit"
>                type='{"type": "integer", "minInteger": 0}'>
>          <p>
> --
> 1.9.1
> 


