[ovs-dev] [PATCH v7 2/3] dpif-netdev: Detailed performance stats for PMDs
O Mahony, Billy
billy.o.mahony at intel.com
Fri Jan 19 11:31:29 UTC 2018
A few things I didn't come across until reading 3/3, but they are not related to the other atomic/volatile/mutex discussion.
All of these are suggestions, so apply them only if you feel they improve clarity.
> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheurich at ericsson.com]
> Sent: Tuesday, January 16, 2018 1:51 AM
> To: dev at openvswitch.org
> Cc: ktraynor at redhat.com; Stokes, Ian <ian.stokes at intel.com>;
> i.maximets at samsung.com; O Mahony, Billy <billy.o.mahony at intel.com>;
> Jan Scheurich <jan.scheurich at ericsson.com>
> Subject: [PATCH v7 2/3] dpif-netdev: Detailed performance stats for PMDs
>
> This patch instruments the dpif-netdev datapath to record detailed
> statistics of what is happening in every iteration of a PMD thread.
>
> The collection of detailed statistics can be controlled by a new
> Open_vSwitch configuration parameter "other_config:pmd-perf-metrics".
> By default it is disabled. The run-time overhead, when enabled, is
> in the order of 1%.
>
> The covered metrics per iteration are:
> - cycles
> - packets
> - (rx) batches
> - packets/batch
> - max. vhostuser qlen
> - upcalls
> - cycles spent in upcalls
>
> This raw recorded data is used threefold:
>
> 1. In histograms for each of the following metrics:
> - cycles/iteration (log.)
> - packets/iteration (log.)
> - cycles/packet
> - packets/batch
> - max. vhostuser qlen (log.)
> - upcalls
> - cycles/upcall (log)
> The histogram bins are divided linearly or logarithmically.
>
> 2. A cyclic history of the above statistics for 999 iterations
>
> 3. A cyclic history of the cumulative/average values per millisecond
> wall clock for the last 1000 milliseconds:
> - number of iterations
> - avg. cycles/iteration
> - packets (Kpps)
> - avg. packets/batch
> - avg. max vhost qlen
> - upcalls
> - avg. cycles/upcall
>
> The gathered performance metrics can be printed at any time with the
> new CLI command
>
> ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len]
> [-pmd core | dp]
>
> The options are
>
> -nh: Suppress the histograms
> -it iter_len: Display the last iter_len iteration stats
> -ms ms_len: Display the last ms_len millisecond stats
> -pmd core: Display only the stats of the specified PMD core
>
> The performance statistics are reset with the existing
> dpif-netdev/pmd-stats-clear command.
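For readers trying this out, the workflow implied by the commit message would look roughly as follows (the core ID and option values here are just illustrative; the command names and the pmd-perf-metrics key are taken from the patch):

```shell
# Enable detailed metrics collection (off by default; ~1 % overhead when on).
ovs-vsctl set Open_vSwitch . other_config:pmd-perf-metrics=true

# Show stats for one PMD: suppress histograms, last 5 iterations,
# last 10 millisecond averages.
ovs-appctl dpif-netdev/pmd-perf-show -nh -it 5 -ms 10 -pmd 3

# Reset all performance statistics.
ovs-appctl dpif-netdev/pmd-stats-clear
```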
>
> The output always contains the following global PMD statistics,
> similar to the pmd-stats-show command:
>
> Time: 15:24:55.270
> Measurement duration: 1.008 s
>
> pmd thread numa_id 0 core_id 1:
>
> Cycles: 2419034712 (2.40 GHz)
> Iterations: 572817 (1.76 us/it)
> - idle: 486808 (15.9 % cycles)
> - busy: 86009 (84.1 % cycles)
> Packets: 2399607 (2381 Kpps, 848 cycles/pkt)
> Datapath passes: 3599415 (1.50 passes/pkt)
> - EMC hits: 336472 ( 9.3 %)
> - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit)
> - Upcalls: 0 ( 0.0 %, 0.0 us/upcall)
> - Lost upcalls: 0 ( 0.0 %)
>
> Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
> ---
> NEWS | 3 +
> lib/automake.mk | 1 +
> lib/dp-packet.h | 1 +
> lib/dpif-netdev-perf.c | 333 +++++++++++++++++++++++++++++++++++++++++++-
> lib/dpif-netdev-perf.h | 239 +++++++++++++++++++++++++++++--
> lib/dpif-netdev.c | 177 +++++++++++++++++++++--
> lib/netdev-dpdk.c | 13 +-
> lib/netdev-dpdk.h | 14 ++
> lib/netdev-dpif-unixctl.man | 113 +++++++++++++++
> manpages.mk | 2 +
> vswitchd/ovs-vswitchd.8.in | 27 +---
> vswitchd/vswitch.xml | 12 ++
> 12 files changed, 881 insertions(+), 54 deletions(-)
> create mode 100644 lib/netdev-dpif-unixctl.man
>
> diff --git a/NEWS b/NEWS
> index 2c28456..743528e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -44,6 +44,9 @@ Post-v2.8.0
> if available (for OpenFlow 1.4+).
> - Userspace datapath:
> * Output packet batching support.
> + * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD
> + * Detailed PMD performance metrics available with new command
> + ovs-appctl dpif-netdev/pmd-perf-show
> - vswitchd:
> * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
> * Configuring a controller, or unconfiguring all controllers, now deletes
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 159319f..d07cbe9 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -468,6 +468,7 @@ MAN_FRAGMENTS += \
> lib/dpctl.man \
> lib/memory-unixctl.man \
> lib/netdev-dpdk-unixctl.man \
> + lib/netdev-dpif-unixctl.man \
> lib/ofp-version.man \
> lib/ovs.tmac \
> lib/service.man \
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index b4b721c..3d65088 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -697,6 +697,7 @@ struct dp_packet_batch {
> size_t count;
> bool trunc; /* true if the batch needs truncate. */
> struct dp_packet *packets[NETDEV_MAX_BURST];
> +
> };
>
> static inline void
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> index f06991a..e0ef15d 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -15,6 +15,7 @@
> */
>
> #include <config.h>
> +#include <stdint.h>
>
> #include "openvswitch/dynamic-string.h"
> #include "openvswitch/vlog.h"
> @@ -23,10 +24,299 @@
>
> VLOG_DEFINE_THIS_MODULE(pmd_perf);
>
> +#ifdef DPDK_NETDEV
> +static uint64_t
> +get_tsc_hz(void)
> +{
> + return rte_get_tsc_hz();
> +}
> +#else
> +/* This function is only invoked from PMD threads which depend on DPDK.
> + * A dummy function is sufficient when building without DPDK_NETDEV. */
> +static uint64_t
> +get_tsc_hz(void)
> +{
> + return 1;
> +}
> +#endif
> +
> +/* Histogram functions. */
> +
> +static void
> +histogram_walls_set_lin(struct histogram *hist, uint32_t min, uint32_t max)
> +{
> + int i;
> +
> + ovs_assert(min < max);
> + for (i = 0; i < NUM_BINS-1; i++) {
> + hist->wall[i] = min + (i * (max - min)) / (NUM_BINS - 2);
> + }
> + hist->wall[NUM_BINS-1] = UINT32_MAX;
> +}
> +
> +static void
> +histogram_walls_set_log(struct histogram *hist, uint32_t min, uint32_t max)
> +{
> + int i, start, bins, wall;
> + double log_min, log_max;
> +
> + ovs_assert(min < max);
> + if (min > 0) {
> + log_min = log(min);
> + log_max = log(max);
> + start = 0;
> + bins = NUM_BINS - 1;
> + } else {
> + hist->wall[0] = 0;
> + log_min = log(1);
> + log_max = log(max);
> + start = 1;
> + bins = NUM_BINS - 2;
> + }
> + wall = start;
> + for (i = 0; i < bins; i++) {
> + /* Make sure each wall is monotonically increasing. */
> + wall = MAX(wall, exp(log_min + (i * (log_max - log_min)) / (bins-1)));
> + hist->wall[start + i] = wall++;
> + }
> + if (hist->wall[NUM_BINS-2] < max) {
> + hist->wall[NUM_BINS-2] = max;
> + }
> + hist->wall[NUM_BINS-1] = UINT32_MAX;
> +}
> +
> +uint64_t
> +histogram_samples(const struct histogram *hist)
> +{
> + uint64_t samples = 0;
> +
> + for (int i = 0; i < NUM_BINS; i++) {
> + samples += hist->bin[i];
> + }
> + return samples;
> +}
> +
> +static void
> +histogram_clear(struct histogram *hist)
> +{
> + int i;
> +
> + for (i = 0; i < NUM_BINS; i++) {
> + hist->bin[i] = 0;
> + }
> +}
> +
> +static void
> +history_init(struct history *h)
> +{
> + memset(h, 0, sizeof(*h));
> +}
> +
> void
> pmd_perf_stats_init(struct pmd_perf_stats *s)
> {
> - memset(s, 0 , sizeof(*s));
> + memset(s, 0, sizeof(*s));
> + histogram_walls_set_log(&s->cycles, 500, 24000000);
> + histogram_walls_set_log(&s->pkts, 0, 1000);
> + histogram_walls_set_lin(&s->cycles_per_pkt, 100, 30000);
> + histogram_walls_set_lin(&s->pkts_per_batch, 0, 32);
> + histogram_walls_set_lin(&s->upcalls, 0, 30);
> + histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
> + histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
> + s->start_ms = time_msec();
> +}
> +
> +void
> +pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
> + double duration)
> +{
> + uint64_t stats[PMD_N_STATS];
> + double us_per_cycle = 1000000.0 / get_tsc_hz();
> +
> + if (duration == 0) {
> + return;
> + }
> +
> + pmd_perf_read_counters(s, stats);
> + uint64_t tot_cycles = stats[PMD_CYCLES_ITER_IDLE] +
> + stats[PMD_CYCLES_ITER_BUSY];
> + uint64_t packets = stats[PMD_STAT_RECV];
> + uint64_t passes = stats[PMD_STAT_RECV] +
> + stats[PMD_STAT_RECIRC];
> + uint64_t upcalls = stats[PMD_STAT_MISS];
> + uint64_t upcall_cycles = stats[PMD_CYCLES_UPCALL];
> + uint64_t tot_iter = histogram_samples(&s->pkts);
> + uint64_t idle_iter = s->pkts.bin[0];
> + uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
> +
> + ds_put_format(str,
> + " Cycles: %12"PRIu64" (%.2f GHz)\n"
> + " Iterations: %12"PRIu64" (%.2f us/it)\n"
> + " - idle: %12"PRIu64" (%4.1f %% cycles)\n"
> + " - busy: %12"PRIu64" (%4.1f %% cycles)\n",
> + tot_cycles, (tot_cycles / duration) / 1E9,
> + tot_iter, tot_cycles * us_per_cycle / tot_iter,
> + idle_iter,
> + 100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles,
> + busy_iter,
> + 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles);
> + if (packets > 0) {
> + ds_put_format(str,
> + " Packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n"
> + " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n"
> + " - EMC hits: %12"PRIu64" (%4.1f %%)\n"
> + " - Megaflow hits: %12"PRIu64" (%4.1f %%, %.2f subtbl lookups/"
> + "hit)\n"
> + " - Upcalls: %12"PRIu64" (%4.1f %%, %.1f us/upcall)\n"
> + " - Lost upcalls: %12"PRIu64" (%4.1f %%)\n"
> + "\n",
> + packets, (packets / duration) / 1000,
> + 1.0 * stats[PMD_CYCLES_ITER_BUSY] / packets,
> + passes, packets ? 1.0 * passes / packets : 0,
> + stats[PMD_STAT_EXACT_HIT],
> + 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
> + stats[PMD_STAT_MASKED_HIT],
> + 100.0 * stats[PMD_STAT_MASKED_HIT] / passes,
> + stats[PMD_STAT_MASKED_HIT]
> + ? 1.0 * stats[PMD_STAT_MASKED_LOOKUP] / stats[PMD_STAT_MASKED_HIT]
> + : 0,
> + upcalls, 100.0 * upcalls / passes,
> + upcalls ? (upcall_cycles * us_per_cycle) / upcalls : 0,
> + stats[PMD_STAT_LOST],
> + 100.0 * stats[PMD_STAT_LOST] / passes);
> + } else {
> + ds_put_format(str,
> + " Packets: %12"PRIu64"\n"
> + "\n",
> + 0UL);
> + }
> +}
> +
> +void
> +pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s)
> +{
> + int i;
> +
> + ds_put_cstr(str, "Histograms\n");
> + ds_put_format(str,
> + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n",
> + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch",
> + "max vhost qlen", "upcalls/it", "cycles/upcall");
> + for (i = 0; i < NUM_BINS-1; i++) {
> + ds_put_format(str,
> + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64
> + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64
> + " %-9d %-11"PRIu64"\n",
> + s->cycles.wall[i], s->cycles.bin[i],
> + s->pkts.wall[i],s->pkts.bin[i],
> + s->cycles_per_pkt.wall[i], s->cycles_per_pkt.bin[i],
> + s->pkts_per_batch.wall[i], s->pkts_per_batch.bin[i],
> + s->max_vhost_qfill.wall[i], s->max_vhost_qfill.bin[i],
> + s->upcalls.wall[i], s->upcalls.bin[i],
> + s->cycles_per_upcall.wall[i], s->cycles_per_upcall.bin[i]);
> + }
> + ds_put_format(str,
> + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64
> + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64
> + " %-9s %-11"PRIu64"\n",
> + ">", s->cycles.bin[i],
> + ">", s->pkts.bin[i],
> + ">", s->cycles_per_pkt.bin[i],
> + ">", s->pkts_per_batch.bin[i],
> + ">", s->max_vhost_qfill.bin[i],
> + ">", s->upcalls.bin[i],
> + ">", s->cycles_per_upcall.bin[i]);
> + if (s->totals.iterations > 0) {
> + ds_put_cstr(str,
> + "-----------------------------------------------------"
> + "-----------------------------------------------------"
> + "------------------------------------------------\n");
> + ds_put_format(str,
> + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n",
> + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch",
> + "vhost qlen", "upcalls/it", "cycles/upcall");
> + ds_put_format(str,
> + " %-21"PRIu64" %-21.5f %-21"PRIu64
> + " %-21.5f %-21.5f %-21.5f %-21"PRIu32"\n",
> + s->totals.cycles / s->totals.iterations,
> + 1.0 * s->totals.pkts / s->totals.iterations,
> + s->totals.pkts
> + ? s->totals.busy_cycles / s->totals.pkts : 0,
> + s->totals.batches
> + ? 1.0 * s->totals.pkts / s->totals.batches : 0,
> + 1.0 * s->totals.max_vhost_qfill / s->totals.iterations,
> + 1.0 * s->totals.upcalls / s->totals.iterations,
> + s->totals.upcalls
> + ? s->totals.upcall_cycles / s->totals.upcalls : 0);
> + }
> +}
> +
> +void
> +pmd_perf_format_iteration_history(struct ds *str, struct pmd_perf_stats *s,
> + int n_iter)
> +{
> + struct iter_stats *is;
> + size_t index;
> + int i;
> +
> + if (n_iter == 0) {
> + return;
> + }
> + ds_put_format(str, " %-17s %-10s %-10s %-10s %-10s "
> + "%-10s %-10s %-10s\n",
> + "tsc", "cycles", "packets", "cycles/pkt", "pkts/batch",
> + "vhost qlen", "upcalls", "cycles/upcall");
> + for (i = 1; i <= n_iter; i++) {
> + index = (s->iterations.idx + HISTORY_LEN - i) % HISTORY_LEN;
> + is = &s->iterations.sample[index];
> + ds_put_format(str,
> + " %-17"PRIu64" %-11"PRIu64" %-11"PRIu32
> + " %-11"PRIu64" %-11"PRIu32" %-11"PRIu32
> + " %-11"PRIu32" %-11"PRIu32"\n",
> + is->timestamp,
> + is->cycles,
> + is->pkts,
> + is->pkts ? is->cycles / is->pkts : 0,
> + is->batches ? is->pkts / is->batches : 0,
> + is->max_vhost_qfill,
> + is->upcalls,
> + is->upcalls ? is->upcall_cycles / is->upcalls : 0);
> + }
> +}
> +
> +void
> +pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms)
> +{
> + struct iter_stats *is;
> + size_t index;
> + int i;
> +
> + if (n_ms == 0) {
> + return;
> + }
> + ds_put_format(str,
> + " %-12s %-10s %-10s %-10s %-10s"
> + " %-10s %-10s %-10s %-10s\n",
> + "ms", "iterations", "cycles/it", "Kpps", "cycles/pkt",
> + "pkts/batch", "vhost qlen", "upcalls", "cycles/upcall");
> + for (i = 1; i <= n_ms; i++) {
> + index = (s->milliseconds.idx + HISTORY_LEN - i) % HISTORY_LEN;
> + is = &s->milliseconds.sample[index];
> + ds_put_format(str,
> + " %-12"PRIu64" %-11"PRIu32" %-11"PRIu64
> + " %-11"PRIu32" %-11"PRIu64" %-11"PRIu32
> + " %-11"PRIu32" %-11"PRIu32" %-11"PRIu32"\n",
> + is->timestamp,
> + is->iterations,
> + is->iterations ? is->cycles / is->iterations : 0,
> + is->pkts,
> + is->pkts ? is->busy_cycles / is->pkts : 0,
> + is->batches ? is->pkts / is->batches : 0,
> + is->iterations
> + ? is->max_vhost_qfill / is->iterations : 0,
> + is->upcalls,
> + is->upcalls ? is->upcall_cycles / is->upcalls : 0);
> + }
> }
>
> void
> @@ -51,10 +341,49 @@ pmd_perf_read_counters(struct pmd_perf_stats *s,
> }
> }
>
> +/* This function is executed in the context of the PMD at the start of
> + * a new iteration when requested through pmd_perf_stats_clear(). */
> void
> -pmd_perf_stats_clear(struct pmd_perf_stats *s)
> +pmd_perf_stats_clear__(struct pmd_perf_stats *s)
> {
> for (int i = 0; i < PMD_N_STATS; i++) {
> atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
> }
> + memset(&s->current, 0 , sizeof(struct iter_stats));
> + memset(&s->totals, 0 , sizeof(struct iter_stats));
> + histogram_clear(&s->cycles);
> + histogram_clear(&s->pkts);
> + histogram_clear(&s->cycles_per_pkt);
> + histogram_clear(&s->upcalls);
> + histogram_clear(&s->cycles_per_upcall);
> + histogram_clear(&s->pkts_per_batch);
> + histogram_clear(&s->max_vhost_qfill);
> + history_init(&s->iterations);
> + history_init(&s->milliseconds);
> + s->start_ms = time_msec();
> + s->milliseconds.sample[0].timestamp = s->start_ms;
> + /* Clearing finished. */
> + s->clear = false;
> +}
> +
> +/* This function must be called from outside the PMD thread to safely
> + * clear the PMD stats at the start of the next iteration. It blocks the
> + * caller until the stats are cleared. */
> +void
> +pmd_perf_stats_clear(struct pmd_perf_stats *s)
> +{
> + /* Request the PMD to clear its stats in pmd_perf_start_iteration(). */
> + s->clear = true;
> + /* Wait a number of milliseconds for the stats to be cleared. */
> + while (s->clear) {
> + xnanosleep(1000 * 1000);
> + }
> +}
> +
> +/* This function can be called from anywhere to clear the stats
> + * of the non-pmd thread. */
> +void
> +non_pmd_perf_stats_clear(struct pmd_perf_stats *s)
> +{
> + pmd_perf_stats_clear__(s);
> }
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 5993c25..7a89c40 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -38,10 +38,18 @@
> extern "C" {
> #endif
>
> -/* This module encapsulates data structures and functions to maintain PMD
> - * performance metrics such as packet counters, execution cycles. It
> - * provides a clean API for dpif-netdev to initialize, update and read and
> +/* This module encapsulates data structures and functions to maintain basic PMD
> + * performance metrics such as packet counters, execution cycles as well as
> + * histograms and time series recording for more detailed PMD metrics.
> + *
> + * It provides a clean API for dpif-netdev to initialize, update and read and
> * reset these metrics.
> + *
> + * The basic set of PMD counters is implemented as atomic_uint64_t variables
> + * to guarantee correct read also in 32-bit systems.
> + *
> + * The detailed PMD performance metrics are only supported on 64-bit systems
> + * with atomic 64-bit read and store semantics for plain uint64_t counters.
> */
>
> /* Set of counter types maintained in pmd_perf_stats. */
> @@ -66,6 +74,7 @@ enum pmd_stat_type {
> PMD_STAT_SENT_BATCHES, /* Number of batches sent. */
> PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */
> PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */
> + PMD_CYCLES_UPCALL, /* Cycles spent processing upcalls. */
> PMD_N_STATS
> };
>
> @@ -81,18 +90,78 @@ struct pmd_counters {
> uint64_t zero[PMD_N_STATS]; /* Value at last _clear(). */
> };
>
> +/* Data structure to collect statistical distribution of an integer measurement
> + * type in form of a histogram. The wall[] array contains the inclusive
> + * upper boundaries of the bins, while the bin[] array contains the actual
> + * counters per bin. The histogram walls are typically set automatically
> + * using the functions provided below.*/
> +
> +#define NUM_BINS 32 /* Number of histogram bins. */
> +
> +struct histogram {
> + uint32_t wall[NUM_BINS];
> + uint64_t bin[NUM_BINS];
> +};
> +
> +/* Data structure to record detailed PMD execution metrics per iteration for
> + * a history period of up to HISTORY_LEN iterations in circular buffer.
> + * Also used to record up to HISTORY_LEN millisecond averages/totals of these
> + * metrics.*/
> +
> +struct iter_stats {
> + uint64_t timestamp; /* TSC or millisecond. */
> + uint64_t cycles; /* Number of TSC cycles spent in it/ms. */
> + uint64_t busy_cycles; /* Cycles spent in busy iterations in ms. */
> + uint32_t iterations; /* Iterations in ms. */
> + uint32_t pkts; /* Packets processed in iteration/ms. */
> + uint32_t upcalls; /* Number of upcalls in iteration/ms. */
> + uint32_t upcall_cycles; /* Cycles spent in upcalls in iteration/ms. */
> + uint32_t batches; /* Number of rx batches in iteration/ms. */
> + uint32_t max_vhost_qfill; /* Maximum fill level encountered in it/ms. */
> +};
> +
> +#define HISTORY_LEN 1000 /* Length of recorded history
> + (iterations and ms). */
> +#define DEF_HIST_SHOW 20 /* Default number of history samples to
> + display. */
> +
> +struct history {
> + uint64_t idx; /* Next slot in history. */
[[BO'M]] Suggest "Slot where the next call to history_store() will write", which is more educational.
> + struct iter_stats sample[HISTORY_LEN];
[[BO'M]] sample -> samples. The plural reads more naturally and, IIRC, is in the coding style.
> +};
> +
> /* Container for all performance metrics of a PMD.
> * Part of the struct dp_netdev_pmd_thread. */
>
> struct pmd_perf_stats {
> - /* Start of the current PMD iteration in TSC cycles.*/
> - uint64_t start_it_tsc;
> + /* Set by CLI thread to order clearing of PMD stats. */
> + volatile atomic_bool clear;
> + /* Start of the current performance measurement period. */
> + uint64_t start_ms;
> /* Latest TSC time stamp taken in PMD. */
> uint64_t last_tsc;
> + /* Used to space certain checks in time. */
> + uint64_t next_check_tsc;
> /* If non-NULL, outermost cycle timer currently running in PMD. */
> struct cycle_timer *cur_timer;
> /* Set of PMD counters with their zero offsets. */
> struct pmd_counters counters;
> + /* Statistics of the current iteration. */
> + struct iter_stats current;
> + /* Totals for the current millisecond. */
> + struct iter_stats totals;
> + /* Histograms for the PMD metrics. */
> + struct histogram cycles;
> + struct histogram pkts;
> + struct histogram cycles_per_pkt;
> + struct histogram upcalls;
> + struct histogram cycles_per_upcall;
> + struct histogram pkts_per_batch;
> + struct histogram max_vhost_qfill;
> + /* Iteration history buffer. */
> + struct history iterations;
> + /* Millisecond history buffer. */
> + struct history milliseconds;
[[BO'M]] Would suggest iter_history or iteration_history, and ms_history or millisec(ond)_history, to help when reading the code. Don't shorten 'history' to 'hist', however, as that clashes with the abbreviation used for 'histogram'.
> };
>
> /* Support for accurate timing of PMD execution on TSC clock cycle level.
> @@ -175,8 +244,15 @@ cycle_timer_stop(struct pmd_perf_stats *s,
> return now - timer->start;
> }
>
> +/* Functions to initialize and reset the PMD performance metrics. */
> +
> void pmd_perf_stats_init(struct pmd_perf_stats *s);
> void pmd_perf_stats_clear(struct pmd_perf_stats *s);
> +void non_pmd_perf_stats_clear(struct pmd_perf_stats *s);
> +void pmd_perf_stats_clear__(struct pmd_perf_stats *s);
> +
> +/* Functions to read and update PMD counters. */
> +
> void pmd_perf_read_counters(struct pmd_perf_stats *s,
> uint64_t stats[PMD_N_STATS]);
>
> @@ -199,32 +275,175 @@ pmd_perf_update_counter(struct pmd_perf_stats *s,
> atomic_store_relaxed(&s->counters.n[counter], tmp);
> }
>
> +/* Functions to manipulate a sample history. */
> +
> +static inline void
> +histogram_add_sample(struct histogram *hist, uint32_t val)
> +{
> + /* TODO: Can do better with binary search? */
> + for (int i = 0; i < NUM_BINS-1; i++) {
> + if (val <= hist->wall[i]) {
> + hist->bin[i]++;
> + return;
> + }
> + }
> + hist->bin[NUM_BINS-1]++;
> +}
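On the TODO above: since the walls are monotonically increasing and the last wall is UINT32_MAX, a lower-bound binary search would drop the worst case from 31 comparisons to 5. A standalone sketch below (the struct mirrors the patch; this is not benchmarked, so whether it actually beats the linear scan for the common case of samples landing in the first few bins is an open question):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BINS 32

struct histogram {
    uint32_t wall[NUM_BINS];   /* Inclusive upper bound of each bin. */
    uint64_t bin[NUM_BINS];    /* Sample count per bin. */
};

/* Find the first wall >= val by binary search.  Because the walls are
 * monotonically increasing and wall[NUM_BINS-1] == UINT32_MAX, the
 * search always terminates on a valid bin. */
static inline void
histogram_add_sample_bsearch(struct histogram *hist, uint32_t val)
{
    int lo = 0, hi = NUM_BINS - 1;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (val <= hist->wall[mid]) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    hist->bin[lo]++;
}
```

This bins identically to the linear scan (first bin whose inclusive upper wall is >= the sample).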
> +
> +uint64_t histogram_samples(const struct histogram *hist);
> +
> +static inline struct iter_stats *
> +history_current(struct history *h)
> +{
> + return &h->sample[h->idx];
> +}
> +
> +static inline struct iter_stats *
> +history_next(struct history *h)
> +{
> + struct iter_stats *next;
> +
> + h->idx++;
> + if (h->idx == HISTORY_LEN) {
> + h->idx = 0;
> + }
> + next = &h->sample[h->idx];
> + memset(next, 0, sizeof(*next));
> + return next;
> +}
> +
> +static inline struct iter_stats *
> +history_store(struct history *h, struct iter_stats *is)
> +{
> + if (is) {
> + h->sample[h->idx] = *is;
> + }
> + /* Advance the history pointer */
> + return history_next(h);
> +}
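For readers following along: history_current()/history_store() implement a plain ring buffer where idx always names the slot the next store overwrites, and the slot just advanced to is zeroed so partial stats never leak between periods. A minimal standalone sketch of the wrap-around (HISTORY_LEN shrunk from 1000 to 4 and iter_stats reduced to two fields, purely for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HISTORY_LEN 4   /* 1000 in the patch; shrunk for illustration. */

struct iter_stats {
    uint64_t timestamp;
    uint64_t cycles;
};

struct history {
    uint64_t idx;       /* Slot that the next store will write. */
    struct iter_stats sample[HISTORY_LEN];
};

/* Copy 'is' into the current slot, advance idx with wrap-around, and
 * clear the newly current slot, as the patch does. */
static struct iter_stats *
history_store(struct history *h, const struct iter_stats *is)
{
    struct iter_stats *next;

    if (is) {
        h->sample[h->idx] = *is;
    }
    h->idx = (h->idx + 1) % HISTORY_LEN;
    next = &h->sample[h->idx];
    memset(next, 0, sizeof *next);
    return next;
}
```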
> +
> +/* Functions recording PMD metrics per iteration. */
> +
> static inline void
> pmd_perf_start_iteration(struct pmd_perf_stats *s)
> {
> + if (s->clear) {
> + /* Clear the PMD stats before starting next iteration. */
> + pmd_perf_stats_clear__(s);
> + }
> + /* Initialize the current interval stats. */
> + memset(&s->current, 0, sizeof(struct iter_stats));
> if (OVS_LIKELY(s->last_tsc)) {
> /* We assume here that last_tsc was updated immediately prior at
> * the end of the previous iteration, or just before the first
> * iteration. */
> - s->start_it_tsc = s->last_tsc;
> + s->current.timestamp = s->last_tsc;
> } else {
> /* In case last_tsc has never been set before. */
> - s->start_it_tsc = cycles_counter_update(s);
> + s->current.timestamp = cycles_counter_update(s);
> }
> }
>
> static inline void
> -pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets)
> +pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> + int tx_packets, bool full_metrics)
> {
> - uint64_t cycles = cycles_counter_update(s) - s->start_it_tsc;
> + uint64_t now_tsc = cycles_counter_update(s);
> + struct iter_stats *cum_ms;
> + uint64_t cycles, cycles_per_pkt = 0;
>
> - if (rx_packets > 0) {
> + if (OVS_UNLIKELY(s->current.timestamp == 0)) {
> + /* Stats were cleared during the ongoing iteration. */
> + return;
> + }
> +
> + cycles = now_tsc - s->current.timestamp;
> + s->current.cycles = cycles;
> + s->current.pkts = rx_packets;
> +
> + if (rx_packets + tx_packets > 0) {
> pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles);
> } else {
> pmd_perf_update_counter(s, PMD_CYCLES_ITER_IDLE, cycles);
> }
> + /* Add iteration samples to histograms. */
> + histogram_add_sample(&s->cycles, cycles);
> + histogram_add_sample(&s->pkts, rx_packets);
> +
> + if (!full_metrics) {
> + return;
> + }
> +
> + s->counters.n[PMD_CYCLES_UPCALL] += s->current.upcall_cycles;
> +
> + if (rx_packets > 0) {
> + cycles_per_pkt = cycles / rx_packets;
> + histogram_add_sample(&s->cycles_per_pkt, cycles_per_pkt);
> + }
> + if (s->current.batches > 0) {
> + histogram_add_sample(&s->pkts_per_batch,
> + rx_packets / s->current.batches);
> + }
> + histogram_add_sample(&s->upcalls, s->current.upcalls);
> + if (s->current.upcalls > 0) {
> + histogram_add_sample(&s->cycles_per_upcall,
> + s->current.upcall_cycles / s->current.upcalls);
> + }
> + histogram_add_sample(&s->max_vhost_qfill, s->current.max_vhost_qfill);
> +
> + /* Add iteration samples to millisecond stats. */
> + cum_ms = history_current(&s->milliseconds);
> + cum_ms->iterations++;
> + cum_ms->cycles += cycles;
> + if (rx_packets > 0) {
> + cum_ms->busy_cycles += cycles;
> + }
> + cum_ms->pkts += s->current.pkts;
> + cum_ms->upcalls += s->current.upcalls;
> + cum_ms->upcall_cycles += s->current.upcall_cycles;
> + cum_ms->batches += s->current.batches;
> + cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
> +
> + /* Store in iteration history. */
> + history_store(&s->iterations, &s->current);
> + if (now_tsc > s->next_check_tsc) {
> + /* Check if ms is completed and store in milliseconds history. */
> + uint64_t now = time_msec();
> + if (now != cum_ms->timestamp) {
> + /* Add ms stats to totals. */
> + s->totals.iterations += cum_ms->iterations;
> + s->totals.cycles += cum_ms->cycles;
> + s->totals.busy_cycles += cum_ms->busy_cycles;
> + s->totals.pkts += cum_ms->pkts;
> + s->totals.upcalls += cum_ms->upcalls;
> + s->totals.upcall_cycles += cum_ms->upcall_cycles;
> + s->totals.batches += cum_ms->batches;
> + s->totals.max_vhost_qfill += cum_ms->max_vhost_qfill;
> + cum_ms = history_next(&s->milliseconds);
> + cum_ms->timestamp = now;
> + }
> + s->next_check_tsc = now_tsc + 10000;
> + }
> }
>
> +/* Functions for formatting the output of commands. */
> +
> +struct pmd_perf_params {
> + int command_type;
> + bool histograms;
> + size_t iter_hist_len;
> + size_t ms_hist_len;
> +};
> +
> +void pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
> + double duration);
> +void pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s);
> +void pmd_perf_format_iteration_history(struct ds *str,
> + struct pmd_perf_stats *s,
> + int n_iter);
> +void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s,
> + int n_ms);
> +
> #ifdef __cplusplus
> }
> #endif
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 48a8ebb..f5931bf 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -49,6 +49,7 @@
> #include "id-pool.h"
> #include "latch.h"
> #include "netdev.h"
> +#include "netdev-provider.h"
> #include "netdev-vport.h"
> #include "netlink.h"
> #include "odp-execute.h"
> @@ -281,6 +282,8 @@ struct dp_netdev {
>
> /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/
> OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min;
> + /* Enable collection of PMD performance metrics. */
> + ATOMIC(bool) pmd_perf_metrics;
>
> /* Protects access to ofproto-dpif-upcall interface during revalidator
> * thread synchronization. */
> @@ -712,6 +715,8 @@ static inline bool emc_entry_alive(struct emc_entry *ce);
> static void emc_clear_entry(struct emc_entry *ce);
>
> static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
> +static inline bool
> +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd);
>
> static void
> emc_cache_init(struct emc_cache *flow_cache)
> @@ -795,7 +800,8 @@ get_dp_netdev(const struct dpif *dpif)
> enum pmd_info_type {
> PMD_INFO_SHOW_STATS, /* Show how cpu cycles are spent. */
> PMD_INFO_CLEAR_STATS, /* Set the cycles count to 0. */
> - PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */
> + PMD_INFO_SHOW_RXQ, /* Show poll lists of pmd threads. */
> + PMD_INFO_PERF_SHOW, /* Show pmd performance details. */
> };
>
> static void
> @@ -886,6 +892,44 @@ pmd_info_show_stats(struct ds *reply,
> stats[PMD_CYCLES_ITER_BUSY], total_packets);
> }
>
> +static void
> +pmd_info_show_perf(struct ds *reply,
> + struct dp_netdev_pmd_thread *pmd,
> + struct pmd_perf_params *par)
> +{
> + if (pmd->core_id != NON_PMD_CORE_ID) {
> + char *time_str =
> + xastrftime_msec("%H:%M:%S.###", time_wall_msec(), true);
> + long long now = time_msec();
> + double duration = (now - pmd->perf_stats.start_ms) / 1000.0;
> +
> + ds_put_cstr(reply, "\n");
> + ds_put_format(reply, "Time: %s\n", time_str);
> + ds_put_format(reply, "Measurement duration: %.3f s\n", duration);
> + ds_put_cstr(reply, "\n");
> + format_pmd_thread(reply, pmd);
> + ds_put_cstr(reply, "\n");
> + pmd_perf_format_overall_stats(reply, &pmd->perf_stats, duration);
> + if (pmd_perf_metrics_enabled(pmd)) {
> + if (par->histograms) {
> + ds_put_cstr(reply, "\n");
> + pmd_perf_format_histograms(reply, &pmd->perf_stats);
> + }
> + if (par->iter_hist_len > 0) {
> + ds_put_cstr(reply, "\n");
> + pmd_perf_format_iteration_history(reply, &pmd->perf_stats,
> + par->iter_hist_len);
> + }
> + if (par->ms_hist_len > 0) {
> + ds_put_cstr(reply, "\n");
> + pmd_perf_format_ms_history(reply, &pmd->perf_stats,
> + par->ms_hist_len);
> + }
> + }
> + free(time_str);
> + }
> +}
> +
> static int
> compare_poll_list(const void *a_, const void *b_)
> {
> @@ -1088,9 +1132,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> if (type == PMD_INFO_SHOW_RXQ) {
> pmd_info_show_rxq(&reply, pmd);
> } else if (type == PMD_INFO_CLEAR_STATS) {
> - pmd_perf_stats_clear(&pmd->perf_stats);
> + if (pmd->core_id == NON_PMD_CORE_ID) {
> + non_pmd_perf_stats_clear(&pmd->perf_stats);
> + } else {
> + pmd_perf_stats_clear(&pmd->perf_stats);
> + }
> } else if (type == PMD_INFO_SHOW_STATS) {
> pmd_info_show_stats(&reply, pmd);
> + } else if (type == PMD_INFO_PERF_SHOW) {
> + pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params *)aux);
> }
> }
> free(pmd_list);
> @@ -1100,6 +1150,48 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> unixctl_command_reply(conn, ds_cstr(&reply));
> ds_destroy(&reply);
> }
> +
> +static void
> +pmd_perf_show_cmd(struct unixctl_conn *conn, int argc,
> + const char *argv[],
> + void *aux OVS_UNUSED)
> +{
> + struct pmd_perf_params par;
> + long int it_hist = 0, ms_hist = 0;
> + par.histograms = true;
> +
> + while (argc > 1) {
> + if (!strcmp(argv[1], "-nh")) {
> + par.histograms = false;
> + argc -= 1;
> + argv += 1;
> + } else if (!strcmp(argv[1], "-it") && argc > 2) {
> + it_hist = strtol(argv[2], NULL, 10);
> + if (it_hist < 0) {
> + it_hist = 0;
> + } else if (it_hist > HISTORY_LEN) {
> + it_hist = HISTORY_LEN;
> + }
> + argc -= 2;
> + argv += 2;
> + } else if (!strcmp(argv[1], "-ms") && argc > 2) {
> + ms_hist = strtol(argv[2], NULL, 10);
> + if (ms_hist < 0) {
> + ms_hist = 0;
> + } else if (ms_hist > HISTORY_LEN) {
> + ms_hist = HISTORY_LEN;
> + }
> + argc -= 2;
> + argv += 2;
> + } else {
> + break;
> + }
> + }
> + par.iter_hist_len = it_hist;
> + par.ms_hist_len = ms_hist;
> + par.command_type = PMD_INFO_PERF_SHOW;
> + dpif_netdev_pmd_info(conn, argc, argv, &par);
> +}
>
> static int
> dpif_netdev_init(void)
> @@ -1117,6 +1209,12 @@ dpif_netdev_init(void)
> unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
> 0, 3, dpif_netdev_pmd_info,
> (void *)&poll_aux);
> + unixctl_command_register("dpif-netdev/pmd-perf-show",
> + "[-nh] [-it iter-history-len]"
> + " [-ms ms-history-len]"
> + " [-pmd core | dp]",
> + 0, 7, pmd_perf_show_cmd,
> + NULL);
> unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> 0, 1, dpif_netdev_pmd_rebalance,
> NULL);
> @@ -3003,6 +3101,18 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> }
> }
>
> + bool perf_enabled = smap_get_bool(other_config, "pmd-perf-metrics", false);
> + bool cur_perf_enabled;
> + atomic_read_relaxed(&dp->pmd_perf_metrics, &cur_perf_enabled);
> + if (perf_enabled != cur_perf_enabled) {
> + atomic_store_relaxed(&dp->pmd_perf_metrics, perf_enabled);
> + if (perf_enabled) {
> + VLOG_INFO("PMD performance metrics collection enabled");
> + } else {
> + VLOG_INFO("PMD performance metrics collection disabled");
> + }
> + }
> +
> return 0;
> }
>
> @@ -3139,6 +3249,20 @@ dp_netdev_rxq_set_cycles(struct dp_netdev_rxq
> *rx,
> atomic_store_relaxed(&rx->cycles[type], cycles);
> }
>
> +static inline bool
> +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd)
> +{
> + /* If stores and reads of 64-bit integers are not atomic, the
> + * full PMD performance metrics are not available as locked
> + * access to 64 bit integers would be prohibitively expensive. */
> + if (sizeof(uint64_t) > sizeof(void *)) {
> + return false;
> + }
> + bool pmd_perf_enabled;
> + atomic_read_relaxed(&pmd->dp->pmd_perf_metrics, &pmd_perf_enabled);
> + return pmd_perf_enabled;
> +}
> +
> static void
> dp_netdev_rxq_add_cycles(struct dp_netdev_rxq *rx,
> enum rxq_cycles_counter_type type,
> @@ -3247,10 +3371,11 @@ dp_netdev_process_rxq_port(struct
> dp_netdev_pmd_thread *pmd,
> struct dp_netdev_rxq *rxq,
> odp_port_t port_no)
> {
> + struct pmd_perf_stats *s = &pmd->perf_stats;
> struct dp_packet_batch batch;
> struct cycle_timer timer;
> int error;
> - int batch_cnt = 0, output_cnt = 0;
> + int batch_cnt = 0;
> uint64_t cycles;
>
> /* Measure duration for polling and processing rx burst. */
> @@ -3264,15 +3389,34 @@ dp_netdev_process_rxq_port(struct
> dp_netdev_pmd_thread *pmd,
> /* At least one packet received. */
> *recirc_depth_get() = 0;
> pmd_thread_ctx_time_update(pmd);
> -
> batch_cnt = batch.count;
> + if (pmd_perf_metrics_enabled(pmd)) {
> + /* Update batch histogram. */
> + s->current.batches++;
> + histogram_add_sample(&s->pkts_per_batch, batch_cnt);
> + /* Update the maximum Rx queue fill level. */
> + int dev_type = netdev_dpdk_get_type(
> + netdev_rxq_get_netdev(rxq->rx));
> + if (dev_type == DPDK_DEV_VHOST) {
> + /* Check queue fill level for vhostuser ports. */
> + uint32_t qfill = batch_cnt;
> + if (OVS_UNLIKELY(batch_cnt == NETDEV_MAX_BURST)) {
> + /* Likely more packets in rxq. */
> + qfill += netdev_rxq_length(rxq->rx);
> + }
> + if (qfill > s->current.max_vhost_qfill) {
> + s->current.max_vhost_qfill = qfill;
> + }
> + }
> + }
> + /* Process packet batch. */
> dp_netdev_input(pmd, &batch, port_no);
>
> /* Assign processing cycles to rx queue. */
> cycles = cycle_timer_stop(&pmd->perf_stats, &timer);
> dp_netdev_rxq_add_cycles(rxq, RXQ_CYCLES_PROC_CURR, cycles);
>
> - output_cnt = dp_netdev_pmd_flush_output_packets(pmd, false);
> + dp_netdev_pmd_flush_output_packets(pmd, false);
> } else {
> /* Discard cycles. */
> cycle_timer_stop(&pmd->perf_stats, &timer);
> @@ -3286,7 +3430,7 @@ dp_netdev_process_rxq_port(struct
> dp_netdev_pmd_thread *pmd,
>
> pmd->ctx.last_rxq = NULL;
>
> - return batch_cnt + output_cnt;
> + return batch_cnt;
> }
>
> static struct tx_port *
> @@ -4119,22 +4263,23 @@ reload:
>
> cycles_counter_update(s);
> for (;;) {
> - uint64_t iter_packets = 0;
> + uint64_t rx_packets = 0, tx_packets = 0;
>
> pmd_perf_start_iteration(s);
> +
> for (i = 0; i < poll_cnt; i++) {
> process_packets =
> dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
> poll_list[i].port_no);
> - iter_packets += process_packets;
> + rx_packets += process_packets;
> }
>
> - if (!iter_packets) {
> + if (!rx_packets) {
> /* We didn't receive anything in the process loop.
> * Check if we need to send something.
> * There was no time updates on current iteration. */
> pmd_thread_ctx_time_update(pmd);
> - iter_packets += dp_netdev_pmd_flush_output_packets(pmd, false);
> + tx_packets = dp_netdev_pmd_flush_output_packets(pmd, false);
> }
>
> if (lc++ > 1024) {
> @@ -4153,7 +4298,8 @@ reload:
> break;
> }
> }
> - pmd_perf_end_iteration(s, iter_packets);
> + pmd_perf_end_iteration(s, rx_packets, tx_packets,
> + pmd_perf_metrics_enabled(pmd));
> }
>
> poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> @@ -5050,6 +5196,7 @@ handle_packet_upcall(struct
> dp_netdev_pmd_thread *pmd,
> struct match match;
> ovs_u128 ufid;
> int error;
> + uint64_t cycles = cycles_counter_update(&pmd->perf_stats);
>
> match.tun_md.valid = false;
> miniflow_expand(&key->mf, &match.flow);
> @@ -5103,6 +5250,14 @@ handle_packet_upcall(struct
> dp_netdev_pmd_thread *pmd,
> ovs_mutex_unlock(&pmd->flow_mutex);
> emc_probabilistic_insert(pmd, key, netdev_flow);
> }
> + if (pmd_perf_metrics_enabled(pmd)) {
> + /* Update upcall stats. */
> + cycles = cycles_counter_update(&pmd->perf_stats) - cycles;
> + struct pmd_perf_stats *s = &pmd->perf_stats;
> + s->current.upcalls++;
> + s->current.upcall_cycles += cycles;
> + histogram_add_sample(&s->cycles_per_upcall, cycles);
> + }
> return error;
> }
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 4200556..7a8fdc2 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -36,6 +36,7 @@
> #include <rte_mbuf.h>
> #include <rte_meter.h>
> #include <rte_pci.h>
> +#include <rte_version.h>
> #include <rte_vhost.h>
> #include <rte_version.h>
>
> @@ -188,11 +189,6 @@ enum { DPDK_RING_SIZE = 256 };
> BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
> enum { DRAIN_TSC = 200000ULL };
>
> -enum dpdk_dev_type {
> - DPDK_DEV_ETH = 0,
> - DPDK_DEV_VHOST = 1,
> -};
> -
> /* Quality of Service */
>
> /* An instance of a QoS configuration. Always associated with a particular
> @@ -843,6 +839,13 @@ netdev_dpdk_cast(const struct netdev *netdev)
> return CONTAINER_OF(netdev, struct netdev_dpdk, up);
> }
>
> +enum dpdk_dev_type
> +netdev_dpdk_get_type(const struct netdev *netdev)
> +{
> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + return dev->type;
> +}
> +
> static struct netdev *
> netdev_dpdk_alloc(void)
> {
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index b7d02a7..2b357db 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -22,11 +22,18 @@
> #include "openvswitch/compiler.h"
>
> struct dp_packet;
> +struct netdev;
> +
> +enum dpdk_dev_type {
> + DPDK_DEV_ETH = 0,
> + DPDK_DEV_VHOST = 1,
> +};
>
> #ifdef DPDK_NETDEV
>
> void netdev_dpdk_register(void);
> void free_dpdk_buf(struct dp_packet *);
> +enum dpdk_dev_type netdev_dpdk_get_type(const struct netdev *netdev);
>
> #else
>
> @@ -41,6 +48,13 @@ free_dpdk_buf(struct dp_packet *buf OVS_UNUSED)
> /* Nothing */
> }
>
> +static inline enum dpdk_dev_type
> +netdev_dpdk_get_type(const struct netdev *netdev OVS_UNUSED)
> +{
> + /* Nothing to do. Return value zero to make compiler happy. */
> + return DPDK_DEV_ETH;
> +}
> +
> #endif
>
> #endif /* netdev-dpdk.h */
> diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man
> new file mode 100644
> index 0000000..53f4c51
> --- /dev/null
> +++ b/lib/netdev-dpif-unixctl.man
> @@ -0,0 +1,113 @@
> +.SS "DPIF-NETDEV COMMANDS"
> +These commands are used to expose internal information (mostly statistics)
> +about the "dpif-netdev" userspace datapath. If there is only one datapath
> +(as is often the case, unless \fBdpctl/\fR commands are used), the \fIdp\fR
> +argument can be omitted. By default the commands present data for all
> +pmd threads in the datapath. By specifying the \fB-pmd\fR \fIcore\fR option
> +one can filter the output for a single pmd in the datapath.
> +.
> +.IP "\fBdpif-netdev/pmd-stats-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> +Shows performance statistics for one or all pmd threads of the datapath
> +\fIdp\fR. The special thread "main" sums up the statistics of every non-pmd
> +thread.
> +
> +The sum of "emc hits", "masked hits" and "miss" is the number of
> +packet lookups performed by the datapath. Beware that a recirculated packet
> +experiences one additional lookup per recirculation, so there may be
> +more lookups than forwarded packets in the datapath.
> +
> +Cycles are counted using the TSC or similar facilities (when available on
> +the platform). The duration of one cycle depends on the processing platform.
> +
> +"idle cycles" refers to cycles spent in PMD iterations not forwarding
> +any packets. "processing cycles" refers to cycles spent in PMD iterations
> +forwarding at least one packet, including the cost for polling, processing and
> +transmitting said packets.
> +
> +To reset these counters use \fBdpif-netdev/pmd-stats-clear\fR.
> +.
> +.IP "\fBdpif-netdev/pmd-perf-show\fR [\fB-nh\fR] [\fB-it\fR \fIiter_len\fR] \
> +[\fB-ms\fR \fIms_len\fR] [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> +Shows detailed performance metrics for one or all pmd threads of the
> +userspace datapath.
> +
> +The collection of detailed statistics can be controlled by the
> +configuration parameter "other_config:pmd-perf-metrics". By default it
> +is disabled. The run-time overhead, when enabled, is on the order of 1%.
> +
> +The covered metrics per iteration are:
> + - used cycles
> + - forwarded packets
> + - number of rx batches
> + - packets/rx batch
> + - max. vhostuser queue fill level
> + - number of upcalls
> + - cycles spent in upcalls
> +
> +This raw recorded data is used threefold:
> +
> +1. In histograms for each of the following metrics:
> + - cycles/iteration (logarithmic)
> + - packets/iteration (logarithmic)
> + - cycles/packet
> + - packets/batch
> + - max. vhostuser qlen (logarithmic)
> + - upcalls
> + - cycles/upcall (logarithmic)
> + The histogram bins are divided linearly or logarithmically.
> +
> +2. A cyclic history of the above metrics for 1024 iterations
> +
> +3. A cyclic history of the cumulative/average values per millisecond
> + wall clock for the last 1024 milliseconds:
> + - number of iterations
> + - avg. cycles/iteration
> + - packets (Kpps)
> + - avg. packets/batch
> + - avg. max vhost qlen
> + - upcalls
> + - avg. cycles/upcall
> +
> +The command options are
> +
> + \fB-nh\fR: Suppress the histograms
> + \fB-it\fR \fIiter_len\fR: Display the last iter_len iteration stats
> + \fB-ms\fR \fIms_len\fR: Display the last ms_len millisecond stats
> +
> +The output always contains the following global PMD statistics:
> +
> +Time: 15:24:55.270
> +.br
> +Measurement duration: 1.008 s
> +
> +pmd thread numa_id 0 core_id 1:
> +
> + Cycles: 2419034712 (2.40 GHz)
> + Iterations: 572817 (1.76 us/it)
> + - idle: 486808 (15.9 % cycles)
> + - busy: 86009 (84.1 % cycles)
> + Packets: 2399607 (2381 Kpps, 848 cycles/pkt)
> + Datapath passes: 3599415 (1.50 passes/pkt)
> + - EMC hits: 336472 ( 9.3 %)
> + - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit)
> + - Upcalls: 0 ( 0.0 %, 0.0 us/upcall)
> + - Lost upcalls: 0 ( 0.0 %)
> +
> +Here "Packets" actually reflects the number of packets forwarded by the
> +datapath. "Datapath passes" matches the number of packet lookups as
> +reported by the \fBdpif-netdev/pmd-stats-show\fR command.
> +
> +To reset the counters and start a new measurement use
> +\fBdpif-netdev/pmd-stats-clear\fR.
> +.
> +.IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
> +Resets to zero the per pmd thread performance numbers shown by the
> +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR
> +commands.
> +It will NOT reset datapath or bridge statistics, only the values shown by
> +the above commands.
> +.
> +.IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> +For one or all pmd threads of the datapath \fIdp\fR show the list of
> +queue-ids with port names, which this thread polls.
> +.
> +.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
> +Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current
> +usage.
> diff --git a/manpages.mk b/manpages.mk
> index 351155f..9af2fa8 100644
> --- a/manpages.mk
> +++ b/manpages.mk
> @@ -256,6 +256,7 @@ vswitchd/ovs-vswitchd.8: \
> lib/dpctl.man \
> lib/memory-unixctl.man \
> lib/netdev-dpdk-unixctl.man \
> + lib/netdev-dpif-unixctl.man \
> lib/service.man \
> lib/ssl-bootstrap.man \
> lib/ssl.man \
> @@ -272,6 +273,7 @@ lib/daemon.man:
> lib/dpctl.man:
> lib/memory-unixctl.man:
> lib/netdev-dpdk-unixctl.man:
> +lib/netdev-dpif-unixctl.man:
> lib/service.man:
> lib/ssl-bootstrap.man:
> lib/ssl.man:
> diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
> index 80e5f53..7e4714a 100644
> --- a/vswitchd/ovs-vswitchd.8.in
> +++ b/vswitchd/ovs-vswitchd.8.in
> @@ -256,32 +256,7 @@ type).
> ..
> .so lib/dpctl.man
> .
> -.SS "DPIF-NETDEV COMMANDS"
> -These commands are used to expose internal information (mostly statistics)
> -about the ``dpif-netdev'' userspace datapath. If there is only one datapath
> -(as is often the case, unless \fBdpctl/\fR commands are used), the \fIdp\fR
> -argument can be omitted.
> -.IP "\fBdpif-netdev/pmd-stats-show\fR [\fIdp\fR]"
> -Shows performance statistics for each pmd thread of the datapath \fIdp\fR.
> -The special thread ``main'' sums up the statistics of every non pmd thread.
> -The sum of ``emc hits'', ``masked hits'' and ``miss'' is the number of
> -packets received by the datapath. Cycles are counted using the TSC or similar
> -facilities (when available on the platform). To reset these counters use
> -\fBdpif-netdev/pmd-stats-clear\fR. The duration of one cycle depends on the
> -measuring infrastructure. ``idle cycles'' refers to cycles spent polling
> -devices but not receiving any packets. ``processing cycles'' refers to cycles
> -spent polling devices and successfully receiving packets, plus the cycles
> -spent processing said packets.
> -.IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
> -Resets to zero the per pmd thread performance numbers shown by the
> -\fBdpif-netdev/pmd-stats-show\fR command. It will NOT reset datapath or
> -bridge statistics, only the values shown by the above command.
> -.IP "\fBdpif-netdev/pmd-rxq-show\fR [\fIdp\fR]"
> -For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with
> -port names, which this thread polls.
> -.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
> -Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current
> -usage.
> -.
> +.so lib/netdev-dpif-unixctl.man
> .so lib/netdev-dpdk-unixctl.man
> .so ofproto/ofproto-dpif-unixctl.man
> .so ofproto/ofproto-unixctl.man
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 61fb7b1..3aa8e8e 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -375,6 +375,18 @@
> </p>
> </column>
>
> + <column name="other_config" key="pmd-perf-metrics"
> + type='{"type": "boolean"}'>
> + <p>
> + Enables recording of detailed PMD performance metrics for analysis
> + and troubleshooting. This can have a performance impact on the
> + order of 1%.
> + </p>
> + <p>
> + Defaults to false but can be changed at any time.
> + </p>
> + </column>
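Might be worth showing the enabling command somewhere in the docs too, something along these lines (config fragment only, assuming the default single Open_vSwitch record):

```shell
# Enable detailed PMD metrics collection at run time (~1% overhead):
ovs-vsctl set Open_vSwitch . other_config:pmd-perf-metrics=true

# ...and disable again:
ovs-vsctl set Open_vSwitch . other_config:pmd-perf-metrics=false
```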
> +
> <column name="other_config" key="n-handler-threads"
> type='{"type": "integer", "minInteger": 1}'>
> <p>
> --
> 1.9.1