[ovs-dev] [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

Jan Scheurich jan.scheurich at ericsson.com
Fri Jan 26 11:46:36 UTC 2018


Thanks for the review. Issues addressed in v8.
/Jan

> -----Original Message-----
> From: O Mahony, Billy [mailto:billy.o.mahony at intel.com]
> Sent: Friday, 19 January, 2018 12:07
> To: Jan Scheurich <jan.scheurich at ericsson.com>; dev at openvswitch.org
> Cc: ktraynor at redhat.com; Stokes, Ian <ian.stokes at intel.com>; i.maximets at samsung.com
> Subject: RE: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations
> 
> Hi All,
> 
> I'm going to actually try out the code next. But for now a few comments on the code.
> 
> > -----Original Message-----
> > From: Jan Scheurich [mailto:jan.scheurich at ericsson.com]
> > Sent: Tuesday, January 16, 2018 1:51 AM
> > To: dev at openvswitch.org
> > Cc: ktraynor at redhat.com; Stokes, Ian <ian.stokes at intel.com>;
> > i.maximets at samsung.com; O Mahony, Billy <billy.o.mahony at intel.com>;
> > Jan Scheurich <jan.scheurich at ericsson.com>
> > Subject: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious
> > PMD iterations
> >
> > This patch enhances dpif-netdev-perf to detect iterations with suspicious
> > statistics according to the following criteria:
> >
> > - iteration lasts longer than US_THR microseconds (default 250).
> >   This can be used to capture events where a PMD is blocked or
> >   interrupted for such a period of time that there is a risk for
> >   dropped packets on any of its Rx queues.
> >
> > - max vhost qlen exceeds a threshold Q_THR (default 128). This can
> >   be used to infer virtio queue overruns and dropped packets inside
> >   a VM, which are not visible in OVS otherwise.
> >
> > Such suspicious iterations can be logged together with their iteration
> > statistics to be able to correlate them to packet drop or other events outside
> > OVS.
> >
> > A new command is introduced to enable/disable logging at run-time and to
> > adjust the above thresholds for suspicious iterations:
> >
> > ovs-appctl dpif-netdev/pmd-perf-log-set on | off
> >     [-b before] [-a after] [-us usec] [-q qlen]
> >
> > Turn logging on or off at run-time (on|off).
> >
> > -b before:  The number of iterations before the suspicious iteration to
> >             be logged (default 5).
> > -a after:   The number of iterations after the suspicious iteration to
> >             be logged (default 5).
> > -q qlen:    Suspicious vhost queue fill level threshold. Increase this
> >             to 512 if the Qemu supports 1024 virtio queue length.
> >             (default 128).
> > -us usec:   change the duration threshold for a suspicious iteration
> >             (default 250 us).
> >
> > If more than 100 iterations before or after a suspicious iteration have been
> > looged once, OVS falls back to the safe default values (5/5) to avoid that
> > logging itself causes continuos further logging.
> >
> > Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
> > ---
> >  NEWS                        |   2 +
> >  lib/dpif-netdev-perf.c      | 142
> > ++++++++++++++++++++++++++++++++++++++++++++
> >  lib/dpif-netdev-perf.h      |  40 ++++++++++++-
> >  lib/dpif-netdev.c           |   7 ++-
> >  lib/netdev-dpif-unixctl.man |  47 ++++++++++++++-
> >  5 files changed, 233 insertions(+), 5 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 743528e..7d40374 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -47,6 +47,8 @@ Post-v2.8.0
> >       * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> > single PMD
> >       * Detailed PMD performance metrics available with new command
> >           ovs-appctl dpif-netdev/pmd-perf-show
> > +     * Supervision of PMD performance metrics and logging of suspicious
> > +       iterations
> >     - vswitchd:
> >       * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
> >       * Configuring a controller, or unconfiguring all controllers, now deletes diff
> > --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
> > e0ef15d..259a6c8 100644
> > --- a/lib/dpif-netdev-perf.c
> > +++ b/lib/dpif-netdev-perf.c
> > @@ -24,6 +24,23 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> >
> > +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration
> > duration
> > +                                   in microseconds. */
> > +#define VHOST_QUEUE_FULL 128    /* Size of the virtio TX queue. */
> > +#define LOG_IT_BEFORE 5         /* Number of iteration to log before
> > +                                   suspicious iteration. */
> > +#define LOG_IT_AFTER 5          /* Number of iteration to log after
> > +                                   suspicious iteration. */
> [[BO'M]] typo 'Number of iterations...'

Fixed

> > +
> > +bool log_on = false;
> > +static uint32_t log_it_before = LOG_IT_BEFORE; static uint32_t
> > +log_it_after = LOG_IT_AFTER; static uint32_t log_us_thr =
> > +ITER_US_THRESHOLD; uint32_t log_q_thr = VHOST_QUEUE_FULL; uint64_t
> > +iter_cycle_threshold;
> > +
> > +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600,
> > +600);
> > +
> >  #ifdef DPDK_NETDEV
> >  static uint64_t
> >  get_tsc_hz(void)
> > @@ -124,6 +141,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
> >      histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
> >      histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
> >      s->start_ms = time_msec();
> > +    s->log_begin_it = UINT64_MAX;
> > +    s->log_end_it = UINT64_MAX;
> >  }
> >
> >  void
> > @@ -362,6 +381,8 @@ pmd_perf_stats_clear__(struct pmd_perf_stats *s)
> >      history_init(&s->milliseconds);
> >      s->start_ms = time_msec();
> >      s->milliseconds.sample[0].timestamp = s->start_ms;
> > +    s->log_begin_it = UINT64_MAX;
> > +    s->log_end_it = UINT64_MAX;
> >      /* Clearing finished. */
> >      s->clear = false;
> >  }
> > @@ -387,3 +408,124 @@ non_pmd_perf_stats_clear(struct pmd_perf_stats
> > *s)  {
> >      pmd_perf_stats_clear__(s);
> >  }
> > +
> > +void
> > +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> > +                                     uint64_t cycles,
> > +                                     char *reason) {
> > +    VLOG_WARN_RL(&latency_rl,
> > +                 "Suspicious iteration (%s): tsc=%"PRIu64
> > +                 " duration=%"PRIu64" us\n",
> > +                 reason, s->current.timestamp,
> > +                 (1000000L * cycles) / get_tsc_hz());
> > +    if (log_it_before + log_it_after > 0) {
> > +        if (s->log_begin_it == UINT64_MAX) {
> > +            s->log_begin_it =
> > +                    (s->iterations.idx + HISTORY_LEN - log_it_before)
> > +                        % HISTORY_LEN;
> > +        }
> > +        s->log_end_it =
> > +                (s->iterations.idx + log_it_after) % HISTORY_LEN;
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) {
> > +    struct ds log = DS_EMPTY_INITIALIZER;
> > +    pmd_perf_format_iteration_history(&log, s,
> > +            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
> > +    VLOG_WARN_RL(&latency_rl,
> > +                 "Neighborhood of suspicious iteration:\n"
> > +                 "%s", ds_cstr(&log));
> > +    ds_destroy(&log);
> > +    s->log_end_it = s->log_begin_it = UINT64_MAX;
> > +    if (log_it_before > 100 || log_it_after > 100) {
> > +        /* Reset to safe default values to avoid disturbance. */
> > +        log_it_before = LOG_IT_BEFORE;
> > +        log_it_after = LOG_IT_AFTER;
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> > +                 int argc, const char *argv[],
> > +                 void *aux OVS_UNUSED)
> > +{
> > +    int it_before, it_after, us_thr, q_thr;
> > +    bool on;
> > +    bool usage = false;
> > +
> > +    on = log_on;
> > +    it_before = log_it_before;
> > +    it_after = log_it_after;
> > +    q_thr = log_q_thr;
> > +    us_thr = log_us_thr;
> > +
> > +    while (argc > 1) {
> > +        if (!strcmp(argv[1], "on")) {
> > +            on = true;
> > +            argc--;
> > +            argv++;
> > +        } else if (!strcmp(argv[1], "off")) {
> > +            on = false;
> > +            argc--;
> > +            argv++;
> > +        } else if (!strcmp(argv[1], "-a")) {
> > +            it_after = strtol(argv[2], NULL, 10);
> > +            if (it_after < 0) {
> > +                it_after = 0;
> > +            } else if (it_before + it_after > HISTORY_LEN-1) {
> > +                it_after = HISTORY_LEN-1 - it_before;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-b")) {
> > +            it_before = strtol(argv[2], NULL, 10);
> > +            if (it_before < 0) {
> > +                it_before = 0;
> > +            } else if (it_before > HISTORY_LEN-10) {
> > +                it_before = HISTORY_LEN-10;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-q")) {
> > +            q_thr = strtol(argv[2], NULL, 10);
> > +            if (q_thr < 0) {
> > +                q_thr = 0;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-us")) {
> > +            us_thr = strtol(argv[2], NULL, 10);
> > +            if (us_thr < 0) {
> > +                us_thr = 0;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else {
> > +            usage = true;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (usage) {
> > +        unixctl_command_reply_error(conn,
> > +                "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
> > +                "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
> > +        return;
> > +    }
> > +
> > +    VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
> > +              " q_thr=%d\n",
> > +              on ? "on" : "off", it_before, it_after, us_thr, q_thr);
> > +    log_on = on;
> > +    log_it_before = it_before;
> > +    log_it_after = it_after;
> > +    log_q_thr = q_thr;
> > +    log_us_thr = us_thr;
> > +    iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
> > +
> > +    unixctl_command_reply(conn, "");
> > +}
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index
> > 7a89c40..aa55c68 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -160,8 +160,12 @@ struct pmd_perf_stats {
> >      struct histogram max_vhost_qfill;
> >      /* Iteration history buffer. */
> >      struct history iterations;
> > -    /* Millisecond hitory buffer. */
> > +    /* Millisecond history buffer. */
> >      struct history milliseconds;
> > +    /* Start of iteration range to log. */
> > +    uint64_t log_begin_it;
> > +    /* End of iteration range to log. */
> > +    uint64_t log_end_it;
> >  };
> >
> >  /* Support for accurate timing of PMD execution on TSC clock cycle level.
> > @@ -322,6 +326,17 @@ history_store(struct history *h, struct iter_stats *is)
> >      return history_next(h);
> >  }
> >
> > +/* Data and function related to logging of suspicious iterations. */
> > +
> > +extern bool log_on;
> > +extern uint32_t log_q_thr;
> > +extern uint64_t iter_cycle_threshold;
> > +
> > +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> > +                                       uint64_t cycles,
> > +                                       char *reason); void
> > +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
> > +
> >  /* Functions recording PMD metrics per iteration. */
> >
> >  static inline void
> > @@ -351,6 +366,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s,
> > int rx_packets,
> >      uint64_t now_tsc = cycles_counter_update(s);
> >      struct iter_stats *cum_ms;
> >      uint64_t cycles, cycles_per_pkt = 0;
> > +    char *reason = NULL;
> >
> >      if (OVS_UNLIKELY(s->current.timestamp == 0)) {
> >          /* Stats were cleared during the ongoing iteration. */ @@ -404,6 +420,23
> > @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      cum_ms->batches += s->current.batches;
> >      cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
> >
> > +    if (log_on) {
> > +        /* Log suspicious iterations. */
> [[BO'M]] It took me a while to realise that of course we need to wait for some iterations after the suspicious iter in order to log details of
> the requested after iterations too.
> Suggest comment something like "Here we decide if this iteration is 'suspicious' and needs to be logged and if so what the range of before
> and after iterations that should be logged along with it."

Have improved the comments.

> > +        if (cycles > iter_cycle_threshold) {
> > +            reason = "Excessive total cycles";
> > +        } else if (s->current.max_vhost_qfill >= log_q_thr) {
> > +            reason = "Vhost RX queue full";
> > +        }
> > +        if (OVS_UNLIKELY(reason)) {
> > +            pmd_perf_log_suspicious_iteration(s, cycles, reason);
> > +        }
> > +
> > +        /* Log iteration stats interval when required. */
> [[BO'M]] Suggest something like " Suspicous iterations are not logged immediately - we wait until the required number of iterations
> 'after' iterations are available and log everything together"

See above.

> > +        if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
> > +            pmd_perf_log_iteration_neighborhood(s);
> > +        }
> > +    }
> > +
> >      /* Store in iteration history. */
> >      history_store(&s->iterations, &s->current);
> >      if (now_tsc > s->next_check_tsc) {
> > @@ -426,7 +459,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s,
> > int rx_packets,
> >      }
> >  }
> >
> > -/* Functions for formatting the output of commands. */
> > +/* Formatting the output of commands. */
> >
> >  struct pmd_perf_params {
> >      int command_type;
> > @@ -443,6 +476,9 @@ void pmd_perf_format_iteration_history(struct ds
> > *str,
> >                                         int n_iter);  void pmd_perf_format_ms_history(struct ds
> > *str, struct pmd_perf_stats *s,
> >                                  int n_ms);
> > +void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> > +                          int argc, const char *argv[],
> > +                          void *aux OVS_UNUSED);
> >
> >  #ifdef  __cplusplus
> >  }
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index f5931bf..b493216
> > 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -1095,7 +1095,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn,
> > int argc, const char *argv[],
> >      ovs_mutex_lock(&dp_netdev_mutex);
> >
> >      while (argc > 1) {
> > -        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
> > +        if (!strcmp(argv[1], "-pmd") && argc > 2) {
> >              if (str_to_uint(argv[2], 10, &core_id)) {
> >                  filter_on_pmd = true;
> >              }
> > @@ -1218,6 +1218,11 @@ dpif_netdev_init(void)
> >      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> >                               0, 1, dpif_netdev_pmd_rebalance,
> >                               NULL);
> > +    unixctl_command_register("dpif-netdev/pmd-perf-log-set",
> > +                             "on|off [-b before] [-a after] "
> > +                             "[-us usec] [-q qlen]",
> > +                             0, 10, pmd_perf_log_set_cmd,
> > +                             NULL);
> >      return 0;
> >  }
> >
> > diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man index
> > 53f4c51..82e372f 100644
> > --- a/lib/netdev-dpif-unixctl.man
> > +++ b/lib/netdev-dpif-unixctl.man
> > @@ -70,7 +70,7 @@ This raw recorded data is used threefold:
> >
> >  The command options are
> >
> > -    \fB-nh\fR:                  Suppress the histograms
> > +    \fB-nh\fR:            Suppress the histograms
> >      \fB-it\fR \fIiter_len\fR:   Display the last iter_len iteration stats
> >      \fB-ms\fR \fIms_len\fR:     Display the last ms_len millisecond stats
> >
> > @@ -101,10 +101,53 @@ To reset the counters and start a new
> > measurement use  .
> >  .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
> >  Resets to zero the per pmd thread performance numbers shown by the -
> > \fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR
> > commands.
> > +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR
> > commands.
> >  It will NOT reset datapath or bridge statistics, only the values shown by  the
> > above commands.
> >  .
> > +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \  [\fB-b\fR
> > +\fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
> > +[\fB-q\fR \fIqlen\fR]"
> > +.
> > +The userspace "netdev" datapath is able to supervise the PMD
> > +performance metrics and detect iterations with suspicious statistics
> > +according to the following criteria:
> > +
> > +- The iteration lasts longer than \fIusec\fR microseconds (default 250).
> > +  This can be used to capture events where a PMD is blocked or
> > +  interrupted for such a period of time that there is a risk for
> > +  dropped packets on any of its Rx queues.
> > +
> > +- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This
> > +can
> > +  be used to infer virtio queue overruns and dropped packets inside a
> > +VM,
> > +  which are not visible in OVS otherwise.
> > +
> > +Such suspicious iterations can be logged together with their iteration
> > +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them
> > +to packet drop or other events outside OVS.
> > +
> > +The above command enables (\fBon\fR) or disables (\fBoff\fR)
> > +supervision and logging at run-time and can be used to adjust the above
> > +thresholds for detecting suspicious iterations. By default supervision
> > +and logging is disabled.
> > +
> > +The command options are:
> > +
> > +    \fB-b\fR \fIbefore\fR:
> > +        The number of iterations before the suspicious iteration to be logged
> > (default 5).
> > +    \fB-a\fR \fIafter\fR:
> > +        The number of iterations after the suspicious iteration to be logged
> > (default 5).
> > +    \fB-q\fR \fIqlen\fR:
> > +        Suspicious vhost queue fill level threshold. Increase this to 512 if
> > +        the Qemu supports 1024 virtio queue length (default 128).
> > +    \fB-us\fR \fIusec\fR:
> > +        change the duration threshold for a suspicious iteration (default 250 us).
> > +
> > +If more than 100 iterations before or after a suspicious iteration have
> > +been looged once, OVS falls back to the safe default values (5/5) to
> [[BO'M]] typo
> > +avoid that logging itself causes continuos further suspicious
> [[BO'M]] typo

Fixed

> > +iterations and continuous logging.
> > +.
> >  .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> >  For one or all pmd threads of the datapath \fIdp\fR show the list of queue-
> > ids  with port names, which this thread polls.
> > --
> > 1.9.1



More information about the dev mailing list