[ovs-dev] [PATCH v3 1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf

O Mahony, Billy billy.o.mahony at intel.com
Fri Dec 8 17:25:13 UTC 2017


Hi Jan,

I had problems applying the later patches in this series, so I'm only reviewing this one for now. I tried applying them against several revisions.

The second patch ([ovs-dev,v3,2/3] dpif-netdev: Detailed performance stats for PMDs) fails with:
fatal: patch fragment without header at line 708: @@ -1073,6 +1155,12 @@ dpif_netdev_init(void)

Overall, not only is the user-visible output clearer, but the code is also more consistent and easier to understand.

I tested this patch by applying it to: 3728b3b Ben Pfaff 2017-11-20 Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD

These are the issues I found:
1. make check #1159 "ofproto-dpif patch ports" consistently fails for me with this patch applied

2. ./utilities/checkpatch.py reports a signoff error plus some line-length and whitespace issues:
== Checking "dpif-netdev-perf/ovs-dev-v3-1-3-dpif-netdev-Refactor-PMD-performance-into-dpif-netdev-perf.patch" ==
ERROR: Too many signoffs; are you missing Co-authored-by lines?
WARNING: Line length is >79-characters long
#346 FILE: lib/dpif-netdev.c:544:
        struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#347 FILE: lib/dpif-netdev.c:545:


WARNING: Line length is >79-characters long
#349 FILE: lib/dpif-netdev.c:547:
         * XPS disabled for this netdev. All static_tx_qid's are unique and less

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#352 FILE: lib/dpif-netdev.c:550:


WARNING: Line length is >79-characters long
WARNING: Line has trailing whitespace
#413 FILE: lib/dpif-netdev.c:610:
    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;

Lines checked: 976, Warnings: 8, Errors: 1
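
I think the signoff ERROR is just checkpatch expecting a Co-authored-by: tag whenever there is more than one Signed-off-by:. Assuming Darrell is meant to be credited as co-author (that attribution is my assumption), adding a tag along these lines above his signoff should keep checkpatch happy:

    Co-authored-by: Darrell Ball <dlu998 at gmail.com>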

3. Does the new '-pmd' arg to pmd-stats-show interfere with the existing [dp] arg?

    sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show -pmd 1 netdev@dpif-netdev
    "dpif-netdev/pmd-stats-show" command takes at most 2 arguments
     ovs-appctl: ovs-vswitchd: server returned an error
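
Looking at the registrations in dpif_netdev_init(), I suspect it is simply that the commands still allow at most 2 arguments, while "-pmd <core> <dp>" is three argv entries. A minimal sketch of what I would expect, assuming the argument limit is the only problem (the 3 is my guess; the other parameters are as in the patch):

    /* "[-pmd core] [dp]" can expand to three arguments:
     * "-pmd", "<core>" and "<dp>", so allow up to 3. */
    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]",
                             0, 3, dpif_netdev_pmd_info,
                             (void *)&show_aux);

The same would apply to pmd-stats-clear and pmd-rxq-show.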

Otherwise it looks like a really useful patch, and the remainder of the series even more so.

Thanks,
Billy. 

> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> bounces at openvswitch.org] On Behalf Of Jan Scheurich
> Sent: Tuesday, November 21, 2017 12:38 AM
> To: 'ovs-dev at openvswitch.org' <ovs-dev at openvswitch.org>
> Subject: [ovs-dev] [PATCH v3 1/3] dpif-netdev: Refactor PMD performance into
> dpif-netdev-perf
> 
> Add module dpif-netdev-perf to host all PMD performance-related data
> structures and functions in dpif-netdev. Refactor the PMD stats handling in dpif-
> netdev and delegate whatever possible into the new module, using clean
> interfaces to shield dpif-netdev from the implementation details. Accordingly,
> the all PMD statistics members are moved from the main struct
> dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats.
> 
> Include Darrel's prior refactoring of PMD stats contained in [PATCH v5,2/3] dpif-
> netdev: Refactor some pmd stats:
> 
> 1. The cycles per packet counts are now based on packets received rather than
> packet passes through the datapath.
> 
> 2. Packet counters are now kept for packets received and packets recirculated.
> These are kept as separate counters for maintainability reasons. The cost of
> incrementing these counters is negligible.  These new counters are also
> displayed to the user.
> 
> 3. A display statistic is added for the average number of datapath passes per
> packet. This should be useful for user debugging and understanding of packet
> processing.
> 
> 4. The user visible 'miss' counter is used for successful upcalls, rather than the
> sum of sucessful and unsuccessful upcalls. Hence, this becomes what user
> historically understands by OVS 'miss upcall'.
> The user display is annotated to make this clear as well.
> 
> 5. The user visible 'lost' counter remains as failed upcalls, but is annotated to
> make it clear what the meaning is.
> 
> 6. The enum pmd_stat_type is annotated to make the usage of the stats
> counters clear.
> 
> 7. The subtable lookup stats is renamed to make it clear that it relates to masked
> lookups.
> 
> 8. The PMD stats test is updated to handle the new user stats of packets
> received, packets recirculated and average number of datapath passes per
> packet.
> 
> On top of that introduce a "-pmd <core>" option to the PMD info commands to
> filter the output for a single PMD.
> 
> Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
> Signed-off-by: Darrell Ball <dlu998 at gmail.com>
> 
> ---
>  lib/automake.mk        |   2 +
>  lib/dpif-netdev-perf.c |  67 +++++++++
>  lib/dpif-netdev-perf.h | 123 ++++++++++++++++
>  lib/dpif-netdev.c      | 371 ++++++++++++++++++++-----------------------------
>  tests/pmd.at           |  22 +--
>  5 files changed, 358 insertions(+), 227 deletions(-)  create mode 100644 lib/dpif-
> netdev-perf.c  create mode 100644 lib/dpif-netdev-perf.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk index effe5b5..0b8df0e 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/dpdk.h \
>  	lib/dpif-netdev.c \
>  	lib/dpif-netdev.h \
> +	lib/dpif-netdev-perf.c \
> +	lib/dpif-netdev-perf.h \
>  	lib/dpif-provider.h \
>  	lib/dpif.c \
>  	lib/dpif.h \
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c new file mode 100644
> index 0000000..6a8f7c4
> --- /dev/null
> +++ b/lib/dpif-netdev-perf.c
> @@ -0,0 +1,67 @@
> +/*
> + * Copyright (c) 2017 Ericsson AB.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#ifdef DPDK_NETDEV
> +#include <rte_cycles.h>
> +#endif
> +
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/vlog.h"
> +#include "dpif-netdev-perf.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(pmd_perf);
> +
> +void
> +pmd_perf_stats_init(struct pmd_perf_stats *s) {
> +    memset(s, 0 , sizeof(*s));
> +    s->start_ms = time_msec();
> +}
> +
> +void
> +pmd_perf_read_counters(struct pmd_perf_stats *s,
> +                       uint64_t stats[PMD_N_STATS]) {
> +    uint64_t val;
> +
> +    /* These loops subtracts reference values ('*_zero') from the counters.
> +     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> +     * value to be more recent than the current value we're reading from the
> +     * counter.  This is not a big problem, since these numbers are not
> +     * supposed to be too accurate, but we should at least make sure that
> +     * the result is not negative. */
> +    for (int i = 0; i < PMD_N_STATS; i++) {
> +        atomic_read_relaxed(&s->counters.n[i], &val);
> +        if (val > s->counters.zero[i]) {
> +            stats[i] = val - s->counters.zero[i];
> +        } else {
> +            stats[i] = 0;
> +        }
> +    }
> +}
> +
> +void
> +pmd_perf_stats_clear(struct pmd_perf_stats *s) {
> +    s->start_ms = 0; /* Marks start of clearing. */
> +    for (int i = 0; i < PMD_N_STATS; i++) {
> +        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
> +    }
> +    s->start_ms = time_msec(); /* Clearing finished. */ }
> +
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h new file mode 100644
> index 0000000..f55f7a2
> --- /dev/null
> +++ b/lib/dpif-netdev-perf.h
> @@ -0,0 +1,123 @@
> +/*
> + * Copyright (c) 2017 Ericsson AB.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef PMD_PERF_H
> +#define PMD_PERF_H 1
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <math.h>
> +
> +#include "openvswitch/vlog.h"
> +#include "ovs-atomic.h"
> +#include "timeval.h"
> +#include "unixctl.h"
> +#include "util.h"
> +
> +#ifdef  __cplusplus
> +extern "C" {
> +#endif
> +
> +enum pmd_stat_type {
> +    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
> +    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
> +    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
> +    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
> +                           /* The above statistics account for the total
> +                            * number of packet passes through the datapath
> +                            * pipeline and should not be overlapping with each
> +                            * other. */
> +    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
> +                              hits. Each MASKED_HIT hit will have >= 1
> +                              MASKED_LOOKUP(s). */
> +    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
> +                             * interface. */
> +    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
> +                             * recirculation. */
> +    PMD_CYCLES_POLL_IDLE,   /* Cycles spent unsuccessful polling. */
> +    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
> +                             * processing polled packets. */
> +    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
> +    PMD_CYCLES_UPCALL,      /* Cycles spent in upcalls. Included in
> +                               PMD_CYCLES_POLL_BUSY. */
> +    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
> +    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
> +    PMD_N_STATS
> +};
> +
> +struct pmd_counters {
> +    atomic_uint64_t n[PMD_N_STATS];     /* Indexed by PMD_STAT_*. */
> +    uint64_t zero[PMD_N_STATS];
> +};
> +
> +struct pmd_perf_stats {
> +    uint64_t start_ms;
> +    uint64_t last_tsc;
> +    struct pmd_counters counters;
> +};
> +
> +enum pmd_info_type;
> +struct dp_netdev_pmd_thread;
> +
> +struct pmd_perf_params {
> +    int command_type;
> +    bool histograms;
> +    size_t iter_hist_len;
> +    size_t ms_hist_len;
> +};
> +
> +void pmd_perf_stats_init(struct pmd_perf_stats *s); void
> +pmd_perf_stats_clear(struct pmd_perf_stats *s); void
> +pmd_perf_read_counters(struct pmd_perf_stats *s,
> +                            uint64_t stats[PMD_N_STATS]);
> +
> +static inline void
> +pmd_perf_update_counter(struct pmd_perf_stats *s,
> +                        enum pmd_stat_type counter, int delta) {
> +    uint64_t tmp;
> +    atomic_read_relaxed(&s->counters.n[counter], &tmp);
> +    tmp += delta;
> +    atomic_store_relaxed(&s->counters.n[counter], tmp); }
> +
> +static inline void
> +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) {
> +    if (OVS_UNLIKELY(s->start_ms == 0)) {
> +        /* Stats not yet fully initialised. */
> +        return;
> +    }
> +    s->last_tsc = now_tsc;
> +}
> +
> +static inline void
> +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
> +                       int packets)
> +{
> +    uint64_t cycles = now_tsc - s->last_tsc;
> +
> +    /* No need for atomic updates as only invoked within PMD thread. */
> +    if (packets > 0) {
> +        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
> +    } else {
> +        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
> +    }
> +}
> +
> +#endif /* pmd-perf.h */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0a62630..bf99f81 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -43,6 +43,7 @@
>  #include "csum.h"
>  #include "dp-packet.h"
>  #include "dpif.h"
> +#include "dpif-netdev-perf.h"
>  #include "dpif-provider.h"
>  #include "dummy.h"
>  #include "fat-rwlock.h"
> @@ -330,23 +331,6 @@ static struct dp_netdev_port
> *dp_netdev_lookup_port(const struct dp_netdev *dp,
>                                                      odp_port_t)
>      OVS_REQUIRES(dp->port_mutex);
> 
> -enum dp_stat_type {
> -    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
> -    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
> -    DP_STAT_MISS,               /* Packets that did not match. */
> -    DP_STAT_LOST,               /* Packets not passed up to the client. */
> -    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
> -                                   hits */
> -    DP_N_STATS
> -};
> -
> -enum pmd_cycles_counter_type {
> -    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
> -    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
> -                                 * processing polled packets */
> -    PMD_N_CYCLES
> -};
> -
>  enum rxq_cycles_counter_type {
>      RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
>                                     processing packets during the current @@ -496,18 +480,6
> @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
>      const struct dp_netdev_flow *);
>  static void dp_netdev_actions_free(struct dp_netdev_actions *);
> 
> -/* Contained by struct dp_netdev_pmd_thread's 'stats' member.  */ -struct
> dp_netdev_pmd_stats {
> -    /* Indexed by DP_STAT_*. */
> -    atomic_ullong n[DP_N_STATS];
> -};
> -
> -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member.  */ -struct
> dp_netdev_pmd_cycles {
> -    /* Indexed by PMD_CYCLES_*. */
> -    atomic_ullong n[PMD_N_CYCLES];
> -};
> -
>  struct polled_queue {
>      struct dp_netdev_rxq *rxq;
>      odp_port_t port_no;
> @@ -566,20 +538,22 @@ struct dp_netdev_pmd_thread {
>       * need to be protected by 'non_pmd_mutex'.  Every other instance
>       * will only be accessed by its own pmd thread. */
>      OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
> -    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> -
> -    /* Queue id used by this pmd thread to send packets on all netdevs if
> -     * XPS disabled for this netdev. All static_tx_qid's are unique and less
> -     * than 'cmap_count(dp->poll_threads)'. */
> -    uint32_t static_tx_qid;
> -
> -    /* Flow-Table and classifiers
> -     *
> -     * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
> -     * changes to 'classifiers' must be made while still holding the
> -     * 'flow_mutex'.
> -     */
> -    struct ovs_mutex flow_mutex;
> +    PADDED_MEMBERS(CACHE_LINE_SIZE,
> +        struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> +
> +        /* Queue id used by this pmd thread to send packets on all netdevs if
> +         * XPS disabled for this netdev. All static_tx_qid's are unique and less
> +         * than 'cmap_count(dp->poll_threads)'. */
> +        uint32_t static_tx_qid;
> +
> +        /* Flow-Table and classifiers
> +         *
> +         * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
> +         * changes to 'classifiers' must be made while still holding the
> +         * 'flow_mutex'.
> +         */
> +        struct ovs_mutex flow_mutex;
> +    );
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>          struct cmap flow_table OVS_GUARDED; /* Flow table. */
> 
> @@ -591,20 +565,10 @@ struct dp_netdev_pmd_thread {
>             are stored for each polled rxq. */
>          long long int rxq_next_cycle_store;
> 
> -        /* Cycles counters */
> -        struct dp_netdev_pmd_cycles cycles;
> -
>          /* Used to count cycles. See 'cycles_counter_end()'. */
>          unsigned long long last_cycles;
>          struct latch exit_latch;        /* For terminating the pmd thread. */
> -    );
> -
> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> -        /* Statistics. */
> -        struct dp_netdev_pmd_stats stats;
> 
> -        struct seq *reload_seq;
> -        uint64_t last_reload_seq;
>          atomic_bool reload;             /* Do we need to reload ports? */
>          bool isolated;
> 
> @@ -612,11 +576,11 @@ struct dp_netdev_pmd_thread {
>          bool need_reload;
>          /* 5 pad bytes. */
>      );
> -
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> +        struct seq *reload_seq;
> +        uint64_t last_reload_seq;
>          struct ovs_mutex port_mutex;    /* Mutex for 'poll_list'
>                                             and 'tx_ports'. */
> -        /* 16 pad bytes. */
>      );
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>          /* List of rx queues to poll. */ @@ -642,15 +606,8 @@ struct
> dp_netdev_pmd_thread {
>          struct hmap send_port_cache;
>      );
> 
> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> -        /* Only a pmd thread can write on its own 'cycles' and 'stats'.
> -         * The main thread keeps 'stats_zero' and 'cycles_zero' as base
> -         * values and subtracts them from 'stats' and 'cycles' before
> -         * reporting to the user */
> -        unsigned long long stats_zero[DP_N_STATS];
> -        uint64_t cycles_zero[PMD_N_CYCLES];
> -        /* 8 pad bytes. */
> -    );
> +    /* Keep track of detailed PMD performance statistics. */
> +    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;
>  };
> 
>  /* Interface to netdev-based datapath. */ @@ -813,46 +770,11 @@ enum
> pmd_info_type {  };
> 
>  static void
> -pmd_info_show_stats(struct ds *reply,
> -                    struct dp_netdev_pmd_thread *pmd,
> -                    unsigned long long stats[DP_N_STATS],
> -                    uint64_t cycles[PMD_N_CYCLES])
> +format_pmd_thread(struct ds *reply,
> +                  struct dp_netdev_pmd_thread *pmd)
>  {
> -    unsigned long long total_packets;
> -    uint64_t total_cycles = 0;
> -    int i;
> -
> -    /* These loops subtracts reference values ('*_zero') from the counters.
> -     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> -     * value to be more recent than the current value we're reading from the
> -     * counter.  This is not a big problem, since these numbers are not
> -     * supposed to be too accurate, but we should at least make sure that
> -     * the result is not negative. */
> -    for (i = 0; i < DP_N_STATS; i++) {
> -        if (stats[i] > pmd->stats_zero[i]) {
> -            stats[i] -= pmd->stats_zero[i];
> -        } else {
> -            stats[i] = 0;
> -        }
> -    }
> -
> -    /* Sum of all the matched and not matched packets gives the total.  */
> -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> -                    + stats[DP_STAT_MISS];
> -
> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> -        if (cycles[i] > pmd->cycles_zero[i]) {
> -           cycles[i] -= pmd->cycles_zero[i];
> -        } else {
> -            cycles[i] = 0;
> -        }
> -
> -        total_cycles += cycles[i];
> -    }
> -
>      ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
>                          ? "main thread" : "pmd thread");
> -
>      if (pmd->numa_id != OVS_NUMA_UNSPEC) {
>          ds_put_format(reply, " numa_id %d", pmd->numa_id);
>      }
> @@ -860,16 +782,39 @@ pmd_info_show_stats(struct ds *reply,
>          ds_put_format(reply, " core_id %u", pmd->core_id);
>      }
>      ds_put_cstr(reply, ":\n");
> +}
> +
> +static void
> +pmd_info_show_stats(struct ds *reply,
> +                    struct dp_netdev_pmd_thread *pmd) {
> +    uint64_t stats[PMD_N_STATS];
> +    uint64_t total_cycles = 0;
> +
> +    pmd_perf_read_counters(&pmd->perf_stats, stats);
> +    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
> +                         + stats[PMD_CYCLES_ITER_BUSY];
> +
> +    format_pmd_thread(reply, pmd);
> 
>      ds_put_format(reply,
> -                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> -                  "\tavg. subtable lookups per hit:%.2f\n"
> -                  "\tmiss:%llu\n\tlost:%llu\n",
> -                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> -                  stats[DP_STAT_MASKED_HIT] > 0
> -                  ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
> -                  : 0,
> -                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
> +                  "\tpackets received:%"PRIu64"\n"
> +                  "\tpacket recirculations:%"PRIu64"\n"
> +                  "\tavg. datapath passes per packet:%.2f\n"
> +                  "\temc hits:%"PRIu64"\n"
> +                  "\tmegaflow hits:%"PRIu64"\n"
> +                  "\tavg. subtable lookups per megaflow hit:%.2f\n"
> +                  "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n"
> +                  "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n",
> +                  stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC],
> +                  stats[PMD_STAT_RECV] > 0
> +                  ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]))
> +                     / stats[PMD_STAT_RECV] : 0,
> +                  stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT],
> +                  stats[PMD_STAT_MASKED_HIT] > 0
> +                  ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP])
> +                     / stats[PMD_STAT_MASKED_HIT]
> +                  : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]);
> 
>      if (total_cycles == 0) {
>          return;
> @@ -878,46 +823,26 @@ pmd_info_show_stats(struct ds *reply,
>      ds_put_format(reply,
>                    "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>                    "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
> -                  cycles[PMD_CYCLES_IDLE],
> -                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
> -                  cycles[PMD_CYCLES_PROCESSING],
> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
> +                  stats[PMD_CYCLES_ITER_IDLE],
> +                  stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100,
> +                  stats[PMD_CYCLES_ITER_BUSY],
> +                  stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles *
> + 100);
> 
> +    uint64_t total_packets = stats[PMD_STAT_RECV];
>      if (total_packets == 0) {
>          return;
>      }
> 
>      ds_put_format(reply,
> -                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
> +                  "\tavg cycles per packet: %.02f
> + (%"PRIu64"/%"PRIu64")\n",
>                    total_cycles / (double)total_packets,
>                    total_cycles, total_packets);
> 
>      ds_put_format(reply,
>                    "\tavg processing cycles per packet: "
> -                  "%.02f (%"PRIu64"/%llu)\n",
> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
> -                  cycles[PMD_CYCLES_PROCESSING], total_packets);
> -}
> -
> -static void
> -pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
> -                    struct dp_netdev_pmd_thread *pmd,
> -                    unsigned long long stats[DP_N_STATS],
> -                    uint64_t cycles[PMD_N_CYCLES])
> -{
> -    int i;
> -
> -    /* We cannot write 'stats' and 'cycles' (because they're written by other
> -     * threads) and we shouldn't change 'stats' (because they're used to count
> -     * datapath stats, which must not be cleared here).  Instead, we save the
> -     * current values and subtract them from the values to be displayed in the
> -     * future */
> -    for (i = 0; i < DP_N_STATS; i++) {
> -        pmd->stats_zero[i] = stats[i];
> -    }
> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> -        pmd->cycles_zero[i] = cycles[i];
> -    }
> +                  "%.02f (%"PRIu64"/%"PRIu64")\n",
> +                  stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets,
> +                  stats[PMD_CYCLES_POLL_BUSY], total_packets);
>  }
> 
>  static int
> @@ -1077,23 +1002,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn,
> int argc, const char *argv[],
>      struct ds reply = DS_EMPTY_INITIALIZER;
>      struct dp_netdev_pmd_thread **pmd_list;
>      struct dp_netdev *dp = NULL;
> -    size_t n;
>      enum pmd_info_type type = *(enum pmd_info_type *) aux;
> +    int core_id = -1;
> +    size_t n;
> 
>      ovs_mutex_lock(&dp_netdev_mutex);
> 
> -    if (argc == 2) {
> -        dp = shash_find_data(&dp_netdevs, argv[1]);
> -    } else if (shash_count(&dp_netdevs) == 1) {
> -        /* There's only one datapath */
> -        dp = shash_first(&dp_netdevs)->data;
> +    while (argc > 1) {
> +        if (!strcmp(argv[1], "-pmd")) {
> +            core_id = strtol(argv[2], NULL, 10);
> +            argc -= 2;
> +            argv += 2;
> +        } else {
> +            dp = shash_find_data(&dp_netdevs, argv[1]);
> +            argc -= 1;
> +            argv += 1;
> +        }
>      }
> 
>      if (!dp) {
> -        ovs_mutex_unlock(&dp_netdev_mutex);
> -        unixctl_command_reply_error(conn,
> -                                    "please specify an existing datapath");
> -        return;
> +        if (shash_count(&dp_netdevs) == 1) {
> +            /* There's only one datapath */
> +            dp = shash_first(&dp_netdevs)->data;
> +        } else {
> +            ovs_mutex_unlock(&dp_netdev_mutex);
> +            unixctl_command_reply_error(conn,
> +                                        "please specify an existing datapath");
> +            return;
> +        }
>      }
> 
>      sorted_poll_thread_list(dp, &pmd_list, &n); @@ -1102,26 +1038,15 @@
> dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>          if (!pmd) {
>              break;
>          }
> -
> +        if (core_id != -1 && pmd->core_id != core_id) {
> +            continue;
> +        }
>          if (type == PMD_INFO_SHOW_RXQ) {
>              pmd_info_show_rxq(&reply, pmd);
> -        } else {
> -            unsigned long long stats[DP_N_STATS];
> -            uint64_t cycles[PMD_N_CYCLES];
> -
> -            /* Read current stats and cycle counters */
> -            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
> -                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
> -            }
> -            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
> -                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
> -            }
> -
> -            if (type == PMD_INFO_CLEAR_STATS) {
> -                pmd_info_clear_stats(&reply, pmd, stats, cycles);
> -            } else if (type == PMD_INFO_SHOW_STATS) {
> -                pmd_info_show_stats(&reply, pmd, stats, cycles);
> -            }
> +        } else if (type == PMD_INFO_CLEAR_STATS) {
> +            pmd_perf_stats_clear(&pmd->perf_stats);
> +        } else if (type == PMD_INFO_SHOW_STATS) {
> +            pmd_info_show_stats(&reply, pmd);
>          }
>      }
>      free(pmd_list);
> @@ -1139,14 +1064,14 @@ dpif_netdev_init(void)
>                                clear_aux = PMD_INFO_CLEAR_STATS,
>                                poll_aux = PMD_INFO_SHOW_RXQ;
> 
> -    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core]
> [dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&show_aux);
> -    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core]
> [dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&clear_aux);
> -    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&poll_aux);
>      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
>                               0, 1, dpif_netdev_pmd_rebalance, @@ -1482,23 +1407,19 @@
> dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)  {
>      struct dp_netdev *dp = get_dp_netdev(dpif);
>      struct dp_netdev_pmd_thread *pmd;
> +    uint64_t pmd_stats[PMD_N_STATS];
> 
> -    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
> +    stats->n_flows = stats->n_hit =
> +            stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
>      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> -        unsigned long long n;
>          stats->n_flows += cmap_count(&pmd->flow_table);
> -
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
> -        stats->n_hit += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
> -        stats->n_hit += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
> -        stats->n_missed += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> -        stats->n_lost += n;
> +        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
> +        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
> +        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
> +        stats->n_missed += pmd_stats[PMD_STAT_MISS];
> +        stats->n_lost += pmd_stats[PMD_STAT_LOST];
>      }
>      stats->n_masks = UINT32_MAX;
> -    stats->n_mask_hit = UINT64_MAX;
> 
>      return 0;
>  }
> @@ -3174,28 +3095,28 @@ cycles_count_start(struct dp_netdev_pmd_thread
> *pmd)
>  /* Stop counting cycles and add them to the counter 'type' */  static inline void
> cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> -                 enum pmd_cycles_counter_type type)
> +                 enum pmd_stat_type type)
>      OVS_RELEASES(&cycles_counter_fake_mutex)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      unsigned long long interval = cycles_counter() - pmd->last_cycles;
> 
> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>  }
> 
>  /* Calculate the intermediate cycle result and add to the counter 'type' */  static
> inline void  cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>                            struct dp_netdev_rxq *rxq,
> -                          enum pmd_cycles_counter_type type)
> +                          enum pmd_stat_type type)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      unsigned long long new_cycles = cycles_counter();
>      unsigned long long interval = new_cycles - pmd->last_cycles;
>      pmd->last_cycles = new_cycles;
> 
> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> -    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> +    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
>          /* Add to the amount of current processing cycles. */
>          non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR],
> interval);
>      }
> @@ -3885,12 +3806,12 @@ dpif_netdev_run(struct dpif *dpif)
>                                                     port->port_no);
>                      cycles_count_intermediate(non_pmd, NULL,
>                                                process_packets
> -                                              ? PMD_CYCLES_PROCESSING
> -                                              : PMD_CYCLES_IDLE);
> +                                              ? PMD_CYCLES_POLL_BUSY
> +                                              : PMD_CYCLES_POLL_IDLE);
>                  }
>              }
>          }
> -        cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
> +        cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
>          dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
>          ovs_mutex_unlock(&dp->non_pmd_mutex);
> 
> @@ -4034,6 +3955,7 @@ static void *
>  pmd_thread_main(void *f_)
>  {
>      struct dp_netdev_pmd_thread *pmd = f_;
> +    struct pmd_perf_stats *s = &pmd->perf_stats;
>      unsigned int lc = 0;
>      struct polled_queue *poll_list;
>      bool exiting;
> @@ -4069,13 +3991,17 @@ reload:
> 
>      cycles_count_start(pmd);
>      for (;;) {
> +        uint64_t iter_packets = 0;
> +        pmd_perf_start_iteration(s, pmd->last_cycles);
>          for (i = 0; i < poll_cnt; i++) {
>              process_packets =
>                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
>                                             poll_list[i].port_no);
>              cycles_count_intermediate(pmd, poll_list[i].rxq,
> -                                      process_packets ? PMD_CYCLES_PROCESSING
> -                                                      : PMD_CYCLES_IDLE);
> +                                      process_packets
> +                                      ? PMD_CYCLES_POLL_BUSY
> +                                      : PMD_CYCLES_POLL_IDLE);
> +            iter_packets += process_packets;
>          }
> 
>          if (lc++ > 1024) {
> @@ -4093,10 +4019,12 @@ reload:
>              if (reload) {
>                  break;
>              }
> +            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
>          }
> +        pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets);
>      }
> 
> -    cycles_count_end(pmd, PMD_CYCLES_IDLE);
> +    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
> 
>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>      exiting = latch_is_set(&pmd->exit_latch); @@ -4549,6 +4477,7 @@
> dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct
> dp_netdev *dp,
>      }
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd-
> >node),
>                  hash_int(core_id, 0));
> +    pmd_perf_stats_init(&pmd->perf_stats);
>  }
> 
>  static void
> @@ -4746,13 +4675,6 @@ dp_netdev_flow_used(struct dp_netdev_flow
> *netdev_flow, int cnt, int size,
>      atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);  }
> 
> -static void
> -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
> -                       enum dp_stat_type type, int cnt)
> -{
> -    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
> -}
> -
>  static int
>  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet
> *packet_,
>                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, @@ -
> 4926,6 +4848,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>      int i;
> 
>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
> +                            cnt);
> 
>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>          struct dp_netdev_flow *flow;
> @@ -4974,24 +4899,24 @@ emc_processing(struct dp_netdev_pmd_thread
> *pmd,
>          }
>      }
> 
> -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> -                           cnt - n_dropped - n_missed);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> +                            cnt - n_dropped - n_missed);
> 
>      return dp_packet_batch_size(packets_);  }
> 
> -static inline void
> +static inline int
>  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                       struct dp_packet *packet,
>                       const struct netdev_flow_key *key,
>                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> -                     int *lost_cnt, long long now)
> +                     long long now)
>  {
>      struct ofpbuf *add_actions;
>      struct dp_packet_batch b;
>      struct match match;
>      ovs_u128 ufid;
> -    int error;
> +    int error = 0;
> 
>      match.tun_md.valid = false;
>      miniflow_expand(&key->mf, &match.flow); @@ -5005,8 +4930,7 @@
> handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                               put_actions);
>      if (OVS_UNLIKELY(error && error != ENOSPC)) {
>          dp_packet_delete(packet);
> -        (*lost_cnt)++;
> -        return;
> +        return error;
>      }
> 
>      /* The Netlink encoding of datapath flow keys cannot express @@ -5046,6
> +4970,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>          ovs_mutex_unlock(&pmd->flow_mutex);
>          emc_probabilistic_insert(pmd, key, netdev_flow);
>      }
> +    return error;
>  }
> 
>  static inline void
> @@ -5067,7 +4992,7 @@ fast_path_processing(struct dp_netdev_pmd_thread
> *pmd,
>      struct dpcls *cls;
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
> -    int miss_cnt = 0, lost_cnt = 0;
> +    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
>      size_t i;
> @@ -5109,9 +5034,14 @@ fast_path_processing(struct dp_netdev_pmd_thread
> *pmd,
>                  continue;
>              }
> 
> -            miss_cnt++;
> -            handle_packet_upcall(pmd, packet, &keys[i], &actions,
> -                                 &put_actions, &lost_cnt, now);
> +            int error = handle_packet_upcall(pmd, packet, &keys[i],
> +                           &actions, &put_actions, now);
> +
> +            if (OVS_UNLIKELY(error)) {
> +                upcall_fail_cnt++;
> +            } else {
> +                upcall_ok_cnt++;
> +            }
>          }
> 
>          ofpbuf_uninit(&actions);
> @@ -5121,8 +5051,7 @@ fast_path_processing(struct dp_netdev_pmd_thread
> *pmd,
>          DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
>              if (OVS_UNLIKELY(!rules[i])) {
>                  dp_packet_delete(packet);
> -                lost_cnt++;
> -                miss_cnt++;
> +                upcall_fail_cnt++;
>              }
>          }
>      }
> @@ -5140,10 +5069,14 @@ fast_path_processing(struct
> dp_netdev_pmd_thread *pmd,
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
> 
> -    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> +                            cnt - upcall_ok_cnt - upcall_fail_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> PMD_STAT_MASKED_LOOKUP,
> +                            lookup_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
> +                            upcall_ok_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
> +                            upcall_fail_cnt);
>  }
> 
>  /* Packets enter the datapath from a port (or from recirculation) here.
> diff --git a/tests/pmd.at b/tests/pmd.at index e39a23a..0356f87 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -170,13 +170,16 @@ dummy at ovs-dummy: hit:0 missed:0
>  		p0 7/1: (dummy-pmd: configured_rx_queues=4,
> configured_tx_queues=<cleared>, requested_rx_queues=4,
> requested_tx_queues=<cleared>)
>  ])
> 
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> +SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +	packets received:0
> +	packet recirculations:0
> +	avg. datapath passes per packet:0.00
>  	emc hits:0
>  	megaflow hits:0
> -	avg. subtable lookups per hit:0.00
> -	miss:0
> -	lost:0
> +	avg. subtable lookups per megaflow hit:0.00
> +	miss(i.e. lookup miss with success upcall):0
> +	lost(i.e. lookup miss with failed upcall):0
>  ])
> 
>  ovs-appctl time/stop
> @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install
> | strip_xout], [0], [dnl
> recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50
> :54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
>  ])
> 
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> +SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +	packets received:20
> +	packet recirculations:0
> +	avg. datapath passes per packet:1.00
>  	emc hits:19
>  	megaflow hits:0
> -	avg. subtable lookups per hit:0.00
> -	miss:1
> -	lost:0
> +	avg. subtable lookups per megaflow hit:0.00
> +	miss(i.e. lookup miss with success upcall):1
> +	lost(i.e. lookup miss with failed upcall):0
>  ])
> 
>  OVS_VSWITCHD_STOP
> --
> 1.9.1
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
