[ovs-dev] [PATCH v3 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

Thu Aug 10 00:42:29 UTC 2017

On 08/09/2017 08:47 AM, Kevin Traynor wrote:
> On 08/08/2017 07:15 PM, Greg Rose wrote:
> > On 08/01/2017 08:58 AM, Kevin Traynor wrote:
> >> Previously rxqs were assigned to pmds by round robin in
> >> port/queue order.
> >>
> >> Now that we have the processing cycles used for existing rxqs,
> >> use that information to try to produce a better balanced
> >> distribution of rxqs across pmds. i.e. given multiple pmds, the
> >> rxqs which have consumed the largest amount of processing cycles
> >> will be placed on different pmds.
> >>
> >> The rxqs are sorted by their processing cycles and assigned (in
> >> sorted order) round robin across pmds.
> >>
> >> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> >> ---
> >>    Documentation/howto/dpdk.rst |  7 +++++
> >>    lib/dpif-netdev.c            | 73
> >> +++++++++++++++++++++++++++++++++++---------
> >>    2 files changed, 66 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >> index af01d3e..a969285 100644
> >> --- a/Documentation/howto/dpdk.rst
> >> +++ b/Documentation/howto/dpdk.rst
> >> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues
> >> was pinned will become
> >>      thread.
> >>
> >> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to
> >> pmds (cores)
> >> +automatically. The processing cycles that have been required for each
> >> rxq
> >> +will be used where known to assign rxqs with the highest consumption of
> >> +processing cycles to different pmds.
> >> +
> >> +Rxq to pmds assignment takes place whenever there are configuration
> >> changes.
> >> +
> >>    QoS
> >>    ---
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 25a521a..a05e586 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
> >>    }
> >>
> >> +/* Sort Rx Queues by the processing cycles they are consuming. */
> >> +static int
> >> +rxq_cycle_sort(const void *a, const void *b)
> >> +{
> >> +    struct dp_netdev_rxq * qa;
> >> +    struct dp_netdev_rxq * qb;
> >> +
> >> +    qa = *(struct dp_netdev_rxq **) a;
> >> +    qb = *(struct dp_netdev_rxq **) b;
> >> +
> >> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
> >> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
> >> +        return -1;
> >> +    }
> >> +
> >> +    return 1;
> >> +}
> >> +
> >>    /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
> >>     * queues and marks the pmds as isolated.  Otherwise, assign non
> >> isolated
> >>     * pmds to unpinned queues.
> >>     *
> >> + * If 'pinned' is false queues will be sorted by processing cycles
> >> they are
> >> + * consuming and then assigned to pmds in round robin order.
> >> + *
> >>     * The function doesn't touch the pmd threads, it just stores the
> >> assignment
> >>     * in the 'pmd' member of each rxq. */
> >> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool
> >> pinned) OVS_REQUIRES(dp->port_mutex)
> >>        struct dp_netdev_port *port;
> >>        struct rr_numa_list rr;
> >> -
> >> -    rr_numa_list_populate(dp, &rr);
> >> +    struct dp_netdev_rxq ** rxqs = NULL;
> >> +    int i, n_rxqs = 0;
> >> +    struct rr_numa *numa = NULL;
> >> +    int numa_id;
> >>
> >>        HMAP_FOR_EACH (port, node, &dp->ports) {
> >> -        struct rr_numa *numa;
> >> -        int numa_id;
> >> -
> >>            if (!netdev_is_pmd(port->netdev)) {
> >>                continue;
> >>            }
> >>
> >> -        numa_id = netdev_get_numa_id(port->netdev);
> >> -        numa = rr_numa_list_lookup(&rr, numa_id);
> >> -
> >>            for (int qid = 0; qid < port->n_rxq; qid++) {
> >>                struct dp_netdev_rxq *q = &port->rxqs[qid];
> >> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool
> >> pinned) OVS_REQUIRES(dp->port_mutex)
> >>                    }
> >>                } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> >> -                if (!numa) {
> >> -                    VLOG_WARN("There's no available (non isolated)
> >> pmd thread "
> >> -                              "on numa node %d. Queue %d on port
> >> \'%s\' will "
> >> -                              "not be polled.",
> >> -                              numa_id, qid,
> >> netdev_get_name(port->netdev));
> >> +                if (n_rxqs == 0) {
> >> +                    rxqs = xmalloc(sizeof *rxqs);
> >>                    } else {
> >> -                    q->pmd = rr_numa_get_pmd(numa);
> >> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >>                    }
> >> +                /* Store the queue. */
> >> +                rxqs[n_rxqs++] = q;
> >>                }
> >>            }
> >>        }
> >>
> >> +    if (n_rxqs > 1) {
> >> +        /* Sort the queues in order of the processing cycles
> >> +         * they consumed during their last pmd interval. */
> >> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> >> +    }
> >> +
> >> +    rr_numa_list_populate(dp, &rr);
> >> +    /* Assign the sorted queues to pmds in round robin. */
> >> +    for (i = 0; i < n_rxqs; i++) {
> >> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> >> +        numa = rr_numa_list_lookup(&rr, numa_id);
> >> +        if (!numa) {
> >> +            VLOG_WARN("There's no available (non isolated) pmd thread "
> >> +                      "on numa node %d. Queue %d on port \'%s\' will "
> >> +                      "not be polled.",
> >> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> >> +                      netdev_get_name(rxqs[i]->port->netdev));
> >> +            continue;
> >> +        }
> >> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> >> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> >> +                  "rx queue %d (measured processing cycles %"PRIu64").",
> >> +                  rxqs[i]->pmd->core_id, numa_id,
> >> +                  netdev_rxq_get_name(rxqs[i]->rx),
> >> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
> >> +                  dp_netdev_rxq_get_cycles(rxqs[i],
> >> RXQ_CYCLES_PROC_LAST));
> >
> > Kevin,
> >
> > I've been reviewing and testing this code and found something odd.  The
> > measured processing cycles are
> > always zero in my setup.
> >
> > sample log output:
> >
> > 2017-08-08T12:48:25.871Z|00417|dpif_netdev|INFO|Core 6 on numa node 0
> > assigned port 'port-em2' rx queue 5 (measured processing cycles
> > 10011304791).
> > 2017-08-08T12:48:25.871Z|00418|dpif_netdev|INFO|Core 6 on numa node 0
> > assigned port 'port-em2' rx queue 4 (measured processing cycles 0).
> >
> > Initially I configure my setup with 16 rxq's and a PMD CPU mask of
> > 0x1FFFE.  Then I've been testing by running
> > iperf traffic with multiple ports 8 or 16 (-P option) to allow
> > 'processing cycles' to count up.  Or at least I think that's
> > what should be happening.  But when I reconfigure the rxq's and cpu mask
> > the processing cycles are always
> > zero.
> >
>
> Hi Greg, thanks for trying it out. I see that rxq 5 has measured cycles
> so it appears to be just on some queues.
>
> The stat shown is the processing cycles that were counted for
> the rxq during the last 1 sec run while it was on a pmd. "processing
> cycles" counts time to fetch packets and process them but it does not
> count time spent polling when there are no rx packets.
>
> There are a few reasons it could be 0:
> - The queue is newly added
> - There is no rx traffic on that interface
> - The interface has not distributed the traffic to that particular rxq
> so there is no "processing cycles" done for that queue.
>
> Given the rxq number in the log, I would hazard a guess that it's the
> last issue. You could confirm this by setting pmds > total rxqs, so that
> each pmd has a max of 1 rxq. The pmd stats can then indicate if
> there are packets being received on that pmd, and hence rxq. You can
> check that setup with
> ovs-appctl dpif-netdev/pmd-rxq-show
> ovs-appctl dpif-netdev/pmd-stats-clear
> ovs-appctl dpif-netdev/pmd-stats-show

I'll give that a try with your latest revision of the patch series.  Thanks for the
clues!

- Greg
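
For reference, the scheme the commit message describes (sort the unpinned rxqs
by the processing cycles they consumed, then hand them out round robin) can be
sketched in isolation as below. This is only an illustration, not the code from
the patch: struct rxq_sketch, rxq_sketch_cmp() and rxq_sketch_assign() are
hypothetical names, pmds are represented by a plain index, and the per-NUMA
lookup that rxq_scheduling() performs is left out. The comparator in this
sketch also returns 0 for equal cycle counts, which differs from the
rxq_cycle_sort() in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dp_netdev_rxq; only the fields this
 * sketch needs. */
struct rxq_sketch {
    int queue_id;
    uint64_t proc_cycles;   /* Cycles measured over the last interval. */
    int pmd;                /* Index of the assigned pmd, -1 if none. */
};

/* Order queues by processing cycles, largest first.  Ties compare equal
 * so that the ordering is consistent for qsort(). */
static int
rxq_sketch_cmp(const void *a, const void *b)
{
    const struct rxq_sketch *qa = *(const struct rxq_sketch *const *) a;
    const struct rxq_sketch *qb = *(const struct rxq_sketch *const *) b;

    if (qa->proc_cycles != qb->proc_cycles) {
        return qa->proc_cycles > qb->proc_cycles ? -1 : 1;
    }
    return 0;
}

/* Sort 'rxqs' by cycles and assign them to 'n_pmds' pmds in round robin
 * order.  Assumes a single NUMA node; does nothing if there are no pmds. */
static void
rxq_sketch_assign(struct rxq_sketch **rxqs, size_t n_rxqs, int n_pmds)
{
    assert(rxqs != NULL || n_rxqs == 0);

    if (n_pmds <= 0) {
        return;
    }
    if (n_rxqs > 1) {
        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_sketch_cmp);
    }
    for (size_t i = 0; i < n_rxqs; i++) {
        rxqs[i]->pmd = (int) (i % (size_t) n_pmds);
    }
}
```

With four queues and two pmds, the two busiest queues end up on different
pmds, which is the effect the patch is aiming for. A queue that reports 0
cycles simply sorts last and is still assigned.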