[ovs-dev] [PATCH v5 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

Kevin Traynor ktraynor at redhat.com
Thu Aug 24 23:33:35 UTC 2017


On 08/24/2017 10:06 AM, Darrell Ball wrote:
> 
> 
> On 8/24/17, 1:47 AM, "Darrell Ball" <dball at vmware.com> wrote:
> 
>     There is a minor checkpatch warning
>     
>     
>     == Checking "/home/dball/Desktop/patches/ovs-dev-v5-4-6-dpif-netdev-Change-rxq_scheduling-to-use-rxq-processing-cycles..patch" ==
>     WARNING: Line lacks whitespace around operator
>     #170 FILE: lib/dpif-netdev.c:3456:
>                    Round-robin on the NUMA nodes that do have pmds. */
>     
>     Lines checked: 204, Warnings: 1, Errors: 0
> 
> [Darrell] 
> Maybe try
> * Round-robin on the NUMA nodes that do have pmds. */
> 
>     
>     I marked it below
> 
> Ignore what I marked for the warning inline
> 

I noticed the checkpatch warning too. It's a false positive for the
"Round-robin" text in a comment block from a previous commit that I
moved later in the function. I didn't want to confuse things by fixing
it inline with the other changes, but now that you've highlighted it I
might as well change it.

>         
>         On 8/23/17, 6:34 AM, "Kevin Traynor" <ktraynor at redhat.com> wrote:
>         
>             Previously rxqs were assigned to pmds by round robin in
>             port/queue order.
>             
>             Now that we have the processing cycles used for existing rxqs,
>             use that information to try and produce a better-balanced
>             distribution of rxqs across pmds, i.e. given multiple pmds, the
>             rxqs which have consumed the largest amount of processing cycles
>             will be placed on different pmds.
>             
>             The rxqs are sorted by their processing cycles and assigned (in
>             sorted order) round robin across pmds.
>             
>             Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>             ---
>              Documentation/howto/dpdk.rst |   7 +++
>              lib/dpif-netdev.c            | 123 ++++++++++++++++++++++++++++++++-----------
>              2 files changed, 99 insertions(+), 31 deletions(-)
>             
>             diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>             index d7f6610..44737e4 100644
>             --- a/Documentation/howto/dpdk.rst
>             +++ b/Documentation/howto/dpdk.rst
>             @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned will become
>                thread.
>              
>             +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
>             +automatically. The processing cycles that have been required for each rxq
>     

I noticed this also
s/required/stored

>             +will be used where known to assign rxqs with the highest consumption of
>             +processing cycles to different pmds.
>     
>     ‘will be used where known to assign rxqs to pmds based on round robin of the sorted rxqs’ ?
>     

changed

>     
>             +
>             +Rxq to pmds assignment takes place whenever there are configuration changes.
>             +
>              QoS
>              ---
>             diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>             index 5670c55..afbf591 100644
>             --- a/lib/dpif-netdev.c
>             +++ b/lib/dpif-netdev.c
>             @@ -712,4 +712,6 @@ static void
>              dp_netdev_rxq_set_interval(struct dp_netdev_rxq *rx,
>                                         unsigned long long cycles);
>             +static uint64_t
>             +dp_netdev_rxq_get_interval(struct dp_netdev_rxq *rx, unsigned idx);
>              static void
>              dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
>             @@ -3166,4 +3168,12 @@ dp_netdev_rxq_set_interval(struct dp_netdev_rxq *rx,
>              }
>              
>             +static uint64_t
>             +dp_netdev_rxq_get_interval(struct dp_netdev_rxq *rx, unsigned idx)
>             +{
>             +    unsigned long long tmp;
>             +    atomic_read_relaxed(&rx->cycles_intrvl[idx], &tmp);
>             +    return tmp;
>     
>     Could we use something like ‘intrv_processing_cycles’ instead of ‘tmp’
>     Also _get_intrv_cycles ?; same forward comment I mentioned in patch 3
>     

Removed tmp and changed it to processing_cycles. Also made it
_get_intrvl_cycles to match the _set_intrvl_cycles in 3/6.

>     
>             +}
>             +
>              static int
>              dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>             @@ -3352,8 +3362,37 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>              }
>              
>             +/* Sort Rx Queues by the processing cycles they are consuming. */
>             +static int
>             +rxq_cycle_sort(const void *a, const void *b)
>             +{
>             +    struct dp_netdev_rxq * qa;
>             +    struct dp_netdev_rxq * qb;
>             +    uint64_t total_qa, total_qb;
>             +    unsigned i;
>             +
>             +    qa = *(struct dp_netdev_rxq **) a;
>             +    qb = *(struct dp_netdev_rxq **) b;
>             +
>             +    total_qa = total_qb = 0;
>             +    for (i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
>             +        total_qa += dp_netdev_rxq_get_interval(qa, i);
>             +        total_qb += dp_netdev_rxq_get_interval(qb, i);
>             +    }
>             +    dp_netdev_rxq_set_cycles(qa, RXQ_CYCLES_PROC_HIST, total_qa);
>             +    dp_netdev_rxq_set_cycles(qb, RXQ_CYCLES_PROC_HIST, total_qb);
>             +
>             +    if (total_qa >= total_qb) {
>             +        return -1;
>             +    }
>             +    return 1;
>             +}
>             +
>              /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>               * queues and marks the pmds as isolated.  Otherwise, assign non isolated
>               * pmds to unpinned queues.
>               *
>             + * If 'pinned' is false queues will be sorted by processing cycles they are
>             + * consuming and then assigned to pmds in round robin order.
>             + *
>               * The function doesn't touch the pmd threads, it just stores the assignment
>               * in the 'pmd' member of each rxq. */
>             @@ -3364,18 +3403,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                  struct rr_numa_list rr;
>                  struct rr_numa *non_local_numa = NULL;
>             -
>             -    rr_numa_list_populate(dp, &rr);
>             +    struct dp_netdev_rxq ** rxqs = NULL;
>             +    int i, n_rxqs = 0;
>             +    struct rr_numa *numa = NULL;
>             +    int numa_id;
>              
>                  HMAP_FOR_EACH (port, node, &dp->ports) {
>             -        struct rr_numa *numa;
>             -        int numa_id;
>             -
>                      if (!netdev_is_pmd(port->netdev)) {
>                          continue;
>                      }
>              
>             -        numa_id = netdev_get_numa_id(port->netdev);
>             -        numa = rr_numa_list_lookup(&rr, numa_id);
>             -
>                      for (int qid = 0; qid < port->n_rxq; qid++) {
>                          struct dp_netdev_rxq *q = &port->rxqs[qid];
>             @@ -3395,34 +3430,60 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                              }
>                          } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>             -                if (!numa) {
>             -                    /* There are no pmds on the queue's local NUMA node.
>             -                       Round-robin on the NUMA nodes that do have pmds. */
>             -                    non_local_numa = rr_numa_list_next(&rr, non_local_numa);
>             -                    if (!non_local_numa) {
>             -                        VLOG_ERR("There is no available (non-isolated) pmd "
>             -                                 "thread for port \'%s\' queue %d. This queue "
>             -                                 "will not be polled. Is pmd-cpu-mask set to "
>             -                                 "zero? Or are all PMDs isolated to other "
>             -                                 "queues?", netdev_get_name(port->netdev),
>             -                                 qid);
>             -                        continue;
>             -                    }
>             -                    q->pmd = rr_numa_get_pmd(non_local_numa);
>             -                    VLOG_WARN("There's no available (non-isolated) pmd thread "
>             -                              "on numa node %d. Queue %d on port \'%s\' will "
>             -                              "be assigned to the pmd on core %d "
>             -                              "(numa node %d). Expect reduced performance.",
>             -                              numa_id, qid, netdev_get_name(port->netdev),
>             -                              q->pmd->core_id, q->pmd->numa_id);
>             +                if (n_rxqs == 0) {
>     
>     checkpatch lack of whitespace warning above
> 
> [Darrell]
> Ignore this – I rechecked this line and it was fine
>     

No problem, thanks. As mentioned above, I changed the "Round-robin"
comment that was causing the issue.

>     
>     
>     
>             +                    rxqs = xmalloc(sizeof *rxqs);
>                              } else {
>             -                    /* Assign queue to the next (round-robin) PMD on it's local
>             -                       NUMA node. */
>             -                    q->pmd = rr_numa_get_pmd(numa);
>             +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>                              }
>             +                /* Store the queue. */
>             +                rxqs[n_rxqs++] = q;
>                          }
>                      }
>                  }
>              
>             +    if (n_rxqs > 1) {
>             +        /* Sort the queues in order of the processing cycles
>             +         * they consumed during their last pmd interval. */
>             +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
>             +    }
>             +
>             +    rr_numa_list_populate(dp, &rr);
>             +    /* Assign the sorted queues to pmds in round robin. */
>             +    for (i = 0; i < n_rxqs; i++) {
>             +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
>             +        numa = rr_numa_list_lookup(&rr, numa_id);
>             +        if (!numa) {
>             +            /* There are no pmds on the queue's local NUMA node.
>             +               Round-robin on the NUMA nodes that do have pmds. */
>             +            non_local_numa = rr_numa_list_next(&rr, non_local_numa);
>             +            if (!non_local_numa) {
>             +                VLOG_ERR("There is no available (non-isolated) pmd "
>             +                         "thread for port \'%s\' queue %d. This queue "
>             +                         "will not be polled. Is pmd-cpu-mask set to "
>             +                         "zero? Or are all PMDs isolated to other "
>             +                         "queues?", netdev_rxq_get_name(rxqs[i]->rx),
>             +                         netdev_rxq_get_queue_id(rxqs[i]->rx));
>             +                continue;
>             +            }
>             +            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
>             +            VLOG_WARN("There's no available (non-isolated) pmd thread "
>             +                      "on numa node %d. Queue %d on port \'%s\' will "
>             +                      "be assigned to the pmd on core %d "
>             +                      "(numa node %d). Expect reduced performance.",
>             +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
>             +                      netdev_rxq_get_name(rxqs[i]->rx),
>             +                      rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
>             +        } else {
>             +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
>             +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>             +                  "rx queue %d (measured processing cycles %"PRIu64").",
>             +                  rxqs[i]->pmd->core_id, numa_id,
>             +                  netdev_rxq_get_name(rxqs[i]->rx),
>             +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
>             +                  dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST));
>             +        }
>             +    }
>             +
>                  rr_numa_list_destroy(&rr);
>             +    free(rxqs);
>              }
>              
>             -- 
>             1.8.3.1
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
