[ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

Stokes, Ian ian.stokes at intel.com
Fri Jun 2 17:40:53 UTC 2017


> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> bounces at openvswitch.org] On Behalf Of Kevin Traynor
> Sent: Friday, May 5, 2017 5:34 PM
> To: dev at openvswitch.org
> Subject: [ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling to
> use rxq processing cycles.
> 
> Previously rxqs were distributed to pmds by round robin in port/queue
> order.
> 
> Now that we have the processing cycles used for existing rxqs, use that
> information to try and produce a better balanced distribution of rxqs
> across pmds. i.e. given multiple pmds, the rxqs which have consumed the
> largest amount of processing cycles will be placed on different pmds.
> 
> The rxqs are sorted by their processing cycles and distributed (in sorted
> order) round robin across pmds.
> 
> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> ---
>  lib/dpif-netdev.c | 67 +++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 53 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 3131255..89f7d44 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3259,8 +3259,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>  }
> 
> +/* Sort Rx Queues by the processing cycles they are consuming. */
> +static int
> +rxq_cycle_sort(const void *a, const void *b)
> +{
> +    const struct dp_netdev_rxq * qa;
> +    const struct dp_netdev_rxq * qb;
> +
> +    qa = *(struct dp_netdev_rxq **) a;
> +    qb = *(struct dp_netdev_rxq **) b;
> +
> +    if (dp_netdev_rxq_get_cyc_last(qa) >=
> +            dp_netdev_rxq_get_cyc_last(qb)) {
> +        return -1;
> +    }
> +
> +    return 1;
> +}
> +
>  /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>   * queues and marks the pmds as isolated.  Otherwise, assign non isolated
>   * pmds to unpinned queues.
>   *
> + * If 'pinned' is false queues will be sorted by processing cycles they are
> + * consuming and then assigned to pmds in round robin order.
> + *
>   * The function doesn't touch the pmd threads, it just stores the assignment
>   * in the 'pmd' member of each rxq. */
> @@ -3270,18 +3291,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>      struct dp_netdev_port *port;
>      struct rr_numa_list rr;
> -
> -    rr_numa_list_populate(dp, &rr);
> +    struct dp_netdev_rxq ** rxqs = NULL;
> +    int i, n_rxqs = 0;
> +    struct rr_numa *numa = NULL;
> +    int numa_id;
> 
>      HMAP_FOR_EACH (port, node, &dp->ports) {
> -        struct rr_numa *numa;
> -        int numa_id;
> -
>          if (!netdev_is_pmd(port->netdev)) {
>              continue;
>          }
> 
> -        numa_id = netdev_get_numa_id(port->netdev);
> -        numa = rr_numa_list_lookup(&rr, numa_id);
> -
>          for (int qid = 0; qid < port->n_rxq; qid++) {
>              struct dp_netdev_rxq *q = &port->rxqs[qid];
> @@ -3301,17 +3318,39 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>                  }
>              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> -                if (!numa) {
> -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> -                              "on numa node %d. Queue %d on port \'%s\' will "
> -                              "not be polled.",
> -                              numa_id, qid, netdev_get_name(port->netdev));
> +                if (n_rxqs == 0) {
> +                    rxqs = xmalloc(sizeof *rxqs);
>                  } else {
> -                    q->pmd = rr_numa_get_pmd(numa);
> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>                  }
> +                /* Store the queue. */
> +                rxqs[n_rxqs++] = q;
>              }
>          }
>      }
> 
> +    if (n_rxqs > 1) {
> +        /* Sort the queues in order of the processing cycles
> +         * they consumed during their last pmd interval. */
> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> +    }
> +
> +    rr_numa_list_populate(dp, &rr);
> +    /* Assign the sorted queues to pmds in round robin. */
> +    for (i = 0; i < n_rxqs; i++) {
> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> +        numa = rr_numa_list_lookup(&rr, numa_id);
> +        if (!numa) {
> +            VLOG_WARN("There's no available (non isolated) pmd thread "
> +                      "on numa node %d. Queue %d on port \'%s\' will "
> +                      "not be polled.",
> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> +                      netdev_get_name(rxqs[i]->port->netdev));
> +            continue;
> +        }
> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> +    }
> +
I found this worked as expected during testing. One issue I noticed, though, was what happens when the round robin assignment of pmds to queues wraps around.

For example, assume you have 4 queues (q0, q1, q2, q3) and 3 pmds (pmd1, pmd2, pmd3).

Assuming the queues are sorted from highest to lowest processing cycles as (q0, q1, q2, q3), they would be assigned PMDs as follows:

q0 -> pmd1
q1 -> pmd2
q2 -> pmd3
q3 -> pmd1

At this stage a wraparound occurs and q3 is assigned to pmd1. But now the queue with the highest processing cycles is on the same pmd as a queue with lower processing cycles.

I think it would be good to try and avoid this situation, although it would need a change to the pmd assignment logic.

How about this: when you have reached the last pmd, instead of starting at the beginning again, you start from the last assigned pmd and work backwards?

So the assignments in this case would look like:

q0 -> pmd1
q1 -> pmd2
q2 -> pmd3
q3 -> pmd3

This would help to avoid/minimize the chances of assigning multiple queues to a pmd that already has a queue with a high processing cycle count.
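
Something along the lines of the below, as a rough standalone sketch of the idea rather than a diff against your patch (the cycle counts, queue count and pmd count are made up for illustration, and the per-numa pmd lookup via rr_numa_get_pmd() is left out to keep it short):

#include <stdio.h>

int
main(void)
{
    /* Queues already sorted by processing cycles, highest first. */
    unsigned long long cycles[] = { 900, 700, 400, 100 };
    int n_rxqs = 4;
    int n_pmds = 3;

    int pmd = -1;
    int step = 1;    /* +1 while walking forward, -1 while walking back. */

    for (int i = 0; i < n_rxqs; i++) {
        pmd += step;
        if (pmd >= n_pmds) {
            /* Hit the last pmd: stay on it and reverse direction. */
            pmd = n_pmds - 1;
            step = -1;
        } else if (pmd < 0) {
            /* Back at the first pmd: stay on it and reverse direction. */
            pmd = 0;
            step = 1;
        }
        printf("q%d (%llu cycles) -> pmd%d\n", i, cycles[i], pmd + 1);
    }
    return 0;
}

With 4 queues and 3 pmds this prints the assignment above (q0 -> pmd1, q1 -> pmd2, q2 -> pmd3, q3 -> pmd3), so the heaviest queue is left alone on pmd1 after the wraparound.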

>      rr_numa_list_destroy(&rr);
> +    free(rxqs);
>  }
> 
> --
> 1.8.3.1
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

