[ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
Stokes, Ian
ian.stokes at intel.com
Fri Jun 2 17:50:00 UTC 2017
> > -----Original Message-----
> > From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> > bounces at openvswitch.org] On Behalf Of Kevin Traynor
> > Sent: Friday, May 5, 2017 5:34 PM
> > To: dev at openvswitch.org
> > Subject: [ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling
> > to use rxq processing cycles.
> >
> > Previously rxqs were distributed to pmds by round robin in port/queue
> > order.
> >
> > Now that we have the processing cycles used for existing rxqs, use
> > that information to try to produce a better balanced distribution of
> > rxqs across pmds. i.e. given multiple pmds, the rxqs which have
> > consumed the largest amount of processing cycles will be placed on
> > different pmds.
> >
> > The rxqs are sorted by their processing cycles and distributed (in
> > sorted order) round robin across pmds.
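[Editor's note: the sort-then-round-robin scheme described above can be sketched standalone in C. Everything here (`toy_rxq`, `toy_schedule`) is illustrative and not part of dpif-netdev; unlike the patch's comparator, this one returns 0 on equal keys, which qsort() needs for a consistent ordering.]

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the real dp_netdev_rxq bookkeeping. */
struct toy_rxq {
    unsigned long long cycles;  /* processing cycles used in last interval */
    int pmd;                    /* assigned pmd index, -1 when unassigned */
};

/* Sort queues in descending order of consumed cycles.  Returns 0 for
 * equal keys so the ordering handed to qsort() is consistent. */
static int
toy_rxq_cycle_sort(const void *a, const void *b)
{
    const struct toy_rxq *qa = *(const struct toy_rxq *const *) a;
    const struct toy_rxq *qb = *(const struct toy_rxq *const *) b;

    if (qa->cycles != qb->cycles) {
        return qa->cycles > qb->cycles ? -1 : 1;
    }
    return 0;
}

/* Distribute the sorted queues round robin across 'n_pmds' pmd slots. */
static void
toy_schedule(struct toy_rxq **rxqs, int n_rxqs, int n_pmds)
{
    qsort(rxqs, n_rxqs, sizeof *rxqs, toy_rxq_cycle_sort);
    for (int i = 0; i < n_rxqs; i++) {
        rxqs[i]->pmd = i % n_pmds;
    }
}
```

[With four queues consuming 40/30/20/10 cycles and three pmds, the three busiest land on distinct pmds and only the lightest wraps around to the first pmd.]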
> >
> > Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> > ---
> > lib/dpif-netdev.c | 67 +++++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 53 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 3131255..89f7d44 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3259,8 +3259,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
> > }
> >
> > +/* Sort Rx Queues by the processing cycles they are consuming. */
> > +static int
> > +rxq_cycle_sort(const void *a, const void *b)
> > +{
> > +    const struct dp_netdev_rxq *qa;
> > +    const struct dp_netdev_rxq *qb;
> > +
> > +    qa = *(struct dp_netdev_rxq **) a;
> > +    qb = *(struct dp_netdev_rxq **) b;
> > +
> > +    if (dp_netdev_rxq_get_cyc_last(qa) >= dp_netdev_rxq_get_cyc_last(qb)) {
> > +        return -1;
> > +    }
> > +
> > +    return 1;
> > +}
> > +
> > /* Assign pmds to queues. If 'pinned' is true, assign pmds to pinned
> >  * queues and marks the pmds as isolated. Otherwise, assign non isolated
> > * pmds to unpinned queues.
> > *
> > + * If 'pinned' is false queues will be sorted by processing cycles they are
> > + * consuming and then assigned to pmds in round robin order.
> > + *
> >  * The function doesn't touch the pmd threads, it just stores the assignment
> >  * in the 'pmd' member of each rxq. */
> > @@ -3270,18 +3291,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > struct dp_netdev_port *port;
> > struct rr_numa_list rr;
> > -
> > - rr_numa_list_populate(dp, &rr);
> > +    struct dp_netdev_rxq **rxqs = NULL;
> > + int i, n_rxqs = 0;
> > + struct rr_numa *numa = NULL;
> > + int numa_id;
> >
> > HMAP_FOR_EACH (port, node, &dp->ports) {
> > - struct rr_numa *numa;
> > - int numa_id;
> > -
> > if (!netdev_is_pmd(port->netdev)) {
> > continue;
> > }
> >
> > - numa_id = netdev_get_numa_id(port->netdev);
> > - numa = rr_numa_list_lookup(&rr, numa_id);
> > -
> > for (int qid = 0; qid < port->n_rxq; qid++) {
> >             struct dp_netdev_rxq *q = &port->rxqs[qid];
> > @@ -3301,17 +3318,39 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > }
> > } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> > -                if (!numa) {
> > -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> > -                              "on numa node %d. Queue %d on port \'%s\' will "
> > -                              "not be polled.",
> > -                              numa_id, qid, netdev_get_name(port->netdev));
> > +                if (n_rxqs == 0) {
> > +                    rxqs = xmalloc(sizeof *rxqs);
> >                 } else {
> > -                    q->pmd = rr_numa_get_pmd(numa);
> > +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >                 }
> > +                /* Store the queue. */
> > +                rxqs[n_rxqs++] = q;
> > }
> > }
> > }
> >
> > + if (n_rxqs > 1) {
> > + /* Sort the queues in order of the processing cycles
> > + * they consumed during their last pmd interval. */
> > + qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> > + }
> > +
> > + rr_numa_list_populate(dp, &rr);
> > + /* Assign the sorted queues to pmds in round robin. */
> > + for (i = 0; i < n_rxqs; i++) {
> > + numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> > + numa = rr_numa_list_lookup(&rr, numa_id);
> > + if (!numa) {
> > + VLOG_WARN("There's no available (non isolated) pmd thread "
> > + "on numa node %d. Queue %d on port \'%s\' will "
> > + "not be polled.",
> > + numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> > + netdev_get_name(rxqs[i]->port->netdev));
> > + continue;
> > + }
> > + rxqs[i]->pmd = rr_numa_get_pmd(numa);
> > + }
> > +
> I found this worked as expected during testing; one issue I noticed,
> though, was with the wraparound when assigning pmds to queues.
>
> For example assume you have 4 queues (q0,q1,q2,q3) and 3 pmds (pmd1, pmd2,
> pmd3).
>
> Assuming the queues are sorted by highest processing cycles as
> (q0,q1,q2,q3), they would be assigned pmds as follows:
>
> q0 -> pmd1
> q1 -> pmd2
> q2 -> pmd3
> q4 -> pmd1
Apologies for the typo, this should be 'q3' instead of 'q4' here and below.
>
> At this stage a wraparound occurs and pmd1 is assigned to q4. But now
> your highest processing cycles queue is on the same pmd as a queue with
> lower processing cycles.
>
> I think it would be good to try to avoid this situation, although it
> would need a change to the pmd assignment logic.
>
> How about, when you have reached the last pmd assignment, instead of
> starting at the beginning again, you start at the last assigned pmd and
> work backwards?
>
> So the assignments in this case would look like
>
> q0 -> pmd1
> q1 -> pmd2
> q2 -> pmd3
> q4 -> pmd3
>
> This would help to avoid/minimize the chances of assigning multiple queues
> to a pmd that already has a queue with a high processing cycle.
>
> > rr_numa_list_destroy(&rr);
> > + free(rxqs);
> > }
> >
> > --
> > 1.8.3.1
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev