[ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
Stokes, Ian
ian.stokes at intel.com
Fri Jun 2 17:50:00 UTC 2017
> > -----Original Message-----
> > From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> > bounces at openvswitch.org] On Behalf Of Kevin Traynor
> > Sent: Friday, May 5, 2017 5:34 PM
> > To: dev at openvswitch.org
> > Subject: [ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling
> > to use rxq processing cycles.
> >
> > Previously rxqs were distributed to pmds by round robin in port/queue
> > order.
> >
> > Now that we have the processing cycles used for existing rxqs, use
> > that information to try to produce a better balanced distribution of
> > rxqs across pmds. i.e. given multiple pmds, the rxqs which have
> > consumed the largest amount of processing cycles will be placed on
> > different pmds.
> >
> > The rxqs are sorted by their processing cycles and distributed (in
> > sorted order) round robin across pmds.
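[Editor's note: the sort-then-round-robin scheme described above can be sketched standalone in C. Everything here (`toy_rxq`, `toy_schedule`) is illustrative and not part of dpif-netdev; unlike the patch's comparator, this one returns 0 on equal keys, which qsort() needs for a consistent ordering.]

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the real dp_netdev_rxq bookkeeping. */
struct toy_rxq {
    unsigned long long cycles;  /* processing cycles used in last interval */
    int pmd;                    /* assigned pmd index, -1 when unassigned */
};

/* Sort queues in descending order of consumed cycles.  Returns 0 for
 * equal keys so the ordering handed to qsort() is consistent. */
static int
toy_rxq_cycle_sort(const void *a, const void *b)
{
    const struct toy_rxq *qa = *(const struct toy_rxq *const *) a;
    const struct toy_rxq *qb = *(const struct toy_rxq *const *) b;

    if (qa->cycles != qb->cycles) {
        return qa->cycles > qb->cycles ? -1 : 1;
    }
    return 0;
}

/* Distribute the sorted queues round robin across 'n_pmds' pmd slots. */
static void
toy_schedule(struct toy_rxq **rxqs, int n_rxqs, int n_pmds)
{
    qsort(rxqs, n_rxqs, sizeof *rxqs, toy_rxq_cycle_sort);
    for (int i = 0; i < n_rxqs; i++) {
        rxqs[i]->pmd = i % n_pmds;
    }
}
```

[With four queues consuming 40/30/20/10 cycles and three pmds, the three busiest land on distinct pmds and only the lightest wraps around to the first pmd.]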
> >
> > Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> > ---
> > lib/dpif-netdev.c | 67 +++++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 53 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 3131255..89f7d44 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3259,8 +3259,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
> > }
> >
> > +/* Sort Rx Queues by the processing cycles they are consuming. */
> > +static int
> > +rxq_cycle_sort(const void *a, const void *b)
> > +{
> > +    const struct dp_netdev_rxq *qa;
> > +    const struct dp_netdev_rxq *qb;
> > +
> > +    qa = *(struct dp_netdev_rxq **) a;
> > +    qb = *(struct dp_netdev_rxq **) b;
> > +
> > +    if (dp_netdev_rxq_get_cyc_last(qa) >= dp_netdev_rxq_get_cyc_last(qb)) {
> > +        return -1;
> > +    }
> > +
> > +    return 1;
> > +}
> > +
> > /* Assign pmds to queues. If 'pinned' is true, assign pmds to pinned
> >  * queues and marks the pmds as isolated. Otherwise, assign non isolated
> > * pmds to unpinned queues.
> > *
> > + * If 'pinned' is false queues will be sorted by processing cycles they are
> > + * consuming and then assigned to pmds in round robin order.
> > + *
> >  * The function doesn't touch the pmd threads, it just stores the assignment
> >  * in the 'pmd' member of each rxq. */
> > @@ -3270,18 +3291,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > struct dp_netdev_port *port;
> > struct rr_numa_list rr;
> > -
> > - rr_numa_list_populate(dp, &rr);
> > +    struct dp_netdev_rxq **rxqs = NULL;
> > + int i, n_rxqs = 0;
> > + struct rr_numa *numa = NULL;
> > + int numa_id;
> >
> > HMAP_FOR_EACH (port, node, &dp->ports) {
> > - struct rr_numa *numa;
> > - int numa_id;
> > -
> > if (!netdev_is_pmd(port->netdev)) {
> > continue;
> > }
> >
> > - numa_id = netdev_get_numa_id(port->netdev);
> > - numa = rr_numa_list_lookup(&rr, numa_id);
> > -
> > for (int qid = 0; qid < port->n_rxq; qid++) {
> >             struct dp_netdev_rxq *q = &port->rxqs[qid];
> > @@ -3301,17 +3318,39 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > }
> > } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> > -                if (!numa) {
> > -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> > -                              "on numa node %d. Queue %d on port \'%s\' will "
> > -                              "not be polled.",
> > -                              numa_id, qid, netdev_get_name(port->netdev));
> > +                if (n_rxqs == 0) {
> > +                    rxqs = xmalloc(sizeof *rxqs);
> >                 } else {
> > -                    q->pmd = rr_numa_get_pmd(numa);
> > +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >                 }
> > +                /* Store the queue. */
> > +                rxqs[n_rxqs++] = q;
> > }
> > }
> > }
> >
> > + if (n_rxqs > 1) {
> > + /* Sort the queues in order of the processing cycles
> > + * they consumed during their last pmd interval. */
> > + qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> > + }
> > +
> > + rr_numa_list_populate(dp, &rr);
> > + /* Assign the sorted queues to pmds in round robin. */
> > + for (i = 0; i < n_rxqs; i++) {
> > + numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> > + numa = rr_numa_list_lookup(&rr, numa_id);
> > + if (!numa) {
> > + VLOG_WARN("There's no available (non isolated) pmd thread "
> > + "on numa node %d. Queue %d on port \'%s\' will "
> > + "not be polled.",
> > + numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> > + netdev_get_name(rxqs[i]->port->netdev));
> > + continue;
> > + }
> > + rxqs[i]->pmd = rr_numa_get_pmd(numa);
> > + }
> > +
> I found this worked as expected during testing; one issue I noticed,
> though, was with the wraparound when assigning pmds to queues.
>
> For example assume you have 4 queues (q0,q1,q2,q3) and 3 pmds (pmd1, pmd2,
> pmd3).
>
> Assuming the queues are sorted by highest processing cycles as
> (q0,q1,q2,q3), they would be assigned pmds as follows:
>
> q0 -> pmd1
> q1 -> pmd2
> q2 -> pmd3
> q4 -> pmd1
Apologies for the typo, this should be 'q3' instead of 'q4' here and below.
>
> At this stage a wraparound occurs and pmd1 is assigned to q4. But now
> your highest processing cycles queue is on the same pmd as a queue with
> lower processing cycles.
>
> I think it would be good to try to avoid this situation, although it
> would need a change to the pmd assignment logic.
>
> How about, when you have reached the last pmd assignment, instead of
> starting at the beginning again, you start at the last assigned pmd and
> work backwards?
>
> So the assignments in this case would look like
>
> q0 -> pmd1
> q1 -> pmd2
> q2 -> pmd3
> q4 -> pmd3
>
> This would help to avoid/minimize the chances of assigning multiple queues
> to a pmd that already has a queue with a high processing cycle.
>
> > rr_numa_list_destroy(&rr);
> > + free(rxqs);
> > }
> >
> > --
> > 1.8.3.1
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev