[ovs-dev] [PATCH v4] dpif-netdev: Assign ports to pmds on non-local numa node.

Stokes, Ian ian.stokes at intel.com
Wed May 10 13:52:59 UTC 2017


> Hi Ian,
> 
> I'll send a new patch shortly.
> 
> One comment below.
> 
> Thanks for reviewing,
> /Billy.
> 
> > -----Original Message-----
> > From: Stokes, Ian
> > Sent: Wednesday, May 10, 2017 1:19 PM
> > To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH v4] dpif-netdev: Assign ports to pmds on
> > non-local numa node.
> >
> > > From: billyom <billy.o.mahony at intel.com>
> > >
> > > Previously if there is no available (non-isolated) pmd on the numa
> > > node for a port then the port is not polled at all. This can result
> > > in a non-operational system until such time as NICs are physically
> > > repositioned. It is preferable to operate with a pmd on the 'wrong'
> > > numa node, albeit with lower performance. Local pmds are still chosen
> > > when available.
> >
> > Thanks Billy, few minor comments below.
> >
> > >
> > > Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
> > > ---
> > > v4: Fix a checkpatch error
> > > v3: Fix warning messages not appearing when using multiqueue
> > > v2: Add details of warning messages into docs
> > >
> > >  Documentation/intro/install/dpdk.rst | 10 +++++++++
> > >  lib/dpif-netdev.c                    | 40 ++++++++++++++++++++++++++++++++----
> > >  2 files changed, 46 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> > > index d1c0e65..ca73184 100644
> > > --- a/Documentation/intro/install/dpdk.rst
> > > +++ b/Documentation/intro/install/dpdk.rst
> > > @@ -460,6 +460,16 @@ affinitized accordingly.
> > >      pmd thread on a NUMA node is only created if there is at least one DPDK
> > >      interface from that NUMA node added to OVS.
> > >
> > > +  .. note::
> > > +   On NUMA systems PCI devices are also local to a NUMA node.  Rx queues
> > > +   for a PCI device will be assigned to a pmd on its local NUMA node if
> > > +   pmd-cpu-mask has created a pmd thread on that NUMA node.  If not, the
> > > +   queue will be assigned to a pmd on a remote NUMA node.  This will
> > > +   result in reduced maximum throughput on that device.  If such a queue
> > > +   assignment is made, a warning message will be logged: "There's no
> > > +   available (non isolated) pmd thread on numa node N. Queue Q on port P
> > > +   will be assigned to a pmd on numa node M. Expect reduced performance."
> >
> > The warning message above differs from what the code now outputs (no
> > mention of the core, etc.).
> > Can you update this?
> >
> > Also a minor nit on formatting: there's a mix of single and double
> > spaces at the start of new sentences throughout the paragraph; can you
> > fix these?
> [[BO'M]] The single spaces are used only within the description/quote of
> the error message. Double spaces are used in the help text proper. Or at
> least that is the intention. I'll double check.


Ah, my bad. Thanks for pointing this out.

Ian
> >
> > > +
> > >  - QEMU vCPU thread Affinity
> > >
> > >    A VM performing simple packet forwarding or running complex packet
> > >    pipelines
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index b3a0806..bcbd325 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
> > >  }
> > >
> > >  static void
> > > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> > > +                      int *all_numa_ids, unsigned all_numa_ids_sz,
> > > +                      int *num_ids_written)
> > >  {
> > >      struct dp_netdev_pmd_thread *pmd;
> > >      struct rr_numa *numa;
> > > +    unsigned idx = 0;
> > >
> > >      hmap_init(&rr->numas);
> > >
> > > @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > >          numa->n_pmds++;
> > >          numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
> > >          numa->pmds[numa->n_pmds - 1] = pmd;
> > > +
> > > +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> > > +        idx++;
> > >      }
> > > +    *num_ids_written = idx;
> > >  }
> > >
> > >  static struct dp_netdev_pmd_thread *
> > >
> > > @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > >  {
> > >      struct dp_netdev_port *port;
> > >      struct rr_numa_list rr;
> > > +    int all_numa_ids [64];
> > > +    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> > > +    unsigned all_numa_ids_idx = 0;
> > > +    int all_numa_ids_max_idx = 0;
> > > +    int num_numa_ids = 0;
> > >
> > > -    rr_numa_list_populate(dp, &rr);
> > > +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> > > +                          &num_numa_ids);
> > > +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
> > >
> > >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > >          struct rr_numa *numa;
> > > @@ -3234,10 +3248,28 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > >                  }
> > >              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> > >                  if (!numa) {
> > > +                    if (all_numa_ids_max_idx < 0) {
> > > +                        VLOG_ERR("There is no available (non-isolated) pmd "
> > > +                                 "thread for port \'%s\'. This port will "
> > > +                                 "not be polled. Is pmd-cpu-mask set to "
> > > +                                 "zero? Or are all PMDs isolated to other "
> > > +                                 "queues?", netdev_get_name(port->netdev));
> >
> > Could you identify the queue in the ERR above? As it is now, the ERR
> > will repeat for each queue on the port that cannot be assigned a pmd,
> > without any discernible difference between the messages.
> >
> > > +                        continue;
> > > +                    }
> > > +                    int alt_numa_id = all_numa_ids[all_numa_ids_idx];
> > > +                    struct rr_numa *alt_numa;
> > > +                    alt_numa = rr_numa_list_lookup(&rr, alt_numa_id);
> > > +                    q->pmd = rr_numa_get_pmd(alt_numa);
> > >                      VLOG_WARN("There's no available (non isolated) pmd thread "
> > >                                "on numa node %d. Queue %d on port \'%s\' will "
> > > -                              "not be polled.",
> > > -                              numa_id, qid, netdev_get_name(port->netdev));
> > > +                              "be assigned to the pmd on core %d "
> > > +                              "(numa node %d). Expect reduced performance.",
> > > +                              numa_id, qid, netdev_get_name(port->netdev),
> > > +                              q->pmd->core_id, q->pmd->numa_id);
> > > +                    all_numa_ids_idx++;
> > > +                    if (all_numa_ids_idx > all_numa_ids_max_idx) {
> > > +                        all_numa_ids_idx = 0;
> > > +                    }
> > >                  } else {
> > >                      q->pmd = rr_numa_get_pmd(numa);
> > >                  }
> > > --
> > > 2.7.4
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev at openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
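
For readers skimming the thread, the placement policy the patch implements can be
sketched as a standalone helper: prefer a pmd on the port's local NUMA node;
otherwise rotate round-robin over the NUMA nodes that do have non-isolated pmds.
This is only an illustrative sketch; pick_numa() and its parameters are
hypothetical names, not the actual OVS dpif-netdev API.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch of the patch's queue-placement policy.
 * 'pmd_numas' lists the NUMA node of each non-isolated pmd thread;
 * '*rr_idx' is the caller's round-robin cursor for remote fallback. */
static int
pick_numa(int local_numa, const int *pmd_numas, size_t n_pmds, size_t *rr_idx)
{
    size_t i;

    /* Prefer a pmd on the port's own NUMA node. */
    for (i = 0; i < n_pmds; i++) {
        if (pmd_numas[i] == local_numa) {
            return local_numa;
        }
    }

    /* No pmds at all: the queue cannot be polled (the VLOG_ERR case). */
    if (n_pmds == 0) {
        return -1;
    }

    /* Fall back to a remote node, rotating so successive queues spread
     * across nodes (the VLOG_WARN "Expect reduced performance" case). */
    return pmd_numas[(*rr_idx)++ % n_pmds];
}
```

The modulo rotation mirrors what the patch does with its fixed-size
all_numa_ids array and wrapping all_numa_ids_idx cursor.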
