[ovs-dev] dpif-netdev: Assign ports to pmds on non-local numa node.

O Mahony, Billy billy.o.mahony at intel.com
Tue Feb 28 11:43:16 UTC 2017


Hi Ilya,

Thanks for the quick response. You make some good points.

It's important to point out that local pmds are still chosen when available. The change only takes effect when a port would otherwise be left entirely unpolled and therefore non-operational.

This is something we came across when deploying DPDK-enabled OVS in OpenStack environments (OPNFV project). We had remote (both physically and administratively) multi-node labs already wired up, and would have much preferred sub-optimal operation to a non-operational OpenStack environment.

Some further comments below.

Best Regards,
Billy

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets at samsung.com]
> Sent: Tuesday, February 28, 2017 11:21 AM
> To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
> Cc: Daniele Di Proietto <diproiettod at ovn.org>
> Subject: Re: [ovs-dev] dpif-netdev: Assign ports to pmds on non-local numa
> node.
> 
> Hello.
> 
> On 28.02.2017 13:12, Billy O'Mahony wrote:
> > From: billyom <billy.o.mahony at intel.com>
> >
> > Previously if there is no available (non-isolated) pmd on the numa
> > node for a port then the port is not polled at all. This can result in
> > a non-operational system until such time as nics are physically
> > repositioned.
> 
> Why you can't just reconfigure your pmd-cpu-mask after NICs' repositioning?
[[BO'M]] The idea is to avoid having to reposition NICs, as they may be in remote data centers administered by other organisations. This can also affect multi-node clusters, where it won't be just one NIC on one system that needs to be moved but several NICs and nodes.
> Maybe you can use pmd-rxq-affinity to assign port on another NUMA node?
[[BO'M]] The low-performance assignment will only occur if there is no available PMD on the local NUMA node. I.e. if possible, the normal, highly performant assignment is made. It is only when the choice is between lower performance and no operation at all that the lesser of two evils is chosen.
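
A minimal sketch of that selection order, assuming a flat array holding the NUMA id of each available (non-isolated) pmd. The helper name `pick_numa` and its parameters are hypothetical and only illustrate the idea; the real logic lives in rxq_scheduling() in the patch below.

```c
#include <stddef.h>

/* Hypothetical helper, not part of OVS.
 *
 * pmd_numa_ids[i] is the NUMA id of the i-th available pmd.
 * Returns the NUMA id whose pmd should poll the queue, or -1 if there
 * are no pmds at all.  *rr_idx is advanced only when a non-local node
 * has to be used, so successive fallbacks rotate over the pmds. */
static int
pick_numa(int local_numa, const int *pmd_numa_ids, int n_pmds,
          unsigned *rr_idx)
{
    if (n_pmds <= 0) {
        return -1;                      /* Nothing can poll this queue. */
    }

    /* Normal case: a pmd exists on the port's own NUMA node. */
    for (int i = 0; i < n_pmds; i++) {
        if (pmd_numa_ids[i] == local_numa) {
            return local_numa;
        }
    }

    /* Fallback: round-robin over the other pmds; slower but polled. */
    int chosen = pmd_numa_ids[*rr_idx % (unsigned) n_pmds];
    *rr_idx = (*rr_idx + 1) % (unsigned) n_pmds;
    return chosen;
}
```

With pmds on nodes {0, 0, 2}, a port on node 0 always gets node 0, while successive queues of a port on node 1 would be spread over 0, 0, 2 in turn.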
> 
> The main concern here is that this 'remote' port will degrade performance of
> other ports served by chosen PMD thread significantly.
[[BO'M]] That is a good point: there are second-order consequences for other ports. But with many cloud systems, even one non-operational port means the entire node is effectively down. For instance, an OpenStack compute node with a non-working provider network for its tenant VMs is useless even though the administration and control networks are still working.
> 
> Best regards, Ilya Maximets.
> 
> > It is preferable to operate with a pmd on the 'wrong' numa node albeit
> > with lower performance. Local pmds are still chosen when available.
> 
> > Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
> > ---
> >  lib/dpif-netdev.c | 35 +++++++++++++++++++++++++++++++----
> >  1 file changed, 31 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 30907b7..6d57d8f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3070,10 +3070,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
> >  }
> >
> >  static void
> > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> > +                      int *all_numa_ids, unsigned all_numa_ids_sz,
> > +                      int *num_ids_written)
> >  {
> >      struct dp_netdev_pmd_thread *pmd;
> >      struct rr_numa *numa;
> > +    unsigned idx = 0;
> >
> >      hmap_init(&rr->numas);
> >
> > @@ -3091,7 +3094,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> >          numa->n_pmds++;
> >          numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
> >          numa->pmds[numa->n_pmds - 1] = pmd;
> > +
> > +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> > +        idx++;
> >      }
> > +    *num_ids_written = idx;
> >  }
> >
> >  static struct dp_netdev_pmd_thread *
> > @@ -3123,8 +3130,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >  {
> >      struct dp_netdev_port *port;
> >      struct rr_numa_list rr;
> > +    int all_numa_ids [64];
> > +    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> > +    unsigned all_numa_ids_idx = 0;
> > +    int all_numa_ids_max_idx = 0;
> > +    int num_numa_ids = 0;
> >
> > -    rr_numa_list_populate(dp, &rr);
> > +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz, &num_numa_ids);
> > +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
> >
> >      HMAP_FOR_EACH (port, node, &dp->ports) {
> >          struct rr_numa *numa;
> > @@ -3155,10 +3168,24 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >                  }
> >              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> >                  if (!numa) {
> > +                    if (all_numa_ids_max_idx < 0) {
> > +                        VLOG_ERR("There are no pmd threads. "
> > +                                 "Is pmd-cpu-mask set to zero?");
> > +                        continue;
> > +                    }
> >                      VLOG_WARN("There's no available (non isolated) pmd thread "
> >                                "on numa node %d. Queue %d on port \'%s\' will "
> > -                              "not be polled.",
> > -                              numa_id, qid, netdev_get_name(port->netdev));
> > +                              "be assigned to a pmd on numa node %d. Expect "
> > +                              "reduced performance.",
> > +                              numa_id, qid, netdev_get_name(port->netdev),
> > +                              all_numa_ids[all_numa_ids_idx]);
> > +                    numa_id = all_numa_ids[all_numa_ids_idx];
> > +                    numa = rr_numa_list_lookup(&rr, numa_id);
> > +                    q->pmd = rr_numa_get_pmd(numa);
> > +                    all_numa_ids_idx++;
> > +                    if (all_numa_ids_idx > all_numa_ids_max_idx) {
> > +                        all_numa_ids_idx = 0;
> > +                    }
> >                  } else {
> >                      q->pmd = rr_numa_get_pmd(numa);
> >                  }
> >

