[ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.
O Mahony, Billy
billy.o.mahony at intel.com
Mon Jun 26 16:40:09 UTC 2017
No problem, Darrell.
My network is acting up at the moment so I'll resubmit the documentation section tomorrow.
/Billy
> -----Original Message-----
> From: Darrell Ball [mailto:dball at vmware.com]
> Sent: Monday, June 26, 2017 5:08 PM
> To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.
>
> For the last sentence:
>
> I think
> “In the case such, a queue assignment is made”
> was meant to be
> “In case such a queue assignment is made”
>
>
> On 6/26/17, 8:57 AM, "O Mahony, Billy" <billy.o.mahony at intel.com> wrote:
>
> Sounds good. I'll incorporate those changes.
>
> > -----Original Message-----
> > From: Darrell Ball [mailto:dball at vmware.com]
> > Sent: Monday, June 26, 2017 4:53 PM
> > To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.
> >
> > On 6/26/17, 6:52 AM, "O Mahony, Billy" <billy.o.mahony at intel.com> wrote:
> >
> > Hi Darrell,
> >
> > Thanks for reviewing.
> >
> > > -----Original Message-----
> > > From: Darrell Ball [mailto:dball at vmware.com]
> > > Sent: Monday, June 26, 2017 8:04 AM
> > > To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.
> > >
> > > I think this is helpful in some cases where lower performance is an
> > > acceptable tradeoff with more frugal and/or more flexible usage of cpu
> > > resources.
> > >
> > > I did not test it since Ian has already done that, but I reviewed the
> > > code change and other related code.
> > >
> > > One comment inline regarding the added documentation.
> > >
> > > Otherwise, Acked-by: Darrell Ball <dlu998 at gmail.com>
> > >
> > > On 5/10/17, 8:59 AM, "ovs-dev-bounces at openvswitch.org on behalf of
> > > Billy O'Mahony" <ovs-dev-bounces at openvswitch.org on behalf of
> > > billy.o.mahony at intel.com> wrote:
> > >
> > > From: billyom <billy.o.mahony at intel.com>
> > >
> > > Previously if there is no available (non-isolated) pmd on the numa node
> > > for a port then the port is not polled at all. This can result in a
> > > non-operational system until such time as nics are physically
> > > repositioned. It is preferable to operate with a pmd on the 'wrong'
> > > numa node albeit with lower performance. Local pmds are still chosen
> > > when available.
> > >
> > > Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
> > > ---
> > > v6: Change 'port' to 'queue' in a warning msg
> > > v5: Fix warning msg; Update same in docs
> > > v4: Fix a checkpatch error
> > > v3: Fix warning messages not appearing when using multiqueue
> > > v2: Add details of warning messages into docs
> > >
> > >  Documentation/intro/install/dpdk.rst | 10 +++++++++
> > >  lib/dpif-netdev.c                    | 43 +++++++++++++++++++++++++++++++-----
> > >  2 files changed, 48 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> > > index d1c0e65..7a66bff 100644
> > > --- a/Documentation/intro/install/dpdk.rst
> > > +++ b/Documentation/intro/install/dpdk.rst
> > > @@ -460,6 +460,16 @@ affinitized accordingly.
> > >    pmd thread on a NUMA node is only created if there is at least one DPDK
> > >    interface from that NUMA node added to OVS.
> > >
> > > +  .. note::
> > > +   On NUMA systems PCI devices are also local to a NUMA node. Rx queues for
> > > +   PCI device will assigned to a pmd on it's local NUMA node if pmd-cpu-mask
> > > +   has created a pmd thread on that NUMA node. If not the queue will be
> > > +   assigned to a pmd on a remote NUMA node.
> > >
> > > I think the below text is a bit more accurate:
> > >
> > > +  .. note::
> > > +   On NUMA systems PCI devices are also local to a NUMA node. Rx queues
> > > +   for a PCI device will be assigned to a pmd on it's local NUMA node.
> >
> >
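The policy under discussion here, prefer a pmd on the queue's local NUMA node and fall back to a remote one only when no local non-isolated pmd exists, can be sketched in isolation. A minimal illustrative C sketch with hypothetical names; this is not the patch code (the real logic is in the dpif-netdev.c hunk further below):

    #include <stddef.h>

    struct example_pmd {
        int core_id;
        int numa_id;
    };

    /* Choose a pmd for an rx queue local to 'numa_id'.  'pmds' holds the
     * non-isolated pmds; '*rr' is a round-robin cursor used only for the
     * cross-NUMA fallback. */
    static struct example_pmd *
    choose_pmd(struct example_pmd *pmds, size_t n_pmds, int numa_id,
               size_t *rr)
    {
        /* First choice: a pmd on the queue's own NUMA node. */
        for (size_t i = 0; i < n_pmds; i++) {
            if (pmds[i].numa_id == numa_id) {
                return &pmds[i];
            }
        }
        /* Fallback: any pmd, round-robin.  The queue is still polled, but
         * cross-NUMA access costs reduce maximum throughput. */
        if (n_pmds > 0) {
            return &pmds[(*rr)++ % n_pmds];
        }
        return NULL;  /* No pmd at all: the queue cannot be polled. */
    }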
> >
> > [[BO'M]]
> > Re suggested sentence: "Rx queues for a PCI device will be assigned to a
> > pmd on it's local NUMA node."
> >
> > However, with this patch that is no longer always the case. Assignment to
> > a pmd on its local NUMA node will be preferred, but if that is not possible
> > then assignment will be done to a PMD on a non-local NUMA node.
> >
> > Is the suggested change in order to emphasize that a PMD is created if
> > specified in pmd-cpu-mask but not actually scheduled by the OS unless at
> > least one rxq is assigned to it? That case is covered in the preceding
> > paragraph (which is not modified by this patch): "A pmd thread on a NUMA
> > node is only created if there is at least one DPDK interface from that NUMA
> > node added to OVS." Which, now I read it, should be modified to read "A
> > PMD thread only becomes runnable if there is at least one DPDK interface
> > assigned to it."
> >
> > [Darrell]
> > I unintentionally deleted the below line of text from the beginning of the
> > following paragraph – “my bad”:
> > “If not the queue will be assigned to a pmd on a remote NUMA node.”
> >
> > The full text was intended to be:
> >
> > +  .. note::
> > +   On NUMA systems PCI devices are also local to a NUMA node. Rx queues
> > +   for a PCI device will be assigned to a pmd on it's local NUMA node. A pmd
> > +   is created if at least one dpdk interface is added to OVS on that NUMA
> > +   node or if the pmd-cpu-mask has created a pmd thread on that NUMA node.
> > +   If not the queue will be assigned to a pmd on a remote NUMA node. This
> > +   will result in reduced maximum throughput on that device. In the case
> > +   such a queue assignment is made a warning message will be logged:
> > +   "There's no available (non-isolated) pmd thread on numa node N. Queue Q
> > +   on port P will be assigned to the pmd on core C (numa node N'). Expect
> > +   reduced performance."
> > +
> >
> > which is not totally accurate.
> >
> > Also, as you pointed out, I added a redundant line from the above
> > paragraph. I expected one .. note:: and you added a second.
> >
> > This is getting confusing. How about dividing the thread and queue
> > discussions more cleanly and adding more context:
> >
> > .. note::
> >   A pmd thread on a NUMA node is only created if there is at least one
> >   DPDK interface from that NUMA node added to OVS. A pmd thread is created
> >   by default on a core of a NUMA node or when a specified pmd-cpu-mask has
> >   indicated so.
> >
> > .. note::
> >   On NUMA systems PCI devices are also local to a NUMA node. Unbound Rx
> >   queues for a PCI device will assigned to a pmd on it's local NUMA node if
> >   a non-isolated PMD exists on that NUMA node. If not, the queue will be
> >   assigned to a non-isolated pmd on a remote NUMA node. This will result in
> >   reduced maximum throughput on that device. In the case such, a queue
> >   assignment is made a warning message will be logged: "There's no
> >   available (non-isolated) pmd thread on numa node N. Queue Q on port P
> >   will be assigned to the pmd on core C (numa node N'). Expect reduced
> >   performance."
> >
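As a concrete instance of that warning (port, queue, and core values hypothetical): with pmd-cpu-mask creating pmds only on NUMA node 0 and a NIC local to NUMA node 1, the log would read "There's no available (non-isolated) pmd thread on numa node 1. Queue 0 on port 'dpdk0' will be assigned to the pmd on core 2 (numa node 0). Expect reduced performance."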
> >
> > > +   A pmd is
> > > +   created if at least one dpdk interface is added to OVS on that NUMA
> > > +   node or if the pmd-cpu-mask has created a pmd thread on that NUMA node.
> >
> > [[BO'M]]
> > Is this part of the suggested change in order to emphasize that a PMD is
> > created if specified in pmd-cpu-mask but not actually scheduled by the OS
> > unless at least one rxq is assigned to it? (introduced 2788a1b). That
> > detail is covered somewhat in the preceding paragraph (which is not
> > modified by this patch): "A pmd thread on a NUMA node is only created if
> > there is at least one DPDK interface from that NUMA node added to OVS."
> >
> > Which, now I read it, should be modified to actually reflect that. What do
> > you think?
> >
> > [Darrell] see above comment
> >
> > > +   This will result in reduced
> > > +   maximum throughput on that device. In the case such a queue assignment
> > > +   is made a warning message will be logged: "There's no available (non-
> > > +   isolated) pmd thread on numa node N. Queue Q on port P will be assigned
> > > +   to the pmd on core C (numa node N'). Expect reduced performance."
> > > +
> > > - QEMU vCPU thread Affinity
> > >
> > >   A VM performing simple packet forwarding or running complex packet
> > >   pipelines
> >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index b3a0806..34f1963 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
> > >  }
> > >
> > >  static void
> > > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> > > +                      int *all_numa_ids, unsigned all_numa_ids_sz,
> > > +                      int *num_ids_written)
> > >  {
> > >      struct dp_netdev_pmd_thread *pmd;
> > >      struct rr_numa *numa;
> > > +    unsigned idx = 0;
> > >
> > >      hmap_init(&rr->numas);
> > >
> > > @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > >          numa->n_pmds++;
> > >          numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
> > >          numa->pmds[numa->n_pmds - 1] = pmd;
> > > +
> > > +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> > > +        idx++;
> > >      }
> > > +    *num_ids_written = idx;
> > >  }
> > >
> > >  static struct dp_netdev_pmd_thread *
> > > @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > >  {
> > >      struct dp_netdev_port *port;
> > >      struct rr_numa_list rr;
> > > +    int all_numa_ids [64];
> > > +    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> > > +    unsigned all_numa_ids_idx = 0;
> > > +    int all_numa_ids_max_idx = 0;
> > > +    int num_numa_ids = 0;
> > >
> > > -    rr_numa_list_populate(dp, &rr);
> > > +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> > > +                          &num_numa_ids);
> > > +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
> > >
> > >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > >          struct rr_numa *numa;
> > > @@ -3234,10 +3248,29 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > >              }
> > >          } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> > >              if (!numa) {
> > > -                VLOG_WARN("There's no available (non isolated) pmd thread "
> > > +                if (all_numa_ids_max_idx < 0) {
> > > +                    VLOG_ERR("There is no available (non-isolated) pmd "
> > > +                             "thread for port \'%s\' queue %d. This queue "
> > > +                             "will not be polled. Is pmd-cpu-mask set to "
> > > +                             "zero? Or are all PMDs isolated to other "
> > > +                             "queues?", netdev_get_name(port->netdev),
> > > +                             qid);
> > > +                    continue;
> > > +                }
> > > +                int alt_numa_id = all_numa_ids[all_numa_ids_idx];
> > > +                struct rr_numa *alt_numa;
> > > +                alt_numa = rr_numa_list_lookup(&rr, alt_numa_id);
> > > +                q->pmd = rr_numa_get_pmd(alt_numa);
> > > +                VLOG_WARN("There's no available (non-isolated) pmd thread "
> > >                            "on numa node %d. Queue %d on port \'%s\' will "
> > > -                          "not be polled.",
> > > -                          numa_id, qid, netdev_get_name(port->netdev));
> > > +                          "be assigned to the pmd on core %d "
> > > +                          "(numa node %d). Expect reduced performance.",
> > > +                          numa_id, qid, netdev_get_name(port->netdev),
> > > +                          q->pmd->core_id, q->pmd->numa_id);
> > > +                all_numa_ids_idx++;
> > > +                if (all_numa_ids_idx > all_numa_ids_max_idx) {
> > > +                    all_numa_ids_idx = 0;
> > > +                }
> > >              } else {
> > >                  q->pmd = rr_numa_get_pmd(numa);
> > >              }
> > > --
> > > 2.7.4
> >
> > >
> >
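The wrap-around indexing in the hunk above drives the cross-NUMA round-robin. A small self-contained sketch of just that mechanism (values hypothetical, not part of the patch):

    #include <stdio.h>

    int
    main(void)
    {
        int all_numa_ids[] = {0, 1};   /* e.g. non-isolated pmds on nodes 0 and 1 */
        int max_idx = 1;               /* num_numa_ids - 1, as computed above */
        unsigned idx = 0;

        /* Five queues with no local pmd land on nodes 0, 1, 0, 1, 0. */
        for (int q = 0; q < 5; q++) {
            printf("queue %d -> numa node %d\n", q, all_numa_ids[idx]);
            idx++;
            if (idx > (unsigned) max_idx) {
                idx = 0;               /* wrap, mirroring the patch hunk */
            }
        }
        return 0;
    }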
> > > _______________________________________________
> > > dev mailing list
> > > dev at openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev