[ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.
Ilya Maximets
i.maximets at samsung.com
Mon Jun 26 13:48:52 UTC 2017
I don't like the implementation. The bunch of these 'all_numa_ids*'
variables is completely unreadable.
'all_numa_ids [64];' stores one entry per PMD thread, so it contains
duplicated numa_ids, and it wraps around and overwrites itself from the
start if we have more than 64 PMD threads. That can break the logic and
map all of the non-local ports to the same node.
Also, the main concern is that we already have all the required
information about NUMA nodes in 'rr_numa_list rr'. All we need is
to properly iterate over it.
How about something like this (not fully tested):
>------------------------------------------------------------------<
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4e29085..d17d7e4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3195,6 +3195,23 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
return NULL;
}
+/* Returns the next NUMA node from the rr list in round-robin fashion, i.e. the
+ * first node if 'numa' is NULL or was the last node, and NULL if list is empty. */
+static struct rr_numa *
+rr_numa_list_next(struct rr_numa_list *rr, const struct rr_numa *numa)
+{
+ struct hmap_node *node = NULL;
+
+ if (numa) {
+ node = hmap_next(&rr->numas, &numa->node);
+ }
+ if (!node) {
+ node = hmap_first(&rr->numas);
+ }
+
+ return (node) ? CONTAINER_OF(node, struct rr_numa, node) : NULL;
+}
+
static void
rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
{
@@ -3249,6 +3266,7 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
{
struct dp_netdev_port *port;
struct rr_numa_list rr;
+ struct rr_numa *last_used_nonlocal_numa = NULL;
rr_numa_list_populate(dp, &rr);
@@ -3281,10 +3299,26 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
}
} else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
if (!numa) {
- VLOG_WARN("There's no available (non isolated) pmd thread "
+ numa = rr_numa_list_next(&rr, last_used_nonlocal_numa);
+ if (!numa) {
+ VLOG_ERR("There is no available (non-isolated) pmd "
+ "thread for port \'%s\' queue %d. This queue "
+ "will not be polled. Is pmd-cpu-mask set to "
+ "zero? Or are all PMDs isolated to other "
+ "queues?", netdev_get_name(port->netdev),
+ qid);
+ continue;
+ }
+
+ q->pmd = rr_numa_get_pmd(numa);
+ VLOG_WARN("There's no available (non-isolated) pmd thread "
"on numa node %d. Queue %d on port \'%s\' will "
- "not be polled.",
- numa_id, qid, netdev_get_name(port->netdev));
+ "be assigned to the pmd on core %d "
+ "(numa node %d). Expect reduced performance.",
+ numa_id, qid, netdev_get_name(port->netdev),
+ q->pmd->core_id, q->pmd->numa_id);
+
+ last_used_nonlocal_numa = numa;
} else {
q->pmd = rr_numa_get_pmd(numa);
}
>------------------------------------------------------------------<
Best regards, Ilya Maximets.
On 13.06.2017 12:01, O Mahony, Billy wrote:
> Hi All,
>
> Does anyone else have any comments on this patch?
>
> I'm adding Ilya and Jan to the CC as I believe you both had comments on this previously. Apologies if I've forgotten anyone else who commented and should be on the CC!
>
> Regards,
> /Billy
>
>> -----Original Message-----
>> From: Stokes, Ian
>> Sent: Thursday, May 11, 2017 12:09 PM
>> To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
>> Subject: RE: [ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-
>> local numa node.
>>
>>> Previously if there is no available (non-isolated) pmd on the numa
>>> node for a port then the port is not polled at all. This can result in
>>> a non-operational system until such time as NICs are physically
>>> repositioned. It is preferable to operate with a pmd on the 'wrong'
>>> numa node albeit with lower performance. Local pmds are still chosen
>>> when available.
>>>
>>> Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
>>> ---
>>> v6: Change 'port' to 'queue' in a warning msg
>>> v5: Fix warning msg; Update same in docs
>>> v4: Fix a checkpatch error
>>> v3: Fix warning messages not appearing when using multiqueue
>>> v2: Add details of warning messages into docs
>>>
>>> Documentation/intro/install/dpdk.rst | 10 +++++++++
>>> lib/dpif-netdev.c                    | 43 +++++++++++++++++++++++++++++++-----
>>> 2 files changed, 48 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
>>> index d1c0e65..7a66bff 100644
>>> --- a/Documentation/intro/install/dpdk.rst
>>> +++ b/Documentation/intro/install/dpdk.rst
>>> @@ -460,6 +460,16 @@ affinitized accordingly.
>>> pmd thread on a NUMA node is only created if there is at least
>>> one DPDK
>>> interface from that NUMA node added to OVS.
>>>
>>> +   .. note::
>>> +    On NUMA systems PCI devices are also local to a NUMA node. Rx queues
>>> +    for a PCI device will be assigned to a pmd on its local NUMA node if
>>> +    pmd-cpu-mask has created a pmd thread on that NUMA node. If not, the
>>> +    queue will be assigned to a pmd on a remote NUMA node. This will
>>> +    result in reduced maximum throughput on that device. In the case such
>>> +    a queue assignment is made a warning message will be logged: "There's
>>> +    no available (non-isolated) pmd thread on numa node N. Queue Q on port
>>> +    P will be assigned to the pmd on core C (numa node N'). Expect reduced
>>> +    performance."
>>> - QEMU vCPU thread Affinity
>>>
>>>   A VM performing simple packet forwarding or running complex packet
>>>   pipelines
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index b3a0806..34f1963 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
>>>  }
>>>
>>> static void
>>> -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
>>> +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
>>> + int *all_numa_ids, unsigned all_numa_ids_sz,
>>> + int *num_ids_written)
>>> {
>>> struct dp_netdev_pmd_thread *pmd;
>>> struct rr_numa *numa;
>>> + unsigned idx = 0;
>>>
>>> hmap_init(&rr->numas);
>>>
>>> @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
>>>          numa->n_pmds++;
>>>          numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
>>>          numa->pmds[numa->n_pmds - 1] = pmd;
>>> +
>>> +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
>>> +        idx++;
>>>      }
>>> +    *num_ids_written = idx;
>>>  }
>>>
>>> static struct dp_netdev_pmd_thread *
>>> @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>>  {
>>> struct dp_netdev_port *port;
>>> struct rr_numa_list rr;
>>> + int all_numa_ids [64];
>>> + int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
>>> + unsigned all_numa_ids_idx = 0;
>>> + int all_numa_ids_max_idx = 0;
>>> + int num_numa_ids = 0;
>>>
>>> - rr_numa_list_populate(dp, &rr);
>>> +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
>>> +                          &num_numa_ids);
>>> +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
>>>
>>> HMAP_FOR_EACH (port, node, &dp->ports) {
>>> struct rr_numa *numa;
>>> @@ -3234,10 +3248,29 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>> }
>>>          } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>>>              if (!numa) {
>>> -                VLOG_WARN("There's no available (non isolated) pmd thread "
>>> +                if (all_numa_ids_max_idx < 0) {
>>> +                    VLOG_ERR("There is no available (non-isolated) pmd "
>>> +                             "thread for port \'%s\' queue %d. This queue "
>>> +                             "will not be polled. Is pmd-cpu-mask set to "
>>> +                             "zero? Or are all PMDs isolated to other "
>>> +                             "queues?", netdev_get_name(port->netdev),
>>> +                             qid);
>>> +                    continue;
>>> +                }
>>> +                int alt_numa_id = all_numa_ids[all_numa_ids_idx];
>>> +                struct rr_numa *alt_numa;
>>> +                alt_numa = rr_numa_list_lookup(&rr, alt_numa_id);
>>> +                q->pmd = rr_numa_get_pmd(alt_numa);
>>> +                VLOG_WARN("There's no available (non-isolated) pmd thread "
>>>                            "on numa node %d. Queue %d on port \'%s\' will "
>>> -                          "not be polled.",
>>> -                          numa_id, qid, netdev_get_name(port->netdev));
>>> +                          "be assigned to the pmd on core %d "
>>> +                          "(numa node %d). Expect reduced performance.",
>>> +                          numa_id, qid, netdev_get_name(port->netdev),
>>> +                          q->pmd->core_id, q->pmd->numa_id);
>>> +                all_numa_ids_idx++;
>>> +                if (all_numa_ids_idx > all_numa_ids_max_idx) {
>>> +                    all_numa_ids_idx = 0;
>>> +                }
>>>              } else {
>>>                  q->pmd = rr_numa_get_pmd(numa);
>>>              }
>>> --
>>> 2.7.4
>>>
>>
>> Thanks Billy, LGTM and tested ok.
>>
>> Acked-by: Ian Stokes <ian.stokes at intel.com>
>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
>