[ovs-dev] [PATCH v10] dpif-netdev: Assign ports to pmds on non-local numa node.
Billy O'Mahony
billy.o.mahony at intel.com
Mon Jul 10 12:35:38 UTC 2017
Previously, if there was no available (non-isolated) pmd on the numa node
for a port, then the port was not polled at all. This can result in a
non-operational system until such time as the NICs are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node, albeit with lower performance. Local pmds are still chosen when
available.
Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
Co-authored-by: Ilya Maximets <i.maximets at samsung.com>
Tested-by: Ian Stokes <ian.stokes at intel.com>
Acked-by: Ian Stokes <ian.stokes at intel.com>
---
v10: tweak code comments, docs based on reviews
v9: v8 missed some comments on v7
v8: Some coding style issues; doc tweak
v7: Incorporate review comments on docs and implementation
v6: Change 'port' to 'queue' in a warning msg
v5: Fix warning msg; Update same in docs
v4: Fix a checkpatch error
v3: Fix warning messages not appearing when using multiqueue
v2: Add details of warning messages into docs
Documentation/intro/install/dpdk.rst | 21 +++++++++++++++---
lib/dpif-netdev.c | 42 +++++++++++++++++++++++++++++++++---
2 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index e83f852..e1f1108 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -449,7 +449,7 @@ affinitized accordingly.
A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
assigned to it. A pmd thread shall poll the ports for incoming packets,
- switch the packets and send to tx port. pmd thread is CPU bound, and needs
+ switch the packets and send to tx port. A pmd thread is CPU bound, and needs
to be affinitized to isolated cores for optimum performance.
By setting a bit in the mask, a pmd thread is created and pinned to the
@@ -458,8 +458,23 @@ affinitized accordingly.
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
.. note::
- pmd thread on a NUMA node is only created if there is at least one DPDK
- interface from that NUMA node added to OVS.
+ A pmd thread on a NUMA node is only created if there is at least one DPDK
+ interface from that NUMA node added to OVS. A pmd thread is created by
+ default on a core of a NUMA node or when a specified pmd-cpu-mask has
+ indicated so. Even though a PMD thread may exist, the thread only starts
+ consuming CPU cycles if there is at least one receive queue assigned to
+ the pmd.
+
+ .. note::
+ On NUMA systems PCI devices are also local to a NUMA node. Unbound rx
+ queues for a PCI device will be assigned to a pmd on its local NUMA node
+ if a non-isolated PMD exists on that NUMA node. If not, the queue will be
+ assigned to a non-isolated pmd on a remote NUMA node. This will result in
+ reduced maximum throughput on that device and possibly on other devices
+ assigned to that pmd thread. If such a queue assignment is made a warning
+ message will be logged: "There's no available (non-isolated) pmd thread on
+ numa node N. Queue Q on port P will be assigned to the pmd on core C
+ (numa node N'). Expect reduced performance."
- QEMU vCPU thread Affinity
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4e29085..07b2033 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3195,6 +3195,24 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
return NULL;
}
+/* Returns the next node in numa list following 'numa' in round-robin fashion.
+ * Returns first node if 'numa' is a null pointer or the last node in 'rr'.
+ * Returns NULL if 'rr' numa list is empty. */
+static struct rr_numa *
+rr_numa_list_next(struct rr_numa_list *rr, const struct rr_numa *numa)
+{
+ struct hmap_node *node = NULL;
+
+ if (numa) {
+ node = hmap_next(&rr->numas, &numa->node);
+ }
+ if (!node) {
+ node = hmap_first(&rr->numas);
+ }
+
+ return (node) ? CONTAINER_OF(node, struct rr_numa, node) : NULL;
+}
+
static void
rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
{
@@ -3249,6 +3267,7 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
{
struct dp_netdev_port *port;
struct rr_numa_list rr;
+ struct rr_numa *non_local_numa = NULL;
rr_numa_list_populate(dp, &rr);
@@ -3281,11 +3300,28 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
}
} else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
if (!numa) {
- VLOG_WARN("There's no available (non isolated) pmd thread "
+ /* There are no pmds on the queue's local NUMA node.
+ Round-robin on the NUMA nodes that do have pmds. */
+ non_local_numa = rr_numa_list_next(&rr, non_local_numa);
+ if (!non_local_numa) {
+ VLOG_ERR("There is no available (non-isolated) pmd "
+ "thread for port \'%s\' queue %d. This queue "
+ "will not be polled. Is pmd-cpu-mask set to "
+ "zero? Or are all PMDs isolated to other "
+ "queues?", netdev_get_name(port->netdev),
+ qid);
+ continue;
+ }
+ q->pmd = rr_numa_get_pmd(non_local_numa);
+ VLOG_WARN("There's no available (non-isolated) pmd thread "
"on numa node %d. Queue %d on port \'%s\' will "
- "not be polled.",
- numa_id, qid, netdev_get_name(port->netdev));
+ "be assigned to the pmd on core %d "
+ "(numa node %d). Expect reduced performance.",
+ numa_id, qid, netdev_get_name(port->netdev),
+ q->pmd->core_id, q->pmd->numa_id);
} else {
+ /* Assign queue to the next (round-robin) PMD on its local
+ NUMA node. */
q->pmd = rr_numa_get_pmd(numa);
}
}
--
2.7.4
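[Editor's note for reviewers/operators.] Once the patch is applied, the
cross-NUMA fallback can be observed with existing OVS tooling. A sketch,
assuming a running ovs-vswitchd with DPDK ports; the core mask and the log
file path are illustrative and deployment-dependent:

```shell
# Restrict pmds to a single core on NUMA node 0 (illustrative mask).
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x2

# Show which pmd polls each rx queue; a queue on a port local to NUMA
# node 1 should now appear under the NUMA-0 pmd instead of going unpolled.
ovs-appctl dpif-netdev/pmd-rxq-show

# The cross-NUMA assignment is also logged (typical default log path).
grep "Expect reduced performance" /var/log/openvswitch/ovs-vswitchd.log
```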