[ovs-dev] [PATCH 4/4] dpif-netdev: Allow cross-NUMA polling on selected ports
anurag2k at gmail.com
Tue Jun 29 11:27:56 UTC 2021
From: Anurag Agarwal <anurag.agarwal at ericsson.com>
Today dpif-netdev considers PMD threads on a non-local NUMA node for
automatic assignment of the rxqs of a port only if there are no local,
non-isolated PMDs.
On typical servers where all physical ports reside on one NUMA node, this
often leaves the PMDs on the other NUMA node under-utilized, wasting CPU
resources. The alternative, manually pinning the rxqs to PMDs on remote
NUMA nodes, also has drawbacks, as it limits OVS's ability to auto
load-balance the rxqs.
This patch introduces a new interface configuration option to allow
ports to be automatically polled by PMDs on any NUMA node:
ovs-vsctl set interface <Name> other_config:cross-numa-polling=true
If this option is not present or set to false, legacy behaviour applies.
Signed-off-by: Anurag Agarwal <anurag.agarwal at ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
Signed-off-by: Rudra Surya Bhaskara Rao <rudrasurya.r at acldigital.com>
---
Documentation/topics/dpdk/pmd.rst | 28 ++++++++++++++++++++++++++--
lib/dpif-netdev.c | 35 +++++++++++++++++++++++++----------
tests/pmd.at | 30 ++++++++++++++++++++++++++++++
vswitchd/vswitch.xml | 20 ++++++++++++++++++++
4 files changed, 101 insertions(+), 12 deletions(-)
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index d63750e..abe1cda 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -78,8 +78,27 @@ To show port/Rx queue assignment::
$ ovs-appctl dpif-netdev/pmd-rxq-show
-Rx queues may be manually pinned to cores. This will change the default Rx
-queue assignment to PMD threads::
+Normally, Rx queues are assigned to PMD threads automatically. By default,
+OVS assigns Rx queues only to PMD threads running on the same NUMA node as
+the port, in order to avoid the extra latency of accessing packet buffers
+across the NUMA boundary. This overhead is typically higher for vhostuser
+ports than for physical ports due to the packet copy that is done for all
+received packets.
+
+On NUMA servers with physical ports only on one NUMA node, the NUMA-local
+polling policy can leave the PMD threads on the remote NUMA node
+under-utilized. In such cases it may be beneficial for overall OVS
+performance to use the spare capacity and allow a physical port's rxqs to
+be polled across NUMA nodes, despite the overhead involved.
+The policy can be set per port with the following configuration option::
+
+ $ ovs-vsctl set Interface <iface> \
+ other_config:cross-numa-polling=true|false
+
+The default value is false.
+
+Rx queues may also be manually pinned to cores. This will change the default
+Rx queue assignment to PMD threads::
$ ovs-vsctl set Interface <iface> \
other_config:pmd-rxq-affinity=<rxq-affinity-list>
@@ -194,6 +213,11 @@ or can be triggered by using::
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
+.. versionchanged:: 2.15.0
+
+ Added the interface parameter ``other_config:cross-numa-polling`` and the
+ ``no-isol`` option for ``pmd-rxq-affinity``.
+
Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
-------------------------------------------------------------------
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7d9078f..6b9a151 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -478,6 +478,7 @@ struct dp_netdev_port {
bool emc_enabled; /* If true EMC will be used. */
char *type; /* Port type as requested by user. */
char *rxq_affinity_list; /* Requested affinity of rx queues. */
+ bool cross_numa_polling; /* If true, cross-NUMA polling is allowed. */
};
/* Contained by struct dp_netdev_flow's 'stats' member. */
@@ -4548,6 +4549,7 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
int error = 0;
const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
+ bool cross_numa_polling = smap_get_bool(cfg, "cross-numa-polling", false);
ovs_mutex_lock(&dp->port_mutex);
error = get_port_by_number(dp, port_no, &port);
@@ -4555,6 +4557,11 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
goto unlock;
}
+ if (cross_numa_polling != port->cross_numa_polling) {
+ port->cross_numa_polling = cross_numa_polling;
+ dp_netdev_request_reconfigure(dp);
+ }
+
if (emc_enabled != port->emc_enabled) {
struct dp_netdev_pmd_thread *pmd;
struct ds ds = DS_EMPTY_INITIALIZER;
@@ -5173,8 +5180,8 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
struct dp_netdev_port *port;
struct dp_netdev_rxq ** rxqs = NULL;
struct rr_numa_list rr;
- struct rr_numa *numa = NULL;
- struct rr_numa *non_local_numa = NULL;
+ struct rr_numa *local_numa = NULL;
+ struct rr_numa *next_numa = NULL;
int n_rxqs = 0;
int numa_id;
bool assign_cyc = dp->pmd_rxq_assign_cyc;
@@ -5214,12 +5221,20 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
cycles = dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST);
- numa = rr_numa_list_lookup(&rr, numa_id);
- if (!numa) {
- /* There are no pmds on the queue's local NUMA node.
- Round robin on the NUMA nodes that do have pmds. */
- non_local_numa = rr_numa_list_next(&rr, non_local_numa);
- if (!non_local_numa) {
+ if (!rxqs[i]->port->cross_numa_polling) {
+ /* Try to find a local pmd. */
+ local_numa = rr_numa_list_lookup(&rr, numa_id);
+ } else {
+ /* Allow polling by any pmd. */
+ local_numa = NULL;
+ }
+
+ if (!local_numa) {
+ /* Port configured for cross-NUMA polling or there are no pmds
+ * on the queue's local NUMA node.
+ * Round robin on the NUMA nodes that do have pmds. */
+ next_numa = rr_numa_list_next(&rr, next_numa);
+ if (!next_numa) {
if (!dry_run) {
VLOG_ERR("There is no available (non-isolated) pmd "
"thread for port \'%s\' queue %d. This queue "
@@ -5231,7 +5246,7 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
continue;
}
rxqs[i]->pmd =
- rr_numa_assign_least_loaded_pmd(non_local_numa, cycles);
+ rr_numa_assign_least_loaded_pmd(next_numa, cycles);
if (!dry_run) {
VLOG_WARN("There's no available (non-isolated) pmd thread "
"on numa node %d. Queue %d on port \'%s\' will "
@@ -5242,7 +5257,7 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
}
} else {
- rxqs[i]->pmd = rr_numa_assign_least_loaded_pmd(numa, cycles);
+ rxqs[i]->pmd = rr_numa_assign_least_loaded_pmd(local_numa, cycles);
if (!dry_run) {
if (assign_cyc) {
VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
diff --git a/tests/pmd.at b/tests/pmd.at
index 57b5fb8..263c722 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -357,6 +357,36 @@ icmp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10
OVS_VSWITCHD_STOP
AT_CLEANUP
+AT_SETUP([PMD - Enable cross numa polling])
+OVS_VSWITCHD_START(
+ [add-port br0 p1 -- set Interface p1 type=dummy-pmd ofport_request=1 options:n_rxq=4 -- \
+ set Open_vSwitch . other_config:pmd-cpu-mask=3
+], [], [], [--dummy-numa 0,1])
+
+AT_CHECK([ovs-ofctl add-flow br0 action=controller])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 3 -d ' ' | sort | uniq], [0], [dnl
+0
+])
+
+dnl Enable cross numa polling and check numa ids
+AT_CHECK([ovs-vsctl set Interface p1 other_config:cross-numa-polling=true])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 3 -d ' ' | sort | uniq], [0], [dnl
+0
+1
+])
+
+dnl Disable cross numa polling and check numa ids
+AT_CHECK([ovs-vsctl set Interface p1 other_config:cross-numa-polling=false])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 3 -d ' ' | sort | uniq], [0], [dnl
+0
+])
+
+OVS_VSWITCHD_STOP(["/|WARN|/d"])
+AT_CLEANUP
+
AT_SETUP([PMD - change numa node])
OVS_VSWITCHD_START(
[add-port br0 p1 -- set Interface p1 type=dummy-pmd ofport_request=1 options:n_rxq=2 -- \
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 97bbb11..7fa7146 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3252,6 +3252,26 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
</p>
</column>
+ <column name="other_config" key="cross-numa-polling"
+ type='{"type": "boolean"}'>
+ <p>
+ Specifies whether the Rx queues of this port can be automatically
+ assigned to PMD threads on any NUMA node, or only to PMD threads on
+ the port's local NUMA node.
+ </p>
+ <p>
+ Polling a physical port from a non-local PMD thread incurs some
+ performance penalty due to accessing packet data across the NUMA
+ boundary. Even so, this option can increase overall performance if
+ it allows better utilization of otherwise idle non-local PMD
+ threads. It is most useful together with the automatic load
+ balancing of Rx queues (see <code>other_config:pmd-auto-lb</code>
+ in the <code>Open_vSwitch</code> table).
+ </p>
+ <p>
+ Defaults to false.
+ </p>
+ </column>
+
<column name="options" key="xdp-mode"
type='{"type": "string",
"enum": ["set", ["best-effort", "native-with-zerocopy",
--
2.7.4