[ovs-dev] [PATCH v7] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)
Ciara Loftus
ciara.loftus at intel.com
Tue Dec 19 16:44:17 UTC 2017
The feature is enabled per port like so:
ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
If the device is already active when the feature is enabled or disabled
via ovs-vsctl, one must reload the guest driver or reboot the VM in order
for the change to take effect.
When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:
ovs-vsctl set Interface dpdkport options:n_txq_desc=128
Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.
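As a rough sketch of the descriptor budgeting described above (the even
split and the port names are illustrative assumptions, not part of the
patch):

```shell
# Split the recommended 128-descriptor budget evenly across the 'dpdk'
# ports a zero-copy VM transmits to; here, a two-link bond.
BUDGET=128
LINKS=2
PER_PORT=$((BUDGET / LINKS))
echo "${PER_PORT}"   # prints 64
# Apply the result to each bond member (illustrative port names):
# ovs-vsctl set Interface dpdk0 options:n_txq_desc="${PER_PORT}"
# ovs-vsctl set Interface dpdk1 options:n_txq_desc="${PER_PORT}"
```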
Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
---
v7:
* Remove support for zc for dpdkvhostuser ports & patch 1 in the series.
* Remove entry from howto Documentation as the document in its current
state is only relevant to dpdkvhostuser ports which do not support the
feature.
* Use ref: link to queue size documentation.
* Elaborate on details of the NIC descriptors limitation, e.g. cases where
packets are destined for more than one NIC.
* Fix comment about padding in netdev-dpdk struct.
Documentation/intro/install/dpdk.rst | 2 +
Documentation/topics/dpdk/vhost-user.rst | 71 +++++++++++++++++++++++
NEWS | 1 +
lib/netdev-dpdk.c | 97 ++++++++++++++++++++++++++++----
vswitchd/vswitch.xml | 13 +++++
5 files changed, 174 insertions(+), 10 deletions(-)
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index 3fecb5c..087eb88 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK physical interface.
The rx queues are assigned to pmd threads on the same NUMA node in a
round-robin fashion.
+.. _dpdk-queues-sizes:
+
DPDK Physical Port Queue Sizes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 8447e2d..d82ca40 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -458,3 +458,74 @@ Sample XML
</domain>
.. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
+
+vhost-user Dequeue Zero Copy (experimental)
+-------------------------------------------
+
+Normally when dequeuing a packet from a vHost User device, a memcpy operation
+must be used to copy that packet from guest address space to host address
+space. This memcpy can be removed by enabling dequeue zero-copy like so::
+
+ $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
+
+With this feature enabled, a reference (pointer) to the packet is passed to
+the host, instead of a copy of the packet. Removing this memcpy can give a
+performance improvement for some use cases, for example switching large packets
+between different VMs. However, additional packet loss may be observed.
+
+Note that the feature is disabled by default and must be explicitly enabled
+by using the command above.
+
+The feature cannot be enabled when the device is active. If the device is
+active when enabling the feature via ovs-vsctl, a driver reload or reboot
+is required for the changes to take effect.
+
+The same logic applies for disabling the feature - it must be disabled when
+the device is inactive. To disable the feature::
+
+ $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
+
+The feature is only available to dpdkvhostuserclient port types.
+
+A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
+are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
+for that port must be reduced to a smaller number, 128 being the recommended
+value. This can be achieved by issuing the following command::
+
+ $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
+
+Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to
+should not exceed 128. For example, in the case of a bond over two physical
+ports in balance-tcp mode, one must divide 128 by the number of links in
+the bond.
+
+Refer to :ref:`dpdk-queues-sizes` for more information.
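+
+As an illustrative sketch, for a two-link bond one might evenly split the
+128-descriptor budget across the bond members (the port names here are
+assumptions)::
+
+    $ ovs-vsctl set Interface dpdk0 options:n_txq_desc=64
+    $ ovs-vsctl set Interface dpdk1 options:n_txq_desc=64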
+
+The reason for this limitation lies in how the zero copy functionality is
+implemented. The vHost device's 'tx used vring', a virtio structure used for
+tracking used (i.e. sent) descriptors, will only be updated when the NIC
+frees the corresponding mbuf. If mbufs are not freed frequently enough, that
+vring will be starved and packets will no longer be processed. One way to
+ensure this scenario is not encountered is to configure n_txq_desc to a
+small enough number such that the NIC's 'mbuf free threshold' is hit more
+often, and thus mbufs are freed more frequently. The value of 128 is
+suggested, but values of 64 and 256 have been tested and verified to work
+too, with differing performance characteristics. A value of 512 can also be
+used, if the virtio queue size in the guest is increased to 1024 (available
+to configure in QEMU versions v2.10 and greater). This queue size can be set
+like so::
+
+ $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
+ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
+ tx_queue_size=1024
+
+Because of this limitation, this feature is considered 'experimental'.
+
+The feature currently does not fully work with QEMU >= v2.7 due to a bug in
+DPDK which will be addressed in an upcoming release. The patch to fix this
+issue can be found on
+`Patchwork <http://dpdk.org/dev/patchwork/patch/32198/>`__.
+
+Further information can be found in the
+`DPDK documentation
+<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__.
diff --git a/NEWS b/NEWS
index 752e98f..1eaa32c 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,7 @@ Post-v2.8.0
- DPDK:
* Add support for DPDK v17.11
* Add support for vHost IOMMU
+ * Add support for vHost dequeue zero copy (experimental)
v2.8.0 - 31 Aug 2017
--------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8f22264..4b82a06 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -378,7 +378,10 @@ struct netdev_dpdk {
/* True if vHost device is 'up' and has been reconfigured at least once */
bool vhost_reconfigured;
- /* 3 pad bytes here. */
+
+ /* True if the dq-zero-copy feature has been successfully enabled. */
+ bool dq_zc_enabled;
+ /* 2 pad bytes here. */
);
PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
ovsrcu_index_init(&dev->vid, -1);
dev->vhost_reconfigured = false;
+ dev->dq_zc_enabled = false;
dev->attached = false;
ovsrcu_init(&dev->qos_conf, NULL);
@@ -1371,6 +1375,27 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
return 0;
}
+static bool
+dpdk_vhost_check_zero_copy_config(struct netdev_dpdk *dev,
+ const struct smap *args)
+{
+ bool needs_reconfigure = false;
+ bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
+
+ if (zc_requested &&
+ !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+ dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+ needs_reconfigure = true;
+ } else if (!zc_requested &&
+ (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+ dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+ needs_reconfigure = true;
+ }
+
+ /* Only try to change ZC mode when the device is down. */
+ return needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1);
+}
+
static int
netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
const struct smap *args,
@@ -1378,15 +1403,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
const char *path;
+ bool needs_reconfigure = false;
ovs_mutex_lock(&dev->mutex);
if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) {
path = smap_get(args, "vhost-server-path");
if (path && strcmp(path, dev->vhost_id)) {
strcpy(dev->vhost_id, path);
- netdev_request_reconfigure(netdev);
+ needs_reconfigure = true;
}
}
+
+ needs_reconfigure |= dpdk_vhost_check_zero_copy_config(dev, args);
+
+ if (needs_reconfigure) {
+ netdev_request_reconfigure(netdev);
+ }
+
ovs_mutex_unlock(&dev->mutex);
return 0;
@@ -2719,6 +2752,22 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
}
}
+static bool
+vhost_zero_copy_status_changed(struct netdev_dpdk *dev, bool *enable_zc)
+{
+ if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
+ && !dev->dq_zc_enabled) {
+ *enable_zc = true;
+ return true;
+ } else if (!(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
+ && dev->dq_zc_enabled) {
+ *enable_zc = false;
+ return true;
+ }
+
+ return false;
+}
+
/*
* Remove a virtio-net device from the specific vhost port. Use dev->remove
* flag to stop any more packets from being sent or received to/from a VM and
@@ -2764,6 +2813,7 @@ destroy_device(int vid)
*/
ovsrcu_quiesce_start();
VLOG_INFO("vHost Device '%s' has been removed", ifname);
+ netdev_request_reconfigure(&dev->up);
} else {
VLOG_INFO("vHost Device '%s' not found", ifname);
}
@@ -3278,24 +3328,39 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
int err;
- uint64_t vhost_flags = 0;
+ uint64_t vhost_flags = dev->vhost_driver_flags;
+ bool enable_zc = false;
+ bool zc_status_changed = false;
ovs_mutex_lock(&dev->mutex);
- /* Configure vHost client mode if requested and if the following criteria
- * are met:
- * 1. Device hasn't been registered yet.
- * 2. A path has been specified.
+ zc_status_changed = vhost_zero_copy_status_changed(dev, &enable_zc);
+
+ /* Reconfigure the device if either:
+ * 1. Device hasn't been registered and a path has been specified.
+ * 2. Requested zero copy mode has changed from current configuration.
*/
- if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)
- && strlen(dev->vhost_id)) {
- /* Register client-mode device. */
+ if ((!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)
+ && strlen(dev->vhost_id))
+ || zc_status_changed) {
+
vhost_flags |= RTE_VHOST_USER_CLIENT;
/* Enable IOMMU support, if explicitly requested. */
if (dpdk_vhost_iommu_enabled()) {
vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
}
+
+ /* Device has already been registered; unregister it first. */
+ if (dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT) {
+ err = rte_vhost_driver_unregister(dev->vhost_id);
+
+ if (err) {
+ VLOG_ERR("Error unregistering vHost socket %s", dev->vhost_id);
+ goto unlock;
+ }
+ }
+
err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
if (err) {
VLOG_ERR("vhost-user device setup failure for device %s\n",
@@ -3333,6 +3398,18 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
"client port: %s\n", dev->up.name);
goto unlock;
}
+
+ if (zc_status_changed) {
+ if (enable_zc) {
+ dev->dq_zc_enabled = true;
+ VLOG_INFO("Zero copy enabled for vHost port %s",
+ dev->up.name);
+ } else {
+ dev->dq_zc_enabled = false;
+ VLOG_INFO("Zero copy disabled for vHost port %s",
+ dev->up.name);
+ }
+ }
}
err = dpdk_vhost_reconfigure_helper(dev);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 018d644..e51795c 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2669,6 +2669,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
</p>
</column>
+ <column name="options" key="dq-zero-copy"
+ type='{"type": "boolean"}'>
+ <p>
+ The value specifies whether or not to enable dequeue zero copy on
+ the given interface.
+ The port must be in an inactive state in order to enable or disable
+ this feature. If the device is active when the option is changed, one
+ must reload the guest driver or reboot the VM for the change to take
+ effect.
+ Only supported by dpdkvhostuserclient interfaces.
+ The feature is considered experimental.
+ </p>
+ </column>
+
<column name="options" key="n_rxq_desc"
type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
<p>
--
2.7.5