[ovs-dev] [PATCH v6 2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User
Jan Scheurich
jan.scheurich at ericsson.com
Fri Dec 15 14:24:40 UTC 2017
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Ciara Loftus
> Sent: Friday, 15 December, 2017 14:14
> To: dev at openvswitch.org
> Subject: [ovs-dev] [PATCH v6 2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User
>
> Enabled per port like so:
> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>
> The feature is disabled by default and can only be enabled/disabled when
> a vHost port is down.
>
> When packets from a vHost device with zero copy enabled are destined for
> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
> set to a smaller value. 128 is recommended. This can be achieved like
> so:
We should clarify here that the sum of the tx descriptors of all 'dpdk' ports the VM will send to should not exceed 128. In the case of a bond over two physical ports in balance-tcp mode, for example, you would have to divide 128 by the number of links in the bond.
>
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>
> Due to the requirement above, the feature is considered 'experimental'.
>
> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> ---
> v6:
> * Note the feature is experimental.
> * Mention bug in DPDK & temporary requirement on QEMU < v2.7
>
> Documentation/howto/dpdk.rst | 33 ++++++++++++
> Documentation/topics/dpdk/vhost-user.rst | 68 ++++++++++++++++++++++++
> NEWS | 2 +
> lib/netdev-dpdk.c | 89 +++++++++++++++++++++++++++++++-
> vswitchd/vswitch.xml | 11 ++++
> 5 files changed, 202 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d123819..3e1b8f8 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> Check traffic on multiple queues::
>
> $ cat /proc/interrupts | grep virtio
> +
> +PHY-VM-PHY (vHost Dequeue Zero Copy)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +vHost dequeue zero copy functionality can be validated using the
> +PHY-VM-PHY configuration. To begin, follow the steps described in
> +:ref:`dpdk-phy-phy` to create and initialize the database, start
> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> +and flows to bridge ``br0``. Once complete, follow the below steps:
> +
> +1. Enable dequeue zero copy on the vHost devices.
> +
> + $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> + $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
> +
> +The following log should be observed for each device::
> +
> + netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> +
> +2. Reduce the number of txq descriptors on the phy ports.
> +
> + $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> + $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> +
> +3. Proceed with the test by launching the VM and configuring guest
> +forwarding, be it via the vHost loopback method or kernel forwarding
> +method, and sending traffic. The following log should be observed for
> +each device as it becomes active during VM boot::
> +
> + VHOST_CONFIG: dequeue zero copy is enabled
> +
> +It is essential that step 1 is performed before booting the VM, otherwise
> +the feature will not be enabled.
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index 8447e2d..5a65360 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -458,3 +458,71 @@ Sample XML
> </domain>
>
> .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy (experimental)
> +-------------------------------------------
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
> +must be used to copy that packet from guest address space to host address
> +space. This memcpy can be removed by enabling dequeue zero-copy like so::
> +
> + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is passed to
> +the host, instead of a copy of the packet. Removing this memcpy can give a
> +performance improvement for some use cases, for example switching large packets
> +between different VMs. However, additional packet loss may be observed.
> +
> +Note that the feature is disabled by default and must be explicitly enabled
> +by using the command above.
> +
> +The feature cannot be enabled when the device is active (i.e. VM booted). If
> +you wish to enable the feature after the VM has booted, you must shut down
> +the VM and bring it back up.
> +
> +The same logic applies for disabling the feature - it must be disabled when
> +the device is inactive, for example before VM boot. To disable the feature::
> +
> + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
> +
> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
> +types.
> +
> +A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
> +are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
> +for that port must be reduced to a smaller number, 128 being the recommended
> +value. This can be achieved by issuing the following command::
> +
> + $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +
> +More information on the n_txq_desc option can be found in the "DPDK Physical
> +Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
> +
> +This limitation is due to how the zero copy functionality is
> +implemented. The vHost device's 'tx used vring', a virtio structure used for
> +tracking used (i.e. sent) descriptors, will only be updated when the NIC frees
> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
> +vring will be starved and packets will no longer be processed. One way to
> +avoid this scenario is to configure n_txq_desc to a small
> +enough number such that the 'mbuf free threshold' for the NIC will be hit more
> +often and thus free mbufs more frequently. The value of 128 is suggested, but
> +values of 64 and 256 have been tested and verified to work too, with differing
> +performance characteristics. A value of 512 can be used too, if the virtio
> +queue size in the guest is increased to 1024 (available to configure in QEMU
> +versions v2.10 and greater). The guest queue size can be set like so::
> +
> + $ qemu-system-x86_64 ... \
> + -chardev socket,id=char1,path=<sockpath>,server \
> + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce \
> + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,tx_queue_size=1024
> +
> +Because of this limitation, this feature is considered 'experimental'.
> +
> +The feature currently does not fully work with QEMU >= v2.7 due to a bug in
> +DPDK which will be addressed in an upcoming release. The patch to fix this
> +issue can be found here:
> +http://dpdk.org/dev/patchwork/patch/32198/
> +
> +Further information can be found in the
> +`DPDK documentation
> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> diff --git a/NEWS b/NEWS
> index 49d2fa5..5921b50 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -23,6 +23,8 @@ Post-v2.8.0
> - DPDK:
> * Add support for DPDK v17.11
> * Add support for vHost IOMMU
> + * Optional dequeue zero copy feature for vHost ports (experimental) can be
> + enabled per port via the boolean 'dq-zero-copy' option.
Shorten to:
* Add support for vHost dequeue zero copy (experimental)
>
> v2.8.0 - 31 Aug 2017
> --------------------
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 5068d4e..b52e993 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -379,6 +379,9 @@ struct netdev_dpdk {
> /* True if vHost device is 'up' and has been reconfigured at least once */
> bool vhost_reconfigured;
> /* 3 pad bytes here. */
> +
> + /* True if dq-zero-copy feature has successfully been enabled */
> + bool dq_zc_enabled;
> );
>
> PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> ovsrcu_index_init(&dev->vid, -1);
> dev->vhost_reconfigured = false;
> + dev->dq_zc_enabled = false;
> dev->attached = false;
>
> ovsrcu_init(&dev->qos_conf, NULL);
> @@ -1400,6 +1404,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
> return 0;
> }
>
> +static void
> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> + const struct smap *args)
> +{
> + bool needs_reconfigure = false;
> + bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> +
> + if (zc_requested &&
> + !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> + dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> + needs_reconfigure = true;
> + } else if (!zc_requested &&
> + (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> + dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> + needs_reconfigure = true;
> + }
> +
> + /* Only try to change ZC mode when device is down */
> + if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> + netdev_request_reconfigure(&dev->up);
> + }
> +}
> +
> static int
> netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> const struct smap *args,
> @@ -1416,6 +1443,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> netdev_request_reconfigure(netdev);
> }
> }
> +
> + dpdk_vhost_set_config_helper(dev, args);
> +
> + ovs_mutex_unlock(&dev->mutex);
> +
> + return 0;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> + const struct smap *args,
> + char **errp OVS_UNUSED)
> +{
> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> + ovs_mutex_lock(&dev->mutex);
> + dpdk_vhost_set_config_helper(dev, args);
> ovs_mutex_unlock(&dev->mutex);
>
> return 0;
> @@ -2748,6 +2792,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> }
> }
>
> +static void
> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> + bool enable)
> +{
> + int err = rte_vhost_driver_unregister(dev->vhost_id);
> +
> + if (err) {
> + VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
> + "mode", dev->vhost_id);
> + } else {
> + err = dpdk_setup_vhost_device(dev, client_mode);
> + if (err) {
> + VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> + dev->vhost_id);
> + } else if (enable) {
> + dev->dq_zc_enabled = true;
> + VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
> + } else {
> + dev->dq_zc_enabled = false;
> + VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
> + }
> + }
> +}
> +
> +static void
> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
> +{
> + bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> +
> + if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> + && !dev->dq_zc_enabled) {
> + /* ZC disabled but requested to be enabled, enable it. */
> + vhost_change_zero_copy_mode(dev, mode, true);
> + } else if (!(dev->vhost_driver_flags &
> + RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
> + /* ZC enabled but requested to be disabled, disable it. */
> + vhost_change_zero_copy_mode(dev, mode, false);
> + }
> +}
> +
> /*
> * Remove a virtio-net device from the specific vhost port. Use dev->remove
> * flag to stop any more packets from being sent or received to/from a VM and
> @@ -2793,6 +2877,7 @@ destroy_device(int vid)
> */
> ovsrcu_quiesce_start();
> VLOG_INFO("vHost Device '%s' has been removed", ifname);
> + netdev_request_reconfigure(&dev->up);
> } else {
> VLOG_INFO("vHost Device '%s' not found", ifname);
> }
> @@ -3284,6 +3369,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
> /* Carrier status may need updating. */
> netdev_change_seq_changed(&dev->up);
> }
> + } else {
> + vhost_check_zero_copy_status(dev);
> }
>
> return 0;
> @@ -3446,7 +3533,7 @@ static const struct netdev_class dpdk_vhost_class =
> NULL,
> netdev_dpdk_vhost_construct,
> netdev_dpdk_vhost_destruct,
> - NULL,
> + netdev_dpdk_vhost_set_config,
> NULL,
> netdev_dpdk_vhost_send,
> netdev_dpdk_vhost_get_carrier,
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 21ffaf5..330b028 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> </p>
> </column>
>
> + <column name="options" key="dq-zero-copy"
> + type='{"type": "boolean"}'>
> + <p>
> + The value specifies whether or not to enable dequeue zero copy on
> + the given interface. The feature is considered experimental.
> + The port must be in an inactive state in order to enable or disable
> + this feature.
> + Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
> + </p>
> + </column>
> +
> <column name="options" key="n_rxq_desc"
> type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
> <p>
> --
> 2.7.5
>