[ovs-dev] [ovs-dev, v5, 2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User
Loftus, Ciara
ciara.loftus at intel.com
Mon Dec 18 13:20:07 UTC 2017
>
> On 18.12.2017 15:28, Loftus, Ciara wrote:
> >>
> >> Not a full review.
> >
> > Thanks for your feedback.
> >
> >>
> >> General thoughts:
> >>
> >> If following conditions are true:
> >>
> >> 1. We don't need to add new feature to deprecated vhostuser port.
> >
> > Agree.
> >
> >>
> >> 2. We actually don't need to have ability to change ZC config if vhost-
> server-
> >> path
> >> already configured.
> >>
> >> Let me explain this condition:
> >> To change 'dq-zero-copy' if VM already started we need to stop VM,
> >> change the
> >> 'dq-zero-copy' and start the VM back. It's not much simpler than stop
> VM,
> >> re-add
> >> port with different value for 'dq-zero-copy' and start VM back. I'm
> assuming
> >> here
> >> that VM restart is much more complex and unlikely operation than port
> >> removing
> >> from OVS and adding back. This will require documenting, but we
> already
> >> have a
> >> bunch of docs about how to modify this config option in current version.
> >
> > Don't fully agree. I can't say whether your assumption is correct.
> > Port-deletion & re-addition has repercussions eg. loss of statistics from an
> interface. Retaining those stats might be important for some.
> > Why not leave it up to the user to choose their preferred method of
> enabling feature ie. reboot the VM or delete & add the port.
> >
> >>
> >> we may drop patch #1 from the series and use just something like
> following
> >> code instead of code changes in this patch (not tested, just for a
> reference):
> >>
> >> ---------------
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 2325f0e..c4cbf7c 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -379,6 +379,9 @@ struct netdev_dpdk {
> >> /* True if vHost device is 'up' and has been reconfigured at least once
> */
> >> bool vhost_reconfigured;
> >> /* 3 pad bytes here. */
> >> +
> >> + /* True if dq-zero-copy feature requested for vhost device. */
> >> + bool vhost_dq_zc;
> >> );
> >>
> >> PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev,
> >> dpdk_port_t port_no,
> >> dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >> ovsrcu_index_init(&dev->vid, -1);
> >> dev->vhost_reconfigured = false;
> >> + dev->vhost_dq_zc = false;
> >> dev->attached = false;
> >>
> >> ovsrcu_init(&dev->qos_conf, NULL);
> >> @@ -1384,6 +1388,7 @@ netdev_dpdk_vhost_client_set_config(struct
> >> netdev *netdev,
> >> path = smap_get(args, "vhost-server-path");
> >> if (path && strcmp(path, dev->vhost_id)) {
> >> strcpy(dev->vhost_id, path);
> >> + dev->vhost_dq_zc = smap_get_bool(args, "dq-zero-copy", false);
> >> netdev_request_reconfigure(netdev);
> >> }
> >> }
> >> @@ -3288,6 +3293,11 @@ netdev_dpdk_vhost_client_reconfigure(struct
> >> netdev *netdev)
> >> if (dpdk_vhost_iommu_enabled()) {
> >> vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
> >> }
> >> +
> >> + /* Enable Zero Copy mode if configured. */
> >> + if (dev->vhost_dq_zc) {
> >> + vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >> + }
> >> err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
> >> if (err) {
> >> VLOG_ERR("vhost-user device setup failure for device %s\n",
> >>
> >> ---------------
> >>
> >> Thoughts?
> >
> > The above is a lot shorter of course which is nice, but I don't think we
> should sacrifice the ability to enable the feature post-boot for this.
>
> But you wrote in documentation below:
>
> +The feature cannot be enabled when the device is active (ie. VM booted). If
> +you wish to enable the feature after the VM has booted, you must
> shutdown
> +the VM and bring it back up.
>
> What you mean saying "sacrifice the ability to enable the feature post-boot"?
> Am I missed something?
My understanding is that your suggestion means you sacrifice the ability to enable the feature post *first* VM boot. Am I correct?
My suggestion (and how the patch is implemented now) allows you to enable the feature post *first* VM boot, by means of a re-boot.
Thanks,
Ciara
>
> >
> > Thanks,
> > Ciara
> >
> >>
> >>
> >> Best regards, Ilya Maximets.
> >>
> >> P.S. Few comments about patch itself are inline.
> >>
> >> On 08.12.2017 18:26, Ciara Loftus wrote:
> >>> Enabled per port like so:
> >>> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >>>
> >>> The feature is disabled by default and can only be enabled/disabled
> when
> >>> a vHost port is down.
> >>>
> >>> When packets from a vHost device with zero copy enabled are destined
> for
> >>> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
> >>> set to a smaller value. 128 is recommended. This can be achieved like
> >>> so:
> >>>
> >>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >>>
> >>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> >>> ---
> >>> v5:
> >>> * Rebase
> >>> * Update docs with warning of potential packet loss
> >>> * Update docs with info on NIC & Virtio descriptor values and working
> >>> combinations
> >>>
> >>> Documentation/howto/dpdk.rst | 33 ++++++++++++
> >>> Documentation/topics/dpdk/vhost-user.rst | 61
> >> ++++++++++++++++++++++
> >>> NEWS | 2 +
> >>> lib/netdev-dpdk.c | 89
> >> +++++++++++++++++++++++++++++++-
> >>> vswitchd/vswitch.xml | 11 ++++
> >>> 5 files changed, 195 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/Documentation/howto/dpdk.rst
> >> b/Documentation/howto/dpdk.rst
> >>> index d123819..3e1b8f8 100644
> >>> --- a/Documentation/howto/dpdk.rst
> >>> +++ b/Documentation/howto/dpdk.rst
> >>> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow
> >> the below steps:
> >>> Check traffic on multiple queues::
> >>>
> >>> $ cat /proc/interrupts | grep virtio
> >>> +
> >>> +PHY-VM-PHY (vHost Dequeue Zero Copy)
> >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> +
> >>> +vHost dequeue zero copy functionality can be validated using the
> >>> +PHY-VM-PHY configuration. To begin, follow the steps described in
> >>> +:ref:`dpdk-phy-phy` to create and initialize the database, start
> >>> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> >>> +and flows to bridge ``br0``. Once complete, follow the below steps:
> >>> +
> >>> +1. Enable dequeue zero copy on the vHost devices.
> >>> +
> >>> + $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-
> copy=true
> >>> + $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-
> copy=true
> >>> +
> >>> +The following log should be observed for each device:
> >>> +
> >>> + netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> >>> +
> >>> +2. Reduce the number of txq descriptors on the phy ports.
> >>> +
> >>> + $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> >>> + $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> >>> +
> >>> +3. Proceed with the test by launching the VM and configuring guest
> >>> +forwarding, be it via the vHost loopback method or kernel forwarding
> >>> +method, and sending traffic. The following log should be oberved for
> >>> +each device as it becomes active during VM boot:
> >>> +
> >>> + VHOST_CONFIG: dequeue zero copy is enabled
> >>> +
> >>> +It is essential that step 1 is performed before booting the VM,
> otherwise
> >>> +the feature will not be enabled.
> >>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
> >> b/Documentation/topics/dpdk/vhost-user.rst
> >>> index 8b1c671..8587370 100644
> >>> --- a/Documentation/topics/dpdk/vhost-user.rst
> >>> +++ b/Documentation/topics/dpdk/vhost-user.rst
> >>> @@ -441,3 +441,64 @@ Sample XML
> >>> </domain>
> >>>
> >>> .. _QEMU documentation: http://git.qemu-
> >> project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-
> >> user.txt;h=7890d7169;hb=HEAD
> >>> +
> >>> +vhost-user Dequeue Zero Copy
> >>> +-------------------------------------
> >>> +
> >>> +Normally when dequeuing a packet from a vHost User device, a
> memcpy
> >> operation
> >>> +must be used to copy that packet from guest address space to host
> >> address
> >>> +space. This memcpy can be removed by enabling dequeue zero-copy
> like
> >> so:
> >>> +
> >>> + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-
> >> copy=true
> >>> +
> >>> +With this feature enabled, a reference (pointer) to the packet is passed
> to
> >>> +the host, instead of a copy of the packet. Removing this memcpy can
> give
> >> a
> >>> +performance improvement for some use cases, for example switching
> >> large packets
> >>> +between different VMs. However additional packet loss may be
> >> observed.
> >>> +
> >>> +Note that the feature is disabled by default and must be explicitly
> enabled
> >>> +by using the command above.
> >>> +
> >>> +The feature cannot be enabled when the device is active (ie. VM
> booted).
> >> If
> >>> +you wish to enable the feature after the VM has booted, you must
> >> shutdown
> >>> +the VM and bring it back up.
> >>> +
> >>> +The same logic applies for disabling the feature - it must be disabled
> when
> >>> +the device is inactive, for example before VM boot. To disable the
> >> feature:
> >>> +
> >>> + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-
> >> copy=false
> >>> +
> >>> +The feature is available to both dpdkvhostuser and
> dpdkvhostuserclient
> >> port
> >>> +types.
> >>> +
> >>> +A limitation exists whereby if packets from a vHost port with dq-zero-
> >> copy=true
> >>> +are destined for a 'dpdk' type port, the number of tx descriptors
> >> (n_txq_desc)
> >>> +for that port must be reduced to a smaller number, 128 being the
> >> recommended
> >>> +value. This can be achieved by issuing the following command:
> >>> +
> >>> + $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >>> +
> >>> +More information on the n_txq_desc option can be found in the "DPDK
> >> Physical
> >>> +Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
> >>
> >> I think some :ref: link should be here.
> >>
> >>> +
> >>> +The reason for this limitation is due to how the zero copy functionality
> is
> >>> +implemented. The vHost device's 'tx used vring', a virtio structure used
> for
> >>> +tracking used ie. sent descriptors, will only be updated when the NIC
> >> frees
> >>> +the corresponding mbuf. If we don't free the mbufs frequently
> enough,
> >> that
> >>> +vring will be starved and packets will no longer be processed. One way
> to
> >>> +ensure we don't encounter this scenario, is to configure n_txq_desc to
> a
> >> small
> >>> +enough number such that the 'mbuf free threshold' for the NIC will be
> hit
> >> more
> >>> +often and thus free mbufs more frequently. The value of 128 is
> >> suggested, but
> >>> +values of 64 and 256 have been tested and verified to work too, with
> >> differing
> >>> +performance characteristics. A value of 512 can be used too, if the virtio
> >>> +queue size in the guest is increased to 1024 (available to configure in
> >> QEMU
> >>> +versions v2.10 and greater). This value can be set like so:
> >>> +
> >>> + $ qemu-system-x86_64 ... -chardev
> >> socket,id=char1,path=<sockpath>,server
> >>> + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> >>> + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> >>> + tx_queue_size=1024
> >>> +
> >>> +Further information can be found in the
> >>> +`DPDK documentation
> >>>
> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> >>> diff --git a/NEWS b/NEWS
> >>> index d45904e..50630f8 100644
> >>> --- a/NEWS
> >>> +++ b/NEWS
> >>> @@ -18,6 +18,8 @@ Post-v2.8.0
> >>> - DPDK:
> >>> * Add support for DPDK v17.11
> >>> * Add support for vHost IOMMU
> >>> + * Optional dequeue zero copy feature for vHost ports can be
> enabled
> >> per
> >>> + port via the boolean 'dq-zero-copy' option.
> >>>
> >>> v2.8.0 - 31 Aug 2017
> >>> --------------------
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 879019e..03cc6cb 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -361,6 +361,9 @@ struct netdev_dpdk {
> >>> /* True if vHost device is 'up' and has been reconfigured at least
> once
> >> */
> >>> bool vhost_reconfigured;
> >>> /* 3 pad bytes here. */
> >>> +
> >>> + /* True if dq-zero-copy feature has successfully been enabled */
> >>> + bool dq_zc_enabled;
> >>
> >> "pad bytes" should be fixed above or removed entirely, or we may just
> >> revert padding
> >> here too because it's incorrect just like in dp_netdev_pmd_thread. The
> only
> >> difference
> >> is that struct netdev_dpdk aligned to cacheline.
> >>
> >>> );
> >>>
> >>> PADDED_MEMBERS(CACHE_LINE_SIZE,
> >>> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev,
> >> dpdk_port_t port_no,
> >>> dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>> ovsrcu_index_init(&dev->vid, -1);
> >>> dev->vhost_reconfigured = false;
> >>> + dev->dq_zc_enabled = false;
> >>> dev->attached = false;
> >>>
> >>> ovsrcu_init(&dev->qos_conf, NULL);
> >>> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev
> >> *netdev, const struct smap *args,
> >>> return 0;
> >>> }
> >>>
> >>> +static void
> >>> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> >>> + const struct smap *args)
> >>> +{
> >>> + bool needs_reconfigure = false;
> >>> + bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> >>> +
> >>> + if (zc_requested &&
> >>> + !(dev->vhost_driver_flags &
> >> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> >>> + dev->vhost_driver_flags |=
> >> RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >>> + needs_reconfigure = true;
> >>> + } else if (!zc_requested &&
> >>> + (dev->vhost_driver_flags &
> >> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> >>> + dev->vhost_driver_flags &=
> >> ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >>> + needs_reconfigure = true;
> >>> + }
> >>> +
> >>> + /* Only try to change ZC mode when device is down */
> >>> + if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> >>> + netdev_request_reconfigure(&dev->up);
> >>> + }
> >>> +}
> >>> +
> >>> static int
> >>> netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>> const struct smap *args,
> >>> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct
> >> netdev *netdev,
> >>> netdev_request_reconfigure(netdev);
> >>> }
> >>> }
> >>> +
> >>> + dpdk_vhost_set_config_helper(dev, args);
> >>> +
> >>> + ovs_mutex_unlock(&dev->mutex);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int
> >>> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> >>> + const struct smap *args,
> >>> + char **errp OVS_UNUSED)
> >>> +{
> >>> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>> +
> >>> + ovs_mutex_lock(&dev->mutex);
> >>> + dpdk_vhost_set_config_helper(dev, args);
> >>> ovs_mutex_unlock(&dev->mutex);
> >>>
> >>> return 0;
> >>> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct
> >> netdev_dpdk *dev)
> >>> }
> >>> }
> >>>
> >>> +static void
> >>> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool
> >> client_mode,
> >>> + bool enable)
> >>> +{
> >>> + int err = rte_vhost_driver_unregister(dev->vhost_id);
> >>> +
> >>> + if (err) {
> >>> + VLOG_ERR("Error unregistering vHost socket %s; can't change zero
> >> copy "
> >>> + "mode", dev->vhost_id);
> >>> + } else {
> >>> + err = dpdk_setup_vhost_device(dev, client_mode);
> >>> + if (err) {
> >>> + VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> >>> + dev->vhost_id);
> >>> + } else if (enable) {
> >>> + dev->dq_zc_enabled = true;
> >>> + VLOG_INFO("Zero copy enabled for vHost socket %s", dev-
> >>> vhost_id);
> >>> + } else {
> >>> + dev->dq_zc_enabled = false;
> >>> + VLOG_INFO("Zero copy disabled for vHost socket %s", dev-
> >>> vhost_id);
> >>> + }
> >>> + }
> >>> +}
> >>> +
> >>> +static void
> >>> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
> >>> +{
> >>> + bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> >>> +
> >>> + if ((dev->vhost_driver_flags &
> >> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> >>> + && !dev->dq_zc_enabled) {
> >>> + /* ZC disabled but requested to be enabled, enable it. */
> >>> + vhost_change_zero_copy_mode(dev, mode, true);
> >>> + } else if (!(dev->vhost_driver_flags &
> >>> + RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev-
> >>> dq_zc_enabled) {
> >>> + /* ZC enabled but requested to be disabled, disable it. */
> >>> + vhost_change_zero_copy_mode(dev, mode, false);
> >>> + }
> >>> +}
> >>> +
> >>> /*
> >>> * Remove a virtio-net device from the specific vhost port. Use dev-
> >>> remove
> >>> * flag to stop any more packets from being sent or received to/from a
> VM
> >> and
> >>> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
> >>> */
> >>> ovsrcu_quiesce_start();
> >>> VLOG_INFO("vHost Device '%s' has been removed", ifname);
> >>> + netdev_request_reconfigure(&dev->up);
> >>> } else {
> >>> VLOG_INFO("vHost Device '%s' not found", ifname);
> >>> }
> >>> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct
> >> netdev_dpdk *dev)
> >>> /* Carrier status may need updating. */
> >>> netdev_change_seq_changed(&dev->up);
> >>> }
> >>> + } else {
> >>> + vhost_check_zero_copy_status(dev);
> >>> }
> >>>
> >>> return 0;
> >>> @@ -3421,7 +3508,7 @@ static const struct netdev_class
> dpdk_vhost_class
> >> =
> >>> NULL,
> >>> netdev_dpdk_vhost_construct,
> >>> netdev_dpdk_vhost_destruct,
> >>> - NULL,
> >>> + netdev_dpdk_vhost_set_config,
> >>> NULL,
> >>> netdev_dpdk_vhost_send,
> >>> netdev_dpdk_vhost_get_carrier,
> >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>> index 4c317d0..e8409b4 100644
> >>> --- a/vswitchd/vswitch.xml
> >>> +++ b/vswitchd/vswitch.xml
> >>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >> type=patch options:peer=p1 \
> >>> </p>
> >>> </column>
> >>>
> >>> + <column name="options" key="dq-zero-copy"
> >>> + type='{"type": "boolean"}'>
> >>> + <p>
> >>> + The value specifies whether or not to enable dequeue zero copy
> on
> >>> + the given interface.
> >>> + The port must be in an inactive state in order to enable or disable
> >>> + this feature.
> >>> + Only supported by dpdkvhostuserclient and dpdkvhostuser
> >> interfaces.
> >>> + </p>
> >>> + </column>
> >>> +
> >>> <column name="options" key="n_rxq_desc"
> >>> type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
> >>> <p>
> >>>
More information about the dev
mailing list