[ovs-dev] [ovs-dev, v5, 2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User

Loftus, Ciara ciara.loftus at intel.com
Mon Dec 18 13:20:07 UTC 2017


> 
> On 18.12.2017 15:28, Loftus, Ciara wrote:
> >>
> >> Not a full review.
> >
> > Thanks for your feedback.
> >
> >>
> >> General thoughts:
> >>
> >> If the following conditions are true:
> >>
> >> 1. We don't need to add a new feature to the deprecated vhostuser port.
> >
> > Agree.
> >
> >>
> >> 2. We actually don't need to have ability to change ZC config if
> >>    vhost-server-path already configured.
> >>
> >>    Let me explain this condition:
> >>    To change 'dq-zero-copy' if VM already started we need to stop VM,
> >>    change the 'dq-zero-copy' and start the VM back. It's not much simpler
> >>    than stop VM, re-add port with different value for 'dq-zero-copy' and
> >>    start VM back. I'm assuming here that VM restart is much more complex
> >>    and unlikely operation than port removing from OVS and adding back.
> >>    This will require documenting, but we already have a bunch of docs
> >>    about how to modify this config option in current version.
> >
> > I don't fully agree. I can't say whether your assumption is correct.
> > Port deletion & re-addition has repercussions, e.g. loss of statistics for
> > an interface. Retaining those stats might be important for some users.
> > Why not leave it up to the user to choose their preferred method of
> > enabling the feature, i.e. reboot the VM or delete & re-add the port.
> >
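(For reference, the delete & re-add flow being discussed would look roughly
like the following sketch; 'br0', 'vhostclient0' and the socket path are
placeholders:

    # Remove the port, then add it back with the new zero copy setting.
    $ ovs-vsctl del-port br0 vhostclient0
    $ ovs-vsctl add-port br0 vhostclient0 -- set Interface vhostclient0 \
          type=dpdkvhostuserclient \
          options:vhost-server-path=/tmp/vhostclient0 \
          options:dq-zero-copy=true

Interface statistics are reset by this, which is the repercussion mentioned
above.)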
> >>
> >> we may drop patch #1 from the series and use just something like the
> >> following code instead of the code changes in this patch (not tested,
> >> just for reference):
> >>
> >> ---------------
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 2325f0e..c4cbf7c 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -379,6 +379,9 @@ struct netdev_dpdk {
> >>          /* True if vHost device is 'up' and has been reconfigured at least once */
> >>          bool vhost_reconfigured;
> >>          /* 3 pad bytes here. */
> >> +
> >> +        /* True if dq-zero-copy feature requested for vhost device. */
> >> +        bool vhost_dq_zc;
> >>      );
> >>
> >>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> >>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>      ovsrcu_index_init(&dev->vid, -1);
> >>      dev->vhost_reconfigured = false;
> >> +    dev->vhost_dq_zc = false;
> >>      dev->attached = false;
> >>
> >>      ovsrcu_init(&dev->qos_conf, NULL);
> >> @@ -1384,6 +1388,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>          path = smap_get(args, "vhost-server-path");
> >>          if (path && strcmp(path, dev->vhost_id)) {
> >>              strcpy(dev->vhost_id, path);
> >> +            dev->vhost_dq_zc = smap_get_bool(args, "dq-zero-copy", false);
> >>              netdev_request_reconfigure(netdev);
> >>          }
> >>      }
> >> @@ -3288,6 +3293,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
> >>          if (dpdk_vhost_iommu_enabled()) {
> >>              vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
> >>          }
> >> +
> >> +        /* Enable Zero Copy mode if configured. */
> >> +        if (dev->vhost_dq_zc) {
> >> +            vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >> +        }
> >>          err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
> >>          if (err) {
> >>              VLOG_ERR("vhost-user device setup failure for device %s\n",
> >>
> >> ---------------
> >>
> >> Thoughts?
> >
> > The above is a lot shorter of course, which is nice, but I don't think we
> > should sacrifice the ability to enable the feature post-boot for this.
> 
> But you wrote in the documentation below:
> 
> +The feature cannot be enabled when the device is active (ie. VM booted). If
> +you wish to enable the feature after the VM has booted, you must shutdown
> +the VM and bring it back up.
> 
> What do you mean by "sacrifice the ability to enable the feature post-boot"?
> Have I missed something?

My understanding is that your suggestion means you sacrifice the ability to enable the feature after the *first* VM boot. Am I correct?
My suggestion (and how the patch is implemented now) allows you to enable the feature after the *first* VM boot, by means of a reboot.
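To make that concrete, the flow the patch aims to support looks roughly like
this (a sketch only; 'dpdkvhostuserclient0' is a placeholder port name):

    # VM has already been booted once with zero copy off.
    # 1. Shut the VM down so the vhost device becomes inactive (vid == -1).
    # 2. Toggle the option on the existing port - no port delete/re-add needed.
    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
    # 3. Boot the VM again; the socket is re-registered with the zero copy flag.

With the simplified approach, as I read it, the option would only be picked up
when vhost-server-path changes, so this toggle-and-reboot sequence would not
take effect.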

Thanks,
Ciara

> 
> >
> > Thanks,
> > Ciara
> >
> >>
> >>
> >> Best regards, Ilya Maximets.
> >>
> >> P.S. A few comments about the patch itself are inline.
> >>
> >> On 08.12.2017 18:26, Ciara Loftus wrote:
> >>> Enabled per port like so:
> >>> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >>>
> >>> The feature is disabled by default and can only be enabled/disabled when
> >>> a vHost port is down.
> >>>
> >>> When packets from a vHost device with zero copy enabled are destined for
> >>> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
> >>> set to a smaller value. 128 is recommended. This can be achieved like
> >>> so:
> >>>
> >>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >>>
> >>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> >>> ---
> >>> v5:
> >>> * Rebase
> >>> * Update docs with warning of potential packet loss
> >>> * Update docs with info on NIC & Virtio descriptor values and working combinations
> >>>
> >>>  Documentation/howto/dpdk.rst             | 33 ++++++++++++
> >>>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
> >>>  NEWS                                     |  2 +
> >>>  lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
> >>>  vswitchd/vswitch.xml                     | 11 ++++
> >>>  5 files changed, 195 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >>> index d123819..3e1b8f8 100644
> >>> --- a/Documentation/howto/dpdk.rst
> >>> +++ b/Documentation/howto/dpdk.rst
> >>> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> >>>     Check traffic on multiple queues::
> >>>
> >>>         $ cat /proc/interrupts | grep virtio
> >>> +
> >>> +PHY-VM-PHY (vHost Dequeue Zero Copy)
> >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> +
> >>> +vHost dequeue zero copy functionality can be validated using the
> >>> +PHY-VM-PHY configuration. To begin, follow the steps described in
> >>> +:ref:`dpdk-phy-phy` to create and initialize the database, start
> >>> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> >>> +and flows to bridge ``br0``. Once complete, follow the below steps:
> >>> +
> >>> +1. Enable dequeue zero copy on the vHost devices.
> >>> +
> >>> +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> >>> +       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
> >>> +
> >>> +The following log should be observed for each device:
> >>> +
> >>> +       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> >>> +
> >>> +2. Reduce the number of txq descriptors on the phy ports.
> >>> +
> >>> +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> >>> +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> >>> +
> >>> +3. Proceed with the test by launching the VM and configuring guest
> >>> +forwarding, be it via the vHost loopback method or kernel forwarding
> >>> +method, and sending traffic. The following log should be observed for
> >>> +each device as it becomes active during VM boot:
> >>> +
> >>> +       VHOST_CONFIG: dequeue zero copy is enabled
> >>> +
> >>> +It is essential that step 1 is performed before booting the VM, otherwise
> >>> +the feature will not be enabled.
> >>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> >>> index 8b1c671..8587370 100644
> >>> --- a/Documentation/topics/dpdk/vhost-user.rst
> >>> +++ b/Documentation/topics/dpdk/vhost-user.rst
> >>> @@ -441,3 +441,64 @@ Sample XML
> >>>      </domain>
> >>>
> >>>  .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> >>> +
> >>> +vhost-user Dequeue Zero Copy
> >>> +-------------------------------------
> >>> +
> >>> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
> >>> +must be used to copy that packet from guest address space to host address
> >>> +space. This memcpy can be removed by enabling dequeue zero-copy like so:
> >>> +
> >>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >>> +
> >>> +With this feature enabled, a reference (pointer) to the packet is passed to
> >>> +the host, instead of a copy of the packet. Removing this memcpy can give a
> >>> +performance improvement for some use cases, for example switching large packets
> >>> +between different VMs. However additional packet loss may be observed.
> >>> +
> >>> +Note that the feature is disabled by default and must be explicitly enabled
> >>> +by using the command above.
> >>> +
> >>> +The feature cannot be enabled when the device is active (ie. VM booted). If
> >>> +you wish to enable the feature after the VM has booted, you must shutdown
> >>> +the VM and bring it back up.
> >>> +
> >>> +The same logic applies for disabling the feature - it must be disabled when
> >>> +the device is inactive, for example before VM boot. To disable the feature:
> >>> +
> >>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
> >>> +
> >>> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
> >>> +types.
> >>> +
> >>> +A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
> >>> +are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
> >>> +for that port must be reduced to a smaller number, 128 being the recommended
> >>> +value. This can be achieved by issuing the following command:
> >>> +
> >>> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >>> +
> >>> +More information on the n_txq_desc option can be found in the "DPDK Physical
> >>> +Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
> >>
> >> I think some :ref: link should be here.
> >>
> >>> +
> >>> +The reason for this limitation is due to how the zero copy functionality is
> >>> +implemented. The vHost device's 'tx used vring', a virtio structure used for
> >>> +tracking used ie. sent descriptors, will only be updated when the NIC frees
> >>> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
> >>> +vring will be starved and packets will no longer be processed. One way to
> >>> +ensure we don't encounter this scenario, is to configure n_txq_desc to a small
> >>> +enough number such that the 'mbuf free threshold' for the NIC will be hit more
> >>> +often and thus free mbufs more frequently. The value of 128 is suggested, but
> >>> +values of 64 and 256 have been tested and verified to work too, with differing
> >>> +performance characteristics. A value of 512 can be used too, if the virtio
> >>> +queue size in the guest is increased to 1024 (available to configure in QEMU
> >>> +versions v2.10 and greater). This value can be set like so:
> >>> +
> >>> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
> >>> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> >>> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> >>> +      tx_queue_size=1024
> >>> +
> >>> +Further information can be found in the
> >>> +`DPDK documentation
> >>> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> >>> diff --git a/NEWS b/NEWS
> >>> index d45904e..50630f8 100644
> >>> --- a/NEWS
> >>> +++ b/NEWS
> >>> @@ -18,6 +18,8 @@ Post-v2.8.0
> >>>     - DPDK:
> >>>       * Add support for DPDK v17.11
> >>>       * Add support for vHost IOMMU
> >>> +     * Optional dequeue zero copy feature for vHost ports can be enabled per
> >>> +       port via the boolean 'dq-zero-copy' option.
> >>>
> >>>  v2.8.0 - 31 Aug 2017
> >>>  --------------------
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 879019e..03cc6cb 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -361,6 +361,9 @@ struct netdev_dpdk {
> >>>          /* True if vHost device is 'up' and has been reconfigured at least once */
> >>>          bool vhost_reconfigured;
> >>>          /* 3 pad bytes here. */
> >>> +
> >>> +        /* True if dq-zero-copy feature has successfully been enabled */
> >>> +        bool dq_zc_enabled;
> >>
> >> "pad bytes" should be fixed above or removed entirely, or we may just
> >> revert padding
> >> here too because it's incorrect just like in dp_netdev_pmd_thread. The
> only
> >> difference
> >> is that struct netdev_dpdk aligned to cacheline.
> >>
> >>>      );
> >>>
> >>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >>> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> >>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>>      ovsrcu_index_init(&dev->vid, -1);
> >>>      dev->vhost_reconfigured = false;
> >>> +    dev->dq_zc_enabled = false;
> >>>      dev->attached = false;
> >>>
> >>>      ovsrcu_init(&dev->qos_conf, NULL);
> >>> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
> >>>      return 0;
> >>>  }
> >>>
> >>> +static void
> >>> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> >>> +                             const struct smap *args)
> >>> +{
> >>> +    bool needs_reconfigure = false;
> >>> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> >>> +
> >>> +    if (zc_requested &&
> >>> +            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> >>> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >>> +        needs_reconfigure = true;
> >>> +    } else if (!zc_requested &&
> >>> +            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> >>> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >>> +        needs_reconfigure = true;
> >>> +    }
> >>> +
> >>> +    /* Only try to change ZC mode when device is down */
> >>> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> >>> +        netdev_request_reconfigure(&dev->up);
> >>> +    }
> >>> +}
> >>> +
> >>>  static int
> >>>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>>                                      const struct smap *args,
> >>> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>>              netdev_request_reconfigure(netdev);
> >>>          }
> >>>      }
> >>> +
> >>> +    dpdk_vhost_set_config_helper(dev, args);
> >>> +
> >>> +    ovs_mutex_unlock(&dev->mutex);
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static int
> >>> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> >>> +                             const struct smap *args,
> >>> +                             char **errp OVS_UNUSED)
> >>> +{
> >>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>> +
> >>> +    ovs_mutex_lock(&dev->mutex);
> >>> +    dpdk_vhost_set_config_helper(dev, args);
> >>>      ovs_mutex_unlock(&dev->mutex);
> >>>
> >>>      return 0;
> >>> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> >>>      }
> >>>  }
> >>>
> >>> +static void
> >>> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> >>> +                            bool enable)
> >>> +{
> >>> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> >>> +
> >>> +    if (err) {
> >>> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
> >>> +                "mode", dev->vhost_id);
> >>> +    } else {
> >>> +        err = dpdk_setup_vhost_device(dev, client_mode);
> >>> +        if (err) {
> >>> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> >>> +                    dev->vhost_id);
> >>> +        } else if (enable) {
> >>> +            dev->dq_zc_enabled = true;
> >>> +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
> >>> +        } else {
> >>> +            dev->dq_zc_enabled = false;
> >>> +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
> >>> +        }
> >>> +    }
> >>> +}
> >>> +
> >>> +static void
> >>> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
> >>> +{
> >>> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> >>> +
> >>> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> >>> +                && !dev->dq_zc_enabled) {
> >>> +        /* ZC disabled but requested to be enabled, enable it. */
> >>> +        vhost_change_zero_copy_mode(dev, mode, true);
> >>> +    } else if (!(dev->vhost_driver_flags &
> >>> +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
> >>> +        /* ZC enabled but requested to be disabled, disable it. */
> >>> +        vhost_change_zero_copy_mode(dev, mode, false);
> >>> +    }
> >>> +}
> >>> +
> >>>  /*
> >>>   * Remove a virtio-net device from the specific vhost port.  Use dev->remove
> >>>   * flag to stop any more packets from being sent or received to/from a VM and
> >>> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
> >>>           */
> >>>          ovsrcu_quiesce_start();
> >>>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> >>> +        netdev_request_reconfigure(&dev->up);
> >>>      } else {
> >>>          VLOG_INFO("vHost Device '%s' not found", ifname);
> >>>      }
> >>> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
> >>>              /* Carrier status may need updating. */
> >>>              netdev_change_seq_changed(&dev->up);
> >>>          }
> >>> +    } else {
> >>> +        vhost_check_zero_copy_status(dev);
> >>>      }
> >>>
> >>>      return 0;
> >>> @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
> >>>          NULL,
> >>>          netdev_dpdk_vhost_construct,
> >>>          netdev_dpdk_vhost_destruct,
> >>> -        NULL,
> >>> +        netdev_dpdk_vhost_set_config,
> >>>          NULL,
> >>>          netdev_dpdk_vhost_send,
> >>>          netdev_dpdk_vhost_get_carrier,
> >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>> index 4c317d0..e8409b4 100644
> >>> --- a/vswitchd/vswitch.xml
> >>> +++ b/vswitchd/vswitch.xml
> >>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> >>>          </p>
> >>>        </column>
> >>>
> >>> +      <column name="options" key="dq-zero-copy"
> >>> +              type='{"type": "boolean"}'>
> >>> +        <p>
> >>> +          The value specifies whether or not to enable dequeue zero copy on
> >>> +          the given interface.
> >>> +          The port must be in an inactive state in order to enable or disable
> >>> +          this feature.
> >>> +          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
> >>> +        </p>
> >>> +      </column>
> >>> +
> >>>        <column name="options" key="n_rxq_desc"
> >>>                type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
> >>>          <p>
> >>>

