[ovs-dev] [ovs-dev, v5, 2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User

Ilya Maximets i.maximets at samsung.com
Mon Dec 18 12:37:28 UTC 2017


On 18.12.2017 15:28, Loftus, Ciara wrote:
>>
>> Not a full review.
> 
> Thanks for your feedback.
> 
>>
>> General thoughts:
>>
>> If following conditions are true:
>>
>> 1. We don't need to add new feature to deprecated vhostuser port.
> 
> Agree.
> 
>>
>> 2. We don't actually need the ability to change the ZC config if
>>    vhost-server-path is already configured.
>>
>>    Let me explain this condition:
>>    To change 'dq-zero-copy' when a VM is already started, we need to stop the
>>    VM, change 'dq-zero-copy' and start the VM back up. That is not much
>>    simpler than stopping the VM, re-adding the port with a different value
>>    for 'dq-zero-copy' and starting the VM back up. I'm assuming here that a
>>    VM restart is a much more complex and unlikely operation than removing a
>>    port from OVS and adding it back. This will require documenting, but we
>>    already have a bunch of docs about how to modify this config option in
>>    the current version.
> 
> Don't fully agree. I can't say whether your assumption is correct.
> Port deletion & re-addition has repercussions, e.g. loss of statistics from an interface. Retaining those stats might be important for some.
> Why not leave it up to the user to choose their preferred method of enabling the feature, i.e. reboot the VM or delete & re-add the port?
> 
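
Just to make the comparison concrete, the two workflows would look roughly
like this (untested; the port, bridge and socket path names below are only
examples):

    # Keep the port and toggle the option; the VM then has to be rebooted:
    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true

    # Or, with the VM stopped, delete and re-add the port with the option set:
    $ ovs-vsctl del-port br0 dpdkvhostuserclient0
    $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- \
        set Interface dpdkvhostuserclient0 type=dpdkvhostuserclient \
        options:vhost-server-path=/tmp/dpdkvhostclient0 \
        options:dq-zero-copy=true

The delete/re-add variant does reset the per-interface statistics, as you
point out.
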
>>
>> we may drop patch #1 from the series and use just something like following
>> code instead of code changes in this patch (not tested, just for a reference):
>>
>> ---------------
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 2325f0e..c4cbf7c 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -379,6 +379,9 @@ struct netdev_dpdk {
>>          /* True if vHost device is 'up' and has been reconfigured at least once */
>>          bool vhost_reconfigured;
>>          /* 3 pad bytes here. */
>> +
>> +        /* True if dq-zero-copy feature requested for vhost device. */
>> +        bool vhost_dq_zc;
>>      );
>>
>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>      ovsrcu_index_init(&dev->vid, -1);
>>      dev->vhost_reconfigured = false;
>> +    dev->vhost_dq_zc = false;
>>      dev->attached = false;
>>
>>      ovsrcu_init(&dev->qos_conf, NULL);
>> @@ -1384,6 +1388,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>          path = smap_get(args, "vhost-server-path");
>>          if (path && strcmp(path, dev->vhost_id)) {
>>              strcpy(dev->vhost_id, path);
>> +            dev->vhost_dq_zc = smap_get_bool(args, "dq-zero-copy", false);
>>              netdev_request_reconfigure(netdev);
>>          }
>>      }
>> @@ -3288,6 +3293,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>>          if (dpdk_vhost_iommu_enabled()) {
>>              vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
>>          }
>> +
>> +        /* Enable Zero Copy mode if configured. */
>> +        if (dev->vhost_dq_zc) {
>> +            vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>> +        }
>>          err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
>>          if (err) {
>>              VLOG_ERR("vhost-user device setup failure for device %s\n",
>>
>> ---------------
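
With a change like the above, enabling zero copy would simply be part of the
initial port setup, e.g. (socket path below is only an example):

    $ ovs-vsctl set Interface dpdkvhostuserclient0 \
        options:vhost-server-path=/tmp/dpdkvhostclient0 \
        options:dq-zero-copy=true

since in that sketch 'dq-zero-copy' is only read when 'vhost-server-path'
is set or changed.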
>>
>> Thoughts?
> 
> The above is a lot shorter of course, which is nice, but I don't think we should sacrifice the ability to enable the feature post-boot for this.

But you wrote in documentation below:

+The feature cannot be enabled when the device is active (ie. VM booted). If
+you wish to enable the feature after the VM has booted, you must shutdown
+the VM and bring it back up.

What do you mean by "sacrifice the ability to enable the feature post-boot"?
Am I missing something?

> 
> Thanks,
> Ciara
> 
>>
>>
>> Best regards, Ilya Maximets.
>>
>> P.S. Few comments about patch itself are inline.
>>
>> On 08.12.2017 18:26, Ciara Loftus wrote:
>>> Enabled per port like so:
>>> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>>>
>>> The feature is disabled by default and can only be enabled/disabled when
>>> a vHost port is down.
>>>
>>> When packets from a vHost device with zero copy enabled are destined for
>>> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
>>> set to a smaller value. 128 is recommended. This can be achieved like
>>> so:
>>>
>>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>>
>>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>>> ---
>>> v5:
>>> * Rebase
>>> * Update docs with warning of potential packet loss
>>> * Update docs with info on NIC & Virtio descriptor values and working
>>> combinations
>>>
>>>  Documentation/howto/dpdk.rst             | 33 ++++++++++++
>>>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
>>>  NEWS                                     |  2 +
>>>  lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
>>>  vswitchd/vswitch.xml                     | 11 ++++
>>>  5 files changed, 195 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>>> index d123819..3e1b8f8 100644
>>> --- a/Documentation/howto/dpdk.rst
>>> +++ b/Documentation/howto/dpdk.rst
>>> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
>>>     Check traffic on multiple queues::
>>>
>>>         $ cat /proc/interrupts | grep virtio
>>> +
>>> +PHY-VM-PHY (vHost Dequeue Zero Copy)
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +vHost dequeue zero copy functionality can be validated using the
>>> +PHY-VM-PHY configuration. To begin, follow the steps described in
>>> +:ref:`dpdk-phy-phy` to create and initialize the database, start
>>> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
>>> +and flows to bridge ``br0``. Once complete, follow the below steps:
>>> +
>>> +1. Enable dequeue zero copy on the vHost devices.
>>> +
>>> +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
>>> +       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
>>> +
>>> +The following log should be observed for each device:
>>> +
>>> +       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
>>> +
>>> +2. Reduce the number of txq descriptors on the phy ports.
>>> +
>>> +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
>>> +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
>>> +
>>> +3. Proceed with the test by launching the VM and configuring guest
>>> +forwarding, be it via the vHost loopback method or kernel forwarding
>>> +method, and sending traffic. The following log should be observed for
>>> +each device as it becomes active during VM boot:
>>> +
>>> +       VHOST_CONFIG: dequeue zero copy is enabled
>>> +
>>> +It is essential that step 1 is performed before booting the VM, otherwise
>>> +the feature will not be enabled.
>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>>> index 8b1c671..8587370 100644
>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>> @@ -441,3 +441,64 @@ Sample XML
>>>      </domain>
>>>
>>>  .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
>>> +
>>> +vhost-user Dequeue Zero Copy
>>> +-------------------------------------
>>> +
>>> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
>>> +must be used to copy that packet from guest address space to host address
>>> +space. This memcpy can be removed by enabling dequeue zero-copy like so:
>>> +
>>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>>> +
>>> +With this feature enabled, a reference (pointer) to the packet is passed to
>>> +the host, instead of a copy of the packet. Removing this memcpy can give a
>>> +performance improvement for some use cases, for example switching large
>>> +packets between different VMs. However additional packet loss may be observed.
>>> +
>>> +Note that the feature is disabled by default and must be explicitly enabled
>>> +by using the command above.
>>> +
>>> +The feature cannot be enabled when the device is active (ie. VM booted). If
>>> +you wish to enable the feature after the VM has booted, you must shutdown
>>> +the VM and bring it back up.
>>> +
>>> +The same logic applies for disabling the feature - it must be disabled when
>>> +the device is inactive, for example before VM boot. To disable the feature:
>>> +
>>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
>>> +
>>> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
>>> +types.
>>> +
>>> +A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
>>> +are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
>>> +for that port must be reduced to a smaller number, 128 being the recommended
>>> +value. This can be achieved by issuing the following command:
>>> +
>>> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>> +
>>> +More information on the n_txq_desc option can be found in the "DPDK Physical
>>> +Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
>>
>> I think some :ref: link should be here.
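
Something like the following, for example -- the label name here is only a
placeholder, use whatever label intro/install/dpdk.rst actually defines for
that section:

    More information on the n_txq_desc option can be found in
    :ref:`dpdk-queues-sizes`.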
>>
>>> +
>>> +The reason for this limitation is due to how the zero copy functionality is
>>> +implemented. The vHost device's 'tx used vring', a virtio structure used for
>>> +tracking used ie. sent descriptors, will only be updated when the NIC frees
>>> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
>>> +vring will be starved and packets will no longer be processed. One way to
>>> +ensure we don't encounter this scenario, is to configure n_txq_desc to a small
>>> +enough number such that the 'mbuf free threshold' for the NIC will be hit more
>>> +often and thus free mbufs more frequently. The value of 128 is suggested, but
>>> +values of 64 and 256 have been tested and verified to work too, with differing
>>> +performance characteristics. A value of 512 can be used too, if the virtio
>>> +queue size in the guest is increased to 1024 (available to configure in QEMU
>>> +versions v2.10 and greater). This value can be set like so:
>>> +
>>> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
>>> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
>>> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
>>> +      tx_queue_size=1024
>>> +
>>> +Further information can be found in the
>>> +`DPDK documentation
>>> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
>>> diff --git a/NEWS b/NEWS
>>> index d45904e..50630f8 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -18,6 +18,8 @@ Post-v2.8.0
>>>     - DPDK:
>>>       * Add support for DPDK v17.11
>>>       * Add support for vHost IOMMU
>>> +     * Optional dequeue zero copy feature for vHost ports can be enabled per
>>> +       port via the boolean 'dq-zero-copy' option.
>>>
>>>  v2.8.0 - 31 Aug 2017
>>>  --------------------
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 879019e..03cc6cb 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -361,6 +361,9 @@ struct netdev_dpdk {
>>>          /* True if vHost device is 'up' and has been reconfigured at least once */
>>>          bool vhost_reconfigured;
>>>          /* 3 pad bytes here. */
>>> +
>>> +        /* True if dq-zero-copy feature has successfully been enabled */
>>> +        bool dq_zc_enabled;
>>
>> "pad bytes" should be fixed above or removed entirely, or we may just
>> revert padding
>> here too because it's incorrect just like in dp_netdev_pmd_thread. The only
>> difference
>> is that struct netdev_dpdk aligned to cacheline.
>>
>>>      );
>>>
>>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>>      ovsrcu_index_init(&dev->vid, -1);
>>>      dev->vhost_reconfigured = false;
>>> +    dev->dq_zc_enabled = false;
>>>      dev->attached = false;
>>>
>>>      ovsrcu_init(&dev->qos_conf, NULL);
>>> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
>>>      return 0;
>>>  }
>>>
>>> +static void
>>> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
>>> +                             const struct smap *args)
>>> +{
>>> +    bool needs_reconfigure = false;
>>> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
>>> +
>>> +    if (zc_requested &&
>>> +            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
>>> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>>> +        needs_reconfigure = true;
>>> +    } else if (!zc_requested &&
>>> +            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
>>> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>>> +        needs_reconfigure = true;
>>> +    }
>>> +
>>> +    /* Only try to change ZC mode when device is down */
>>> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
>>> +        netdev_request_reconfigure(&dev->up);
>>> +    }
>>> +}
>>> +
>>>  static int
>>>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>>                                      const struct smap *args,
>>> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>>              netdev_request_reconfigure(netdev);
>>>          }
>>>      }
>>> +
>>> +    dpdk_vhost_set_config_helper(dev, args);
>>> +
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
>>> +                             const struct smap *args,
>>> +                             char **errp OVS_UNUSED)
>>> +{
>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    dpdk_vhost_set_config_helper(dev, args);
>>>      ovs_mutex_unlock(&dev->mutex);
>>>
>>>      return 0;
>>> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
>>>      }
>>>  }
>>>
>>> +static void
>>> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
>>> +                            bool enable)
>>> +{
>>> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
>>> +
>>> +    if (err) {
>>> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero
>> copy "
>>> +                "mode", dev->vhost_id);
>>> +    } else {
>>> +        err = dpdk_setup_vhost_device(dev, client_mode);
>>> +        if (err) {
>>> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
>>> +                    dev->vhost_id);
>>> +        } else if (enable) {
>>> +            dev->dq_zc_enabled = true;
>>> +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev-
>>> vhost_id);
>>> +        } else {
>>> +            dev->dq_zc_enabled = false;
>>> +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev-
>>> vhost_id);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
>>> +{
>>> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
>>> +
>>> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
>>> +                && !dev->dq_zc_enabled) {
>>> +        /* ZC disabled but requested to be enabled, enable it. */
>>> +        vhost_change_zero_copy_mode(dev, mode, true);
>>> +    } else if (!(dev->vhost_driver_flags &
>>> +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
>>> +        /* ZC enabled but requested to be disabled, disable it. */
>>> +        vhost_change_zero_copy_mode(dev, mode, false);
>>> +    }
>>> +}
>>> +
>>>  /*
>>>   * Remove a virtio-net device from the specific vhost port.  Use dev->remove
>>>   * flag to stop any more packets from being sent or received to/from a VM and
>>> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
>>>           */
>>>          ovsrcu_quiesce_start();
>>>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
>>> +        netdev_request_reconfigure(&dev->up);
>>>      } else {
>>>          VLOG_INFO("vHost Device '%s' not found", ifname);
>>>      }
>>> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>>>              /* Carrier status may need updating. */
>>>              netdev_change_seq_changed(&dev->up);
>>>          }
>>> +    } else {
>>> +        vhost_check_zero_copy_status(dev);
>>>      }
>>>
>>>      return 0;
>>> @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
>>>          NULL,
>>>          netdev_dpdk_vhost_construct,
>>>          netdev_dpdk_vhost_destruct,
>>> -        NULL,
>>> +        netdev_dpdk_vhost_set_config,
>>>          NULL,
>>>          netdev_dpdk_vhost_send,
>>>          netdev_dpdk_vhost_get_carrier,
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 4c317d0..e8409b4 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>>          </p>
>>>        </column>
>>>
>>> +      <column name="options" key="dq-zero-copy"
>>> +              type='{"type": "boolean"}'>
>>> +        <p>
>>> +          The value specifies whether or not to enable dequeue zero copy on
>>> +          the given interface.
>>> +          The port must be in an inactive state in order to enable or disable
>>> +          this feature.
>>> +          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
>>> +        </p>
>>> +      </column>
>>> +
>>>        <column name="options" key="n_rxq_desc"
>>>                type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>>          <p>
>>>

