[ovs-dev] [PATCH v8] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

Ilya Maximets i.maximets at samsung.com
Fri Jan 19 15:56:28 UTC 2018


On 19.01.2018 18:31, Loftus, Ciara wrote:
>>
>> On 05.01.2018 19:13, Ciara Loftus wrote:
>>> Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
>>> option to 'true' when configuring the Interface:
>>>
>>> ovs-vsctl set Interface dpdkvhostuserclient0
>>> options:vhost-server-path=/tmp/dpdkvhostuserclient0
>>> options:dq-zero-copy=true
>>>
>>> When packets from a vHost device with zero copy enabled are destined for
>>> a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
>>> must be set to a smaller value. 128 is recommended. This can be achieved
>>> like so:
>>>
>>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>>
>>> Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
>>> to should not exceed 128. Due to this requirement, the feature is
>>> considered 'experimental'.
>>>
>>> Testing of the patch showed a 15% improvement when switching 512B
>>> packets between vHost devices on different VMs on the same host when
>>> zero copy was enabled on the transmitting device.
>>>
>>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>>> ---
>>> v8:
>>> * Disallow configurability after vHost device has been registered &
>>> update docs accordingly.
>>> * Give performance datapoint in commit message.
>>>
>>>  Documentation/intro/install/dpdk.rst     |  2 +
>>>  Documentation/topics/dpdk/vhost-user.rst | 72
>> ++++++++++++++++++++++++++++++++
>>>  NEWS                                     |  1 +
>>>  lib/netdev-dpdk.c                        |  9 +++-
>>>  vswitchd/vswitch.xml                     | 11 +++++
>>>  5 files changed, 94 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/intro/install/dpdk.rst
>> b/Documentation/intro/install/dpdk.rst
>>> index 3fecb5c..087eb88 100644
>>> --- a/Documentation/intro/install/dpdk.rst
>>> +++ b/Documentation/intro/install/dpdk.rst
>>> @@ -518,6 +518,8 @@ The above command sets the number of rx queues
>> for DPDK physical interface.
>>>  The rx queues are assigned to pmd threads on the same NUMA node in a
>>>  round-robin fashion.
>>>
>>> +.. _dpdk-queues-sizes:
>>> +
>>>  DPDK Physical Port Queue Sizes
>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>> b/Documentation/topics/dpdk/vhost-user.rst
>>> index 8447e2d..1a6c6d0 100644
>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>> @@ -458,3 +458,75 @@ Sample XML
>>>      </domain>
>>>
>>>  .. _QEMU documentation: http://git.qemu-
>> project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-
>> user.txt;h=7890d7169;hb=HEAD
>>> +
>>> +vhost-user Dequeue Zero Copy (experimental)
>>> +-------------------------------------------
>>> +
>>> +Normally when dequeuing a packet from a vHost User device, a memcpy
>> operation
>>> +must be used to copy that packet from guest address space to host
>> address
>>> +space. This memcpy can be removed by enabling dequeue zero-copy like
>> so::
>>> +
>>> +    $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
>>> +        dpdkvhostuserclient0 type=dpdkvhostuserclient \
>>> +        options:vhost-server-path=/tmp/dpdkvhostclient0 \
>>> +        options:dq-zero-copy=true
>>> +
>>> +With this feature enabled, a reference (pointer) to the packet is passed to
>>> +the host, instead of a copy of the packet. Removing this memcpy can give
>> a
>>> +performance improvement for some use cases, for example switching
>> large packets
>>> +between different VMs. However additional packet loss may be
>> observed.
>>> +
>>> +Note that the feature is disabled by default and must be explicitly enabled
>>> +by setting the 'dq-zero-copy' option to 'true' while specifying the
>>> +'vhost-server-path' option as above. If you wish to split out the command
>> into
>>> +multiple commands as below, ensure 'dq-zero-copy' is set before
>>> +'vhost-server-path'::
>>> +
>>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-
>> copy=true
>>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 \
>>> +        options:vhost-server-path=/tmp/dpdkvhostclient0
>>> +
>>> +The feature is only available to dpdkvhostuserclient port types.
>>> +
>>> +A limitation exists whereby if packets from a vHost port with dq-zero-
>> copy=true
>>> +are destined for a 'dpdk' type port, the number of tx descriptors
>> (n_txq_desc)
>>> +for that port must be reduced to a smaller number, 128 being the
>> recommended
>>> +value. This can be achieved by issuing the following command::
>>> +
>>> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>> +
>>> +Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to
>>> +should not exceed 128. For example, in case of a bond over two physical
>> ports
>>> +in balance-tcp mode, one must divide 128 by the number of links in the
>> bond.
>>> +
>>> +Refer to :ref:`<dpdk-queue-sizes>` for more information.
>>> +
>>> +The reason for this limitation is due to how the zero copy functionality is
>>> +implemented. The vHost device's 'tx used vring', a virtio structure used for
>>> +tracking used ie. sent descriptors, will only be updated when the NIC
>> frees
>>> +the corresponding mbuf. If we don't free the mbufs frequently enough,
>> that
>>> +vring will be starved and packets will no longer be processed. One way to
>>> +ensure we don't encounter this scenario, is to configure n_txq_desc to a
>> small
>>> +enough number such that the 'mbuf free threshold' for the NIC will be hit
>> more
>>> +often and thus free mbufs more frequently. The value of 128 is
>> suggested, but
>>> +values of 64 and 256 have been tested and verified to work too, with
>> differing
>>> +performance characteristics. A value of 512 can be used too, if the virtio
>>> +queue size in the guest is increased to 1024 (available to configure in
>> QEMU
>>> +versions v2.10 and greater). This value can be set like so::
>>> +
>>> +    $ qemu-system-x86_64 ... -chardev
>> socket,id=char1,path=<sockpath>,server
>>> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
>>> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
>>> +      tx_queue_size=1024
>>> +
>>> +Because of this limitation, this feature is condsidered 'experimental'.
>>> +
>>> +The feature currently does not fully work with QEMU >= v2.7 due to a bug
>> in
>>> +DPDK which will be addressed in an upcoming release. The patch to fix this
>>> +issue can be found on
>>> +`Patchwork
>>> +<http://dpdk.org/dev/patchwork/patch/32198/>`__
>>> +
>>> +Further information can be found in the
>>> +`DPDK documentation
>>> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
>>
>> There are many keywords like 'vhost-server-path', 'dq-zero-copy', 'true' and
>> so on in above text. IMHO, they should be marked like ``something`` to be
>> highlighted.
>>
>>> diff --git a/NEWS b/NEWS
>>> index 752e98f..1eaa32c 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -24,6 +24,7 @@ Post-v2.8.0
>>>     - DPDK:
>>>       * Add support for DPDK v17.11
>>>       * Add support for vHost IOMMU
>>> +     * Add support for vHost dequeue zero copy (experimental)
>>>
>>>  v2.8.0 - 31 Aug 2017
>>>  --------------------
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 8f22264..d100b31 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -1384,6 +1384,10 @@ netdev_dpdk_vhost_client_set_config(struct
>> netdev *netdev,
>>>          path = smap_get(args, "vhost-server-path");
>>>          if (path && strcmp(path, dev->vhost_id)) {
>>>              strcpy(dev->vhost_id, path);
>>> +            /* check zero copy configuration */
>>> +            if (smap_get_bool(args, "dq-zero-copy", false)) {
>>> +                dev->vhost_driver_flags |=
>> RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>>
>> In case of rte_vhost_driver_register() failure while reconfiguration we'll not
>> be able to disable "dq-zero-copy" for that port, becose we're only anble to
>> enable it.
>> For example, it could happen in case of typo in vhost_id.
> 
> Hi Ilya,
> 
> Can you clarify what you want here. Previously we had the ability to enable/disable zc but I removed it due to your comments.
> Do you want to be able to enable/disable zc before successful driver register but not after? What is the benefit?

Hello Ciara,
I'm thinking about following case:

Trying to register device with vhost-server-path=A and dq-zero-copy=true and
it fails for whatever reason. After that I can change vhost-server-path to
something else, but have no way to disable the zero-copy mode.

I'm not sure if there any profit from changing zero-copy mode after the
driver register failure, but it's not hard to fix and, probably, could
save someone's nerves.

I think, simple 'else' condition should work here:

    } else {
        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
    }

What do you think?

> 
> Thanks,
> Ciara
> 
>>
>>> +            }
>>>              netdev_request_reconfigure(netdev);
>>>          }
>>>      }
>>> @@ -3278,7 +3282,7 @@ netdev_dpdk_vhost_client_reconfigure(struct
>> netdev *netdev)
>>>  {
>>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>      int err;
>>> -    uint64_t vhost_flags = 0;
>>> +    uint64_t vhost_flags = dev->vhost_driver_flags;
>>
>> This should be under the mutex.
>>
>>>
>>>      ovs_mutex_lock(&dev->mutex);
>>>
>>> @@ -3307,6 +3311,9 @@ netdev_dpdk_vhost_client_reconfigure(struct
>> netdev *netdev)
>>>              VLOG_INFO("vHost User device '%s' created in 'client' mode, "
>>>                        "using client socket '%s'",
>>>                        dev->up.name, dev->vhost_id);
>>> +            if (dev->vhost_driver_flags &
>> RTE_VHOST_USER_DEQUEUE_ZERO_COPY) {
>>> +                VLOG_INFO("Zero copy enabled for vHost port %s", dev-
>>> up.name);
>>> +            }
>>>          }
>>>
>>>          err = rte_vhost_driver_callback_register(dev->vhost_id,
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 018d644..16b3c3e 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>> type=patch options:peer=p1 \
>>>          </p>
>>>        </column>
>>>
>>> +      <column name="options" key="dq-zero-copy"
>>> +              type='{"type": "boolean"}'>
>>> +        <p>
>>> +          The value specifies whether or not to enable dequeue zero copy on
>>> +          the given interface.
>>> +          Must be set before vhost-server-path is specified.
>>> +          Only supported by dpdkvhostuserclient interfaces.
>>> +          The feature is considered experimental.
>>> +        </p>
>>> +      </column>
>>> +
>>>        <column name="options" key="n_rxq_desc"
>>>                type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>>          <p>
>>>


More information about the dev mailing list