[ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

Kevin Traynor ktraynor at redhat.com
Thu Dec 7 14:54:36 UTC 2017


On 12/07/2017 12:27 PM, Kavanagh, Mark B wrote:
>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>> Sent: Thursday, December 7, 2017 12:19 PM
>> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Stokes, Ian
>> <ian.stokes at intel.com>; dev at openvswitch.org
>> Cc: maxime.coquelin at redhat.com; jan.scheurich at ericsson.com; Mooney, Sean K
>> <sean.k.mooney at intel.com>; Fischetti, Antonio <antonio.fischetti at intel.com>;
>> Bie, Tiwei <tiwei.bie at intel.com>; Mcnamara, John <john.mcnamara at intel.com>;
>> Guoshuai Li <ligs at dtdream.com>; ktraynor at redhat.com; Loftus, Ciara
>> <ciara.loftus at intel.com>; Yuanhan Liu <yliu at fridaylinux.org>
>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>
>> On 07.12.2017 14:32, Kavanagh, Mark B wrote:
>>> Yuanhan's old email address was used in the previous mail - updated
>> correctly here.
>>> -Mark
>>>
>>>> -----Original Message-----
>>>> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
>> bounces at openvswitch.org]
>>>> On Behalf Of Kavanagh, Mark B
>>>> Sent: Thursday, December 7, 2017 11:24 AM
>>>> To: Stokes, Ian <ian.stokes at intel.com>; dev at openvswitch.org
>>>> Cc: Liu, Yuanhan <yuanhan.liu at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>;
>>>> Mcnamara, John <john.mcnamara at intel.com>; maxime.coquelin at redhat.com;
>>>> i.maximets at samsung.com
>>>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>>>
>>>> Hi folks,
>>>>
>>>> Thanks for all of your respective reviews and testing of this patchset.
>>>>
>>>> It seems, however, that an issue in DPDK v17.11 has come to light that
>> affects
>>>> guests which use testpmd.
>>>> Specifically, traffic does not reach a guest when traffic is live prior to
>>>> kicking off the testpmd app within said guest, and at least two forwarding
>>>> cores are used. [1]
>>>>
>>>> This is explained in additional detail is [2], an extract of which is
>>>> reproduced below:
>>>>
>>>> 	"the vector Rx could be broken if backend has consumed all the avail
>>>> descs before the
>>>> 	 device is started. Because in current implementation, the vector Rx will
>>>> return immediately
>>>> 	 without refilling the avail ring if the used ring is empty. So we have
>>>> to refill
>>>> 	 the avail ring after flushing the elements in the used ring."
>>>>
>>>> This issue was initially uncovered by Antonio Fischetti, as part of the
>> 17.11
>>>> patchset validation, and has since been localized to DPDK, rather than OvS.
>>>> As a result, it seems now that we should not move to 17.11, but seek an
>> out-
>>>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this -
>>>> should we:
>>>> 	a) move to 17.11 now, note the issue above in the 'errata' section of the
>>>> documentation, and move to 17.11.1 when it becomes available in February of
>>>> next year
>>>> 	b) request the early release of the 17.11.1 stable branch, which
>>>> incorporates a fix for this issue (the possibility, and availability, of
>> which
>>>> are both TBD).
>>>>
>>>> Thanks in advance,
>>>> Mark
>>
>> Hmm. Isn't it a guest driver issue? Do we need to care so much about running
>> OVS inside the VM? If I assumed right, if we're running OVS not inside the VM,
>> there will be no issues on the OVS side. The issue happens if guest
>> application
>> uses DPDK 17.11, but it will happen regardless of the DPDK version on the host
>> side.
>>
>> So, the only bad scenario is running OVS inside the VM with virtio-net PMD
>> driver.
>> My question is: Do we need to care about this scenario so much?
>> If the answer is "not really" then we can use variant a).
>> But if it's very important thing we need to support than b) will be
>> preferable.
>>
>> Am I missing something?
> 
> Hey Ilya,
> 
> I've just had the exact same conversation with Sean Mooney locally.
> 
> I think the point that both yourself and Sean has made is completely valid, which puts option a) back on the table.
> 

a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be
good in addition though. It is nicer that an OVS 2.9 user doesn't have
to know they can't use the latest DPDK in the guest.

Other than the outstanding documentation, the patches LGTM.

> Antonio has agreed to test that this works; once he confirms I can push v5 of the 17.11 patchet, and simply document the guest DPDK issue.
> 
> Thanks again,
> Mark
> 
> 
>>
>> Best regards, Ilya Maximets.
>>
>>
>>>>
>>>> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
>>>> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Stokes, Ian
>>>>> Sent: Thursday, December 7, 2017 8:43 AM
>>>>> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; dev at openvswitch.org
>>>>> Cc: ktraynor at redhat.com; maxime.coquelin at redhat.com;
>> i.maximets at samsung.com;
>>>>> jan.scheurich at ericsson.com; Mooney, Sean K <sean.k.mooney at intel.com>
>>>>> Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>>>>
>>>>>> DPDK v17.11 introduces support for the vHost IOMMU feature.
>>>>>> This is a security feature, which restricts the vhost memory that a
>> virtio
>>>>>> device may access.
>>>>>
>>>>> Hi all,
>>>>>
>>>>> There were a few requests for changes in the v4 of this patch and it's an
>>>>> important aspect of the DPDK 17.11 support for OVS.
>>>>>
>>>>> I'd like to get this series in to the pull request this week. If people
>> have
>>>>> flagged issues for the latest revision I'd appreciate if they could review
>>>> the
>>>>> latest revision and flag any new issues that need to be raised.
>>>>>
>>>>> Ian
>>>>>
>>>>>>
>>>>>> This feature also enables the vhost REPLY_ACK protocol, the
>> implementation
>>>>>> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but
>> is
>>>>>> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the
>> feature
>>>>>> is disabled by default in (and should remain so), for the aforementioned
>>>>>> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
>>>>>> safely be enabled, even without having an IOMMU device, with no
>>>>>> performance penalty.
>>>>>>
>>>>>> This patch adds a new global config option, vhost-iommu-support, that
>>>>>> controls enablement of the vhost IOMMU feature:
>>>>>>
>>>>>>     ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
>>>>>>
>>>>>> This value defaults to false; to enable IOMMU support, this field should
>>>>>> be set to true when setting other global parameters on init (such as
>>>>>> "dpdk-socket-mem", for example). Changing the value at runtime is not
>>>>>> supported, and requires restarting the vswitch daemon.
>>>>>>
>>>>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v4:  - rebase to HEAD of master
>>>>>>      - clarify that IOMMU support applies exclusively to vhost user
>>>>>>        client ports.
>>>>>>      - reword caveats regarding modifying vhost-iommu-support at runtime
>>>>>>      - use smap_get_bool() to parse vhost-iommu-support, instead of
>>>>>>        process_vhost_flags().
>>>>>>      - add stub implementation of dpdk_vhost_iommu_enabled().
>>>>>>      - move stdbool.h #include outside of DPDK compiler guards.
>>>>>>      - remove mention of vhost IOMMU mode from
>>>>>>        netdev_dpdk_vhost_client_configure() log.
>>>>>>
>>>>>> v3:  - erroneous; disregard.
>>>>>>
>>>>>> v2:  - rebase to HEAD of master
>>>>>>      - refactor vHost IOMMU enablement mechanism (use a global
>>>>>>        config option, instead of the previous per-port approach).
>>>>>> ---
>>>>>>  Documentation/topics/dpdk/vhost-user.rst | 27
>> +++++++++++++++++++++++++++
>>>>>>  NEWS                                     |  1 +
>>>>>>  lib/dpdk-stub.c                          |  6 ++++++
>>>>>>  lib/dpdk.c                               | 12 ++++++++++++
>>>>>>  lib/dpdk.h                               |  3 +++
>>>>>>  lib/netdev-dpdk.c                        | 14 ++++++++++----
>>>>>>  vswitchd/vswitch.xml                     | 15 +++++++++++++++
>>>>>>  7 files changed, 74 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>>>>>> b/Documentation/topics/dpdk/vhost-user.rst
>>>>>> index 5347995..ffef653 100644
>>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>>> @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for
>>>>>> vHost ports to 'reconnect' in  event of the switch crashing or being
>>>>>> brought down. Once it is brought back up,  the vHost ports will reconnect
>>>>>> automatically and normal service will resume.
>>>>>>
>>>>>> +vhost-user-client IOMMU Support
>>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> +
>>>>>> +vhost IOMMU is a feature which restricts the vhost memory that a virtio
>>>>>> +device can access, and as such is useful in deployments in which
>> security
>>>>>> is a concern.
>>>>>> +
>>>>>> +IOMMU support may be enabled via a global config value, ```vhost-iommu-
>>>>>> support```.
>>>>>> +Setting this to true enables vhost IOMMU support for all vhost ports
>>>>>> +when/where
>>>>>> +available::
>>>>>> +
>>>>>> +    $ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true
>>>>>> +
>>>>>> +The default value is false.
>>>>>> +
>>>>>> +.. important::
>>>>>> +
>>>>>> +    Changing this value requires restarting the daemon.
>>>>>> +
>>>>>> +.. important::
>>>>>> +
>>>>>> +    Enabling the IOMMU feature also enables the vhost user reply-ack
>>>>>> protocol;
>>>>>> +    this is known to work on QEMU v2.10.0, but is buggy on older
>> versions
>>>>>> +    (2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is
>>>>>> disabled by
>>>>>> +    default (and should remain so if using the aforementioned versions
>> of
>>>>>> QEMU).
>>>>>> +    Starting with QEMU v2.9.1, vhost-iommu-support can safely be
>> enabled,
>>>>>> even
>>>>>> +    without having an IOMMU device, with no performance penalty.
>>>>>> +
>>>>>>  .. _dpdk-testpmd:
>>>>>>
>>>>>>  DPDK in the Guest
>>>>>> diff --git a/NEWS b/NEWS
>>>>>> index d4a1c9a..99c47d8 100644
>>>>>> --- a/NEWS
>>>>>> +++ b/NEWS
>>>>>> @@ -15,6 +15,7 @@ Post-v2.8.0
>>>>>>       * Add support for compiling OVS with the latest Linux 4.13 kernel
>>>>>>     - DPDK:
>>>>>>       * Add support for DPDK v17.11
>>>>>> +     * Add support for vHost IOMMU
>>>>>>
>>>>>>  v2.8.0 - 31 Aug 2017
>>>>>>  --------------------
>>>>>> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index daef729..3602180
>>>>>> 100644
>>>>>> --- a/lib/dpdk-stub.c
>>>>>> +++ b/lib/dpdk-stub.c
>>>>>> @@ -48,3 +48,9 @@ dpdk_get_vhost_sock_dir(void)  {
>>>>>>      return NULL;
>>>>>>  }
>>>>>> +
>>>>>> +bool
>>>>>> +dpdk_vhost_iommu_enabled(void)
>>>>>> +{
>>>>>> +    return false;
>>>>>> +}
>>>>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>>>>> index 8da6c32..6710d10 100644
>>>>>> --- a/lib/dpdk.c
>>>>>> +++ b/lib/dpdk.c
>>>>>> @@ -41,6 +41,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>>>>>>  static FILE *log_stream = NULL;       /* Stream for DPDK log redirection
>>>>>> */
>>>>>>
>>>>>>  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets
>>>>>> */
>>>>>> +static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU
>>>>>> +support */
>>>>>>
>>>>>>  static int
>>>>>>  process_vhost_flags(char *flag, const char *default_val, int size, @@ -
>>>>>> 345,6 +346,11 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>>>>          vhost_sock_dir = sock_dir_subcomponent;
>>>>>>      }
>>>>>>
>>>>>> +    vhost_iommu_enabled = smap_get_bool(ovs_other_config,
>>>>>> +                                        "vhost-iommu-support", false);
>>>>>> +    VLOG_INFO("IOMMU support for vhost-user-client %s.",
>>>>>> +               vhost_iommu_enabled ? "enabled" : "disabled");
>>>>>> +
>>>>>>      argv = grow_argv(&argv, 0, 1);
>>>>>>      argc = 1;
>>>>>>      argv[0] = xstrdup(ovs_get_program_name()); @@ -482,6 +488,12 @@
>>>>>> dpdk_get_vhost_sock_dir(void)
>>>>>>      return vhost_sock_dir;
>>>>>>  }
>>>>>>
>>>>>> +bool
>>>>>> +dpdk_vhost_iommu_enabled(void)
>>>>>> +{
>>>>>> +    return vhost_iommu_enabled;
>>>>>> +}
>>>>>> +
>>>>>>  void
>>>>>>  dpdk_set_lcore_id(unsigned cpu)
>>>>>>  {
>>>>>> diff --git a/lib/dpdk.h b/lib/dpdk.h
>>>>>> index 673a1f1..dc58d96 100644
>>>>>> --- a/lib/dpdk.h
>>>>>> +++ b/lib/dpdk.h
>>>>>> @@ -17,6 +17,8 @@
>>>>>>  #ifndef DPDK_H
>>>>>>  #define DPDK_H
>>>>>>
>>>>>> +#include <stdbool.h>
>>>>>> +
>>>>>>  #ifdef DPDK_NETDEV
>>>>>>
>>>>>>  #include <rte_config.h>
>>>>>> @@ -35,5 +37,6 @@ struct smap;
>>>>>>  void dpdk_init(const struct smap *ovs_other_config);  void
>>>>>> dpdk_set_lcore_id(unsigned cpu);  const char
>>>>>> *dpdk_get_vhost_sock_dir(void);
>>>>>> +bool dpdk_vhost_iommu_enabled(void);
>>>>>>
>>>>>>  #endif /* dpdk.h */
>>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f552444..9715c39
>>>>>> 100644
>>>>>> --- a/lib/netdev-dpdk.c
>>>>>> +++ b/lib/netdev-dpdk.c
>>>>>> @@ -3253,6 +3253,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev
>>>>>> *netdev)  {
>>>>>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>>>      int err;
>>>>>> +    uint64_t vhost_flags = 0;
>>>>>>
>>>>>>      ovs_mutex_lock(&dev->mutex);
>>>>>>
>>>>>> @@ -3263,16 +3264,21 @@ netdev_dpdk_vhost_client_reconfigure(struct
>> netdev
>>>>>> *netdev)
>>>>>>       */
>>>>>>      if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)
>>>>>>              && strlen(dev->vhost_id)) {
>>>>>> -        /* Register client-mode device */
>>>>>> -        err = rte_vhost_driver_register(dev->vhost_id,
>>>>>> -                                        RTE_VHOST_USER_CLIENT);
>>>>>> +        /* Register client-mode device. */
>>>>>> +        vhost_flags |= RTE_VHOST_USER_CLIENT;
>>>>>> +
>>>>>> +        /* Enable IOMMU support, if explicitly requested. */
>>>>>> +        if (dpdk_vhost_iommu_enabled()) {
>>>>>> +            vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
>>>>>> +        }
>>>>>> +        err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
>>>>>>          if (err) {
>>>>>>              VLOG_ERR("vhost-user device setup failure for device %s\n",
>>>>>>                       dev->vhost_id);
>>>>>>              goto unlock;
>>>>>>          } else {
>>>>>>              /* Configuration successful */
>>>>>> -            dev->vhost_driver_flags |= RTE_VHOST_USER_CLIENT;
>>>>>> +            dev->vhost_driver_flags |= vhost_flags;
>>>>>>              VLOG_INFO("vHost User device '%s' created in 'client' mode,
>> "
>>>>>>                        "using client socket '%s'",
>>>>>>                        dev->up.name, dev->vhost_id); diff --git
>>>>>> a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index c145e1a..6c6df50
>>>>>> 100644
>>>>>> --- a/vswitchd/vswitch.xml
>>>>>> +++ b/vswitchd/vswitch.xml
>>>>>> @@ -344,6 +344,21 @@
>>>>>>          </p>
>>>>>>        </column>
>>>>>>
>>>>>> +      <column name="other_config" key="vhost-iommu-support"
>>>>>> +              type='{"type": "boolean"}'>
>>>>>> +        <p>
>>>>>> +          vHost IOMMU is a  security feature, which restricts the vhost
>>>>>> memory
>>>>>> +          that a virtio device may access. vHost IOMMU support is
>>>>>> disabled by
>>>>>> +          default, due to a bug in QEMU implementations of the vhost
>>>>>> REPLY_ACK
>>>>>> +          protocol, (on which vHost IOMMU relies) prior to v2.9.1.
>>>>>> Setting this
>>>>>> +          value to <code>true</code> enables vHost IOMMU support for
>>>>>> vHost User
>>>>>> +          Client ports in OvS-DPDK, starting from DPDK v17.11.
>>>>>> +        </p>
>>>>>> +        <p>
>>>>>> +          Changing this value requires restarting the daemon.
>>>>>> +        </p>
>>>>>> +      </column>
>>>>>> +
>>>>>>        <column name="other_config" key="n-handler-threads"
>>>>>>                type='{"type": "integer", "minInteger": 1}'>
>>>>>>          <p>
>>>>>> --
>>>>>> 2.7.4
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev at openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>>>



More information about the dev mailing list