[ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
Kavanagh, Mark B
mark.b.kavanagh at intel.com
Thu Dec 7 12:27:32 UTC 2017
>From: Ilya Maximets [mailto:i.maximets at samsung.com]
>Sent: Thursday, December 7, 2017 12:19 PM
>To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Stokes, Ian
><ian.stokes at intel.com>; dev at openvswitch.org
>Cc: maxime.coquelin at redhat.com; jan.scheurich at ericsson.com; Mooney, Sean K
><sean.k.mooney at intel.com>; Fischetti, Antonio <antonio.fischetti at intel.com>;
>Bie, Tiwei <tiwei.bie at intel.com>; Mcnamara, John <john.mcnamara at intel.com>;
>Guoshuai Li <ligs at dtdream.com>; ktraynor at redhat.com; Loftus, Ciara
><ciara.loftus at intel.com>; Yuanhan Liu <yliu at fridaylinux.org>
>Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>
>On 07.12.2017 14:32, Kavanagh, Mark B wrote:
>> Yuanhan's old email address was used in the previous mail - updated
>correctly here.
>> -Mark
>>
>>> -----Original Message-----
>>> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
>bounces at openvswitch.org]
>>> On Behalf Of Kavanagh, Mark B
>>> Sent: Thursday, December 7, 2017 11:24 AM
>>> To: Stokes, Ian <ian.stokes at intel.com>; dev at openvswitch.org
>>> Cc: Liu, Yuanhan <yuanhan.liu at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>;
>>> Mcnamara, John <john.mcnamara at intel.com>; maxime.coquelin at redhat.com;
>>> i.maximets at samsung.com
>>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>>
>>> Hi folks,
>>>
>>> Thanks for all of your respective reviews and testing of this patchset.
>>>
>>> It seems, however, that an issue in DPDK v17.11 has come to light that
>affects
>>> guests which use testpmd.
>>> Specifically, traffic does not reach a guest when traffic is live prior to
>>> kicking off the testpmd app within said guest, and at least two forwarding
>>> cores are used. [1]
>>>
>>> This is explained in additional detail is [2], an extract of which is
>>> reproduced below:
>>>
>>> "the vector Rx could be broken if backend has consumed all the avail
>>> descs before the
>>> device is started. Because in current implementation, the vector Rx will
>>> return immediately
>>> without refilling the avail ring if the used ring is empty. So we have
>>> to refill
>>> the avail ring after flushing the elements in the used ring."
>>>
>>> This issue was initially uncovered by Antonio Fischetti, as part of the
>17.11
>>> patchset validation, and has since been localized to DPDK, rather than OvS.
>>> As a result, it seems now that we should not move to 17.11, but seek an
>out-
>>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this -
>>> should we:
>>> a) move to 17.11 now, note the issue above in the 'errata' section of the
>>> documentation, and move to 17.11.1 when it becomes available in February of
>>> next year
>>> b) request the early release of the 17.11.1 stable branch, which
>>> incorporates a fix for this issue (the possibility, and availability, of
>which
>>> are both TBD).
>>>
>>> Thanks in advance,
>>> Mark
>
>Hmm. Isn't it a guest driver issue? Do we need to care so much about running
>OVS inside the VM? If I assumed right, if we're running OVS not inside the VM,
>there will be no issues on the OVS side. The issue happens if guest
>application
>uses DPDK 17.11, but it will happen regardless of the DPDK version on the host
>side.
>
>So, the only bad scenario is running OVS inside the VM with virtio-net PMD
>driver.
>My question is: Do we need to care about this scenario so much?
>If the answer is "not really" then we can use variant a).
>But if it's very important thing we need to support than b) will be
>preferable.
>
>Am I missing something?
Hey Ilya,
I've just had the exact same conversation with Sean Mooney locally.
I think the point that both yourself and Sean has made is completely valid, which puts option a) back on the table.
Antonio has agreed to test that this works; once he confirms I can push v5 of the 17.11 patchet, and simply document the guest DPDK issue.
Thanks again,
Mark
>
>Best regards, Ilya Maximets.
>
>
>>>
>>> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
>>> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html
>>>
>>>
>>>> -----Original Message-----
>>>> From: Stokes, Ian
>>>> Sent: Thursday, December 7, 2017 8:43 AM
>>>> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; dev at openvswitch.org
>>>> Cc: ktraynor at redhat.com; maxime.coquelin at redhat.com;
>i.maximets at samsung.com;
>>>> jan.scheurich at ericsson.com; Mooney, Sean K <sean.k.mooney at intel.com>
>>>> Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>>>
>>>>> DPDK v17.11 introduces support for the vHost IOMMU feature.
>>>>> This is a security feature, which restricts the vhost memory that a
>virtio
>>>>> device may access.
>>>>
>>>> Hi all,
>>>>
>>>> There were a few requests for changes in the v4 of this patch and it's an
>>>> important aspect of the DPDK 17.11 support for OVS.
>>>>
>>>> I'd like to get this series in to the pull request this week. If people
>have
>>>> flagged issues for the latest revision I'd appreciate if they could review
>>> the
>>>> latest revision and flag any new issues that need to be raised.
>>>>
>>>> Ian
>>>>
>>>>>
>>>>> This feature also enables the vhost REPLY_ACK protocol, the
>implementation
>>>>> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but
>is
>>>>> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the
>feature
>>>>> is disabled by default in (and should remain so), for the aforementioned
>>>>> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
>>>>> safely be enabled, even without having an IOMMU device, with no
>>>>> performance penalty.
>>>>>
>>>>> This patch adds a new global config option, vhost-iommu-support, that
>>>>> controls enablement of the vhost IOMMU feature:
>>>>>
>>>>> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
>>>>>
>>>>> This value defaults to false; to enable IOMMU support, this field should
>>>>> be set to true when setting other global parameters on init (such as
>>>>> "dpdk-socket-mem", for example). Changing the value at runtime is not
>>>>> supported, and requires restarting the vswitch daemon.
>>>>>
>>>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>>>>
>>>>> ---
>>>>>
>>>>> v4: - rebase to HEAD of master
>>>>> - clarify that IOMMU support applies exclusively to vhost user
>>>>> client ports.
>>>>> - reword caveats regarding modifying vhost-iommu-support at runtime
>>>>> - use smap_get_bool() to parse vhost-iommu-support, instead of
>>>>> process_vhost_flags().
>>>>> - add stub implementation of dpdk_vhost_iommu_enabled().
>>>>> - move stdbool.h #include outside of DPDK compiler guards.
>>>>> - remove mention of vhost IOMMU mode from
>>>>> netdev_dpdk_vhost_client_configure() log.
>>>>>
>>>>> v3: - erroneous; disregard.
>>>>>
>>>>> v2: - rebase to HEAD of master
>>>>> - refactor vHost IOMMU enablement mechanism (use a global
>>>>> config option, instead of the previous per-port approach).
>>>>> ---
>>>>> Documentation/topics/dpdk/vhost-user.rst | 27
>+++++++++++++++++++++++++++
>>>>> NEWS | 1 +
>>>>> lib/dpdk-stub.c | 6 ++++++
>>>>> lib/dpdk.c | 12 ++++++++++++
>>>>> lib/dpdk.h | 3 +++
>>>>> lib/netdev-dpdk.c | 14 ++++++++++----
>>>>> vswitchd/vswitch.xml | 15 +++++++++++++++
>>>>> 7 files changed, 74 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>>>>> b/Documentation/topics/dpdk/vhost-user.rst
>>>>> index 5347995..ffef653 100644
>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>> @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for
>>>>> vHost ports to 'reconnect' in event of the switch crashing or being
>>>>> brought down. Once it is brought back up, the vHost ports will reconnect
>>>>> automatically and normal service will resume.
>>>>>
>>>>> +vhost-user-client IOMMU Support
>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> +
>>>>> +vhost IOMMU is a feature which restricts the vhost memory that a virtio
>>>>> +device can access, and as such is useful in deployments in which
>security
>>>>> is a concern.
>>>>> +
>>>>> +IOMMU support may be enabled via a global config value, ```vhost-iommu-
>>>>> support```.
>>>>> +Setting this to true enables vhost IOMMU support for all vhost ports
>>>>> +when/where
>>>>> +available::
>>>>> +
>>>>> + $ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true
>>>>> +
>>>>> +The default value is false.
>>>>> +
>>>>> +.. important::
>>>>> +
>>>>> + Changing this value requires restarting the daemon.
>>>>> +
>>>>> +.. important::
>>>>> +
>>>>> + Enabling the IOMMU feature also enables the vhost user reply-ack
>>>>> protocol;
>>>>> + this is known to work on QEMU v2.10.0, but is buggy on older
>versions
>>>>> + (2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is
>>>>> disabled by
>>>>> + default (and should remain so if using the aforementioned versions
>of
>>>>> QEMU).
>>>>> + Starting with QEMU v2.9.1, vhost-iommu-support can safely be
>enabled,
>>>>> even
>>>>> + without having an IOMMU device, with no performance penalty.
>>>>> +
>>>>> .. _dpdk-testpmd:
>>>>>
>>>>> DPDK in the Guest
>>>>> diff --git a/NEWS b/NEWS
>>>>> index d4a1c9a..99c47d8 100644
>>>>> --- a/NEWS
>>>>> +++ b/NEWS
>>>>> @@ -15,6 +15,7 @@ Post-v2.8.0
>>>>> * Add support for compiling OVS with the latest Linux 4.13 kernel
>>>>> - DPDK:
>>>>> * Add support for DPDK v17.11
>>>>> + * Add support for vHost IOMMU
>>>>>
>>>>> v2.8.0 - 31 Aug 2017
>>>>> --------------------
>>>>> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index daef729..3602180
>>>>> 100644
>>>>> --- a/lib/dpdk-stub.c
>>>>> +++ b/lib/dpdk-stub.c
>>>>> @@ -48,3 +48,9 @@ dpdk_get_vhost_sock_dir(void) {
>>>>> return NULL;
>>>>> }
>>>>> +
>>>>> +bool
>>>>> +dpdk_vhost_iommu_enabled(void)
>>>>> +{
>>>>> + return false;
>>>>> +}
>>>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>>>> index 8da6c32..6710d10 100644
>>>>> --- a/lib/dpdk.c
>>>>> +++ b/lib/dpdk.c
>>>>> @@ -41,6 +41,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>>>>> static FILE *log_stream = NULL; /* Stream for DPDK log redirection
>>>>> */
>>>>>
>>>>> static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets
>>>>> */
>>>>> +static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU
>>>>> +support */
>>>>>
>>>>> static int
>>>>> process_vhost_flags(char *flag, const char *default_val, int size, @@ -
>>>>> 345,6 +346,11 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>>> vhost_sock_dir = sock_dir_subcomponent;
>>>>> }
>>>>>
>>>>> + vhost_iommu_enabled = smap_get_bool(ovs_other_config,
>>>>> + "vhost-iommu-support", false);
>>>>> + VLOG_INFO("IOMMU support for vhost-user-client %s.",
>>>>> + vhost_iommu_enabled ? "enabled" : "disabled");
>>>>> +
>>>>> argv = grow_argv(&argv, 0, 1);
>>>>> argc = 1;
>>>>> argv[0] = xstrdup(ovs_get_program_name()); @@ -482,6 +488,12 @@
>>>>> dpdk_get_vhost_sock_dir(void)
>>>>> return vhost_sock_dir;
>>>>> }
>>>>>
>>>>> +bool
>>>>> +dpdk_vhost_iommu_enabled(void)
>>>>> +{
>>>>> + return vhost_iommu_enabled;
>>>>> +}
>>>>> +
>>>>> void
>>>>> dpdk_set_lcore_id(unsigned cpu)
>>>>> {
>>>>> diff --git a/lib/dpdk.h b/lib/dpdk.h
>>>>> index 673a1f1..dc58d96 100644
>>>>> --- a/lib/dpdk.h
>>>>> +++ b/lib/dpdk.h
>>>>> @@ -17,6 +17,8 @@
>>>>> #ifndef DPDK_H
>>>>> #define DPDK_H
>>>>>
>>>>> +#include <stdbool.h>
>>>>> +
>>>>> #ifdef DPDK_NETDEV
>>>>>
>>>>> #include <rte_config.h>
>>>>> @@ -35,5 +37,6 @@ struct smap;
>>>>> void dpdk_init(const struct smap *ovs_other_config); void
>>>>> dpdk_set_lcore_id(unsigned cpu); const char
>>>>> *dpdk_get_vhost_sock_dir(void);
>>>>> +bool dpdk_vhost_iommu_enabled(void);
>>>>>
>>>>> #endif /* dpdk.h */
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f552444..9715c39
>>>>> 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -3253,6 +3253,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev
>>>>> *netdev) {
>>>>> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>> int err;
>>>>> + uint64_t vhost_flags = 0;
>>>>>
>>>>> ovs_mutex_lock(&dev->mutex);
>>>>>
>>>>> @@ -3263,16 +3264,21 @@ netdev_dpdk_vhost_client_reconfigure(struct
>netdev
>>>>> *netdev)
>>>>> */
>>>>> if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)
>>>>> && strlen(dev->vhost_id)) {
>>>>> - /* Register client-mode device */
>>>>> - err = rte_vhost_driver_register(dev->vhost_id,
>>>>> - RTE_VHOST_USER_CLIENT);
>>>>> + /* Register client-mode device. */
>>>>> + vhost_flags |= RTE_VHOST_USER_CLIENT;
>>>>> +
>>>>> + /* Enable IOMMU support, if explicitly requested. */
>>>>> + if (dpdk_vhost_iommu_enabled()) {
>>>>> + vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
>>>>> + }
>>>>> + err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
>>>>> if (err) {
>>>>> VLOG_ERR("vhost-user device setup failure for device %s\n",
>>>>> dev->vhost_id);
>>>>> goto unlock;
>>>>> } else {
>>>>> /* Configuration successful */
>>>>> - dev->vhost_driver_flags |= RTE_VHOST_USER_CLIENT;
>>>>> + dev->vhost_driver_flags |= vhost_flags;
>>>>> VLOG_INFO("vHost User device '%s' created in 'client' mode,
>"
>>>>> "using client socket '%s'",
>>>>> dev->up.name, dev->vhost_id); diff --git
>>>>> a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index c145e1a..6c6df50
>>>>> 100644
>>>>> --- a/vswitchd/vswitch.xml
>>>>> +++ b/vswitchd/vswitch.xml
>>>>> @@ -344,6 +344,21 @@
>>>>> </p>
>>>>> </column>
>>>>>
>>>>> + <column name="other_config" key="vhost-iommu-support"
>>>>> + type='{"type": "boolean"}'>
>>>>> + <p>
>>>>> + vHost IOMMU is a security feature, which restricts the vhost
>>>>> memory
>>>>> + that a virtio device may access. vHost IOMMU support is
>>>>> disabled by
>>>>> + default, due to a bug in QEMU implementations of the vhost
>>>>> REPLY_ACK
>>>>> + protocol, (on which vHost IOMMU relies) prior to v2.9.1.
>>>>> Setting this
>>>>> + value to <code>true</code> enables vHost IOMMU support for
>>>>> vHost User
>>>>> + Client ports in OvS-DPDK, starting from DPDK v17.11.
>>>>> + </p>
>>>>> + <p>
>>>>> + Changing this value requires restarting the daemon.
>>>>> + </p>
>>>>> + </column>
>>>>> +
>>>>> <column name="other_config" key="n-handler-threads"
>>>>> type='{"type": "integer", "minInteger": 1}'>
>>>>> <p>
>>>>> --
>>>>> 2.7.4
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>>
More information about the dev
mailing list