[ovs-dev] [PATCH 2/2] netdev-dpdk: add support for vhost IOMMU feature

Mooney, Sean K sean.k.mooney at intel.com
Mon Nov 27 14:25:03 UTC 2017



> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor at redhat.com]
> Sent: Monday, November 27, 2017 1:32 PM
> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Mooney, Sean K
> <sean.k.mooney at intel.com>; Jan Scheurich <jan.scheurich at ericsson.com>;
> dev at openvswitch.org
> Cc: maxime.coquelin at redhat.com; Flavio Leitner <fbl at redhat.com>; Franck
> Baudin <fbaudin at redhat.com>; Ilya Maximets <i.maximets at samsung.com>;
> Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> <ciara.loftus at intel.com>; Darrell Ball <dball at vmware.com>; Aaron Conole
> <aconole at redhat.com>
> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: add support for vhost
> IOMMU feature
> 
> On 11/27/2017 12:55 PM, Kavanagh, Mark B wrote:
> >
> >
> >> From: Mooney, Sean K
> >> Sent: Sunday, November 26, 2017 11:28 PM
> >> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Jan Scheurich
> >> <jan.scheurich at ericsson.com>; Kevin Traynor <ktraynor at redhat.com>;
> >> dev at openvswitch.org
> >> Cc: maxime.coquelin at redhat.com; Flavio Leitner <fbl at redhat.com>;
> >> Franck Baudin <fbaudin at redhat.com>; Ilya Maximets
> >> <i.maximets at samsung.com>; Stokes, Ian <ian.stokes at intel.com>;
> Loftus,
> >> Ciara <ciara.loftus at intel.com>; Darrell Ball <dball at vmware.com>;
> >> Aaron Conole <aconole at redhat.com>; Mooney, Sean K
> >> <sean.k.mooney at intel.com>
> >> Subject: RE: [ovs-dev] [PATCH 2/2] netdev-dpdk: add support for
> vhost
> >> IOMMU feature
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Kavanagh, Mark B
> >>> Sent: Friday, November 24, 2017 4:45 PM
> >>> To: Jan Scheurich <jan.scheurich at ericsson.com>; Kevin Traynor
> >>> <ktraynor at redhat.com>; dev at openvswitch.org
> >>> Cc: maxime.coquelin at redhat.com; Flavio Leitner <fbl at redhat.com>;
> >>> Franck Baudin <fbaudin at redhat.com>; Mooney, Sean K
> >>> <sean.k.mooney at intel.com>; Ilya Maximets <i.maximets at samsung.com>;
> >>> Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> >>> <ciara.loftus at intel.com>; Darrell Ball <dball at vmware.com>; Aaron
> >>> Conole <aconole at redhat.com>
> >>> Subject: RE: [ovs-dev] [PATCH 2/2] netdev-dpdk: add support for
> >>> vhost IOMMU feature
> >>>
> >>> Sounds good guys - I'll get cracking on this on Monday.
> >>>
> >>> Cheers,
> >>> Mark
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Scheurich [mailto:jan.scheurich at ericsson.com]
> >>>> Sent: Friday, November 24, 2017 4:21 PM
> >>>> To: Kevin Traynor <ktraynor at redhat.com>; Kavanagh, Mark B
> >>>> <mark.b.kavanagh at intel.com>; dev at openvswitch.org
> >>>> Cc: maxime.coquelin at redhat.com; Flavio Leitner <fbl at redhat.com>;
> >>> Franck
> >>>> Baudin <fbaudin at redhat.com>; Mooney, Sean K
> >>>> <sean.k.mooney at intel.com>; Ilya Maximets <i.maximets at samsung.com>;
> >>>> Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> >>>> <ciara.loftus at intel.com>;
> >>> Darrell
> >>>> Ball <dball at vmware.com>; Aaron Conole <aconole at redhat.com>
> >>>> Subject: RE: [ovs-dev] [PATCH 2/2] netdev-dpdk: add support for
> >>>> vhost IOMMU feature
> >>>>
> >>>> +1
> >>>> Jan
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Kevin Traynor [mailto:ktraynor at redhat.com]
> >>>>> Sent: Friday, 24 November, 2017 17:11
> >>>>> To: Mark Kavanagh <mark.b.kavanagh at intel.com>;
> dev at openvswitch.org
> >>>>> Cc: maxime.coquelin at redhat.com; Flavio Leitner <fbl at redhat.com>;
> >>>>> Franck
> >>>> Baudin <fbaudin at redhat.com>; Mooney, Sean K
> >>>>> <sean.k.mooney at intel.com>; Ilya Maximets
> <i.maximets at samsung.com>;
> >>>>> Ian
> >>>> Stokes <ian.stokes at intel.com>; Loftus, Ciara
> >>>>> <ciara.loftus at intel.com>; Darrell Ball <dball at vmware.com>; Aaron
> >>>>> Conole
> >>>> <aconole at redhat.com>; Jan Scheurich
> >>>>> <jan.scheurich at ericsson.com>
> >>>>> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: add support for
> >>> vhost
> >>>>> IOMMU
> >>>> feature
> >>>>>
> >>>>> On 11/16/2017 11:01 AM, Mark Kavanagh wrote:
> >>>>>> DPDK v17.11 introduces support for the vHost IOMMU feature.
> >>>>>> This is a security feature, that restricts the vhost memory that
> >>>>>> a virtio device may access.
> >>>>>>
> >>>>>> This feature also enables the vhost REPLY_ACK protocol, the
> >>>>>> implementation of which is known to work in newer versions of
> >>>>>> QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
> >>>>>> v2.9.0, inclusive). As such, the feature is disabled by default
> >>>>>> in (and should remain so, for the aforementioned older QEMU
> verions).
> >>>>>> Starting with QEMU v2.9.1, vhost-iommu-support can safely be
> >>>>>> enabled, even without having an IOMMU device, with no
> performance
> >>>>>> penalty.
> >>>>>>
> >>>>>> This patch adds a new vhost port option, vhost-iommu-support, to
> >>>>>> allow enablement of the vhost IOMMU feature:
> >>>>>>
> >>>>>>     $ ovs-vsctl add-port br0 vhost-client-1 \
> >>>>>>         -- set Interface vhost-client-1 type=dpdkvhostuserclient
> \
> >>>>>>              options:vhost-server-path=$VHOST_USER_SOCKET_PATH
> \
> >>>>>>              options:vhost-iommu-support=true
> >>>>>>
> >>>>>
> >>>>> Hi Mark, All,
> >>>>>
> >>>>> I'm thinking about this and whether the current approach provides
> >>>>> more than what is actually needed by users at the cost of making
> >>>>> the user interface more complex.
> >> [Mooney, Sean K] I am personally split on this. To enable iommu
> >> support in openstack with the above interface we would have to do
> two
> >> things. 1 extend the image metatdata or flavor extra specs to allow
> >> requesting a vIOMMU. Second we would have to modify os-vif to
> produce
> >> the add-port command above. Using this interfaces gives us a nice
> >> parity in our api as we only enable iommu support in ovs if we
> enable
> >> for qemu. E.g. if the falvor/image does not request a virtualized
> >> iommu we wont declare its support in the options.
> >>>>>
> >>>>> As an alternative, how about having a global other_config (to be
> >>>>> set like vhost-socket-dir can be) for this instead of having to
> >>>>> set it for each individual interface. It would mean that it would
> >>>>> only have to be set once, instead of having this (ugly?!) option
> >>>>> every time a vhost port is added, so it's a less intrusive change
> >>>>> and I can't really think that a user would require to do this per
> >>>>> vhostclient
> >> [Mooney, Sean K]  well one taught that instantly comes to mind is if
> >> I set The global other_config option what happens if I do not
> request
> >> a iommu in qemu Or I have an old qemu.  If it would have any
> negative
> >> effect on network connectivity Or prevent the vm from functioning,
> >> that would require the nova scheduler to be Aware of which node had
> >> this option set and take that into account when placing vms.
> >> I assume if it had no negative effects  then we would not need a
> >> option, global or per Interface.
> >>>>> interface??? It's pain to have to add this at all for a bug in
> >>>>> QEMU and I'm sure in 1/2/3 years time someone will say that users
> >>>>> could still be using QEMU < 2.9.1 and we can't remove it, so it
> >>>>> would be nice to keep it as discreet as possible as we're going
> to
> >>>>> be stuck
> >>> with it for a while.
> >>>>>
> >>>>> I assume that a user would only use one version of QEMU at a time
> >>> and
> >>>>> would either want or not want this feature. In the worst case, if
> >>>>> that proved completely wrong in the future, then a per interface
> >>>>> override could easily be added. Once there's a way to maintain
> >>>>> backwards compatibility (which there would be) I'd rather err on
> >>>>> the side of introducing just enough enough functionality over
> >>>>> increasing complexity for the user.
> >>>>>
> >>>>> What do you think?
> >> [Mooney, Sean K] I'm not sure that a single qemu version is a valid
> >> assumption to make.
> 
> Hi Sean, the solution in the current patch is effectively to allow
> iommu/different QEMU versions on a *per port* basis. I am assuming that
> level of granularity is not needed and instead proposing to allow
> iommu/different QEMU versions on a *per OVS instance* basis. It's not
> making any assumption about QEMU versions across multiple nodes.
> 
> >> At least in an OpenStack context where rolling upgrades are a thing.
> >> But you are right That it will be less common however if it was no
> >> existent we would not have the issue with Live migration between
> >> nodes with different feature sets that is also trying to be
> addressed
> >> this Cycle. If we add a global config option for iommu support that
> >> is yet another thing that needs To be accounted for during live
> >> migration.
> 
> Yes that's true, but it's difficult problem anyway and I don't think an
> additional binary field would make it much more complex. The
> alternative is never allow the iommu feature to be enabled, or drop
> support for QEMU versions < 2.9.1. It's default off, so you should have
> backwards compatibility at least.
> 
> >>>>>
> >
> > Hi Kevin, Jan,
> >
> > Any thoughts on Sean's concerns?
> >
> 
> Sean can confirm, but my reading is that Sean's primary concern is with
> the proposal to add any config option to enable iommu - not with the
> proposal to change from a per port to per OVS basis.
[Mooney, Sean K] so I had a quick chat with mark about the background to this.
My current understanding is that form qemu 2.6-2.9 there is a bug that causes the
Guest to hang due to a lack of support for a required ack to correctly negotiate the
Use of the iommu feature. As a result a config option either global or per interfaces is
Required. As an aside OpenStack does not currently support requesting iommu virtualization so 
I don’t see why we could not put a requirement on our side that you use qemu 2.9.1 or newer
To avail of that feature. Going back to ovs then assuming we used a single global config vaule
In for example the other_config dictionary of the Open_vSwitch table in the ovsdb.
e.g. ovs-vsctl set Open_vswitch . other_config:vhost-iommu-support=true
I would be fine with that approach if we can confirm 1 thing first.

Assuming qemu > 3.9.1 + other_config:vhost-iommu-support=true it is possible
To boot a vm with and without "-device intel-iommu,intremap=on,eim=off,caching-mode=on"
Specified on the qemu commandline.

That is to say assuming all software versions have support for this feature,
if we globally enable vhost iommu support negotiation in ovs, and we spawn a
Vm with or with a virtualized iommu then we it will spawn correctly with network connectivity
In both cases. 

If we cannot assert the above to be true I would prefer to have the per interface option though
I tend to agree that since this is just working around a qemu but the global option is less intrusive
Simpler to removed long term. For example if ovs chooses in the future to drop support for
Qemu < 3.9.1 it can deprecate suprt for old qemu and the other_config:vhost-iommu-support option changing
Its default from false to true in release n. then in release n+1 I can drop support for old qemu versions
and remove the other_config:vhost-iommu-support option entirely. 


> 
> Kevin.
> 
> > I'll hold off on implementation until we have consensus.
> >
> > Thanks,
> > Mark
> >
> >
> >
> >
> >>>>> thanks,
> >>>>> Kevin.



More information about the dev mailing list