[ovs-dev] [RFC PATCH ovn] Introduce representor port plugging support
frode.nordahl at canonical.com
Thu Jun 10 14:13:01 UTC 2021
On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maximets at ovn.org> wrote:
> On 6/10/21 8:36 AM, Han Zhou wrote:
> > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl <frode.nordahl at canonical.com> wrote:
> >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <i.maximets at ovn.org> wrote:
> >> >
> >> > On 5/9/21 4:03 PM, Frode Nordahl wrote:
> >> > > Introduce plugging module that adds and removes ports on the
> >> > > integration bridge, as directed by Port_Binding options.
> >> > >
> >> > > Traditionally it has been the CMS's responsibility to create Virtual
> >> > > Interfaces (VIFs) as part of instance (Container, Pod, Virtual
> >> > > Machine etc.) life cycle, and subsequently manage plug/unplug
> >> > > operations on the Open vSwitch integration bridge.
> >> > >
> >> > > With the advent of NICs connected to multiple distinct CPUs we can
> >> > > have a topology where the instance runs on one host while Open
> >> > > vSwitch and OVN run on a different host, the smartnic CPU.
> >> > >
> >> > > The act of plugging and unplugging the representor port in Open
> >> > > vSwitch running on the smartnic host CPU would be the same for
> >> > > every smartnic variant (thanks to the devlink-port
> >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.).
> >> > > As such it is natural to extend OVN to provide this common
> >> > > functionality through its CMS facing API.
> >> >
> >> > Hi, Frode. Thanks for putting this together, but it doesn't look
> >> > natural to me. OVN, AFAIK, never touched physical devices or
> >> > interacted with the kernel directly. This change introduces completely
> >> > new functionality inside OVN. With the same effect we can run a fully
> >> > separate service on these smartnic CPUs that will do the plugging
> >> > and configuration job for the CMS. You may even make it independent
> >> > from a particular CMS by creating a REST API for it or whatever.
> >> > This will additionally allow using the same service for non-OVN setups.
> >> Ilya,
> >> Thank you for taking the time to comment, much appreciated.
> >> Yes, this is new functionality, NICs with separate control plane CPUs
> >> and isolation from the host are also new, so this is one proposal for
> >> how we could go about to enable the use of them.
> >> The OVN controller does today get pretty close to the physical realm
> >> by maintaining patch ports in Open vSwitch based on the bridge mapping
> >> configuration and the presence of bridges to physical interfaces. It
> >> also reacts to events of physical interfaces being plugged into the
> >> Open vSwitch instance it manages, albeit to date some other entity has
> >> done the actual work of adding the port to the bridge.
> >> The rationale for proposing to use the OVN database for coordinating
> >> this is that the information about which ports to bind, and where to
> >> bind them is already there. The timing of the information flow from
> >> the CMS is also suitable for the task.
> >> OVN relies on OVS library code, and all the necessary libraries for
> >> interfacing with the kernel through netlink and friends are there or
> >> would be easy to add. The rationale for using the netlink-devlink
> >> interface is that it provides a generic infrastructure for these types
> >> of NICs. So by using this interface we should be able to support most
> >> if not all of the variants of these cards.
> >> Providing a separate OVN service to do the task could work, but would
> >> have the cost of an extra SB DB connection, IDL and monitors.
> IMHO, the CMS should never connect to the Southbound DB. The
> Southbound DB is not meant to be a public interface; it just happened
> to be available for connections. I know that OpenStack has metadata
> agents that connect to the Sb DB, but if it's really required for them, I
> think, there should be a different way to get/set required information
> without connection to the Southbound.
The CMS-facing API is the Northbound DB; I was not suggesting direct
use of the Southbound DB by services external to OVN. My suggestion
was to have a separate OVN process do this if your objection was to
handling it as part of the ovn-controller process.
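For context on the patch port maintenance mentioned further up in the thread: ovn-controller reads the external_ids:ovn-bridge-mappings value, a comma-separated list of network:bridge pairs. A rough Python sketch of that parsing (illustrative only; the actual implementation is C code inside ovn-controller):

```python
def parse_bridge_mappings(value):
    """Parse an ovn-bridge-mappings string such as
    "physnet1:br-ex,physnet2:br-data" into a dict mapping
    physical network names to local bridge names."""
    mappings = {}
    for pair in value.split(","):
        network, sep, bridge = pair.partition(":")
        # Skip empty or malformed entries rather than failing outright.
        if sep and network and bridge:
            mappings[network] = bridge
    return mappings
```

ovn-controller then maintains a pair of patch ports between the integration bridge and each mapped bridge for the localnet ports that reference that physical network.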
> >> I fear it would be quite hard to build a whole separate project with
> >> its own API; it feels like a lot of duplicated effort when the flow of
> >> data and APIs in OVN already aligns so well with the CMSes interested
> >> in using this.
> >> > Interactions with physical devices also make OVN Linux-dependent
> >> > at least for this use case, IIUC.
> >> This specific bit would be Linux-specific in the first iteration, yes.
> >> But the vendors manufacturing and distributing the hardware do often
> >> have drivers for other platforms, I am sure the necessary
> >> infrastructure will become available there too over time, if it is not
> >> there already.
> >> We do currently have platform specific macros in the OVN build system,
> >> so we could enable the functionality when built on a compatible
> >> platform.
> >> > Maybe others have different opinions.
> >> I appreciate your opinion, and enjoy discussing this topic.
> >> > Another thought is that there is, obviously, a network connection
> >> > between the host and smartnic system. Maybe it's possible to just
> >> > add an extra remote to the local ovsdb-server so CMS daemon on the
> >> > host system could just add interfaces over the network connection?
> >> There are a few issues with such an approach. One of the main goals
> >> of providing and using a NIC with control plane CPUs is having an
> >> extra layer of security and isolation which is separate from the
> >> hypervisor host the card happens to share a PCI complex with and draw
> >> power from. Requiring a connection between the two for operation would
> >> defeat this purpose.
> >> In addition to that, this class of cards provides visibility into
> >> kernel interfaces, enumeration of representor ports etc. only from the
> >> NIC control plane CPU side of the PCI complex; this information is not
> >> provided to the host. So if a hypervisor host CMS agent were to do the
> >> plugging through a remote ovsdb connection, it would have to
> >> communicate with something else running on the NIC control plane CPU
> >> to retrieve the information it needs before it can know what to relay
> >> back over the ovsdb connection.
> >> --
> >> Frode Nordahl
> >> > Best regards, Ilya Maximets.
> > Here are my 2 cents.
> > Initially I had similar concerns to Ilya, and it seems OVN should stay away from the physical interface plugging. As a reference, here is how ovn-kubernetes is doing it without adding anything to OVN: https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing
> AFAICT, a big part of the work is already done on the ovn-k8s side:
I am aware of the ongoing effort to implement support for this in
ovn-kubernetes directly. What we have identified is that there are
other CMSes that want this functionality, and with that we have an
opportunity to generalise and provide an abstraction in a common place
that all the consuming CMSes can benefit from.
> > However, thinking more about it, the proposed approach in this patch just expands how OVN can bind ports, utilizing OVN's existing communication channel (the OVSDB connections). If all the information regarding port binding can be specified by the CMS from the NB, then it is not unnatural for ovn-controller to perform interface binding directly (instead of passively accepting what is attached by the CMS). This kind of information already existed to some extent - the "requested_chassis" option in OpenStack. Now this idea is just expanding it to a specific interface. The difference is that "requested_chassis" is used for validation only, whereas now we want to directly apply it. So I think, at least, I don't have a strong opinion on the idea.
> While it's, probably, OK for OVN to add port to the OVSDB, in many
> cases these ports will require a lot of extra configuration which
> is typically done by os-vif or CNI/device plugins. Imagine that OVS
> is running with userspace datapath and you need to plug-in some DPDK
> ports, where you have to specify the port type, DPDK port config,
> probably also number of rx/tx queues, number of descriptors in these
> queues. You may also want to configure affinity for these queues per
> PMD thread in OVS. For kernel interfaces it might be easier, but they
> also might require some extra configuration that OVN will have to
> think about now. This is a typical job for CMS to configure this
> kind of stuff, and that is why projects like os-vif or large variety
> of CNI/device plugins exists.
CNI is Kubernetes-specific, os-vif is OpenStack-specific, and both of
them get their information from the CMS. Providing support for the
userspace datapath and DPDK would require more information, some of
which is available through devlink and some of which fits well in
key/value options. Our initial target would be to support the kernel
representor ports.
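To make the intended data flow concrete, here is a minimal sketch of the reconciliation such a plugging module could perform. The option names ("plug-type", "plug:representor") are illustrative stand-ins, not the final interface:

```python
def reconcile_representor_ports(port_bindings, managed_ports):
    """Compute which representor ports to plug into, and unplug from,
    the integration bridge.

    port_bindings: logical port name -> options dict, as relayed from
                   the CMS via the Southbound Port_Binding table.
                   (Option names here are hypothetical.)
    managed_ports: set of interface names this module has previously
                   plugged, so it never touches ports owned by others.
    """
    desired = {
        opts["plug:representor"]
        for opts in port_bindings.values()
        if opts.get("plug-type") == "representor"
    }
    to_plug = desired - managed_ports
    to_unplug = managed_ports - desired
    return to_plug, to_unplug
```

Restricting removals to previously managed ports keeps the module from interfering with interfaces attached by the CMS in the traditional way.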
> > There are some benefits:
> > 1) The mechanism can be reused by different CMSes, which may simplify CMS implementation.
> > 2) Compared with the ovn-k8s approach, it reuses OVN's communication channel, which avoids an extra CMS communication channel on the smart NIC side. (of course this can be achieved by a connection between the BM and smart NIC with *restricted* API just to convey the necessary information)
> The problem I see is that at least ovn-k8s, AFAIU, will require
> the daemon on the Smart NIC anyway to monitor and configure
> OVN components, e.g. to configure ovn-remote for ovn-controller
> or run management appctl commands if required.
> So, I don't see the point why it can't do plugging if it's already
> there.
This pattern is Kubernetes-specific and is not the case for other
CMSes. The current proposal for enabling Smart NICs with control plane
CPUs for ovn-kubernetes could be simplified if the networking platform
provided means for coordinating more of the network-related bits.
> > As to the negative side, it would increase OVN's complexity, and as mentioned by Ilya potentially breaks OVN's platform independence. To avoid this, I think the *plugging* module itself needs to be independent and pluggable. It can be extended as independent plugins. The plugin would need to define what information is needed in LSP's "options", and then implement corresponding drivers. With this approach, even the regular VIFs can be attached by ovn-controller if CMS can tell the interface name. Anyway, this is just my brief thinking.
> Aside from plugging,
> I also don't see the reason to have devlink code in OVN just
> because it runs once on the init stage, IIUC. So, I don't
> understand why this information about the hardware cannot be
> retrieved during the host provisioning and stored somewhere
> (Nova config?). Isn't hardware inventory a job for tripleo
> or something?
As noted in one of the TODOs in the commit message of the RFC, one of
the work items to further develop this is to monitor devlink for
changes; the end product will not do one-time initialization.
While I agree with you that the current SR-IOV acceleration workflow
configuration is pretty static and can be done at deploy time, this
proposal prepares for the next-generation subfunction workflow, where
you will have a much higher density, and run-time configuration and
discovery of representor ports. There is a paper about it from netdev
conf, and all of this is part of the devlink infrastructure (Linux
kernel).
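To illustrate the kind of lookup devlink enables: given port attributes of the shape reported by `devlink -j port show`, finding the representor netdev for a given PF/VF pair is a simple match on flavour and numbers. The sample data below is made up, and the RFC itself talks to devlink over netlink from C rather than consuming JSON; this is purely a sketch of the idea:

```python
import json

# Made-up sample in the shape of `devlink -j port show` output.
SAMPLE = json.loads("""
{
  "port": {
    "pci/0000:03:00.0/0": {"type": "eth", "netdev": "p0",
                           "flavour": "physical"},
    "pci/0000:03:00.0/1": {"type": "eth", "netdev": "pf0vf0",
                           "flavour": "pcivf", "pfnum": 0, "vfnum": 0},
    "pci/0000:03:00.0/2": {"type": "eth", "netdev": "pf0vf1",
                           "flavour": "pcivf", "pfnum": 0, "vfnum": 1}
  }
}
""")

def find_vf_representor(ports, pfnum, vfnum):
    """Return the netdev name of the VF representor matching
    (pfnum, vfnum), or None if no such port is reported."""
    for attrs in ports["port"].values():
        if (attrs.get("flavour") == "pcivf"
                and attrs.get("pfnum") == pfnum
                and attrs.get("vfnum") == vfnum):
            return attrs.get("netdev")
    return None
```

Because this enumeration is only visible from the NIC control plane CPU side of the PCI complex, it is also the natural place to run the component doing the lookup.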
> Best regards, Ilya Maximets.