[ovs-dev] [RFC PATCH ovn] Introduce representor port plugging support

Han Zhou zhouhan at gmail.com
Tue Jul 6 18:11:24 UTC 2021


On Tue, Jul 6, 2021 at 1:19 AM Frode Nordahl <frode.nordahl at canonical.com>
wrote:
>
> On Mon, Jul 5, 2021 at 6:57 PM Numan Siddique <numans at ovn.org> wrote:
> >
> > On Mon, Jul 5, 2021 at 12:12 PM Frode Nordahl
> > <frode.nordahl at canonical.com> wrote:
> > >
> > > On Wed, Jun 30, 2021 at 12:32 AM Numan Siddique <numans at ovn.org>
wrote:
> > > >
> > > > On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl
> > > > <frode.nordahl at canonical.com> wrote:
> > > > >
> > > > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maximets at ovn.org>
wrote:
> > > > > >
> > > > > > On 6/10/21 8:36 AM, Han Zhou wrote:
> > > > > > >
> > > > > > >
> > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl <
frode.nordahl at canonical.com <mailto:frode.nordahl at canonical.com>> wrote:
> > > > > > >>
> > > > > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <
i.maximets at ovn.org <mailto:i.maximets at ovn.org>> wrote:
> > > > > > >> >
> > > > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote:
> > > > > > >> > > Introduce plugging module that adds and removes ports on
the
> > > > > > >> > > integration bridge, as directed by Port_Binding options.
> > > > > > >> > >
> > > > > > >> > > Traditionally it has been the CMSs responsibility to
create Virtual
> > > > > > >> > > Interfaces (VIFs) as part of instance (Container, Pod,
Virtual
> > > > > > >> > > Machine etc.) life cycle, and subsequently manage
plug/unplug
> > > > > > >> > > operations on the Open vSwitch integration bridge.
> > > > > > >> > >
> > > > > > >> > > With the advent of NICs connected to multiple distinct
CPUs we can
> > > > > > >> > > have a topology where the instance runs on one host and
Open
> > > > > > >> > > vSwitch and OVN runs on a different host, the smartnic
CPU.
> > > > > > >> > >
> > > > > > >> > > The act of plugging and unplugging the representor port
in Open
> > > > > > >> > > vSwitch running on the smartnic host CPU would be the
same for
> > > > > > >> > > every smartnic variant (thanks to the devlink-port[0][1]
> > > > > > >> > > infrastructure) and every CMS (Kubernetes, LXD,
OpenStack, etc.).
> > > > > > >> > > As such it is natural to extend OVN to provide this
common
> > > > > > >> > > functionality through its CMS facing API.
> > > > > > >> >
> > > > > > >> > Hi, Frode.  Thanks for putting this together, but it
doesn't look
> > > > > > >> > natural to me.  OVN, AFAIK, never touched physical devices
or
> > > > > > >> > interacted with the kernel directly.  This change
introduces completely
> > > > > > >> > new functionality inside OVN.  With the same effect we can
run a fully
> > > > > > >> > separate service on these smartnic CPUs that will do
plugging
> > > > > > >> > and configuration job for CMS.  You may even make it
independent
> > > > > > >> > from a particular CMS by creating a REST API for it or
whatever.
> > > > > > >> > This will additionally allow using same service for
non-OVN setups.
> > > > > > >>
> > > > > > >> Ilya,
> > > > > > >>
> > > > > > >> Thank you for taking the time to comment, much appreciated.
> > > > > > >>
> > > > > > >> Yes, this is new functionality, NICs with separate control
plane CPUs
> > > > > > >> and isolation from the host are also new, so this is one
proposal for
> > > > > > >> how we could go about to enable the use of them.
> > > > > > >>
> > > > > > >> The OVN controller does today get pretty close to the
physical realm
> > > > > > >> by maintaining patch ports in Open vSwitch based on bridge
mapping
> > > > > > >> configuration and presence of bridges to physical
interfaces. It also
> > > > > > >> does react to events of physical interfaces being plugged
into the
> > > > > > >> Open vSwitch instance it manages, albeit to date some other
entity has
> > > > > > >> been doing the act of adding the port into the bridge.
> > > > > > >>
> > > > > > >> The rationale for proposing to use the OVN database for
coordinating
> > > > > > >> this is that the information about which ports to bind, and
where to
> > > > > > >> bind them is already there. The timing of the information
flow from
> > > > > > >> the CMS is also suitable for the task.
> > > > > > >>
> > > > > > >> OVN relies on OVS library code, and all the necessary
libraries for
> > > > > > >> interfacing with the kernel through netlink and friends are
there or
> > > > > > >> would be easy to add. The rationale for using the
netlink-devlink
> > > > > > >> interface is that it provides a generic infrastructure for
these types
> > > > > > >> of NICs. So by using this interface we should be able to
support most
> > > > > > >> if not all of the variants of these cards.
> > > > > > >>
> > > > > > >>
> > > > > > >> Providing a separate OVN service to do the task could work,
but would
> > > > > > >> have the cost of an extra SB DB connection, IDL and monitors.
> > > > > >
> > > > > > IMHO, CMS should never connect to Southbound DB.  It's just
because the
> > > > > > Southbound DB is not meant to be a public interface, it just
happened
> > > > > > to be available for connections.  I know that OpenStack has
metadata
> > > > > > agents that connects to Sb DB, but if it's really required for
them, I
> > > > > > think, there should be a different way to get/set required
information
> > > > > > without connection to the Southbound.
> > > > >
> > > > > The CMS-facing API is the Northbound DB, I was not suggesting
direct
> > > > > use of the Southbound DB by external to OVN services. My
suggestion
> > > > > was to have a separate OVN process do this if your objection was
to
> > > > > handle it as part of the ovn-controller process.
> > > > >
> > > > > > >>
> > > > > > >> I fear it would be quite hard to build a whole separate
project with
> > > > > > >> its own API, feels like a lot of duplicated effort when the
flow of
> > > > > > >> data and APIs in OVN already align so well with CMSs
interested in
> > > > > > >> using this?
> > > > > > >>
> > > > > > >> > Interactions with physical devices also makes OVN
linux-dependent
> > > > > > >> > at least for this use case, IIUC.
> > > > > > >>
> > > > > > >> This specific bit would be linux-specific in the first
iteration, yes.
> > > > > > >> But the vendors manufacturing and distributing the hardware
do often
> > > > > > >> have drivers for other platforms, I am sure the necessary
> > > > > > >> infrastructure will become available there too over time, if
it is not
> > > > > > >> there already.
> > > > > > >>
> > > > > > >> We do currently have platform specific macros in the OVN
build system,
> > > > > > >> so we could enable the functionality when built on a
compatible
> > > > > > >> platform.
> > > > > > >>
> > > > > > >> > Maybe, others has different opinions.
> > > > > > >>
> > > > > > >> I appreciate your opinion, and enjoy discussing this topic.
> > > > > > >>
> > > > > > >> > Another though is that there is, obviously, a network
connection
> > > > > > >> > between the host and smartnic system.  Maybe it's possible
to just
> > > > > > >> > add an extra remote to the local ovsdb-server so CMS
daemon on the
> > > > > > >> > host system could just add interfaces over the network
connection?
> > > > > > >>
> > > > > > >> There are a few issues with such an approach. One of the
main goals
> > > > > > >> with providing and using a NIC with control plane CPUs is
having an
> > > > > > >> extra layer of security and isolation which is separate from
the
> > > > > > >> hypervisor host the card happens to share a PCI complex with
and draw
> > > > > > >> power from. Requiring a connection between the two for
operation would
> > > > > > >> defy this purpose.
> > > > > > >>
> > > > > > >> In addition to that, this class of cards provide visibility
into
> > > > > > >> kernel interfaces, enumeration of representor ports etc.
only from the
> > > > > > >> NIC control plane CPU side of the PCI complex, this
information is not
> > > > > > >> provided to the host. So if a hypervisor host CMS agent were
to do the
> > > > > > >> plugging through a remote ovsdb connection, it would have to
> > > > > > >> communicate with something else running on the NIC control
plane CPU
> > > > > > >> to retrieve the information it needs before it can know what
to relay
> > > > > > >> back over the ovsdb connection.
> > > > > > >>
> > > > > > >> --
> > > > > > >> Frode Nordahl
> > > > > > >>
> > > > > > >> > Best regards, Ilya Maximets.
> > > > > > >
> > > > > > > Here are my 2 cents.
> > > > > > >
> > > > > > > Initially I had similar concerns to Ilya, and it seems OVN
should stay away from the physical interface plugging. As a reference, here
is how ovn-kubernetes is doing it without adding anything to OVN:
https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing
<
https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing
>
> > > > > >
> > > > > > AFAICT, a big part of the work is already done on the ovn-k8s
side:
> > > > > >   https://github.com/ovn-org/ovn-kubernetes/pull/2005
> > > > > >   https://github.com/ovn-org/ovn-kubernetes/pull/2042
> > > > >
> > > > > I am aware of the on-going effort to implement support for this in
> > > > > ovn-kubernetes directly. What we have identified is that there are
> > > > > other CMSs that want this functionality, and with that we have an
> > > > > opportunity to generalise and provide an abstraction in a common
place
> > > > > that all the consuming CMSes can benefit from.
> > > > >
> > > > > > >
> > > > > > > However, thinking more about it, the proposed approach in
this patch just expands the way how OVN can bind ports, utilizing the
communication channel of OVN (OVSDB connections). If all the information
regarding port binding can be specified by the CMS from NB, then it is not
unnatural for ovn-controller to perform interface binding directly (instead
of passively accepting what is attached by CMS). This kind of information
already existed to some extent - the "requested_chassis" option in
OpenStack. Now it seems this idea is just expanding it to a specific
interface. The difference is that "requested_chassis" is used for
validation only, but now we want to directly apply it. So I think at least
I don't have a strong opinion on the idea.
> > > > > >
> > > > > > While it's, probably, OK for OVN to add port to the OVSDB, in
many
> > > > > > cases these ports will require a lot of extra configuration
which
> > > > > > is typically done by os-vif or CNI/device plugins.  Imagine
that OVS
> > > > > > is running with userspace datapath and you need to plug-in some
DPDK
> > > > > > ports, where you have to specify the port type, DPDK port
config,
> > > > > > porbably also number of rx/tx queues, number of descriptors in
these
> > > > > > queues.  You may also want to configure affinity for these
queues per
> > > > > > PMD thread in OVS.  For kernel interfaces it might be easier,
but they
> > > > > > also might require some extra configuration that OVN will have
to
> > > > > > think about now.  This is a typical job for CMS to configure
this
> > > > > > kind of stuff, and that is why projects like os-vif or large
variety
> > > > > > of CNI/device plugins exists.
> > > > >
> > > > > CNI is Kubernetes specific, os-vif is OpenStack specific. And
both of
> > > > > them get their information from the CMS. Providing support for
> > > > > userspace datapath and DPDK would require more information, some
of
> > > > > which is available through devlink, some fit well in key/value
> > > > > options. Our initial target would be to support the kernel
representor
> > > > > port workflow.
> > > > >
> > > > > > >
> > > > > > > There are some benefits:
> > > > > > > 1) The mechanism can be reused by different CMSes, which may
simplify CMS implementation.
> > > > > > > 2) Compared with the ovn-k8s approach, it reuses OVN's
communication channel, which avoids an extra CMS communication channel on
the smart NIC side. (of course this can be achieved by a connection between
the BM and smart NIC with *restricted* API just to convey the necessary
information)
> > > > > >
> > > > > > The problem I see is that at least ovn-k8s, AFAIU, will require
> > > > > > the daemon on the Smart NIC anyways to monitor and configure
> > > > > > OVN components, e.g. to configure ovn-remote for ovn-controller
> > > > > > or run management appctl commands if required.
> > > > > > So, I don't see the point why it can't do plugging if it's
already
> > > > > > there.
> > > > >
> > > > > This pattern is Kubernetes specific and is not the case for other
> > > > > CMSes. The current proposal for enabling Smart NIC with control
plane
> > > > > CPUs for ovn-kubernetes could be simplified if the networking
platform
> > > > > provided means for coordinating more of the network related bits.
> > > > >
> > > > > > >
> > > > > > > As to the negative side, it would increase OVN's complexity,
and as mentioned by Ilya potentially breaks OVN's platform independence. To
avoid this, I think the *plugging* module itself needs to be independent
and pluggable. It can be extended as independent plugins. The plugin would
need to define what information is needed in LSP's "options", and then
implement corresponding drivers. With this approach, even the regular VIFs
can be attached by ovn-controller if CMS can tell the interface name.
Anyway, this is just my brief thinking.
> > > > > >
> > > > > > Aside from plugging,
> > > > > > I also don't see the reason to have devlink code in OVN just
> > > > > > because it runs once on the init stage, IIUC.  So, I don't
> > > > > > understand why this information about the hardware cannot be
> > > > > > retrieved during the host provisioning and stored somewhere
> > > > > > (Nova config?).  Isn't hardware inventory a job for tripleo
> > > > > > or something?
> > > > >
> > > > > As noted in one of the TODOs in the commit message of the RFC one
of
> > > > > the work items to further develop this is to monitor devlink for
> > > > > changes, the end product will not do one-time initialization.
> > > > >
> > > > > While I agree with you that the current SR-IOV acceleration
workflow
> > > > > configuration is pretty static and can be done at deploy time,
this
> > > > > proposal prepares for the next generation subfunction workflow
where
> > > > > you will have a much higher density, and run-time configuration
and
> > > > > discovery of representor ports. There is a paper about it from
netdev
> > > > > conf[2], and all of this is part of the devlink infrastructure
(Linux
> > > > > kernel ABI).
> > > > >
> > > > > 2:
https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf
> > >
> > > > My 2 cents.
> > >
> > > Thank you for weighing in with your opinion, Numan.
> > >
> > > > I agree with Ilya.  It doesn't seem natural to me for OVN to create
> > > > OVS ports and also
> > > > to use devlink infrastructure.  I think an external entity can do
all these.
> > >
> > > No doubt an external entity could do this, the challenge is that due
> > > to how the SmartNIC with separate control plane CPUs are laid out
> > > existing CMS tooling cannot do it, and we are working out a way to
> > > efficiently and securely coordinate how and when this external entity
> > > does the plugging. At the end of the day the data being passed around
> > > is networking related, and I believe it is possible to create a
> > > solution that would be consumable by all OVN-enabled CMS's. We'd be
> > > happy to prove this out outside of the ovn-controller first if that is
> > > more palatable, see below.
> > >
> > > > The other downside I see with this RFC is that it will be now
> > > > monitoring all the port binding
> > > > rows without any condition because when a port binding is created
the
> > > > chassis column will
> > > > be NULL.  Correct me If I'm wrong here.
> > >
> > > The intention of the condition is to consider unbound ports for
> > > plugging, as soon as they have been plugged and bound there will be a
> > > value in the chassis column and at that point only the respective
> > > chassis will monitor them. I guess this could be a problem if there is
> > > a large number of unbound ports in a deployment, from your experience
> > > would that be the case? (When I think of it I do remember the load
> > > balancer vip perhaps being a consumer of unbound ports).
> > >
> > > We considered using a separate LSP type, but apart from the plugging
> > > bit these LSP's will behave just like regular VIFs from OVN Port
> > > Binding code POV, and if all your hypervisors have SmartNICs with
> > > control plane CPUs there would be just as many of those ports.
> >
> > In my opinion having a separate type would be better.  At the least it
> > will not impact
> > normal or existing deployments.
> >
> > Let's say we call a type as "representor"  (just picking a name for
> > now), then each ovn-controller
> > can add a conditional monitor to get updates for these types (just
> > like how we do now for patch or external ports).
> >
> > What do you think of the below approach
> >
> > 1.  A new LSP type - representor is added.  CMS when creating this
> > lport also sets the requested-chassis option.
>
> Works for me.
>
I think it is better not to add a new LSP type for this purpose. It would
be better to make this "interface plugging" feature generic, so that it can
work for other types as well. It could be supported in the future for a
regular VIF, or for a new LSP type, so I think this behavior is better not
to be tied to a LSP type.

We could add a new column in the Port_Binding table: plugged_by.
ovn-controller can monitor all port_bindings that have "plugged_by" equals
to current chassis. This way it should be backward compatible and not
affecting the use cases that doesn't need "plugged_by".

The details of plugging information can be stored in the options column
with new options defined, such as: "plug_type". For representor port the
"plug_type" can be defined as "representor". For each plug_type there can
be type-specific information defined and also stored in the options column.
(Or we can define a new column "plug_options" just for plugging
information).

> > 2. ovn-controller on the requested chassis sees this lport type,
> > creates an OVS port/interface and possibly sets some options or hints
> > in the external_ids column
> >  of ovs interface.  At this point, it will not do anything.
>
> This could potentially help us avoid the issue of adding more
> processes and IDLs talking to the SB DB.
>
> However, at this point in time, the ovn-controller does not know the
> details of the representor port, such as name/ifindex for the kernel
> representor port, and the interface it creates in OVS has to match the
> kernel representor port, and afict it is not possible to change these
> details after port creation. This information is provided by the
> devlink lookup in the current proposal.
>
> It might work if it was possible to create the port and interface
> separately, but that would cause a referential integrity violation.
> Another alternative could be that ovn-vif created a new interface
> record in 4), attaching that to the port record created by
> ovn-controller replacing a throw-away interface record, but we would
> be approaching hacky territory.
>
> Could we do some sort of local IPC perhaps through a Unix domain socket?

I think it would be good to have separate *modules* to handle interface
plugging (one module per plug_type), but there is no need to have a
separate *process* for it, which increases the complexity and the IPC cost.
The modules can be invoked as callbacks registered for each plug_type, and
responsible for explaining the options defined for its type, which is
directly retrieved from the port_binding record from SB DB. This way the
core ovn-controller components are kept untouched (except calling the
plugging modules at some point, but not worrying about the details),
without introducing a separate process. (A separate process is needed only
if there are performance concerns or blocking I/Os).

It is similar to how different dpif providers (or different dev types) are
supported in OVS.

>
> > 3.  An external entity (similar to your proposed pluggable.c) running
> > on the chassis, would have built an shash of  "devlink_ports" using
> > the devlink sockets.
> >
> > 4.  This entity will also be monitoring the ovs interfaces.  When it
> > sees a new interface with the proper hints, will set the
> > external_ids:iface-id=<lport_name>
> >
I think setting iface-id (if still necessary) shall be done together with
the port creation, by the plugging module (through a callback).

> > 5. ovn-controller when it sees the external_ids:iface-id,  will bind
> > the port and add flows.
> >
> >
> > With (2) we now add the support in ovn-controller to create OVS ports,
> >  but at the same time OVN will not bother about the internal details
> > of SMART nics
> > or bother about the devlink sockets.
> >
> > I'm not sure if the above approach seems hacky.  The external entity
> > could be called "ovn-vif".
> >
> > What do you think ?
>
> This is an interesting idea and it could work if we figure out a way
> for ovn-controller and ovn-vif to exchange information before the
> port/interface is created.
>

I'd really prefer not to introduce such extra IPC. Would the plug_type
mechanism be generic enough and avoid such external entities?

Thanks,
Han

> > >
> > > > Perhaps there can be a separate project called - ovn-vif which does
this ?
> > >
> > > Creating a separate ovn-org project called ovn-vif would allow us to
> > > move forward, we would of course have to document the optional CMS API
> > > in core OVN pointing to an optional external dependency for their use.
> > > Would this be an acceptable compromise?
> >
> > I personally do not have any objections to that.  Maybe others have
> > different opinions.
>
> Great.
>
> Cheers!
>
> --
> Frode Nordahl
>
>
> >
> > Thanks
> > Numan
> >
> > >
> > > --
> > > Frode Nordahl
> > >
> > > > Thanks
> > > > Numan
> > > >
> > > > >
> > > > >
> > > > > --
> > > > > Frode Nordahl
> > > > >
> > > > > > Best regards, Ilya Maximets.
> > > > > _______________________________________________
> > > > > dev mailing list
> > > > > dev at openvswitch.org
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > >
> > > _______________________________________________
> > > dev mailing list
> > > dev at openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
>
>
>
> --
> Frode Nordahl
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list