[ovs-dev] [PATCH RFC ovn] Add VXLAN support for non-VTEP datapath bindings

Wed Mar 25 17:03:23 UTC 2020

On Mon, Mar 23, 2020 at 7:47 PM Ben Pfaff <blp at ovn.org> wrote:
>
> On Mon, Mar 23, 2020 at 06:39:14PM -0400, Ihar Hrachyshka wrote:
> > First, some questions as to implementation (or feasibility) of several
> > todo items in my list for the patch.
> >
> > 1) I initially thought that, because VXLAN would have limited space
> > for both networks and ports in its VNI, the encap type would not be
> > able to support as many of both as Geneve / STT, and so we would need
> > to enforce the limit programmatically somehow. But in OVN context, is
> > it even doable? North DB resources may be created before any chassis
> > are registered; once a chassis that is VXLAN only joins, it's too late
> > to forbid the spilling resources from existence (though it may be a
> > good time to detect this condition and perhaps fail to register the
> > chassis / configure flow tables). How do we want to handle this case?
> > Do we fail to start VXLAN configured ovn-controller when too many
> > networks / ports per network created? Do we forbid creating too many
> > resources when a chassis is registered that is VXLAN only? Both? Or do
> > we leave it up to the deployment / CMS to control the chassis / north
> > DB configuration?
> >
> > 2) Similar to the issue above, I originally planned to forbid using
> > ACLs relying on ingress port when a VXLAN chassis is involved (because
> > the VNI won't carry the information). I believe the approach should be
> > similar to how we choose to handle the issue with the maximum number
> > of resources, described above.
> >
> > I am new to OVN so maybe there are existing examples for such
> > situations already that I could get inspiration from. Let me know what
> > you think.
>
> I don't have good solutions for the above resource limit problems.  We
> designed OVN so that this kind of resource limit wouldn't be a problem
> in practice, so we didn't think through what would happen if the limits
> suddenly became more stringent.
>
> I think that it falls upon the CMS by default.
>

For ACLs, I think it's fair to put the burden on the CMS (if only
because it should be easy for them to follow a simple rule: "Don't use
ingress matching ACLs in your OVN driver.")

A guard against exceeding resource limits in the CMS may be helpful
(for example, it gives the CMS user immediate feedback on failure,
as opposed to an asynchronous notification that converting a CMS
resource to OVSDB primitives failed), but I believe OVN should handle
the case too. The risk of not doing so is that the limits are reached
and we start to send traffic that belongs to one network to another,
because the lower 12 bits of their datapath IDs are the same.
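
To make the failure mode concrete, here is a minimal sketch of the
collision. It assumes only what is said above, namely that 12 bits of
the datapath ID make it into the VNI; the names are hypothetical and
this is not code from the patch.

```python
# Assumption: only the lower 12 bits of the datapath ID survive in the
# VXLAN VNI. Names below are illustrative, not taken from OVN.
VXLAN_DATAPATH_BITS = 12
VXLAN_DATAPATH_MASK = (1 << VXLAN_DATAPATH_BITS) - 1


def vni_datapath_bits(datapath_id: int) -> int:
    """Return the part of the datapath ID that fits in the VNI."""
    return datapath_id & VXLAN_DATAPATH_MASK


def would_collide(dp_a: int, dp_b: int) -> bool:
    """True if two distinct datapaths look identical on the wire."""
    return dp_a != dp_b and vni_datapath_bits(dp_a) == vni_datapath_bits(dp_b)
```

With these definitions, datapath IDs 1 and 4097 are different networks
but carry the same 12 bits, so a receiving chassis could not tell
their traffic apart.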

While the CMS could guard against that, it may be less aware of
chassis configuration than OVN. A dumb way to resolve this in the CMS
would be a global configuration option, set by the deployment tool
that configures OVN, that says whether any VXLAN capable chassis are
deployed in the cluster. A more proper way to solve it would be to
make the CMS aware of chassis configuration by maintaining a cache of
Chassis table records and checking their encap types on each network /
port created.
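
A rough sketch of what such a CMS-side cache could look like. The
class and method names are hypothetical, and how the cache is fed
(for example, from Chassis table update notifications) is left out;
the only thing assumed about the data is that each chassis has a set
of encap type strings such as "geneve", "stt" or "vxlan".

```python
from typing import Dict, Iterable, Set


class ChassisEncapCache:
    """Hypothetical CMS-side cache: chassis name -> encap types."""

    def __init__(self) -> None:
        self._encaps: Dict[str, Set[str]] = {}

    def upsert(self, chassis: str, encap_types: Iterable[str]) -> None:
        # Called when a Chassis record is created or updated.
        self._encaps[chassis] = set(encap_types)

    def remove(self, chassis: str) -> None:
        # Called when a Chassis record is deleted.
        self._encaps.pop(chassis, None)

    def any_vxlan(self) -> bool:
        # True if at least one known chassis has VXLAN enabled.
        return any("vxlan" in encaps for encaps in self._encaps.values())
```

The CMS would consult any_vxlan() before creating a network or port
and apply the tighter limits only when it returns True.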

The same could be done by OVN itself, and arguably OVN is the owner of
the data source (encap records) and is in a better position to control
it:

1. on network creation, if VXLAN is enabled on any chassis, count
networks; if result >= limit, fail; same for ports per network;
2. on ovn-controller start, if VXLAN is enabled for the chassis,
calculate networks / ports per network; if result >= limit, fail to
start the service.
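
The two checks above could be sketched roughly as follows. This is
only an illustration of the logic, not the patch: the function names,
the Limits type and the exception are hypothetical, the limit values
are left as parameters, and the comparisons simply mirror the
">= limit" wording above.

```python
from dataclasses import dataclass
from typing import Dict


class VxlanLimitError(Exception):
    """Raised when a VXLAN resource limit would be violated."""


@dataclass(frozen=True)
class Limits:
    # Both values are deployment assumptions, not taken from OVN.
    max_networks: int
    max_ports_per_network: int


def check_network_create(vxlan_enabled: bool, n_networks: int,
                         limits: Limits) -> None:
    """Check (1): refuse a new network if VXLAN is enabled anywhere."""
    if vxlan_enabled and n_networks >= limits.max_networks:
        raise VxlanLimitError("too many networks for VXLAN")


def check_port_create(vxlan_enabled: bool, n_ports_in_network: int,
                      limits: Limits) -> None:
    """Check (1), ports variant: same rule applied per network."""
    if vxlan_enabled and n_ports_in_network >= limits.max_ports_per_network:
        raise VxlanLimitError("too many ports in network for VXLAN")


def check_controller_start(vxlan_enabled: bool,
                           ports_per_network: Dict[str, int],
                           limits: Limits) -> None:
    """Check (2): refuse to start on a VXLAN chassis if over limits."""
    if not vxlan_enabled:
        return
    if len(ports_per_network) >= limits.max_networks:
        raise VxlanLimitError("too many networks for VXLAN")
    for network, n_ports in ports_per_network.items():
        if n_ports >= limits.max_ports_per_network:
            raise VxlanLimitError(f"too many ports in {network} for VXLAN")
```

For check (1), vxlan_enabled means "any chassis has VXLAN enabled";
for check (2) it refers to the starting chassis only.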

Note that in the most common scenario, all chassis have the same
encapsulation types registered; there are multiple ovn-controller
nodes; and resources are created after all chassis are registered in
the database. So point (2) above handles a corner case that probably
won't ever happen in real life, while (1) is a hot path.

Any specific objections to having guards of this kind in OVN itself?
They could be in addition to CMS side guards (which avoid even trying
to create CMS resources that are known to fail to sync to OVN).

(A similar approach could be extended to the set of allowed ACLs,
though it's not as pressing because no known CMS relies on
unsupported ACLs.)

Cheers,
Ihar

> > > > Assuming we pick a term to use to describe these out-of-cluster
> > > > switches, we should consider the impact of the rename. Renaming
> > > > internal symbols / functions is trivial. But "vtep" is used in OVN
> > > > schema (for example, for port binding 'type' attribute). Do we want to
> > > > rename those too? If so, what considerations should we apply when
> > > > doing it? Any guidance as to maintaining backwards compatibility?
> > > >
> > > > Also, is such a rename something that should happen at the same moment
> > > > when we add support for VXLAN for in-cluster communication? Or should
> > > > it be a separate work item? (If so, do we expect it to land before or
> > > > after the core VXLAN implementation lands?)
> > >
> > > We can't (or at any rate should not) change the terms in the schema, but
> > > we can change other places and point out to people in a few places that
> > > a "ramp switch" is sometimes, confusingly, called a "vtep".
> >
> > Gotcha. Any preferences as to whether to consider it a preparatory
> > work item; a follow-up; or a part of the VXLAN implementation? (I lean
> > towards handling the ramp term introduction as an independent
> > preparatory step.)
>
> I sent out a patch for people to look at.
>