[ovs-discuss] OVN: availability zones concept

Ben Pfaff blp at ovn.org
Thu Mar 7 18:54:03 UTC 2019

On Wed, Mar 06, 2019 at 10:32:29PM -0800, Han Zhou wrote:
> On Wed, Mar 6, 2019 at 9:06 AM Ben Pfaff <blp at ovn.org> wrote:
> >
> > On Tue, Mar 05, 2019 at 09:39:37PM -0800, Han Zhou wrote:
> > > On Tue, Mar 5, 2019 at 7:24 PM Ben Pfaff <blp at ovn.org> wrote:
> > > > What's the effective difference between an OVN deployment with 3 zones,
> > > > and a collection of 3 OVN deployments?  Is it simply that the 3-zone
> > > > deployment shares databases?  Is that a significant advantage?
> > >
> > > Hi Ben, based on the discussions there are two cases:
> > >
> > > For completely separate zones (no overlap) vs. separate OVN
> > > deployments, the difference is that separate OVN deployments require
> > > some sort of federation at a higher layer, so that a single CMS can
> > > operate multiple OVN deployments. Of course, separate zones in the same
> > > OVN deployment still require changes in the CMS to operate, but the
> > > change may be smaller in some cases.
> > >
> > > For overlapping zones vs. separate OVN deployments, the difference is
> > > more obvious. Separate OVN deployments don't allow overlapping.
> > > Overlapping zones allow sharing gateways between different groups of
> > > hypervisors.
> >
> > OK.  The difference is obvious in the case where there is overlap.
> >
> > > If the purpose is only reducing tunnel mesh size, I think it may be
> > > better to avoid the zone concept but instead create tunnels (and bfd
> > > sessions) on-demand, as discussed here:
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2019-March/048281.html
> >
> > Except in cases where we have BFD sessions, it is possible to entirely
> > avoid having explicitly defined tunnels, since the tunnels can be
> > defined in the flow table.  The ovs-fields(7) manpage describes these
> > under "flow-based tunnels" in the TUNNEL FIELDS section.  Naively, doing
> > it this way would require, on each hypervisor, a few OpenFlow flows per
> > remote chassis, as opposed to one port per remote chassis.  That
> > probably scales better.  If necessary, it could be made to scale better
> > than that by using send-to-controller actions to add flows for tunnels
> > as packets arrive for them or as packets need to go through them.
>
> Thanks, Ben, for the pointer. I have to admit I was not aware of these
> different ways of using tunnels. The documentation is very clear, and
> now I understand that what OVN currently uses is one of the "intermediate
> models", i.e. partially flow-based: remote IPs are port-based while keys
> are flow-based.

Thanks for the documentation fixes!

> While a purely flow-based tunnel is attractive in terms of flexibility,
> it does not seem to fit the OVN use case very well because we do need BFD
> sessions.

I think that OVN only uses BFD for a few of its ports--only for gateways
with HA, right?  Those could continue to have ports.
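
As a rough illustration of that split, here is a minimal Python sketch that
plans dedicated tunnel ports only for HA gateway chassis (where BFD is
needed) and flow-based egress flows for everything else.  The chassis names,
IP addresses, the use of reg0 as a destination selector, and the OpenFlow
port number are all hypothetical, and the generated strings only approximate
ovs-vsctl/ovs-ofctl syntax; this is not how ovn-controller is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    name: str
    ip: str
    ha_gateway: bool = False  # True if BFD is needed toward this chassis


def plan_tunnels(remotes, flow_ofport=1):
    """Split remotes into per-chassis ports (BFD) and flow-based egress flows.

    flow_ofport is the (hypothetical) OpenFlow port number of one shared
    tunnel port created with options:remote_ip=flow options:key=flow.
    """
    ports = []  # ovs-vsctl style arguments for dedicated tunnel ports
    flows = []  # ovs-ofctl style flows that pick the remote IP per packet
    for index, chassis in enumerate(remotes, start=1):
        if chassis.ha_gateway:
            # BFD needs a real port, so keep one port for this chassis.
            ports.append(
                f"add-port br-int ovn-{chassis.name} -- "
                f"set interface ovn-{chassis.name} type=geneve "
                f"options:remote_ip={chassis.ip} options:key=flow "
                f"bfd:enable=true"
            )
        else:
            # No port: the flow sets tun_dst and uses the shared port.
            # Matching on reg0 is an assumption made for illustration.
            flows.append(
                f"reg0={index},"
                f"actions=set_field:{chassis.ip}->tun_dst,"
                f"output:{flow_ofport}"
            )
    return ports, flows


remotes = [
    Chassis("hv1", "10.0.0.1"),
    Chassis("hv2", "10.0.0.2"),
    Chassis("gw1", "10.0.0.9", ha_gateway=True),
]
ports, flows = plan_tunnels(remotes)
```

With the three example chassis above, only gw1 keeps a dedicated port; hv1
and hv2 are reached through flows on the shared flow-based port.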

> For the "send-to-controller" approach, i.e. reactively setting up flows
> when packets arrive, I hope it is not really needed for solving the tunnel
> scaling problem, since it introduces data plane latency, which could be
> a bigger problem. (But I am not sure if reactive mode in general is a
> good idea - it might be a reasonable trade-off for solving the scale
> problem of each HV pre-installing flows for all related datapaths in a
> full-mesh-like scenario. Anyway, this is not directly related to the
> current topic.)

It would introduce data plane latency for the first packet to go to or
from a particular hypervisor.  After that there would be no further
additional latency.  It would probably not be noticeable.
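
The first-packet behavior can be sketched with a toy model in Python.  The
class and names below are hypothetical and stand in for the flow table and
the controller; the point is only that the slow path is taken once per
remote hypervisor, not once per packet.

```python
class ReactiveTunnelTable:
    """Toy model: tunnel flows are installed on the first packet per remote."""

    def __init__(self):
        self.installed = set()     # remotes that already have tunnel flows
        self.controller_trips = 0  # number of send-to-controller round trips

    def send(self, remote):
        """Return True if this packet took the slow path via the controller."""
        if remote in self.installed:
            return False           # fast path: flows are already present
        self.controller_trips += 1
        self.installed.add(remote)  # controller adds flows for this remote
        return True


table = ReactiveTunnelTable()
slow = [table.send(r) for r in ["hv2", "hv2", "hv3", "hv2", "hv3"]]
```

Here only the first packet to hv2 and the first packet to hv3 go through
the controller; the other three packets use the installed flows.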

Let me be clear that I am not pushing this solution.  It will complicate
things, and I do not like unnecessary complication.  I am just pointing
out that it is possible.

> So I would propose to keep the current partially flow-based tunnel
> usage in OVN and optimize the tunnel setup so that tunnels exist only
> between peers that are logically connected, if this satisfies the
> scaling goal of OVN users. Even with this optimization, we may need to
> make it a configurable option, since in small-scale use cases users may
> in practice prefer the original behavior to avoid the latency of tunnel
> setup.
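
One way to picture "only between peers that are logically connected" is the
following Python sketch.  It assumes, as a simplification, that two chassis
are logically connected when they host ports on the same logical datapath;
real OVN connectivity also involves logical routers, and the binding data
and names here are made up for illustration.

```python
from collections import defaultdict


def tunnel_peers(bindings):
    """Map each chassis to the peers it would need a tunnel to.

    bindings: iterable of (chassis, logical_datapath) pairs.
    Two chassis are treated as peers if they share at least one datapath.
    """
    by_datapath = defaultdict(set)
    for chassis, datapath in bindings:
        by_datapath[datapath].add(chassis)

    peers = defaultdict(set)
    for members in by_datapath.values():
        for chassis in members:
            peers[chassis] |= members - {chassis}
    return dict(peers)


bindings = [
    ("hv1", "ls-a"), ("hv2", "ls-a"),
    ("hv1", "ls-b"), ("hv3", "ls-b"),
    ("hv4", "ls-c"),
]
peers = tunnel_peers(bindings)
```

In this example hv1 needs tunnels to hv2 and hv3, hv2 and hv3 each need
only hv1, and hv4 needs none, instead of every chassis tunneling to all
three others in a full mesh.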

