[ovs-discuss] OVN: availability zones concept
blp at ovn.org
Thu Mar 7 18:54:03 UTC 2019
On Wed, Mar 06, 2019 at 10:32:29PM -0800, Han Zhou wrote:
> On Wed, Mar 6, 2019 at 9:06 AM Ben Pfaff <blp at ovn.org> wrote:
> > On Tue, Mar 05, 2019 at 09:39:37PM -0800, Han Zhou wrote:
> > > On Tue, Mar 5, 2019 at 7:24 PM Ben Pfaff <blp at ovn.org> wrote:
> > > > What's the effective difference between an OVN deployment with 3 zones,
> > > > and a collection of 3 OVN deployments? Is it simply that the 3-zone
> > > > deployment shares databases? Is that a significant advantage?
> > >
> > > Hi Ben, based on the discussions there are two cases:
> > >
> > > For completely separated zones (no overlapping) vs. separate OVN
> > > deployments, the difference is that separate OVN deployments require
> > > some sort of federation at a higher layer, so that a single CMS can
> > > operate multiple OVN deployments. Of course, separate zones in the same
> > > OVN still require changes in the CMS to operate, but the change may be
> > > smaller in some cases.
> > >
> > > For overlapping zones vs. separate OVN deployments, the difference is
> > > more obvious. Separate OVN deployments don't allow overlapping.
> > > Overlapping zones allow sharing gateways between different groups of
> > > hypervisors.
> > OK. The difference is obvious in the case where there is overlap.
> > > If the purpose is only reducing tunnel mesh size, I think it may be
> > > better to avoid the zone concept but instead create tunnels (and bfd
> > > sessions) on-demand, as discussed here:
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2019-March/048281.html
> > Except in cases where we have BFD sessions, it is possible to entirely
> > avoid having explicitly defined tunnels, since the tunnels can be
> > defined in the flow table. The ovs-fields(7) manpage describes these
> > under "flow-based tunnels" in the TUNNEL FIELDS section. Naively, doing
> > it this way would require, on each hypervisor, a few OpenFlow flows per
> > remote chassis, as opposed to one port per remote chassis. That
> > probably scales better. If necessary, it could be made to scale better
> > than that by using send-to-controller actions to add flows for tunnels
> > as packets arrive for them or as packets need to go through them.
> Thanks Ben for the pointer. I have to admit I was not aware of these
> different ways of using tunnels. The documentation is very clear, and
> now I understand that what OVN currently uses is one of the
> "intermediate models", i.e. partially flow-based: remote IPs are
> port-based while keys are flow-based.
Thanks for the documentation fixes!
> While a purely flow-based tunnel is attractive in terms of flexibility,
> it does not seem to fit the OVN use case very well because we do need BFD
I think that OVN only uses BFD for a few of its ports--only for gateways
with HA, right? Those could continue to have ports.
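
As a rough illustration of that split, here is a minimal Python sketch that
prints the configuration one hypervisor might end up with: a single
flow-based tunnel port shared by all ordinary remote chassis, plus a
dedicated port with BFD for each HA gateway. The chassis names, IP
addresses, port names, OpenFlow port number, and the use of reg0 as a
"destination chassis" index are all assumptions made up for this example;
they are not what ovn-controller actually programs. It also shows only the
transmit direction, so a real setup would likely need a few more flows per
remote chassis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    name: str
    ip: str
    needs_bfd: bool = False  # hypothetical flag: True for an HA gateway


def tunnel_config(remotes, bridge="br-int", flow_port="flow-tun",
                  flow_ofport=100):
    """Return (ovs-vsctl commands, OpenFlow flows) for one hypervisor."""
    # One shared port; the remote IP and key are chosen per packet by flows.
    vsctl = [
        f"ovs-vsctl add-port {bridge} {flow_port} -- "
        f"set interface {flow_port} type=geneve "
        f"options:remote_ip=flow options:key=flow "
        f"ofport_request={flow_ofport}"
    ]
    flows = []
    for index, chassis in enumerate(remotes, start=1):
        if chassis.needs_bfd:
            # BFD runs per interface, so gateways keep an explicit port.
            port = f"tun-{chassis.name}"
            vsctl.append(
                f"ovs-vsctl add-port {bridge} {port} -- "
                f"set interface {port} type=geneve "
                f"options:remote_ip={chassis.ip} options:key=flow "
                f"bfd:enable=true"
            )
        else:
            # reg0 as a destination-chassis index is an assumption here.
            flows.append(
                f"table=0,priority=100,reg0={index},"
                f"actions=set_field:{chassis.ip}->tun_dst,"
                f"output:{flow_ofport}"
            )
    return vsctl, flows


remotes = [
    Chassis("hv2", "10.0.0.2"),
    Chassis("hv3", "10.0.0.3"),
    Chassis("gw1", "10.0.0.9", needs_bfd=True),
]
vsctl_cmds, flows = tunnel_config(remotes)
for line in vsctl_cmds + flows:
    print(line)
```

With N ordinary remote chassis this yields one port plus N flows, rather
than N ports.
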
> For the "send-to-controller", i.e. reactively setting up flows when
> packets arrive, I hope it is not really needed for solving the tunnel
> scaling problem, since it introduces data plane latency which could be
> a bigger problem. (But I am not sure if reactive mode in general is a
> good idea - it might be a reasonable trade-off for solving the scale
> problem of each HV pre-installing flows for all related datapaths in a
> full-mesh-like scenario. Anyway, not directly related to current
It would introduce data plane latency for the first packet to go to or
from a particular hypervisor. After that there would be no further
additional latency. It would probably not be noticeable.

Let me be clear that I am not pushing this solution. It will complicate
things, and I do not like unnecessary complication. I am just pointing
out that it is possible.
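
To make the first-packet behavior concrete, here is a toy Python model of
reactive flow setup. It is only a simulation under stated assumptions: the
class name, the chassis names, and the addresses are hypothetical, and
nothing here talks to a real switch or controller. The first packet toward
a remote chassis misses the flow table and takes the slow path through the
controller, which installs the flow; later packets to the same chassis hit
the installed flow directly.

```python
class ReactiveTunnelTable:
    """Toy model: tunnel flows are installed on first use."""

    def __init__(self, chassis_ips):
        self.chassis_ips = dict(chassis_ips)  # known to the controller only
        self.installed = {}                   # flows present in the switch
        self.controller_trips = 0

    def send(self, chassis):
        """Return (remote_ip, went_to_controller) for one packet."""
        if chassis in self.installed:
            # Fast path: the flow already exists.
            return self.installed[chassis], False
        # Slow path: punt to the controller, which installs the flow.
        self.controller_trips += 1
        remote_ip = self.chassis_ips[chassis]
        self.installed[chassis] = remote_ip
        return remote_ip, True


table = ReactiveTunnelTable({"hv2": "10.0.0.2", "hv3": "10.0.0.3"})
first = table.send("hv2")   # goes through the controller
second = table.send("hv2")  # served by the installed flow
print(first, second, table.controller_trips)
```
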
> So I would propose to keep the current partially flow-based tunnel
> usage in OVN and optimize the tunnel setup only between peers that are
> logically connected, if this satisfies the scaling goal of OVN users.
> Even with this optimization, we may need to make it a configurable
> option, since in small-scale use cases users may in practice prefer
> the original behavior to avoid the latency of tunnel setup.