[ovs-dev] [RFC PATCH ovn 0/4] Use Distributed Gateway Port for ovn-controller scalability.

Han Zhou hzhou at ovn.org
Tue Aug 3 19:16:51 UTC 2021


On Tue, Aug 3, 2021 at 11:57 AM Numan Siddique <numans at ovn.org> wrote:
>
> On Tue, Aug 3, 2021 at 2:34 PM Han Zhou <hzhou at ovn.org> wrote:
> >
> > On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique <numans at ovn.org> wrote:
> > >
> > > On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <hzhou at ovn.org> wrote:
> > > >
> > > > Note: This patch series is on top of a pending patch that is still
> > > > under review:
> > > > http://patchwork.ozlabs.org/project/ovn/patch/20210729080218.1235041-1-hzhou@ovn.org/
> > > >
> > > > It is RFC because: a) it is based on the unmerged patch. b) DDlog
> > > > changes are not done yet. Below is a copy of the commit message of
> > > > the last patch in this series:
> > > >
> > > > For a fully distributed virtual network dataplane, ovn-controller
> > > > flood-fills datapaths that are connected through patch ports. This
> > > > creates scale problems in ovn-controller when the connected
> > > > datapaths are too many.
> > > >
> > > > In a particular situation, when distributed gateway ports are used
> > > > to connect logical routers to logical switches and there is no need
> > > > for distributed processing of those gateway ports (e.g. no
> > > > dnat_and_snat configured), the datapaths on the other side of the
> > > > gateway ports are not needed locally on the current chassis. This
> > > > patch avoids pulling those datapaths to local in those scenarios.
> > > >
> > > > There are two scenarios that can greatly benefit from this
> > > > optimization.
> > > >
> > > > 1) When there are multiple tenants, each with its own logical
> > > >    topology but sharing the same external/provider networks,
> > > >    connected to their own logical routers with DGPs. Without this
> > > >    optimization, each ovn-controller would process the logical
> > > >    topologies of all tenants and program flows for all of them,
> > > >    even if only a very small number of tenants have workloads on
> > > >    the node where the ovn-controller is running, because the shared
> > > >    external network connects all tenants. With this change, only
> > > >    the logical topologies relevant to the node are processed and
> > > >    programmed on the node.
> > > >
> > > > 2) In some deployments, such as ovn-kubernetes, logical switches are
> > > >    bound to chassis instead of being distributed, because each
> > > >    chassis is assigned dedicated subnets. With the current
> > > >    implementation, ovn-controller on each node processes all logical
> > > >    switches and all ports on them, without knowing that they are not
> > > >    distributed at all. At large scale with N nodes (N = hundreds or
> > > >    even more), roughly N times the processing power is wasted on the
> > > >    logical connectivity related flows. With this change, those
> > > >    deployments can utilize DGP to connect the node level logical
> > > >    switches to distributed router(s), with the gateway chassis (or HA
> > > >    chassis without really HA) of the DGP set to the chassis where the
> > > >    logical switch is bound. This inherently tells OVN the mapping
> > > >    between logical switch and chassis, and ovn-controller would
> > > >    smartly avoid processing topologies of other node level logical
> > > >    switches, which would hugely reduce the compute cost of each
> > > >    ovn-controller.
> > > >
> > > > For 2), test results for an ovn-kubernetes-like deployment show
> > > > significant improvement of ovn-controller, in both CPU (>90% reduced)
> > > > and memory.
> > > >
> > > > Topology:
> > > >
> > > > - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
> > > >   router.
> > > >
> > > > - 2 large port-groups PG1 and PG2, each with 2000 LSPs
> > > >
> > > > - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1
> > > >
> > > > - 1 GR per node, connected to the distributed router through a join
> > > >   switch. Each GR also connects to an external logical switch per
> > > >   node. (This part is to keep the test environment close to a real
> > > >   ovn-kubernetes setup but shouldn't make much difference for the
> > > >   comparison.)
> > > >
> > > > ==== Before the change ====
> > > > OVS flows per node: 297408
> > > > ovn-controller memory: 772696 KB
> > > > ovn-controller recompute: 13s
> > > > ovn-controller restart (recompute + reinstall OVS flows): 63s
> > > >
> > > > ==== After the change (also use DGP to connect node level LSes) ====
> > > > OVS flows per node: 81139 (~70% reduced)
> > > > ovn-controller memory: 163464 KB (~80% reduced)
> > > > ovn-controller recompute: 0.86s (>90% reduced)
> > > > ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)
> > >
> > > Hi Han,
> > >
> > > Thanks for these RFC patches.  The improvements are significant.
> > > That's awesome.
> > >
> > > If I understand this RFC correctly, ovn-k8s will set the
> > > gateway_chassis for each logical
> > > router port of the cluster router (ovn_cluster_router) connecting to
> > > the node logical switch right ?
> > >
> > > If so, instead of using the multiple gw port feature, why can't
> > > ovn-k8s just set the chassis=<node_chassis_name>
> > > in the logical switch other_config option ?
> > >
> > > ovn-controllers can exclude the logical switches from the
> > > local_datapaths if they don't belong to the local chassis.
> > >
> > > I'm not entirely sure if this would work.  Any thoughts ?  If the same
> > > can be achieved using the chassis option
> > > instead of multiple gw router ports, perhaps the former seems better
> > > to me as it would be less work for ovn-k8s.
> > > And there will be fewer resources in SB DB.   What do you think ?
> > > Otherwise +1 from me for
> > > this RFC series.
> > >
> >
> > Thanks Numan for the feedback!
> > The reasons for not introducing a new option in LS are:
> > 1) The multiple DGP support is a valuable feature regardless of the use
> > case of this RFC.
> > 2) Not flood-filling beyond a DGP is also valuable regardless of the
> > ovn-k8s use case. As mentioned, it would also help OpenStack scalability
> > when multiple tenants share the same provider networks.
> > 3) If 1) and 2) are both implemented, there is no need for an extra
> > mechanism to "bind logical switches to chassis", because the outcomes
> > of 1) and 2) are sufficient. The changes in ovn-k8s would be the same,
> > i.e. set the chassis somewhere, either on an LRP or on an LS. I have sent
> > a WIP PR to the ovn-k8s repo and it appears to be a very small change:
> > https://github.com/ovn-org/ovn-kubernetes/pull/2388
> >
> > In addition, a separate option on LS seems unnatural to me, because the
> > end user must understand what they are doing by setting that option. In
> > contrast, the DGP tells OVN what to do more flexibly and accurately.
> > Maybe the name "Distributed Gateway Port" is somewhat confusing, but the
> > chassis-redirect port behind it tells OVN that the user wants the
> > traffic to be redirected to a chassis for the LRP. There can be
> > different scenarios, such as a single LS connecting to multiple DGPs and
> > vice versa, all of which are valid setups that can be supported by this
> > feature. Setting a chassis option for an LS, in contrast, is arbitrary
> > and makes it easy to create conflicting setups, e.g. setting such an
> > option for LS-join. Of course we can say the user is responsible for what
> > they are setting, but I just don't see it as necessary for now.
> >
> > Does this make sense?
>
> Thanks for the detailed explanation.  Makes sense to me.  As you
> mentioned, the name "gateway" is a bit odd, since much of the traffic
> would be E-W.
>
> Would you be fine rebasing the DGP patch and this RFC series and
> reposting?  I'd like to test it out in our setup, which would also help
> in understanding the feature better.

Thanks Numan. Yes, I am working on the rebase.
While resolving conflicts I found a problem with this commit:
1c9e46ab5 northd: add check_pkt_larger lflows for ingress traffic

I will reply to the original patch email to discuss.
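
For readers following the thread, the flood-fill behaviour described in the
quoted commit message can be illustrated with a small sketch. This is not the
code from the patch series: the function name, the tuple-based link model and
the rule "cross a DGP link only if its gateway chassis is the local chassis"
are simplifying assumptions made only for illustration, and the dnat_and_snat
case that needs distributed processing is ignored.

```python
from collections import deque

def local_datapaths(links, seeds, local_chassis, stop_at_dgp=True):
    """Breadth-first flood-fill over datapaths connected by patch ports.

    links: iterable of (dp_a, dp_b, dgp_chassis) tuples. dgp_chassis is None
           for an ordinary patch-port link, or the name of the gateway chassis
           when the link goes through a distributed gateway port.
    seeds: datapaths that have ports bound on the local chassis.
    (Hypothetical data model, not OVN's real structures.)
    """
    adjacency = {}
    for a, b, chassis in links:
        adjacency.setdefault(a, []).append((b, chassis))
        adjacency.setdefault(b, []).append((a, chassis))

    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        dp = queue.popleft()
        for peer, chassis in adjacency.get(dp, []):
            # Assumed rule: do not pull in the datapath behind a DGP that is
            # hosted on some other chassis.
            if stop_at_dgp and chassis is not None and chassis != local_chassis:
                continue
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# Scenario 2 in miniature: one LS per node, each attached to a shared
# distributed router "lr" through a DGP pinned to that node.
links = [("ls-n1", "lr", "n1"), ("ls-n2", "lr", "n2"), ("ls-n3", "lr", "n3")]
with_stop = local_datapaths(links, ["ls-n1"], "n1")
without_stop = local_datapaths(links, ["ls-n1"], "n1", stop_at_dgp=False)
print(sorted(with_stop))
print(sorted(without_stop))
```

On node n1 the sketch keeps only the node's own switch and the router when it
stops at remote DGPs, while the unrestricted flood-fill pulls in every node
level switch, which is the N-fold overhead the commit message refers to.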

