[ovs-dev] [RFC PATCH ovn 0/4] Use Distributed Gateway Port for ovn-controller scalability.

Krzysztof Klimonda kklimonda at syntaxhighlighted.com
Wed Aug 4 10:48:24 UTC 2021


Hi Han,

On Tue, Aug 3, 2021, at 20:33, Han Zhou wrote:
> On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique <numans at ovn.org> wrote:
> >
> > On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <hzhou at ovn.org> wrote:
> > >
> > > Note: This patch series is on top of a pending patch that is still under
> > > review:
> > > http://patchwork.ozlabs.org/project/ovn/patch/20210729080218.1235041-1-hzhou@ovn.org/
> > >
> > > It is an RFC because: a) it is based on the unmerged patch, and b) the
> > > DDlog changes are not done yet. Below is a copy of the commit message of
> > > the last patch in this series:
> > >
> > > For a fully distributed virtual network dataplane, ovn-controller
> > > flood-fills datapaths that are connected through patch ports. This
> > > creates scale problems in ovn-controller when the number of connected
> > > datapaths is large.
> > >
> > > In a particular situation, where distributed gateway ports (DGPs) are
> > > used to connect logical routers to logical switches and there is no need
> > > for distributed processing of those gateway ports (e.g. no dnat_and_snat
> > > configured), the datapaths on the other side of the gateway ports are
> > > not needed locally on the current chassis. This patch avoids pulling
> > > those datapaths into the local datapaths in those scenarios.
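A minimal Python sketch of the flood-fill idea described above. It is a toy model, not ovn-controller's actual code in controller/binding.c. It assumes that the boundary is checked only when walking from a router through one of its DGPs to the peer switch, and that the peer is still pulled in when the DGP's gateway chassis is this chassis or when the DGP needs distributed processing (modeled by a hypothetical `distributed_nat` flag standing in for e.g. dnat_and_snat). `RouterSwitchLink`, `local_datapaths`, and all datapath names are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class RouterSwitchLink:
    """Hypothetical model of a router port patched to a switch port."""
    router: str
    switch: str
    gw_chassis: Optional[str] = None  # set only if the router port is a DGP
    distributed_nat: bool = False     # assumed: DGP needs distributed processing


def local_datapaths(chassis: str, bound: Iterable[str],
                    links: List[RouterSwitchLink],
                    dgp_boundary: bool = True) -> Set[str]:
    """Flood-fill from the datapaths that have ports bound on `chassis`."""
    # Adjacency: datapath -> [(peer, link, walking_from_router_side)]
    nbrs: Dict[str, List[Tuple[str, RouterSwitchLink, bool]]] = {}
    for link in links:
        nbrs.setdefault(link.router, []).append((link.switch, link, True))
        nbrs.setdefault(link.switch, []).append((link.router, link, False))

    seen = set(bound)
    queue = deque(seen)
    while queue:
        dp = queue.popleft()
        for peer, link, from_router in nbrs.get(dp, []):
            crosses_dgp = from_router and link.gw_chassis is not None
            if (dgp_boundary and crosses_dgp
                    and link.gw_chassis != chassis
                    and not link.distributed_nat):
                continue  # assumed: the switch beyond this DGP is not needed here
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Two tenants sharing one external switch; both DGPs are hosted on "gw1".
LINKS = [
    RouterSwitchLink("lr-t1", "ls-t1"),
    RouterSwitchLink("lr-t1", "ls-ext", gw_chassis="gw1"),
    RouterSwitchLink("lr-t2", "ls-t2"),
    RouterSwitchLink("lr-t2", "ls-ext", gw_chassis="gw1"),
]

print(sorted(local_datapaths("hv1", ["ls-t1"], LINKS)))
# ['lr-t1', 'ls-t1']
print(len(local_datapaths("hv1", ["ls-t1"], LINKS, dgp_boundary=False)))
# 5
```

In this toy model, a chassis hosting only a tenant-1 workload (hv1) keeps just ls-t1 and lr-t1 when the boundary is honored; without it, ls-ext, lr-t2 and ls-t2 are pulled in through the shared external switch. Also only in this model, setting `distributed_nat=True` on tenant 1's DGP pulls ls-ext back in, and the other tenant's topology with it.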
> > >
> > > There are two scenarios that can greatly benefit from this optimization.
> > >
> > > 1) When there are multiple tenants, each with its own logical topology
> > >    but all sharing the same external/provider networks, which connect to
> > >    each tenant's logical router through a DGP. Without this
> > >    optimization, each ovn-controller processes the logical topologies of
> > >    all tenants and programs flows for all of them, even if only a few
> > >    tenants have workloads on the node where the ovn-controller is
> > >    running, because the shared external network connects all tenants.
> > >    With this change, only the logical topologies relevant to the node
> > >    are processed and programmed on the node.
> > >
> > > 2) In some deployments, such as ovn-kubernetes, logical switches are
> > >    bound to chassis instead of being distributed, because each chassis
> > >    is assigned dedicated subnets. With the current implementation,
> > >    ovn-controller on each node processes all logical switches and all
> > >    ports on them, without knowing that they are not distributed at all.
> > >    At large scale with N nodes (N = hundreds or even more), roughly N
> > >    times the necessary processing power is spent on the flows related to
> > >    logical connectivity. With this change, those deployments can use DGPs
> > >    to connect the node-level logical switches to distributed router(s),
> > >    with the gateway chassis (or an HA chassis group with a single
> > >    chassis, i.e. without real HA) of each DGP set to the chassis where
> > >    the logical switch is bound. This inherently tells OVN the mapping
> > >    between logical switch and chassis, and ovn-controller can then avoid
> > >    processing the topologies of other node-level logical switches, which
> > >    hugely saves compute cost in each ovn-controller.
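A rough Python model of scenario 2), for illustration only. It assumes that an ovn-controller must process a switch only if at least one of its router ports is either a regular (distributed) port or a DGP whose gateway chassis is that node. The port and switch names (`lrp-node{i}`, `ls-node{i}`, `ls-join`) and both functions are hypothetical. The model ignores GRs, ports and flows, so it counts switches rather than real CPU cost.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical NB-like model: router port -> (peer switch, gateway chassis).
# A gateway chassis of None marks a regular, fully distributed router port.
RouterPorts = Dict[str, Tuple[str, Optional[str]]]


def node_topology(n_nodes: int) -> RouterPorts:
    """One node switch per node behind a DGP pinned to that node, plus a join switch."""
    ports: RouterPorts = {
        f"lrp-node{i}": (f"ls-node{i}", f"node{i}") for i in range(n_nodes)
    }
    ports["lrp-join"] = ("ls-join", None)
    return ports


def switches_processed(chassis: str, ports: RouterPorts,
                       use_dgp_binding: bool = True) -> Set[str]:
    """Switches one ovn-controller handles in this simplified model."""
    return {
        ls for ls, gw in ports.values()
        if not use_dgp_binding or gw is None or gw == chassis
    }


N = 1000
ports = node_topology(N)
before = sum(len(switches_processed(f"node{i}", ports, False)) for i in range(N))
after = sum(len(switches_processed(f"node{i}", ports)) for i in range(N))
print(before, after)  # 1001000 2000
```

Under these assumptions, the number of node-level switches each ovn-controller handles drops from N to 1. This is the intuition behind the "roughly N times" figure.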
> > >
> > > For 2), test results for an ovn-kubernetes-like deployment show a
> > > significant improvement in ovn-controller, in both CPU (>90% reduced)
> > > and memory.
> > >
> > > Topology:
> > >
> > > - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
> > >   router.
> > >
> > > - 2 large port-groups PG1 and PG2, each with 2000 LSPs
> > >
> > > - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1
> > >
> > > - 1 GR per node, connected to the distributed router through a join
> > >   switch. Each GR also connects to an external logical switch per node.
> > >   (This part is to keep the test environment close to a real
> > >   ovn-kubernetes setup, but it shouldn't make much difference for the
> > >   comparison.)
> > >
> > > ==== Before the change ====
> > > OVS flows per node: 297408
> > > ovn-controller memory: 772696 KB
> > > ovn-controller recompute: 13s
> > > ovn-controller restart (recompute + reinstall OVS flows): 63s
> > >
> > > ==== After the change (also using DGPs to connect node-level LSes) ====
> > > OVS flows per node: 81139 (~70% reduced)
> > > ovn-controller memory: 163464 KB (~80% reduced)
> > > ovn-controller recompute: 0.86s (>90% reduced)
> > > ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)
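A quick arithmetic check of the rounded percentages above, using only the before/after numbers reported in the results. The exact reductions come out at about 73% (flows), 79% (memory), 93% (recompute) and 92% (restart), which agrees with the "~70%", "~80%" and ">90%" figures.

```python
# Before/after figures copied from the test results quoted above.
RESULTS = {
    "OVS flows per node": (297408, 81139),
    "ovn-controller memory (KB)": (772696, 163464),
    "ovn-controller recompute (s)": (13, 0.86),
    "ovn-controller restart (s)": (63, 5),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction, in percent, going from `before` to `after`."""
    return 100.0 * (before - after) / before


for name, (before, after) in RESULTS.items():
    print(f"{name}: {reduction_pct(before, after):.1f}% reduced")
# OVS flows per node: 72.7% reduced
# ovn-controller memory (KB): 78.8% reduced
# ovn-controller recompute (s): 93.4% reduced
# ovn-controller restart (s): 92.1% reduced
```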
> >
> > Hi Han,
> >
> > Thanks for these RFC patches.  The improvements are significant.
> > That's awesome.
> >
> > If I understand this RFC correctly, ovn-k8s will set the
> > gateway_chassis for each logical router port of the cluster router
> > (ovn_cluster_router) that connects to a node logical switch, right?
> >
> > If so, instead of using the multiple gateway port feature, why can't
> > ovn-k8s just set other_config:chassis=<node_chassis_name> on the
> > logical switch?
> >
> > ovn-controllers can exclude the logical switches from the
> > local_datapaths if they don't belong to the local chassis.
> >
> > I'm not entirely sure if this would work. Any thoughts? If the same
> > can be achieved using the chassis option instead of multiple gateway
> > router ports, the former seems better to me, as it would be less work
> > for ovn-k8s, and there would be fewer resources in the SB DB. What do
> > you think? Otherwise, +1 from me for this RFC series.
> >
> 
> Thanks Numan for the feedback!
> The reasons for not introducing a new option on the LS are:
> 1) Multiple DGP support is a valuable feature regardless of the use
> case of this RFC.
> 2) Not flood-filling beyond DGPs is also valuable regardless of the
> ovn-k8s use case. As mentioned, it would also help OpenStack scalability
> when multiple tenants share the same provider networks.

Does this optimization for the OpenStack use case also work when FIPs (dnat_and_snat) are in use? The commit message is a little unclear on that.

Best Regards,
Krzysztof


> 3) If 1) and 2) are both implemented, there is no need for an extra
> mechanism to "bind logical switches to chassis", because 1) and 2)
> together are sufficient. The changes in ovn-k8s would be the same either
> way, i.e. setting the chassis somewhere, on either an LRP or an LS. I have
> sent a WIP PR to the ovn-k8s repo, and it appears to be a very small change:
> https://github.com/ovn-org/ovn-kubernetes/pull/2388
> 
> In addition, a separate option on the LS seems unnatural to me, because
> the end user must understand what they are doing when setting that
> option. In contrast, a DGP tells OVN more flexibly and accurately what it
> should do. The name "Distributed Gateway Port" may be somewhat confusing,
> but the chassis-redirect port behind it tells OVN that the user wants the
> traffic for the LRP to be redirected to a chassis. Different scenarios,
> such as a single LS connecting to multiple DGPs and vice versa, are all
> valid setups that this feature can support. Setting a chassis option on
> an LS, on the other hand, is arbitrary, and it is easy to create
> conflicting setups, e.g. setting such an option on LS-join. Of course we
> can say the user is responsible for what they set, but I just don't see
> it as necessary for now.
> 
> Does this make sense?
> 
> > Thanks
> > Numan
> >
> > >
> > > Han Zhou (4):
> > >   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
> > >     chassis in ha_chassis_group.
> > >   binding.c: Refactor binding_handle_port_binding_changes.
> > >   binding.c: Create a new function
> > >     consider_patch_port_for_local_datapaths.
> > >   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
> > >
> > >  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
> > >  northd/ovn-northd.c    |  39 +++++++--
> > >  ovn-architecture.7.xml |  26 ++++++
> > >  ovn-nb.xml             |   6 ++
> > >  tests/ovn.at           |  67 +++++++++++++++
> > >  5 files changed, 268 insertions(+), 60 deletions(-)
> > >
> > > --
> > > 2.30.2
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev at openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> 


-- 
  Krzysztof Klimonda
  kklimonda at syntaxhighlighted.com

