[ovs-dev] [RFC PATCH ovn 0/4] Use Distributed Gateway Port for ovn-controller scalability.

Numan Siddique numans at ovn.org
Tue Aug 3 18:09:35 UTC 2021


On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <hzhou at ovn.org> wrote:
>
> Note: This patch series is on top of a pending patch that is still under
> review: http://patchwork.ozlabs.org/project/ovn/patch/20210729080218.1235041-1-hzhou@ovn.org/
>
> It is RFC because: a) it is based on the unmerged patch. b) DDlog
> changes are not done yet. Below is a copy of the commit message of the last
> patch in this series:
>
> For a fully distributed virtual network dataplane, ovn-controller
> flood-fills datapaths that are connected through patch ports. This
> creates scale problems in ovn-controller when the connected datapaths
> are too many.
>
> In a particular situation, when distributed gateway ports are used to
> connect logical routers to logical switches, when there is no need for
> distributed processing of those gateway ports (e.g. no dnat_and_snat
> configured), the datapaths on the other side of the gateway ports are
> not needed locally on the current chassis. This patch avoids pulling
> those datapaths to local in those scenarios.
>
> There are two scenarios that can greatly benefit from this optimization.
>
> 1) When there are multiple tenants, each has its own logical topology,
>    but sharing the same external/provider networks, connected to their
>    own logical routers with DGPs. Without this optimization, each
>    ovn-controller would process all logical topology of all tenants and
>    program flows for all of them, even if there are only workloads of a
>    very few number of tenants on the node where the ovn-controller is
>    running, because the shared external network connects all tenants.
>    With this change, only the logical topologies relevant to the node
>    are processed and programmed on the node.
>
> 2) In some deployments, such as ovn-kubernetes, logical switches are
>    bound to chassis instead of distributed, because each chassis is
>    assigned dedicated subnets. With the current implementation,
>    ovn-controller on each node processes all logical switches and all
>    ports on them, without knowing that they are not distributed at all.
>    At large scale with N nodes (N = hundreds or even more), there are
>    roughly N times processing power wasted for the logical connectivity
>    related flows. With this change, those deployments can utilize DGP
>    to connect the node level logical switches to distributed router(s),
>    with the gateway chassis (or an HA chassis group without actual HA) of the DGP
>    set to the chassis where the logical switch is bound. This inherently
>    tells OVN the mapping between logical switch and chassis, and
>    ovn-controller would smartly avoid processing topologies of other node
>    level logical switches, which would hugely save compute cost of each
>    ovn-controller.
>
> For 2), test results for an ovn-kubernetes-like deployment show
> significant improvement in ovn-controller, both CPU (>90% reduced) and memory.
>
> Topology:
>
> - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
>   router.
>
> - 2 large port-groups PG1 and PG2, each with 2000 LSPs
>
> - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1
>
> - 1 GR per node, connected to the distributed router through a join
>   switch. Each GR also connects to an external logical switch per node.
>   (This part is to keep the test environment close to a real
>    ovn-kubernetes setup but shouldn't make much difference for the
>    comparison)
>
> ==== Before the change ====
> OVS flows per node: 297408
> ovn-controller memory: 772696 KB
> ovn-controller recompute: 13s
> ovn-controller restart (recompute + reinstall OVS flows): 63s
>
> ==== After the change (also use DGP to connect node level LSes) ====
> OVS flows per node: 81139 (~70% reduced)
> ovn-controller memory: 163464 KB (~80% reduced)
> ovn-controller recompute: 0.86s (>90% reduced)
> ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)

Hi Han,

Thanks for these RFC patches.  The improvements are significant.
That's awesome.

If I understand this RFC correctly, ovn-k8s will set the
gateway_chassis for each logical router port of the cluster router
(ovn_cluster_router) connecting to the node logical switch, right?
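For reference, a rough sketch of what that configuration might look like
with ovn-nbctl (the port and chassis names here, rtos-node1 and node1,
are hypothetical placeholders, not names from the patch series):

```shell
# Hypothetical names: rtos-node1 is the cluster-router port facing
# node1's logical switch; node1 is that node's chassis.
# Pinning the DGP to the node's chassis tells OVN the switch/chassis
# mapping, so ovn-controller on other nodes can skip the datapaths
# behind this gateway port.
ovn-nbctl lrp-set-gateway-chassis rtos-node1 node1 100
```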

If so, instead of using the multiple gw port feature, why can't
ovn-k8s just set chassis=<node_chassis_name> in the logical switch's
other_config column?

ovn-controllers could then exclude those logical switches from their
local_datapaths if they don't belong to the local chassis.
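Something along these lines (note that this other_config:chassis key is
the proposed, not-yet-existing option, and the switch/chassis names are
hypothetical; this only illustrates the idea):

```shell
# Proposed (hypothetical) alternative: tag the node-level logical
# switch with the chassis it is bound to, instead of configuring a
# distributed gateway port on the cluster router.
ovn-nbctl set Logical_Switch node1-switch other_config:chassis=node1
```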

I'm not entirely sure if this would work.  Any thoughts?  If the same
can be achieved using the chassis option instead of multiple gw router
ports, the former seems better to me, as it would mean less work for
ovn-k8s and fewer resources in the SB DB.  What do you think?
Otherwise, +1 from me for this RFC series.

Thanks
Numan

>
> Han Zhou (4):
>   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
>     chassis in ha_chassis_group.
>   binding.c: Refactor binding_handle_port_binding_changes.
>   binding.c: Create a new function
>     consider_patch_port_for_local_datapaths.
>   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
>
>  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
>  northd/ovn-northd.c    |  39 +++++++--
>  ovn-architecture.7.xml |  26 ++++++
>  ovn-nb.xml             |   6 ++
>  tests/ovn.at           |  67 +++++++++++++++
>  5 files changed, 268 insertions(+), 60 deletions(-)
>
> --
> 2.30.2
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

