[ovs-dev] [PATCH ovn 0/6] Use Distributed Gateway Port for ovn-controller scalability.

Han Zhou hzhou at ovn.org
Fri Aug 13 22:55:28 UTC 2021


For a fully distributed virtual network dataplane, ovn-controller
flood-fills datapaths that are connected through patch ports. This
creates scale problems in ovn-controller when too many datapaths are
connected.

In one particular situation, where distributed gateway ports (DGPs)
connect logical routers to logical switches and those gateway ports
need no distributed processing (e.g. no dnat_and_snat is configured),
the datapaths on the other side of the gateway ports are not needed
locally on the current chassis. This patch series avoids adding those
datapaths to the local datapaths in such scenarios.
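
As a rough illustration of the idea (a toy model only, not the code in
binding.c or local_data.c), the sketch below flood-fills datapaths over
patch-port links and optionally stops at links marked as a DGP boundary.
The link list, the datapath names and the flood_fill() helper are all
hypothetical, and the model ignores the case where the current chassis
is itself the gateway chassis of the DGP.

```python
from collections import deque

# Hypothetical topology: (datapath_a, datapath_b, is_dgp_boundary).
# Two tenants, each with a switch and a router, share one external switch.
LINKS = [
    ("ls-a", "lr-a", False),
    ("lr-a", "ext", True),   # DGP that needs no distributed processing
    ("ls-b", "lr-b", False),
    ("lr-b", "ext", True),   # DGP that needs no distributed processing
]


def flood_fill(start, links, stop_at_dgp):
    """Return the datapaths reachable from `start` over patch-port links."""
    neighbors = {}
    for a, b, boundary in links:
        neighbors.setdefault(a, []).append((b, boundary))
        neighbors.setdefault(b, []).append((a, boundary))

    local = {start}
    queue = deque([start])
    while queue:
        dp = queue.popleft()
        for peer, boundary in neighbors.get(dp, []):
            if stop_at_dgp and boundary:
                continue  # do not pull in the datapath beyond the DGP
            if peer not in local:
                local.add(peer)
                queue.append(peer)
    return local


# A chassis that only hosts workloads on ls-a.
before = flood_fill("ls-a", LINKS, stop_at_dgp=False)
after = flood_fill("ls-a", LINKS, stop_at_dgp=True)
print(sorted(before))  # all five datapaths
print(sorted(after))   # only ls-a and lr-a
```

In this model the shared external switch is what ties the two tenants
together, so stopping at the DGP keeps tenant B's datapaths off the
chassis entirely.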

There are two scenarios that can greatly benefit from this optimization.

1) Multiple tenants each have their own logical topology but share the
   same external/provider networks, which are connected to the
   tenants' own logical routers through DGPs. Without this
   optimization, each ovn-controller would process the entire logical
   topology of every tenant and program flows for all of them, even if
   the node where the ovn-controller is running hosts workloads of
   only a few tenants, because the shared external network connects
   all tenants. With this change, only the logical topologies relevant
   to the node are processed and programmed on the node.

2) In some deployments, such as ovn-kubernetes, logical switches are
   bound to individual chassis instead of being distributed, because
   each chassis is assigned dedicated subnets. With the current
   implementation, ovn-controller on each node processes all logical
   switches and all ports on them, without knowing that they are not
   distributed at all. At large scale with N nodes (N = hundreds or
   even more), each node does roughly N times the processing it needs
   for the flows related to logical connectivity. With this change,
   those deployments can use DGPs to connect the node-level logical
   switches to the distributed router(s), with the gateway chassis (or
   HA chassis group, though without actual HA) of each DGP set to the
   chassis where the logical switch is bound. This inherently tells
   OVN the mapping between logical switch and chassis, so
   ovn-controller can avoid processing the topologies of other
   node-level logical switches, which greatly reduces the compute cost
   of each ovn-controller.

For 2), a test of an ovn-kubernetes-like deployment shows significant
improvement in ovn-controller, in both CPU (>90% reduced) and memory.

Topology:

- 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
  router.

- 2 large port-groups PG1 and PG2, each with 2000 LSPs

- 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1

- 1 GR per node, connected to the distributed router through a join
  switch. Each GR also connects to an external logical switch per node.
  (This part is to keep the test environment close to a real
   ovn-kubernetes setup but shouldn't make much difference for the
   comparison)

==== Before the change ====
OVS flows per node: 297408
ovn-controller memory: 772696 KB
ovn-controller recompute: 13s
ovn-controller restart (recompute + reinstall OVS flows): 63s

==== After the change (also using DGPs to connect node-level LSes) ====
OVS flows per node: 81139 (~70% reduced)
ovn-controller memory: 163464 KB (~80% reduced)
ovn-controller recompute: 0.86s (>90% reduced)
ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)
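
The rounded percentages above can be re-derived from the raw numbers.
The short script below is only a convenience check; the dictionary keys
and the reduction_pct() helper are names made up for this example, and
the figures are copied from the results listed above.

```python
# Figures copied from the "before" and "after" results in this message.
before = {"flows": 297408, "memory_kb": 772696,
          "recompute_s": 13.0, "restart_s": 63.0}
after = {"flows": 81139, "memory_kb": 163464,
         "recompute_s": 0.86, "restart_s": 5.0}


def reduction_pct(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


reductions = {key: reduction_pct(before[key], after[key]) for key in before}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% reduced")
```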

Han Zhou (6):
  ovn-northd: Avoid ha_ref_chassis calculation when there is only one
    chassis in ha_chassis_group.
  binding.c: Refactor binding_handle_port_binding_changes.
  binding.c: Create a new function
    consider_patch_port_for_local_datapaths.
  ovn-sb.xml: Add the missing documentation for redirect-type.
  ovn-architecture: Add description of a limitation for distributed
    gateway ports.
  ovn-controller: Don't flood fill local datapaths beyond DGP boundary.

 controller/binding.c    | 142 ++++++++++++++++++++++++----------------
 controller/local_data.c |  66 ++++++++++++++++---
 controller/local_data.h |  13 +++-
 northd/ovn-northd.c     |  86 +++++++++++++++++++++---
 northd/ovn_northd.dl    |  32 ++++++++-
 ovn-architecture.7.xml  |  55 ++++++++++++++++
 ovn-nb.xml              |   6 ++
 ovn-sb.xml              |  25 +++++++
 tests/ovn-northd.at     |  11 +++-
 tests/ovn.at            |  67 +++++++++++++++++++
 10 files changed, 421 insertions(+), 82 deletions(-)

-- 
2.30.2


