[ovs-dev] [RFC PATCH ovn 0/4] Use Distributed Gateway Port for ovn-controller scalability.

Han Zhou hzhou at ovn.org
Fri Aug 13 23:35:18 UTC 2021


On Tue, Aug 10, 2021 at 9:43 AM Mark Gray <mark.d.gray at redhat.com> wrote:
>
> On 03/08/2021 19:33, Han Zhou wrote:
> > On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique <numans at ovn.org> wrote:
> >>
> >> On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <hzhou at ovn.org> wrote:
> >>>
> >>> Note: This patch series is on top of a pending patch that is still
> >>> under review:
> >>> http://patchwork.ozlabs.org/project/ovn/patch/20210729080218.1235041-1-hzhou@ovn.org/
> >>>
> >>> It is RFC because: a) it is based on the unmerged patch. b) DDlog
> >>> changes are not done yet. Below is a copy of the commit message of the
> >>> last patch in this series:
> >>>
> >>> For a fully distributed virtual network dataplane, ovn-controller
> >>> flood-fills datapaths that are connected through patch ports. This
> >>> creates scale problems in ovn-controller when there are too many
> >>> connected datapaths.
> >>>
> >>> In particular, when distributed gateway ports (DGPs) are used to
> >>> connect logical routers to logical switches and there is no need for
> >>> distributed processing of those gateway ports (e.g. no dnat_and_snat
> >>> configured), the datapaths on the other side of the gateway ports are
> >>> not needed locally on the current chassis. This patch avoids pulling
> >>> those datapaths into the local set in those scenarios.
> >>>
> >>> There are two scenarios that can greatly benefit from this
> >>> optimization.
> >>>
> >>> 1) When there are multiple tenants, each has its own logical topology,
> >>>    but they share the same external/provider networks, connected to
> >>>    their own logical routers with DGPs. Without this optimization, each
> >>>    ovn-controller would process the logical topologies of all tenants
> >>>    and program flows for all of them, even if only a few tenants have
> >>>    workloads on the node where the ovn-controller is running, because
> >>>    the shared external network connects all tenants. With this change,
> >>>    only the logical topologies relevant to the node are processed and
> >>>    programmed on the node.
> >>>
> >>> 2) In some deployments, such as ovn-kubernetes, logical switches are
> >>>    bound to chassis instead of being distributed, because each chassis
> >>>    is assigned dedicated subnets. With the current implementation,
> >>>    ovn-controller on each node processes all logical switches and all
> >>>    ports on them, without knowing that they are not distributed at all.
> >>>    At large scale with N nodes (N = hundreds or even more), roughly N
> >>>    times the processing power is wasted on the flows related to
> >>>    logical connectivity. With this change, those deployments can use
> >>>    DGPs to connect the node-level logical switches to distributed
> >>>    router(s), with the gateway chassis (or HA chassis without real HA)
> >>>    of each DGP set to the chassis where the logical switch is bound.
> >>>    This inherently tells OVN the mapping between logical switch and
> >>>    chassis, so ovn-controller can avoid processing the topologies of
> >>>    other node-level logical switches, which greatly reduces the compute
> >>>    cost of each ovn-controller.
> >>>
> >>> For 2), test results for an ovn-kubernetes-like deployment show
> >>> significant improvement in ovn-controller, both CPU (>90% reduced) and
> >>> memory.
> >>>
> >>> Topology:
> >>>
> >>> - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
> >>>   router.
> >>>
> >>> - 2 large port-groups PG1 and PG2, each with 2000 LSPs
> >>>
> >>> - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1
> >>>
> >>> - 1 GR per node, connected to the distributed router through a join
> >>>   switch. Each GR also connects to an external logical switch per node.
> >>>   (This part is to keep the test environment close to a real
> >>>    ovn-kubernetes setup but shouldn't make much difference for the
> >>>    comparison)
> >>>
> >>> ==== Before the change ====
> >>> OVS flows per node: 297408
> >>> ovn-controller memory: 772696 KB
> >>> ovn-controller recompute: 13s
> >>> ovn-controller restart (recompute + reinstall OVS flows): 63s
> >>>
> >>> ==== After the change (also use DGP to connect node level LSes) ====
> >>> OVS flows per node: 81139 (~70% reduced)
> >>> ovn-controller memory: 163464 KB (~80% reduced)
> >>> ovn-controller recompute: 0.86s (>90% reduced)
> >>> ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)
> >>
> >> Hi Han,
> >>
> >> Thanks for these RFC patches.  The improvements are significant.
> >> That's awesome.
> >>
> >> If I understand this RFC correctly, ovn-k8s will set the
> >> gateway_chassis for each logical
> >> router port of the cluster router (ovn_cluster_router) connecting to
> >> the node logical switch right ?
> >>
> >> If so, instead of using the multiple gw port feature, why can't
> >> ovn-k8s just set the chassis=<node_chassis_name>
> >> in the logical switch other_config option ?
> >>
> >> ovn-controllers can exclude the logical switches from the
> >> local_datapaths if they don't belong to the local chassis.
> >>
> >> I'm not entirely sure if this would work.  Any thoughts ?  If the same
> >> can be achieved using the chassis option
> >> instead of multiple gw router ports, perhaps the former seems better
> >> to me as it would be less work for ovn-k8s.
> >> And there will be fewer resources in SB DB.   What do you think ?
> >> Otherwise +1 from me for
> >> this RFC series.
> >>
> >
> > Thanks Numan for the feedback!
> > The reasons for not introducing a new option in LS are:
> > 1) The multiple DGP support is a valuable feature regardless of the use
> > case of this RFC.
> > 2) Not flood-filling beyond a DGP is also valuable regardless of the
> > ovn-k8s use case. As mentioned, it would also help OpenStack scalability
> > when multiple tenants share the same provider networks.
> > 3) If 1) and 2) are both implemented, there is no need for an extra
> > mechanism to "bind logical switches to chassis", because the outcomes of
> > 1) and 2) are sufficient. The changes in ovn-k8s would be the same, i.e.
> > set the chassis somewhere, either on an LRP or an LS. I have sent a WIP PR
> > to the ovn-k8s repo and it appears to be a very small change:
> > https://github.com/ovn-org/ovn-kubernetes/pull/2388
> >
> > In addition, a separate option on LS seems unnatural to me, because the
> > end user must understand what they are doing by setting that option. In
>
> This is a great series, Han, although I haven't looked into all the
> details. I think I disagree with this point. For me, at least, the idea
> of setting the chassis for a switch appears a more intuitive way of
> configuring this as it follows an established pattern (that we use for
> routers).

Thanks for the feedback. I think logical switches are different from
routers: routers have no VIF bindings, so assigning a chassis to a router
clearly states that the router pipelines will run on the assigned chassis.
Logical switches mostly have VIF ports, which already need to be bound to
chassis, so assigning a chassis to a logical switch can easily create
conflicting configurations. That could probably be addressed with
documentation, so that users understand how to use the option properly, and
with validation in the code to detect conflicting configurations and log
warnings. Even so, it would restrict the use case to something like
ovn-k8s. There are scenarios where one LS connects to multiple routers that
can benefit from this feature, too. In that case it is inappropriate to
bind a logical switch to a single chassis. You might want to bind it to
multiple chassis, and then the configuration becomes far more complex.
Without any such option, the settings provided by the distributed gateway
port are already sufficient for all these scenarios with the optimizations
in this patch series. I'd rather document how to use the distributed
gateway port to effectively achieve the outcome of attaching a logical
switch to a chassis than provide an extra option just for a special use
case while managing all the exceptions. If we called the distributed
gateway port something like "distributed chassis redirect port" it might be
more natural. But since the name has been used for a long time, I am
hesitant to create more confusion now.
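
To make the "stop at the DGP boundary" idea concrete, here is a minimal
sketch in Python. It is only a simplified model and not the logic in
controller/binding.c: the function name, the edge tuples and the datapath
and chassis names are all made up for illustration, and it assumes that a
connection through a DGP is followed only when the DGP's gateway chassis
is the local chassis.

```python
from collections import deque


def flood_fill(edges, seeds, local_chassis, stop_at_dgp=True):
    """Return the set of datapaths considered local on one chassis.

    edges: (datapath_a, datapath_b, gw_chassis) tuples. gw_chassis is None
           for an ordinary patch-port connection, or the (hypothetical)
           gateway chassis name when the connection goes through a DGP.
    seeds: datapaths that have VIFs bound on the local chassis.
    """
    adjacency = {}
    for dp_a, dp_b, gw_chassis in edges:
        adjacency.setdefault(dp_a, []).append((dp_b, gw_chassis))
        adjacency.setdefault(dp_b, []).append((dp_a, gw_chassis))

    local = set(seeds)
    queue = deque(seeds)
    while queue:
        dp = queue.popleft()
        for peer, gw_chassis in adjacency.get(dp, []):
            # Assumption of this model: a DGP whose gateway chassis is
            # another node is a boundary that is not crossed.
            remote_dgp = gw_chassis is not None and gw_chassis != local_chassis
            if stop_at_dgp and remote_dgp:
                continue
            if peer not in local:
                local.add(peer)
                queue.append(peer)
    return local


# Three node-level switches attached to one distributed router via DGPs.
EDGES = [
    ("ls-node1", "cluster-router", "node1"),
    ("ls-node2", "cluster-router", "node2"),
    ("ls-node3", "cluster-router", "node3"),
]

before = flood_fill(EDGES, ["ls-node1"], "node1", stop_at_dgp=False)
after = flood_fill(EDGES, ["ls-node1"], "node1")
print(sorted(before))  # all four datapaths
print(sorted(after))   # only ls-node1 and cluster-router
```

In this toy topology node1 keeps its own switch and the router but no
longer pulls in the switches of node2 and node3, which is the effect the
series is after. Cases that need distributed processing of the gateway
port (e.g. dnat_and_snat) are not modelled here.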

Anyway, I submitted the formal patches:
https://patchwork.ozlabs.org/project/ovn/list/?series=258021
It is rebased on master, with the DDlog part added and some other minor
enhancements. Please take a look.

Thanks,
Han
>
> > contrast, the DGP more flexibly and accurately tells what OVN should do.
> > Maybe the name "Distributed Gateway Port" is somehow confusing, but the
>
> Yes, this could be the case.
>
> > chassis-redirect port behind it is telling OVN that the user wants the
> > traffic to be redirected to a chassis for the LRP. There can be different
> > scenarios, such as a single LS connecting to multiple DGPs and vice versa,
> > and all are valid setups that can be supported by this feature. Setting a
> > chassis option for an LS, by contrast, is arbitrary and makes it easy to
> > create conflicting setups, e.g. setting such an option for LS-join. Of
> > course we can say the user is responsible for what they are setting, but I
> > just don't see it as necessary for now.
> >
> > Does this make sense?
> >
> >> Thanks
> >> Numan
> >>
> >>>
> >>> Han Zhou (4):
> >>>   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
> >>>     chassis in ha_chassis_group.
> >>>   binding.c: Refactor binding_handle_port_binding_changes.
> >>>   binding.c: Create a new function
> >>>     consider_patch_port_for_local_datapaths.
> >>>   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
> >>>
> >>>  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
> >>>  northd/ovn-northd.c    |  39 +++++++--
> >>>  ovn-architecture.7.xml |  26 ++++++
> >>>  ovn-nb.xml             |   6 ++
> >>>  tests/ovn.at           |  67 +++++++++++++++
> >>>  5 files changed, 268 insertions(+), 60 deletions(-)
> >>>
> >>> --
> >>> 2.30.2
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> dev at openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >
>

