[ovs-dev] [PATCH ovn] controller: Avoid unnecessary load balancer flow processing.

Han Zhou hzhou at ovn.org
Fri Jul 9 16:11:17 UTC 2021


On Fri, Jul 9, 2021 at 1:01 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 7/9/21 3:46 AM, Han Zhou wrote:
> > On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels at redhat.com>
wrote:
> >>
> >> Acked-by: Mark Michelson <mmichels at redhat.com>
> >>
> > Thanks Dumitru and Mark!
> >
>
> Hi Mark and Han,
>
> Thanks for the reviews!
>
> >>
> >> If I'm to extrapolate this a bit more, it sounds like any time a
> >> Datapath_Binding is updated, it results in the potential for many
> >> Logical_Flows to be re-calculated. In this particular case, it was
> >> triggered based on modifying a wide-reaching load balancer. I suppose
> >> the same thing would happen if Datapath_Binding external_ids were
updated.
> >>
> > Yes. Luckily it seems external_ids of datapath_binding shouldn't change
> > very often. But I agree this is a general problem of the I-P.
> >
>
> I see it in the same way.  I-P doesn't give per column change tracking
> information.
>
> >> I wonder if it's worth investigating trying to create a new reference
> >> type for the purposes of the IDL. The idea would be that if a column in
> >> table A references rows in table B, then if the rows in table B are
> >> updated, then any relevant rows in table A will reported as updated as
> >> well. However, any columns in tables C, D, E, etc. that reference table
> >> A will not result in the rows in tables C, D, E, etc. being reported as
> >> updated.
> >>
> >> Essentially, it's the same as what you've done here, except we can have
> >> the IDL fill in the sbrec_load_balancer in the Datapath_Binding record
> >> instead of having to look it up when needed.
> >>
> > It may be helpful to have a new reference type, but deciding when and
where
> > to use it would be very tricky. If not careful enough, we might be
creating
> > some dependency in the code without getting proper change notification.
> >
> >> Alternatively, would it be worth listing the relevant columns that a
> >> particular type "cares" about? For instance, Logical_Datapaths are only
> >> affected if the tunnel_key is updated on a Datapath_Binding. The
> >> load_balancers and external_ids are irrelevant. Would it be worthwhile
> >> to be able to note in the reference the columns that should elicit an
> >> update?
> >
> > This sounds more reliable. Ideally, if referenced columns are noted, the
> > generated IDL structure of A's field B should only contains the columns
> > that are interested by A. This way we can make sure it is impossible to
> > secretly use a column without being notified of updates that possibly
need
> > to be handled by I-P. I am not sure how hard this would be.
> >
>
> I think this is part of a larger discussion (there were some started a
> while back) about refactoring ovn-controller in general.  As far as I've
> seen there are multiple options:
>
> - keep using the IDL and enhance it (and the schema), suggested above
> - stop using IDL, use ovsdb-cs directly instead and implement a new type
> of incremental processing in ovn-controller.  This can be either ddlog
> on top of ovsdb-cs (like ovn-northd-ddlog does), or something else.
>
> >>
> >> Is it worth it? Or is this such a niche use case that it's not worth
> >> implementing?
> >
> > I think we can see if there are more such scenarios that justify the IDL
> > change. I am ok with changing the schema for the current use case to
> > replace the reference by just raw uuid. However, we should never
directly
> > use IDL from the ctx and access any table, this would break the I-P
> > dependency tracking. Please see more comments inlined.
> >
>
> In the general case I tend to agree but in this specific case
> Datapath_Binding doesn't really depend on Load_Balancer, it just uses
> those uuids as hints.  Please see my replies inline.
>
> >>
> >> On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the
following
> >>> sequence of events happens:
> >>>
> >>> 1. The Southbound Load_Balancer record is updated.
> >>> 2. The Southbound Datapath_Binding records on which the Load_Balancer
is
> >>>     applied are updated.
> >>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
> >>>     Datapath_Binding records to ovn-controller.
> >>> 4. The IDL layer in ovn-controller processes the updates at #3, but
> >>>     because of the SB schema references between tables [0] all logical
> >>>     flows referencing the updated Datapath_Binding are marked as
> >>>     "updated".  The same is true for Logical_DP_Group records
> >>>     referencing the Datapath_Binding, and also for all logical flows
> >>>     pointing to the new "updated" datapath groups.
> >>> 5. ovn-controller ends up recomputing (removing/readding) all flows
for
> >>>     all these tracked updates.
> >>>
> >>>  From the SB Schema:
> >>>          "Datapath_Binding": {
> >>>              "columns": {
> >>>                  [...]
> >>>                  "load_balancers": {"type": {"key": {"type": "uuid",
> >>>                                                     "refTable":
> > "Load_Balancer",
> >>>                                                     "refType":
"weak"},
> >>>                                              "min": 0,
> >>>                                              "max": "unlimited"}},
> >>>          [...]
> >>>          "Load_Balancer": {
> >>>              "columns": {
> >>>                  "datapaths": {
> >>>                  [...]
> >>>                      "type": {"key": {"type": "uuid",
> >>>                                       "refTable": "Datapath_Binding"},
> >>>                               "min": 0, "max": "unlimited"}},
> >>>          [...]
> >>>          "Logical_DP_Group": {
> >>>              "columns": {
> >>>                  "datapaths":
> >>>                      {"type": {"key": {"type": "uuid",
> >>>                                        "refTable": "Datapath_Binding",
> >>>                                        "refType": "weak"},
> >>>                                "min": 0, "max": "unlimited"}}},
> >>>          [...]
> >>>          "Logical_Flow": {
> >>>              "columns": {
> >>>                  "logical_datapath":
> >>>                      {"type": {"key": {"type": "uuid",
> >>>                                        "refTable":
"Datapath_Binding"},
> >>>                                "min": 0, "max": 1}},
> >>>                  "logical_dp_group":
> >>>                      {"type": {"key": {"type": "uuid",
> >>>                                        "refTable":
"Logical_DP_Group"},
> >>>
> >>> In order to avoid this unnecessary Logical_Flow notification storm we
> >>> now remove the explicit reference from Datapath_Binding to
> >>> Load_Balancer and instead store raw UUIDs.
> >>>
> >>> This means that on the ovn-controller side we need to perform a
> >>> Load_Balancer table lookup by UUID whenever a new datapath is added,
> >>> but that doesn't happen too often and the cost of the lookup is
> >>> negligible compared to the huge cost of processing the unnecessary
> >>> logical flow updates.
> >>>
> >>> This change is backwards compatible because the contents stored in the
> >>> database are not changed, just that the schema constraints are
relaxed a
> >>> bit.
> >>>
> >>> Some performance measurements, on a scale test deployment simulating
an
> >>> ovn-kubernetes deployment with 120 nodes and a large load balancer
> >>> with 16K VIPs associated to each node's logical switch, the event
> >>> processing loop time in ovn-controller, when adding a new VIP, is
> >>> reduced from ~39 seconds to ~8 seconds.
> >>>
> >>> There's no need to change the northd DDlog implementation.
> >>>
> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
> >>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
> >>> ---
> >>>   controller/lflow.c          |  6 ++++--
> >>>   controller/lflow.h          |  2 ++
> >>>   controller/ovn-controller.c |  2 ++
> >>>   lib/inc-proc-eng.h          |  1 +
> >>>   northd/ovn-northd.c         | 14 ++++++--------
> >>>   ovn-sb.ovsschema            |  6 ++----
> >>>   6 files changed, 17 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/controller/lflow.c b/controller/lflow.c
> >>> index 60aa011ff..877bd49b0 100644
> >>> --- a/controller/lflow.c
> >>> +++ b/controller/lflow.c
> >>> @@ -1744,8 +1744,10 @@ lflow_processing_end:
> >>>       /* Add load balancer hairpin flows if the datapath has any load
> > balancers
> >>>        * associated. */
> >>>       for (size_t i = 0; i < dp->n_load_balancers; i++) {
> >>> -        consider_lb_hairpin_flows(dp->load_balancers[i],
> >>> -                                  l_ctx_in->local_datapaths,
> >>> +        const struct sbrec_load_balancer *lb =
> >>> +            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
> >>> +                                             &dp->load_balancers[i]);
> >
> > We should never directly access DB tables this way within the I-P
engine.
> > We will lose track of the dependencies. Now that we removed the
reference,
> > we should add load_balancer table as a new dependency, get the table
data
> > through EN_OVSDB_GET, and explicitly add a change handler to avoid
> > recompute.
> >
>
> There's some confusion here.
>
> Datapath doesn't really depend on Load_Balancer.  From the beginning
> there shouldn't have been a reference from Datapath_Binding to
> Load_Balancer.
>
> As a matter of fact NB.Load_Balancer used to be completely handled by
> ovn-northd and would generate logical flows.  So the real dependency is
> "en_lflow_output" depends on SB load balancers.
>
> We already have an I-P handler for that:
> lflow_output_sb_load_balancer_handler()
>
> We don't *really* need to store load_balancer references in
> Datapath_Binding.  The only reason they were added there when
> Load_Balancer flow generation was moved to ovn-controller was to avoid
> walking all load balancers whenever a datapath was added.
>
> However, when a load balancer is changed there's no change to be handled
> for datapaths themselves.
>
> As a matter of fact, an alternative implementation (that obviously
> doesn't scale well) is:
>
> SBREC_LOAD_BALANCER_TABLE_FOR_EACH (lb, l_ctx_in->lb_table) {
>    for (size_t i = 0; i < lb->n_datapaths; i++) {
>       if (lb->datapaths[i] == dp) {
>           consider_lb_hairpin_flows(lb, ...);
>           break;
>       }
>    }
> }
>
> To avoid this potentially expensive table walk, we use the load_balancer
> uuids stored in the datapath record itself (it's probably best to see
> those as hints I guess).
>
Thanks for the explain. What you described is indeed a dependency between
lflow and sb_load_balancer because in lflow's compute/change handlers
sb_load_balancer data is required. (otherwise we would not need to call
sbrec_load_balancer_get_for_uuid().

However, since this dependency is already captured in the I-P, it is just
easy for this use case. We should simply use
sbrec_load_balancer_table_get_for_uuid() instead, which takes struct
sbrec_load_balancer_table* as argument and we already have it in the
lflow_ctx_in.lb_table as the input to lflow engine node.

> >>> +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
> >>>                                     l_ctx_out->flow_table);
> >>>       }
> >>>
> >>> diff --git a/controller/lflow.h b/controller/lflow.h
> >>> index c17ff6dd4..5ba7af894 100644
> >>> --- a/controller/lflow.h
> >>> +++ b/controller/lflow.h
> >>> @@ -40,6 +40,7 @@
> >>>   #include "openvswitch/list.h"
> >>>
> >>>   struct ovn_extend_table;
> >>> +struct ovsdb_idl;
> >>>   struct ovsdb_idl_index;
> >>>   struct ovn_desired_flow_table;
> >>>   struct hmap;
> >>> @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct
> > lflow_resource_ref *);
> >>>   void lflow_resource_clear(struct lflow_resource_ref *);
> >>>
> >>>   struct lflow_ctx_in {
> >>> +    struct ovsdb_idl *sb_idl;
> >>>       struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
> >>>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
> >>>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>> index 9050380f3..24443bad1 100644
> >>> --- a/controller/ovn-controller.c
> >>> +++ b/controller/ovn-controller.c
> >>> @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
> >>>           engine_get_input_data("port_groups", node);
> >>>       struct shash *port_groups = &pg_data->port_groups_cs_local;
> >>>
> >>> +    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
> >>>       l_ctx_in->sbrec_multicast_group_by_name_datapath =
> >>>           sbrec_mc_group_by_name_dp;
> >>>       l_ctx_in->sbrec_logical_flow_by_logical_datapath =
> >>> @@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
> >>>           }
> >>>
> >>>           struct engine_context eng_ctx = {
> >>> +            .ovnsb_idl = ovnsb_idl_loop.idl,
> >>>               .ovs_idl_txn = ovs_idl_txn,
> >>>               .ovnsb_idl_txn = ovnsb_idl_txn,
> >>>               .client_ctx = &ctrl_engine_ctx
> >>> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> >>> index 7e9f5bb70..506da51a1 100644
> >>> --- a/lib/inc-proc-eng.h
> >>> +++ b/lib/inc-proc-eng.h
> >>> @@ -64,6 +64,7 @@
> >>>   #define ENGINE_MAX_OVSDB_INDEX 256
> >>>
> >>>   struct engine_context {
> >>> +    struct ovsdb_idl *ovnsb_idl;
> >>>       struct ovsdb_idl_txn *ovs_idl_txn;
> >>>       struct ovsdb_idl_txn *ovnsb_idl_txn;
> >>>       void *client_ctx;
> >>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >>> index 570c6a3ef..8649a590b 100644
> >>> --- a/northd/ovn-northd.c
> >>> +++ b/northd/ovn-northd.c
> >>> @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx,
> > struct hmap *datapaths,
> >>>               continue;
> >>>           }
> >>>
> >>> -        const struct sbrec_load_balancer **sbrec_lbs =
> >>> -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
> >>> +        struct uuid *lb_uuids =
> >>> +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
> >>>           for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
> >>>               const struct uuid *lb_uuid =
> >>>                   &od->nbs->load_balancer[i]->header_.uuid;
> >>>               lb = ovn_northd_lb_find(lbs, lb_uuid);
> >>> -            sbrec_lbs[i] = lb->slb;
> >>> +            lb_uuids[i] = lb->slb->header_.uuid;
> >>>           }
> >>> -
> >>> -        sbrec_datapath_binding_set_load_balancers(
> >>> -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
> >>> -            od->nbs->n_load_balancer);
> >>> -        free(sbrec_lbs);
> >>> +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
> >>> +
> >  od->nbs->n_load_balancer);
> >>> +        free(lb_uuids);
> >
> > I believe we need to change the deletion part as well. Now that we don't
> > have the weak reference, we need to take care of the LB deletions by
> > removing the reference explicitly from table datapath_binding.
>
> This is handling both addition and deletion, it's just storing the list
> of load balancers that are currently associated to the logical datapath.
>
> If a load balancer was deleted it won't be in the list.
>
> We already have tests validating this works fine:
>
>
https://github.com/ovn-org/ovn/blob/ecc941c70f1675b66f6ecb2cdf09df362f62c97a/tests/ovn-northd.at#L2432
>
Oh you are right. Sorry that my brain was somehow wired with I-P when
reviewing this.

> >
> > Thanks,
> > Han
> >
> >
>
> Thanks,
> Dumitru
>
> >>>       }
> >>>   }
> >>>
> >>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> >>> index bbf60781d..57fbc3ca4 100644
> >>> --- a/ovn-sb.ovsschema
> >>> +++ b/ovn-sb.ovsschema
> >>> @@ -1,7 +1,7 @@
> >>>   {
> >>>       "name": "OVN_Southbound",
> >>>       "version": "20.18.0",
> >>> -    "cksum": "1816525029 26536",
> >>> +    "cksum": "779623696 26386",
> >>>       "tables": {
> >>>           "SB_Global": {
> >>>               "columns": {
> >>> @@ -166,9 +166,7 @@
> >>>                        "type": {"key": {"type": "integer",
> >>>                                         "minInteger": 1,
> >>>                                         "maxInteger": 16777215}}},
> >>> -                "load_balancers": {"type": {"key": {"type": "uuid",
> >>> -                                                   "refTable":
> > "Load_Balancer",
> >>> -                                                   "refType":
"weak"},
> >>> +                "load_balancers": {"type": {"key": {"type": "uuid"},
> >>>                                               "min": 0,
> >>>                                               "max": "unlimited"}},
> >>>                   "external_ids": {
> >>>
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev at openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>


More information about the dev mailing list