[ovs-dev] [patch_v10] ovn: Add datapaths of interest filtering.

Darrell Ball dball at vmware.com
Wed Dec 7 03:53:24 UTC 2016



On 12/4/16, 9:48 PM, "Mickey Spiegel" <mickeys.dev at gmail.com> wrote:

    On Sun, Dec 4, 2016 at 4:13 PM, Darrell Ball <dlu998 at gmail.com> wrote:
    
    > This patch adds datapaths of interest support, where only datapaths of
    > local interest are monitored by the ovn-controller ovsdb client.  The
    > idea is to do a flood fill in ovn-controller of datapath associations
    > calculated by northd.  A new column, related_datapaths, is added to the
    > SB database datapath_binding table to facilitate this, so all datapath
    > associations are known quickly in ovn-controller.  This allows
    > monitoring to adapt quickly, with a single new monitor setting covering
    > all datapaths of interest locally.
    >
    
    This is getting close. I have a few questions and comments inline.
    
    >
    > We monitor both logical flows and logical ports.  The mac bindings
    > will likely go away and multicast groups are relatively few.
    > To reduce risk and overall latency, we only suppress logical flows as
    > a starting point.  Logical ports are not suppressed as a starting
    > point, but are monitor suppressed as appropriate from that point on.
    > The initial suppression has little practical gain, and not doing it
    > allows us to be flexible in how different port types are supported.
    >
    
    I lack familiarity with conditional monitoring behavior when you do
    not explicitly suppress as a starting point. A new hv comes up, and
    starts downloading stuff from SB, including port_bindings. At some
    point the first VIF on that hv comes up, as a result of which the hv's
    controller calls sbrec_port_binding_add_clause_logical_port.
    Do all other port_bindings get suppressed at that point?
    If so, won't this lead to different behavior between the first time a
    VIF comes up on a hv, and subsequent additions of VIFs on
    unrelated and previously unseen datapaths?
    Wouldn't the behavior be more consistent if you suppress from the
    beginning?

I commented on this before; I wanted to push this part until later; but fine, let us push it now.
The issue is l2gateway logical ports, which may legitimately exist as the only port in an LS datapath
on a given HV.
L2gateway LPs are only known on an HV once the LP l2gateway-chassis option is downloaded to that HV.
If we want initial suppression of LPs, then we need to do something special for l2gateway ports.
For now, monitor all l2gateway ports unconditionally.
An l2gateway is tied to both a logical port and a chassis; not to a datapath, as in the case of l3gateway.
(Possibly later, a physical HV-level external_id could be used to signal the presence of the l2gateway
logical port, similarly to VIFs.)
The number of l2gateway ports could be on the order of the number of tenants, at least.
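
A minimal sketch of what "monitor all l2gateway ports unconditionally" could
look like, assuming the generated clause helper for the Port_Binding type
column follows the same pattern as sbrec_port_binding_add_clause_logical_port
used elsewhere in this patch:

    /* Hypothetical: request every row whose type is "l2gateway", since an
     * HV cannot know it hosts an l2gateway LP until that row arrives. */
    sbrec_port_binding_add_clause_type(ctx->ovnsb_idl, OVSDB_F_EQ,
                                       "l2gateway");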

    
    >
    > To validate the gain, we verify the l3gateway router flow suppression
    > from the non-l3gateway HVs' point of view and also add a separate
    > test to verify the flow and port handling scale advantages.
    > The separate test verifies the logical port and logical flow
    > suppression advantage.  It measures the difference in flows and ports
    > sent to the HV clients, where the far majority of the overall work is.
    > The fewer ports and flows that are unnecessarily processed on the HVs,
    > the greater the benefit, excluding latency skew.  The separate test is
    > simple to interpret and shows an order of magnitude advantage;
    > the test uses only logical switches.  The advantage would increase with
    > logical routers.  The test is conservative in estimating the numerical
    > advantage, to increase the probability of the check passing the first
    > time, since there are some timing considerations that affect the numbers.
    >
    
    This description should be updated to mention the distributed
    logical router tests that you added.


sure
    
    >
    > Liran Schour contributed enhancements to the conditional
    > monitoring granularity in ovs and also submitted patches
    > for ovn usage of conditional monitoring which aided this patch
    > though sharing of concepts through code review work.
    >
    > Ben Pfaff suggested that northd could be used to pre-populate
    > related datapaths for ovn-controller to use.  That idea is
    > used as part of this patch.
    >
    > CC: Ben Pfaff <blp at ovn.org>
    > CC: Liran Schour <LIRANS at il.ibm.com>
    > Signed-off-by: Darrell Ball <dlu998 at gmail.com>
    > ---
    >
    > v9->v10: Fix ordering issue on northd as pointed out by Mickey.
    >
    >          Remove patch flood fill triggering optimization to
    >          make the code easier to verify and harder to break.
    >
    >          Fix a couple of issues on the ovn-controller side, including
    >          stale checking.
    >
    >          Add lots more focused testing - now 2 separate extensive
    >          tests and a beefed-up l3gateway test.
    >
    >          Remove the dependency on port filtering for l3gateway
    >          filtering and associate the l3gateway chassis name with the
    >          datapath, similarly to logical ports.  There was some agreement
    >          on adding this extra field to the datapath binding table.
    >
    
    I certainly agree. It is good to have the filtering depend only
    on the contents of datapath_binding, with no dependencies
    on port_bindings. This way there is no issue with ordering
    dependencies when filtering port_bindings.
    
    >
    > v8->v9: Reinstate logical port monitoring from patch V3
    >         now to handle some LP-heavy topologies, especially
    >         in light of the lack of incremental processing right now.
    >         The original plan was to forgo monitoring logical ports
    >         in the first commit, but without incremental processing,
    >         port processing gains importance.
    >
    >         Parse out some functions
    >
    >         Add a separate test for conditional monitoring to
    >         guesstimate the reduction in logical port and flow events
    >         seen at the HVs.
    >
    > v7->v8: Start with logical flow conditional monitoring off.
    >         Remove deprecated code.
    >         Recover SB DB version number change.
    >         Change test to be more definitive.
    >
    > v6->v7: Rebase
    >
    > v5->v6: Rebase; fix stale handling.
    >
    > v4->v5: Correct cleanup of monitors.
    >         Fix warning.
    >
    > v3->v4: Refactor after incremental processing backout.
    >         Limit filtering to logical flows to limit risk.
    >
    > v2->v3: Line length violation fixups :/
    >
    > v1->v2: Added logical port removal monitoring handling, factoring
    >         in recent changes for incremental processing.  Added some
    >         comments in the code regarding initial conditions for
    >         database conditional monitoring.
    >
    >  ovn/controller/binding.c        |  32 +++-
    >  ovn/controller/lflow.c          |   5 +
    >  ovn/controller/ovn-controller.c | 188 ++++++++++++++++++++
    >  ovn/controller/ovn-controller.h |  10 ++
    >  ovn/controller/patch.c          |  24 +++
    >  ovn/northd/ovn-northd.c         | 121 +++++++++++++
    >  ovn/ovn-sb.ovsschema            |  16 +-
    >  ovn/ovn-sb.xml                  |  13 ++
    >  tests/ovn.at                    | 371 +++++++++++++++++++++++++++++++++++++++-
    >  9 files changed, 770 insertions(+), 10 deletions(-)
    >
    > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
    > index fb76032..f24e2e6 100644
    > --- a/ovn/controller/binding.c
    > +++ b/ovn/controller/binding.c
    > @@ -68,7 +68,8 @@ static void
    >  get_local_iface_ids(const struct ovsrec_bridge *br_int,
    >                      struct shash *lport_to_iface,
    >                      struct sset *all_lports,
    > -                    struct sset *egress_ifaces)
    > +                    struct sset *egress_ifaces,
    > +                    const struct controller_ctx *ctx)
    >  {
    >      int i;
    >
    > @@ -88,6 +89,9 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int,
    >              iface_id = smap_get(&iface_rec->external_ids, "iface-id");
    >
    >              if (iface_id) {
    > +                sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,
    > +                                                           OVSDB_F_EQ,
    > +                                                           iface_id);
    >                  shash_add(lport_to_iface, iface_id, iface_rec);
    >                  sset_add(all_lports, iface_id);
    >              }
    > @@ -106,8 +110,13 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int,
    >
    >  static void
    >  add_local_datapath(struct hmap *local_datapaths,
    > -        const struct sbrec_port_binding *binding_rec)
    > +        const struct sbrec_port_binding *binding_rec,
    > +        const struct controller_ctx *ctx,
    > +        const struct sbrec_chassis *chassis_rec)
    >  {
    > +
    > +    do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);
    > +
    >
    
    In previous versions, this was at the end of add_local_datapath.
    In this version you moved it to the beginning.
    Is there an advantage in running flood fill when the second, third,
    fourth, etc ports are added for the same datapath?
    It does add processing.
    
All tests pass either way.

I moved it here because, conceptually, I don’t want the local_datapaths check stopping
the datapaths of interest checking.
i.e. I don’t want any short-circuiting of one feature by another.

    
    >      if (get_local_datapath(local_datapaths,
    >                             binding_rec->datapath->tunnel_key)) {
    >          return;
    > @@ -293,7 +302,7 @@ consider_local_datapath(struct controller_ctx *ctx,
    >              /* Add child logical port to the set of all local ports. */
    >              sset_add(all_lports, binding_rec->logical_port);
    >          }
    > -        add_local_datapath(local_datapaths, binding_rec);
    > +        add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
    >          if (iface_rec && qos_map && ctx->ovs_idl_txn) {
    >              get_qos_params(binding_rec, qos_map);
    >          }
    > @@ -328,7 +337,7 @@ consider_local_datapath(struct controller_ctx *ctx,
    >          }
    >
    >          sset_add(all_lports, binding_rec->logical_port);
    > -        add_local_datapath(local_datapaths, binding_rec);
    > +        add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
    >          if (binding_rec->chassis == chassis_rec) {
    >              return;
    >          }
    > @@ -342,7 +351,7 @@ consider_local_datapath(struct controller_ctx *ctx,
    >          const char *chassis = smap_get(&binding_rec->options,
    >                                         "l3gateway-chassis");
    >          if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {
    > -            add_local_datapath(local_datapaths, binding_rec);
    > +            add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
    >          }
    >      } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
    >          if (ctx->ovnsb_idl_txn) {
    > @@ -351,6 +360,9 @@ consider_local_datapath(struct controller_ctx *ctx,
    >              for (int i = 0; i < binding_rec->n_mac; i++) {
    >                  VLOG_INFO("Releasing %s", binding_rec->mac[i]);
    >              }
    > +            sbrec_port_binding_remove_clause_logical_port(
    > +                ctx->ovnsb_idl, OVSDB_F_EQ, binding_rec->logical_port);
    > +
    >              sbrec_port_binding_set_chassis(binding_rec, NULL);
    >              sset_find_and_delete(all_lports, binding_rec->logical_port);
    >          }
    > @@ -381,13 +393,21 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
    >      hmap_init(&qos_map);
    >      if (br_int) {
    >          get_local_iface_ids(br_int, &lport_to_iface, all_lports,
    > -                            &egress_ifaces);
    > +                            &egress_ifaces, ctx);
    >      }
    >
    > +    set_all_datapaths_of_interest_stale();
    > +
    >      /* Run through each binding record to see if it is resident on this
    >       * chassis and update the binding accordingly.  This includes both
    >       * directly connected logical ports and children of those ports. */
    >      SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
    > +
    > +        /* Set any datapath, used by any port and that we are monitoring as used,
    >
    
    This line is too long.

fixed
    
    
    > +         * so we don't remove the monitoring.
    > +         */
    > +        set_datapath_of_interest_used(binding_rec->datapath);
    > +
    >
    
    This seems too broad. This would set the datapath as used if
    you have received any SB port_bindings for that datapath, even
    if those port_bindings are not relevant to this chassis.

It is not broad; it checks the datapaths of interest database and, if a datapath is monitored,
it sets it to NOT STALE; otherwise it changes nothing.

    How would datapaths ever be removed?

Datapaths get removed when no logical port refers to them. This is the definition of a stale
datapath, so I know it has no dependencies.
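
To spell out the mark-and-sweep cycle (a sketch assembled from the hunks in
this patch, not new code):

    /* Each binding_run() pass over the SB Port_Binding table: */
    set_all_datapaths_of_interest_stale();      /* mark every monitored dp */
    SBREC_PORT_BINDING_FOR_EACH (binding_rec, ctx->ovnsb_idl) {
        /* A monitored datapath still referenced by some port binding is
         * flipped back to not-stale; unmonitored datapaths are a no-op. */
        set_datapath_of_interest_used(binding_rec->datapath);
        /* ... the rest of the per-binding processing ... */
    }
    /* Later in the main loop: drop the monitor clauses of whatever stayed
     * stale, i.e. datapaths that no logical port refers to anymore. */
    clean_stale_datapaths_of_interest(&ctx);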

    Why isn't the stale flag processing in add_datapath_of_interest
    enough?
    Was there some corner case that was missed?

I think the code in consider_local_datapath() could filter calls to
add_local_datapath(); for example, when we don’t have a transaction:

        if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {

I don’t want to have a dependency on something that should be irrelevant.

    
    >          consider_local_datapath(ctx, chassis_rec, binding_rec,
    >                                  sset_is_empty(&egress_ifaces) ? NULL :
    >                                  &qos_map, local_datapaths,
    >                                  &lport_to_iface,
    > diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
    > index 4e67365..765eae4 100644
    > --- a/ovn/controller/lflow.c
    > +++ b/ovn/controller/lflow.c
    > @@ -167,6 +167,11 @@ consider_logical_flow(const struct lport_index *lports,
    >      if (!ldp) {
    >          return;
    >      }
    > +
    > +    VLOG_DBG("consider logical flow: tunnel_key %lu "
    > +             "\"match\" \"%s\", \"actions\" \"%s\"",
    > +             ldp->tunnel_key, lflow->match, lflow->actions);
    > +
    >      if (is_switch(ldp)) {
    >          /* For a logical switch datapath, local_datapaths tells us if there
    >           * are any local ports for this datapath.  If not, we can skip
    > diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
    > index fea7841..c06bd81 100644
    > --- a/ovn/controller/ovn-controller.c
    > +++ b/ovn/controller/ovn-controller.c
    > @@ -69,6 +69,14 @@ OVS_NO_RETURN static void usage(void);
    >
    >  static char *ovs_remote;
    >
    > +struct dp_of_interest_node {
    > +    struct hmap_node hmap_node;
    > +    const struct sbrec_datapath_binding *sb_db;
    > +    bool stale;
    > +};
    > +
    > +static struct hmap dps_of_interest = HMAP_INITIALIZER(&dps_of_interest);
    > +
    >  struct local_datapath *
    >  get_local_datapath(const struct hmap *local_datapaths, uint32_t tunnel_key)
    >  {
    > @@ -128,6 +136,181 @@ get_bridge(struct ovsdb_idl *ovs_idl, const char *br_name)
    >      return NULL;
    >  }
    >
    > +static bool
    > +remote_l3gat_lrd_port(const struct sbrec_datapath_binding *dp,
    > +                             const struct sbrec_chassis *chassis_rec)
    > +{
    > +    const char *l3gateway_chassis = smap_get(&dp->options,
    > +                                             "l3gateway-chassis");
    > +    if(l3gateway_chassis){
    > +        if(strcmp(l3gateway_chassis, chassis_rec->name)){
    > +            return true;
    > +        }
    > +    }
    > +    return false;
    > +}
    > +
    > +void
    > +set_all_datapaths_of_interest_stale(void)
    > +{
    > +    struct dp_of_interest_node *node;
    > +    HMAP_FOR_EACH(node, hmap_node, &dps_of_interest) {
    > +            node->stale = true;
    > +    }
    > +}
    > +
    > +static void
    > +remove_datapath_of_interest(struct controller_ctx *ctx,
    > +                            const struct sbrec_datapath_binding *dp)
    >
    
    The only call to remove_datapath_of_interest is from
    clean_stale_datapaths_of_interest.
    Why not pass in dp_cur_node rather than dp, and avoid
    the hash lookup?
    Or just put all the logic in
    clean_stale_datapaths_of_interest?

Actually, remove_datapath_of_interest() is left over from an earlier revision of the code,
when it had an external caller as well.
Hence, remove_datapath_of_interest() is deprecated now; it is gone.

    
    > +{
    > +
    > +    struct dp_of_interest_node *node;
    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
    > +                             &dps_of_interest) {
    > +        if (ctx && uuid_equals(&node->sb_db->header_.uuid,
    > +                               &dp->header_.uuid)) {
    > +
    > +            sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,
    > +                                                      OVSDB_F_EQ,
    > +                                                      &dp->header_.uuid);
    > +            sbrec_logical_flow_remove_clause_logical_datapath(
    > +                ctx->ovnsb_idl, OVSDB_F_EQ, &dp->header_.uuid);
    > +            hmap_remove(&dps_of_interest, &node->hmap_node);
    > +            free(node);
    > +            return;
    > +        }
    > +    }
    > +}
    > +
    > +static void
    > +clean_stale_datapaths_of_interest(struct controller_ctx *ctx)
    > +{
    > +    struct dp_of_interest_node *dp_cur_node, *dp_next_node;
    > +    HMAP_FOR_EACH_SAFE (dp_cur_node, dp_next_node, hmap_node,
    > +                        &dps_of_interest) {
    > +        if(dp_cur_node->stale == true) {
    > +            remove_datapath_of_interest(ctx, dp_cur_node->sb_db);
    > +        }
    > +    }
    > +}
    > +
    > +void
    > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp)
    > +{
    > +
    > +    struct dp_of_interest_node *node;
    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
    > +                             &dps_of_interest) {
    > +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
    > +            node->stale = false;
    > +        }
    > +    }
    > +}
    > +
    > +static void
    > +add_datapath_of_interest(const struct controller_ctx *ctx,
    > +                         const struct sbrec_datapath_binding *dp)
    > +{
    > +    struct dp_of_interest_node *node;
    > +
    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
    > +                             &dps_of_interest) {
    > +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
    > +            node->stale = false;
    > +            return;
    > +        }
    > +    }
    > +
    > +    VLOG_DBG("monitor datapath with tunnel_key %lu", dp->tunnel_key);
    > +
    > +    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl,
    > +                                                   OVSDB_F_EQ,
    > +                                                   &dp->header_.uuid);
    > +    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
    > +                                           &dp->header_.uuid);
    > +
    > +    node = xzalloc(sizeof *node);
    > +    hmap_insert(&dps_of_interest, &node->hmap_node,
    > +                uuid_hash(&dp->header_.uuid));
    > +    node->sb_db = dp;
    > +    return;
    > +}
    > +
    > +void
    > +do_datapath_flood_fill(const struct controller_ctx *ctx,
    > +                       const struct sbrec_datapath_binding *dp_start,
    > +                       const struct sbrec_chassis *chassis_rec)
    > +{
    > +
    > +    if(remote_l3gat_lrd_port(dp_start, chassis_rec)) {
    > +        return;
    > +    }
    > +
    > +    struct dp_checked {
    > +        struct hmap_node hmap_node;
    > +        const struct sbrec_datapath_binding *sb_db;
    > +    };
    > +    struct dp_checked *dp_node;
    > +    struct hmap related_dp_processed = HMAP_INITIALIZER(&related_dp_processed);
    >
    
    This third level of lists is not easy to follow, having
    dps_of_interest

    related_dp_processed
This map remembers what we have already visited, so we do not follow it again
while processing one flood fill list.

    interest_datapaths


    It looks like this is similar to the flood_fill placement in
    add_local_datapath.
    Should each datapath be considered once in each run
    through the main ovn-controller.c loop, such that
    different flood_fill runs going through the same
    datapaths do not investigate any further (as in v9)?
    Or should each flood_fill run to completion,
    recalculating the entire set of related datapaths each
    time that flood_fill is called, even if many of those
    datapaths are already present in dps_of_interest
    (as in this v10)?

Yes, I changed it from v9 to v10.
The number of datapaths is small relative to the number of ports, so it is
only slightly more expensive.
The issue is premature cleanup due to partial port bindings at startup; I’ll investigate this more later.
Also, in general, I don’t want to make any assumptions about the input data that I don’t have to.
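
Re-running to completion is also idempotent; paraphrasing add_datapath_of_interest()
from the hunk above, a datapath that is already monitored only has its stale
flag refreshed, and no duplicate clauses are added:

    /* Already monitored?  Just refresh the stale flag and stop. */
    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
                             &dps_of_interest) {
        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
            node->stale = false;
            return;
        }
    }
    /* Otherwise add the logical_flow and port_binding clauses exactly once. */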

    > +
    > +    struct interest_dps_list_elem {
    > +        struct ovs_list list_node;
    > +        const struct sbrec_datapath_binding * dp;
    > +    };
    > +    struct ovs_list interest_datapaths;
    > +    ovs_list_init(&interest_datapaths);
    > +
    > +    struct interest_dps_list_elem *dp_list_elem =
    > +        xzalloc(sizeof *dp_list_elem);
    > +    dp_list_elem->dp = dp_start;
    > +    ovs_list_push_back(&interest_datapaths, &dp_list_elem->list_node);
    > +
    > +    while (!ovs_list_is_empty(&interest_datapaths)) {
    > +
    > +        struct ovs_list *list_node_ptr =
    > +            ovs_list_pop_front(&interest_datapaths);
    > +        dp_list_elem = CONTAINER_OF(list_node_ptr,
    > +            struct interest_dps_list_elem, list_node);
    > +        size_t num_related_datapaths =
    > +            dp_list_elem->dp->n_related_datapaths;
    > +        struct sbrec_datapath_binding **related_datapaths =
    > +            dp_list_elem->dp->related_datapaths;
    > +
    > +        dp_node = xzalloc(sizeof *dp_node);
    > +        dp_node->sb_db = dp_list_elem->dp;
    > +        hmap_insert(&related_dp_processed, &dp_node->hmap_node,
    > +                     uuid_hash(&dp_node->sb_db->header_.uuid));
    > +
    > +        add_datapath_of_interest(ctx, dp_list_elem->dp);
    > +        free(dp_list_elem);
    > +        for (int i = 0; i < num_related_datapaths; i++) {
    > +            bool found = false;
    > +            HMAP_FOR_EACH_WITH_HASH (
    > +                    dp_node, hmap_node,
    > +                    uuid_hash(&related_datapaths[i]->header_.uuid),
    > +                    &related_dp_processed) {
    > +
    > +                if (uuid_equals(&dp_node->sb_db->header_.uuid,
    > +                                &related_datapaths[i]->header_.uuid)) {
    > +                    found = true;
    > +                    break;
    > +                }
    > +            }
    > +            if (!found &&
    > +                !remote_l3gat_lrd_port(related_datapaths[i], chassis_rec)) {
    > +                dp_list_elem = xzalloc(sizeof *dp_list_elem);
    > +                dp_list_elem->dp = related_datapaths[i];
    > +                ovs_list_push_back(&interest_datapaths,
    > +                                   &dp_list_elem->list_node);
    > +            }
    > +        }
    > +    }
    > +    struct dp_checked *cur_node, *next_node;
    > +    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,
    > +                        &related_dp_processed) {
    > +        hmap_remove(&related_dp_processed, &cur_node->hmap_node);
    > +        free(cur_node);
    > +    }
    > +    hmap_destroy(&related_dp_processed);
    > +}
    > +
    >  static const struct ovsrec_bridge *
    >  create_br_int(struct controller_ctx *ctx,
    >                const struct ovsrec_open_vswitch *cfg,
    > @@ -465,6 +648,10 @@ main(int argc, char *argv[])
    >          ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
    >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
    >
    > +    /* Start with suppressing all logical flows and then later request
    > +     * those that are specifically needed. */
    > +    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);
    > +
    >      /* Track the southbound idl. */
    >      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
    >
    > @@ -561,6 +748,7 @@ main(int argc, char *argv[])
    >              lport_index_destroy(&lports);
    >          }
    >
    > +        clean_stale_datapaths_of_interest(&ctx);
    >          sset_destroy(&all_lports);
    >
    >          struct local_datapath *cur_node, *next_node;
    > diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
    > index 8a3e855..e35b5b0 100644
    > --- a/ovn/controller/ovn-controller.h
    > +++ b/ovn/controller/ovn-controller.h
    > @@ -76,6 +76,16 @@ const struct ovsrec_bridge *get_bridge(struct ovsdb_idl *,
    >  const struct sbrec_chassis *get_chassis(struct ovsdb_idl *,
    >                                          const char *chassis_id);
    >
    > +void do_datapath_flood_fill(const struct controller_ctx *,
    > +                            const struct sbrec_datapath_binding *,
    > +                            const struct sbrec_chassis *);
    > +
    > +void
    > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp);
    > +
    > +void
    > +set_all_datapaths_of_interest_stale(void);
    > +
    >  /* Must be a bit-field ordered from most-preferred (higher number) to
    >   * least-preferred (lower number). */
    >  enum chassis_tunnel_type {
    > diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c
    > index 79a7d81..9e8e856 100644
    > --- a/ovn/controller/patch.c
    > +++ b/ovn/controller/patch.c
    > @@ -330,6 +330,9 @@ add_logical_patch_ports(struct controller_ctx *ctx,
    >              free(dst_name);
    >              free(src_name);
    >              add_patched_datapath(patched_datapaths, binding, local_port);
    > +
    > +            sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,
    > +                                                       OVSDB_F_EQ, peer);
    >
    
    Why is this necessary?
    This could add logical ports that are unrelated to this chassis.

This is for peer patch ports.
It has no benefit anymore, and it would monitor some patch ports unnecessarily going forward.
I removed it and the corresponding cleanup.
The patch.c changes are now zero.

    The northd logic adds related_datapaths for any patch port.
    The flood fill based on related_datapaths should take care of
    datapaths across patch ports, starting from datapaths that
    this chassis does care about.
    
    The add_local_datapath logic works for starting points
    including vifs, l2gateway, and l3gateway.
    What do you think is missing?


    >              if (local_port) {
    >                  if (binding->chassis != chassis_rec && ctx->ovnsb_idl_txn) {
    >                      sbrec_port_binding_set_chassis(binding, chassis_rec);
    > @@ -339,6 +342,26 @@ add_logical_patch_ports(struct controller_ctx *ctx,
    >      }
    >  }
    >
    > +/* Non-logical patch ports should be unaffected. */
    > +static void
    > +remove_patch_port_monitors(struct controller_ctx *ctx,
    > +                            const struct ovsrec_port *port)
    > +{
    > +    for (size_t i = 0; i < port->n_interfaces; i++) {
    > +        struct ovsrec_interface *iface = port->interfaces[i];
    > +        if (strcmp(iface->type, "patch")) {
    > +            continue;
    >
    
    I believe your code adds clauses for l3gateway ports above,
    but does not remove those monitors here.

    
    > +        }
    > +        const char *iface_id = smap_get(&iface->external_ids,
    > +                                        "iface-id");
    > +        if (iface_id) {
    
    
    Patch interfaces don't have iface_id. They have options "peer",
    and the ports (not interfaces) have external_ids with key
    "ovn-logical-patch-port" or "ovn-l3gateway-port".

That was a sloppy copy/paste from binding.c, checking for external ids, that I never finished afterwards.
I intended to check for these:
            smap_get(&port->external_ids, "ovn-l3gateway-port")
            smap_get(&port->external_ids, "ovn-logical-patch-port")

The iface_id argument below also did not get updated.
This code is deprecated now anyway.
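
For the record, the intended (now dropped) check would presumably have looked
something like this hypothetical sketch, keying off the port-level
external_ids rather than an interface iface-id:

    /* Hypothetical: OVN-created patch ports carry one of these port-level
     * external_ids keys; patch interfaces have no iface-id. */
    const char *pb_name = smap_get(&port->external_ids, "ovn-l3gateway-port");
    if (!pb_name) {
        pb_name = smap_get(&port->external_ids, "ovn-logical-patch-port");
    }
    if (pb_name) {
        sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
                                                      OVSDB_F_EQ, pb_name);
    }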


    >
    
    > +            sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
    > +                                                          OVSDB_F_EQ,
    > +                                                          iface_id);
    > +        }
    > +    }
    > +}
    > +
    >  void
    >  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
    >            const char *chassis_id, struct hmap *local_datapaths,
    > @@ -374,6 +397,7 @@ patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
    >      SHASH_FOR_EACH_SAFE (port_node, port_next_node, &existing_ports) {
    >          struct ovsrec_port *port = port_node->data;
    >          shash_delete(&existing_ports, port_node);
    > +        remove_patch_port_monitors(ctx, port);
    >          remove_port(ctx, port);
    >      }
    >      shash_destroy(&existing_ports);
    > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
    > index f2dc353..7ea5dc9 100644
    > --- a/ovn/northd/ovn-northd.c
    > +++ b/ovn/northd/ovn-northd.c
    > @@ -231,6 +231,42 @@ struct tnlid_node {
    >      uint32_t tnlid;
    >  };
    >
    > +struct related_datapath_node {
    > +    struct hmap_node hmap_node;
    > +    const struct sbrec_datapath_binding *sb_db;
    > +};
    > +
    > +static void
    > +add_related_datapath(struct hmap *related_datapaths,
    > +                     const struct sbrec_datapath_binding *sb,
    > +                     size_t *n_related_datapaths)
    > +{
    > +    struct related_datapath_node *node;
    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
    > +                             uuid_hash(&sb->header_.uuid),
    > +                             related_datapaths) {
    > +        if (uuid_equals(&sb->header_.uuid, &node->sb_db->header_.uuid)) {
    > +            return;
    > +        }
    > +    }
    > +
    > +    node = xzalloc(sizeof *node);
    > +    hmap_insert(related_datapaths, &node->hmap_node,
    > +                uuid_hash(&sb->header_.uuid));
    > +    node->sb_db = sb;
    > +    (*n_related_datapaths)++;
    > +}
    > +
    > +static void
    > +destroy_related_datapaths(struct hmap *related_datapaths)
    > +{
    > +    struct related_datapath_node *node;
    > +    HMAP_FOR_EACH_POP (node, hmap_node, related_datapaths) {
    > +        free(node);
    > +    }
    > +    hmap_destroy(related_datapaths);
    > +}
    > +
    >  static void
    >  destroy_tnlids(struct hmap *tnlids)
    >  {
    > @@ -376,6 +412,8 @@ struct ovn_datapath {
    >      size_t n_router_ports;
    >
    >      struct hmap port_tnlids;
    > +    struct hmap related_datapaths;
    > +    size_t n_related_datapaths;
    >      uint32_t port_key_hint;
    >
    >      bool has_unknown;
    > @@ -426,6 +464,7 @@ ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
    >      od->nbr = nbr;
    >      hmap_init(&od->port_tnlids);
    >      hmap_init(&od->ipam);
    > +    hmap_init(&od->related_datapaths);
    >      od->port_key_hint = 0;
    >      hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
    >      return od;
    > @@ -441,6 +480,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od)
    >          hmap_remove(datapaths, &od->key_node);
    >          destroy_tnlids(&od->port_tnlids);
    >          destroy_ipam(&od->ipam);
    > +        destroy_related_datapaths(&od->related_datapaths);
    >          free(od->router_ports);
    >          free(od);
    >      }
    > @@ -671,6 +711,69 @@ struct ovn_port {
    >      struct ovs_list list;       /* In list of similar records. */
    >  };
    >
    > +static void
    > +populate_sb_related_datapaths(struct ovn_datapath *od)
    > +{
    > +    struct sbrec_datapath_binding **sb_related_datapaths
    > +        = xmalloc(sizeof(*sb_related_datapaths) * od->n_related_datapaths);
    > +    int rdi = 0;
    > +    struct related_datapath_node *related_datapath;
    > +    HMAP_FOR_EACH (related_datapath, hmap_node,
    > +                   &od->related_datapaths) {
    > +        if (rdi >= od->n_related_datapaths) {
    > +            static struct vlog_rate_limit rl
    > +                = VLOG_RATE_LIMIT_INIT(5, 1);
    > +            VLOG_ERR_RL(&rl, "related datapaths accounting error"
    > +                        UUID_FMT, UUID_ARGS(&od->key));
    > +            break;
    > +        }
    > +
    > +        sb_related_datapaths[rdi] = CONST_CAST(
    > +            struct sbrec_datapath_binding *, related_datapath->sb_db);
    > +        rdi++;
    > +    }
    > +    sbrec_datapath_binding_set_related_datapaths(od->sb,
    > +        sb_related_datapaths, od->n_related_datapaths);
    > +    free(sb_related_datapaths);
    > +}
    > +
    > +static void
    > +generate_sb_related_datapaths_from_ports(struct ovs_list *both)
    > +{
    > +    struct dp_checked {
    > +        struct hmap_node hmap_node;
    > +        const struct sbrec_datapath_binding *sb_db;
    > +    };
    > +    struct dp_checked *dp_node;
    > +    struct hmap dp_processed = HMAP_INITIALIZER(&dp_processed);
    > +    struct ovn_port *op, *next;
    > +    LIST_FOR_EACH_SAFE (op, next, list, both) {
    > +        bool found = false;
    > +        HMAP_FOR_EACH_WITH_HASH (dp_node, hmap_node,
    > +                                 uuid_hash(&op->od->sb->header_.uuid),
    > +                                 &dp_processed) {
    > +            if (uuid_equals(&dp_node->sb_db->header_.uuid,
    > +                            &op->od->sb->header_.uuid)) {
    > +                found = true;
    > +                break;
    > +            }
    > +        }
    > +        if (!found) {
    > +            dp_node = xzalloc(sizeof *dp_node);
    > +            dp_node->sb_db = op->od->sb;
    > +            hmap_insert(&dp_processed, &dp_node->hmap_node,
    > +                        uuid_hash(&dp_node->sb_db->header_.uuid));
    > +            populate_sb_related_datapaths(op->od);
    > +        }
    > +    }
    > +    struct dp_checked *cur_node, *next_node;
    > +    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &dp_processed) {
    > +        hmap_remove(&dp_processed, &cur_node->hmap_node);
    > +        free(cur_node);
    > +    }
    > +    hmap_destroy(&dp_processed);
    > +}
    > +
    >  static struct ovn_port *
    >  ovn_port_create(struct hmap *ports, const char *key,
    >                  const struct nbrec_logical_switch_port *nbsp,
    > @@ -1359,12 +1462,19 @@ ovn_port_update_sbrec(const struct ovn_port *op,
    >              sbrec_port_binding_set_type(op->sb, "patch");
    >          }
    >
    > +        if (op->peer && op->peer->od && op->peer->od->sb) {
    > +            add_related_datapath(&op->od->related_datapaths,
    > +                                 op->peer->od->sb,
    > +                                 &op->od->n_related_datapaths);
    > +        }
    > +
    >          const char *peer = op->peer ? op->peer->key : "<error>";
    >          struct smap new;
    >          smap_init(&new);
    >          smap_add(&new, "peer", peer);
    >          if (chassis) {
    >              smap_add(&new, "l3gateway-chassis", chassis);
    > +            sbrec_datapath_binding_set_options(op->sb->datapath, &new);
    >
    
    It would be cleaner to modify the datapath_binding in
    build_datapaths, though that would require addition of a run
    through the both list to update or remove "chassis" as
    necessary, as well as modifying the existing run through
    the nb_only list.
    
    I don't see any logic to remove "chassis" if that is removed
    from the nbr entry.

    With this code order, you are also adding "peer" which does
    not belong on datapath_binding. If this code stays here, then
    the smap_add "peer" line should be moved below this.

Yeah, that was sloppy.
“peer” in the datapath_binding options map will do no harm, fortunately.
But the l3gateway-chassis option conceivably may need to be cleared in the same
datapath_binding entry at some point in its lifetime; a datapath may cease to be an
l3gateway and become something else.
I moved the line.
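
For reference, the reordered code presumably ends up along these lines (a
sketch only; the context is ovn_port_update_sbrec() in this hunk):

    const char *peer = op->peer ? op->peer->key : "<error>";
    struct smap new;
    smap_init(&new);
    if (chassis) {
        /* Copy the options to the datapath before "peer" is added to the
         * smap, so "peer" stays out of datapath_binding options. */
        smap_add(&new, "l3gateway-chassis", chassis);
        sbrec_datapath_binding_set_options(op->sb->datapath, &new);
    }
    smap_add(&new, "peer", peer);
    sbrec_port_binding_set_options(op->sb, &new);
    smap_destroy(&new);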

    
    >          }
    >          sbrec_port_binding_set_options(op->sb, &new);
    >          smap_destroy(&new);
    > @@ -1411,6 +1521,12 @@ ovn_port_update_sbrec(const struct ovn_port *op,
    >                  sbrec_port_binding_set_type(op->sb, "patch");
    >              }
    >
    > +            if (op->peer && op->peer->od && op->peer->od->sb) {
    > +                add_related_datapath(&op->od->related_datapaths,
    > +                                     op->peer->od->sb,
    > +                                     &op->od->n_related_datapaths);
    > +            }
    > +
    >              const char *router_port = smap_get_def(&op->nbsp->options,
    >                                                     "router-port",
    > "<error>");
    >              struct smap new;
    > @@ -1519,6 +1635,7 @@ build_ports(struct northd_context *ctx, struct hmap *datapaths,
    >          cleanup_mac_bindings(ctx, ports);
    >      }
    >
    > +    generate_sb_related_datapaths_from_ports(&both);
    >      tag_alloc_destroy(&tag_alloc_table);
    >      destroy_chassis_queues(&chassis_qdisc_queues);
    >  }
    > @@ -4952,6 +5069,10 @@ main(int argc, char *argv[])
    >      add_column_noalert(ovnsb_idl_loop.idl,
    >                         &sbrec_datapath_binding_col_tunnel_key);
    >      add_column_noalert(ovnsb_idl_loop.idl,
    > +                       &sbrec_datapath_binding_col_related_datapaths);
    > +    add_column_noalert(ovnsb_idl_loop.idl,
    > +                       &sbrec_datapath_binding_col_options);
    > +    add_column_noalert(ovnsb_idl_loop.idl,
    >                         &sbrec_datapath_binding_col_external_ids);
    >
    >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_port_binding);
    > diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
    > index 89342fe..8dfb2f4 100644
    > --- a/ovn/ovn-sb.ovsschema
    > +++ b/ovn/ovn-sb.ovsschema
    > @@ -1,7 +1,7 @@
    >  {
    >      "name": "OVN_Southbound",
    > -    "version": "1.9.0",
    > -    "cksum": "239060528 9012",
    > +    "version": "1.10.0",
    > +    "cksum": "3562877871 9537",
    >      "tables": {
    >          "SB_Global": {
    >              "columns": {
    > @@ -93,7 +93,17 @@
    >                                        "maxInteger": 16777215}}},
    >                  "external_ids": {
    >                      "type": {"key": "string", "value": "string",
    > -                             "min": 0, "max": "unlimited"}}},
    > +                             "min": 0, "max": "unlimited"}},
    > +                "options": {
    > +                     "type": {"key": "string",
    > +                              "value": "string",
    > +                              "min": 0,
    > +                              "max": "unlimited"}},
    > +                "related_datapaths": {"type": {"key": {"type": "uuid",
    > +                                      "refTable": "Datapath_Binding",
    > +                                      "refType": "weak"},
    > +                                      "min": 0,
    > +                                      "max": "unlimited"}}},
    >              "indexes": [["tunnel_key"]],
    >              "isRoot": true},
    >          "Port_Binding": {
    > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
    > index 65191ed..88de25d 100644
    > --- a/ovn/ovn-sb.xml
    > +++ b/ovn/ovn-sb.xml
    > @@ -1556,6 +1556,19 @@ tcp.flags = RST;
    >        constructed for each supported encapsulation.
    >      </column>
    >
    > +    <column name="options" key="l3gateway-chassis">
    > +      The <code>chassis</code> in which the router datapath resides.
    > +    </column>
    > +
    > +    <column name="related_datapaths">
    > +      The related_datapaths column is used to keep track of which datapaths
    > +      are connected to each other.  This is leveraged in ovn-controller to
    > +      do a flood fill from each local logical switch to determine the full
    > +      set of datapaths needed on a given ovn-controller.  This set of
    > +      datapaths can be used to determine which logical ports and logical
    > +      flows an ovn-controller should monitor.
    > +    </column>
    > +
    >      <group title="OVN_Northbound Relationship">
    >        <p>
    >          Each row in <ref table="Datapath_Binding"/> is associated with
    > some
    > diff --git a/tests/ovn.at b/tests/ovn.at
    > index a226849..af544e1 100644
    > --- a/tests/ovn.at
    > +++ b/tests/ovn.at
    > @@ -1673,6 +1673,141 @@ OVN_CLEANUP([hv1],[hv2])
    >
    >  AT_CLEANUP
    >
    > +# 2 hypervisors, each with one logical switch with 2 logical ports.
    > +AT_SETUP([ovn -- conditional monitoring: 2 HVs each with 1 LS and 2 LPs])
    > +AT_SKIP_IF([test $HAVE_PYTHON = no])
    > +ovn_start
    > +
    > +# In this test case we create 2 switches, both connected to the same
    > +# physical network (through br-phys on each HV).  Each switch has its
    > +# VIF ports on a single HV, and each HV has 2 VIF ports.  The first
    > +# digit of a VIF port name indicates the hypervisor it is bound to,
    > +# e.g. lp12 means VIF 2 on hv1.
    > +
    > +for i in 1 2; do
    > +    ls_name=ls$i
    > +    ovn-nbctl ls-add $ls_name
    > +done
    > +
    > +net_add n1
    > +for i in 1 2; do
    > +    sim_add hv$i
    > +    as hv$i
    > +    ovs-vsctl add-br br-phys
    > +    ovn_attach n1 br-phys 192.168.0.$i
    > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
    > +
    > +    for j in 1 2; do
    > +        ovs-vsctl add-port br-int vif$i$j -- \
    > +            set Interface vif$i$j external-ids:iface-id=lp$i$j \
    > +                                  options:tx_pcap=hv$i/vif$i$j-tx.pcap \
    > +                                  options:rxq_pcap=hv$i/vif$i$j-rx.pcap \
    > +                                  ofport-request=$i$j
    > +
    > +        lsp_name=lp$i$j
    > +        if test $i -eq 1; then
    > +            ls_name=ls1
    > +        else
    > +            ls_name=ls2
    > +        fi
    > +
    > +        ovn-nbctl lsp-add $ls_name $lsp_name
    > +        ovn-nbctl lsp-set-addresses $lsp_name f0:00:00:00:00:$i$j
    > +        ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$j
    > +
    > +        OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])
    > +    done
    > +done
    > +
    > +ovn_populate_arp
    > +
    > +# XXX This is now the 3rd copy of these functions in this file ...
    > +
    > +# Given the name of a logical port, prints the name of the hypervisor
    > +# on which it is located.
    > +vif_to_hv() {
    > +    echo hv${1%?}
    > +}
    > +#
    > +# test_packet INPORT DST SRC ETHTYPE OUTPORT...
    > +#
    > +# This shell function causes a packet to be received on INPORT.  The
    > +# packet's content has Ethernet destination DST and source SRC (each
    > +# exactly 12 hex digits) and Ethernet type ETHTYPE (4 hex digits).  The
    > +# OUTPORTs (zero or more) list the VIFs on which the packet should be
    > +# received.  INPORT and the OUTPORTs are specified as logical switch
    > +# port numbers, e.g. 11 for vif11.
    > +for i in 1 2; do
    > +    for j in 1 2; do
    > +        : > $i$j.expected
    > +    done
    > +done
    > +test_packet() {
    > +    local inport=$1 src=$2 dst=$3 eth=$4; shift; shift; shift; shift
    > +    local packet=${src}${dst}${eth}
    > +    hv=`vif_to_hv $inport`
    > +    vif=vif$inport
    > +    as $hv ovs-appctl netdev-dummy/receive $vif $packet
    > +    for outport; do
    > +        echo $packet >> $outport.expected
    > +    done
    > +}
    > +
    > +# lp11 and lp12 are on the same network and on
    > +# the same hypervisor.
    > +test_packet 11 f00000000012 f00000000011 1112 12
    > +test_packet 12 f00000000011 f00000000012 1211 11
    > +
    > +# lp21 and lp22 are on the same network and on
    > +# the same hypervisor
    > +test_packet 21 f00000000022 f00000000021 2122 22
    > +test_packet 22 f00000000021 f00000000022 2221 21
    > +
    > +# lp11 and lp22 are on different networks so should
    > +# not be able to communicate.
    > +test_packet 11 f00000000022 f00000000011 1122
    > +test_packet 22 f00000000011 f00000000022 2211
    > +
    > +echo "------ OVN dump ------"
    > +ovn-nbctl show
    > +ovn-sbctl show
    > +
    > +echo "------ hv1 dump ------"
    > +as hv1 ovs-vsctl show
    > +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
    > +
    > +echo "------ hv2 dump ------"
    > +as hv2 ovs-vsctl show
    > +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
    > +
    > +# Check that all datapath components are monitored as expected.
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1], [ignore-nolog])
    > +
    > +dbg_instances1=`cat hv1/ovn-controller.log | grep 'tunnel_key 1' | wc -l`
    > +echo ls1 monitor traces on hv1 $dbg_instances1
    > +dbg_instances2=`cat hv1/ovn-controller.log | grep 'tunnel_key 2' | wc -l`
    > +echo ls2 monitor traces on hv1 $dbg_instances2
    > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])
    > +
    > +dbg_instances1=`cat hv2/ovn-controller.log | grep 'tunnel_key 2' | wc -l`
    > +echo ls2 monitor traces on hv2 $dbg_instances1
    > +dbg_instances2=`cat hv2/ovn-controller.log | grep 'tunnel_key 1' | wc -l`
    > +echo ls1 monitor traces on hv2 $dbg_instances2
    > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])
    > +
    > +# Now check the packets actually received against the ones expected.
    > +for i in 1 2; do
    > +    for j in 1 2; do
    > +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
    > +    done
    > +done
    > +
    > +OVN_CLEANUP([hv1],[hv2])
    > +
    > +AT_CLEANUP
    > +
    >  AT_SETUP([ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS])
    >  AT_KEYWORDS([vtep])
    >  AT_SKIP_IF([test $HAVE_PYTHON = no])
    > @@ -2439,6 +2574,223 @@ OVN_CLEANUP([hv1], [hv2], [hv3])
    >
    >  AT_CLEANUP
    >
    > +# 3 hypervisors, 3 logical switches with 3 logical ports each, 1 logical router
    > +AT_SETUP([ovn -- conditional Monitoring: 3 HVs, 3 LS, 3 lports/LS, 1 LR])
    > +AT_SKIP_IF([test $HAVE_PYTHON = no])
    > +ovn_start
    > +
    > +# Logical network:
    > +#
    > +# Three logical switches ls1, ls2, ls3.
    > +# One logical router lr0 connected to ls[123],
    > +# with nine subnets, three per logical switch:
    > +#
    > +#    lrp11 on ls1 for subnet 192.168.11.0/24
    > +#    lrp12 on ls1 for subnet 192.168.12.0/24
    > +#    lrp13 on ls1 for subnet 192.168.13.0/24
    
    
    This is confusing since the code below does not add
    lrp11, lrp12, lrp13, just the subnets in ls1 which is
    not attached to lr0.

That description was not cleaned up when the test was updated.
I fixed it.
    
    I am having trouble understanding the motivation for
    3 subnets per ls. It will work, just wondering why you
    thought that it is necessary or helpful?

I just needed multiple ports. Having multiple subnets makes it easier to derive
a different test case from this one in the future.
    
    >
    > +#    ...
    > +#    lrp33 on ls3 for subnet 192.168.33.0/24
    > +#
    > +# 27 VIFs, 9 per LS, 3 per subnet: lp[123][123][123], where the first two
    > +# digits are the subnet and the last digit distinguishes the VIF.
    > +for i in 1 2 3; do
    > +    ovn-nbctl ls-add ls$i
    > +    for j in 1 2 3; do
    > +        for k in 1 2 3; do
    > +            ovn-nbctl \
    > +                -- lsp-add ls$i lp$i$j$k \
    > +                -- lsp-set-addresses lp$i$j$k "f0:00:00:00:0$i:$j$k \
    > +                   192.168.$i$j.$k"
    > +        done
    > +    done
    > +done
    > +
    > +# Only ls2 and ls3 are connected by the logical router.
    > +ovn-nbctl lr-add lr0
    > +for i in 2 3; do
    > +    for j in 1 2 3; do
    > +        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j 192.168.$i$j.254/24
    > +        ovn-nbctl \
    > +            -- lsp-add ls$i lrp$i$j-attachment \
    > +            -- set Logical_Switch_Port lrp$i$j-attachment type=router \
    > +                             options:router-port=lrp$i$j \
    > +                             addresses='"00:00:00:00:ff:'$i$j'"'
    > +    done
    > +done
    > +
    > +# Physical network:
    > +#
    > +# Three hypervisors hv[123].
    > +# lp[1??] on hv1
    > +# lp[2??] on hv2.
    > +# lp[3??] on hv3.
    >
    
    I like this. So ls3 is only present on hv3?
    If so, this will test the related_datapaths part of the code, making
    sure that hv2 grabs flows for datapaths lr0 and ls3 even though
    there are no local VIFs in ls3 on hv2. This would catch the
    code order northd issue from v9.
    
    Mickey
    
    > +
    > +# Given the name of a logical port, prints the name of the hypervisor
    > +# on which it is located.
    > +vif_to_hv() {
    > +    case $1 in dnl (
    > +        1??) echo 1 ;; dnl (
    > +        2??) echo 2 ;; dnl (
    > +        3??) echo 3 ;;
    > +    esac
    > +}
    > +
    > +# Given the name of a logical port, prints the name of its logical router
    > +# port, e.g. "vif_to_lrp 123" yields 12.
    > +vif_to_lrp() {
    > +    echo ${1%?}
    > +}
    > +
    > +# Given the name of a logical port, prints the name of its logical
    > +# switch, e.g. "vif_to_ls 123" yields 1.
    > +vif_to_ls() {
    > +    echo ${1%??}
    > +}
    > +
    > +net_add n1
    > +for i in 1 2 3; do
    > +    sim_add hv$i
    > +    as hv$i
    > +    ovs-vsctl add-br br-phys
    > +    ovn_attach n1 br-phys 192.168.0.$i
    > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
    > +done
    > +for i in 1 2 3; do
    > +    for j in 1 2 3; do
    > +        for k in 1 2 3; do
    > +            hv=`vif_to_hv $i$j$k`
    > +                as hv$hv ovs-vsctl \
    > +                -- add-port br-int vif$i$j$k \
    > +                -- set Interface vif$i$j$k \
    > +                    external-ids:iface-id=lp$i$j$k \
    > +                    options:tx_pcap=hv$hv/vif$i$j$k-tx.pcap \
    > +                    options:rxq_pcap=hv$hv/vif$i$j$k-rx.pcap \
    > +                    ofport-request=$i$j$k
    > +        done
    > +    done
    > +done
    > +
    > +# Pre-populate the hypervisors' ARP tables so that we don't lose any
    > +# packets for ARP resolution (native tunneling doesn't queue packets
    > +# for ARP resolution).
    > +ovn_populate_arp
    > +
    > +# Allow some time for ovn-northd and ovn-controller to catch up.
    > +# XXX This should be more systematic.
    > +sleep 1
    > +
    > +# test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
    > +#
    > +# This shell function causes an IP packet to be received on INPORT.  The
    > +# packet's content has Ethernet source SRC_MAC and destination DST_MAC
    > +# (each exactly 12 hex digits) and IPv4 source SRC_IP and destination
    > +# DST_IP (each exactly 8 hex digits).  The OUTPORTs (zero or more) list
    > +# the VIFs on which the packet should be received.  INPORT and the
    > +# OUTPORTs are specified as logical switch port numbers, e.g. 123 for
    > +# vif123.
    > +for i in 1 2 3; do
    > +    for j in 1 2 3; do
    > +        for k in 1 2 3; do
    > +            : > $i$j$k.expected
    > +        done
    > +    done
    > +done
    > +test_ip() {
    > +    # This packet has bad checksums but logical L3 routing doesn't check.
    > +    local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
    > +    local packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
    > +    shift; shift; shift; shift; shift
    > +    hv=hv`vif_to_hv $inport`
    > +    as $hv ovs-appctl netdev-dummy/receive vif$inport $packet
    > +    in_ls=`vif_to_ls $inport`
    > +    in_lrp=`vif_to_lrp $inport`
    > +    for outport; do
    > +        out_ls=`vif_to_ls $outport`
    > +        if test $in_ls = $out_ls; then
    > +            # Ports on the same logical switch receive exactly the same
    > packet.
    > +            echo $packet
    > +        else
    > +            # Routing decrements TTL and updates source and dest MAC
    > +            # (and checksum).
    > +            out_lrp=`vif_to_lrp $outport`
    > +            echo f00000000${outport}00000000ff${out_lrp}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
    > +        fi >> $outport.expected
    > +    done
    > +}
    > +
    > +as hv1 ovn-sbctl list datapath_binding
    > +echo "----------------------------"
    > +as hv1 ovn-sbctl list port_binding
    > +echo "----------------------------"
    > +as hv1 ovn-sbctl dump-flows
    > +#as hv1 ovs-vsctl --columns=name,ofport list interface
    > +#as hv1 ovs-ofctl dump-flows br-int
    > +
    > +# Send IP packets between all pairs of source and destination ports:
    > +#
    > +# 1. Unicast IP packets are delivered to exactly one logical switch port
    > +#    (except that packets destined to their input ports are dropped).
    > +#
    > +ip_to_hex() {
    > +    printf "%02x%02x%02x%02x" "$@"
    > +}
    > +for is in 1 2 3; do
    > +  for js in 1 2 3; do
    > +    for ks in 1 2 3; do
    > +      s=$is$js$ks
    > +      smac=f00000000$s
    > +      sip=`ip_to_hex 192 168 $is$js $ks`
    > +      for id in 1 2 3; do
    > +          for jd in 1 2 3; do
    > +              for kd in 1 2 3; do
    > +                d=$id$jd$kd
    > +                dip=`ip_to_hex 192 168 $id$jd $kd`
    > +                if test $is = $id; then dmac=f00000000$d; else dmac=00000000ff$is$js; fi
    > +                if test $d != $s; then unicast=$d; else unicast=; fi
    > +                if test $is = 1; then if test $is != $id; then unicast=; fi; fi
    > +                if test $id = 1; then if test $is != $id; then unicast=; fi; fi
    > +                test_ip $s $smac $dmac $sip $dip $unicast
    > +              done
    > +          done
    > +      done
    > +    done
    > +  done
    > +done
    > +
    > +# Allow some time for packet forwarding.
    > +# XXX This can be improved.
    > +sleep 1
    > +
    > +ovn-sbctl -f csv -d bare --no-heading \
    > +    -- --columns=logical_port,ip,mac list mac_binding > mac_bindings
    > +
    > +# Now check the packets actually received against the ones expected.
    > +for i in 1 2 3; do
    > +    for j in 1 2 3; do
    > +        for k in 1 2 3; do
    > +            OVN_CHECK_PACKETS([hv`vif_to_hv $i$j$k`/vif$i$j$k-tx.pcap],
    > +                              [$i$j$k.expected])
    > +        done
    > +    done
    > +done
    > +
    > +# Check that all datapath components are monitored as expected.
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 3'], [1], [ignore-nolog])
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1], [ignore-nolog])
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1], [ignore-nolog])
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 3'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 1'], [1], [ignore-nolog])
    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 3'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
    > +
    > +# Gracefully terminate daemons
    > +OVN_CLEANUP([hv1], [hv2], [hv3])
    > +
    > +AT_CLEANUP
    > +
    >  # 3 hypervisors, one logical switch, 3 logical ports per hypervisor
    >  AT_SETUP([ovn -- portsecurity : 3 HVs, 1 LS, 3 lports/HV])
    >  AT_SKIP_IF([test $HAVE_PYTHON = no])
    > @@ -2983,6 +3335,7 @@ ovs-vsctl -- add-port br-int vif2 -- \
    >      options:tx_pcap=hv1/vif2-tx.pcap \
    >      options:rxq_pcap=hv1/vif2-rx.pcap \
    >      ofport-request=1
    > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
    >
    >
    >  # Allow some time for ovn-northd and ovn-controller to catch up.
    > @@ -3014,10 +3367,13 @@ echo "---------------------"
    >  echo "---------SB dump-----"
    >  ovn-sbctl list datapath_binding
    >  echo "---------------------"
    > +ovn-sbctl list port_binding
    > +echo "---------------------"
    >  ovn-sbctl list logical_flow
    >  echo "---------------------"
    >
    >  echo "------ hv1 dump ----------"
    > +as hv1 ovs-ofctl show br-int
    >  as hv1 ovs-ofctl dump-flows br-int
    >
    >
    > @@ -3031,10 +3387,13 @@ sleep 1
    >  echo "---------SB dump-----"
    >  ovn-sbctl list datapath_binding
    >  echo "---------------------"
    > +ovn-sbctl list port_binding
    > +echo "---------------------"
    >  ovn-sbctl list logical_flow
    >  echo "---------------------"
    >
    >  echo "------ hv1 dump ----------"
    > +as hv1 ovs-ofctl show br-int
    >  as hv1 ovs-ofctl dump-flows br-int
    >
    >  as hv1 ovs-appctl netdev-dummy/receive vif1 $packet
    > @@ -4109,6 +4468,7 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
    >      options:tx_pcap=hv1/vif1-tx.pcap \
    >      options:rxq_pcap=hv1/vif1-rx.pcap \
    >      ofport-request=1
    > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
    >
    >
    >  sim_add hv2
    > @@ -4120,6 +4480,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \
    >      options:tx_pcap=hv2/vif1-tx.pcap \
    >      options:rxq_pcap=hv2/vif1-rx.pcap \
    >      ofport-request=1
    > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
    >
    >  # Pre-populate the hypervisors' ARP tables so that we don't lose any
    >  # packets for ARP resolution (native tunneling doesn't queue packets
    > @@ -4213,7 +4574,6 @@ src_ip=`ip_to_hex 192 168 1 2`
    >  dst_ip=`ip_to_hex 172 16 1 2`
    >  expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
    >
    > -
    >  as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
    >  as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
    >
    > @@ -4225,6 +4585,15 @@ as hv2 ovs-ofctl show br-int
    >  as hv2 ovs-ofctl dump-flows br-int
    >  echo "----------------------------"
    >
    > +# tunnel key 2 represents the gateway router, hence hv1 does not need
    > +# to know about its ports and flows, since the transit switch acts as proxy.
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
    > +# tunnel key 4 represents a datapath (alice) on the far side of the gateway
    > +# from the perspective of hv1, hence hv1 does not need to know about its
    > +# ports and flows.
    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1], [ignore-nolog])
    > +
    >  echo $expected > expected
    >  OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
    >
    > --
    > 1.9.1
    >
    > _______________________________________________
    > dev mailing list
    > dev at openvswitch.org
    > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
    >
    _______________________________________________
    dev mailing list
    dev at openvswitch.org
    https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list