[ovs-dev] [PATCH ovn v7 5/9] ovn-controller: I-P for ct zone and OVS interface changes in flow output stage.

Numan Siddique numans at ovn.org
Sat May 23 18:36:52 UTC 2020


On Sat, May 23, 2020 at 7:30 AM Han Zhou <hzhou at ovn.org> wrote:

> On Fri, May 22, 2020 at 10:29 AM Numan Siddique <numans at ovn.org> wrote:
> >
> >
> >
> > On Thu, May 21, 2020 at 12:02 PM Han Zhou <hzhou at ovn.org> wrote:
> >>
> >> On Wed, May 20, 2020 at 12:41 PM <numans at ovn.org> wrote:
> >> >
> >> > From: Numan Siddique <numans at ovn.org>
> >> >
> >> > This patch handles ct zone changes and OVS interface changes
> incrementally
> >> > in the flow output stage.
> >> >
> >> > Any changes to ct zone can be handled by running physical_run()
> instead
> >> of running
> >> > flow_output_run(). And any changes to OVS interfaces can be either
> handled
> >> > incrementally (for OVS interfaces representing VIFs) or just running
> >> > physical_run() (for tunnel and other types of interfaces).
> >> >
> >> > To better handle this, a new engine node 'physical_flow_changes' is
> added
> >> which
> >> > handles changes to ct zone and OVS interfaces.
> >> >
> >> Hi Numan,
> >>
> > Hi Han,
> >
> > Thanks for the review and comments. Please see below.
> >
> >
> >>
> >> The engine node physical_flow_changes looks weird because it doesn't
> >> really have any data but merely triggers some computation when a VIF
> >> changes.
> >> I think it can be handled in a more straightforward way. In fact there
> was
> >> a miss in my initial I-P patch that there are still global variables
> used
> >> in physical.c (my bad):
> >> - localvif_to_ofport
> >> - tunnels
> >> By the principles of the I-P engine, global variables shouldn't be used,
> >> because otherwise there is no way to ensure the dependency graph is
> >> followed. The current I-P is still correct only because interface
> >> changes (which in turn cause the above global var changes) always
> >> trigger a recompute.
> >>
> >> Now to handle interface changes incrementally, we should simply move
> these
> >> global vars to engine nodes, and add them as input for flow_output, and
> >> also their input will be OVS_Interface (and probably some other inputs).
> >> With this, the dependency graph would be clear and implementation of
> each
> >> input handler will be straightforward. Otherwise, it would be really
> hard
> >> to follow the logic and deduce the correctness for all corner cases,
> >> although it may be correct for normal scenarios that I believe you have
> >> done thorough tests. Do you mind refactoring it?
> >
> >
> > Right now, the flow_output node has the "en_ct_zones" engine node as an
> > input, but it doesn't have OVS_interface as an input. This is ok, as any
> > runtime data change will trigger a recompute.
> >
> Do you mean OVS_interface should actually be an input of "en_ct_zones", but
> it is ok not to add it? But I don't understand why it should be an input of
> en_ct_zones. Maybe I missed something?
>

I mean that right now, OVS Interface is not an input to the flow_output stage.


> My point above is about the dependency between flow_output and
> OVS_interface. flow_output actually depends on OVS_interface. The two
> global variables allowed flow_output to use OVS_interface data without the
> dependency in the graph. Now since we are incrementally processing
> OVS_interface changes, we'd better fix the dependency graph to ensure
> correctness.


I agree.



> I
> had encountered very tricky bugs even when some very trivial part of the
> I-P engine rules was not followed, and finally found out that defining the
> engine node properly was the easiest fix.
>
> > If the en_ct_zones engine node changes, there is no need to trigger a
> > full recompute; calling "physical_run()" should be enough. We can
> > definitely improve it further, but that can be tackled separately.
> >
> > In my understanding, we can also handle OVS interface changes
> > incrementally, and when we can't, we can just call "physical_run()"
> > instead of doing a full flow recompute.
> >
> > With this patch, an engine node for physical flow changes is added, and
> > its inputs are en_ct_zones and OVS interface. I understand and agree that
> > it's a bit weird.
> >
> > How do we handle it when both
> >    - en_ct_zones changes, and
> >    - some OVS interface changes?
> >
> > If both these inputs are direct inputs to the flow_output engine, then
> > we may end up calling "physical_run()" twice. I wanted to avoid that,
> > and hence added a physical_flow_changes node as an intermediate.
> > How do you suggest we handle this if we handle the OVS interface changes
> > directly in the flow_output engine?
> >
>
> Not sure if I understand completely, but it seems you are suggesting we can
> afford a recompute for physical flows, but not for logical flows, and in
> some cases this can happen. However, you are trying to make a full physical
> computation appear as incremental processing, but you are not happy with
> this kind of I-P because the cost is high, so you want to avoid calling it
> multiple times, and an intermediate node is added for that purpose.
>

That's right.


>
> If this is the case, I'd suggest handling it more cleanly with the I-P
> engine. We can split flow_output into separate nodes: logical_output and
> physical_output, and add a (fake) dependency from physical_output to
> logical_output just to make sure physical_output is the last node to
> compute, so that even if it is recomputed it doesn't matter that much.
> Then we call physical_run only in the recompute of the physical_output
> node, and since it is a recompute, it won't be called multiple times.
> What do you think?
>
>
Below is what I understand from your suggestion. Can you please confirm that
my understanding is correct? If not, can you please explain the graph path?

flow_output
       --- physical_output
                -- en_ct_zones
                -- OVS interface
                -- runtime_data
                -- logical_output
       -- logical_output
             -- runtime_data
                     -- OVS interface
                     -- SB Port Binding
                     -- SB datapath binding
                     -- ...
                        ...
              -- SB Logical Flow
             -- Address set
             -- SB Port Groups
             -- ..
                ...

The physical_output_run() will call physical_run().
I'm not sure what flow_output_physical_output_handler() would do: call
physical_run() or be a no-op?
I need your input on this, assuming my understanding is correct.

The logical_output_run() will call lflow_run().
I'm not sure what flow_output_logical_output_handler() would do: call
lflow_run() or be a no-op?
I need your input on this, assuming my understanding is correct.

And the flow_output_run() will remain the same.

Thanks
Numan


>
> > Thanks
> > Numan
> >
> >
> >>
> >>
> >> In addition, please see another comment inlined.
> >>
> >> Thanks,
> >> Han
> >>
> >>
> >> > Acked-by: Mark Michelson <mmichels at redhat.com>
> >> > Signed-off-by: Numan Siddique <numans at ovn.org>
> >> > ---
> >> >  controller/binding.c        | 23 +----------
> >> >  controller/binding.h        | 24 ++++++++++-
> >> >  controller/ovn-controller.c | 82
> ++++++++++++++++++++++++++++++++++++-
> >> >  controller/physical.c       | 51 +++++++++++++++++++++++
> >> >  controller/physical.h       |  5 ++-
> >> >  5 files changed, 159 insertions(+), 26 deletions(-)
> >> >
> >> > diff --git a/controller/binding.c b/controller/binding.c
> >> > index d5e8e4773..f00c975ce 100644
> >> > --- a/controller/binding.c
> >> > +++ b/controller/binding.c
> >> > @@ -504,7 +504,7 @@ remove_local_lport_ids(const struct
> >> sbrec_port_binding *binding_rec,
> >> >   * 'struct local_binding' is used. A shash of these local bindings is
> >> >   * maintained with the 'external_ids:iface-id' as the key to the
> shash.
> >> >   *
> >> > - * struct local_binding has 3 main fields:
> >> > + * struct local_binding (defined in binding.h) has 3 main fields:
> >> >   *    - type
> >> >   *    - OVS interface row object
> >> >   *    - Port_Binding row object
> >> > @@ -540,21 +540,6 @@ remove_local_lport_ids(const struct
> >> sbrec_port_binding *binding_rec,
> >> >   *              when it sees the ARP packet from the parent's VIF.
> >> >   *
> >> >   */
> >> > -enum local_binding_type {
> >> > -    BT_VIF,
> >> > -    BT_CONTAINER,
> >> > -    BT_VIRTUAL
> >> > -};
> >> > -
> >> > -struct local_binding {
> >> > -    char *name;
> >> > -    enum local_binding_type type;
> >> > -    const struct ovsrec_interface *iface;
> >> > -    const struct sbrec_port_binding *pb;
> >> > -
> >> > -    /* shash of 'struct local_binding' representing children. */
> >> > -    struct shash children;
> >> > -};
> >> >
> >> >  static struct local_binding *
> >> >  local_binding_create(const char *name, const struct ovsrec_interface
> >> *iface,
> >> > @@ -576,12 +561,6 @@ local_binding_add(struct shash *local_bindings,
> >> struct local_binding *lbinding)
> >> >      shash_add(local_bindings, lbinding->name, lbinding);
> >> >  }
> >> >
> >> > -static struct local_binding *
> >> > -local_binding_find(struct shash *local_bindings, const char *name)
> >> > -{
> >> > -    return shash_find_data(local_bindings, name);
> >> > -}
> >> > -
> >> >  static void
> >> >  local_binding_destroy(struct local_binding *lbinding)
> >> >  {
> >> > diff --git a/controller/binding.h b/controller/binding.h
> >> > index f7ace6faf..21118ecd4 100644
> >> > --- a/controller/binding.h
> >> > +++ b/controller/binding.h
> >> > @@ -18,6 +18,7 @@
> >> >  #define OVN_BINDING_H 1
> >> >
> >> >  #include <stdbool.h>
> >> > +#include "openvswitch/shash.h"
> >> >
> >> >  struct hmap;
> >> >  struct ovsdb_idl;
> >> > @@ -32,7 +33,6 @@ struct sbrec_chassis;
> >> >  struct sbrec_port_binding_table;
> >> >  struct sset;
> >> >  struct sbrec_port_binding;
> >> > -struct shash;
> >> >
> >> >  struct binding_ctx_in {
> >> >      struct ovsdb_idl_txn *ovnsb_idl_txn;
> >> > @@ -60,6 +60,28 @@ struct binding_ctx_out {
> >> >      struct smap *local_iface_ids;
> >> >  };
> >> >
> >> > +enum local_binding_type {
> >> > +    BT_VIF,
> >> > +    BT_CONTAINER,
> >> > +    BT_VIRTUAL
> >> > +};
> >> > +
> >> > +struct local_binding {
> >> > +    char *name;
> >> > +    enum local_binding_type type;
> >> > +    const struct ovsrec_interface *iface;
> >> > +    const struct sbrec_port_binding *pb;
> >> > +
> >> > +    /* shash of 'struct local_binding' representing children. */
> >> > +    struct shash children;
> >> > +};
> >> > +
> >> > +static inline struct local_binding *
> >> > +local_binding_find(struct shash *local_bindings, const char *name)
> >> > +{
> >> > +    return shash_find_data(local_bindings, name);
> >> > +}
> >> > +
> >> >  void binding_register_ovs_idl(struct ovsdb_idl *);
> >> >  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
> >> >  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >> > index 9ad5be1c6..8f95fff1f 100644
> >> > --- a/controller/ovn-controller.c
> >> > +++ b/controller/ovn-controller.c
> >> > @@ -1369,8 +1369,13 @@ static void init_physical_ctx(struct
> engine_node
> >> *node,
> >> >
> >> >      ovs_assert(br_int && chassis);
> >> >
> >> > +    struct ovsrec_interface_table *iface_table =
> >> > +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> >> > +            engine_get_input("OVS_interface",
> >> > +                engine_get_input("physical_flow_changes", node)));
> >>
> >> I think we shouldn't try to get indirect input data (i.e. input of the
> >> input). The engine design didn't take this into consideration, although
> >> nothing could prevent a developer from doing this (except code review :)).
> >> If the input is required to compute the output, just add it as direct
> input
> >> to the node.
> >
> >
> > Sure, I can do that. But we may need to have no-op handlers for these new
> > input nodes.
> > Is that OK?
>
> Based on the above discussion, the node "physical_flow_changes" is added
> just to combine the "OVS_interface" and "ct_zones" nodes' data, so there is
> no point in adding the two nodes again directly as inputs to flow_output
> with no-op handlers, which is meaningless. If we really want to have the
> physical_flow_changes node, I think the data should be passed through the
> physical_flow_changes node: just copy the references of both inputs' data
> so that it becomes the "physical_flow_changes" node's data, and the
> flow_output node can get it directly there. It achieves the same thing
> without breaking the I-P engine rules.
>
> >
> > Thanks
> > Numan
> >
> >>
> >>
> >> >      struct ed_type_ct_zones *ct_zones_data =
> >> > -        engine_get_input_data("ct_zones", node);
> >> > +        engine_get_input_data("ct_zones",
> >> > +            engine_get_input("physical_flow_changes", node));
> >> >      struct simap *ct_zones = &ct_zones_data->current;
> >> >
> >> >      p_ctx->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
> >> > @@ -1378,12 +1383,14 @@ static void init_physical_ctx(struct
> engine_node
> >> *node,
> >> >      p_ctx->mc_group_table = multicast_group_table;
> >> >      p_ctx->br_int = br_int;
> >> >      p_ctx->chassis_table = chassis_table;
> >> > +    p_ctx->iface_table = iface_table;
> >> >      p_ctx->chassis = chassis;
> >> >      p_ctx->active_tunnels = &rt_data->active_tunnels;
> >> >      p_ctx->local_datapaths = &rt_data->local_datapaths;
> >> >      p_ctx->local_lports = &rt_data->local_lports;
> >> >      p_ctx->ct_zones = ct_zones;
> >> >      p_ctx->mff_ovn_geneve = ed_mff_ovn_geneve->mff_ovn_geneve;
> >> > +    p_ctx->local_bindings = &rt_data->local_bindings;
> >> >  }
> >> >
> >> >  static void init_lflow_ctx(struct engine_node *node,
> >> > @@ -1791,6 +1798,70 @@ flow_output_port_groups_handler(struct
> engine_node
> >> *node, void *data)
> >> >      return _flow_output_resource_ref_handler(node, data,
> >> REF_TYPE_PORTGROUP);
> >> >  }
> >> >
> >> > +struct ed_type_physical_flow_changes {
> >> > +    bool need_physical_run;
> >> > +    bool ovs_ifaces_changed;
> >> > +};
> >> > +
> >> > +static bool
> >> > +flow_output_physical_flow_changes_handler(struct engine_node *node,
> void
> >> *data)
> >> > +{
> >> > +    struct ed_type_physical_flow_changes *pfc =
> >> > +        engine_get_input_data("physical_flow_changes", node);
> >> > +    struct ed_type_runtime_data *rt_data =
> >> > +        engine_get_input_data("runtime_data", node);
> >> > +
> >> > +    struct ed_type_flow_output *fo = data;
> >> > +    struct physical_ctx p_ctx;
> >> > +    init_physical_ctx(node, rt_data, &p_ctx);
> >> > +
> >> > +    engine_set_node_state(node, EN_UPDATED);
> >> > +    if (pfc->need_physical_run) {
> >> > +        pfc->need_physical_run = false;
> >> > +        physical_run(&p_ctx, &fo->flow_table);
> >> > +        return true;
> >> > +    }
> >> > +
> >> > +    if (pfc->ovs_ifaces_changed) {
> >> > +        pfc->ovs_ifaces_changed = false;
> >> > +        return physical_handle_ovs_iface_changes(&p_ctx,
> >> &fo->flow_table);
> >> > +    }
> >> > +
> >> > +    return true;
> >> > +}
> >> > +
> >> > +static void *
> >> > +en_physical_flow_changes_init(struct engine_node *node OVS_UNUSED,
> >> > +                             struct engine_arg *arg OVS_UNUSED)
> >> > +{
> >> > +    struct ed_type_physical_flow_changes *data = xzalloc(sizeof
> *data);
> >> > +    return data;
> >> > +}
> >> > +
> >> > +static void
> >> > +en_physical_flow_changes_cleanup(void *data OVS_UNUSED)
> >> > +{
> >> > +
> >> > +}
> >> > +
> >> > +static void
> >> > +en_physical_flow_changes_run(struct engine_node *node, void *data
> >> OVS_UNUSED)
> >> > +{
> >> > +    struct ed_type_physical_flow_changes *pfc = data;
> >> > +    pfc->need_physical_run = true;
> >> > +    engine_set_node_state(node, EN_UPDATED);
> >> > +}
> >> > +
> >> > +static bool
> >> > +physical_flow_changes_ovs_iface_handler(struct engine_node *node
> >> OVS_UNUSED,
> >> > +                                        void *data OVS_UNUSED)
> >> > +{
> >> > +    struct ed_type_physical_flow_changes *pfc = data;
> >> > +    pfc->ovs_ifaces_changed = true;
> >> > +    engine_set_node_state(node, EN_UPDATED);
> >> > +    return true;
> >> > +}
> >> > +
> >> >  struct ovn_controller_exit_args {
> >> >      bool *exiting;
> >> >      bool *restart;
> >> > @@ -1914,6 +1985,7 @@ main(int argc, char *argv[])
> >> >      ENGINE_NODE(runtime_data, "runtime_data");
> >> >      ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> >> >      ENGINE_NODE(ofctrl_is_connected, "ofctrl_is_connected");
> >> > +    ENGINE_NODE(physical_flow_changes, "physical_flow_changes");
> >> >      ENGINE_NODE(flow_output, "flow_output");
> >> >      ENGINE_NODE(addr_sets, "addr_sets");
> >> >      ENGINE_NODE(port_groups, "port_groups");
> >> > @@ -1933,13 +2005,19 @@ main(int argc, char *argv[])
> >> >      engine_add_input(&en_port_groups, &en_sb_port_group,
> >> >                       port_groups_sb_port_group_handler);
> >> >
> >> > +    engine_add_input(&en_physical_flow_changes, &en_ct_zones,
> >> > +                     NULL);
> >> > +    engine_add_input(&en_physical_flow_changes, &en_ovs_interface,
> >> > +                     physical_flow_changes_ovs_iface_handler);
> >> > +
> >> >      engine_add_input(&en_flow_output, &en_addr_sets,
> >> >                       flow_output_addr_sets_handler);
> >> >      engine_add_input(&en_flow_output, &en_port_groups,
> >> >                       flow_output_port_groups_handler);
> >> >      engine_add_input(&en_flow_output, &en_runtime_data, NULL);
> >> > -    engine_add_input(&en_flow_output, &en_ct_zones, NULL);
> >> >      engine_add_input(&en_flow_output, &en_mff_ovn_geneve, NULL);
> >> > +    engine_add_input(&en_flow_output, &en_physical_flow_changes,
> >> > +                     flow_output_physical_flow_changes_handler);
> >> >
> >> >      engine_add_input(&en_flow_output, &en_ovs_open_vswitch, NULL);
> >> >      engine_add_input(&en_flow_output, &en_ovs_bridge, NULL);
> >> > diff --git a/controller/physical.c b/controller/physical.c
> >> > index 144aeb7bd..03eb677d1 100644
> >> > --- a/controller/physical.c
> >> > +++ b/controller/physical.c
> >> > @@ -1780,6 +1780,57 @@ physical_run(struct physical_ctx *p_ctx,
> >> >      simap_destroy(&new_tunnel_to_ofport);
> >> >  }
> >> >
> >> > +bool
> >> > +physical_handle_ovs_iface_changes(struct physical_ctx *p_ctx,
> >> > +                                  struct ovn_desired_flow_table
> >> *flow_table)
> >> > +{
> >> > +    const struct ovsrec_interface *iface_rec;
> >> > +    OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED (iface_rec,
> >> p_ctx->iface_table) {
> >> > +        if (!strcmp(iface_rec->type, "geneve") ||
> >> > +            !strcmp(iface_rec->type, "patch")) {
> >> > +            return false;
> >> > +        }
> >> > +    }
> >> > +
> >> > +    struct ofpbuf ofpacts;
> >> > +    ofpbuf_init(&ofpacts, 0);
> >> > +
> >> > +    OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED (iface_rec,
> >> p_ctx->iface_table) {
> >> > +        const char *iface_id = smap_get(&iface_rec->external_ids,
> >> "iface-id");
> >> > +        if (!iface_id) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        const struct local_binding *lb =
> >> > +            local_binding_find(p_ctx->local_bindings, iface_id);
> >> > +
> >> > +        if (!lb || !lb->pb) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport :
> 0;
> >> > +        if (ovsrec_interface_is_deleted(iface_rec)) {
> >> > +            ofctrl_remove_flows(flow_table, &lb->pb->header_.uuid);
> >> > +            simap_find_and_delete(&localvif_to_ofport, iface_id);
> >> > +        } else {
> >> > +            if (!ovsrec_interface_is_new(iface_rec)) {
> >> > +                ofctrl_remove_flows(flow_table,
> &lb->pb->header_.uuid);
> >> > +            }
> >> > +
> >> > +            simap_put(&localvif_to_ofport, iface_id, ofport);
> >> > +            consider_port_binding(p_ctx->sbrec_port_binding_by_name,
> >> > +                                  p_ctx->mff_ovn_geneve,
> p_ctx->ct_zones,
> >> > +                                  p_ctx->active_tunnels,
> >> > +                                  p_ctx->local_datapaths,
> >> > +                                  lb->pb, p_ctx->chassis,
> >> > +                                  flow_table, &ofpacts);
> >> > +        }
> >> > +    }
> >> > +
> >> > +    ofpbuf_uninit(&ofpacts);
> >> > +    return true;
> >> > +}
> >> > +
> >> >  bool
> >> >  get_tunnel_ofport(const char *chassis_name, char *encap_ip,
> ofp_port_t
> >> *ofport)
> >> >  {
> >> > diff --git a/controller/physical.h b/controller/physical.h
> >> > index dadf84ea1..9ca34436a 100644
> >> > --- a/controller/physical.h
> >> > +++ b/controller/physical.h
> >> > @@ -49,11 +49,13 @@ struct physical_ctx {
> >> >      const struct ovsrec_bridge *br_int;
> >> >      const struct sbrec_chassis_table *chassis_table;
> >> >      const struct sbrec_chassis *chassis;
> >> > +    const struct ovsrec_interface_table *iface_table;
> >> >      const struct sset *active_tunnels;
> >> >      struct hmap *local_datapaths;
> >> >      struct sset *local_lports;
> >> >      const struct simap *ct_zones;
> >> >      enum mf_field_id mff_ovn_geneve;
> >> > +    struct shash *local_bindings;
> >> >  };
> >> >
> >> >  void physical_register_ovs_idl(struct ovsdb_idl *);
> >> > @@ -63,7 +65,8 @@ void physical_handle_port_binding_changes(struct
> >> physical_ctx *,
> >> >                                            struct
> ovn_desired_flow_table
> >> *);
> >> >  void physical_handle_mc_group_changes(struct physical_ctx *,
> >> >                                        struct ovn_desired_flow_table
> *);
> >> > -
> >> > +bool physical_handle_ovs_iface_changes(struct physical_ctx *,
> >> > +                                       struct ovn_desired_flow_table
> *);
> >> >  bool get_tunnel_ofport(const char *chassis_name, char *encap_ip,
> >> >                         ofp_port_t *ofport);
> >> >  #endif /* controller/physical.h */
> >> > --
> >> > 2.26.2
> >> >
> >> > _______________________________________________
> >> > dev mailing list
> >> > dev at openvswitch.org
> >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev