[ovs-dev] [PATCH ovn] [RFC] Add chassis liveness monitoring mechanism

Dumitru Ceara dceara at redhat.com
Thu Feb 20 09:18:51 UTC 2020


On 2/19/20 4:37 PM, lmartins at redhat.com wrote:
> From: Lucas Alvares Gomes <lucasagomes at gmail.com>
> 
> NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT
> THE IDEA PRIOR TO WRITTING TESTS TO IT.
> 
> CMSes integrating with OVN often uses the nb_cfg mechanism as a way to
> check the health status of the ovn-controller process but, the current
> implementation isn't ideal for that purpose because it floods the control
> plane with update notifications every time the nb_cfg value is
> incremented.
> 
> This patch is merging two ideas:
> 
> 1) Han Zhou proposed a patch [0] creating a table called Chassis_Private
>    where each hypervisor *only* writes and monitors its own record to
>    avoid this flooding problem.
> 
> 2) Having ovn-controller to periodically log that it's alive instead of
>    relying on the nb_cfg mechanism.
> 
> By using this mechanism, a CMS can more easily just read this new
> Chassis_Private table and figure out the status of each Chassis
> (ovn-controller) in the cluster.
> 

Hi Lucas,

Thanks a lot for working on this!

> Here's some reasons why I believe this approach is better than having
> to bump the  nb_cfg value:
> 
> 1) Simple integration. Before, the CMS had to increment the nb_cfg
>    value in the NB_Global table and wait for it to be propagated to the
>    Southbound database (ovn-northd and then ovn-controllers) Chassis
>    table in order to know the status of the Chassis.
> 
>    Now, it can just read the Chassis_Private table and compare the
>    alive_at column against the current time.

I guess the only comment I have about this approach is that it doesn't
feel completely OK to store a timestamp representation as string in
"alive_at". I'm afraid this might be too inflexible and force CMSes to
compare their local time with the value in this column.

I know we discussed offline that using a counter that is periodically
incremented by ovn-controller instead of a timestamp would complicate
the implementation in Openstack but maybe other people on this mailing
list have ideas on how to deal with this in a more generic way.

> 
> 2) Less costy. Just one read from the db is needed, no writing. Code
>    using the nb_cfg mechanism had to implement a few safe-guard code to
>    make less error prone. See [1] and [2] for example.
> 
> 3) Backwards compatibility. The patch [0] was moving the nb_cfg value
>    to new table so, systems relying on it would need to update their
>    code upon updating OVN.

Nice!

One more minor comment on the code, inline.

Thanks,
Dumitru

> 
> To enable this new mechanism set an option called
> chassis_liveness_interval=N to the "options" column in the NB_Global
> table, where N is the time (in seconds) in which the ovn-controller
> should updates the alive_at column. If not set or set to 0, the
> mechanism is disabled. By default, it's disabled.
> 
> [0] https://patchwork.ozlabs.org/patch/899608/
> [1] https://review.opendev.org/#/c/703612/
> [2] https://review.opendev.org/#/c/704530/
> 
> Co-authored-by: Han Zhou <hzhou8 at ebay.com>
> Co-authored-by: Numan Siddique <numans at ovn.org>
> Co-authored-by: Dumitru Ceara <dceara at redhat.com>
> Signed-off-by: Lucas Alvares Gomes <lucasagomes at gmail.com>
> ---
>  controller/chassis.c        | 56 +++++++++++++++++++++++++++++++++++--
>  controller/chassis.h        | 11 ++++++--
>  controller/ovn-controller.c | 44 +++++++++++++++++++++++++++--
>  lib/chassis-index.c         | 26 +++++++++++++++++
>  lib/chassis-index.h         |  5 ++++
>  northd/ovn-northd.c         |  4 +++
>  ovn-sb.ovsschema            | 13 +++++++--
>  ovn-sb.xml                  | 35 +++++++++++++++++++++++
>  8 files changed, 185 insertions(+), 9 deletions(-)
> 
> diff --git a/controller/chassis.c b/controller/chassis.c
> index 522893ead..7892cb966 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -24,6 +24,7 @@
>  #include "openvswitch/dynamic-string.h"
>  #include "openvswitch/vlog.h"
>  #include "openvswitch/ofp-parse.h"
> +#include "openvswitch/poll-loop.h"
>  #include "lib/chassis-index.h"
>  #include "lib/ovn-sb-idl.h"
>  #include "ovn-controller.h"
> @@ -47,11 +48,16 @@ struct chassis_info {
>  
>      /* True if Chassis SB record is initialized, false otherwise. */
>      uint32_t id_inited : 1;
> +
> +    /* Next livennes update time to update the 'alive_at' column of
> +     * Chassis_Private table. */
> +    long long int next_liveness_update;
>  };
>  
>  static struct chassis_info chassis_state = {
>      .id = DS_EMPTY_INITIALIZER,
>      .id_inited = false,
> +    .next_liveness_update = LLONG_MAX,
>  };
>  
>  static void
> @@ -581,18 +587,38 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
>      free(encaps);
>  }
>  
> +static void
> +chassis_update_liveness_cfg(const struct sbrec_chassis_private *chassis_rec,
> +                            struct chassis_info *ch_info,
> +                            int chassis_liveness_interval)
> +{
> +    if (ch_info->next_liveness_update == LLONG_MAX ||
> +        time_msec() > ch_info->next_liveness_update) {
> +        sbrec_chassis_private_set_alive_at(
> +            chassis_rec,
> +            xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(), true));
> +        ch_info->next_liveness_update =
> +            time_msec() + chassis_liveness_interval * 1000;
> +    }
> +}
> +
>  /* Returns this chassis's Chassis record, if it is available. */
>  const struct sbrec_chassis *
>  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>              struct ovsdb_idl_index *sbrec_chassis_by_name,
> +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>              const struct ovsrec_open_vswitch_table *ovs_table,
>              const struct sbrec_chassis_table *chassis_table,
>              const char *chassis_id,
>              const struct ovsrec_bridge *br_int,
> -            const struct sset *transport_zones)
> +            const struct sset *transport_zones,
> +            const struct sbrec_chassis_private **chassis_private,
> +            int chassis_liveness_interval)
>  {
>      struct ovs_chassis_cfg ovs_cfg;
>  
> +    *chassis_private = NULL;
> +
>      /* Get the chassis config from the ovs table. */
>      ovs_chassis_cfg_init(&ovs_cfg);
>      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> @@ -616,6 +642,24 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    chassis_id);
>      }
>  
> +    /* Create the Chassis_Private entry if it's not there yet */
> +    const struct sbrec_chassis_private *chassis_private_rec =
> +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> +                                       chassis_id);
> +    if (!chassis_private_rec && ovnsb_idl_txn) {
> +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> +    }
> +
> +    /* Update the alive_at column from the Chassis_Private table if
> +     * the chassis liveness mechanism is enabled */
> +    if (chassis_private_rec && ovnsb_idl_txn && chassis_liveness_interval) {
> +        chassis_update_liveness_cfg(chassis_private_rec, &chassis_state,
> +                                    chassis_liveness_interval);
> +    }
> +
> +    *chassis_private = chassis_private_rec;
> +
>      ovs_chassis_cfg_destroy(&ovs_cfg);
>      return chassis_rec;
>  }
> @@ -669,7 +713,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
>   * required. */
>  bool
>  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                const struct sbrec_chassis *chassis_rec)
> +                const struct sbrec_chassis *chassis_rec,
> +                const struct sbrec_chassis_private *chassis_private_rec)
>  {
>      if (!chassis_rec) {
>          return true;
> @@ -679,6 +724,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    "ovn-controller: unregistering chassis '%s'",
>                                    chassis_rec->name);
>          sbrec_chassis_delete(chassis_rec);
> +        sbrec_chassis_private_delete(chassis_private_rec);
>      }
>      return false;
>  }
> @@ -695,3 +741,9 @@ chassis_get_id(void)
>  
>      return NULL;
>  }
> +
> +void chassis_wait(int chassis_liveness_interval) {
> +    if (chassis_liveness_interval) {
> +        poll_timer_wait_until(chassis_state.next_liveness_update);
> +    }
> +}
> diff --git a/controller/chassis.h b/controller/chassis.h
> index 178d2957e..38a549553 100644
> --- a/controller/chassis.h
> +++ b/controller/chassis.h
> @@ -17,6 +17,7 @@
>  #define OVN_CHASSIS_H 1
>  
>  #include <stdbool.h>
> +#include "lib/ovn-sb-idl.h"
>  
>  struct ovsdb_idl;
>  struct ovsdb_idl_index;
> @@ -33,17 +34,21 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_run(
>      struct ovsdb_idl_txn *ovnsb_idl_txn,
>      struct ovsdb_idl_index *sbrec_chassis_by_name,
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>      const struct ovsrec_open_vswitch_table *,
>      const struct sbrec_chassis_table *,
>      const char *chassis_id, const struct ovsrec_bridge *br_int,
> -    const struct sset *transport_zones);
> +    const struct sset *transport_zones,
> +    const struct sbrec_chassis_private **chassis_private,
> +    int chassis_liveness_interval);
>  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                     const struct sbrec_chassis *);
> +                     const struct sbrec_chassis *,
> +                     const struct sbrec_chassis_private *);
>  bool chassis_get_mac(const struct sbrec_chassis *chassis,
>                       const char *bridge_mapping,
>                       struct eth_addr *chassis_mac);
>  const char *chassis_get_id(void);
>  const char * get_chassis_mac_mappings(const struct smap *ext_ids);
> -
> +void chassis_wait(int chassis_liveness_interval);
>  
>  #endif /* controller/chassis.h */
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 4d245ca28..fd9d95da2 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
>      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
>      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
>  
>      if (monitor_all) {
>          ovsdb_idl_condition_add_clause_true(&pb);
> @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>          ovsdb_idl_condition_add_clause_true(&ce);
>          ovsdb_idl_condition_add_clause_true(&ip_mcast);
>          ovsdb_idl_condition_add_clause_true(&igmp);
> +        ovsdb_idl_condition_add_clause_true(&chprv);
>          goto out;
>      }
>  
> @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>                                                    &chassis->header_.uuid);
>          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
>                                              &chassis->header_.uuid);
> +
> +        /* Monitors Chassis_Private record for current chassis only */
> +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> +                                              chassis->name);
>      }
>      if (local_ifaces) {
>          const char *name;
> @@ -229,6 +235,7 @@ out:
>      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
>      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
>      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
>      ovsdb_idl_condition_destroy(&pb);
>      ovsdb_idl_condition_destroy(&lf);
>      ovsdb_idl_condition_destroy(&mb);
> @@ -237,6 +244,7 @@ out:
>      ovsdb_idl_condition_destroy(&ce);
>      ovsdb_idl_condition_destroy(&ip_mcast);
>      ovsdb_idl_condition_destroy(&igmp);
> +    ovsdb_idl_condition_destroy(&chprv);
>  }
>  
>  static const char *
> @@ -687,6 +695,15 @@ get_transport_zones(const struct ovsrec_open_vswitch_table *ovs_table)
>      return smap_get_def(&cfg->external_ids, "ovn-transport-zones", "");
>  }
>  
> +static uint16_t
> +get_chassis_liveness_interval(const struct sbrec_sb_global_table
> +                              *sb_global_table)
> +{
> +    const struct sbrec_sb_global *sb
> +        = sbrec_sb_global_table_first(sb_global_table);
> +    return atoi(smap_get_def(&sb->options, "chassis_liveness_interval", "0"));

We have smap_get_int in smap.h which does exactly this with a bit more
error checking.

> +}
> +
>  static void
>  ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>  {
> @@ -1760,6 +1777,8 @@ main(int argc, char *argv[])
>  
>      struct ovsdb_idl_index *sbrec_chassis_by_name
>          = chassis_index_create(ovnsb_idl_loop.idl);
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> +        = chassis_private_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
>          = mcast_group_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_port_binding_by_name
> @@ -1821,6 +1840,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
>  
> +    /* Do not monitor the Chassis_Private external_ids column */
> +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> +                   &sbrec_chassis_private_col_external_ids);
> +
>      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>  
>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> @@ -1998,10 +2021,16 @@ main(int argc, char *argv[])
>                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
>              const char *chassis_id = get_ovs_chassis_id(ovs_table);
>              const struct sbrec_chassis *chassis = NULL;
> +            const struct sbrec_chassis_private *chassis_private = NULL;
> +            uint16_t chassis_liveness_interval = get_chassis_liveness_interval(
> +                sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
>              if (chassis_id) {
>                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> +                                      sbrec_chassis_private_by_name,
>                                        ovs_table, chassis_table, chassis_id,
> -                                      br_int, &transport_zones);
> +                                      br_int, &transport_zones,
> +                                      &chassis_private,
> +                                      chassis_liveness_interval);
>              }
>  
>              if (br_int) {
> @@ -2164,6 +2193,10 @@ main(int argc, char *argv[])
>                  ofctrl_wait();
>                  pinctrl_wait(ovnsb_idl_txn);
>              }
> +
> +            if (chassis) {
> +                chassis_wait(chassis_liveness_interval);
> +            }
>          }
>  
>          unixctl_server_run(unixctl);
> @@ -2232,10 +2265,17 @@ main(int argc, char *argv[])
>                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
>                     : NULL);
>  
> +            const struct sbrec_chassis_private *chassis_private
> +                = (chassis_id
> +                   ? chassis_private_lookup_by_name(
> +                       sbrec_chassis_private_by_name, chassis_id)
> +                   : NULL);
> +
>              /* Run all of the cleanup functions, even if one of them returns
>               * false. We're done if all of them return true. */
>              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> +            done = chassis_cleanup(ovnsb_idl_txn, chassis,
> +                                   chassis_private) && done;
>              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
>              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
>              if (done) {
> diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> index 39066f4cc..d9bbba8e4 100644
> --- a/lib/chassis-index.c
> +++ b/lib/chassis-index.c
> @@ -40,6 +40,32 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
>      return retval;
>  }
>  
> +struct ovsdb_idl_index *
> +chassis_private_index_create(struct ovsdb_idl *idl)
> +{
> +    return ovsdb_idl_index_create1(
> +        idl, &sbrec_chassis_private_col_name);
> +}
> +
> +/* Finds and returns the chassis with the given 'name', or NULL if no such
> + * chassis exists. */
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> +    const char *name)
> +{
> +    struct sbrec_chassis_private *target = \
> +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> +    sbrec_chassis_private_index_set_name(target, name);
> +
> +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> +        sbrec_chassis_private_by_name, target);
> +
> +    sbrec_chassis_private_index_destroy_row(target);
> +
> +    return retval;
> +}
> +
>  struct ovsdb_idl_index *
>  ha_chassis_group_index_create(struct ovsdb_idl *idl)
>  {
> diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> index 302e5f0fd..77d2e1f99 100644
> --- a/lib/chassis-index.h
> +++ b/lib/chassis-index.h
> @@ -23,6 +23,11 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
>  
> +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> +
> +const struct sbrec_chassis_private *chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> +
>  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
>  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 2580b4ec9..1b16f5ef6 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11555,6 +11555,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
>  
> +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> +    add_column_noalert(ovnsb_idl_loop.idl,
> +                       &sbrec_chassis_private_col_external_ids);
> +
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
>      add_column_noalert(ovnsb_idl_loop.idl,
>                         &sbrec_ha_chassis_col_chassis);
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index d89f8dbbb..88da05849 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Southbound",
> -    "version": "2.7.0",
> -    "cksum": "4286723485 21693",
> +    "version": "2.8.0",
> +    "cksum": "293994447 22064",
>      "tables": {
>          "SB_Global": {
>              "columns": {
> @@ -43,6 +43,15 @@
>                                                "max": "unlimited"}}},
>              "isRoot": true,
>              "indexes": [["name"]]},
> +        "Chassis_Private": {
> +            "columns": {
> +                "name": {"type": "string"},
> +                "alive_at": {"type": "string"},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},
> +            "isRoot": true,
> +            "indexes": [["name"]]},
>          "Encap": {
>              "columns": {
>                  "type": {"type": {"key": {
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 3ae9d4f92..8c9822fdb 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -366,6 +366,41 @@
>      </group>
>    </table>
>  
> +  <table name="Chassis_Private" title="Chassis Private">
> +    <p>
> +      Each row in this table maintains per chassis private data that are
> +      accessed only by the owning chassis (write only) and ovn-northd, not by
> +      any other chassis.  These data are stored in this separate table instead
> +      of the <ref table="Chassis"/> table for performance considerations:
> +      the rows in this table can be conditionally monitored by chassises so
> +      that each chassis only get update notifications for its own row, to avoid
> +      unnecessary chassis private data update flooding in a large scale
> +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> +      monitoring is supported on a set of columns)
> +    </p>
> +
> +    <column name="name">
> +      The name of the chassis that owns these chassis-private data.
> +    </column>
> +
> +    <column name="alive_at">
> +      A timestamp indicating the last time ovn-controller
> +      signalized it's alive. For monitoring purposes, setting the
> +      chassis_liveness_interval configuration to the OVN_Northhbound's
> +      <ref table="NB_Global" column="options"/> with an integer value (in
> +      seconds) causes the <code>ovn-controller</code> to update this
> +      column with a timestamp every N seconds. If the value is not set
> +      or 0, then <code>ovn-controller</code> doesn't update this column.
> +    </column>
> +
> +    <group title="Common Columns">
> +      The overall purpose of these columns is described under <code>Common
> +      Columns</code> at the beginning of this document.
> +
> +      <column name="external_ids"/>
> +    </group>
> +  </table>
> +
>    <table name="Encap" title="Encapsulation Types">
>      <p>
>        The <ref column="encaps" table="Chassis"/> column in the <ref
> 



More information about the dev mailing list