[ovs-dev] [PATCH v2] ovsdb-server: Alleviate the possible data loss in an active/standby setup

Han Zhou zhouhan at gmail.com
Tue Sep 11 17:55:56 UTC 2018


On Tue, Sep 11, 2018 at 10:30 AM <nusiddiq at redhat.com> wrote:
>
> From: Numan Siddique <nusiddiq at redhat.com>
>
> The present code resets the database when it is in the state -
> 'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it
> receives the monitor reply when it is in the state -
> 'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode
> before it processes the monitor reply, the whole data is lost.
>
> This patch alleviates the problem by resetting the database when it
> receives the monitor reply (before processing it). So that
> reset and repopulation of the db happens in the same state.
>
> This approach still has a window for data loss if the function
> process_notification() when processing the monitor reply fails for
> some reason or ovsdb-server crashes for some reason during
> process_notification().
>
> Reported-by: Han Zhou <zhouhan at gmail.com>
> Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html
> Tested-by: aginwala <aginwala at ebay.com>
> Acked-by: Han Zhou <zhouhan at gmail.com>
> Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> ---
>  ovsdb/replication.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
>
> v1 -> v2
> --------
>  * Updated the commit message as per Han's suggestion
>  * Added few comments in the code where it resets the db.
>
> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> index 2b9ae2f83..752b3c89c 100644
> --- a/ovsdb/replication.c
> +++ b/ovsdb/replication.c
> @@ -299,19 +299,7 @@ replication_run(void)
>                  /* After receiving schemas, reset the local databases
that
>                   * will be monitored and send out monitor requests for
them. */
>                  if (hmap_is_empty(&request_ids)) {
> -                    struct shash_node *node, *next;
> -
> -                    SHASH_FOR_EACH_SAFE (node, next, replication_dbs) {
> -                        db = node->data;
> -                        error = reset_database(db);
> -                        if (error) {
> -                            const char *db_name = db->schema->name;
> -                            shash_find_and_delete(replication_dbs,
db_name);
> -                            ovsdb_error_assert(error);
> -                            VLOG_WARN("Failed to reset database, "
> -                                      "%s not replicated.", db_name);
> -                        }
> -                    }
> +                    struct shash_node *node;
>
>                      if (shash_is_empty(replication_dbs)) {
>                          VLOG_WARN("Nothing to replicate.");
> @@ -335,7 +323,16 @@ replication_run(void)
>              case RPL_S_MONITOR_REQUESTED: {
>                  /* Reply to monitor requests. */
>                  struct ovsdb_error *error;
> -                error = process_notification(msg->result, db);
> +                VLOG_INFO("Monitor request received. Resetting the
database");
> +                /* Resetting the database here has few risks. If the
> +                 * process_notification() fails, the database is
completely
> +                 * lost locally. In case that node becomes active, then
> +                 * there is a chance of complete data loss in the
active/standy
> +                 * cluster. */
> +                error = reset_database(db);
> +                if (!error) {
> +                    error = process_notification(msg->result, db);
> +                }
>                  if (error) {
>                      ovsdb_error_assert(error);
>                      state = RPL_S_ERR;
> --
> 2.17.1
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Numan!


More information about the dev mailing list