[ovs-dev] [PATCH] ovn-controller: Add unix reconnect command

Zhen Wang (SW-CLOUD) zhewang at nvidia.com
Thu Aug 27 05:30:22 UTC 2020


Hi Han,

Thanks for reviewing my code.
Please my reply inline.

From: Han Zhou <hzhou at ovn.org>
Date: Wednesday, August 26, 2020 at 4:55 PM
To: "Zhen Wang (SW-CLOUD)" <zhewang at nvidia.com>
Cc: ovs-dev <ovs-dev at openvswitch.org>, Han Zhou <hzhou at ovn.org>
Subject: Re: [PATCH] ovn-controller: Add unix reconnect command

External email: Use caution opening links or attachments



On Thu, Aug 13, 2020 at 1:56 PM Zhen Wang <zhewang at nvidia.com<mailto:zhewang at nvidia.com>> wrote:
>
> OVN SB in clustered mode, all the ovn-controller clients connect
> across all the nodes in a balanced state. When one raft node down and
> online. All the ovn-controller clients will not migirate back which
> cause RAFT DB clients unbalanced state.
> RAFT clients in an unbalanced state would trigger more stress to the SB.
> This commit introduce one unix command reconnect remote which let user
> trigger a force reconnect to desisred RAFT node which can adddress the
> problem.
> Note: this patch requires ovsdb-idl function ovsdb_idl_set_next_remote.

Thanks for the patch. Just a hint that for OVN patches please add keyword OVN in the email subject's [ ] section, which allows 0-day Robot to check the patch cleanly.
Please see my comments below.

Thanks for the information.
Will add the ovn subject for the v1 patch which address your comments.

>
> Reported-at:https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050518.html
> Signed-off-by: Zhen Wang <zhewang at nvidia.com<mailto:zhewang at nvidia.com>>
> ---
>  controller/ovn-controller.8.xml | 12 ++++++++++++
>  controller/ovn-controller.c     | 27 +++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
>
> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> index 66877314c..66f521398 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -507,6 +507,18 @@
>          local index so that it can interact with the southbound database again.
>        </p>
>        </dd>
> +
> +      <dt><code>reconnect</code></dt>
> +      <dd>
> +      <p>
> +        Trigger a force reconnect to one specific remote in Open_vSwitch table
> +        external_ids:ovn-remote.
> +      </p>
> +      <p>
> +        This command is intended to use in the event of clustered SB DB has
> +        unbalanced clients across the raft nodes.
> +      </p>
> +      </dd>
>        </dl>
>      </p>
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index ea6a436c0..46ed90492 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -68,6 +68,7 @@
>  VLOG_DEFINE_THIS_MODULE(main);
>
>  static unixctl_cb_func ovn_controller_exit;
> +static unixctl_cb_func ovn_controller_reconnect;
>  static unixctl_cb_func ct_zone_list;
>  static unixctl_cb_func extend_table_list;
>  static unixctl_cb_func inject_pkt;
> @@ -2078,6 +2079,11 @@ struct ovn_controller_exit_args {
>      bool *restart;
>  };
>
> +struct ovn_controller_reconnect_args {
> +    bool trigger;
> +    char *ovn_remote;
> +};
> +
>  int
>  main(int argc, char *argv[])
>  {
> @@ -2085,6 +2091,7 @@ main(int argc, char *argv[])
>      bool exiting;
>      bool restart;
>      struct ovn_controller_exit_args exit_args = {&exiting, &restart};
> +    struct ovn_controller_reconnect_args reconnect = {false, NULL};
>      int retval;
>
>      ovs_cmdl_proctitle_init(argc, argv);
> @@ -2103,6 +2110,8 @@ main(int argc, char *argv[])
>      }
>      unixctl_command_register("exit", "", 0, 1, ovn_controller_exit,
>                               &exit_args);
> +    unixctl_command_register("reconnect", "", 1, 1, ovn_controller_reconnect,
> +                             &reconnect);
>
>      daemonize_complete();
>
> @@ -2511,6 +2520,13 @@ main(int argc, char *argv[])
>                                                 sb_monitor_all);
>                          }
>                      }
> +                    if (reconnect.trigger && ovsdb_idl_is_connected(ovnsb_idl_loop.idl)) {
> +                        VLOG_INFO("User triggered force reconnect to %s", reconnect.ovn_remote);
> +                        ovsdb_idl_set_next_remote(ovnsb_idl_loop.idl, reconnect.ovn_remote);
> +                        ovsdb_idl_force_reconnect(ovnsb_idl_loop.idl);
> +                        free(reconnect.ovn_remote);
> +                        reconnect.trigger = false;
> +                    }
>                  }
>
>              }
> @@ -2674,6 +2690,7 @@ main(int argc, char *argv[])
>      ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
>
>      free(ovs_remote);
> +    free(reconnect.ovn_remote);
>      service_stop();
>
>      exit(retval);
> @@ -2780,6 +2797,16 @@ ovn_controller_exit(struct unixctl_conn *conn, int argc,
>      unixctl_command_reply(conn, NULL);
>  }
>
> +static void
> +ovn_controller_reconnect(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +             const char *argv[], void *reconnect_args_)
> +{
> +    struct ovn_controller_reconnect_args *reconnect_args = reconnect_args_;

It may be better to validate the args is part of the ovn-remote set in external-ids:ovn-remote.

Agree.
Will add new new field ovn-remotes to the struct ovn_controller_reconnect_args and do the validation
In the function.

> +    reconnect_args->trigger = true;
> +    reconnect_args->ovn_remote = xstrdup(argv[1]);

Here can be memory leak if this command has been invoked before and reconnect_args->ovn_remote was not NULL.

Good catch.
I will take care  this part and send our the updated patch file this week.

Regards,
Zhen.

Thanks,
Han

> +    unixctl_command_reply(conn, NULL);
> +}
> +
>  static void
>  ct_zone_list(struct unixctl_conn *conn, int argc OVS_UNUSED,
>               const char *argv[] OVS_UNUSED, void *ct_zones_)
> --
> 2.20.1
>


More information about the dev mailing list