[ovs-dev] [PATCH] ovn-controller: Add unix reconnect command
Zhen Wang (SW-CLOUD)
zhewang at nvidia.com
Thu Aug 27 05:30:22 UTC 2020
Hi Han,
Thanks for reviewing my code.
Please my reply inline.
From: Han Zhou <hzhou at ovn.org>
Date: Wednesday, August 26, 2020 at 4:55 PM
To: "Zhen Wang (SW-CLOUD)" <zhewang at nvidia.com>
Cc: ovs-dev <ovs-dev at openvswitch.org>, Han Zhou <hzhou at ovn.org>
Subject: Re: [PATCH] ovn-controller: Add unix reconnect command
External email: Use caution opening links or attachments
On Thu, Aug 13, 2020 at 1:56 PM Zhen Wang <zhewang at nvidia.com<mailto:zhewang at nvidia.com>> wrote:
>
> OVN SB in clustered mode, all the ovn-controller clients connect
> across all the nodes in a balanced state. When one raft node down and
> online. All the ovn-controller clients will not migirate back which
> cause RAFT DB clients unbalanced state.
> RAFT clients in an unbalanced state would trigger more stress to the SB.
> This commit introduce one unix command reconnect remote which let user
> trigger a force reconnect to desisred RAFT node which can adddress the
> problem.
> Note: this patch requires ovsdb-idl function ovsdb_idl_set_next_remote.
Thanks for the patch. Just a hint that for OVN patches please add keyword OVN in the email subject's [ ] section, which allows 0-day Robot to check the patch cleanly.
Please see my comments below.
Thanks for the information.
Will add the ovn subject for the v1 patch which address your comments.
>
> Reported-at:https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050518.html
> Signed-off-by: Zhen Wang <zhewang at nvidia.com<mailto:zhewang at nvidia.com>>
> ---
> controller/ovn-controller.8.xml | 12 ++++++++++++
> controller/ovn-controller.c | 27 +++++++++++++++++++++++++++
> 2 files changed, 39 insertions(+)
>
> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> index 66877314c..66f521398 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -507,6 +507,18 @@
> local index so that it can interact with the southbound database again.
> </p>
> </dd>
> +
> + <dt><code>reconnect</code></dt>
> + <dd>
> + <p>
> + Trigger a force reconnect to one specific remote in Open_vSwitch table
> + external_ids:ovn-remote.
> + </p>
> + <p>
> + This command is intended to use in the event of clustered SB DB has
> + unbalanced clients across the raft nodes.
> + </p>
> + </dd>
> </dl>
> </p>
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index ea6a436c0..46ed90492 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -68,6 +68,7 @@
> VLOG_DEFINE_THIS_MODULE(main);
>
> static unixctl_cb_func ovn_controller_exit;
> +static unixctl_cb_func ovn_controller_reconnect;
> static unixctl_cb_func ct_zone_list;
> static unixctl_cb_func extend_table_list;
> static unixctl_cb_func inject_pkt;
> @@ -2078,6 +2079,11 @@ struct ovn_controller_exit_args {
> bool *restart;
> };
>
> +struct ovn_controller_reconnect_args {
> + bool trigger;
> + char *ovn_remote;
> +};
> +
> int
> main(int argc, char *argv[])
> {
> @@ -2085,6 +2091,7 @@ main(int argc, char *argv[])
> bool exiting;
> bool restart;
> struct ovn_controller_exit_args exit_args = {&exiting, &restart};
> + struct ovn_controller_reconnect_args reconnect = {false, NULL};
> int retval;
>
> ovs_cmdl_proctitle_init(argc, argv);
> @@ -2103,6 +2110,8 @@ main(int argc, char *argv[])
> }
> unixctl_command_register("exit", "", 0, 1, ovn_controller_exit,
> &exit_args);
> + unixctl_command_register("reconnect", "", 1, 1, ovn_controller_reconnect,
> + &reconnect);
>
> daemonize_complete();
>
> @@ -2511,6 +2520,13 @@ main(int argc, char *argv[])
> sb_monitor_all);
> }
> }
> + if (reconnect.trigger && ovsdb_idl_is_connected(ovnsb_idl_loop.idl)) {
> + VLOG_INFO("User triggered force reconnect to %s", reconnect.ovn_remote);
> + ovsdb_idl_set_next_remote(ovnsb_idl_loop.idl, reconnect.ovn_remote);
> + ovsdb_idl_force_reconnect(ovnsb_idl_loop.idl);
> + free(reconnect.ovn_remote);
> + reconnect.trigger = false;
> + }
> }
>
> }
> @@ -2674,6 +2690,7 @@ main(int argc, char *argv[])
> ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
>
> free(ovs_remote);
> + free(reconnect.ovn_remote);
> service_stop();
>
> exit(retval);
> @@ -2780,6 +2797,16 @@ ovn_controller_exit(struct unixctl_conn *conn, int argc,
> unixctl_command_reply(conn, NULL);
> }
>
> +static void
> +ovn_controller_reconnect(struct unixctl_conn *conn, int argc OVS_UNUSED,
> + const char *argv[], void *reconnect_args_)
> +{
> + struct ovn_controller_reconnect_args *reconnect_args = reconnect_args_;
It may be better to validate the args is part of the ovn-remote set in external-ids:ovn-remote.
Agree.
Will add new new field ovn-remotes to the struct ovn_controller_reconnect_args and do the validation
In the function.
> + reconnect_args->trigger = true;
> + reconnect_args->ovn_remote = xstrdup(argv[1]);
Here can be memory leak if this command has been invoked before and reconnect_args->ovn_remote was not NULL.
Good catch.
I will take care this part and send our the updated patch file this week.
Regards,
Zhen.
Thanks,
Han
> + unixctl_command_reply(conn, NULL);
> +}
> +
> static void
> ct_zone_list(struct unixctl_conn *conn, int argc OVS_UNUSED,
> const char *argv[] OVS_UNUSED, void *ct_zones_)
> --
> 2.20.1
>
More information about the dev
mailing list