[ovs-dev] [PATCH] ovn-ctl: Add "--no-leader-only" while cluster db init

Han Zhou zhouhan at gmail.com
Mon Mar 16 19:09:26 UTC 2020


On Mon, Mar 9, 2020 at 3:50 AM Tao YunXiang <
taoyunxiang at cmss.chinamobile.com> wrote:
>
> For cluster mode, only the first server(which is the leader) in a cluster
> should do init. But, sometimes, the role of the first server would
transfer
> leadership, so we should add "--no-leader-only" option in case
> the ovn-northd stucked.
>
Thanks for the patch. This is trickier than it appears.
Let me describe the problem from my understanding. When creating the
cluster, the "init" needs to be executed once. This should be done with the
first server. In this case the script works well, because the
"cluster_remote_addr" is NOT specified when starting the first server,
which is the leader at that time. When another server joins the cluster,
the "cluster_remote_addr" must be specified, so the "init" won't be
executed.
However, when a server needs to be restarted later, no matter if it is
leader or not, it doesn't need to supply the "cluster_remote_addr"
parameter, because that parameter is needed only for joining a cluster.
When a server is restarted by executing this script, if
"cluster_remote_addr" is NOT specified, it will stuck at the "init"
command, because it is not leader and "--no-leader-only" is not specified
here.

So, the implementation doesn't really reflect the comment:
>      # Initialize the database if it's running standalone,
>      # active-passive, or is the first server in a cluster.
It tried to run the init command even when it is not the first server in a
cluster.

To fix the problem, one way is to make sure it does exactly as what the
comment said. However, it seems there is no easy way to distinguish if the
server is the first server in a cluster or it is just a server that has
already joined being restarted.

Alternatively, adding "--no-leader-only" solves the problem as well, as
proposed by this patch, because it is not harmful to run "init" again even
if the DB is already initiated, just like how it is handled when it is
standalone mode.

Based on the above analysis, I am ok with the fix, but I suggest to update
the commit message to clarify the problem more clearly. In addition, for
the comment mentioned above, it should be changed to:

# Initialize the database if it's NOT joining a cluster.

Finally, maybe the if condition can be removed completely, but I haven't
tested and not 100% sure.

Thanks,
Han


> Author: Tao YunXiang <taoyunxiang at cmss.chinamobile.com>
> Co-authored-by: Liu Chang <liuchang at cmss.chinamobile.com>
> Signed-off-by: Tao YunXiang <taoyunxiang at cmss.chinamobile.com>
> Signed-off-by: Liu Chang <liuchang at cmss.chinamobile.com>
>
> ---
>  utilities/ovn-ctl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
> index 2a337ae27..71bdbc076 100755
> --- a/utilities/ovn-ctl
> +++ b/utilities/ovn-ctl
> @@ -293,7 +293,7 @@ $cluster_remote_port
>      # Initialize the database if it's running standalone,
>      # active-passive, or is the first server in a cluster.
>      if test -z "$cluster_remote_addr"; then
> -        $(echo ovn-${db}ctl | tr _ -) init
> +        $(echo ovn-${db}ctl | tr _ -) --no-leader-only init
>      fi
>
>      if test $mode = cluster; then
> --
> 2.17.1
>
>
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list