[ovs-dev] [PATCH] OVN resource agent - make promotion synchronous
Daniel Alvarez Sanchez
dalvarez at redhat.com
Tue Jul 9 07:27:43 UTC 2019
Thanks a lot, Michele.
Just mentioning that this has been tested successfully in an OpenStack
environment. The while loop does not need a timeout of its own, since
pacemaker will enforce its own timeout on the operation.
On Tue, Jul 9, 2019 at 9:20 AM Michele Baldessari <michele at acksyn.org> wrote:
>
> Currently inside the ovsdb_server_promote() function we call 'promote_ovnnb'
> and 'promote_ovnsb' and then just record the new master state in the
> CIB.
>
> This creates a race because those two promote commands are asynchronous
> so when we exit the ovsdb_server_promote() function the underlying DBs
> are not guaranteed to be in master state. That means that clients might
> connect to an instance that is in read-only mode.
>
> We add a simple sleep loop where we wait for the underlying DB state to
> confirm the master state. We do not need to add a timeout to the loop
> because, in case of an issue, the resource timeout set within pacemaker
> will kick in and pacemaker will kill the resource agent script.
>
> Tested this within an OpenStack environment using OVN with roughly 20
> reboots and was unable to trigger the issue (before the patch we would
> trigger the issue after a couple of reboots at most).
>
> Signed-off-by: Michele Baldessari <michele at acksyn.org>
> ---
> ovn/utilities/ovndb-servers.ocf | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> index 10313304cb7c..cd47426689ef 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -516,6 +516,8 @@ ovsdb_server_stop() {
> }
>
> ovsdb_server_promote() {
> + local state
> +
> ovsdb_server_check_status ignore_northd
> rc=$?
> case $rc in
> @@ -540,7 +542,15 @@ ovsdb_server_promote() {
> ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
> fi
>
> - ocf_log debug "ovndb_servers: Promoting $host_name as the master"
> + ocf_log debug "ovndb_servers: Waiting for promotion $host_name as master to complete"
> + ovsdb_server_check_status
> + state=$?
> + while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
> + sleep 1
> + ovsdb_server_check_status
> + state=$?
> + done
> + ocf_log debug "ovndb_servers: Promotion of $host_name as the master completed"
> # Record ourselves so that the agent has a better chance of doing
> # the right thing at startup
> ${CRM_ATTR_REPL_INFO} -v "$host_name"
> --
> 2.21.0
Acked-by: Daniel Alvarez <dalvarez at redhat.com>
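
For anyone who wants to try the wait loop from the patch outside of a cluster, here is a minimal standalone sketch. It is not the agent code: ovsdb_server_check_status is replaced by a hypothetical stub that reports master on its third call, PROMOTE_AFTER and wait_for_master are names made up for the illustration, and the return codes are the standard OCF values (0 for running, 8 for running as master).

```shell
#!/bin/sh
# Standalone sketch of the promotion wait loop. Everything marked
# "hypothetical" is an assumption made for illustration and is not part
# of ovndb-servers.ocf.

OCF_SUCCESS=0          # OCF return code: resource is running
OCF_RUNNING_MASTER=8   # OCF return code: resource is running as master

polls=0
PROMOTE_AFTER=3        # hypothetical: the stub reports master on the 3rd poll

# Hypothetical stub standing in for the agent's ovsdb_server_check_status,
# which queries the real NB/SB databases.
ovsdb_server_check_status() {
    polls=$((polls + 1))
    if [ "$polls" -ge "$PROMOTE_AFTER" ]; then
        return $OCF_RUNNING_MASTER
    fi
    return $OCF_SUCCESS
}

# Same shape as the loop in the patch: poll once per second until the
# status check reports master. There is deliberately no timeout here; in
# the real agent, pacemaker's operation timeout bounds the wait.
wait_for_master() {
    ovsdb_server_check_status
    state=$?
    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
        sleep 1
        ovsdb_server_check_status
        state=$?
    done
}

wait_for_master
echo "master confirmed after $polls polls"
```

With the stub above the script returns after two one-second sleeps. Run outside pacemaker, a status check that never reports master would make the loop spin forever, which is exactly the case the resource operation timeout covers in a cluster.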