[ovs-dev] [PATCH] ovn: fix OVNDB process is stopped when master node demote to the slave by pacemaker
Guoshuai Li
ligs at dtdream.com
Thu Dec 8 07:42:22 UTC 2016
On 2016/12/8 5:36, Andy Zhou wrote:
>
>
> On Tue, Dec 6, 2016 at 9:41 PM, Guoshuai Li <ligs at dtdream.com> wrote:
>
> When the master node's OVNDB process fails, the local node is demoted
> to slave.
> The failure is caused by the OVNDB process having stopped, so the
> process needs to be started again.
> If demote returns $OCF_NOT_RUNNING, the node is not demoted to slave.
>
> Signed-off-by: Guoshuai Li <ligs at dtdream.com>
> ---
> ovn/utilities/ovndb-servers.ocf | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ovn/utilities/ovndb-servers.ocf
> b/ovn/utilities/ovndb-servers.ocf
> index 1cf6f20..8a64e88 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -283,7 +283,7 @@ ovsdb_server_promote() {
> ovsdb_server_demote() {
> ovsdb_server_check_status
> if [ $? = $OCF_NOT_RUNNING ]; then
> - return $OCF_NOT_RUNNING
> + ovsdb_server_start
>
>
> The logic here looks odd to me. demote() operation should be done
> against running OVNDBs.
>
> Why is OVNDB stopped in the first place? If they are stopped by
> admin, it would be odd that ocf script
> would restart them.
I agree that demote() should not start the OVN DB.
But when the OVN DB process crashes, who should restart it?
I killed the master node's OVSDB process with 'kill -9'. It does not
migrate to another node, because it depends on the VIP.
But even after a long time the process was not restarted, and the
cluster had no master node.
Full list of resources:
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     ovndb_servers (ocf::ovn:ovndb-servers): Started ovn2
     ovndb_servers (ocf::ovn:ovndb-servers): Started ovn3
     ovndb_servers (ocf::ovn:ovndb-servers): Stopped
     Slaves: [ ovn2 ovn3 ]
     Stopped: [ ovn1 ]
 VirtualIP (ocf::heartbeat:IPaddr2): Started ovn1
Failed Actions:
* ovndb_servers_demote_0 on ovn1 'not running' (7): call=21, status=complete, exitreason='none',
    last-rc-change='Thu Dec 8 13:41:14 2016', queued=0ms, exec=69ms
By debugging I found that pacemaker did not call ovsdb_server_start();
it only called ovsdb_server_demote() and ovsdb_server_stop().
Who should start the process? ovsdb_server_monitor()? Or is this a
pacemaker error?
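
For reference, here is a minimal, self-contained sketch of the two demote
variants discussed in this thread. The helper bodies and the DB_RUNNING
variable are hypothetical stand-ins, not the real ovndb-servers.ocf code,
and the real demote steps after the status check are omitted. The values
0 and 7 are the standard OCF return codes; 7 is the 'not running' (7)
reported in the failed action.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical stand-in state: "no" simulates a crashed ovsdb-server.
DB_RUNNING=no

# Stub for the real status check in ovndb-servers.ocf (assumption).
ovsdb_server_check_status() {
    if [ "$DB_RUNNING" = yes ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Stub for the real start action (assumption).
ovsdb_server_start() {
    DB_RUNNING=yes
    return $OCF_SUCCESS
}

# Variant before the patch: report "not running" back to pacemaker.
demote_before_patch() {
    ovsdb_server_check_status
    if [ $? = $OCF_NOT_RUNNING ]; then
        return $OCF_NOT_RUNNING
    fi
    # (real demote steps omitted)
    return $OCF_SUCCESS
}

# Variant with the patch: start the server instead of failing.
demote_with_patch() {
    ovsdb_server_check_status
    if [ $? = $OCF_NOT_RUNNING ]; then
        ovsdb_server_start
    fi
    # (real demote steps omitted)
    return $OCF_SUCCESS
}

demote_before_patch
echo "before patch: rc=$?"

DB_RUNNING=no
demote_with_patch
echo "with patch: rc=$? running=$DB_RUNNING"
```

In this sketch the first variant returns 7 for a crashed server, while the
second returns 0 and leaves the stub server marked as running. It does not
model how pacemaker reacts to either return code.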