[ovs-dev] [PATCH] ovn: fix OVNDB process is stopped when master node demote to the slave by pacemaker

Guoshuai Li ligs at dtdream.com
Thu Dec 8 07:42:22 UTC 2016


On 2016/12/8 5:36, Andy Zhou wrote:
>
>
> On Tue, Dec 6, 2016 at 9:41 PM, Guoshuai Li <ligs at dtdream.com> wrote:
>
>     When the master node's OVNDB process fails, the local node is
>     demoted to slave.
>     The failure is caused by the OVNDB process having stopped, so the
>     process needs to be started again.
>     If demote returns $OCF_NOT_RUNNING, the node is not demoted to slave.
>
>     Signed-off-by: Guoshuai Li <ligs at dtdream.com>
>     ---
>      ovn/utilities/ovndb-servers.ocf | 2 +-
>      1 file changed, 1 insertion(+), 1 deletion(-)
>
>     diff --git a/ovn/utilities/ovndb-servers.ocf
>     b/ovn/utilities/ovndb-servers.ocf
>     index 1cf6f20..8a64e88 100755
>     --- a/ovn/utilities/ovndb-servers.ocf
>     +++ b/ovn/utilities/ovndb-servers.ocf
>     @@ -283,7 +283,7 @@ ovsdb_server_promote() {
>      ovsdb_server_demote() {
>          ovsdb_server_check_status
>          if [ $? = $OCF_NOT_RUNNING ]; then
>     -        return $OCF_NOT_RUNNING
>     +        ovsdb_server_start
>
>
> The logic here looks odd to me. demote() operation should be done 
> against running OVNDBs.
>
> Why is OVNDB stopped in the first place?  If they are stopped by 
> admin, it would be odd that ocf script
> would restart them.

I agree that demote() should not start OVN-DB.
But when the OVN-DB process crashes, who should restart it?

I killed the master node's OVSDB process with 'kill -9'. The master does
not migrate to another node, because it depends on the VIP. But even after
a long time the process was not started again, and there was no master node:

Full list of resources:
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     ovndb_servers      (ocf::ovn:ovndb-servers): Started ovn2
     ovndb_servers      (ocf::ovn:ovndb-servers): Started ovn3
     ovndb_servers      (ocf::ovn:ovndb-servers): Stopped
     Slaves: [ ovn2 ovn3 ]
     Stopped: [ ovn1 ]
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started ovn1
Failed Actions:
* ovndb_servers_demote_0 on ovn1 'not running' (7): call=21,
status=complete, exitreason='none',
    last-rc-change='Thu Dec  8 13:41:14 2016', queued=0ms, exec=69ms

By debugging I found that pacemaker did not call ovsdb_server_start();
it called only ovsdb_server_demote() and ovsdb_server_stop().
Who should start the process? ovsdb_server_monitor()? Or is this a
pacemaker error?
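
To make the failed action concrete, here is a minimal sketch of how I
understand the unpatched demote path after the 'kill -9'. It is not the real
agent: ovsdb_server_check_status is replaced by a stub driven by a
hypothetical SERVER_ALIVE variable, and the rest of the demote logic is
elided. Only the early return is taken from the patch context above.

```shell
#!/bin/sh
# Standard OCF return codes; 7 is the 'not running' code seen in
# the Failed Actions output.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Stub for the real status check. SERVER_ALIVE is a hypothetical
# variable used only in this sketch to simulate a crashed server.
SERVER_ALIVE=no
ovsdb_server_check_status() {
    if [ "$SERVER_ALIVE" = yes ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Demote without the patch: give up early if the server is gone.
ovsdb_server_demote() {
    ovsdb_server_check_status
    if [ $? = $OCF_NOT_RUNNING ]; then
        return $OCF_NOT_RUNNING
    fi
    # (rest of the real demote logic elided in this sketch)
    return $OCF_SUCCESS
}

# Case 1: the process was killed, so demote reports 'not running'.
ovsdb_server_demote
demote_rc=$?
echo "demote rc after crash: $demote_rc"

# Case 2: the process is running, so demote can proceed.
SERVER_ALIVE=yes
ovsdb_server_demote
demote_rc_alive=$?
echo "demote rc when running: $demote_rc_alive"
```

In the crashed case the function returns 7, which matches the
ovndb_servers_demote_0 'not running' (7) entry in the status output above.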





