[ovs-git] [openvswitch/ovs] 9b46d3: ovn pacemaker: Fix promotion issue when the master...

GitHub noreply at github.com
Fri May 25 18:31:08 UTC 2018


  Branch: refs/heads/master
  Home:   https://github.com/openvswitch/ovs
  Commit: 9b46d3f609abf8ad532efb0820753f3a920e2213
      https://github.com/openvswitch/ovs/commit/9b46d3f609abf8ad532efb0820753f3a920e2213
  Author: Numan Siddique <nusiddiq at redhat.com>
  Date:   2018-05-25 (Fri, 25 May 2018)

  Changed paths:
    M ovn/utilities/ovndb-servers.ocf

  Log Message:
  -----------
  ovn pacemaker: Fix promotion issue when the master node is reset

When a node 'A' in the pacemaker cluster running the OVN db servers as
master is brought down ungracefully ('echo b > /proc/sysrq-trigger',
for example), pacemaker is not able to promote any other node to
master in the cluster. When pacemaker selects, for instance, node 'B'
to promote, it moves the IPaddr2 resource (i.e. the master IP) to node
'B'. In the failing case, as soon as node 'B' is configured with the IP
address, the OVN db servers on it, which were earlier running as
standby, transition to active. Ideally this should not happen: the
ovsdb-servers are expected to remain in standby until they are
promoted. (This needs separate investigation.) When pacemaker then
calls the promote action of the OVN OCF script, the
ovsdb_server_promote function returns almost immediately without
recording the present master. Later, in the notify action, the script
demotes the OVN db servers again because the last known master doesn't
match node 'B's hostname. This results in pacemaker promoting and
demoting in a loop.

This patch fixes the issue by no longer returning immediately from the
promote action when the OVN db servers are already running as active.
Instead, ovsdb_server_promote continues and records the new master by
setting the proper master score ($CRM_MASTER -N $host_name
-v ${master_score}).
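A minimal shell sketch of the fixed promote control flow, assuming a
simplified agent. It is not the code from ovndb-servers.ocf:
check_status, record_master, SIMULATED_STATE and the master score of 10
are hypothetical stand-ins, and the real $CRM_MASTER call is replaced by
an echo. What it shows is that the OCF_RUNNING_MASTER case now falls
through to recording the master instead of returning early.

```shell
#!/bin/sh
# Sketch of the fixed promote flow; NOT the real ovndb-servers.ocf code.
# check_status, record_master, SIMULATED_STATE and the score 10 are
# hypothetical stand-ins used only for illustration.

OCF_SUCCESS=0
OCF_RUNNING_MASTER=8   # standard OCF exit code for "running as master"

# Hypothetical status check: returns $SIMULATED_STATE (default 0, i.e.
# running as standby) so the flow can be exercised without ovsdb-server.
check_status() {
    return "${SIMULATED_STATE:-0}"
}

# Hypothetical stand-in for "$CRM_MASTER -N $host_name -v ${master_score}":
# it only prints what would be recorded.
record_master() {
    echo "recorded master: $1 score: $2"
}

promote() {
    host_name=$1
    check_status
    rc=$?
    case $rc in
        "$OCF_SUCCESS") ;;          # standby: go on and promote
        "$OCF_RUNNING_MASTER") ;;   # already active: no early return
        *) return "$rc" ;;          # any other state: report it as-is
    esac
    # The real agent would promote the NB/SB databases at this point.
    record_master "$host_name" 10
    return "$OCF_SUCCESS"
}
```

With the early return removed, a node whose db servers already went
active on their own still records itself as master, so the later
notify action finds a matching last known master and does not demote it.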

This issue is not seen when a node is brought down gracefully, because
in that case pacemaker calls the stop, start and then promote actions
on the node it promotes. It is not clear why pacemaker doesn't call
the stop, start and promote actions when a node is reset ungracefully.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1579025
Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
Signed-off-by: Russell Bryant <russell at ovn.org>
