[ovs-dev] [PATCH] ovn pacemaker: Fix the promotion issue in other cluster nodes when the master node is reset

Russell Bryant russell at ovn.org
Fri May 25 18:32:30 UTC 2018


On Thu, May 17, 2018 at 6:04 AM,  <nusiddiq at redhat.com> wrote:
> From: Numan Siddique <nusiddiq at redhat.com>
>
> When a node 'A' in the pacemaker cluster running the OVN db servers as master is
> brought down ungracefully ('echo b > /proc/sysrq-trigger' for example), pacemaker
> is not able to promote any other node to master in the cluster. When pacemaker selects
> a node, 'B' for instance, to promote, it moves the IPAddr2 resource (i.e. the master IP)
> to node 'B'. As soon as the node is configured with the IP address, when the issue is
> seen, the OVN db servers, which were running as standby earlier, transition to active.
> Ideally this should not happen. The ovsdb-servers are expected to remain in
> standby until they are promoted. (This needs separate investigation.) When pacemaker
> calls the OVN OCF script's promote action, the ovsdb_server_promote function returns
> almost immediately without recording the present master. Later, in the notify action,
> it demotes the OVN db servers again since the last known master doesn't match
> node 'B's hostname. This results in pacemaker promoting/demoting in a loop.
>
> This patch fixes the issue by not returning immediately when the promote action is
> called while the OVN db servers are already running as active. It now continues with
> the ovsdb_server_promote function and records the new master by setting the proper
> master score ($CRM_MASTER -N $host_name -v ${master_score}).
>
> This issue is not seen when a node is brought down gracefully, as pacemaker, before
> promoting a node, calls the stop, start and then promote actions. Not sure why pacemaker
> doesn't call the stop, start and promote actions when a node is reset ungracefully.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1579025
> Signed-off-by: Numan Siddique <nusiddiq at redhat.com>

Thanks, Numan.  I tweaked the commit message formatting and applied this
to master and branch-2.9.
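
For readers following along, the behaviour change described in the quoted
commit message can be illustrated with a rough sketch. This is not the
actual OCF script: check_state, record_master, promote_old, promote_new and
the variables below are hypothetical stand-ins, with record_master playing
the role of the $CRM_MASTER call and db_state standing in for the real
ovsdb-server status check. It only models the early-return problem, under
the assumption that the servers have already gone active by the time
promote is called.

```shell
#!/bin/sh
# Simplified, hypothetical model of the promote path (not the real OCF script).

OCF_SUCCESS=0

db_state="active"      # assumption: servers already went active on their own
recorded_master=""     # stand-in for what pacemaker has recorded as master

# Stand-in for the real status check of the ovsdb-servers.
check_state() { echo "$db_state"; }

# Stand-in for: $CRM_MASTER -N $host_name -v ${master_score}
record_master() { recorded_master="$1"; }

# Before the fix: return early when already active, so nothing is recorded.
promote_old() {
    if [ "$(check_state)" = "active" ]; then
        return $OCF_SUCCESS
    fi
    db_state="active"
    record_master "$1"
    return $OCF_SUCCESS
}

# After the fix: skip only the state change, and always record the master.
promote_new() {
    if [ "$(check_state)" != "active" ]; then
        db_state="active"
    fi
    record_master "$1"
    return $OCF_SUCCESS
}

promote_old node-b
old_result="$recorded_master"

recorded_master=""
promote_new node-b
new_result="$recorded_master"

echo "old: '$old_result' new: '$new_result'"
```

In this model the old path leaves the recorded master empty while the new
path records node-b, which is the difference that would let a later notify
action find a matching master instead of demoting the node again.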

