[ovs-dev] [PATCH] ovn pacemaker: Fix the promotion issue in other cluster nodes when the master node is reset

Numan Siddique nusiddiq at redhat.com
Mon May 28 06:15:36 UTC 2018


On Sat, May 26, 2018 at 12:02 AM, Russell Bryant <russell at ovn.org> wrote:

> On Thu, May 17, 2018 at 6:04 AM,  <nusiddiq at redhat.com> wrote:
> > From: Numan Siddique <nusiddiq at redhat.com>
> >
> > When a node 'A' in the pacemaker cluster running the OVN db servers as
> > master is brought down ungracefully ('echo b > /proc/sysrq-trigger' for
> > example), pacemaker is not able to promote any other node in the cluster
> > to master. When pacemaker selects a node to promote, say node 'B', it
> > moves the IPaddr2 resource (i.e. the master IP) to node 'B'. When the
> > issue is seen, as soon as the node is configured with the IP address, the
> > OVN db servers that were running as standby transition to active.
> > Ideally this should not happen: the ovsdb-servers are expected to remain
> > in standby until they are promoted. (This needs separate investigation.)
> > When pacemaker then calls the OVN OCF script's promote action, the
> > ovsdb_server_promote function returns almost immediately without
> > recording the present master. Later, in the notify action, the script
> > demotes the OVN db servers again because the last known master doesn't
> > match node 'B's hostname. This results in pacemaker promoting and
> > demoting in a loop.
> >
> > This patch fixes the issue by not returning immediately when the promote
> > action is called while the OVN db servers are already running as active.
> > The ovsdb_server_promote function now continues and records the new
> > master by setting the proper master score
> > ($CRM_MASTER -N $host_name -v ${master_score}).
> >
> > This issue is not seen when a node is brought down gracefully, because
> > pacemaker calls the stop, start and then promote actions when promoting
> > a node. Not sure why pacemaker doesn't call the stop, start and promote
> > actions when a node is reset ungracefully.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1579025
> > Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
>
> Thanks, Numan.  I tweaked commit message formatting and applied this
> to master and branch-2.9
>
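
For readers following the thread, the change in behaviour described in the
commit message can be sketched roughly as below. This is a simplified model,
not the actual OCF script: check_status, record_master, promote_before and
promote_after are hypothetical names made up for illustration, the state is
passed in as an argument instead of being queried from ovsdb-server, and the
early return value in promote_before is an assumption. Only the OCF return
code values are the standard ones.

```shell
#!/bin/sh
# Simplified, hypothetical model of the promote action before and after the
# fix. None of these functions exist in the real OVN OCF script.

OCF_SUCCESS=0          # standard OCF return code
OCF_NOT_RUNNING=7      # standard OCF return code
OCF_RUNNING_MASTER=8   # standard OCF return code

recorded_master=""     # stand-in for the master score kept by pacemaker

# Stand-in for asking the db servers for their state ($1 is the state).
check_status() {
    case "$1" in
        active)  return $OCF_RUNNING_MASTER ;;
        standby) return $OCF_SUCCESS ;;
        *)       return $OCF_NOT_RUNNING ;;
    esac
}

# Stand-in for: $CRM_MASTER -N $host_name -v ${master_score}
record_master() {
    recorded_master="$1"
}

# Before the fix (assumed shape): if the servers are already active,
# return right away, so the new master is never recorded.
promote_before() {
    check_status "$1"
    rc=$?
    if [ "$rc" -eq "$OCF_RUNNING_MASTER" ]; then
        return $OCF_SUCCESS
    fi
    record_master "$2"
    return $OCF_SUCCESS
}

# After the fix: an already-active server is not a reason to return early;
# the function carries on and records the new master.
promote_after() {
    check_status "$1"
    rc=$?
    case "$rc" in
        "$OCF_SUCCESS"|"$OCF_RUNNING_MASTER") ;;
        *) return "$rc" ;;
    esac
    record_master "$2"
    return $OCF_SUCCESS
}
```

With the servers already active on node 'B', calling
"promote_before active node-b" leaves recorded_master empty, which is the
condition under which the notify action later demotes the node, while
"promote_after active node-b" sets it to node-b.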


Thanks Russell.

Numan

