[ovs-dev] [PATCH v3] OVN pacemaker: Add the monitor action for Master role
Numan Siddique
nusiddiq at redhat.com
Mon Dec 4 14:09:02 UTC 2017
On Mon, Dec 4, 2017 at 7:12 PM, Russell Bryant <russell at ovn.org> wrote:
> On Mon, Dec 4, 2017 at 12:29 AM, <nusiddiq at redhat.com> wrote:
> > From: Numan Siddique <nusiddiq at redhat.com>
> >
> > Pacemaker Resource agent periodically calls the OVN OCF's "monitor"
> action
> > periodically to check the status. But the OVN OCF script doesn't add the
> > action "monitor" for the role "Master" because of which the pacemaker
> > resource agent do not call the "monitor" action at all for the master.
> > In case OVN db servers exit for some reason this totally gets undetected
> > and one of the standby node is not promoted to master.
> >
> > This patch adds the monitor action for "Master" role. Also the monitor
> > action do not check for the status of the ovn-northd (if manage_northd
> is yes).
> > This patch also checks for the status of the ovn-northd in the monitor
> action
> > for the "Master" role. If any of the ovsdb-server or ovn-northd is not
> running,
> > monitor action will return OCF_NOT_RUNNING and this will cause the
> pacemaker
> > to restart the OVN OCF resource.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1512568
> > Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> > CC: Russel Bryant <russell at ovn.org>
> > ---
> >
> > v2 -> v3
> > --------
> > In the ovsdb_server_demote added the check to see the status of
> > ovn-northd if it is running as master. v2 was not working for
> > pacemaker OVN docker bundle resource.
> >
> > v1 -> v2
> > -----
> > Reverted the change to use 'ocf_attribute_target' as this function is
> > only availabe in pacemaker 1.1.16-12
> >
> > ovn/utilities/ovndb-servers.ocf | 49 ++++++++++++++++++++++++++++++
> ++++-------
> > 1 file changed, 41 insertions(+), 8 deletions(-)
> >
> > diff --git a/ovn/utilities/ovndb-servers.ocf
> b/ovn/utilities/ovndb-servers.ocf
> > index 3f3008700..389307a84 100755
> > --- a/ovn/utilities/ovndb-servers.ocf
> > +++ b/ovn/utilities/ovndb-servers.ocf
> > @@ -120,7 +120,11 @@ ovsdb_server_metadata() {
> > <action name="stop" timeout="20s" />
> > <action name="promote" timeout="50s" />
> > <action name="demote" timeout="50s" />
> > - <action name="monitor" timeout="20s" depth="0" interval="10s"
> />
> > + <action name="monitor" timeout="20s" depth="0" interval="30s"
> />
>
> Just making sure ... did you mean to leave this third "monitor" entry
> here? I don't really know how this works, but it looked like the next
> two would replace this one.
>
I referred to galera resource agent as an example [1] and it had 3 monitor
actions. So thought of keeping the same way.
I will test it out and remove it if it is not required.
[1] -
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/galera#L256
>
> > + <action name="monitor" timeout="20s" depth="0" interval="10s"
> > + role="Master" />
> > + <action name="monitor" timeout="20s" depth="0" interval="30s"
> > + role="Slave"/>
> > <action name="meta-data" timeout="5s" />
> > <action name="validate-all" timeout="20s" />
> > </actions>
> > @@ -247,7 +251,7 @@ ovsdb_server_master_update() {
> > }
> >
> > ovsdb_server_monitor() {
> > - ovsdb_server_check_status
> > + ovsdb_server_check_status $@
> > rc=$?
> >
> > ovsdb_server_master_update $rc
> > @@ -262,8 +266,21 @@ ovsdb_server_check_status() {
> > return $OCF_SUCCESS
> > fi
> >
> > + check_northd="no"
> > + if [ "$MANAGE_NORTHD" == "yes" ] && [ "$1" != "ignore_northd" ];
> then
> > + check_northd="yes"
> > + fi
> > +
> > if [[ $sb_status == "running/active" && $nb_status ==
> "running/active" ]]; then
> > - return $OCF_RUNNING_MASTER
> > + if [ "$check_northd" == "yes" ]; then
> > + # Verify if ovn-northd is running or not.
> > + ${OVN_CTL} status_northd | grep "ovn-northd is running"
>
> Is the grep needed? Can you just rely on the exit code of ovn-ctl?
> This script will fail if the output of ovn-ctl is changed in the
> future.
>
I thought I would be explicit. But I agree with you. Thanks for pointing
out. I will submit v4 soon.
> > + if [ "$?" == "0" ] ; then
> > + return $OCF_RUNNING_MASTER
> > + fi
> > + else
> > + return $OCF_RUNNING_MASTER
> > + fi
> > fi
> >
> > # TODO: What about service running but not in either state above?
> > @@ -317,8 +334,13 @@ ovsdb_server_start() {
> > $@ start_ovsdb
> >
> > while [ 1 = 1 ]; do
> > - # It is important that we don't return until we're in a
> functional state
> > - ovsdb_server_monitor
> > + # It is important that we don't return until we're in a
> functional
> > + # state. When checking the status of the ovsdb-server's ignore
> northd.
> > + # It is possible that when the resource is restarted
> ovsdb-server's
> > + # can be started as masters and ovn-northd would not have been
> started.
> > + # ovn-northd will be started once a node is promoted to master
> and
> > + # 'manage_northd' is set to yes.
> > + ovsdb_server_monitor ignore_northd
> > rc=$?
> > case $rc in
> > $OCF_SUCCESS) return $rc;;
> > @@ -350,7 +372,7 @@ ovsdb_server_stop() {
> > ${OVN_CTL} --ovn-manage-ovsdb=no stop_northd
> > fi
> >
> > - ovsdb_server_check_status
> > + ovsdb_server_check_status ignore_northd
> > case $? in
> > $OCF_NOT_RUNNING) return ${OCF_SUCCESS};;
> > esac
> > @@ -360,7 +382,7 @@ ovsdb_server_stop() {
> >
> > while [ 1 = 1 ]; do
> > # It is important that we don't return until we're stopped
> > - ovsdb_server_check_status
> > + ovsdb_server_check_status ignore_northd
> > rc=$?
> > case $rc in
> > $OCF_SUCCESS)
> > @@ -381,7 +403,7 @@ ovsdb_server_stop() {
> > }
> >
> > ovsdb_server_promote() {
> > - ovsdb_server_check_status
> > + ovsdb_server_check_status ignore_northd
> > rc=$?
> > case $rc in
> > ${OCF_SUCCESS}) ;;
> > @@ -395,6 +417,11 @@ ovsdb_server_promote() {
> > ${OVN_CTL} promote_ovnnb
> > ${OVN_CTL} promote_ovnsb
> >
> > + if [ "$MANAGE_NORTHD" = "yes" ]; then
> > + # Startup ovn-northd service
> > + ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
> > + fi
> > +
> > ocf_log debug "ovndb_servers: Promoting $host_name as the master"
> > # Record ourselves so that the agent has a better chance of doing
> > # the right thing at startup
> > @@ -404,6 +431,8 @@ ovsdb_server_promote() {
> > }
> >
> > ovsdb_server_demote() {
> > + # While demoting, check the status of ovn_northd.
> > + # In case ovn_northd is not running, we should return
> OCF_NOT_RUNNING.
> > ovsdb_server_check_status
> > if [ $? = $OCF_NOT_RUNNING ]; then
> > return $OCF_NOT_RUNNING
> > @@ -452,6 +481,10 @@ ovsdb_server_demote() {
> > ${OVN_CTL} demote_ovnsb --db-sb-sync-from-addr=${
> INVALID_IP_ADDRESS}
> > fi
> >
> > + if [ "$MANAGE_NORTHD" = "yes" ]; then
> > + # Stop ovn-northd service
> > + ${OVN_CTL} --ovn-manage-ovsdb=no stop_northd
> > + fi
> > ovsdb_server_master_update $OCF_SUCCESS
> > return $OCF_SUCCESS
> > }
> > --
> > 2.14.3
> >
>
>
>
> --
> Russell Bryant
>
More information about the dev
mailing list