[ovs-dev] [PATCH] ovs-lib: Wait for a longer time after SIGKILL.
Ben Pfaff
blp at nicira.com
Wed Mar 27 22:38:07 UTC 2013
On Wed, Mar 27, 2013 at 02:30:05PM -0700, Gurucharan Shetty wrote:
> Currently, when we stop a daemon, we first send it SIGTERM.
> If SIGTERM does not work within ~5 seconds, we send a SIGKILL.
> After sending SIGKILL, we wait only 4 seconds before giving
> up.
>
> If the system is extremely busy, there is a chance that a
> process is not killed by the kernel within 4 seconds. In such
> a case, when we try to start the daemon immediately, we see that
> the pid inside the pid-file is valid and assume that the daemon
> is still running. This leaves us in a state where the daemon is
> actually not running.
>
> This patch increases the time waiting for the kernel to kill the
> process to 60 seconds.
>
> Bug #15404.
> Signed-off-by: Gurucharan Shetty <gshetty at nicira.com>
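The "~5 seconds" and "4 seconds" figures follow from the action list that stop_daemon walks through (TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL): every word other than TERM, KILL, and FAIL is a sleep in seconds. The sketch below sums the sleeps on each side of KILL. The helper name schedule_totals is hypothetical, and the totals ignore the time spent in kill and pid_exists themselves, so real wall-clock times are slightly longer.

```shell
#!/bin/sh
# Hypothetical helper: sum the sleep intervals of a stop_daemon-style
# action list, split at KILL. TERM, KILL, and FAIL are actions; every
# other word is treated as a sleep duration in seconds.
schedule_totals () {
    echo "$1" | awk '{
        before = 0; after = 0; killed = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "KILL") { killed = 1; continue }
            if ($i == "TERM" || $i == "FAIL") continue
            if (killed) after += $i; else before += $i
        }
        printf "TERM->KILL: %g s, KILL->FAIL: %g s\n", before, after
    }'
}

schedule_totals "TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL"
# prints: TERM->KILL: 5 s, KILL->FAIL: 4 s
```

Lengthening the post-KILL wait, as the patch proposes, amounts to appending more sleep entries between KILL and FAIL.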
I see why you changed the FAIL case, but I think that it might instead
be better to do something like the following, to avoid duplicating the
pid_exists call in two places:
diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index d010abf..b44ab37 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -173,6 +173,9 @@ stop_daemon () {
if test -e "$rundir/$1.pid"; then
if pid=`cat "$rundir/$1.pid"`; then
for action in TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL; do
+ if pid_exists $pid >/dev/null 2>&1; then :; else
+ return 0
+ fi
case $action in
TERM)
action "Killing $1 ($pid)" kill $pid
@@ -185,11 +188,7 @@ stop_daemon () {
return 1
;;
*)
- if pid_exists $pid >/dev/null 2>&1; then
- sleep $action
- else
- return 0
- fi
+ sleep $action
;;
esac
done
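The effect of moving the pid_exists check to the top of the loop can be seen in a standalone sketch. It is a simplified, hypothetical stand-in for stop_daemon, not the ovs-lib code: it drops the pid-file handling and the "action" logging helper, and it implements pid_exists with "kill -0", which is an assumption (the real ovs-lib helper may check differently; "kill -0" also succeeds for zombies and fails on EPERM).

```shell
#!/bin/sh
# Hypothetical stand-in for ovs-lib's pid_exists: succeeds if $1 names a
# live process that this shell is permitted to signal.
pid_exists () {
    kill -0 "$1" 2>/dev/null
}

# stop_pid PID (hypothetical): returns 0 as soon as PID is gone,
# 1 if it is still present after the SIGKILL grace period.
stop_pid () {
    pid=$1
    for action in TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL; do
        # Check liveness at the top of every iteration, so a process that
        # dies during the final sleep is noticed before FAIL is reached.
        if pid_exists "$pid"; then :; else
            return 0
        fi
        case $action in
            TERM) kill -s TERM "$pid" ;;
            KILL) kill -s KILL "$pid" ;;
            FAIL) return 1 ;;
            *)    sleep "$action" ;;   # fractional sleep: GNU/BSD extension
        esac
    done
}
```

With this ordering there is a single pid_exists call, and a daemon that exits during the last one-second sleep after SIGKILL is reported as stopped rather than as a failure, which is the case the original patch changed FAIL to handle.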