Here, sleep is just a stand-in for a more complex process.

This Dockerfile uses the exec form of CMD, so only one process runs in the container and it is not a child of a shell:

FROM busybox
CMD ["/bin/sleep", "100000"]

creates an uninterruptible container:

docker build -t kill-sleep .
docker run --rm --name kill-sleep kill-sleep

When I try to stop it:

time docker stop kill-sleep

kill-sleep
real    0m10.449s
user    0m0.021s
sys     0m0.027s

the command waits out the full 10-second timeout before the container is killed.

The problem isn't that sleep doesn't handle signals, because if I run it on the host:

sleep 100000
# in another shell
ps faxww | grep sleep
kill -TERM 31333  # the PID

the process stops immediately.

The problem may be that sleep is running as PID 1 in the container, but I have yet to find reference documentation for that.

2 Answers

When you run docker stop ..., two things happen:

  1. docker sends a SIGTERM to the main process of the container. The process is able to mask/ignore a SIGTERM, and if it does so (or handles it without terminating) "nothing" will happen.
  2. After a timeout (default 10 seconds), docker sends a SIGKILL to the main process. This signal cannot be masked by a process, and thus it dies immediately with no opportunity to execute a shutdown procedure.

Ideally, processes run within docker will respond to the SIGTERM in a timely fashion, taking care of any housekeeping before terminating.
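
The two steps above can be mimicked on the host with plain shell, which may help when reasoning about the timeout. This is only a sketch of the sequence as described here, not Docker's actual implementation; stop_with_timeout is a hypothetical helper name, and it assumes the target is a child of the calling shell.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker command): send TERM, wait up to
# $2 seconds for the process to go away, then fall back to KILL.
stop_with_timeout() {
    pid=$1
    timeout=$2

    # Step 1: polite request. If the process is already gone, we are done.
    kill -TERM "$pid" 2>/dev/null || return 0

    i=0
    while [ "$i" -lt "$timeout" ]; do
        # kill -0 sends no signal; it only checks that the process exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Step 2: the process ignored TERM for the whole timeout, so force it.
    kill -KILL "$pid" 2>/dev/null || true
    return 1
}
```

For example, after sleep 100000 & you could call stop_with_timeout $! 10. It returns 0 if TERM was enough and 1 if it had to resort to KILL.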

If you know that the process either doesn't have any housekeeping to perform (e.g: sleep), or will not respond properly to SIGTERM, you can specify a shorter (or longer) timeout with the -t flag:

-t, --time=10
    Seconds to wait for stop before killing it

For example, in your case, you may like to run docker stop -t 0 ${CONTAINER}.


The signal behaviour differs inside the container because sleep is running as PID 1.

Typically (i.e. when running with PID != 1), any signal that the process doesn't explicitly handle leads to the process being terminated. Try sending SIGUSR1 to a sleep.

However, a process running as PID 1 is treated like init: signals for which it has not installed a handler are ignored. Otherwise a stray signal could kill init, which on a host ends in a kernel panic:

Kernel panic - not syncing: Attempted to kill init!
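
The ordinary (PID != 1) case is easy to check on the host. A minimal sketch, assuming a POSIX shell; the variable names are only illustrative:

```shell
#!/bin/sh
# Host-side check: an ordinary process (not PID 1) with no SIGUSR1 handler.
sleep 100 &
pid=$!

kill -USR1 "$pid"

# wait reports the child's exit status; the "|| status=$?" form
# keeps this working even if the shell runs with `set -e`.
status=0
wait "$pid" || status=$?

echo "exit status: $status"
# A status above 128 means the process was terminated by a signal;
# kill -l maps such a status back to the signal name.
echo "signal: $(kill -l "$status")"
```

The sleep is terminated by a signal it never asked to handle, which is exactly what does not happen when the same binary runs as PID 1 in the container.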

You can send a signal to the docker container using docker tools, for example:

docker kill -s TERM kill-sleep

With this image, that has no effect, because sleep installs no handler for TERM, whereas this stops the container:

docker kill -s KILL kill-sleep

An Experiment

Dockerfile

FROM busybox
COPY run.sh /run.sh
RUN chmod +x /run.sh
CMD "/run.sh"

run.sh

#!/bin/sh

echo "sleeping"
sleep 100000

Now, run

docker build -t kill-sleep .
docker run --rm --name kill-sleep kill-sleep

And this in a different terminal:

docker stop kill-sleep

We observe the same 10 second delay / timeout.

A Solution

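Now let's handle the SIGTERM. sleep is backgrounded and waited on because of how a POSIX shell handles signals: a trap does not run until the foreground command finishes, whereas wait returns as soon as a trapped signal arrives (see this for more).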

run.sh

#!/bin/sh

die_func() {
        echo "oh no"
        sleep 2
        exit 1
}
trap die_func TERM

echo "sleeping"
sleep 100000 &
wait

Run the commands again, and we see what we are after!

$ time docker stop kill-sleep
kill-sleep

real    0m2.515s
user    0m0.008s
sys     0m0.044s
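
The trap/background/wait pattern can also be tried on the host without Docker. The sketch below is a variation on run.sh, not the script above: as assumptions of mine, the handler also signals the backgrounded sleep so it is not left behind, and it exits with 0 instead of 1. The temporary file names come from mktemp.

```shell
#!/bin/sh
# Write a small script that traps TERM, backgrounds sleep, and waits on it.
script=$(mktemp)
cat > "$script" <<'EOF'
term_handler() {
    echo "caught TERM"
    kill -TERM "$child" 2>/dev/null   # stop the backgrounded sleep as well
    exit 0
}
trap term_handler TERM

sleep 100000 &
child=$!
wait "$child"                         # wait is interrupted by the trapped signal
EOF

# Run it in the background, then send it TERM as docker stop would.
sh "$script" > "$script.out" &
pid=$!
sleep 1                               # give the script time to install its trap
kill -TERM "$pid"

status=0
wait "$pid" || status=$?
output=$(cat "$script.out")
echo "script exited with status $status and printed: $output"

rm -f "$script" "$script.out"
```

If the script ran sleep in the foreground instead, the handler would not get a chance to run until sleep itself finished.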

Some more options:

  • Add the --init switch to the docker run command. Docker then runs a small init process as PID 1, so sleep is not PID 1, and init does the right thing on TERM.
  • Add --stop-signal=KILL to the docker run command. Using KILL as a routine way to stop a process is generally discouraged, however.
