
This is parent.sh:

#!/bin/bash

trap 'exit' SIGHUP SIGINT SIGQUIT SIGTERM

if ! [ -t 0 ]; then # if running non-interactively
    sleep 5 & # allow a little time for child to generate some output
    set -bm # to be able to trap SIGCHLD
    trap 'kill -SIGINT $$' SIGCHLD # when sleep is done, interrupt self automatically - cannot issue interrupt by keystroke since running non-interactively
fi

sudo ~/child.sh

This is child.sh:

#!/bin/bash

test -f out.txt && rm out.txt

for second in {1..10}; do
    echo "$second" >> out.txt
    sleep 1
done

If I run the parent script in a terminal like so...

~/parent.sh

...and issue an interrupt by keystroke after about 3 seconds, then out.txt, when checked a few seconds later, looks like...

1  
2  
3  

...thus indicating that parent and child ended upon (keystroke) interrupt. This is corroborated by checking ps -ef in real-time and seeing that the script processes are present before the interrupt and gone after the interrupt.

If the parent script is invoked by cron like so...

* * * * * ~/parent.sh  

...the content of out.txt is always...

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  

...thus indicating that at least the child did not end upon (kill command) interrupt. This is corroborated by checking ps -ef in real-time and seeing that the script processes are present before the interrupt and only the parent process is gone after the interrupt, but the child process persists until it runs its course.

Attempts to solve...

  1. Shell options can only be a factor here, inasmuch as non-interactive invocations of parent run set -bm (which entails PGIDs of children differing from PGID of parent - relevant up ahead). Other than that, both scripts show only options hB enabled, whether running interactively or not.
  2. Looked through man bash for clues but found nothing helpful.
  3. Tried a handful of web searches, which returned many results from stackoverflow; some were similar to this question, but none were the same. The closest answers entailed...
    • using wait to get the child process id and invoking kill on it - results in "/parent.sh: line 30: kill: (17955) - Operation not permitted"
    • invoking a kill on the process group - results in "~/parent.sh: line 31: kill: (-15227) - Operation not permitted" (kill using the PGID of child, which differs from parent when non-interactive, due to job control enabling)
    • looping through the current jobs and killing each

Is the problem with these solutions that the parent runs as a regular user, while the child runs as root via sudo (it will ultimately be a binary, not a suid script), so the parent cannot kill it? If that's what "Operation not permitted" means, why is the sudo invoked process killable when sending a keystroke interrupt via terminal?

The natural course is to avoid additional code, unless necessary - i.e. since the scripts behave correctly when run interactively, if feasible it's much preferred to simply apply the same behavior when running non-interactively / by cron.

The bottom line question is: what can be done to make an interrupt (or term) signal issued while running non-interactively produce the same behavior as an interrupt signal issued when running interactively?

Thanks. Any help is greatly appreciated.

  • 3
    This is a well presented question asking about a hairy edge case of process control. Although I don't have a direct solution exclusively in bash, I'd suggest that you make the child process a pipe descendant of the parent which should allow finer grained control for two reasons: SIGPIPE is not handled by your script, and the "downstream" sudo process may treat either SIGPIPE or EOF as sufficient conditions to terminate itself. You are running along the rocky edge of process control where "implement it in not shell" may be the best answer. Failing that, pipes are a credible alternative.
    – msw
    Commented Dec 20, 2016 at 3:01
  • 1
    Here's a possible workaround solution; Use screen -dmS child sudo ~/child.sh to start the child process, and then screen -S child -X quit to kill it. The first command starts a screen named "child", and then runs "sudo ~/child.sh" inside. The second kills the screen, which should also take the script with it. It's not elegant, but it should get the job done.
    – Guest
    Commented Dec 20, 2016 at 3:53
  • @msw - thank you both for your input. Following the answer below for now, for a possible clean solution.
    – S Kos
    Commented Dec 22, 2016 at 4:48

1 Answer

  1. When you run the script manually from an interactive shell (usually on a pty), the terminal driver catches CTRL-C, converts it to SIGINT, and sends it to every process in the foreground process group (the script itself and the sudo command).
  2. When the script runs from cron, SIGINT is sent only to the shell script itself. The sudo command keeps running, because bash does not kill its children when it exits in this scenario.

To explicitly send a signal to a whole process group, you can use the negative process group ID. In your case the PGID should be the PID of the shell script, so try this:

trap 'kill -SIGINT -$$' SIGCHLD

UPDATE:

It turns out my assumption about the value of the PGID was wrong. I just ran a test with this simple cron.sh:

#!/bin/bash
set -m
sleep 888 &
sudo sleep 999

and crontab -l looks like this:

30 * * * * /root/tmp/cron.sh

While the cron job is running, ps shows this:

 PPID    PID   PGID    SID   COMMAND
15486  15487  15487  15487   /bin/sh -c /root/tmp/cron.sh
15487  15488  15487  15487   /bin/bash /root/tmp/cron.sh
15488  15489  15489  15487   sleep 888
15488  15490  15490  15487   sudo sleep 999
15490  15494  15490  15487   sleep 999

So sudo (and its child) runs in a separate process group, and that group's PGID is not the PID of cron.sh, so my solution (kill -INT -$$) would not work.
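The PGID layout in the ps output can also be checked from inside a script. The following is a minimal sketch, assuming bash and a ps that supports the POSIX `-o pgid=` format; `pgid_of` is a hypothetical helper name that does not appear in the original scripts, and no sudo is involved.

```shell
#!/bin/bash
# pgid_of PID: print the process group ID of PID.
# (hypothetical helper; relies on the POSIX 'ps -o pgid=' output format)
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

set -m                       # enable job control, as in cron.sh
sleep 30 &                   # under set -m this job leads its own process group
job_pid=$!
job_pgid=$(pgid_of "$job_pid")
self_pgid=$(pgid_of "$$")

echo "script: pid=$$ pgid=$self_pgid"
echo "job:    pid=$job_pid pgid=$job_pgid"

kill -TERM "-$job_pgid"      # negative ID: signal every process in the job's group
job_status=0
wait "$job_pid" 2>/dev/null || job_status=$?   # reap the job, keep its status
```

With job control enabled, the job's PGID should equal its own PID and differ from the script's PGID, which matches the separate groups seen for sleep 888 and sudo sleep 999.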

Then I think we can solve the problem like this:

#!/bin/bash
set -m
sudo sleep 999 & # run sudo in background
pid=$!           # save the pid which is also the pgid
sleep 5
sudo kill -INT -$pid  # kill the pgrp.
                      # Use sudo since we're killing root's processes
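The same pattern can be wrapped in a function and tried without root. This is only a sketch under stated assumptions: `run_with_timeout` is a hypothetical name, the command is started without sudo (so a plain kill is enough; a command started through sudo would need sudo kill, as in the script above), and SIGTERM is sent rather than SIGINT.

```shell
#!/bin/bash
set -m   # job control: each background job gets its own process group

# run_with_timeout SECONDS CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background and signals its whole process group
# if it is still running after SECONDS. Returns CMD's wait status.
run_with_timeout() {
    local limit=$1; shift
    "$@" &
    local pgid=$!            # PID of the job, which is also its PGID under set -m
    # Timer job: after the limit, signal the command's process group.
    ( sleep "$limit"; kill -TERM "-$pgid" 2>/dev/null ) &
    local timer=$!
    wait "$pgid"
    local status=$?
    kill -TERM "-$timer" 2>/dev/null   # stop the timer's group if still pending
    wait "$timer" 2>/dev/null
    return "$status"
}
```

For example, `run_with_timeout 1 sleep 10; echo $?` should report 143 (128 plus the SIGTERM signal number), while a command that finishes within the limit returns its own exit status.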
  • Gave this a shot, since it was somewhat different than past attempts of "invoking a kill on the process group". It fails, even though the child was not invoked with sudo (for temporary simplification). Tried SIGINT and SIGTERM, each with the kill builtin, and with the kill utility, to no avail. In case it matters: during a cron invoked run, when checking ps -ef output after sleep 5 & completes, parent.sh shows as "[parent.sh] <defunct>"
    – S Kos
    Commented Dec 20, 2016 at 19:25
  • Ok. Your focus on process group brought something to light. When parent is run by terminal, parent, child and respective sleep processes all have the same PGID. But when by cron, parent has PGID "a", its sleep has "b", and child and its sleep share "c". It seems this is caused by the enabling of job control (set -bm), as disabling it results in identical PGID for all these processes.
    – S Kos
    Commented Dec 22, 2016 at 4:37
  • Now, job control (or some replacement) is needed for the task at hand; though, let's exclude it for now. Attempting kill -SIGINT -$pgid actually failed, but kill -SIGTERM -$l_pgid succeeds. However, when restoring sudo to child invocation - sudo ~/child.sh - the PGIDs are still identical, but the child process goes unaffected as before. It seems insufficient privilege should not be the problem, since the signal terminates child process when the parent is invoked by terminal. Thanks for your help.
    – S Kos
    Commented Dec 22, 2016 at 4:46
  • Just tried adding set -m in my script and you are right and now I can see different PGIDs. Does the sudo ./child.sh & way work for you?
    – pynexj
    Commented Dec 22, 2016 at 4:48
  • Ok. Sorry, the failure above was done like sudo ~/child.sh &. Is that the intention of your question? - backgrounding the child invocation?
    – S Kos
    Commented Dec 22, 2016 at 5:06
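The SIGINT versus SIGTERM difference reported in the comments is consistent with bash's documented behavior: when job control is not in effect, asynchronous commands ignore SIGINT and SIGQUIT. A minimal sketch for observing this, assuming bash and a sleep that accepts fractional seconds; `sigint_status` is a hypothetical helper name:

```shell
#!/bin/bash
# sigint_status on|off   (hypothetical helper)
# Starts 'sleep 2' in the background of a fresh bash, with job control
# on or off, sends it SIGINT, and prints the job's wait status.
sigint_status() {
    local opt=+m
    [ "$1" = on ] && opt=-m
    bash -c "
        set $opt
        sleep 2 &
        pid=\$!
        sleep 0.3            # give the job time to start
        kill -INT \"\$pid\"
        wait \"\$pid\"
        echo \$?
    " 2>/dev/null
}

echo "job control off: $(sigint_status off)"   # the job ignores SIGINT and exits normally
echo "job control on:  $(sigint_status on)"
```

With job control off the job runs to completion and the status is 0. With job control on the job is typically killed by the signal (status 130), unless SIGINT was already ignored in the environment that launched the script.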
