I'm currently refactoring a script that works when run directly in a terminal but exits early, because of a process check, when run from crontab. The early exit is caused by code that pipes a ps command through several grep/grep -v commands. The intent is to check whether the script is already running and, if so, not to run it again. I know why this code fails: it tries to catch every matching process but doesn't grep -v out the /bin/sh -c <script name> process that crontab always uses to launch scripts. While refactoring, it made sense to use something like pgrep instead of ps piped to several greps.
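To make the pgrep-based check concrete, here is a minimal sketch. The function name already_running is hypothetical (it is not in my real script), and I'm assuming the procps version of pgrep. The sketch uses grep -vx so that the current PID is filtered out only as a whole line, not as a substring of a longer PID:

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a process whose *name* matches the given
# pattern exists, other than the current shell ($$).
already_running() {
    local name=$1
    # pgrep prints one matching PID per line; grep -vx drops the line that is
    # exactly our own PID. The exit status is grep's: 0 if any PID is left.
    pgrep "$name" 2>/dev/null | grep -vx "$$" > /dev/null
}

if already_running "pgrep_test.sh"; then
    echo "Additional test processes exist"
else
    echo "No additional processes found"
fi
```

Because the pipeline's exit status comes from grep, the function reports "not running" whenever no other PID survives the filter.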
Here's where my question comes in. My pgrep code works; I just don't fully understand why. When I compare the output of pgrep pgrep_test.sh | grep -v $$ with the output of ps -ef | grep pgrep_test.sh, the ps listing contains additional processes that pgrep seems to leave out. It looks to me as if pgrep is grouping several PIDs together, as though it understands and follows the PID/PPID relationship. The problem is that I don't see anything about that in the pgrep manpage.
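For what it's worth, the pgrep manpage does say that the pattern is matched against the process name unless -f is given, whereas ps -ef | grep searches the whole command line. Whether that explains the difference here is an assumption on my part. A minimal sketch for looking at both values for a PID, assuming a Linux /proc filesystem (show_name_and_cmdline is a hypothetical helper, not part of my script):

```shell
#!/bin/bash
# Hypothetical helper: print the process name and the full command line of a
# PID, as exposed by Linux under /proc.
show_name_and_cmdline() {
    local pid=$1
    # /proc/PID/comm holds the (possibly truncated) process name.
    printf 'name:    %s\n' "$(cat "/proc/$pid/comm")"
    # /proc/PID/cmdline is NUL-separated; turn the NULs into spaces.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
}

show_name_and_cmdline "$$"
```

Running it against the PIDs from the ps listing should show whether the /bin/sh -c wrapper and the script itself differ in name even though both command lines contain pgrep_test.sh.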
I think that to understand why pgrep works in my code, I need to better understand how pgrep groups PIDs/PPIDs. Here's the crontab entry I'm using to run the test:
* * * * * user1 /tmp/inferencing/pgrep_test.sh 2>&1 >> /tmp/inferencing/test.log
The code itself:
#!/bin/bash
# Test how grepping for PIDs works when script is called from crontab
echo "+++++$(date +"%b %H:%M:%S") Beginning pgrep script+++++"
pgrep pgrep_test.sh | grep -v "$$" > /dev/null 2>&1
# RC=1 -- No additional processes running
# RC=0 -- Additional processes running
RETCODE=$?
echo "Return code: $RETCODE"
if [ $RETCODE -eq 0 ]; then
    echo "Additional test processes exist, exiting script"
    echo "$(ps -ef | grep pgrep_test.sh)" ; sleep 1
    echo "$(pgrep -a pgrep_test.sh | grep -v \"$$\")"
    exit 1
else
    echo "No additional processes found, continuing execution"
    echo "$(ps -ef | grep pgrep_test.sh)" ; sleep 1
    echo "$(pgrep -a pgrep_test.sh | grep -v \"$$\")"
    sleep 90
fi
I've used a 90-second sleep so that a cron job running every minute will bail out every other time. Here's what the log file looks like, with some additional annotations in the form of comments.
First with no additional processes running:
"+++++Nov 17:04:01 Beginning pgrep script+++++"
# Sleeping for 90s means we should see alternating "no additional
# processes found" and "additional processes found" logs on successive executions
Return code: 1
No additional processes found, continuing execution
# ps -ef | grep pgrep_test.sh
# Initial /bin/sh -c call crontab executes
user1 12956 12954 0 17:04 ? 00:00:00 /bin/sh -c /tmp/inferencing/pgrep_test.sh 2>&1 >> /tmp/inferencing/test.log
# Child process spawned from 12956 (shouldn't this be the PID that $$ expands to?)
user1 12957 12956 0 17:04 ? 00:00:00 /bin/bash /tmp/inferencing/pgrep_test.sh
# What even is PID 12961? PPID 12957 is the /bin/bash call but these two commands are identical otherwise
# Can't be the pgrep or sleep as these have not executed yet
# Technically there's more than 1 process here now so pgrep should be giving a return code of 0 and exiting the script
# Why does it work correctly here? How does pgrep know to group these?
user1 12961 12957 0 17:04 ? 00:00:00 /bin/bash /tmp/inferencing/pgrep_test.sh
user1 12963 12961 0 17:04 ? 00:00:00 grep pgrep_test.sh
# pgrep -a pgrep_test.sh | grep -v $$
12957 /bin/bash /tmp/inferencing/pgrep_test.sh
12965 /bin/bash /tmp/inferencing/pgrep_test.sh
Next with a matching process already running:
"+++++Nov 17:05:01 Beginning pgrep script+++++"
# Since other process is still sleeping, we correctly get a return code of 0 and stop script execution
Return code: 0
Additional test processes exist, exiting script
# crontab process for (now sleeping) original script call
user1 12956 12954 0 17:04 ? 00:00:00 /bin/sh -c /tmp/inferencing/pgrep_test.sh 2>&1 >> /tmp/inferencing/test.log
# Sleeping process
user1 12957 12956 0 17:04 ? 00:00:00 /bin/bash /tmp/inferencing/pgrep_test.sh
# New crontab process
user1 13733 13594 0 17:05 ? 00:00:00 /bin/sh -c /tmp/inferencing/pgrep_test.sh 2>&1 >> /tmp/inferencing/test.log
# New main bash process
user1 13734 13733 0 17:05 ? 00:00:00 /bin/bash /tmp/inferencing/pgrep_test.sh
# Second main bash process again -- this happens every time
user1 13738 13734 0 17:05 ? 00:00:00 /bin/bash /tmp/inferencing/pgrep_test.sh
# I didn't add a grep -v grep here; this grep process doesn't show up in the pgrep output anyway
user1 13740 13738 0 17:05 ? 00:00:00 grep pgrep_test.sh
# pgrep -a pgrep_test.sh | grep -v $$
12957 /bin/bash /tmp/inferencing/pgrep_test.sh
13734 /bin/bash /tmp/inferencing/pgrep_test.sh
14105 /bin/bash /tmp/inferencing/pgrep_test.sh
Why is it that when I execute the pgrep command, it seems to know how to filter out the additional child processes associated with $$, when ps -ef piped to greps cannot?
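A small experiment that may help narrow this down, assuming the extra /bin/bash line in the ps output is the subshell created by the $(...) command substitution (I haven't confirmed this). In bash, $$ keeps the PID of the main shell even inside a subshell, while $BASHPID gives the PID of the process that is actually running:

```shell
#!/bin/bash
# Compare $$ and $BASHPID in the main shell and inside a $(...) subshell.
main_pid=$$
sub_dollar=$(echo "$$")         # $$ inside a command substitution
sub_bashpid=$(echo "$BASHPID")  # PID of the subshell that runs the substitution

echo "main shell   \$\$       = $main_pid"
echo "in \$(...)    \$\$       = $sub_dollar"
echo "in \$(...)    \$BASHPID = $sub_bashpid"
```

If the last two values differ, a line such as echo "$(ps -ef | grep pgrep_test.sh)" would be expected to show one extra bash process whose PPID is the main script's PID, which would match the 12961 and 13738 entries in my logs.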