75

Sometimes you have to make sure that only one instance of a shell script is running at the same time.

For example, a cron job that is executed via a crond which does not provide locking on its own (e.g. the default Solaris crond).

A common pattern to implement locking is code like this:

#!/bin/sh
LOCK=/var/tmp/mylock
if [ -f $LOCK ]; then            # 'test' -> race begin
  echo Job is already running\!
  exit 6
fi
touch $LOCK                      # 'set'  -> race end
# do some work
rm $LOCK

Of course, such code has a race condition: there is a time window in which two instances can both pass the test on line 3 before either of them touches the $LOCK file.

For a cron job this is usually not a problem because you have an interval of minutes between two invocations.

But things can go wrong - for example when the lockfile is on an NFS server that hangs. In that case several cron jobs can block on line 3 and queue up. When the NFS server comes back, you have a thundering herd of jobs all running in parallel.

Searching the web, I found the tool lockrun, which seems like a good solution to that problem. With it, you run a script that needs locking like this:

$ lockrun --lockfile=/var/tmp/mylock myscript.sh

You can put this in a wrapper or use it from your crontab.
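
For example, a hypothetical crontab entry (the path, schedule and script name are just examples, mirroring the invocation above) might be:

*/5 * * * * lockrun --lockfile=/var/tmp/mylock myscript.sh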

It uses lockf() (POSIX) if available and falls back to flock() (BSD). And lockf() support over NFS should be relatively widespread.

Are there alternatives to lockrun?

What about other cron daemons? Are there common cronds that support locking in a sane way? A quick look at the man page of Vixie cron (the default on Debian/Ubuntu systems) does not show anything about locking.

Would it be a good idea to include a tool like lockrun into coreutils?

In my opinion it implements a theme very similar to timeout, nice and friends.

6
  • 4
    Tangentially, and for the benefit of others who may consider your initial pattern Good Enough(tm), that shell code should possibly trap TERM in order to remove its lockfile when killed; and it seems to be good practice to store one's own pid in the lockfile, rather than just touching it (a sketch follows these comments). Commented Oct 4, 2011 at 19:13
  • 2
    possible duplicate of What Unix commands can be used as a semaphore/lock? Commented Oct 4, 2011 at 20:57
  • @Shawn, not really, does not mention crond and NFS. Commented Oct 4, 2011 at 21:02
  • related question on SO: stackoverflow.com/questions/185451/… Commented Oct 4, 2011 at 21:07
  • 1
    @Ulrich very belatedly, storing a PID in an NFS lockfile adds very little value. Even adding the hostname still doesn't really help with checking for a live process Commented Sep 14, 2015 at 21:15
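
For reference, a minimal sketch of the first comment's suggestion (trap to remove the lockfile when the script is killed, and store the PID instead of just touching the file); note that the test-then-create race discussed above is still present:

#!/bin/sh
LOCK=/var/tmp/mylock
if [ -f "$LOCK" ]; then
  echo "Job is already running (pid $(cat "$LOCK"))" >&2
  exit 6
fi
echo $$ > "$LOCK"                         # store our own PID in the lockfile
trap 'rm -f "$LOCK"; exit' INT TERM EXIT  # remove the lockfile when killed or on exit
# do some work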

13 Answers

56

Here's another way to do locking in a shell script that can prevent the race condition you describe above, where two jobs may both pass line 3. The noclobber option works in ksh and bash. Don't use set noclobber, because you shouldn't be scripting in csh/tcsh. ;)

lockfile=/var/tmp/mylock

if ( set -o noclobber; echo "$$" > "$lockfile") 2> /dev/null; then

        trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT

        # do stuff here

        # clean up after yourself, and release your trap
        rm -f "$lockfile"
        trap - INT TERM EXIT
else
        echo "Lock Exists: $lockfile owned by $(cat $lockfile)"
fi

YMMV with locking on NFS (you know, when NFS servers are not reachable), but in general it's much more robust than it was 10 years ago.

If you have cron jobs that do the same thing at the same time, from multiple servers, but you only need 1 instance to actually run, then something like this might work for you.

I have no experience with lockrun, but having a pre-set lock environment prior to the script actually running might help. Or it might not. You're just setting the test for the lockfile outside your script in a wrapper, and theoretically, couldn't you just hit the same race condition if two jobs were called by lockrun at exactly the same time, just as with the 'inside-the-script' solution?

File locking is pretty much honor-system behavior anyway, and any script that doesn't check for the lockfile's existence before running will do whatever it's going to do. Just by putting in the lockfile test and proper behavior, you'll solve 99% of potential problems, if not 100%.

If you run into lockfile race conditions a lot, it may be an indicator of a larger problem, like not having your jobs timed right; or, if the interval is not as important as the job completing, maybe your job is better suited to being daemonized.


EDIT BELOW - 2016-05-06 (if you're using KSH88)


Based on @Clint Pachl's comment below, if you use ksh88, use mkdir instead of noclobber. This mostly mitigates the potential race condition, but doesn't entirely eliminate it (though the risk is minuscule). For more information, read the link that Clint posted below.

lockdir=/var/tmp/mylock
pidfile=/var/tmp/mylock/pid

if ( mkdir ${lockdir} ) 2> /dev/null; then
        echo $$ > $pidfile
        trap 'rm -rf "$lockdir"; exit $?' INT TERM EXIT
        # do stuff here

        # clean up after yourself, and release your trap
        rm -rf "$lockdir"
        trap - INT TERM EXIT
else
        echo "Lock Exists: $lockdir owned by $(cat $pidfile)"
fi

And, as an added advantage, if you need to create tmpfiles in your script, you can use the lockdir directory for them, knowing they will be cleaned up when the script exits.
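
For example, a minimal sketch of that idea, to be dropped into the "do stuff here" section above (the sort command is just a placeholder workload):

        workfile=$(mktemp "$lockdir/work.XXXXXX")   # temp file inside the lock directory
        sort /etc/passwd > "$workfile"              # placeholder workload
        # no separate cleanup needed: the rm -rf "$lockdir" in the trap removes it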

For more modern bash, the noclobber method at the top should be suitable.

13
  • 1
    No, with lockrun you don't have a problem - when a NFS server hangs, all lockrun calls will hang (at least) in the lockf() system call - when it is back up all processes are resumed but only one process will win the lock. No race condition. I don't run into such problems with cronjobs a lot - the opposite is the case - but this is a problem when it hits you it has the potential to create a lot of pain. Commented Oct 4, 2011 at 20:15
  • 1
    I have accepted this answer because the method is safe and so far the most elegant one. I suggest a small variant: set -o noclobber && echo "$$" > "$lockfile" to get a safe fallback when the shell does not support the noclobber option. Commented Oct 7, 2011 at 9:45
  • 4
    Good answer, but you should also 'kill -0' the value in lockfile to ensure that the process that created the lock still exists (see the sketch after these comments). Commented Dec 17, 2014 at 13:29
  • 1
    The noclobber option may be prone to race conditions. See mywiki.wooledge.org/BashFAQ/045 for some food for thought. Commented Jun 16, 2015 at 5:09
  • 2
    Note: using noclobber (or -C) in ksh88 does not work because ksh88 does not use O_EXCL for noclobber. If you're running with a newer shell you may be OK...
    – jrw32982
    Commented Nov 17, 2015 at 19:24
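
A minimal sketch of the kill -0 idea from the comments, for the else branch of the answer above (it assumes the PID was written into the lockfile, only detects processes on the same host, and removing a stale lock is itself slightly racy):

        otherpid=$(cat "$lockfile")
        if ! kill -0 "$otherpid" 2>/dev/null; then
                echo "Stale lock: process $otherpid is gone, removing $lockfile" >&2
                rm -f "$lockfile"
        fi
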
18

I prefer to use hard links.

lockfile=/var/lock/mylock
tmpfile=${lockfile}.$$
echo $$ > $tmpfile
if ln $tmpfile $lockfile 2>&-; then
    echo locked
else
    echo locked by $(<$lockfile)
    rm $tmpfile
    exit
fi
trap "rm ${tmpfile} ${lockfile}" 0 1 2 3 15
# do what you need to

Hard links are atomic over NFS and, for the most part, mkdir is as well. Using mkdir(2) or link(2) is about the same at a practical level; I just prefer hard links because more NFS implementations allowed atomic hard links than atomic mkdir. With modern releases of NFS, you shouldn't have to worry about using either.

2
  • What does 2>&- do?
    – nhooyr
    Commented Nov 16, 2022 at 11:35
  • 1
    The &- can be used to close the file descriptor. In this case, it suppresses error messages from ln as we are only concerned with its exit code.
    – Arcege
    Commented Jan 7, 2023 at 20:20
14

I understand that mkdir is atomic, so perhaps:

lockdir=/var/tmp/myapp
if mkdir $lockdir; then
  # this is a new instance, store the pid
  echo $$ > $lockdir/PID
else
  echo Job is already running, pid $(<$lockdir/PID) >&2
  exit 6
fi

# then set traps to cleanup upon script termination 
# ref http://www.shelldorado.com/goodcoding/tempfiles.html
trap 'rm -r "$lockdir" >/dev/null 2>&1' 0
trap "exit 2" 1 2 3 13 15
2
12

sem which comes as part of the GNU parallel tools may be what you're looking for:

sem [--fg] [--id <id>] [--semaphoretimeout <secs>] [-j <num>] [--wait] command

As in:

sem --id my_semaphore --fg "echo 1 ; date ; sleep 3" &
sem --id my_semaphore --fg "echo 2 ; date ; sleep 3" &
sem --id my_semaphore --fg "echo 3 ; date ; sleep 3" &

outputting:

1
Thu 10 Nov 00:26:21 UTC 2016
2
Thu 10 Nov 00:26:24 UTC 2016
3
Thu 10 Nov 00:26:28 UTC 2016

Note that order isn't guaranteed. Also the output isn't displayed until it finishes (irritating!). But even so, it's the most concise way I know to guard against concurrent execution, without worrying about lockfiles and retries and cleanup.
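
For a cron job, a hypothetical entry reusing the same options (the path, id and schedule are just examples) could be:

*/5 * * * * sem --id my_semaphore --fg /usr/local/bin/myscript.sh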

3
  • 2
    Does the locking offered by sem handle being shot down mid-execution? Commented Nov 10, 2016 at 0:56
  • Regarding "the output isn't displayed until it finishes": You can disable the buffering using -u or --lb, see man parallel.
    – Socowi
    Commented Sep 10, 2020 at 21:15
  • @roaima Yes, sem seems to correctly handle shutdowns like ctrl+c and even a brutal kill -9 (SIGKILL). You can try this using timeout -s9 1 sem -u --id myid --fg 'echo a; sleep 3; echo b'; sem -u --id myid --fg 'echo c'. For a better test, run this check in a loop and vary the timeout. In each iteration a c should be printed. Endless test: shuf -re {00..40} | while read t; do printf a; timeout -s9 0.$t sem -u --id myid --fg 'sleep 0.2; printf b'; sem -u --id myid --fg 'printf c'; echo d; done | grep '[^c]d'. No output means everything works fine.
    – Socowi
    Commented Sep 10, 2020 at 21:16
11

An easy way is to use lockfile, which usually comes with the procmail package.

LOCKFILE="/tmp/mylockfile.lock"
# try once to get the lock else exit
lockfile -r 0 "$LOCKFILE" || exit 0

# here the actual job

rm -f "$LOCKFILE"
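
A small variant sketch that also releases the lock when the job is interrupted, adding a plain shell trap (in the style of the accepted answer) on top of the same lockfile tool:

LOCKFILE="/tmp/mylockfile.lock"
# try once to get the lock, else exit
lockfile -r 0 "$LOCKFILE" || exit 0
trap 'rm -f "$LOCKFILE"; exit' INT TERM EXIT   # release the lock on exit or when killed

# here the actual job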
5
+50

I use the command-line tool flock to manage locks in my bash scripts, as described here and here. It is included in the Debian package util-linux and is installed by default on many Linux systems.

Here's how you would use it for a cron job:

* * * * * flock -n /tmp/foobar /usr/bin/foobar arg1 arg2 arg3

This would create the file /tmp/foobar if it does not already exist, attempt to obtain an exclusive lock on that file, run /usr/bin/foobar arg1 arg2 arg3, and then release the lock, exiting with the exit code of the command. If the lock cannot be obtained, the flock command fails immediately with an exit code of 1, because of the -n option (short for --nonblock).

I have used this simple method from the flock manpage, to run some commands in a subshell...

   (
     flock -n 9
     # ... commands executed under lock ...
   ) 9>/var/lock/mylockfile

In that example, it fails with an exit code of 1 if it can't acquire the lock.
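
Another common pattern with flock (a sketch, not taken from the manpage excerpt above) is to let the script lock itself on a spare file descriptor, so no wrapper or subshell is needed:

#!/bin/bash
# hold an exclusive lock on fd 9 for the lifetime of the script
exec 9>/var/lock/mylockfile || exit 1
if ! flock -n 9; then
    echo "Another instance is already running" >&2
    exit 1
fi
# ... commands executed under lock; the lock is released when the script exits ...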

4
  • 3
    The flock() system call does not work over NFS. Commented Oct 5, 2011 at 8:03
  • 1
    BSD has a similar tool, "lockf".
    – dubiousjim
    Commented Nov 14, 2012 at 14:06
  • 4
    @dubiousjim, BSD lockf also calls flock() and is thus problematic over NFS. Btw, in the meantime, flock() on Linux now falls back to fcntl() when the file is located on a NFS mount, thus, in a Linux-only NFS environment flock() now does work over NFS. Commented May 7, 2016 at 21:24
  • flock is part of the util-linux package on Debian and Ubuntu.
    – Flimm
    Commented Jan 25, 2023 at 8:32
3

I use dtach.

$ dtach -n /tmp/socket long_running_task ; echo $?
0
$ dtach -n /tmp/socket long_running_task ; echo $?
dtach: /tmp/socket: Address already in use
1
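
For a cron job, a hypothetical entry using the same pattern might be (dtach -n returns immediately and leaves the task running detached, with the socket acting as the lock):

*/5 * * * * dtach -n /tmp/socket long_running_task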
0
2

For actual use, you should use the top voted answer.

However, I want to discuss some various broken and semi-workable approaches using ps and the many caveats they have, since I keep seeing people use them.

This answer is really the answer to "Why not use ps and grep to handle locking in the shell?"

Broken approach #1

First, an approach given in another answer that has a few upvotes despite the fact that it does not (and could never) work and was clearly never tested:

running_proc=$(ps -C bash -o pid=,cmd= | grep my_script);
if [[ "$running_proc" != "$$ bash my_script" ]]; do 
  echo Already locked
  exit 6
fi

Let's fix the syntax errors and the broken ps arguments and get:

running_proc=$(ps -C bash -o pid,cmd | grep "$0");
echo "$running_proc"
if [[ "$running_proc" != "$$ bash $0" ]]; then
  echo Already locked
  exit 6
fi

This script will always exit 6, every time, no matter how you run it.

If you run it with ./myscript, then the ps output will just be 12345 -bash, which doesn't match the required string 12345 bash ./myscript, so that will fail.

If you run it with bash myscript, things get more interesting. The bash process forks to run the pipeline, and the child shell runs the ps and grep. Both the original shell and the child shell will show up in the ps output, something like this:

25793 bash myscript
25795 bash myscript

That's not the expected output $$ bash $0, so your script will exit.

Broken approach #2

Now in all fairness to the user who wrote broken approach #1, I did something similar myself when I first tried this:

if otherpids="$(pgrep -f "$0" | grep -vFx "$$")" ; then
  echo >&2 "There are other copies of the script running; exiting."
  ps >&2 -fq "${otherpids//$'\n'/ }" # -q takes about a tenth the time as -p
  exit 1
fi

This almost works. But the fork that runs the pipeline throws it off, so this one will always exit, too.

Unreliable approach #3

pids_this_script="$(pgrep -f "$0")"
if not_this_process="$(echo "$pids_this_script" | grep -vFx "$$")"; then
  echo >&2 "There are other copies of this script running; exiting."
  ps -fq "${not_this_process//$'\n'/ }"
  exit 1
fi

This version avoids the pipeline forking problem in approach #2 by first getting all PIDs that have the current script in their command line arguments, and then filtering that pidlist, separately, to omit the PID of the current script.

This might work... provided no other process has a command line matching $0, and provided the script is always called the same way (e.g. if it's called with a relative path and then an absolute path, the latter instance will not notice the former).

Unreliable approach #4

So what if we skip checking the full command line, since that might not indicate a script actually running, and check lsof instead to find all the processes that have this script open?

Well, yes, this approach is actually not too bad:

if otherpids="$(lsof -t "$0" | grep -vFx "$$")"; then
  echo >&2 "Error: There are other processes that have this script open - most likely other copies of the script running.  Exiting to avoid conflicts."
  ps >&2 -fq "${otherpids//$'\n'/ }"
  exit 1
fi

Of course, if a copy of the script is running, then the new instance will start up just fine and you'll have two copies running.

Or if the running script is modified (e.g. with Vim or with a git checkout), then the "new" version of the script will start up with no problem, since both Vim and git checkout result in a new file (a new inode) in place of the old one.

However, if the script is never modified and never copied, then this version is pretty good. There's no race condition because the script file already has to be open before the check can be reached.

There can still be false positives if another process has the script file open, but note that even if the script is open for editing in Vim, Vim doesn't actually hold the file open, so that won't result in false positives.

But remember, don't use this approach if the script might be edited or copied, since you'll get false negatives (i.e. multiple instances running at once) - so the fact that editing with Vim doesn't give false positives shouldn't really matter to you. I mention it, though, because approach #3 does give false positives (i.e. refuses to start) if you have the script open with Vim.

So what to do, then?

The top-voted answer to this question gives a good solid approach.

Perhaps you can write a better one...but if you don't understand all the problems and caveats with all the above approaches, you're not likely to write a locking method that avoids them all.

1

Don't use a file.

If your script is executed like this e.g.:

bash my_script

You can detect if it's running using:

running_proc=$(ps -C bash -o pid=,cmd= | grep my_script);
if [[ "$running_proc" != "$$ bash my_script" ]]; do 
  echo Already locked
  exit 6
fi
6
  • Hm, the ps checking code runs from within my_script? In the case another instance is running - does not running_proc contain two matching lines? I like the idea, but of course - you will get false results when another user is running a script with the same name ... Commented Oct 5, 2011 at 8:20
  • 3
    It also includes a race condition: if 2 instances execute the first line in parallel then none gets the 'lock' and both exit with status 6. This would be a kind of one round mutual starvation. Btw, I am not sure why you use $! instead of $$ in your example. Commented Oct 5, 2011 at 12:02
  • @maxschlepzig indeed sorry about the incorrect $! vs. $$ Commented Oct 5, 2011 at 17:29
  • @maxschlepzig to handle multiple users running the script add euser= to the -o argument. Commented Oct 5, 2011 at 17:33
  • @maxschlepzig to prevent multiple lines you can also change the arguments to grep, or additional "filters" (e.g. grep -v $$). Basically I was attempting to provide a different approach to the problem. Commented Oct 5, 2011 at 17:39
0

Using FLOM (Free LOck Manager) tool, serializing commands becomes as easy as running

flom -- command_to_serialize

FLOM allows you to implement more sophisticated use cases (distributed locking, readers/writers, numeric resources, etc.) as explained here: http://sourceforge.net/p/flom/wiki/FLOM%20by%20examples/
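
For the cron use case from the question, a hypothetical entry would simply be (the path and schedule are examples):

*/5 * * * * flom -- /usr/local/bin/myscript.sh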

0

Here is something I sometimes add on a server to easily handle race conditions for any jobs on the machine. It is similar to Tim Kennedy's post, but this way you get race handling by adding only one line to each bash script that needs it.

Put the content below in e.g. /opt/racechecker/racechecker:

ZPROGRAMNAME=$(readlink -f $0)
EZPROGRAMNAME=`echo $ZPROGRAMNAME | sed 's/\//_/g'`
EZMAIL="/usr/bin/mail"
EZCAT="/bin/cat"

if  [ -n "$EZPROGRAMNAME" ] ;then
        EZPIDFILE=/tmp/$EZPROGRAMNAME.pid
        if [ -e "$EZPIDFILE" ] ;then
                EZPID=$($EZCAT $EZPIDFILE)
                echo "" | $EZMAIL -s "$ZPROGRAMNAME already running with pid $EZPID"  [email protected] >>/dev/null
                exit -1
        fi
        echo $$ >> $EZPIDFILE
        function finish {
          rm  $EZPIDFILE
        }
        trap finish EXIT
fi

Here is how to use it. Note the line after the shebang:

     #!/bin/bash
     . /opt/racechecker/racechecker
     echo "script are running"
     sleep 120

The way it works is that it figures out the main bash script's file name and creates a pidfile under "/tmp". It also adds a trap on the EXIT signal, which removes the pidfile when the main script finishes properly.

If, instead, a pidfile already exists when an instance is launched, the code inside the second if-statement is executed. In this case I have decided to send an alarm mail when this happens.

What if the script crashes

A further exercise would be to handle crashes. Ideally the pidfile should be removed even if the main script crashes for any reason; this is not done in my version above. That means that if the script crashes, the pidfile has to be removed manually to restore functionality.

In case of system crash

It is a good idea to store the pidfile/lockfile under, for example, /tmp. This way your scripts will definitely continue to execute after a system crash, since the pidfiles will always be deleted on bootup.

2
  • Unlike Tim Kennedy's ansatz, your script DOES contain a race condition. This is because your checking of the presence of the PIDFILE and its conditional creation is not done in an atomic operation. Commented Jun 13, 2015 at 12:40
  • +1 on that! I will take this under consideration and modify my script. Commented Jun 13, 2015 at 13:28
-4

Check my script ...

You may LOVE it....

[rambabu@Server01 ~]$ sh Prevent_cron-OR-Script_against_parallel_run.sh
Parallel RUN Enabled
Now running
Task completed in Parallel RUN...
[rambabu@Server01 ~]$ cat Prevent_cron-OR-Script_against_parallel_run.sh
#!/bin/bash
#Created by RambabuKella
#Date : 12-12-2013

#LOCK file name
Parallel_RUN="yes"
#Parallel_RUN="no"
PS_GREP=0
LOCK=/var/tmp/mylock_`whoami`_"$0"
#Checking for the process
PS_GREP=`ps -ef |grep "sh $0" |grep -v grep|wc -l`
if [ "$Parallel_RUN" == "no" ] ;then
echo "Parallel RUN Disabled"

 if [ -f $LOCK ] || [ $PS_GREP -gt 2   ] ;then
        echo -e "\nJob is already running OR LOCK file exists. "
        echo -e "\nDetail are : "
        ps -ef |grep  "$0" |grep -v grep
        cat "$LOCK"
  exit 6
 fi
echo -e "LOCK file \" $LOCK \" created on : `date +%F-%H-%M` ." &> $LOCK
# do some work
echo "Now running"
echo "Task completed on with single RUN ..."
#done

rm -v $LOCK 2>/dev/null
exit 0
else

echo "Parallel RUN Enabled"

# do some work
echo "Now running"
echo "Task completed in Parallel RUN..."
#done

exit 0
fi
echo "some thing wrong"
exit 2
[rambabu@Server01 ~]$
-5

I offer the following solution, in a script named 'flocktest'

#!/bin/bash
export LOGFILE=`basename $0`.logfile
logit () {
echo "$1" >>$LOGFILE
}
PROGPATH=$0
(
flock -x -n 257
(($?)) && logit "'$PROGPATH' is already running!" && exit 0
logit "'$PROGPATH', proc($$): sleeping 30 seconds"
sleep 30
)257<$PROGPATH
