
We have a CentOS 7 cluster backed by NFSv4. On a handful of files, flock() with LOCK_EX blocks, and with LOCK_EX | LOCK_NB it fails with "Resource temporarily unavailable" (EAGAIN). They are all files that a user would have had reason to flock().
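
For reference, the non-blocking check can be reproduced from a shell roughly as follows. This is a minimal sketch, assuming flock(1) from util-linux is installed on the client; probe_lock is just a helper name made up for illustration, and fd 9 is an arbitrary choice. It opens the file for append, so it will create the file if it does not already exist.

```shell
# probe_lock FILE (hypothetical helper): report whether an exclusive
# flock() on FILE can be taken right now, without blocking.
probe_lock() {
    # Open FILE for append on fd 9 (write access, no truncation), then
    # try a non-blocking exclusive lock. The lock is dropped again as
    # soon as the subshell exits and fd 9 is closed.
    if ( flock -n -x 9 ) 9>>"$1"; then
        echo "free: $1"
    else
        # flock -n exits non-zero when the lock is held elsewhere;
        # a failure to open FILE also ends up in this branch.
        echo "locked: $1"
    fi
}
```

Calling `probe_lock somefilename` on one of the affected files prints `locked: somefilename` here, while other files in the same directory print `free: ...`.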

So something holds a lock on those files, but what? We ran lsof on every machine in the cluster that we know of and it found nothing, though there may be clients we don't know about. lslocks does not help either.

How can I find out which machine, let alone which process, holds the lock?

Or is it possible that NFS is confused and thinks there is a lock even though the process is long gone? If so, how would I confirm that, and how would I clear the lock? Restarting the NFS server requires a change request, so it is not something we can do lightly. The same goes for restarting all the clients one by one.

Although many web pages state that flock() doesn't work across NFS, others say it does (e.g. https://serverfault.com/questions/66919/file-locks-on-an-nfs), and my tests bear that out. For instance, run:

perl -E 'open $fh, ">>", shift or die "Open: $!"; say "Done open"; flock($fh, 2) or die "flock: $!"; say "Done flock";sleep 10' somefilename

Run it on one client, wait 5 seconds, run it on another, and the second one will not print "Done flock" until the first one exits 5 seconds later. Just what you'd expect. So the NFS server knows that the file is locked. How do I get it to tell me who it thinks locked it?
