I have a process running on Linux that repeatedly updates a file on an NFS filesystem. The process writes the new data to a tempfile (on the same NFS mount) and then calls rename() to replace the live file with the new version.
I have a second rsync process that periodically snapshots the file. Due to NFS cache-consistency issues, the rsync command sometimes sees the file as unavailable and exits with the error rsync: read errors mapping "/path/to/my/dir/foo": Stale file handle (116).
How can I tell rsync to ignore this error and not return a non-zero exit code? I could just wrap it with || true, but I'm worried I'd miss other important errors.
Notes:
You can reproduce this by running this command in an NFS directory on one machine:

while true; do date > foo.tmp; mv foo.tmp foo; done

and this command in the same directory on a different machine:

while true; do rm -f bar; rsync --quiet foo bar; done
The second command will occasionally print errors like
rsync: read errors mapping "/path/to/my/dir/foo": Stale file handle (116)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1189) [sender=3.1.3]
This issue only happens with NFS, and only when the writer and snapshotter are on different machines. On most local filesystems the file is always accessible to the reader process, since rename() is atomic. Even with NFS, if you run the two commands above on the same machine you never see errors, because only your local NFS cache is involved.
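The tempfile-then-rename pattern the writer relies on can be sketched as follows; this is a hedged illustration using mktemp, and the file name foo just mirrors the repro commands above:

```shell
# Sketch of the writer's atomic-replace step. mktemp creates the tempfile
# in the current directory so it lives on the SAME filesystem as the target
# (rename() cannot cross filesystems).
tmp=$(mktemp ./foo.XXXXXX)   # tempfile on the same filesystem
date > "$tmp"                # write the complete new contents first
mv "$tmp" foo                # mv uses rename(2): readers see old or new, never partial
```

On a local filesystem a concurrent reader always sees either the old or the new foo; the question is precisely that NFS weakens this guarantee across machines.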
"I could just wrap it with || true but I'm worried I'd miss other important errors" – A way not to miss other errors is to convert only exit status 23 to 0: (rsync …; e="$?"; [ "$e" -eq 23 ] && exit 0; exit "$e"). Not perfect, but better than just || true.
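That one-liner can be wrapped in a small reusable function. The name ignore_exit_23 is made up for this sketch; it converts only rsync's "partial transfer" status 23 to success and passes every other exit status through unchanged:

```shell
# Hypothetical helper: run a command, treating exit status 23
# (rsync's "some files/attrs were not transferred") as success.
ignore_exit_23() {
    "$@"
    e=$?
    [ "$e" -eq 23 ] && return 0
    return "$e"
}

# Usage, with the paths from the question:
# ignore_exit_23 rsync --quiet foo bar
```

Unlike || true, a genuine failure (permissions, network, disk full) still surfaces as a non-zero exit code.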
Could you have rsync access the source remotely (rsync ... servername:/server/path rather than rsync ... /mnt/mountpath)? rsync is really designed to handle the remote access itself (usually over ssh), and tends to perform better that way. And it should avoid this problem, at least if you can have both rsync processes work this way.
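Concretely, the suggestion amounts to something like the following command fragment; the hostname writerhost is hypothetical, and the path reuses the one from the question:

```shell
# Let rsync read the file over ssh on the writer's machine, where rename()
# is locally atomic, instead of going through the NFS client cache.
rsync --quiet writerhost:/path/to/my/dir/foo bar
```

The sending rsync then opens the file on the same machine that performs the rename, so the stale-handle race between two NFS clients never arises.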