I'd like to figure out a way to alert a python script that a file is done copying. Here is the scenario:

  1. A folder, to_print, is being watched by the script, which polls it constantly with os.listdir().

  2. Whenever os.listdir() returns a file that hasn't been seen before, the script performs some operations on that file, which include opening it and manipulating its contents.

This is fine when the file is small, and copying the file from its original source to the directory being watched takes less time than the amount of time remaining until the next poll by os.listdir(). However, if a file is polled and found, but it is still in the process of being copied, then the file contents are corrupt when the script tries to act on it.

Instead, I'd like to be able to tell (using os.stat or otherwise) that a file is still being copied, and if so, wait until the copy is done before I act on it.

My current idea is to call os.stat() every time I find a new file, then wait until the next poll and compare the modified/created times with the values from the previous poll. If they are unchanged, the file is "stable"; otherwise I keep polling until it is. I'm not sure this will work, though, as I am not too familiar with how Linux/Unix updates these values.
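The idea described above could be sketched roughly as follows. This is only an illustration, not a tested solution: the helper names `snapshot` and `stable_files` are made up for this example, and it compares the file size as well as the modification time. Note that "unchanged between two polls" is a heuristic: a copy that stalls for longer than one poll interval would also look stable.

```python
import os


def snapshot(path):
    """Return (size, mtime) for path, as reported by os.stat()."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime)


def stable_files(directory, previous):
    """Compare the directory against the snapshots from the previous poll.

    `previous` maps file name -> (size, mtime) from the last call.
    Returns (stable, current): the sorted names whose snapshot did not
    change since the previous poll, and the new snapshot dict to pass
    to the next call.
    """
    current = {}
    stable = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            current[name] = snapshot(path)
        except FileNotFoundError:
            # The file vanished between listdir() and stat(); skip it.
            continue
        if previous.get(name) == current[name]:
            stable.append(name)
    return sorted(stable), current
```

The caller would keep the returned dict between polls, sleep for the poll interval, and keep its own record of which stable files it has already processed.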

2 Answers

Try inotify.

inotify is the Linux kernel's facility for watching files. For your use case, the IN_CLOSE_WRITE event looks promising: it fires when a file that was opened for writing is closed. There is a Python library for inotify, pyinotify. Here is a very simple example taken from that library; you'll need to modify it to catch only IN_CLOSE_WRITE events.

# Example: loop forever, monitoring events.
import pyinotify

# Instantiate a new WatchManager (used to store watches).
wm = pyinotify.WatchManager()

# Associate this WatchManager with a Notifier (used to report and
# process events).
notifier = pyinotify.Notifier(wm)

# Add a new watch on /tmp for ALL_EVENTS.
wm.add_watch('/tmp', pyinotify.ALL_EVENTS)  # <-- replace with pyinotify.IN_CLOSE_WRITE

# Loop forever and handle events.
notifier.loop()

The API documentation is extensive: http://seb-m.github.com/pyinotify/
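For what it's worth, a version that reacts only to IN_CLOSE_WRITE might look something like the sketch below. The function name `watch_closed_files` and the `callback` parameter are made up for this example. It assumes pyinotify is installed (it is a third-party, Linux-only package) and that the copying program writes the file in place and then closes it; a file that is moved or renamed into the directory would not raise this event.

```python
def watch_closed_files(directory, callback):
    """Call callback(pathname) each time a file in `directory` that was
    open for writing is closed. Blocks forever."""
    # Imported here so that merely defining this function does not
    # require pyinotify to be installed.
    import pyinotify

    class CloseWriteHandler(pyinotify.ProcessEvent):
        # pyinotify dispatches each event to a method named
        # process_<EVENT_NAME> on the handler object.
        def process_IN_CLOSE_WRITE(self, event):
            callback(event.pathname)

    wm = pyinotify.WatchManager()
    notifier = pyinotify.Notifier(wm, CloseWriteHandler())
    # Watch only for "file opened for writing was closed" events.
    wm.add_watch(directory, pyinotify.IN_CLOSE_WRITE)
    notifier.loop()


# Example (blocks forever, so not run here):
# watch_closed_files('to_print', print)
```

With this approach there is no polling at all: the script sleeps until the kernel reports that the writer has closed the file.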

Since the files can be copied within the poll interval, just process the new files found by the previous poll before checking for new ones. Each file then gets one full poll interval to finish copying before it is touched. In other words, instead of this:

while True:
    newfiles = check_for_new_files()
    process(newfiles)
    time.sleep(pollinterval)

Do this:

newfiles = []

while True:
    process(newfiles)
    newfiles = check_for_new_files()
    time.sleep(pollinterval)

Or just put the wait in the middle of the loop (same effect really):

while True:
    newfiles = check_for_new_files()
    time.sleep(pollinterval)
    process(newfiles)
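A fuller sketch of the first variant, with `check_for_new_files` filled in using os.listdir() and a set of names already seen, could look like this. The helper `poll_once`, the `watch` wrapper, and the default interval are assumptions made for illustration; as above, it relies on every copy finishing within one poll interval.

```python
import os
import time


def check_for_new_files(directory, seen):
    """Return the names in `directory` not seen before, and remember them."""
    current = set(os.listdir(directory))
    new = sorted(current - seen)
    seen.update(new)
    return new


def poll_once(directory, seen, pending, process):
    """Process the files found by the previous poll, then poll again.

    Returns the newly found names, which become `pending` next time.
    """
    for name in pending:
        process(os.path.join(directory, name))
    return check_for_new_files(directory, seen)


def watch(directory, process, pollinterval=5.0):
    """Poll forever, handling each file one interval after it appears."""
    seen = set()
    pending = []
    while True:
        pending = poll_once(directory, seen, pending, process)
        time.sleep(pollinterval)
```

Calling `watch('to_print', handle_file)` with some `handle_file(path)` function of your own would then run the loop.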
  • This won't work if there are no files to process and the directory is empty.
    – emish
    Commented Oct 10, 2012 at 16:40
  • @emish, why not? Wouldn't newfiles just be an empty list, and surely process can handle an empty list sensibly. (If it can't, then it should be adjusted so that it can.)
    – huon
    Commented Oct 10, 2012 at 16:57
  • @kindall My apologies. I did not realize the difference until I tried it. Thanks, this is exactly the short hack I needed!
    – emish
    Commented Oct 11, 2012 at 14:15