I accidentally deleted the log file of a running process that was started with python something.py 2>&1 | tee .log. The script is running in a tmux pane under zsh. The process is still running, but it is no longer logging, and the output overflows the tmux scrollback buffer. Can I somehow (I have admin/sudo rights) resume logging without restarting the process?

Normally my approach works without problems. The code is not safety-critical and not used in production; it just performs complex mathematical calculations, so this approach has always been sufficient.

In my current case it would be great if I could start logging again without restarting the process.

1 Answer

The file continues to exist as long as the tee process holds an open file descriptor, and everything is still being logged there. You can recover its current contents by copying them through /proc:

  1. Find the PID of the 'tee' process.

  2. Use lsfd -p <PID> or lsof -p <PID> or ls -l /proc/<PID>/fd to find the file descriptor number corresponding to the open file. (It'll even be marked "(deleted)" next to the file name.)

    With simple programs such as 'tee', the first file opened will almost always be FD #3, so all examples in this post will be using 3 as well.

  3. Copy the file's contents to a new file through /proc:

    cp /proc/<TEE_PID>/fd/3 old.log
    

    (Symlinks in /proc/PID/fd are special – opening them still resolves to the correct file, even if the symlink looks broken, or even if it points to something that's not even a real file.)
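The recovery steps above can be rehearsed on a throwaway process before touching the real one. The following is a minimal sketch, assuming Linux with /proc mounted. The names demo.log, recovered.log, the FIFO called pipe, and the shell variables are made up for this demo and do not come from the question. Rather than assuming FD 3, it looks for the descriptor whose link target ends in "(deleted)".

```shell
#!/bin/sh
# Demo: recover a deleted-but-still-open log file through /proc (Linux only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real job: a FIFO feeding a background tee.
mkfifo pipe
tee demo.log < pipe > /dev/null &
tee_pid=$!

# Keep the write end of the FIFO open on FD 9 of this shell.
exec 9> pipe
echo "line 1" >&9

# Wait until tee has created the log and written to it.
while [ ! -s demo.log ]; do sleep 1; done

# Simulate the accident, then keep "logging".
rm demo.log
echo "line 2" >&9
sleep 1

# Find the FD whose target is marked "(deleted)" and copy its contents.
for fd in /proc/"$tee_pid"/fd/*; do
    case "$(readlink "$fd")" in
        *"demo.log (deleted)") cp "$fd" recovered.log ;;
    esac
done

cat recovered.log

# Close the writer so tee sees EOF and exits.
exec 9>&-
wait "$tee_pid"
# Clean up afterwards with: rm -r "$workdir"
```

For the real process, replace the demo setup with the PID of the running tee (for example from pgrep -x tee, if only one tee is running) and keep only the for loop.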

It is also possible to make 'tee' start writing to a new file:

  1. Attach the gdb debugger to the process:

    $ sudo gdb -p <TEE_PID>
    

    This will pause 'tee'. The Python program might also get paused if it produces enough log output to fill the pipe buffer (otherwise it won't notice).

  2. If you haven't yet – use the /proc trick to recover the old log file (through another shell, not from within gdb):

    $ cp /proc/<TEE_PID>/fd/3 old.log
    

    By doing this after gdb is attached (i.e. while 'tee' is suspended), you can avoid losing messages during the gap between the 'cp' and the open().

  3. Now use gdb to make 'tee' close and re-open the file:

    (gdb) p (int) close(3)
    $1 = 0
    
    (gdb) p (int) open("new.log", 01|0100|02000, 0666)
    $2 = 3
    
    (gdb) q
    Detach? y
    

    (The octal values 01|0100|02000 are the Linux definitions of O_WRONLY|O_CREAT|O_APPEND from fcntl.h on x86 and most other architectures; together they make the open() call behave like the >> shell operator.)

    For simple cases such as 'tee', it's extremely unlikely that open() will give you any other file descriptor than the original #3, as that's the lowest free FD. But in some situations with more complex programs (if there's a numbering gap) it may be necessary to call dup2($2, 3) and close($2) to manually move the newly opened file to the desired FD.

  4. The old file will now be fully gone (as it's removed and the last file handle was closed), but 'tee' will be writing to the new file without noticing anything.
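The gdb steps above can also be run non-interactively. This is a hedged sketch, not a definitive recipe: it assumes that tee holds the log on FD 3, that sudo is allowed to attach gdb, and that the flag values match your platform. The function name reopen_log is hypothetical. It relies on gdb's -batch and -ex options; defining the function does not attach to anything until you call it.

```shell
#!/bin/sh
# Sketch: scripted version of the interactive gdb session (Linux, assumes FD 3).

# O_WRONLY|O_CREAT|O_APPEND, combined into a single octal literal.
flags=$(printf '0%o' "$((01 | 0100 | 02000))")

# reopen_log PID PATH  (hypothetical helper, not part of any tool)
# Closes FD 3 in the target process and opens PATH in append mode instead.
reopen_log() {
    sudo gdb -p "$1" -batch \
        -ex 'p (int) close(3)' \
        -ex "p (int) open(\"$2\", $flags, 0666)"
}

echo "$flags"
```

A call would look like reopen_log <TEE_PID> /absolute/path/new.log. The path is resolved inside 'tee', relative to its working directory, so an absolute path is the safer choice. Copy the old contents out through /proc first, as in step 2, because close(3) discards the deleted file. Check that the second printed value is 3 before relying on the new log.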

Note: instead of opening a new file, it may be possible to use linkat() to bring the original log file into existence without interrupting anything, but I have not tested this yet. (Edit: Unfortunately, according to linkat() documentation, this specifically doesn't work for files that have become fully unlinked.)

  • 12
    This is awesome.
    – Oliphaunt
    Commented Jun 2, 2022 at 6:43
  • 1
    You should be able to replace 01|0100|02000 with 02101
    – CSM
    Commented Jun 2, 2022 at 9:26
  • 2
    Would tail -f -n +1 /proc/<TEE_PID>/fd/3 recovered.log work?
    – Neil
    Commented Jun 2, 2022 at 12:16
  • 1
    Would it be possible to create a new hard link to the file by doing ln /proc/<TEE_PID>/fd/3 old.log instead of cp /proc/<TEE_PID>/fd/3 old.log?
    Commented Jun 3, 2022 at 12:16
  • 1
    @TannerSwett: Not with a direct ln (it'd try to hardlink the symlink itself and fail with "can't link across filesystems"). In theory yes if you use ln -L to call linkat(AT_SYMLINK_FOLLOW), which is actually mentioned in linkat() documentation as a valid alternate option to using linkat(fd) – but the same docs also note that it does not work for files that have already become completely unlinked, and indeed both variants result in a "file not found" when trying to resurrect a removed file. I'm not sure if that's a deliberate restriction or not, but it's a known restriction regardless.
    Commented Jun 3, 2022 at 14:05
