
Can a process that was started in a tmux session fall asleep? If so, what are the causes, and how can I prevent it?

Example reason for the question: I started a process on a server yesterday (training neural networks, it prints the current training epoch to stdout). I had a split window, and in the one with the process running, I had activated scroll mode before detaching from the session.

Today I came back, and it had made no progress at all.

More specifically, the epoch was the same. After I quit scroll mode, it happily continued.

The log reads something like

...
Epoch 40: 1h few mins
Epoch 41: 12h few mins
Epoch 42: 12h few more mins
...
Epoch 73: 13h

Meaning, the time it took to get from epoch 0 to 40 was definitely less than two hours; from epoch 40 to 41 it took around 11 hours (!); from epoch 41 to 73 the average time per epoch was around 1.7 minutes. The epochs run in a loop, and there shouldn't be a reason why one takes around 400 times longer than the others.


Additional information: This 'sleeping' doesn't happen every time I detach while being in scroll mode. But it already happened before. The scroll mode might not have anything to do with it at all.

The program is a Python script that includes TensorFlow code running on a GPU; the command to run it was:

python train_script.py 2>&1 | tee train_log.txt

For tmux, I use tmux attach to re-attach and the standard key mapping: ctrl-b + d to detach, ctrl-b + up (number block) to start scrolling, and q to quit scroll mode.

2 Answers


Can a process that was started in a tmux session fall asleep?

Basically, all tmux does is attach its own file descriptors in place of STDIN/STDOUT/STDERR for the process running inside it, which allows that process to keep working while tmux is detached from the console.

Below is a simple script you can run using the same workflow you described (attaching to and detaching from a tmux session):

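#!/bin/sh

# Print a timestamp once per second, 1000 times, appending each one to log.txt.
c=1000

while [ "$c" -ne 0 ]; do
  date '+%Y-%m-%dT%H:%M:%S' | tee -a log.txt
  sleep 1
  c=$((c - 1))  # count down so the loop terminates
done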

Even if you switch to scroll mode and then detach from the tmux session, it will continue running; you can check the log.txt file to confirm. So it isn't an issue with tmux.
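
The log.txt check can be sketched as a small helper that compares the file size before and after a short wait. This is only an illustration: `log_is_growing` is a hypothetical name, and it assumes the process appends to the log while it is running.

```shell
#!/bin/sh
# Report whether a log file grew during a short observation window.
# Usage: log_is_growing FILE [SECONDS]
# Exit status: 0 if the file gained bytes, 1 if not, 2 if it cannot be read.
log_is_growing() {
  file=$1
  wait_s=${2:-5}
  before=$(wc -c < "$file") || return 2
  sleep "$wait_s"
  after=$(wc -c < "$file") || return 2
  [ "$after" -gt "$before" ]
}
```

For example, `log_is_growing log.txt 3 && echo "still writing" || echo "no new output"` can be run from another shell on the server while the tmux session is detached.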

  • Ok, so this does not usually happen. But your example does not exclude that it can happen, and that tmux somehow has an influence. Maybe the interaction with the GPU, or with the Python interpreter, causes it? Also, this 'sleeping' doesn't happen every time I detach while being in scroll mode.
    – dasWesen
    Commented Aug 11, 2018 at 12:26
  • A bunch of people use GPUs for mining bitcoins without monitors, so I don't think it is a tmux or GPU issue. Do you activate a Python virtual environment before starting tmux? If yes, try exiting it and activating the virtual environment inside tmux. Also, if you are using Anaconda, some of its versions don't support parallel environments.
    – Alex
    Commented Aug 11, 2018 at 12:42
  • Tonight, I'll write down where exactly the process is, to see whether it's just TensorFlow taking random breaks. I believe epoch 40 was the last thing visible in scroll mode this morning, but I'll try to really make sure it is correlated with tmux in this way.
    – dasWesen
    Commented Aug 11, 2018 at 12:45
  • No, I don't use a Python virtual environment on the server, but I do use Anaconda. Maybe that's it, but then there's probably nothing I can do except regularly check what the process is doing. Thanks for your suggestions.
    – dasWesen
    Commented Aug 11, 2018 at 12:46
  • So it is probably Anaconda; read this thread: github.com/openai/universe-starter-agent/issues/9. Every time I have investigated this kind of issue, it has turned out not to be tmux's fault.
    – Alex
    Commented Aug 11, 2018 at 12:52

I know I'm late, but I've had the same thing happen to me a few times. The environment is a little different: I'm running a Python script on a Slurm front end, which submits jobs, moves files, submits more jobs, etc. A single compute job usually takes about an hour.

I started my python script one day in the evening, checked on it a few times and then left tmux in scroll mode, detached and checked on the script in the morning. It seemed to be stuck, so I checked to see if any jobs were currently running, none were. I checked if the expected files were present, which they were not. My script didn't print its "all jobs successful" note, so clearly it was still running, just not doing anything. I left scroll mode, and suddenly the script continued, produced a lot more output and lo and behold, submitted another batch of compute jobs.

Now, this could just be odd timing, and unfortunately I don't have iterating milestones with time stamps to see how long it got stuck. But this is the third time this has happened, so I really doubt it is coincidental timing.
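
For anyone who wants time stamps on every output line, so that the length of a stall can be read off the log afterwards, a minimal sketch follows. `stamp_lines` is a hypothetical helper, not something from either script in this thread, and the usage line simply reuses the command from the question.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current wall-clock time.
stamp_lines() {
  # The "|| [ -n ... ]" part also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
  done
}

# Hypothetical usage with the command from the question:
#   python train_script.py 2>&1 | stamp_lines | tee train_log.txt
```

A long gap between two consecutive time stamps would then show when the process stopped producing output and when it resumed.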

Did you ever figure out why/if your script got stuck? I will exit scroll mode from now on before detaching and see if it makes a difference.


Edit: Apparently, this used to be a known bug in tmux, but there is no note on whether it has been fixed: https://github.com/tmux/tmux/issues/431. The tmux version on the machine I'm working on is quite outdated (tmux 1.8). So, in essence, the workaround would be:

Always exit scroll mode and detach properly from tmux.
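
A possible additional precaution is to keep the tmux pane out of the output path altogether. This rests on an assumption that is not confirmed anywhere in this thread, namely that the stall comes from the process blocking while it writes to a pane in scroll mode. `run_logged` is a hypothetical helper name.

```shell
#!/bin/sh
# Run a command with stdout and stderr sent only to a log file,
# so that nothing is written to the tmux pane itself.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  log=$1
  shift
  "$@" > "$log" 2>&1
}

# Hypothetical usage with the script from the question:
#   run_logged train_log.txt python train_script.py
# The log can then be followed from any other pane or session:
#   tail -f train_log.txt
```

Unlike the tee pipeline from the question, this shows nothing in the pane where the job runs, so progress has to be read from the log file.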

  • Hi, it was a while ago but if my memory is correct, unfortunately I never figured out what was causing it. I guess that was also my solution: Trying not to forget to exit scroll mode. -- Thanks for the link.
    – dasWesen
    Commented Aug 5, 2019 at 10:37
