
I have two scripts that use the GPU to train ML models. I want to start them before I go to sleep so that they run during the night, and I expect to see some results in the morning.

But because GPU memory is limited, I want to run them serially instead of in parallel.

I can do that with python train_v1.py && python train_v2.py, but let's say I have already started train_v1.py. In the meantime, because the training takes a long time, I started and finished implementing the second script, train_v2.py, and I want it to run automatically when python train_v1.py finishes.

How can I achieve that? Thank you.

  • With python train_v1.py && python train_v2.py, you'll run python train_v1.py and only after that completes successfully will it run python train_v2.py. That sounds like what you're after. Commented Mar 12, 2020 at 21:36
  • If you don't care about the exit status of the first program, use python v1; python v2 Commented Mar 12, 2020 at 21:38
  • To clarify: are you saying that, when you start python train_v1.py, you don't yet know whether you'll have a second program to run (or, similarly, what exactly the second command will be)? And that you can't/are not willing to simply start python foo && python bar && python baz and then just ensure that the programs you are still writing will be named bar and baz? (Noting that the shell won't complain if, when you run foo && bar, bar doesn't exist yet; it will only complain if bar does not exist when foo terminates.)
    – fra-san
    Commented Mar 12, 2020 at 22:55
  • Possibly useful: Simple queuing system?
    – fra-san
    Commented Mar 12, 2020 at 23:28
  • The OP already knows about &&. This question is clearly about achieving the same effect after the first program has already been started.
    – jamesdlin
    Commented Mar 13, 2020 at 18:05

6 Answers

Answer (score 33)

Here's an approach that doesn't involve looping and checking if the other process is still alive, or calling train_v1.py in a manner different from what you'd normally do:

$ python train_v1.py
^Z
[1]+  Stopped                 python train_v1.py
$ % && python train_v2.py

The ^Z is me pressing Ctrl+Z while the process is running, which suspends train_v1.py by sending it a SIGTSTP signal. Then I tell the shell to wake it with %, using that as a command to which I can append && python train_v2.py. This makes it behave just as if you'd run python train_v1.py && python train_v2.py from the very beginning.

Instead of %, you can also use fg. It's the same thing. If you want to learn more about these job-control features of the shell, you can read about them in the "JOB CONTROL" section of bash's manpage.
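For instance, here's the same flow using fg; this is a sketch of what the session would look like (bash echoes the resumed command line when fg runs):

$ python train_v1.py
^Z
[1]+  Stopped                 python train_v1.py
$ fg && python train_v2.py
python train_v1.py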

EDIT: How to keep adding to the queue

As pointed out by jamesdlin in a comment, if you try to continue the pattern to add, for example, train_v3.py before v2 starts, you'll find that you can't:

$ % && python train_v2.py
^Z
[1]+  Stopped                 python train_v1.py

Only train_v1.py gets stopped because train_v2.py hasn't started, and you can't stop/suspend/sleep something that hasn't even started.

$ % && python train_v3.py

would result in the same as

python train_v1.py && python train_v3.py

because % corresponds to the last suspended process. Instead of trying to add v3 like that, one should use the shell history:

$ !! && python train_v3.py
% && python train_v2.py && python train_v3.py

One can do history expansion like above, or recall the last command with a keybinding (like up) and add v3 to the end.

$ % && python train_v2.py && python train_v3.py

That's something that can be repeated to add more to the pipeline.

$ !! && python train_v3.py
% && python train_v2.py && python train_v3.py
^Z
[1]+  Stopped                 python train_v1.py
$ !! && python train_v4.py
% && python train_v2.py && python train_v3.py && python train_v4.py
  • This is the only reliable solution and the simplest one to boot.
    – l0b0
    Commented Mar 13, 2020 at 20:27
  • In this particular case, the programs don't depend on each other apart from using the same resources (you want to run the second program even if the first one exits with an error), so it's better to use the semicolon (fg; python train_v2.py) instead of the short-circuiting &&.
    – Bass
    Commented Mar 15, 2020 at 13:45
  • @Bass Well, that depends on what the OP wants. I just did as they did in their question. Perhaps, if v1 fails they want to see the error message at the bottom of the terminal, even if the 2 are independent. Using ; would obscure that.
    – JoL
    Commented Mar 15, 2020 at 18:26
  • +1 I can't believe I've wanted this ability for a while, and it never occurred to me that fg && ... would work. However, it might be worth noting that it doesn't seem that you can continue to chain. For example, you can't do the suspend + fg && process multiple times to get the equivalent of process && process && process. (Similarly, I only now realize that if I do process1 && process2 and suspend before process2 starts, process2 will not be invoked if I resume with a plain fg.)
    – jamesdlin
    Commented Mar 18, 2020 at 2:10
  • @jamesdlin Right, because fg/% wakes the last suspended process, and since process2 wasn't invoked yet, you can't suspend it. In that scenario, I would use the history to facilitate adding to the pipeline. I've added to the answer to address that.
    – JoL
    Commented Mar 18, 2020 at 5:04
Answer (score 12)

If you have already started python train_v1.py, you could possibly use pgrep to poll that process until it disappears, and then run your second Python script:

while pgrep -u "$USER" -fx 'python train_v1.py' >/dev/null
do
    # sleep for a minute
    sleep 60
done
python train_v2.py

By using -f and -x you match against the exact command line that was used to launch the first Python script. On some systems, pgrep implements a -q option, which makes it quiet (just like grep -q), which means that the redirection to /dev/null wouldn't be needed.

The -u option restricts the match to commands that you are running (and not a friend or other person on the same system).
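For example, on a system whose pgrep supports -q, the same loop could be written without the redirection (a sketch; check pgrep(1) on your system first):

while pgrep -q -u "$USER" -fx 'python train_v1.py'
do
    sleep 60
done
python train_v2.py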

If you haven't started the first script yet:

As mentioned in the comments, you could just launch the second script straight after the first script. The fact that the second script does not exist, or isn't quite ready to run yet, does not matter (as long as it is ready to run when the first script finishes):

python train_v1.py; python train_v2.py

Doing it this way will launch the second script regardless of the exit status of the first script. Using && instead of ;, as you show in the question, will also work, but will require the first script to finish successfully for the second script to start.
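You can see the difference with false, which always exits with a non-zero status, standing in for a failing first script:

$ false; echo second
second
$ false && echo second
$

With ;, the second command runs regardless; with &&, it is skipped after the failure.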

  • First one is really useful, thank you! And yes, I can use the second one too in most cases, but the first one is neat.
    – emremrah
    Commented Mar 13, 2020 at 6:43
  • I'd prefer running ps once to find out the pid of that first script, then run a loop like while kill -0 <pid>. Had it happen way too often that pgrep (or killall) found something that wasn't intended, especially on multi-user systems. Btw, kill -0 won't do anything to the target process, it just checks whether that process is still there. Commented Mar 13, 2020 at 11:08
  • @GuntramBlohmsupportsMonica You could obviously use pgrep -u "$USER" ... on a multi-user system. Using pgrep would also avoid issues arising from PID reuse on heavily used systems.
    – Kusalananda
    Commented Mar 13, 2020 at 11:16
  • This method is brittle, as shown by the above comments, and complex too.
    – l0b0
    Commented Mar 13, 2020 at 20:31
Answer (score 7)

You can launch the first script with

python train_v1.py; touch finished

Then simply make a loop that checks regularly if finished exists:

while [ ! -f finished ]; do
    sleep 5
done
python train_v2.py
rm finished
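A possible variation (a sketch, not part of the recipe above): have the flag file record train_v1.py's exit status, so the waiting shell can still tell whether the first run succeeded:

python train_v1.py; echo "$?" > finished

Then, in the waiting shell:

while [ ! -f finished ]; do
    sleep 5
done
echo "train_v1.py exited with status $(cat finished)"
python train_v2.py
rm finished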
  • I haven't tried yet but that seems like exactly what I'm looking for. I will try it asap.
    – emremrah
    Commented Mar 12, 2020 at 22:23
  • Where's the advantage over python train_v1.py; python train_v2.py? Commented Mar 13, 2020 at 11:03
  • @GuntramBlohmsupportsMonica I'd say it is more customizable and, above all, you don't need to bother to choose the second program beforehand. Those considerations apart, there is no advantage.
    – Quasímodo
    Commented Mar 13, 2020 at 14:39
Answer (score 3)

You can always simply wait for a running program if you know its process ID. The PID can be obtained from a ps call. Making this robust in a script is difficult, which is why programs that anticipate somebody wanting to wait for (or kill) them usually write their own PID to a known location. But for the interactive situation you describe, looking the PID up and copying it is easy enough.
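A minimal sketch of this approach (12345 is a hypothetical PID copied from ps output; note that bash's built-in wait only works on children of the current shell, so an already-running process has to be polled, for example with kill -0 as suggested in a comment above):

# PID of the running train_v1.py, looked up with e.g. ps aux | grep train_v1
pid=12345
# kill -0 sends no signal; it only checks that the process still exists
while kill -0 "$pid" 2>/dev/null; do
    sleep 60
done
python train_v2.py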

Answer (score 1)

If you don't need to know the exit status of the first script, then I recommend something like what Kusalananda wrote.

If you do need to know the exit status (which you probably don't in this case, but someone else may come along looking for a solution that does this), it's more complicated. I've written a small Linux utility pwait that lets you wait for a process to finish and find out its exit status.

Answer (score -1)

If you are working on Linux (with bash available), it is as easy as:

TL;DR: use this:

(python train_v1.py; kill $(cat ./pid) && rm ./pid) &

python train_v2.py & echo $! > pid

The longer version:

You create a subshell where your training V1 runs. When V1 finishes (regardless of its return code; if you want to require a return code of 0, use && instead of ;), the contents of a file named pid are passed to the kill command, and if kill returns successfully, the pid file is removed.

Afterwards you start the second training and write its process ID to a file named pid, and there the cycle is closed. Now both training programs run in the background until python train_v1.py is finished.

  • "now both training programs run in the background..." But the OP wants to run V2 after V1 (and pkill -F pid can read the PID from a file).
    – Freddy
    Commented Mar 13, 2020 at 17:38

