I need to process >50,000 files using a third-party .exe command-line application. The application takes only one input file at a time, so I have to launch the application >50,000 times.
Each file (each job) usually takes about one second. However, sometimes the application hangs indefinitely.
I have written a Windows shell script that runs all the jobs serially and polls every second to see whether the current job is done. After 10 seconds it kills the job and moves on to the next one. However, the whole run takes about 20 hours. I believe I could cut the total runtime substantially by running multiple jobs in parallel. The question is how?
In CMD I launch the task with start, but there is no simple way to recover the process ID (PID), so I cannot easily keep track of how long each instance has been running. I feel like I am trying to reinvent the wheel. Any suggestions?