
I am trying to detect when the following script (random_fail.sh) fails, which happens rarely, by running it inside a while loop in a second script (catch_error.sh):

#!/usr/bin/env bash
# random_fail.sh

n=$(( RANDOM % 100 ))

if [[ $n -eq 42 ]]; then
    echo "Something went wrong"
    >&2 echo "The error was using magic numbers"
    exit 1
fi

echo "Everything went according to plan"

#!/usr/bin/env bash
# catch_error.sh

count=0  # The number of times before failing
error=0  # assuming everything initially ran fine

while [ "$error" != 1 ]; do
    # running till non-zero exit

    # writing the error code from the random_fail script into /tmp/error
    bash ./random_fail.sh 1>/tmp/msg 2>/tmp/error

    # reading from the file, assuming a 0 is written there most of the time
    error="$(cat /tmp/error)"

    echo "$error"

    # updating the count
    count=$((count + 1))

done

echo "random_fail.sh failed!: $(cat /tmp/msg)"
echo "Error code: $(cat /tmp/error)"
echo "Ran ${count} times, before failing"

I expected catch_error.sh to read from /tmp/error and exit the loop as soon as a run of random_fail.sh exits with 1.

Instead, the catch script seems to run forever. I think this is because the error code is not being redirected to /tmp/error at all.

Please help.

  • [ "$error" != 1 ] is false only if random_fail.sh prints a lone digit 1 to stderr. As long as this doesn't happen, your script will loop. Commented Nov 7, 2022 at 11:57
  • The string "The error was using magic numbers" is never equal to 1 Commented Nov 7, 2022 at 12:16

3 Answers


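You aren't capturing the exit code the usual way: the exit status is never written to stderr, so read it from $? right after the command finishes. Also, there is no need to prefix the command with bash when the script already has a shebang line, provided the script is executable (chmod +x random_fail.sh). Lastly, I'm curious why you don't simply use #!/bin/bash instead of #!/usr/bin/env bash.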

Your second script should be modified to look like this:

#!/usr/bin/env bash
# catch_error.sh

count=0  # The number of times before failing
error=0  # assuming everything initially ran fine

while [ "$error" != 1 ]; do
    # running till non-zero exit

    # stdout goes to /tmp/msg and stderr to /tmp/error; the exit status is read from $? below
    ./random_fail.sh 1>/tmp/msg 2>/tmp/error
    error=$?

    echo "$error"

    # updating the count
    count=$((count + 1))

done

echo "random_fail.sh failed!: $(cat /tmp/msg)"
echo "Error code: ${error}"
echo "Ran ${count} times, before failing"
  • I understand that $? gives the exit code of the last command. But I thought the exit code could be redirected and written to a file via 2>file. Instead, /tmp/error always seems to be empty. Why?
    – Vishal
    Commented Nov 6, 2022 at 3:59
  • /tmp/error is empty because you probably stop the script with Ctrl+C while it runs forever. Almost every run succeeds, so the run you interrupt most likely printed nothing to stderr, and because 2>/tmp/error truncates the file on every run, ./random_fail.sh 1>/tmp/msg 2>/tmp/error leaves it empty.
    – root
    Commented Nov 6, 2022 at 9:50
  • In addition to what root said, /tmp/error only ever contains text if the script wrote something to stderr. It would never contain an exit code, because the exit code is never written to any output stream. Commented Nov 6, 2022 at 14:24
  • Every script has standard output (stdout, fd 1) and standard error (stderr, fd 2). Those streams carry whatever you direct to each one from within the script: >&2 sends a message to stderr, and no redirect sends it to stdout. One way to have the script itself write its return code to a file (in addition to the code always being available to the calling shell as $?) is the trap command, specifying an action for every return code scenario you will encounter in the script. See a good example of trap usage: stackoverflow.com/questions/20602675/… Commented Nov 10, 2022 at 1:16
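The distinction these comments draw can be checked with a small sketch. It assumes bash and writes a throwaway script to a temporary directory; the names demo.sh, err.txt and status.txt are hypothetical. The 2> redirect captures only the text written to stderr, the exit status comes from $?, and an EXIT trap is one way for the script itself to record its status in a file.

```shell
#!/usr/bin/env bash
# Sketch: what 2>file captures vs. what the exit status is, plus an EXIT trap.
dir=$(mktemp -d)

# Hypothetical stand-in for random_fail.sh that always fails, so the demo is
# deterministic. Its EXIT trap saves the exit status next to the script.
cat >"$dir/demo.sh" <<'EOF'
#!/usr/bin/env bash
trap 'rc=$?; echo "$rc" >"$(dirname "$0")/status.txt"' EXIT
>&2 echo "The error was using magic numbers"
exit 1
EOF

# Run through bash so the file needs no execute permission;
# "|| status=$?" records the non-zero exit status.
status=0
bash "$dir/demo.sh" 2>"$dir/err.txt" || status=$?

echo "stderr file: $(cat "$dir/err.txt")"    # the message text, never a number
echo "exit status: $status"
echo "trap wrote:  $(cat "$dir/status.txt")"

# 2>file truncates the file on every run, even when nothing is written to it
true 2>"$dir/err.txt"
[[ -s $dir/err.txt ]] || echo "err.txt is now empty"
```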

[ "$error" != 1 ] is false only if random_fail.sh prints a lone digit 1 to stderr. As long as this doesn't happen, your script will loop. You could instead test whether anything has been written to stderr. There are several ways to do this:

printf '' >/tmp/error
while [[ ! -s /tmp/error ]]

or

error=
while (( ${#error} == 0 ))

or

error=
while [[ -z $error ]]
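Filled in, the third variant might look like the sketch below. It assumes bash and, so that it runs on its own, writes a copy of random_fail.sh (and its stdout file, msg) to a temporary directory. Rather than going through /tmp/error, it captures stderr directly into $error.

```shell
#!/usr/bin/env bash
# Sketch: loop until random_fail.sh writes anything to stderr.
dir=$(mktemp -d)

# A copy of random_fail.sh from the question, so the demo is self-contained.
cat >"$dir/random_fail.sh" <<'EOF'
#!/usr/bin/env bash
n=$(( RANDOM % 100 ))
if [[ $n -eq 42 ]]; then
    echo "Something went wrong"
    >&2 echo "The error was using magic numbers"
    exit 1
fi
echo "Everything went according to plan"
EOF

count=0
error=
while [[ -z $error ]]; do
    # Redirections apply left to right: 2>&1 points stderr at the
    # command-substitution pipe, then >"$dir/msg" sends stdout to a file,
    # so $error receives only what the script wrote to stderr.
    error=$(bash "$dir/random_fail.sh" 2>&1 >"$dir/msg")
    count=$((count + 1))
done

echo "random_fail.sh failed!: $(cat "$dir/msg")"
echo "stderr said: $error"
echo "Ran ${count} times, including the failing run"
```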

/tmp/error will always either be empty or contain the line "The error was using magic numbers". It will never contain 0 or 1. If you want to know the exit status of the script, just check it directly:

if ./random_fail.sh 1>/tmp/msg 2>/tmp/error; then error=0; else error=1; fi

Or, you can do:

./random_fail.sh 1>/tmp/msg 2>/tmp/error
error=$?

But don't do either of those. Just do:

while ./random_fail.sh; do ...; done

As long as random_fail.sh (please read https://www.talisman.org/~erlkonig/documents/commandname-extensions-considered-harmful/ and stop naming your scripts with a .sh suffix) returns 0, the loop body will be entered. When it returns non-zero, the loop terminates.
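Filled in with the counting and reporting from catch_error.sh, that idiom might look like the following sketch. It assumes bash and, to stay self-contained, writes a copy of random_fail.sh into a temporary directory; the copy is run through bash so it doesn't need execute permission.

```shell
#!/usr/bin/env bash
# Sketch: let random_fail.sh's exit status drive the loop directly.
dir=$(mktemp -d)

# A copy of random_fail.sh from the question, so the demo is self-contained.
cat >"$dir/random_fail.sh" <<'EOF'
#!/usr/bin/env bash
n=$(( RANDOM % 100 ))
if [[ $n -eq 42 ]]; then
    echo "Something went wrong"
    >&2 echo "The error was using magic numbers"
    exit 1
fi
echo "Everything went according to plan"
EOF

count=1  # count the failing run as well
# The body runs only while the script exits 0; the first non-zero exit ends the loop.
while bash "$dir/random_fail.sh" >"$dir/msg" 2>"$dir/error"; do
    count=$((count + 1))
done

echo "random_fail.sh failed!: $(cat "$dir/msg")"
echo "stderr said: $(cat "$dir/error")"
echo "Ran ${count} times, including the failing run"
```

Note that a while loop's own exit status is that of the last command run in its body (or 0 if the body never ran), not that of the failing condition, so $? after done does not hold random_fail.sh's exit code.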
