
I've got this simple script below to stream compressed MySQL dumps to an Amazon S3 bucket in parallel:

#!/bin/bash

COMMIT_COUNT=0
COMMIT_LIMIT=2

for i in $(cat list.txt); do

        echo "$i "

        mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 &


        (( COMMIT_COUNT++ ))

        if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
                COMMIT_COUNT=0
                wait
        fi

done

if [ ${COMMIT_COUNT} -gt 0 ]; then
        wait
fi

The output looks like this:

database1 
database2 
duration: 2.311823213s
duration: 2.317370326s

Is there a way to print this on one line for each dump?

database1 - duration: 2.311823213s
database2 - duration: 2.317370326s

The echo -n switch doesn't help in this case.

EDIT: Wed May 6 15:17:29 BST 2015

I was able to achieve the expected results based on the accepted answer:

echo "$i -" $(mysqldump -B $i| bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 2>&1) &

However, the command running in the subshell doesn't return its exit status to the parent shell because it runs in parallel, so I'm not able to verify whether it succeeded or failed.
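
To illustrate, with false standing in for the pipeline:

echo "database1 -" $( false 2>&1 ) &
wait $!
echo $?    # prints 0: that is echo's own status, not false's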

  • Try removing the ampersand & at the end of the mysqldump .. command.
    – tivn
    Commented May 3, 2015 at 17:42
  • Replace echo "$i " by echo -n "$i ".
    – Cyrus
    Commented May 3, 2015 at 17:52
  • Consult the Bash FAQ for the correct way to iterate over a file line-by-line.
    – chepner
    Commented May 3, 2015 at 18:03
  • Sub-shells report error status. You just lose that because of the "wrapping" echo. Commented May 6, 2015 at 15:52

7 Answers


I think this command will do what you want:

echo "$i -" `(mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2) 2>&1` &

Or, use $() in place of backticks:

echo "$i -" $( (mysqldump -B $i| bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2) 2>&1 ) &

The echo command will wait for the mysqldump pipeline to finish before printing its result together with $i. The sub-shell ( … ) and the error redirection 2>&1 ensure that error messages go into the echoed output too. The space after the $( is necessary, because $(( without a space is a different special operation: an arithmetic expansion.
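
For example:

echo "db -" $( (echo out; echo err >&2) 2>&1 )   # prints: db - out err
echo $((1 + 2))                                  # arithmetic expansion: prints 3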

  • Thanks for your help, do you know a solution to my query from updated: EDIT: Wed May 6 15:17:29 BST 2015?
    – HTF
    Commented May 6, 2015 at 14:26
  • If the output from that mysqldump command contains important internal whitespace (not newlines but spacing whitespace) this will lose that spacing. If the output ever contains shell metacharacters this will also trigger globbing/etc. on the output. This isn't a good solution. Catch the output and strip newlines specifically if that's the desired goal. Commented May 6, 2015 at 15:55
  • @EtanReisner If you want to preserve whitespace, this can be done by moving the second double quote to the end of the command (before ampersand).
    – tivn
    Commented May 6, 2015 at 21:06
  • That would preserve newlines in the mysql output, which the OP appeared to explicitly not want. (I have no idea if the OP would want newlines from the command output should there be any, but the newlines they did have they didn't want.) And yes, that's what you should do instead of this comparatively unsafe and relatively incorrect quoting. Commented May 6, 2015 at 21:15
  • @EtanReisner I think the newline the OP does not want is not caused by the mysql command. Rather, it is because the echo command ran in the foreground while the mysql command ran in the background and took quite a long time to finish (2 seconds).
    – tivn
    Commented May 6, 2015 at 21:32

Thanks for all your help, but I think I've finally found an optimal solution for this.

Basically I used xargs to format the output so each entry (dump name + duration time) is on one line. I also added the job spec to the wait command to get the exit status:

man bash

wait [n ...]
        Wait for each specified process and return its termination status. Each n may be a process ID or a job specification; if a job spec is given, all processes in that job's pipeline are waited for. If n is not given, all currently active child processes are waited for, and the return status is zero. If n specifies a non-existent process or job, the return status is 127. Otherwise, the return status is the exit status of the last process or job waited for.

Test:

# sh -c 'sleep 5; exit 1' &
[1] 29970
# wait; echo $?
0
# sh -c 'sleep 5; exit 1' &
[1] 29972
# wait $(jobs -p); echo $?
1

Final script:

#!/bin/bash

COMMIT_COUNT=0
COMMIT_LIMIT=2


while read -r i; do

    mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 |& xargs -I{} echo "${i} - {}" &

    (( COMMIT_COUNT++ ))

    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
        COMMIT_COUNT=0
        wait $(jobs -p)
    fi

done < list.txt

if [ ${COMMIT_COUNT} -gt 0 ]; then
     wait $(jobs -p)
fi

if [ $? -ne 0 ]; then
     echo "ERROR: Backups failed"
     exit 1
fi  
  • The problem with using wait this way is contained in the sentence "Otherwise, the return status is the exit status of the last process or job waited for." Waiting on more than one job will only get you one exit status, and it will be the exit status of the last job to finish. (Also, $(cat list.txt) there is pointless; just use < list.txt.) Commented May 7, 2015 at 22:51
  • Yes, this is true do you know how to overcome this problem so it will exit immediately on failure?
    – HTF
    Commented May 8, 2015 at 7:15
  • No, I don't. Not if you run them in the background in parallel. Commented May 8, 2015 at 11:30

Expanding on your answer, to exit the script immediately upon failure, you have to save the pids of the background processes in an array. In your while loop add pids[COMMIT_COUNT]=$! after the mysqldump command.

Then you could write a function to loop over all these pids, and exit if one of them failed:

wait_jobs() {
    for pid in "${pids[@]}"; do
        wait ${pid}
        status=$?
        if [ $status -ne 0 ]; then
            echo "ERROR: Backups failed"
            exit 1
        fi
    done
}

Call this function instead of wait $(jobs -p) in the script.

Notes

You can replace the pids array with jobs -p in the for loop, but then you will not get the pids of jobs that completed before the call to the loop.
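
For reference, that variant could look like this (untested sketch):

wait_jobs() {
    # jobs -p lists only the jobs this shell is still tracking
    for pid in $(jobs -p); do
        wait ${pid}
        status=$?
        if [ $status -ne 0 ]; then
            echo "ERROR: Backups failed"
            exit 1
        fi
    done
}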

The wait_jobs() function above cannot be used in a subshell; the exit 1 call would then terminate only the subshell.
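
A quick way to see that:

( exit 1 )                    # runs in a sub-shell
echo "script continues: $?"   # prints: script continues: 1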


The complete script:

#!/bin/bash

COMMIT_COUNT=0
COMMIT_LIMIT=2

wait_jobs() {
    for pid in "${pids[@]}"; do
        wait ${pid}
        status=$?
        if [ $status -ne 0 ]; then
            echo "ERROR: Backups failed"
            exit 1
        fi
    done
}

while read -r i; do

    mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 |& xargs -I{} echo "${i} - {}" &
    # save the pid of the background job so we can get the
    # exit status with wait $pid later
    pids[COMMIT_COUNT]=$!

    (( COMMIT_COUNT++ ))

    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
        COMMIT_COUNT=0
        wait_jobs
    fi

done < list.txt

wait_jobs

Regarding your additional question about exit status, let me write another answer. Because $() runs a subshell, I don't think it is possible to return the exit status to the main shell the way a normal command would. But it is possible to write the exit status to a file to be examined later. Please try the command below. It will create a file called status-$i.txt containing two lines: one for mysqldump, the other for gof3r.

e="status-$i.txt"
echo -n > $e

echo "$i -" $( \
      ( mysqldump -B $i 2>&1; echo m=$? >> $e ) \
    |   bzip2 -zc \
    | ( gof3r put -b s3bucket -k $i.sql.bz2 2>&1; echo g=$? >> $e ) \
) &

You may also need to clean up all status-*.txt files at the start of your script.
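
To examine them afterwards, something along these lines should work (assuming the m=/g= format written above; any line not ending in =0 means one of the two commands failed):

for e in status-*.txt; do
    if grep -qv '=0$' "$e"; then
        echo "ERROR: ${e} reports a failure"
    fi
done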


I would make a separate function to control the whole process and then run this function in the background instead of running mysqldump itself.

By doing this you will have several processes running simultaneously, and at the same time you'll have control over each mysqldump as if it were run synchronously:

#!/bin/bash

do_job(){
    param=$1
    echo job $param started... >&2  # Output to stderr as stdout is grabbed
    sleep $(( RANDOM / 5000 ))
    echo $RANDOM  # Make some output
    [ $RANDOM -ge 16383 ]  # Generate exit code
}

control_job() {
    param=$1
    output=$(do_job $param)
    exit_code=$?
    echo $1 printed $output and exited with $exit_code
}

JOBS_COUNT=0
JOBS_LIMIT=2

for i in database1 database2 database3 database4; do

    control_job $i &

    (( JOBS_COUNT++ ))

    if [ $JOBS_COUNT -ge $JOBS_LIMIT ]; then
        (( JOBS_COUNT-- ))
        wait -n  # wait for any one process to exit (requires bash 4.3+)
    fi

done

wait  # wait for all processes running

Here do_job is used in place of your mysqldump pipeline. BTW, there's a small improvement here: you probably do not want to wait for all spawned processes when you've reached the limit; it is enough to wait for an arbitrary one. That's what wait -n does (it requires bash 4.3 or newer).


You are trying to do parallelization in your script. I'd recommend not re-inventing the wheel and instead using a tried and tested tool: GNU parallel. The tutorial is huge: http://www.gnu.org/software/parallel/parallel_tutorial.html

It has different options for jobs that return with an exit value != 0: abort on the first error, or continue working till the end.

One of the advantages of GNU parallel over the OP's script is that it starts the third job as soon as the first one is finished.
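
An untested sketch of what that could look like for this task (--tag prefixes each output line with its argument, which also gives the one-line-per-dump output from the question):

parallel --tag -j 2 --halt now,fail=1 \
    'mysqldump -B {} | bzip2 -zc | gof3r put -b s3bucket -k {}.sql.bz2' \
    :::: list.txt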

  • I use CentOS and unfortunately I can't install software that is not available in official repos.
    – HTF
    Commented May 13, 2015 at 16:29
  • Maybe you'll find something here: gnu.org/software/parallel/…
    – hagello
    Commented May 15, 2015 at 12:06

untested, etc.

#!/bin/bash

COMMIT_COUNT=0
COMMIT_LIMIT=2

_dump() {
  # better use gzip or xz. There's no benefit using bzip2 afaict
  output="$(mysqldump -B "$1" | bzip2 -zc | gof3r put -b s3bucket -k "$1.sql.bz2" 2>&1)"
  [ "$?" != 0 ] && output="failed"
  printf "%s - %s\n" "$1" "$output"
}

while read -r i; do
  _dump "$i" &

  (( COMMIT_COUNT++ ))

  if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
    COMMIT_COUNT=0
    wait
  fi
done < list.txt

wait
