I have an issue where a tail command in a bash script does not work correctly when the script is invoked remotely with only one of its two parameters. It works correctly when:

  • executed directly on local with 1 param
  • executed directly on local with 2 params
  • executed from remote with 2 params

I have written the script below, which starts tail in follow mode. It takes two parameters:

  1. TESTNAME : Mandatory. The name of the test case. The script creates a log file with this name.
  2. SLAVE_HOST : Optional. If provided, the script will ssh to this slave host and start a similar script on it.

    #!/bin/bash
    TESTNAME="$1"
    
    testdate=$(date +'%m_%d_%Y')
    REG_DIR=/opt/reg-test-results/REG_"$testdate"
    
    #create regression results directory if it does not exist
    mkdir -p "$REG_DIR"
    
    FILENAME="$REG_DIR"/"$TESTNAME"
    
    #if file already exists, create a new one with current time stamp appended to name
    if [ -f "$FILENAME" ]; then
           TIME=$(date +'%m_%d_%Y-%H.%M.%S')
           FILENAME="$FILENAME"_"$TIME"
    fi
    
    echo "$FILENAME" > /opt/reg-test-results/currentTestName
    
    #start tailing
    nohup tail -f -n0 /path/to/log/files/*/*server.log > "$FILENAME" &
    echo "$!" > "$REG_DIR"/reg_tail.pid
    
    #if slave host is provided, start tailing logs on slave also
    if [ "$#" -gt 1 ]; then
           SLAVE_HOST="$2"
           ssh "$SLAVE_HOST" /path/to/script/startTailLogTestCaseSlave.sh "$FILENAME"
    fi
    

The first few lines save the parameters in variables, create the directory structure, and build the name of the log file that will receive the tailed output. After that, a nohup tail command starts tailing the logs and redirects them to that file. This is the line that is not working properly. Finally, if a second argument was provided, the script connects to that host over ssh and runs a command on it.

Issue: When I run the script remotely and pass both parameters, a tail process is running after the script finishes and the log file is populated correctly. If I pass only the first parameter, tail appears to start and then stop immediately: reg_tail.pid contains a new process ID, but the log file ($FILENAME) is not created and no tail process is running.
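One way to confirm that state from the command line is to test whether the PID recorded in reg_tail.pid still refers to a live process. The following is only a minimal sketch: `tail_is_running` is a hypothetical helper name, not part of the script above, and the check cannot tell a surviving tail apart from an unrelated process that later reused the same PID.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): report whether the
# PID stored in a pid file still refers to a live process.
# Usage: tail_is_running "$REG_DIR"/reg_tail.pid
tail_is_running() {
    pid_file="$1"
    # A missing or empty pid file means no PID was recorded.
    [ -s "$pid_file" ] || return 1
    recorded_pid=$(cat "$pid_file")
    # kill -0 delivers no signal; it only checks that the process exists
    # and that the caller is allowed to signal it.
    kill -0 "$recorded_pid" 2>/dev/null
}
```

In the failing case described above, this check would return non-zero even though the pid file exists.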

The script runs perfectly with either 1 param or both when executed directly on the machine.

By "running from remote", I mean invoking the script on the machine over ssh. For example:

$ ssh -t user@host /path/to/script/script.sh testcasename.log

Debugging effort:

Here is what I see when I use set -x and run the script from a remote machine:

When the second argument is passed and everything runs normally, nohup tail is executed at the very end.

  ....
  + echo 13441
  + '[' 2 -gt 1 ']'
  + SLAVE_HOST=slaveHost
  + ssh slaveHost /path/to/script/startTailLogTestCaseSlave.sh /opt/reg-test-results/REG_09_11_2015/logs2.log
  + nohup tail -f -n0 /path/to/logs/../check-server.log ...
  nohup: redirecting stderr to stdout
  Connection to hostname closed.

When only the first argument is passed, nohup tail is never executed:

  ...
  + echo 13607
  + '[' 1 -gt 1 ']'
  Connection to hostname closed.
  • (1) Wow.  You’re logged in on hostA, and you type ssh hostB scriptname filename hostC, and the script calls ssh hostC scriptname filename?  You might want to step back and see whether you can simplify your design.  (2) The script shows the tail command before the ssh "$SLAVE_HOST" … command — but your set -x diagnostic output shows tail after ssh.  Care to explain that?  (3) You should always quote shell variables unless you have a good reason not to, and you’re sure you know what you’re doing. Commented Sep 12, 2015 at 4:37
  • Why do you need nohup here in the first place? I think that might be messing with the output file descriptors in this case. Commented Sep 12, 2015 at 21:57
  • Thank you everyone. @Scott, noted #1. Thanks. About #2, this is the order I am seeing. I do not know the reason. Tail is always being executed at the end when I pass in second argument. I have tried many variations like passing a string with value "null" for SLAVE_HOST and added condition to not ssh to it, if value is "null". It starts tail only when ssh command is executed and starts tail after ssh, even though tail command is before ssh. About #3, updated my code to always use quotes. Commented Sep 14, 2015 at 0:01
  • @Breakthrough, removing nohup did not help. Commented Sep 14, 2015 at 0:02
  • @NehaSharma Why did you add nohup in the first place? Explaining might help debug the issue at hand here. That being said, if you get the same behaviour without nohup at all, then that has nothing to do with your particular problem/question in the first place. Commented Sep 14, 2015 at 0:20

1 Answer


I resolved the issue by adding a delay after the tail command.

    nohup tail -f -n0 /path/to/log/files/*/*server.log > "$FILENAME" 2> test.err < /dev/null &
    echo "$!" > "$REG_DIR"/reg_tail.pid

    #without this delay, the above tail command is not executed when only one argument is passed
    sleep 1

    #if slave host is provided, start tailing logs on slave also
    if [ "$#" -gt 1 ]; then

I wondered why tail only worked when the ssh command ran after it, even though the two are unrelated, so I tried adding a delay. I am no expert, but this looks like a race condition. The background tail process was being scheduled to run at the end, and when no ssh command was executed, the script exited and the session ended before tail got a chance to run. Putting the main process to sleep gives the tail process time to start.

I am not sure this is the best solution. Using wait instead of sleep does not work, because "tail -f" never exits and the script gets stuck. I use this script to start tailing the logs and then run a test case. After the test case completes, I run another script that reads the stored PID of tail and kills it.

Please let me know if my understanding is mistaken, or if there is a better solution.
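One possible alternative to the fixed delay is sketched below. It rests on an assumption that I have not verified: that tail was being killed by the SIGHUP sent when the ssh -t session closed, before nohup had started ignoring that signal. If that is the cause, ignoring SIGHUP in the script itself before launching tail removes the timing dependency, because an ignored signal stays ignored across fork and exec. The function names `start_tail` and `stop_tail` and their arguments are hypothetical, not part of the original scripts.

```shell
#!/bin/bash
# Sketch only. Assumes the background tail was dying from SIGHUP at session
# close; start_tail and stop_tail are hypothetical names.

# Usage: start_tail OUT_FILE PID_FILE LOG_FILE...
start_tail() {
    out_file="$1"
    pid_file="$2"
    shift 2
    # Ignore SIGHUP in this shell first. The ignored disposition is inherited
    # by the child at fork time, so there is no window in which a hangup can
    # kill it, and no nohup or sleep is needed.
    trap '' HUP
    tail -f -n0 "$@" > "$out_file" 2> /dev/null < /dev/null &
    echo "$!" > "$pid_file"
    # Restore default SIGHUP handling for the rest of the script.
    trap - HUP
}

# Usage: stop_tail PID_FILE
stop_tail() {
    pid_file="$1"
    # Nothing to do if no PID was recorded.
    [ -s "$pid_file" ] || return 1
    # SIGTERM is not ignored, so tail can still be stopped normally.
    kill "$(cat "$pid_file")" 2>/dev/null
    rm -f "$pid_file"
}
```

In the original script this would be called as something like `start_tail "$FILENAME" "$REG_DIR"/reg_tail.pid /path/to/log/files/*/*server.log`, with `stop_tail "$REG_DIR"/reg_tail.pid` in the script that runs after the test case. Unlike the version above, this sketch discards stderr instead of writing it to test.err.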
