11

I have a local machine which is supposed to open an SSH session to a remote master machine, then another, inner SSH session from the master to each of some remote slaves, and then execute 2 commands, i.e. delete a specific directory and recreate it.

Note that the local machine has passwordless SSH to the master, and the master has passwordless SSH to the slaves. Also, all hostnames are known in the .ssh/config files of the local/master machines, and the hostnames of the slaves are in slaves.txt locally; I read them from there.

So what I do, and what works, is this:

username="ubuntu"
masterHostname="myMaster"
while read line
do

    #Remove previous folders and create new ones.
    ssh -n $username@$masterHostname "ssh -t -t $username@$line "rm -rf Input Output Partition""
    ssh -n $username@$masterHostname "ssh -t -t $username@$line "mkdir -p EC2_WORKSPACE/$project Input Output Partition""


    #Update changed files...
    ssh -n $username@$masterHostname "ssh -t -t $username@$line "rsync --delete -avzh /EC2_NFS/$project/* EC2_WORKSPACE/$project""

done < slaves.txt 

This cluster is on Amazon EC2, and I have noticed that 6 SSH sessions are created at each iteration (3 to the master, each of which spawns one to a slave), which induces a significant delay. I would like to combine these 3 commands into 1 to get fewer SSH connections. So I tried to combine the first 2 commands into:

ssh -n $username@$masterHostname "ssh -t -t $username@$line "rm -rf Input Output Partition && mkdir -p EC2_WORKSPACE/$project Input Output Partition""

But it doesn't work as expected. It seems to execute only the first one (rm -rf Input Output Partition), then exit the session and move on. What can I do?

1
  • 3
Instead of such indirection in the command, you can use the -J option, which defines your jump host.
    – Hauleth
    Commented Jul 26, 2017 at 21:21

4 Answers

15

Consider that && is a logical operator. It does not mean "also run this command" it means "run this command if the other succeeded".

That means if the rm command fails, the mkdir won't be executed. (Note that with -f, rm won't fail just because the directories don't exist; a failure here would more likely be something like a permissions problem.) Either way, this does not sound like the behaviour you want: the mkdir should run regardless of how the rm fared.

Use ;

The semicolon ; is used to separate commands. The commands are run sequentially, waiting for each before continuing onto the next, but their success or failure has no impact on each other.
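A quick local demonstration of the difference (no SSH needed):

```shell
# With &&, the second command only runs if the first succeeds:
false && echo "after &&"    # prints nothing

# With ;, the second command runs regardless of the first's exit status:
false ; echo "after ;"      # prints: after ;
```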

Escape inner quotes

Quotes inside other quotes should be escaped, otherwise you're creating an extra end point and start point. Your command:

ssh -n $username@$masterHostname "ssh -t -t $username@$line "rm -rf Input Output Partition && mkdir -p EC2_WORKSPACE/$project Input Output Partition""

Becomes:

ssh -n $username@$masterHostname "ssh -t -t $username@$line \"rm -rf Input Output Partition && mkdir -p EC2_WORKSPACE/$project Input Output Partition\""

Your current command, because of the lack of escaped quotes, should be executing:

ssh -n $username@$masterHostname "ssh -t -t $username@$line "rm -rf Input Output Partition

if that succeeds:

mkdir -p EC2_WORKSPACE/$project Input Output Partition"" # runs on your local machine

You'll notice the syntax highlighting here shows the entire command in red, which means the whole command is the string being passed to ssh. Check your local machine; you may have the directories Input, Output and Partition where you were running this.
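Putting both fixes together (escaped inner quotes, plus ; so the mkdir runs regardless of the rm's result), the combined command becomes:

```shell
ssh -n $username@$masterHostname "ssh -t -t $username@$line \"rm -rf Input Output Partition ; mkdir -p EC2_WORKSPACE/$project Input Output Partition\""
```

As a local sanity check of the quoting, sh -c can stand in for each ssh hop, since each hop strips one layer of quotes: `sh -c "sh -c \"echo step1 ; echo step2\""` prints step1, then step2, showing the inner escaped quotes survive as real quotes one level down.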

2
  • I understand your points. One part I was confused by was the semicolon, since I thought it would execute more than one command at the same time, which is why I didn't use it.
    – mgus
    Commented Jul 26, 2017 at 18:02
  • The semicolon will not cause the commands to execute at the same time; see here for a reference on executing commands. The & causes commands to run in the background, which means they won't be waited on to finish before moving on to the next.
    – Centimane
    Commented Jul 26, 2017 at 18:09
11

You can always set up multiplexing in OpenSSH on your jump box.

Multiplexing is the ability to send more than one signal over a single line or connection. With multiplexing, OpenSSH can re-use an existing TCP connection for multiple concurrent SSH sessions rather than creating a new one each time.

An advantage with SSH multiplexing is that the overhead of creating new TCP connections is eliminated. The overall number of connections that a machine may accept is a finite resource and the limit is more noticeable on some machines than on others, and varies greatly depending on both load and usage. There is also significant delay when opening a new connection. Activities that repeatedly open new connections can be significantly sped up using multiplexing.

For that, add to /etc/ssh/ssh_config (or to your user's ~/.ssh/config):

ControlMaster auto
ControlPath ~/.ssh/controlmasters/ssh_mux_%h_%p_%r
ControlPersist 30m

In this way, any consecutive connections made to the same server in the following 30 minutes will reuse the previous ssh connection. Note that the directory named in ControlPath (here ~/.ssh/controlmasters) must exist before the first connection is made.

You can also define it for a single machine or a group of machines, as taken from the link provided:

Host machine1
    HostName machine1.example.org
    ControlPath ~/.ssh/controlmasters/%r@%h:%p
    ControlMaster auto
    ControlPersist 10m
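As the comments point out, these are ordinary ssh options, so you don't have to change a config file at all. A sketch of enabling multiplexing just for this script's connections (the socket directory must exist first; variable names are from the question):

```shell
# Create the socket directory if it doesn't exist yet:
mkdir -p ~/.ssh/controlmasters

# Options enabling multiplexing for these invocations only:
mux="-o ControlMaster=auto -o ControlPath=$HOME/.ssh/controlmasters/%r@%h:%p -o ControlPersist=10m"

# The first ssh sets up the master connection; later ones reuse its socket:
ssh -n $mux "$username@$masterHostname" "..."
```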
8
  • That's interesting! Is there a way to explicitly turn it on and off for a given connection? Seems like overkill to so drastically change all SSH connections for this one use case. Can it be used in a more precise manner? To start multiplexing only a certain connection?
    – Centimane
    Commented Jul 26, 2017 at 23:36
  • @Centimane yeah, updated the answer Commented Jul 27, 2017 at 0:24
  • 1
    I would suggest to put the socket in user's home rather than a world read-writable /tmp/.
    – heemayl
    Commented Jul 27, 2017 at 3:43
  • @heemayl Good point. I will edit it when in a computer. Commented Jul 27, 2017 at 6:23
  • @RuiFRibeiro Also, it looks like, according to man ssh, ControlPath, ControlMaster and ControlPersist are valid options to pass to an ssh command using -o. That could allow an even more precise use: set up the multiplexing in the first ssh of the script and recycle the connection for the others, while otherwise avoiding the performance penalty. I wonder how multiplexing benchmarks against not multiplexing for 3 SSH connections, given that "There is also significant delay when opening a new connection"
    – Centimane
    Commented Jul 27, 2017 at 10:21
4

You can put all your commands into a separate script on your "master" server.

Master Script

#!/bin/bash
rm -rf Input Output Partition
mkdir -p "EC2_WORKSPACE/$project" Input Output Partition

Then in your ssh script call it like this: SSH Script

username="ubuntu"
masterHostname="myMaster"
while read line
do
ssh -n $username@$masterHostname "ssh -t -t $username@$line < /path/to/masterscript.sh"
ssh -n $username@$masterHostname "ssh -t -t $username@$line \"rsync --delete -avzh /EC2_NFS/$project/* EC2_WORKSPACE/$project\""
done < slaves.txt 

OR if all files must be on the initial machine you could do something like this:

script1

script2="/path/to/script2"
username="ubuntu"
while read line; do
cat $script2 | ssh -t -t $username@$line
done < slaves.txt

script2

#!/bin/bash
rm -rf Input Output Partition
mkdir -p "EC2_WORKSPACE/$project" Input Output Partition
rsync --delete -avzh "/EC2_NFS/$project"/* "EC2_WORKSPACE/$project"

ssh script

script1="/path/to/script1"
username="ubuntu"
masterHostname="myMaster"
cat $script1 | ssh -n $username@$masterHostname
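A variation on the same idea, if you'd rather not keep separate script files: feed the commands through both hops with a heredoc. Note that -n must be dropped, since it detaches ssh's stdin, which is exactly what carries the script here (a sketch, using the question's variable names):

```shell
# The heredoc travels through both ssh hops to bash -s on the slave.
# The unquoted EOF means $project expands on the local machine.
ssh $username@$masterHostname "ssh $username@$line 'bash -s'" <<EOF
rm -rf Input Output Partition
mkdir -p EC2_WORKSPACE/$project Input Output Partition
rsync --delete -avzh /EC2_NFS/$project/* EC2_WORKSPACE/$project
EOF
```

The same stdin-feeding mechanism works locally with plain sh -s, which is an easy way to convince yourself the heredoc reaches the innermost shell.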
1

Some time ago, I had occasion to use control sockets as the other answers recommend (this answer is essentially a combination of using control sockets as in one answer and scripts as in another).

The use case was a hack: the authorized_keys of the target user was overwritten periodically by a scheduled task, and I wanted to quickly test things without going through the red tape needed to add something to that file. So I'd set up a while loop which added the key to that file as needed, run my test, and then cancel the loop. However, there would be a small window where the scheduled task had overwritten the file while my loop was still sleeping. So setting up a control socket at the start let my script SSH later without problems:

#! /bin/bash -xe
. "${CONFIG_DIR}/scripts/setup-ssh.sh"

# Build and test
export TEST_LABEL="${_started_by}-${BUILD_TAG%-BUILD*}"
#...
xargs --arg-file test-list \
    --no-run-if-empty \
    --process-slot-var=NUM \
    --max-procs=${#SERVERS[@]} \
    --max-args="${BATCH_SIZE:-20}" \
    "${CONFIG_DIR}/scripts/run-test.sh"

Where setup-ssh.sh is:

export SSH_CONFIG="${CONFIG_DIR}/scripts/.ssh-config"
mapfile -t SERVERS < "${CONFIG_DIR}/scripts/hosts"

for SERVER in "${SERVERS[@]}"
do
    while ! ssh -F "${SSH_CONFIG}" "${SERVER}" -fnN; do sleep 1; done
    scp -F "${SSH_CONFIG}" "${CONFIG_DIR}/scripts/ssh-script.sh" "${SERVER}":"${TEST_LABEL}.sh"
done

And .ssh-config:

Host test-*
  User test
  StrictHostKeyChecking no
  ControlMaster auto
  ControlPath /tmp/ssh-%h-%p-%r

And run-test.sh:

mapfile -t TEST_SERVERS < "${CONFIG_DIR}/scripts/hosts"
ssh -F "${SSH_CONFIG}" "${TEST_SERVERS[$NUM]}" "./${TEST_LABEL}.sh"

The sequence goes like this:

  • The main script (shown first) sources setup-ssh.sh.
  • setup-ssh.sh busy-loops over the servers until all of them have a control socket set up. The hosts file simply lists the server hostnames, one per line.
  • Since the configuration specifying the control socket is only in ${CONFIG_DIR}/scripts/.ssh-config, SSH connections won't use it unless I specify that file using -F. So this lets me use the control sockets only where I need them, via the -F option.
  • The setup script also copies the test execution script to the servers. The execution script itself contains a bunch of commands, and because I copied the execution script, I don't have to worry about an additional layer of quoting for SSH (and the additional cognitive overhead for figuring out what gets expanded when).
  • Then the main script uses xargs to distribute the workload over the servers, by starting new jobs as soon as running ones end.
