
On Machine1, I have a Python2.7 script that computes a big (up to 10MB) binary string in RAM that I'd like to write to a disk file on Machine2, which is a remote machine. What is the best way to do this?

Constraints:

  • Both machines are Ubuntu 13.04. The connection between them is fast -- they are on the same network.

  • The destination directory might not yet exist on Machine2, so it might need to be created.

  • If it's easy, I would like to avoid writing the string from RAM to a temporary disk file on Machine1. Does that eliminate solutions that might use a system call to rsync?

  • Because the string is binary, it might contain bytes that could be interpreted as a newline. This would seem to rule out solutions that might use a system call to the echo command on Machine2.

  • I would like this to be as lightweight on Machine2 as possible. Thus, I would like to avoid running services like ftp on Machine2 or engage in other configuration activities there. Plus, I don't understand security that well, and so would like to avoid opening additional ports unless truly necessary.

  • I have ssh keys set up on Machine1 and Machine2, and would like to use them for authentication.

  • EDIT: Machine1 is running multiple threads, and so it is possible that more than one thread could attempt to write to the same file on Machine2 at overlapping times. I do not mind the inefficiency caused by having the file written twice (or more) in this case, but the resulting datafile on Machine2 should not be corrupted by simultaneous writes. Maybe an OS lock on Machine2 is needed?

I'm rooting for an rsync solution, since it is a self-contained entity that I understand reasonably well, and requires no configuration on Machine2.
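To address the simultaneous-write concern from the EDIT, a lock on Machine2 isn't strictly necessary: the usual write-to-a-unique-temp-file-then-rename trick avoids corruption, because rename() is atomic on POSIX filesystems. Here is the pattern sketched locally (on the remote side the equivalent would be something like `cat > tmpfile && mv tmpfile file` inside the ssh command); `atomic_write` is a hypothetical helper name:

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a uniquely named temp file in the same directory, then
    # rename it over the target.  On POSIX, rename() is atomic, so a
    # reader (or a competing writer) never sees a half-written file;
    # the last writer to rename simply wins.
    dirname = os.path.dirname(path) or '.'
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.rename(tmp, path)  # atomic replacement of the target
    except Exception:
        os.remove(tmp)
        raise
```

The temp file must live in the same directory as the target, because rename() is only atomic within a single filesystem.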

  • You can take a look at Python sockets (TCP sockets in your case). Whatever scheme you need can be implemented with them. Commented Oct 5, 2013 at 20:28
  • sftp seems like a likely candidate. wiki.python.org/moin/SecureShell stackoverflow.com/questions/432385/…
    – Robᵩ
    Commented Oct 5, 2013 at 20:28
  • How long would it take to transfer these 10 MB to the other side? Are broken connections and resuming likely? These questions might be relevant to decide if Erik Allik's solution - which would be my favourite as well - is usable here.
    – glglgl
    Commented Oct 5, 2013 at 21:00
  • @SioulSeuguh Not without opening an additional port - which seems to be unwanted here. SSH connection would probably be better...
    – glglgl
    Commented Oct 5, 2013 at 21:01
  • Edited the question to state that the connection between the machines is fast. Commented Oct 5, 2013 at 21:05

6 Answers


Paramiko supports opening files on remote machines:

import paramiko

def put_file(machinename, username, dirname, filename, data):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(machinename, username=username)
    sftp = ssh.open_sftp()
    try:
        sftp.mkdir(dirname)  # note: creates only the final path component
    except IOError:
        pass  # directory probably exists already
    f = sftp.open(dirname + '/' + filename, 'w')
    f.write(data)
    f.close()
    ssh.close()


data = 'This is arbitrary data\n'.encode('ascii')
put_file('v13', 'rob', '/tmp/dir', 'file.bin', data)
  • +1, great solution to begin with (although it doesn't account for deeper paths such as /a/b/c/d, if b or c don't exist yet...).
    – glglgl
    Commented Oct 5, 2013 at 21:27
  • @glglgl - agreed, but I probably won't fix it.
    – Robᵩ
    Commented Oct 5, 2013 at 22:34
  • @Robᵩ No, that's up to whoever needs it.
    – glglgl
    Commented Oct 6, 2013 at 12:45
  • Are the data ASCII encoded because f.write(data) requires ASCII data (seems hard to believe) or because it's just good form to specify encoding, even on an example string? Commented Oct 6, 2013 at 21:44
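As the first comment notes, sftp.mkdir() creates only one directory level. A small pure-path helper can expand a nested path into the sequence of parents to create (a sketch; `parent_dirs` is a hypothetical name, and the commented usage assumes a paramiko SFTPClient named `sftp`):

```python
import posixpath

def parent_dirs(path):
    # Expand '/a/b/c' into ['/a', '/a/b', '/a/b/c'] so each level can
    # be created in turn, ignoring "already exists" errors.
    dirs = []
    while path not in ('', '/'):
        dirs.append(path)
        path = posixpath.dirname(path)
    return dirs[::-1]

# Hypothetical usage with a paramiko SFTPClient named `sftp`:
# for d in parent_dirs('/a/b/c'):
#     try:
#         sftp.mkdir(d)
#     except IOError:
#         pass  # directory already exists
```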

You open a new SSH process to Machine2 using subprocess.Popen and then you write your data to its STDIN.

import subprocess

cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir; cat - > output/dir/file.dat']

p = subprocess.Popen(cmd, stdin=subprocess.PIPE)

your_inmem_data = 'foobarbaz\0' * 1024 * 1024

for chunk_ix in range(0, len(your_inmem_data), 1024):
    chunk = your_inmem_data[chunk_ix:chunk_ix + 1024]
    p.stdin.write(chunk)

p.stdin.close()  # send EOF so the remote cat sees end of input
p.wait()

I've just verified that it works as advertised and copies all of the 10485760 dummy bytes.

P.S. A potentially cleaner/more elegant solution would be to have the Python program write its output to sys.stdout instead and do the piping to ssh externally:

$ python process.py | ssh <the same ssh command>
  • This looks very good, but is there a typo involving quotation marks in the second line? Commented Oct 5, 2013 at 20:57
  • Why shell=True? A mere ssh_cmd_list = ['ssh', 'user@machine2', 'mkdir -p output/dir; cat - > output/dir/file.dat'] followed by a p = subprocess.Popen(ssh_cmd_list, stdin=subprocess.PIPE) makes the stuff much easier to read and removes a layer of complexity, what the additional shell layer would be.
    – glglgl
    Commented Oct 5, 2013 at 20:58
  • @glglgl: then you need the full path to ssh I'm afraid; but anyway... I can't find a typo. Basically I'm just providing sufficiently perfected code that works; the OP is free to amend, adapt, transform and clean up :) Commented Oct 5, 2013 at 21:03
  • Aha. I was unfamiliar with what appears to be a ('foo' 'bar') concatenation syntax. Commented Oct 5, 2013 at 21:11
  • 1
    Just to clear up one misunderstanding: even in shell=False mode, you don't have to provide the full path of the executable - Popen() finds it for you. (See here how subprocess.call(["ls", "-l"]) is working code, and see here for other examples.)
    – glglgl
    Commented Oct 5, 2013 at 21:30
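One detail worth guarding in this approach: the remote path is interpolated into a shell command line, so paths with spaces or shell metacharacters should be quoted. A sketch using shlex.quote (pipes.quote on Python 2); `make_remote_cmd` is a hypothetical helper name:

```python
try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2

def make_remote_cmd(dirname, filename):
    # Build the remote shell command with both path components quoted,
    # so spaces or metacharacters in the target path can't break (or
    # inject into) the remote command line.
    target = '%s/%s' % (dirname, filename)
    return 'mkdir -p %s && cat > %s' % (quote(dirname), quote(target))
```

The result is passed as the remote-command argument to ssh, exactly as in the answer above.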

A slight modification to @Erik Kaplun's answer: the code below worked for me (using communicate() rather than .stdin.write()).

import subprocess

# `data` should already be a byte string
cmd = ['ssh', 'user@machine2', 'cat - > /path/filename']
p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
p.communicate(data)
  • Concise, nice. Might include a mkdir, or a mention of the pitfall. Save someone some tears.
    – mcint
    Commented Jul 1, 2021 at 10:18
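Since communicate() writes everything, closes stdin, and waits for the process to exit, checking the exit status afterwards is cheap insurance that the remote cat actually succeeded. A sketch (`send_via_cmd` is a hypothetical name), demonstrated with any command that reads stdin:

```python
import subprocess

def send_via_cmd(cmd, data):
    # Feed the in-memory bytes to the command's stdin via communicate(),
    # which writes all of `data`, closes stdin, and waits for exit.
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    p.communicate(data)
    if p.returncode != 0:
        raise RuntimeError('command failed: %r' % (cmd,))

# With ssh this would be e.g.:
# send_via_cmd(['ssh', 'user@machine2',
#               'mkdir -p /path && cat > /path/filename'], data)
```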

We can write the string to a remote file in three simple steps:

  1. Write string to a temp file
  2. Copy temp file to remote host
  3. Remove temp file

Here is my code (no third-party libraries):

import os

content = 'sample text'
remote_host = 'your-remote-host'
remote_file = 'remote_file.txt'

# step 1
tmp_file = 'tmp_file.txt'
with open(tmp_file, 'w') as f:
    f.write(content)

# step 2
command = 'scp %s %s:%s' % (tmp_file, remote_host, remote_file)
os.system(command)

# step 3
os.remove(tmp_file)
  • I haven't tested this, but it looks nice. Thanks. Commented Oct 15, 2020 at 19:01
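The same three steps can be made a bit more robust by using a secure temp file, cleaning up in a finally block, and checking scp's exit status via subprocess instead of os.system. A sketch (`push_string` is a hypothetical name; the copy command is parameterized only so it can be swapped out, and by default the real scp binary is run):

```python
import os
import subprocess
import tempfile

def push_string(data, remote_host, remote_path, scp=('scp',)):
    # Step 1: write the bytes to a securely created temp file.
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        # Step 2: copy it to the remote host; check_call raises if
        # the copy command exits non-zero.
        dest = '%s:%s' % (remote_host, remote_path) if remote_host \
            else remote_path
        subprocess.check_call(list(scp) + [tmp_path, dest])
    finally:
        # Step 3: remove the temp file even if the copy failed.
        os.remove(tmp_path)
```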

If just calling a subprocess is all you want, maybe sh.py could be the right thing.

from sh import ssh
remote_host = ssh.bake(<remote host>) 
remote_host.dd(_in = <your binary string>, of=<output filename on remote host>) 

A solution in which you don't explicitly send your data over some connection would be to use sshfs. You can use it to mount a directory from Machine2 somewhere on Machine1; writing to a file in that directory then automatically writes the data to Machine2.

  • This is clever and elegant, but it's not clear what happens if Machine1 reboots. I did not study the docs, but it appears that the connection would be lost, and would need to be re-established manually. Commented Oct 6, 2013 at 20:57
  • Actually, if either Machine1 or Machine2 reboots it could be a problem. Commented Oct 6, 2013 at 21:11
  • @IronPillow Maybe -o reconnect would help?
    – tshepang
    Commented Oct 6, 2013 at 21:22
  • @IronPillow perhaps doing the mount and umount of Machine2 from your Python script could help somewhat
    – brm
    Commented Oct 7, 2013 at 14:38
