
I have two python programs (one is a subprocess) that need to communicate with each other. Currently I am doing that through stdin and stdout. However, writing to the subprocess's stdin seems painfully slow.

a.py, a program that takes an arbitrary line of input and prints the time:

from time import time, sleep
from sys import stdout, stdin
while True:
    stdin.readline()
    stdout.write('%f\n' % time())
    stdout.flush()

b.py, a program that runs a.py and times how long it takes to write to the subprocess's stdin and read from its stdout:

from time import time
from subprocess import PIPE, Popen
from threading import Thread
stdin_times = []
stdout_times = []
p = Popen(['python', 'a.py'], stdin=PIPE, stdout=PIPE)
for i in range(100000):
    t1 = time()
    p.stdin.write(b'\n')
    p.stdin.flush()
    t2 = float(p.stdout.readline().strip().decode())
    t3 = time()
    stdin_times.append(t2 - t1)
    stdout_times.append(t3 - t2)
p.kill()
print('stdin (min/ave):', min(stdin_times), sum(stdin_times) / len(stdin_times))
print('stdout (min/ave):', min(stdout_times), sum(stdout_times) / len(stdout_times))

Sample output:

stdin (min/ave): 1.69277191162e-05 0.000138891274929
stdout (min/ave): 1.78813934326e-05 2.09228754044e-05

I'm using Python 3.1.2 on Ubuntu 10.10.

Why is writing to a.py's stdin so much slower than reading from its stdout? Is there any way I can get these two programs to communicate faster?

  • How many times have you run this test to come up with those numbers? Perhaps between the time a.py and b.py were scheduled, crond may have started an updatedb task or ntpdate might have skewed your clock or...
    – sarnold
    Commented Mar 22, 2011 at 7:25
  • Updated to show averages Commented Mar 22, 2011 at 7:37
  • You don't want averages in this context, you want minimum times (since "best case" = "nothing else interfering").
    – ncoghlan
    Commented Mar 22, 2011 at 7:44
  • You also want to get all the extraneous junk out of your timing loop. Cache the method lookups, use b'\n' instead of calling encode, calculate t2 after recording t3. However, you're still going to be at the mercy of the OS scheduler. If it has decided the worker process is non-interactive, it may be optimising its scheduling for IO throughput rather than low latency.
    – ncoghlan
    Commented Mar 22, 2011 at 7:52
  • I've updated to show mins and they are comparable, but I don't think that really helps me. I'm not trying to prove the theoretical inferiority of stdin; I just need my process to be able to communicate faster (on average). Commented Mar 22, 2011 at 7:53
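Putting ncoghlan's suggestions together, the timing loop can be tightened so the measurement itself adds as little overhead as possible. The sketch below is not the original b.py: it inlines a.py via `-c` so it is self-contained, caches the bound methods outside the loop, writes raw bytes, and parses the child's timestamp only after both timestamps have been taken.

```python
import sys
from time import time
from subprocess import PIPE, Popen

# a.py from the question, inlined with -c so this sketch is self-contained.
child = ("import sys, time\n"
         "while True:\n"
         "    sys.stdin.readline()\n"
         "    sys.stdout.write('%f\\n' % time.time())\n"
         "    sys.stdout.flush()\n")
p = Popen([sys.executable, '-c', child], stdin=PIPE, stdout=PIPE)

# Cache the bound methods so attribute lookups stay out of the loop.
write, flush, readline = p.stdin.write, p.stdin.flush, p.stdout.readline
stdin_times, stdout_times = [], []
for _ in range(1000):           # fewer iterations than the question, for brevity
    t1 = time()
    write(b'\n')
    flush()
    line = readline()
    t3 = time()
    t2 = float(line.decode())   # parse only after both timestamps are taken
    stdin_times.append(t2 - t1)
    stdout_times.append(t3 - t2)
p.kill()
print('stdin  (min):', min(stdin_times))
print('stdout (min):', min(stdout_times))
```

Even with the loop tightened, the minimum times are the numbers to compare; the averages still include scheduler noise.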

2 Answers


I'd see whether you can reproduce this with buffering disabled on both input and output. I have a hunch the output is being (line) buffered by default, as it is in most languages: Perl, .NET, C++ iostreams.
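A minimal sketch of one way to test that hunch: pass bufsize=0 to Popen so the parent's ends of the pipes are unbuffered, and run the child with -u so its standard streams are unbuffered too. The echo child here is a hypothetical stand-in for a.py, just to keep the example self-contained.

```python
import sys
from subprocess import PIPE, Popen

# Minimal echo child (hypothetical stand-in for a.py from the question).
child = ("import sys\n"
         "for line in sys.stdin:\n"
         "    sys.stdout.write(line)\n"
         "    sys.stdout.flush()\n")

# bufsize=0 makes the parent's pipe ends unbuffered;
# -u asks the child interpreter for unbuffered standard streams.
p = Popen([sys.executable, '-u', '-c', child],
          stdin=PIPE, stdout=PIPE, bufsize=0)

p.stdin.write(b'ping\n')    # with bufsize=0 this goes straight to the pipe
line = p.stdout.readline()
p.kill()
print(line)
```

If the latency gap disappears with all buffering off, buffering was the culprit; if it doesn't, the cost is in the pipe round trip or the scheduler.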

  • python subprocess defaults bufsize to 0 <docs.python.org/release/3.1.3/library/subprocess.html>. Also, I'm obviously flushing every time the output needs to be flushed. Isn't flushing when I need to just as good as setting the buffer sizes to 0? Commented Mar 28, 2011 at 18:42
  • It's just as good, but it's not the same. Depending on what was costing the time, it may make the difference...
    – sehe
    Commented Mar 28, 2011 at 18:56

If I try this a few times, the variance of the numbers is very high.

Maybe you should set up a saner test to benchmark stdin and stdout, without a lot of other overhead, so you don't measure everything else going on on your CPU. You're also measuring string operations and float conversion.
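One way to strip the benchmark down as suggested: time a one-byte round trip over raw os.pipe() descriptors, with no string formatting or float parsing inside the loop. This is a POSIX-only sketch (it uses os.fork, which is fine on the Ubuntu system in the question), not a drop-in replacement for a.py/b.py.

```python
import os
from time import time

# Two pipes: parent -> child and child -> parent.
r1, w1 = os.pipe()
r2, w2 = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: close the ends it doesn't use, then echo one byte forever.
    os.close(w1)
    os.close(r2)
    while True:
        b = os.read(r1, 1)
        if not b:            # parent closed its write end: we're done
            os._exit(0)
        os.write(w2, b)

# Parent: close the ends the child uses.
os.close(r1)
os.close(w2)

times = []
for _ in range(10000):
    t0 = time()
    os.write(w1, b'x')
    os.read(r2, 1)
    times.append(time() - t0)

os.close(w1)                 # child sees EOF and exits
os.waitpid(pid, 0)
print('round trip (min/ave): %f %f' % (min(times), sum(times) / len(times)))
```

This measures nothing but pipe latency and the context switch, so it gives a floor to compare the subprocess numbers against.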

  • That's not the opposite. stdin is still slower in your example. Also, I've updated my post to show averages Commented Mar 22, 2011 at 7:41
  • That's not actually reversed, it's just that both are being printed in the same output format -- the stdin time is still greater than the stdout time.
    – sarnold
    Commented Mar 22, 2011 at 7:41
