15

It's easiest to explain in code:

require 'timeout'

puts "this block will properly kill the sleep after a second"

IO.popen("sleep 60") do |io|
  begin
    Timeout.timeout(1) do
      while (line=io.gets) do
        output += line
      end
    end
  rescue Timeout::Error => ex
    Process.kill 9, io.pid
    puts "timed out: this block worked correctly"
  end
end

puts "but this one blocks for >1 minute"

begin
  pid = 0
  Timeout.timeout(1) do
    IO.popen("sleep 60") do |io|
      pid = io.pid
      while (line=io.gets) do
        output += line
      end
    end
  end
rescue Timeout::Error => ex
  puts "timed out: the exception gets thrown, but much too late"
end

My mental model of the two blocks is identical:

flow chart

So, what am I missing?

edit: drmaciver suggested on twitter that in the first case, for some reason, the pipe socket goes into non-blocking mode, but in the second it doesn't. I can't think of any reason why this would happen, nor can I figure out how to get the descriptor's flags, but it's at least a plausible answer? Working on that possibility.

5
  • Which ruby are you running? Commented Jun 21, 2013 at 15:37
  • this behavior occurs on at least 1.8.7 and 1.9.3. jruby blocks for all 60 on both blocks, which is the behavior I'd have guessed a priori.
    – llimllib
    Commented Jun 21, 2013 at 15:46
  • Note that your puts("but this one...") between the two blocks for me waits until the first sleep is complete, because the first IO#popen block is dutifully in a call to waitpid(). If you don't want that, then your rescue logic needs to kill the child process.
    – pilcrow
    Commented Jun 21, 2013 at 15:54
  • @pilcrow the real code does kill the proc, but I removed it for this code... I shouldn't have. Will re-add it, thanks
    – llimllib
    Commented Jun 21, 2013 at 15:56
  • @llimllib Timeout accepts an arbitrary class as a second argument to override the default of Timeout::Error. If you plug in your own class that records system time when initialized, you should be able to determine if your problem is due to when the Timeout tries to raise, or to the way the Ruby VM propogates raised exceptions in the context of subprocesses. Not an answer, but maybe a path to one.
    – animal
    Commented Jun 21, 2013 at 16:53

2 Answers 2

18

Aha, subtle.

There is a hidden, blocking ensure clause at the end of the IO#popen block in the second case. The Timeout::Error is raised raised timely, but you cannot rescue it until execution returns from that implicit ensure clause.

Under the hood, IO.popen(cmd) { |io| ... } does something like this:

def my_illustrative_io_popen(cmd, &block)
  begin
    pio = IO.popen(cmd)
    block.call(pio)      # This *is* interrupted...
  ensure
    pio.close            # ...but then control goes here, which blocks on cmd's termination
  end

and the IO#close call is really more-or-less a pclose(3), which is blocking you in waitpid(2) until the sleeping child exits.

You can verify this like so:

#!/usr/bin/env ruby

require 'timeout'

BEGIN { $BASETIME = Time.now.to_i }

def xputs(msg)
  puts "%4.2f: %s" % [(Time.now.to_f - $BASETIME), msg]
end

begin
  Timeout.timeout(3) do
    begin
      xputs "popen(sleep 10)"
      pio = IO.popen("sleep 10")
      sleep 100                     # or loop over pio.gets or whatever
    ensure
      xputs "Entering ensure block"
      #Process.kill 9, pio.pid      # <--- This would solve your problem!
      pio.close
      xputs "Leaving ensure block"
    end
  end
rescue Timeout::Error => ex
  xputs "rescuing: #{ex}"
end

So, what can you do?

You'll have to do it the explicit way, since the interpreter doesn't expose a way to override the IO#popen ensure logic. You can use the above code as a starting template and uncomment the kill() line, for example.

2
  • I was staring for so long at io.c, looking just a few lines above the ensure, without seeing the ensure or considering it at all. Great answer, thanks so much.
    – llimllib
    Commented Jun 21, 2013 at 17:53
  • Is it possible to get exit status with this solution?
    – tiktak
    Commented May 13, 2015 at 9:55
1

In the first block, the timeout is raised in the child, killing it and returning control to the parent. In the second block, the timeout is raised in the parent. The child never gets the signal.

See io.c https://github.com/ruby/ruby/blob/trunk/io.c#L6021 and timeout.rb https://github.com/ruby/ruby/blob/trunk/lib/timeout.rb#L51

4
  • I know it's not a detailed answer Bill, but that's the way I read the blocks. Commented Jun 21, 2013 at 15:47
  • 1
    A block passed to IO#popen is executed in the context of the parent process. I'm not sure what you mean when you say that the child process could or could not "[get] the signal."
    – pilcrow
    Commented Jun 21, 2013 at 15:50
  • @JonathanJulian I've had those two files open in a split screen, trying to figure it out. As far as I can tell, the timeout is spawned from the main thread in both examples. Here's where popen runs the block it's passed: github.com/ruby/ruby/blob/trunk/io.c#L6075
    – llimllib
    Commented Jun 21, 2013 at 16:00
  • @JonathanJulian I instrumented it to show that the timeout is being spawned from the same thread both times: gist.github.com/llimllib/48da4b10d9b6bd6538d5
    – llimllib
    Commented Jun 21, 2013 at 16:03

Not the answer you're looking for? Browse other questions tagged or ask your own question.