
I want to pipe large video files from AWS S3 into Popen's stdin, which is from Python's point of view a 'file-like object'. This code runs as an AWS Lambda function, so these files won't fit in memory or on the local file system. Also, I don't want to copy these huge files anywhere, I just want to stream the input, process on the fly, and stream the output. I've already got the processing and streaming output bits working. The problem is how to obtain an input stream as a Popen pipe.

Update: Based on a comment, I put together a short program that calls StreamingBody.read(amt=chunk_size). It reads part of the input file (an mp4 video) and then hangs. Possibly the consumer of the data (ffmpeg) never actually runs, or its STDIN pipe buffer fills up and the whole mess grinds to a halt?

I can access a file in an S3 bucket:

import boto3
s3 = boto3.resource('s3')
response = s3.Object(bucket_name=bucket, key=key).get()
body = response['Body']  
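Because StreamingBody.read() accepts an amt argument, the body can be consumed in bounded pieces instead of all at once. A minimal sketch, assuming a 1 MiB chunk size is acceptable; iter_body_chunks is a hypothetical helper, not part of boto3 or botocore:

```python
from typing import Iterator


def iter_body_chunks(body, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from a readable body.

    Hypothetical helper: works with any object whose read(n) returns b""
    at end of stream, e.g. a botocore StreamingBody or an io.BytesIO.
    """
    while True:
        chunk = body.read(chunk_size)  # at most chunk_size bytes per call
        if not chunk:  # b"" means the stream is exhausted
            break
        yield chunk
```

Usage: `for chunk in iter_body_chunks(body): ...`, so only one chunk is held in memory at a time.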

response is a dict like the one below; its 'Body' entry (body) is a botocore.response.StreamingBody:

{
  u'Body': <botocore.response.StreamingBody object at 0x00000000042EDAC8>,
  u'AcceptRanges': 'bytes', 
  u'ContentType': 'video/mp4', 
  'ResponseMetadata': {
    'HTTPStatusCode': 200, 
    'HostId': 'aAUs3IdkXP6vPGwauv6/USEBUWfxxVeueNnQVAm4odTkPABKUx1EbZO/iLcrBWb+ZiyqmQln4XU=', 
    'RequestId': '6B306488F6DFEEE9'
  }, 
  u'LastModified': datetime.datetime(2015, 3, 1, 1, 32, 58, tzinfo=tzutc()),
  u'ContentLength': 393476644, 
  u'ETag': '"71079d637e9f14a152170efdf73df679"', 
  u'Metadata': {'cb-modifiedtime': 'Sun, 01 Mar 2015 01:27:52 GMT'}}

I intend to use body something like this:

from subprocess import Popen, PIPE
Popen(cmd, stdin=PIPE, stdout=PIPE).communicate(input=body)[0]

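But of course this won't work as written: communicate() expects its input as bytes, not a StreamingBody, and reading the whole body into bytes would defeat the purpose. The question is how to feed body into the subprocess's stdin as a stream.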
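Popen cannot be handed the StreamingBody directly as stdin, since that argument needs an object backed by a real file descriptor, and communicate() needs all of its input as bytes. One way around both, shown here as a sketch under assumptions rather than a definitive recipe, is to pass stdin=PIPE and copy the body into the pipe in chunks from a background thread while the main thread drains stdout at the same time. Draining stdout concurrently matters: if nobody reads it, the child blocks on a full stdout pipe, stops reading stdin, and the writer blocks too, which may explain the hang described in the update. pipe_through and _feed_stdin are hypothetical names, cmd would be your ffmpeg command line, and the 64 KiB chunk size is an arbitrary choice:

```python
import shutil
import subprocess
import threading
from typing import BinaryIO, List

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your workload


def _feed_stdin(source, stdin: BinaryIO, chunk_size: int) -> None:
    """Copy source into the child's stdin one chunk at a time, then close it."""
    try:
        # copyfileobj calls source.read(chunk_size) in a loop, so memory use
        # stays bounded by chunk_size (works for StreamingBody and BytesIO).
        shutil.copyfileobj(source, stdin, chunk_size)
    except BrokenPipeError:
        pass  # the child exited before consuming all input
    finally:
        try:
            stdin.close()  # EOF tells the child there is no more input
        except BrokenPipeError:
            pass  # flushing the last buffered bytes failed; child is gone


def pipe_through(cmd: List[str], source, sink: BinaryIO,
                 chunk_size: int = CHUNK_SIZE) -> int:
    """Stream source into cmd's stdin while streaming cmd's stdout into sink."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    feeder = threading.Thread(target=_feed_stdin,
                              args=(source, proc.stdin, chunk_size),
                              daemon=True)
    feeder.start()
    # Drain stdout on this thread while the feeder writes, so neither side
    # can block forever on a full pipe buffer.
    shutil.copyfileobj(proc.stdout, sink, chunk_size)
    proc.stdout.close()
    feeder.join()
    return proc.wait()
```

With the question's objects this would be called as pipe_through(cmd, body, output_stream), where output_stream is a hypothetical writable binary stream. In this sketch, exceptions raised while reading source inside the feeder thread are not propagated to the caller; a production version would capture them and re-raise after join().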

  • See my response in this related thread.
    – smallo
    Commented Nov 17, 2016 at 17:37

1 Answer


To read binary data from a StreamingBody, use StreamingBody.read(). It returns the data as bytes (a binary string).

  • Calling read() loads the entire video (hundreds of MB) into RAM. I need to stream it by inhaling a chunk at a time.
    – Mike Slinn
    Commented Jan 12, 2016 at 20:58
  • @MikeSlinn StreamingBody.read(amt=chunk_size) returns at most chunk_size bytes per call, so you can process the body chunk by chunk.
    – Josh J
    Commented Jan 19, 2016 at 20:40
  • I put together a short program that invokes StreamingBody.read(amt=chunk_size) from another thread. It reads 1/3 of the input file (an mp4 video) and gets stuck, possibly because the consumer of the data (ffmpeg), which runs on the original thread, does not actually run. Maybe its STDIN buffer fills and the whole mess grinds to a halt?
    – Mike Slinn
    Commented Jan 20, 2016 at 5:10
  • When the StreamingBody just contains a JSON document, such as a device shadow in IoT, how can we tell which encoding it uses? I've seen a lot of assumptions that it'll be UTF-8, but I don't see this actually documented.
    Commented Jul 28, 2017 at 18:48
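On the encoding question in the last comment: json.loads() accepts bytes directly (Python 3.6+) and detects UTF-8, UTF-16 or UTF-32 from the data itself, and RFC 8259 requires JSON exchanged between systems that are not part of a closed ecosystem to be UTF-8. So for a small document such as a shadow, parsing the bytes from read() without picking a codec yourself is a reasonable approach. A sketch; load_json_body is a hypothetical helper, and io.BytesIO stands in for the StreamingBody in the usage below:

```python
import json
from typing import Any


def load_json_body(body) -> Any:
    """Parse a small JSON document from a StreamingBody-like object."""
    raw = body.read()  # the whole body as bytes; only sensible for small documents
    # json.loads on bytes detects UTF-8, UTF-16 or UTF-32 before decoding
    return json.loads(raw)
```

Usage: `shadow = load_json_body(response['Body'])`.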
