How to use botocore.response.StreamingBody as stdin PIPE

Question

I want to pipe large video files from AWS S3 into Popen's stdin, which is from Python's point of view a 'file-like object'. This code runs as an AWS Lambda function, so these files won't fit in memory or on the local file system. Also, I don't want to copy these huge files anywhere, I just want to stream the input, process on the fly, and stream the output. I've already got the processing and streaming output bits working. The problem is how to obtain an input stream as a Popen pipe.

Update: I put together a short program that invokes StreamingBody.read(amt=chunk_size) based on a comment. The program reads some of the input file (an mp4 video) and gets stuck, possibly because the consumer of the data (ffmpeg) does not actually run, or maybe its STDIN buffer fills and the whole mess grinds to a halt?

I can access a file in an S3 bucket:

import boto3
s3 = boto3.resource('s3')
response = s3.Object(bucket_name=bucket, key=key).get()
body = response['Body']

body is a botocore.response.StreamingBody. The full response dict that contains it looks like this:

{
  u'Body': <botocore.response.StreamingBody object at 0x00000000042EDAC8>,
  u'AcceptRanges': 'bytes', 
  u'ContentType': 'video/mp4', 
  'ResponseMetadata': {
    'HTTPStatusCode': 200, 
    'HostId': 'aAUs3IdkXP6vPGwauv6/USEBUWfxxVeueNnQVAm4odTkPABKUx1EbZO/iLcrBWb+ZiyqmQln4XU=', 
    'RequestId': '6B306488F6DFEEE9'
  }, 
  u'LastModified': datetime.datetime(2015, 3, 1, 1, 32, 58, tzinfo=tzutc()),
  u'ContentLength': 393476644, 
  u'ETag': '"71079d637e9f14a152170efdf73df679"', 
  u'Metadata': {'cb-modifiedtime': 'Sun, 01 Mar 2015 01:27:52 GMT'}}

I intend to use body something like this:

from subprocess import Popen, PIPE
Popen(cmd, stdin=PIPE, stdout=PIPE).communicate(input=body)[0]

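But body is a StreamingBody, while communicate(input=...) expects a byte string, so body needs to be converted first. The question is how to do that without reading the whole file into memory.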

See my response in this related thread.
– smallo
Commented Nov 17, 2016 at 17:37

Accepted Answer (14 votes)

To read binary data from a StreamingBody, use StreamingBody.read(). It returns the content as a binary string (str in Python 2, bytes in Python 3).
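Applied to the question's intended call, this means passing the result of read() rather than the StreamingBody itself. A minimal sketch, assuming Python 3: io.BytesIO stands in for response['Body'] (both have a read() method) and the child command is a hypothetical tiny Python "cat" in place of ffmpeg. Note that it holds the whole object in memory, which is exactly the objection raised in the first comment below.

```python
import io
import subprocess
import sys

# Stand-in for response['Body']; a real StreamingBody also provides .read().
body = io.BytesIO(b"example video bytes")

# Hypothetical child command: copies stdin to stdout (ffmpeg would go here).
cmd = [sys.executable, "-c",
       "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"]

data = body.read()  # reads the entire object into memory as bytes
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output = proc.communicate(input=data)[0]  # communicate() needs bytes, not a stream
```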

edited Aug 16, 2017 at 17:11

David

11.4k3 gold badges42 silver badges46 bronze badges

answered Jan 12, 2016 at 16:22

Michael

3111 silver badge9 bronze badges


Calling read() loads the entire video (hundreds of MB) into RAM. I need to stream it by inhaling a chunk at a time.
– Mike Slinn
Commented Jan 12, 2016 at 20:58

@MikeSlinn StreamingBody.read(amt=chunk_size) lets you process chunk_size bytes at a time.
– Josh J
Commented Jan 19, 2016 at 20:40
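A minimal sketch of the chunked loop suggested here. read_in_chunks is a hypothetical helper name, and io.BytesIO again stands in for response['Body'] so the example runs without S3; the loop only relies on read(n) returning an empty byte string at end of stream, which StreamingBody.read(amt) also does. Newer botocore releases may also offer a built-in StreamingBody.iter_chunks() method; check the version you have before relying on it.

```python
import hashlib
import io


def read_in_chunks(body, chunk_size=1024 * 1024):
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    while True:
        chunk = body.read(chunk_size)  # StreamingBody.read(amt) accepts this positionally
        if not chunk:  # empty bytes signals end of stream
            return
        yield chunk


# Stand-in for response['Body'] (hypothetical data, so this runs without S3).
body = io.BytesIO(b"a" * 2500)

digest = hashlib.sha256()
sizes = []
for chunk in read_in_chunks(body, chunk_size=1000):
    sizes.append(len(chunk))
    digest.update(chunk)  # process each chunk without keeping earlier ones around
```

Only one chunk is alive at a time, so memory use is bounded by chunk_size rather than by the object's ContentLength.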

I put together a short program that invokes StreamingBody.read(amt=chunk_size) from another thread. It reads 1/3 of the input file (an mp4 video) and gets stuck, possibly because the consumer of the data (ffmpeg), which runs on the original thread, does not actually run. Maybe its STDIN buffer fills and the whole mess grinds to a halt?
– Mike Slinn
Commented Jan 20, 2016 at 5:10
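One plausible cause of the hang described above is a pipe deadlock: if nothing reads the child's stdout while data is still being written to its stdin, the child blocks once its output pipe fills, stops reading its input, and the writer then blocks as well. The sketch below avoids that by writing stdin from a feeder thread while the calling thread drains stdout. It assumes Python 3 (for BrokenPipeError), uses io.BytesIO as a stand-in for the StreamingBody, and CAT_CMD is a hypothetical placeholder for the real ffmpeg command line, which would typically read from pipe:0 and write to pipe:1.

```python
import io
import subprocess
import sys
import threading

CHUNK_SIZE = 64 * 1024  # bytes per read/write; an arbitrary choice, tune as needed

# Hypothetical stand-in for ffmpeg: a child process that copies stdin to stdout.
CAT_CMD = [sys.executable, "-c",
           "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"]


def pump(body, dest, chunk_size=CHUNK_SIZE):
    """Copy body into dest chunk by chunk, then close dest so the child sees EOF."""
    try:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
    except BrokenPipeError:
        pass  # the child exited early; stop feeding it
    finally:
        try:
            dest.close()
        except BrokenPipeError:
            pass


def run_streaming(cmd, body, out, chunk_size=CHUNK_SIZE):
    """Stream body through cmd, writing the child's stdout to out; return its exit code."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    feeder = threading.Thread(target=pump, args=(body, proc.stdin, chunk_size))
    feeder.start()
    # Drain stdout on this thread while the feeder fills stdin on the other one,
    # so a full pipe buffer on either side cannot block both processes forever.
    for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
        out.write(chunk)
    proc.stdout.close()
    feeder.join()
    return proc.wait()


# Usage with an in-memory stand-in for response['Body'] (hypothetical data).
payload = bytes(range(256)) * 4096  # 1 MiB, far larger than a typical pipe buffer
out = io.BytesIO()
rc = run_streaming(CAT_CMD, io.BytesIO(payload), out)
```

The out argument can be any object with a write() method; in the Lambda setting that would be whatever already handles the streaming output.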

When the StreamingBody just contains a JSON document, such as a device shadow in IoT, how can we tell what it has been encoded with? I've seen a lot of assumptions that it'll be UTF-8, but I don't see this actually documented.
– Michael Scheper
Commented Jul 28, 2017 at 18:48
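The bytes returned by read() carry no encoding label of their own; at most, the ContentType in the response metadata may include a charset parameter. For JSON specifically, RFC 8259 requires UTF-8 for JSON exchanged between systems outside a closed ecosystem, and since Python 3.6 json.loads accepts bytes directly and detects UTF-8, UTF-16 or UTF-32. A minimal sketch, assuming Python 3.6+ and using io.BytesIO with a hypothetical payload in place of the real response['Body']; whether a given AWS service documents its encoding is not settled here.

```python
import io
import json

# Stand-in for response['Body'] holding a small JSON document (hypothetical payload).
body = io.BytesIO('{"state": {"reported": {"color": "gr\u00fcn"}}}'.encode("utf-8"))

raw = body.read()           # bytes, exactly as stored
document = json.loads(raw)  # Python 3.6+: accepts bytes and detects UTF-8/16/32
```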

Tagged: python, python-2.7, stdin, boto3