
I would like to grab a file straight off the Internet and stick it into an S3 bucket, then copy it over to a Pig cluster. Due to the size of the file and my not-so-good internet connection, downloading the file to my PC first and then uploading it to Amazon is probably not an option.

Is there any way to grab a file off the internet and put it straight into S3?


4 Answers


Download the data via curl and pipe the contents straight to S3. The data is streamed directly to S3 and not stored locally, avoiding any memory issues.

curl "https://download-link-address/" | aws s3 cp - s3://aws-bucket/data-file

As suggested above, if the download speed is too slow on your local computer, launch an EC2 instance, SSH in, and execute the above command there.

  • If the file is textual, use: curl -s "url" |cat| aws s3 cp - "s3://..."
    – Uri Goren
    Commented Dec 6, 2018 at 10:21
  • Add --expected-size <size_in_bytes> to the end if your file is bigger than 50GB. From the docs: "Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload."
    – Chrisjan
    Commented Feb 21, 2022 at 6:59
  • How can we calculate the estimated cost in $ of launching the EC2 instance and sending + keeping the file on S3?
    – The Dan
    Commented Feb 1, 2023 at 12:53
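On the --expected-size point: for a streamed upload the CLI cannot stat its input, so it has to pick a part size up front, and S3 caps a multipart upload at 10,000 parts. A rough sketch of the arithmetic, assuming the CLI's default 8 MiB part size (the helper name is mine, not part of the AWS CLI):

```python
import math

MiB = 1024 * 1024
DEFAULT_PART = 8 * MiB   # aws cli default multipart_chunksize
MAX_PARTS = 10_000       # S3 hard limit on parts per multipart upload

def min_part_size(total_bytes):
    """Smallest part size (in bytes) that keeps total_bytes under 10,000 parts."""
    return max(DEFAULT_PART, math.ceil(total_bytes / MAX_PARTS))
```

With the default 8 MiB parts, a streamed upload tops out around 8 MiB × 10,000 ≈ 78 GiB, which is why the CLI needs to know the size in advance (via --expected-size) to pick a larger part size for very large files.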

For anyone (like me) less experienced, here is a more detailed description of the process via EC2:

  1. Launch an Amazon EC2 instance in the same region as the target S3 bucket. The smallest available (default Amazon Linux) instance should be fine, but be sure to give it enough storage space to hold your file(s). If you need transfer speeds above ~20 MB/s, consider selecting an instance with larger network pipes.

  2. Open an SSH connection to the new EC2 instance, then download the file(s), for instance using wget. (For example, to download an entire directory via FTP, you might use wget -r ftp://name:password@host/somedir/.)

  3. Using AWS CLI (see Amazon's documentation), upload the file(s) to your S3 bucket. For example, aws s3 cp myfolder s3://mybucket/myfolder --recursive (for an entire directory). (Before this command will work you need to add your S3 security credentials to a config file, as described in the Amazon documentation.)

  4. Terminate/destroy your EC2 instance.
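If you would rather script step 3 than use the CLI, the recursive copy can be sketched in Python with boto3 (the helper names here are mine, not AWS's):

```python
import os

def s3_keys(local_root, prefix):
    """Yield (local_path, s3_key) pairs for every file under local_root,
    mirroring the directory layout under the given key prefix -- the
    mapping that `aws s3 cp --recursive` applies implicitly."""
    for dirpath, _, names in os.walk(local_root):
        for name in sorted(names):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, local_root)
            yield path, prefix.rstrip('/') + '/' + rel.replace(os.sep, '/')

def upload_dir(local_root, bucket, prefix):
    """Roughly what `aws s3 cp local_root s3://bucket/prefix --recursive` does."""
    import boto3  # imported here so the key-mapping helper stays dependency-free
    s3 = boto3.client('s3')
    for path, key in s3_keys(local_root, prefix):
        s3.upload_file(path, bucket, key)
```

As with the CLI, this needs your credentials configured on the instance before it will work.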

  • Do you know how we can calculate the estimated cost in $ of launching the EC2 instance and sending + keeping the file on S3?
    – The Dan
    Commented Feb 1, 2023 at 12:54
  • @TheDan The AWS calculator is one good starting point. I think the various components to consider are: EC2 (hourly cost); EBS if needed (hourly); data transfer charges (download/upload, charged by GB); S3 upload/retrieval (by GB); S3 storage (per GB per month). Some of the other solutions (streaming; Lambda) may lower your costs.
    – mpavey
    Commented Feb 7, 2023 at 16:30

[2017 edit] I gave the original answer back in 2013. Today I'd recommend using AWS Lambda to download the file and put it on S3. That's the desired effect: place an object on S3 with no server involved.

[Original answer] It is not possible to do it directly.

Why not do this with an EC2 instance instead of your local PC? Upload speed from EC2 to S3 in the same region is very good.

Regarding stream reading/writing from/to S3, I use Python's smart_open.
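To make the streaming idea concrete: the trick is a fixed-size chunk loop between two file-like objects, so the full file is never held locally. A minimal sketch (the `stream_copy` helper is mine; the bucket/key names are placeholders):

```python
import io
import urllib.request

CHUNK = 16 * 1024 * 1024  # 16 MiB per read; memory use stays bounded

def stream_copy(src, dst, chunk_size=CHUNK):
    """Copy one file-like object to another in chunks: no local file,
    and never more than one chunk in memory at a time."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

# With smart_open the destination can be an S3 object opened like a file:
#   from smart_open import open as sopen
#   with urllib.request.urlopen(url) as src, sopen('s3://bucket/key', 'wb') as dst:
#       stream_copy(src, dst)
```

Note the bytes still pass through the machine running the code (an EC2 instance or Lambda), but they are never written to disk or accumulated in memory.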

  • I think this is what I will have to do. I looked into the documentation and will probably go with python and boto. Just need to figure out the whole s3 key idea and how files are referenced... Commented Oct 8, 2013 at 15:31
  • This is exactly what I did. Turned out uploading the file with boto and python was extremely easy. Thanks! Commented Oct 10, 2013 at 7:02
  • Can you explain a little or give a short code example of how to "stream" without really "downloading" it? Is it something like writeFileOutputBufferToS3()?
    – endertunc
    Commented Dec 22, 2015 at 11:56
  • No, I think the last sentence is wrong. The answer is that it (downloading direct to S3) is not supported. The EC2 suggestion is good in this case, but you must download and then upload the file (though you don't necessarily have to create a local file).
    – Tom
    Commented Jun 13, 2016 at 16:27
  • I want to do this, but I need to download a pip package to get the files I need, how can I do that using AWS lambda?
    – Acuervov
    Commented Apr 25, 2023 at 21:21

You can stream the file from the internet to AWS S3 using Python.

import boto3
import urllib3

s3 = boto3.resource('s3')
http = urllib3.PoolManager()

# preload_content=False streams the HTTP response instead of reading
# the whole body into memory.
s3.meta.client.upload_fileobj(
    http.request('GET', '<Internet_URL>', preload_content=False),
    s3Bucket, key,   # your target bucket name and object key
    ExtraArgs={'ServerSideEncryption': 'aws:kms', 'SSEKMSKeyId': '<alias_name>'})
  • Won't this still download the packets to the local machine and then upload them? OP mentioned his internet connection is not good/fast. Commented Mar 18, 2022 at 6:23
  • Downloading the packets to the local machine and then uploading to the S3 bucket is not a good option. With the above code, the data is streamed to the S3 bucket directly from the internet
    – vinod_vh
    Commented Mar 24, 2022 at 9:45
  • @vinod_vh, just want to make sure about the point "Using above code the data will be stream to S3 bucket directly from internet" — are you sure that this code will not download packets? How will it stream directly to the S3 bucket ... I couldn't find anything similar mentioned here - boto3.amazonaws.com/v1/documentation/api/latest/guide/… Commented Jun 28 at 12:04
