
I have a web application that allows users to upload files. My application processes these files (parsing, modifying metadata, etc.). Which approach should I follow?

  • Should I save the files locally on the web server until they are processed, then upload them to S3 and delete them from the web server?
  • Or should I upload the files to S3 from the start, then read each file from S3 for processing, copy it locally to modify its metadata, overwrite it on S3, and delete the local copy?
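For comparison, the second option (file already in S3, pulled down, edited, and written back) might look like the sketch below. It assumes `s3` is a boto3 S3 client, whose `download_file(Bucket, Key, Filename)` and `upload_file(Filename, Bucket, Key)` methods it uses; `reprocess_in_place` and the `modify` callback are hypothetical names standing in for the app's own metadata step.

```python
import os
import tempfile
from typing import Callable


def reprocess_in_place(s3, bucket: str, key: str,
                       modify: Callable[[str], None]) -> None:
    """Option 2: the upload already lives in S3; edit it and overwrite it.

    `s3` is assumed to be a boto3 S3 client (or anything with the same
    download_file/upload_file methods). `modify` is a hypothetical callback
    that edits the file at the given local path (e.g. rewrites metadata).
    """
    # Reserve a local temp path and close the handle so the download can write to it
    fd, local_path = tempfile.mkstemp()
    os.close(fd)
    try:
        s3.download_file(bucket, key, local_path)   # S3 -> local copy (one transfer)
        modify(local_path)                          # edit the local copy
        s3.upload_file(local_path, bucket, key)     # overwrite the S3 object (another transfer)
    finally:
        os.remove(local_path)                       # always delete the local copy
```

Note that each processed file crosses the network twice more than in the first option, which is the trade-off the comments and answers weigh.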
  • I don't think there's a 'should' here. It depends entirely on your project's requirements, none of which you've provided. Commented Jul 26, 2016 at 17:47
  • S3 uploads and downloads cost money. Sure, not much per byte, but it adds up if you do this a lot. There's also the issue of bandwidth: local storage is virtually always faster than remote storage, at least in terms of throughput.
    – user
    Commented Jul 26, 2016 at 19:22
  • In larger deployments you want the first, because you want the workers doing the processing to be separate from your web servers. If you use S3, also look into the Lambda and queue services AWS provides; they may make it easier to set up workers. On costs: AWS may treat data transferred within its own network differently from traffic going in and out of it. Commented Jul 27, 2016 at 8:30
  • It's hard to say without knowing what your exact use case is.
    – Andres F.
    Commented Oct 25, 2016 at 1:14
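The web-tier/worker split suggested in the comments can be sketched as below. This is a minimal, single-process sketch: a stdlib `queue.Queue` stands in for a managed queue such as Amazon SQS, threads stand in for separate worker machines, and `handle_upload`, `worker`, and `start_workers` are hypothetical names.

```python
import queue
import threading
from typing import Callable, List


def handle_upload(jobs: queue.Queue, bucket: str, key: str) -> str:
    """Web tier: record the job and return immediately, without processing."""
    jobs.put((bucket, key))
    return "accepted"


def worker(jobs: queue.Queue, process: Callable[[str, str], None]) -> None:
    """Worker tier: process (bucket, key) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            return
        try:
            process(*job)
        except Exception as exc:
            # Keep the worker alive; a real system would retry or dead-letter the job
            print(f"job {job} failed: {exc}")
        finally:
            jobs.task_done()


def start_workers(jobs: queue.Queue, process: Callable[[str, str], None],
                  count: int = 2) -> List[threading.Thread]:
    """Start `count` worker threads (stand-ins for separate worker machines)."""
    threads = [threading.Thread(target=worker, args=(jobs, process))
               for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

The point of the split is that a slow or failing processing step never holds up the request that accepted the upload; the web handler only enqueues.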

3 Answers


The first approach makes more sense, because it involves fewer network transfers. You might even be able to process the files in memory and upload the result to S3, which saves the step of deleting temp files.
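The in-memory variant could look roughly like this. It is a sketch assuming `s3` is a boto3 S3 client (it uses `upload_fileobj(Fileobj, Bucket, Key)`) and that each file fits comfortably in RAM; `process_and_store` and `transform` are hypothetical names for the app's own parsing/metadata step.

```python
import io
from typing import Callable


def process_and_store(s3, bucket: str, key: str, data: bytes,
                      transform: Callable[[bytes], bytes]) -> None:
    """Option 1, in memory: transform the uploaded bytes and push the result to S3.

    Assumes the whole file fits in memory; `transform` is a hypothetical
    stand-in for parsing / rewriting metadata.
    """
    result = transform(data)
    # upload_fileobj reads from any binary file-like object, so no temp file is written
    s3.upload_fileobj(io.BytesIO(result), bucket, key)
```

For large uploads (video, for example) this stops being practical, and spooling to disk first is the safer choice.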


If the system also needs to be fault tolerant, I would store the uploaded content in S3 before confirming the upload to the client.
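A minimal sketch of "persist first, acknowledge second", assuming `s3` is a boto3 S3 client (it calls `put_object(Bucket=..., Key=..., Body=...)`). `receive_upload` and the dict it returns are hypothetical stand-ins for whatever the web framework expects.

```python
def receive_upload(s3, bucket: str, key: str, data: bytes) -> dict:
    """Confirm an upload only after a durable copy exists in S3.

    `s3` is assumed to be a boto3 S3 client; the returned dict is a
    hypothetical response shape, not a specific framework's API.
    """
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=data)
    except Exception as exc:  # e.g. a botocore ClientError or a network failure
        # Nothing durable was stored, so tell the client to retry
        return {"status": 503, "error": str(exc)}
    # Only now is it safe to confirm; processing can restart from S3 after a crash
    return {"status": 201, "key": key}
```

If the web server dies after this point, the file is still in S3 and processing can be retried from there, which a local-only temp file cannot offer.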


It largely depends on your application logic. If you are not limited to S3 and are open to other AWS services, consider Lambda.

My project needs to upload video to S3, process it with Elastic Transcoder, save the result back to S3, and then distribute it via CloudFront. I don't have to use my own server at all.

So it is possible to remove your server from this particular functionality entirely.
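With Lambda, a function subscribed to the bucket's "object created" notifications receives an event listing the new objects, so no web server is involved in processing. The sketch below assumes the standard S3 event notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`); `objects_from_event` and `process_object` are hypothetical names, and the actual transcoding or metadata work is left as a placeholder.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded (a space becomes '+'), so decode them
    before using them in further S3 calls.
    """
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def process_object(bucket: str, key: str) -> None:
    """Hypothetical placeholder for the app's parsing/metadata step."""
    print(f"processing s3://{bucket}/{key}")


def lambda_handler(event: dict, context) -> dict:
    # Invoked by Lambda when S3 reports new objects
    pairs = objects_from_event(event)
    for bucket, key in pairs:
        process_object(bucket, key)
    return {"processed": len(pairs)}
```

If the function writes its output back to S3, use a different bucket or prefix from the one that triggers it; otherwise each result can re-trigger the function.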

That said, uploading to S3 first only makes sense if you ever need the raw file. If you have no use for the file after processing, hold it on your server* until you process it, then save the result to S3 and delete the raw file. This way you save on S3 costs.

*I am assuming your server has enough storage to hold multiple users' files until each one finishes processing. If not, it is a totally different scenario. At least, that was my situation.
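The "hold it on the server, then upload only the result" flow might look like the following sketch. It assumes `s3` is a boto3 S3 client (using `upload_file(Filename, Bucket, Key)`) and that `upload` is the incoming file stream from the web framework; `process_locally_then_store` and `process` are hypothetical names.

```python
import os
import shutil
import tempfile
from typing import BinaryIO, Callable


def process_locally_then_store(s3, bucket: str, key: str, upload: BinaryIO,
                               process: Callable[[str], None]) -> None:
    """Keep the raw upload on the web server only until it is processed.

    `s3` is assumed to be a boto3 S3 client; `process` is a hypothetical
    step that edits the file at the given path in place.
    """
    fd, raw_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as raw:
            shutil.copyfileobj(upload, raw)      # spool the upload to local disk in chunks
        process(raw_path)                        # parse / rewrite metadata locally
        s3.upload_file(raw_path, bucket, key)    # only the processed result reaches S3
    finally:
        os.remove(raw_path)                      # the raw file is never kept
```

Because every in-flight upload occupies local disk until `process` finishes, this relies on the storage assumption in the footnote.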
