
I have a Django application running on a server with 2 GB of memory, and I need to receive a CSV file larger than 1 GB, read it, and load the data into a PostgreSQL database on IBM Cloud. The problem is that if I receive the file, it has to be stored locally, and I would definitely have to increase the server's memory or handle it in a different way.

One idea would be to store it in an S3 bucket and then read it in pieces, but I don't know how to achieve that in Python because the record size is not fixed. I can't load the data with the aws_s3 PostgreSQL extension (or anything similar) because it does not exist in the IBM Cloud PostgreSQL service, and, if I am right, I can't install an extension either.
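To make the idea concrete, the kind of streaming read I have in mind looks roughly like the sketch below (untested; the bucket, key and client configuration are placeholders, and for IBM Cloud Object Storage the endpoint would differ), but I'm not sure this is the right way to deal with records of varying size:

```python
# Rough sketch (untested): stream the object instead of downloading it, and let
# csv.reader deal with variable-length records. Bucket, key and credentials are
# placeholders.
import codecs
import csv

import boto3

s3 = boto3.client("s3")  # or a client configured with endpoint_url / credentials
body = s3.get_object(Bucket="my-bucket", Key="big-file.csv")["Body"]

# codecs wraps the streaming body so csv gets decoded lines without reading it all
for row in csv.reader(codecs.getreader("utf-8")(body)):
    pass  # insert the row (ideally batched) into PostgreSQL
```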

Another way would be to use an ETL solution for this kind of job, but I don't know of any in particular that fits my requirements.

Right now I have just created a separate instance with more memory; I turn it on when I need to load the data and shut it down when it is finished.

  • This all depends on whether you are uploading/sending from a JavaScript client that you control, but you can read the CSV from JavaScript and send the rows over in pieces, by reading the file on the client and sending HTTP requests - in that case you aren't necessarily bound to send the whole file.
    – alilland
    Commented Jul 31, 2021 at 4:13

2 Answers


Another option: read the data as you are receiving it. It may be less straightforward than simply reading the submitted file at once, but it is a usual approach when dealing with large files.

Note that:

  • It may be faster to process the file in chunks of N lines rather than line by line. Test it to see if this is the case.
  • You might be unable to use Python's default CSV parser and may need to write your own.
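For example, a minimal sketch of a Django view that parses the CSV while the request body streams in. It assumes the client POSTs the raw CSV as the request body (not a multipart form), and `insert_rows` is a hypothetical helper that bulk-inserts a batch into PostgreSQL:

```python
# Minimal sketch: parse the CSV directly from the request stream instead of
# buffering the whole upload in memory. Assumes the raw CSV is the request body
# (e.g. Content-Type: text/csv); insert_rows() is a hypothetical helper.
import codecs
import csv

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def upload_csv(request):
    # HttpRequest is file-like, so we can decode and parse it as it arrives
    reader = csv.reader(codecs.getreader("utf-8")(request))
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= 5000:   # chunked inserts, per the note above
            insert_rows(batch)   # hypothetical: bulk-insert into PostgreSQL
            batch = []
    if batch:
        insert_rows(batch)
    return JsonResponse({"status": "ok"})
```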
  • Wow, the 'less straightforward' link is a real eye-opener. I can't imagine being unsure whether streaming is real, or having a server force a full read of a large incoming entity before my code sees the first byte.
    – joshp
    Commented Jul 31, 2021 at 5:24

Maybe you can use one of the options below.

  1. Write a stored procedure that does the import from a URL, if the file is available at a separate URL. Example: https://stackoverflow.com/questions/41696675/how-to-copy-a-csv-file-from-a-url-to-postgresql

  2. If the data is uploaded as a file, read it in chunks and stream it into the database; see the sketch below.
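A minimal sketch of option 2 with psycopg2's `copy_expert` (the table, columns and connection string are placeholders; `file_obj` can be Django's uploaded file object, which is spooled to disk for large uploads):

```python
# Sketch of option 2: stream the file into PostgreSQL with COPY so only a small
# buffer is held in memory at a time. Table, columns and DSN are placeholders.
import psycopg2


def load_csv(file_obj, dsn):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.copy_expert(
                "COPY my_table (col_a, col_b) FROM STDIN WITH (FORMAT csv, HEADER)",
                file_obj,  # psycopg2 reads it in ~8 KB chunks, never all at once
            )
```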
