
I have a Django application running on a server with 2 GB of memory, and I need to receive a CSV file larger than 1 GB, read it, and load the data into a PostgreSQL database on IBM Cloud. The problem is that if I receive the file, it has to be stored locally, and I would definitely have to increase the server's memory or handle it in a different way.

One idea would be to store it in an S3 bucket and then read it in pieces, but I don't know how to achieve that in Python because the record size is not fixed. I can't load the data with the aws_s3 PostgreSQL extension (or anything similar) because it does not exist in the IBM Cloud PostgreSQL service, and, if I am right, I can't install an extension either.
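To make the idea concrete, the kind of streaming read I have in mind looks roughly like the sketch below (untested; the bucket, key and client configuration are placeholders, and for IBM Cloud Object Storage the endpoint would differ), but I'm not sure this is the right way to deal with records of varying size:

```python
# Rough sketch (untested): stream the object instead of downloading it, and let
# csv.reader deal with variable-length records. Bucket, key and credentials are
# placeholders.
import codecs
import csv

import boto3

s3 = boto3.client("s3")  # or a client configured with endpoint_url / credentials
body = s3.get_object(Bucket="my-bucket", Key="big-file.csv")["Body"]

# codecs wraps the streaming body so csv gets decoded lines without reading it all
for row in csv.reader(codecs.getreader("utf-8")(body)):
    pass  # insert the row (ideally batched) into PostgreSQL
```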

Another way would be to use an ETL solution for this kind of job, but I don't know of any in particular that fits my requirements.

Right now I have just created a separate instance with more memory; I turn it on when I need to load the data and shut it down when it is finished.

  • This all depends on whether you are uploading/sending from a JavaScript client that you control, but you can read the CSV from JavaScript and send the rows over in pieces, by reading the file on the client and sending HTTP requests - in that case you aren't necessarily bound to send the whole file.
    – alilland
    Commented Jul 31, 2021 at 4:13

2 Answers


Another option: read the data as you are receiving it. It may be less straightforward than simply reading the submitted file at once, but it is a usual approach when dealing with large files.

Note that:

  • It may be faster to process the file in chunks of N lines rather than line by line. Test it to see if this is the case.
  • You might be unable to use Python's default CSV parser and may need to write your own.
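For example, a minimal sketch of a Django view that parses the CSV while the request body streams in. It assumes the client POSTs the raw CSV as the request body (not a multipart form), and `insert_rows` is a hypothetical helper that bulk-inserts a batch into PostgreSQL:

```python
# Minimal sketch: parse the CSV directly from the request stream instead of
# buffering the whole upload in memory. Assumes the raw CSV is the request body
# (e.g. Content-Type: text/csv); insert_rows() is a hypothetical helper.
import codecs
import csv

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def upload_csv(request):
    # HttpRequest is file-like, so we can decode and parse it as it arrives
    reader = csv.reader(codecs.getreader("utf-8")(request))
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= 5000:   # chunked inserts, per the note above
            insert_rows(batch)   # hypothetical: bulk-insert into PostgreSQL
            batch = []
    if batch:
        insert_rows(batch)
    return JsonResponse({"status": "ok"})
```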
  • Wow, the 'less straightforward' link is a real eye-opener. I can't imagine being unsure whether streaming is real, or having a server force a full read of a large incoming entity before my code sees the first byte.
    – joshp
    Commented Jul 31, 2021 at 5:24

Maybe you can use one of the options below.

  1. Write a stored procedure that does the import from a URL, if the file is available at a separate URL. Example: https://stackoverflow.com/questions/41696675/how-to-copy-a-csv-file-from-a-url-to-postgresql

  2. If the data is uploaded as a file, read it in chunks and stream it into the database; see the sketch below.
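A minimal sketch of option 2 with psycopg2's `copy_expert` (the table, columns and connection string are placeholders; `file_obj` can be Django's uploaded file object, which is spooled to disk for large uploads):

```python
# Sketch of option 2: stream the file into PostgreSQL with COPY so only a small
# buffer is held in memory at a time. Table, columns and DSN are placeholders.
import psycopg2


def load_csv(file_obj, dsn):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.copy_expert(
                "COPY my_table (col_a, col_b) FROM STDIN WITH (FORMAT csv, HEADER)",
                file_obj,  # psycopg2 reads it in ~8 KB chunks, never all at once
            )
```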
