I'm trying to download a very large gzipped csv file, hosted on a S3 server. I want to insert each csv row to my database. To achieve that I get the encoded stream, unzip it, parse it and save it to my mysql database.
The first +/- 2000 rows are inserted to database correctly, but after some time the process stops with a ECONNRESET onStreamRead
error. I spent the last 2 days figuring out why. I'm very sure my internet connection is stable and fast and I'm using the latest version of Node + external packages.
Goal
Process a remote gzipped csv on the fly (unzipped 1 GB big)
Don't download/store the whole file first before processing, but process the streams directly
Insert each row to database
When all rows are inserted to database, resolve the Promise
Error
Error: read ECONNRESET at TLSWrap.onStreamRead (node:internal/stream_base_commons:218:20) { errno: -4077, code: 'ECONNRESET', syscall: 'read' }
Code
import https from "node:https";
import zlib from "node:zlib";
import mysql from "mysql";
import csv from 'fast-csv';
async processData(url) {
return new Promise(async (resolve, reject) => {
let gunzip = zlib.createGunzip();
let csvStream = csv.parse({
delimiter: '\t',
headers: true,
discardUnmappedColumns: true,
ignoreEmpty: true,
trim: true
}).transform(async (row, next) => {
await processRow(row);
next();
}).on('error', (e) => {
reject(e);
});
https.get(url, async (encodedStream) => {
encodedStream.pipe(gunzip).pipe(csvStream);
}).on('error', (e) => {
reject(e)
}).on('finish', () => {
resolve('done');
});
});
}
processRow(row) {
let query = 'INSERT INTO `data` (external_id, external_value) VALUES(`${row.id}`, `${row.value}`)';
this.con.query(query, [true], async (error, result, fields) => {
if (error) {
console.log(error, result, fields);
}
});
}
processData("https://some-gzipped-csv.gz");