I want to split large, compressed CSV files into multiple smaller gzip files, splitting on line boundaries.
I'm piping gunzip output into a bash script that loops with while read LINE, roughly like the sketch below. The script writes each line to a named pipe, where a background gzip process recompresses it. Every X characters read, I close the file descriptor and start a new gzip process for the next split.
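For reference, the current (slow) version looks roughly like this; the input file name, chunk size, FIFO path, and output naming are placeholders I've made up for illustration:

```bash
#!/usr/bin/env bash
# Rough sketch of the setup described above. File names, the chunk size,
# and the FIFO path are placeholders.
set -euo pipefail

INPUT=big.csv.gz                      # assumed input file
CHUNK_SIZE=$((100 * 1024 * 1024))     # rotate after ~100 MB of raw text
FIFO=$(mktemp -u)                     # path for the named pipe
mkfifo "$FIFO"
trap 'rm -f "$FIFO"' EXIT

part=0
start_writer() {
    part=$((part + 1))
    # Background gzip reads the FIFO and writes one split file.
    gzip -c < "$FIFO" > "$(printf 'part_%04d.csv.gz' "$part")" &
    writer_pid=$!
    exec 3> "$FIFO"                   # FD 3 is the write end of the FIFO
    written=0
}

rotate_writer() {
    exec 3>&-                         # close the FIFO: gzip sees EOF and finishes its file
    wait "$writer_pid"
    start_writer
}

start_writer
while IFS= read -r LINE; do
    printf '%s\n' "$LINE" >&3
    written=$((written + ${#LINE} + 1))
    if (( written >= CHUNK_SIZE )); then
        rotate_writer
    fi
done < <(gunzip -c "$INPUT")          # decompress into the loop

exec 3>&-                             # flush and finish the last split
wait "$writer_pid"
```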
But in this setup the script's while read LINE loop consumes about 90% of the CPU, because read is very inefficient here (as I understand it, it makes a system call to read one character at a time).
Any thoughts on how to do this efficiently? I would expect gzip to consume the majority of the CPU.