
I've come to learn that looping through lines in bash by

while read line; do stuff; done <file

is not the most efficient way to do it. https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice

What is a more time/resource efficient method?

  • awk (this is a filler) Commented May 4, 2017 at 17:10
  • Depends on whether stuff is calling other shell tools, or just processing the text.
    – Tom Fenech
    Commented May 4, 2017 at 17:16
  • Shell is for launching your OS-provided commands and piping their inputs and outputs wherever they are needed. For reading data from files and processing it, use tools that are made for that particular job. Don't treat the shell as a text processor or a programming language. Commented May 4, 2017 at 17:21
  • Instead of linking to one question (where the answers could be disputable and dependent on the usage scenario) you should post your actual problem, where you need to speed up the loop. Premature optimization is the root of all evil -- Donald Knuth
    – clt60
    Commented May 4, 2017 at 20:07
  • This is the most efficient way to read a file in shell. The question is really, should you be iterating over the file in shell at all? General rule of thumb: if you are doing anything more complex with the data you are reading than passing it as arguments to another program, you're probably using the shell when you shouldn't be.
    – chepner
    Commented May 4, 2017 at 20:29

4 Answers


Here's an example timed with time, comparing Bash and awk. I have 1 million records in a file:

$ wc -l 1M
1000000 1M

Counting its records with bash, using while read:

$ time while read -r line ; do ((i++)) ; done < 1M ; echo $i

real    0m12.440s
user    0m11.548s
sys     0m0.884s
1000000
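
Two more loop-body variants were also timed; a sketch of what those commands would look like (the answer only quotes the timings, so the exact invocations are assumed):

$ time while read -r line ; do let "i++" ; done < 1M
$ time while read -r line ; do : ; done < 1M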

The let "i++" variant took 15.627 secs (real) and the no-op body (do : ;) took 10.466. Using awk:

$ time awk '{i++}END{print i}' 1M
1000000

real    0m0.128s
user    0m0.128s
sys     0m0.000s

As others have said, it depends on what you're doing.

The reason it's inefficient is that everything runs in its own process. Depending on what you are doing, that may or may not be a big deal.

If what you want to do in the loop is run another shell process, you won't get any gain from eliminating the loop. If you can do what you need without a loop at all, you could get a gain.
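
For example, a rough sketch of the difference (tr here is just a stand-in for any external command):

# one tr process (plus a subshell) per line -- a new process for every iteration
while read -r line ; do echo "$line" | tr a-z A-Z ; done < file

# one tr process for the whole file
tr a-z A-Z < file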


awk? Perl? C(++)? Of course it depends on whether you're interested in CPU time or programmer time, and the latter depends on what the programmer is used to using.

The top answer to the question you linked to pretty much explains that the biggest problem is spawning external processes for simple text processing tasks. E.g. running an instance of awk or a pipeline of sed and cut for each single line just to get a part of the string is silly.

If you want to stay in the shell, use the string-processing parameter expansions (${var#word}, ${var:n:m}, ${var/search/replace} etc.) and other shell features as much as you can. If you see yourself running a set of commands for each input line, it's time to rethink the structure of the script. Most of the text processing commands can process a whole file with one execution, so use that.
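
As a quick sketch of what those expansions do (the variable and its value here are made up for illustration):

$ var=path/to/file.txt
$ echo "${var#path/}"      # strip a prefix             -> to/file.txt
$ echo "${var:0:4}"        # substring, offset 0 len 4  -> path
$ echo "${var/file/log}"   # replace first match        -> path/to/log.txt

All of that happens inside the shell process, with no external commands spawned.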

A trivial/silly example:

while read -r line; do
    x=$(echo "$line" | awk '{print $2}')
    somecmd "$x"
done < file

would be better as

awk < file '{print $2}' | while read -r x ; do somecmd "$x" ; done
  • Using awk at all here is wrong; read -r _ x _ will split the line into the necessary fields (see the sketch after these comments).
    – chepner
    Commented May 4, 2017 at 20:28
  • @chepner, yes, I did say something about using the shell's functionality, and I did say the example was trivial. (added "silly" now, that was the other word I was thinking of.)
    – ilkkachu
    Commented May 4, 2017 at 20:35
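
A sketch of that pure-shell version (somecmd is the same placeholder as in the answer's example):

while read -r _ x _ ; do somecmd "$x" ; done < file

The first _ absorbs field 1, x gets field 2, and the trailing _ soaks up the rest of the line, so no awk process is needed at all.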

Choose between awk and Perl; both are efficient.
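
For instance, pulling the second field out of every line could look like this in either one (file is just a placeholder name):

$ awk '{print $2}' file
$ perl -lane 'print $F[1]' file

Either way, a single process handles the whole file instead of one process per line.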
