
I've come to learn that looping through lines in bash by

while read line; do stuff; done <file

is not the most efficient way to do it. https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice

What is a more time/resource efficient method?

  • awk (this is a filler) Commented May 4, 2017 at 17:10
  • Depends on whether stuff is calling other shell tools, or just processing the text.
    – Tom Fenech
    Commented May 4, 2017 at 17:16
  • Shell is for launching your OS-provided commands and piping their inputs and outputs wherever they are needed. For reading data from files and processing it, use tools that are made for that particular job. Don't treat the shell as a text processor or a programming language. Commented May 4, 2017 at 17:21
  • Instead of linking to one question (where the answers could be disputable and dependent on the usage scenario) you should post your actual problem, where you need to speed up the loop. Premature optimization is the root of all evil -- Donald Knuth
    – clt60
    Commented May 4, 2017 at 20:07
  • This is the most efficient way to read a file in shell. The question is really, should you be iterating over the file in shell at all? General rule of thumb: if you are doing anything more complex with the data you are reading than passing it as arguments to another program, you're probably using the shell when you shouldn't be.
    – chepner
    Commented May 4, 2017 at 20:29

4 Answers


Here's an example timed with time, comparing Bash and awk. I have 1 million records in a file:

$ wc -l 1M
1000000 1M

Counting its records with bash, using while read:

$ time while read -r line ; do ((i++)) ; done < 1M ; echo $i

real    0m12.440s
user    0m11.548s
sys     0m0.884s
1000000
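
Two more loop-body variants were also timed; a sketch of what those commands would look like (the answer only quotes the timings, so the exact invocations are assumed):

$ time while read -r line ; do let "i++" ; done < 1M
$ time while read -r line ; do : ; done < 1M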

The let "i++" variant took 15.627 secs (real) and the no-op body (do : ;) took 10.466. Using awk:

$ time awk '{i++}END{print i}' 1M
1000000

real    0m0.128s
user    0m0.128s
sys     0m0.000s

As others have said, it depends on what you're doing.

The reason it's inefficient is that everything runs in its own process. Depending on what you are doing, that may or may not be a big deal.

If what you want to do in the loop is run another shell process, you won't get any gain from eliminating the loop. If you can do what you need without a loop at all, you could get a gain.
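
For example, a rough sketch of the difference (tr here is just a stand-in for any external command):

# one tr process (plus a subshell) per line -- a new process for every iteration
while read -r line ; do echo "$line" | tr a-z A-Z ; done < file

# one tr process for the whole file
tr a-z A-Z < file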


awk? Perl? C(++)? Of course it depends on whether you're interested in CPU time or programmer time, and the latter depends on what the programmer is used to using.

The top answer to the question you linked to pretty much explains that the biggest problem is spawning external processes for simple text processing tasks. E.g. running an instance of awk or a pipeline of sed and cut for each single line just to get a part of the string is silly.

If you want to stay in the shell, use the string-processing parameter expansions (${var#word}, ${var:n:m}, ${var/search/replace} etc.) and other shell features as much as you can. If you see yourself running a set of commands for each input line, it's time to rethink the structure of the script. Most of the text processing commands can process a whole file with one execution, so use that.
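
As a quick sketch of what those expansions do (the variable and its value here are made up for illustration):

$ var=path/to/file.txt
$ echo "${var#path/}"      # strip a prefix             -> to/file.txt
$ echo "${var:0:4}"        # substring, offset 0 len 4  -> path
$ echo "${var/file/log}"   # replace first match        -> path/to/log.txt

All of that happens inside the shell process, with no external commands spawned.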

A trivial/silly example:

while read -r line; do
    x=$(echo "$line" | awk '{print $2}')
    somecmd "$x"
done < file

would be better as

awk < file '{print $2}' | while read -r x ; do somecmd "$x" ; done
  • Using awk at all here is wrong; read -r _ x _ will split the line into the necessary fields (see the sketch after these comments).
    – chepner
    Commented May 4, 2017 at 20:28
  • @chepner, yes, I did say something about using the shell's functionality, and I did say the example was trivial. (added "silly" now, that was the other word I was thinking of.)
    – ilkkachu
    Commented May 4, 2017 at 20:35
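
A sketch of that pure-shell version (somecmd is the same placeholder as in the answer's example):

while read -r _ x _ ; do somecmd "$x" ; done < file

The first _ absorbs field 1, x gets field 2, and the trailing _ soaks up the rest of the line, so no awk process is needed at all.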

Choose between awk and Perl; both are efficient.
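
For instance, pulling the second field out of every line could look like this in either one (file is just a placeholder name):

$ awk '{print $2}' file
$ perl -lane 'print $F[1]' file

Either way, a single process handles the whole file instead of one process per line.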
