In general, using the shell for text-parsing is very slow and cumbersome. Here are some other options:
Perl in "paragraph mode"
perl -00pe 's/^/$./' file
Explanation
The -00 switch turns on paragraph mode, where each "line" (record) is a block of text delimited by one or more blank lines, in other words a paragraph. The s/^/$./ replaces the start of the "line" (^) with $., the current "line" (paragraph) number. The -p tells perl to print each "line" of the input file after running the script given by -e on it.
Awk
awk -vRS='\n\n' -vORS='\n\n' '{print NR$0}' file
Explanation
The -vRS='\n\n' sets awk's record separator to two consecutive newline characters. Like perl's paragraph mode, this makes awk treat paragraphs as "lines". (A multi-character RS is not specified by POSIX, but gawk and mawk treat it as a regular expression.) We then tell it to print the current "line" number (NR) followed by the current "line" ($0). The -vORS='\n\n' sets the output record separator to two consecutive newlines, so that paragraphs are separated by blank lines in the output as well. Note that this will add 2 empty lines at the end of the output. To avoid that, you can use head (the negative count requires GNU head):
awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2
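As a rough check of the trailing-blank-line behaviour, the sketch below counts output lines with and without the head step. It assumes an awk that treats a multi-character RS as a regular expression (gawk or mawk) and GNU head; the helper names sample and number_awk, and the input text, are invented for the demo.

```shell
# Made-up sample input: three paragraphs separated by single blank lines.
sample() { printf 'foo\nbar\n\nbaz\n\nqux\nquux\n'; }

# Hypothetical wrapper around the awk command from the answer.
# Assumes an awk that treats a multi-character RS as a regex (gawk, mawk).
number_awk() { awk -v RS='\n\n' -v ORS='\n\n' '{print NR $0}' "$@"; }

# Line count of the raw output, including the 2 trailing empty lines.
sample | number_awk | wc -l

# Line count after dropping the last 2 lines (needs GNU head).
sample | number_awk | head -n -2 | wc -l
```

The second count should be two lower than the first, which is exactly the pair of empty lines that head removes.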
By way of comparison, here are the times that the various solutions took on my system when run on a 10M test file:
$ time a.sh > /dev/null ## a.sh is Cyrus's solution
real 0m1.419s
user 0m1.308s
sys 0m0.104s
$ time perl -00pe 's/^/$./' file > /dev/null
real 0m0.087s
user 0m0.084s
sys 0m0.000s
$ time awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2 >/dev/null
real 0m0.074s
user 0m0.056s
sys 0m0.020s
As you can see above, both the perl and awk solutions are more than an order of magnitude faster than the shell approach (roughly 16 and 19 times faster, respectively, going by the real times).