I have a file with multiple paragraphs separated by blank lines. Technically they are not paragraphs, just sections of text separated by blank lines.

I want to number the paragraphs, so to speak, by inserting a number at the start of each line that follows a blank line (and at the start of the first line of the file). So if my file says:

This is text.
This is more text.
Even more text!

This is text in section two.
Some more text.
You get the point...

I want to make it say:

1This is text.
This is more text.
Even more text!

2This is text in section two.
Some more text.
You get the point...

2 Answers

Try this with bash builtin commands:

#!/bin/bash

l=1                            # paragraph counter
printf '%s' "$l"               # print paragraph counter without newline
while IFS= read -r x; do       # read current line verbatim from file, see last line
  if [[ -z $x ]]; then         # empty line?
    echo                       # print empty line
    IFS= read -r x             # read next line from file, see last line
    ((l++))                    # increment paragraph counter
    printf '%s' "$l"           # print paragraph counter without newline
  fi
  printf '%s\n' "$x"           # print current line
done < file

In general, using the shell for text-parsing is very slow and cumbersome. Here are some other options:

  1. Perl in "paragraph mode"

    perl -00pe 's/^/$./' file 
    

    Explanation

    The -00 turns on paragraph mode, where "lines" (records) are separated by one or more blank lines; paragraphs, in other words. The s/^/$./ replaces the start of each record (^) with the current record (paragraph) number, $.. The -p tells perl to print each record of the input file after running the script given by -e on it.

  2. Awk

    awk -vRS='\n\n' -vORS='\n\n' '{print NR$0}' file
    

    Explanation

    -vRS='\n\n' sets awk's record separator to two consecutive newline characters. Like perl's paragraph mode, this makes awk treat paragraphs as "lines". We then tell it to print the current record number (NR) followed by the current record ($0). -vORS='\n\n' sets the output record separator to two consecutive newlines so that paragraphs are separated by blank lines in the output as well. A multi-character RS works in gawk and mawk but is not guaranteed by POSIX. Note that this adds 2 empty lines at the end of the output. To avoid that, you can use head (the negative line count requires GNU head):

    awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2
    

By way of comparison, here are the times that the various solutions took on my system when run on a 10M test file:

$ time a.sh > /dev/null ## a.sh is Cyrus's solution

real    0m1.419s
user    0m1.308s
sys     0m0.104s

$ time perl -00pe 's/^/$./' file  > /dev/null 

real    0m0.087s
user    0m0.084s
sys     0m0.000s

$ time awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2 >/dev/null

real    0m0.074s
user    0m0.056s
sys     0m0.020s

As you can see above, both the perl and awk solutions are an order of magnitude faster than the shell approach.
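
The test file used for these timings is not shown, so the following is only a sketch of how a comparable file could be built: it writes a three-line paragraph over and over, separated by blank lines. The function name make_test_file, the file name testfile, and the paragraph count are all assumptions, and timings on another machine will differ from the ones above.

```shell
#!/bin/sh
# Write N three-line paragraphs, separated by blank lines, to a file.
# make_test_file and its arguments are hypothetical, for illustration only.
make_test_file() {
  awk -v n="$2" 'BEGIN {
    for (i = 1; i <= n; i++) {
      if (i > 1) print ""
      print "This is text."
      print "This is more text."
      print "Even more text!"
    }
  }' > "$1"
}

make_test_file testfile 1000
# Count paragraphs (POSIX paragraph mode) to confirm the file's shape.
awk -v RS= 'END { print NR }' testfile
```

With the file in place, each solution can be timed the same way as above, for example with time perl -00pe 's/^/$./' testfile > /dev/null in bash. Raise the paragraph count until the file is large enough for the differences to show.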
