Using GNU awk for multi-char RS, RT, and use of NUL (\0) to split the file into NUL-separated multi-line records:
while IFS= read -r -d '' rec; do
    printf '=====\n%s\n=====\n' "$rec"
done < <(
    awk -v rs='Total:' -v ORS='\0' '
        BEGIN { RS = "(^|\n)((" rs "\n)|$)" }
        NR>1 { print rs "\n" $0 }
    ' file
)
Using any awk and use of Form-Feed (\f) (or any other character you know can't be in the input) to split the file into FF-separated multi-line records:
sep=$'\f' # or whatever non-NUL character you prefer
while IFS= read -r -d "$sep" rec; do
    printf '=====\n%s\n=====\n' "$rec"
done < <(
    awk -v rs='Total:' -v ORS="$sep" '
        $0 == rs { if (NR>1) print rec; rec=$0; next }
        { rec = rec RS $0 }
        END { if (NR>1) print rec }
    ' file
)
Both will output:
=====
Total:
text1
text2
=====
=====
Total:
text3
=====
=====
Total:
Text1
Text4
Text5
=====
Replace the printf with whatever command you want to run on each multi-line record.
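As a rough illustration of swapping the printf for another command, the sketch below counts the lines in each record using the portable (any awk) loop. The sample input is an assumption reconstructed from the expected output shown above, and the temporary file and the counts array are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, reconstructed from the expected output above.
tmp=$(mktemp)
printf '%s\n' 'Total:' text1 text2 'Total:' text3 'Total:' Text1 Text4 Text5 > "$tmp"

sep=$'\f'
counts=()
while IFS= read -r -d "$sep" rec; do
    # Stand-in for "whatever command": count the lines in this record.
    n=$(printf '%s\n' "$rec" | wc -l)
    counts+=( "$((n))" )    # arithmetic expansion strips any padding from wc
done < <(
    awk -v rs='Total:' -v ORS="$sep" '
        $0 == rs { if (NR>1) print rec; rec=$0; next }
        { rec = rec RS $0 }
        END { if (NR>1) print rec }
    ' "$tmp"
)
rm -f "$tmp"

printf '%s\n' "${counts[@]}"    # prints 3, 2 and 4 on separate lines
```

Because the command runs in the loop body, it can read "$rec" as an argument or on stdin without affecting the loop's own input, which comes from the process substitution.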
Explanations:
You could do this using GNU awk for multi-char RS, RT, and use of NUL (\0) to split the file into NUL-separated records and then a bash read loop to process them one at a time however you like:
while IFS= read -r -d '' rec; do
    printf '=====\n%s\n=====\n' "$rec"
done < <(
    awk -v rs='Total:' -v ORS='\0' '
        BEGIN { RS = "(^|\n)((" rs "\n)|$)" }
        NR>1 { print rs "\n" $0 }
    ' file
)
The above uses awk to do what it is designed to do, i.e. manipulate text, and the shell to do one of the things it is designed to do, i.e. sequence calls to tools. You could do it all inside the call to awk using system() to call other tools on each block of text, but then you'd be using awk to do what a shell is designed to do, and the resulting code would be harder to write robustly and slower (due to spawning a subshell per block of input) than calling those tools directly from the shell as I'm doing above.
The awk script is looking for records separated by Total: on a line of its own, so we need to set RS to include the \n before and after Total:, otherwise it'd match anywhere on a line, and we need to include ^ as a possibility before Total: so it also matches at the start of the input. At the end of the file the last record ends with just a \n on its own, so we need to add that possibility (\n$) to the RS too. Remember: despite what is often said, $ does not mean end of line in a regexp, it means end of string/buffer, so in an RS the $ will only match at the end of the input file, just like ^ only matches at the start of the input file, not at the start of each line.
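The anchor behaviour described above can be checked without any input file. This is a minimal sketch that applies match() to a two-line string in a BEGIN block; the variable names s and anchors are just for illustration.

```shell
anchors=$(awk 'BEGIN {
    s = "a\nb"               # one string that contains two lines
    print match(s, /^b/)     # 0: ^ does not match after the newline
    print match(s, /a$/)     # 0: $ does not match before the newline
    print match(s, /^a/)     # 1: ^ matches at the start of the string
    print match(s, /b$/)     # 3: $ matches at the end of the string
}')
printf '%s\n' "$anchors"
```

The same rule is why the RS regexp has to spell out the \n around Total: explicitly rather than relying on ^ and $ to find line boundaries.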
If you're not sure what any of that means, just add some tracing print statements to dump the RT and $0 values for each record, e.g.:
$ awk -v rs='Total:' -v ORS='\0' '
    BEGIN { RS = "(^|\n)((" rs "\n)|$)" }
    NR>1 {
        printf "NR=<%d>, $0=<%s>, RT=<%s>\n-----\n", NR, $0, RT
        #print rs "\n" $0
    }
' file
NR=<2>, $0=<text1
text2>, RT=<
Total:
>
-----
NR=<3>, $0=<text3>, RT=<
Total:
>
-----
NR=<4>, $0=<Text1
Text4
Text5>, RT=<
>
-----
The record numbers start at 2 because the first record is the empty string before the first line of the file: that first line contains the record separator, Total:\n, so by definition there must be some record that ends with that string, even if it's empty.
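The empty leading record is not specific to a regexp RS. As a small sketch, the same thing happens in any awk with a single-character separator when the input starts with that separator (the ':' separator and the input string here are made up for the demonstration):

```shell
leading=$(printf ':a:b' | awk -v RS=':' '{ printf "NR=<%d>, $0=<%s>\n", NR, $0 }')
printf '%s\n' "$leading"
# NR=<1>, $0=<>
# NR=<2>, $0=<a>
# NR=<3>, $0=<b>
```

Record 1 is the empty string that precedes the first separator, which is why the main script only prints when NR>1.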
If your awk doesn't support multi-char RS and/or printing NUL chars then with any awk you could construct the record one line at a time and choose some other character that you know (hope!) can't appear in your input, e.g. some control character like \r for Carriage Return or \f for Form Feed, for the ORS, and then change your bash read loop to use that as the delimiter (the -d ... argument), e.g.:
sep=$'\f' # or whatever character you prefer
while IFS= read -r -d "$sep" rec; do
    printf '=====\n%s\n=====\n' "$rec"
done < <(
    awk -v rs='Total:' -v ORS="$sep" '
        $0 == rs { if (NR>1) print rec; rec=$0; next }
        { rec = rec RS $0 }
        END { if (NR>1) print rec }
    ' file
)
The check for NR>1 in the END section is so that an empty input file produces no output at all, rather than a lone separator that the read loop would treat as an empty record.
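One way to see the effect of that check is to feed the script empty input and count the bytes it writes. This is only a sketch: the second command is a hypothetical variant with the guard removed, included purely for comparison, and the variable names are made up.

```shell
# Portable script as shown above: the END rule is guarded by NR>1.
with_check=$(awk -v rs='Total:' -v ORS='\f' '
    $0 == rs { if (NR>1) print rec; rec=$0; next }
    { rec = rec RS $0 }
    END { if (NR>1) print rec }
' < /dev/null | wc -c)

# Hypothetical variant without the guard, for comparison only.
without_check=$(awk -v rs='Total:' -v ORS='\f' '
    $0 == rs { if (NR>1) print rec; rec=$0; next }
    { rec = rec RS $0 }
    END { print rec }
' < /dev/null | wc -c)

# Arithmetic expansion strips any padding some wc implementations add.
with_check=$((with_check))
without_check=$((without_check))
printf 'with check: %d byte(s), without check: %d byte(s)\n' "$with_check" "$without_check"
```

Without the guard, awk still emits the ORS after an empty rec, so the read loop would receive one empty record.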
for $(cat file), also known as bash pitfall number one.
perl an "external" tool? You will have to look quite hard to find a system that has GNU tools but does not have perl installed.