My file file.txt looks like this:

[NamesA]
Andreas
Alex

[NamesB]
Bernd
Bruno

[NamesC]
Casper

[NamesD]
Doris

I would like to accomplish the following three outputs using grep or awk, so that I can use the command in a bash script:

  1. Output

    [NamesB]   
    Bernd   
    Bruno
    
  2. Output

    [NamesB]
    Bernd
    Bruno
    
    [NamesC] 
    Casper
    
  3. Output

    [NamesD]
    Doris
    

I tried:

grep -oP '\[NamesB\].*?' file.txt

but I only get [NamesB] and not the text block that follows it. I managed to get text that directly follows [NamesB] on the same line, but not on the following lines.

And that's it. If I would get at least all following lines starting with [NamesB], that would already help, but even this did not work.

  • I imagine output 1 might be the simplest: print everything starting at [NamesB] and ending at the next [.
  • I can also imagine how output 2 might work: like output 1, but running grep twice, once for [NamesB] and once for [NamesC].

But how would this work for output 3, since there is no following [? There could also be an unknown next block starting with [.

The command should start printing at [NamesB] (or whichever header I choose) and stop either at the next opening bracket [ or at the end of the file.

PS: I already posted a similar question and found a solution, but there everything was on a single line of text. In this question I have a block of text, not a single line.

  • It's totally not clear why NamesB is output twice, and the 2. Output contains two Names groups. Anyways, nope, that's not something that line-based grep can do for you. You need to learn a scripting language and use that. Maybe you even already know one? Python? Go? awk? Perl? JavaScript? Commented Jan 31 at 13:08
  • Please edit your question and make sure you show us accurate files. Is the first line of your file actually file.txt? If not, please remove it. Do you really want the strings 1. Output, 2. Output etc in your output? And why is output repeated? Please show us the exact input and output.
    – terdon
    Commented Jan 31 at 13:35

3 Answers

Your requirements for selecting which record(s) to print aren't clear but maybe this is what you're trying to do, using any awk:

Output 1, option 1 (read 1 line at a time):

$ awk '/^\[/{f=(/^\[NamesB]/)} f' file.txt
[NamesB]
Bernd
Bruno
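
A minimal sketch of option 1 for use in a bash script, assuming the section name arrives in a shell variable rather than being hard-coded. The function name print_section is hypothetical. Comparing the whole line with $0 == hdr avoids escaping the [ in a regex, and -v keeps shell quoting out of the awk program (awk still interprets backslash escapes in -v values, which is harmless for names like NamesB):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print_section NAME [FILE...]
# Prints the block whose header line is exactly "[NAME]", from that header
# up to (not including) the next line starting with "[", or to end of input.
# With no FILE arguments, awk reads standard input.
print_section() {
    local hdr="[$1]"
    shift
    awk -v hdr="$hdr" '
        /^\[/ { f = ($0 == hdr) }   # each header line switches printing on or off
        f                           # print the current line while f is 1
    ' "$@"
}
```

For example, print_section NamesB file.txt gives output 1, and print_section NamesD file.txt gives output 3, since the last block simply runs to the end of the file.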

Output 1, option 2 (read 1 multi-line record at a time)

$ awk -v RS= -v ORS='\n\n' -F'\n' '$1 == "[NamesB]"' file.txt
[NamesB]
Bernd
Bruno

Output 2, option 1 (print the NamesB record and whichever record comes after it):

$ awk -v RS= -v ORS='\n\n' -F'\n' '$1 == "[NamesB]"{c=2} c&&c--' file.txt
[NamesB]
Bernd
Bruno

[NamesC]
Casper
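
The c&&c-- idiom above is a countdown: setting c=2 on the matching record makes the condition true for that record and the one after it. A hedged sketch that turns the count into a parameter for a bash script; the function name print_records_from and its arguments are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print_records_from NAME COUNT [FILE...]
# In paragraph mode (RS=) each blank-line-separated block is one record and,
# with -F'\n', each line is one field, so $1 is the header line.
# Prints the record whose header is "[NAME]" plus the COUNT-1 records after it.
print_records_from() {
    local hdr="[$1]" n=$2
    shift 2
    awk -v RS= -v ORS='\n\n' -F'\n' -v hdr="$hdr" -v n="$n" '
        $1 == hdr { c = n }   # start the countdown at the matching record
        c && c--              # true while c > 0; c is decremented only then
    ' "$@"
}
```

print_records_from NamesB 2 file.txt reproduces output 2; if fewer records follow the match, it just prints what is there.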

Output 2, option 2 (print the NamesB and NamesC records, wherever they are in the input):

$ awk -v RS= -v ORS='\n\n' -F'\n' '$1 ~ /^\[Names[BC]]$/' file.txt
[NamesB]
Bernd
Bruno

[NamesC]
Casper
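
If the names to select are only known at run time, one option (a sketch; the function name print_named_records and the space-separated list format are assumptions) is to build a lookup table in BEGIN instead of writing a regex like /^\[Names[BC]]$/ by hand. Records are still printed in input order, not in the order the names were given:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print_named_records "NAME1 NAME2 ..." [FILE...]
# Prints every blank-line-separated record whose first line is "[NAMEi]"
# for one of the given names, in the order the records appear in the input.
print_named_records() {
    local names=$1
    shift
    awk -v RS= -v ORS='\n\n' -F'\n' -v names="$names" '
        BEGIN {
            n = split(names, list, " ")                # split on runs of blanks
            for (i = 1; i <= n; i++) want["[" list[i] "]"] = 1
        }
        $1 in want                                     # exact header match
    ' "$@"
}
```

For example, print_named_records 'NamesB NamesC' file.txt prints the same two records as above.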

Output 3, option 1 (print the NamesD record):

$ awk -v RS= -v ORS='\n\n' -F'\n' '$1 == "[NamesD]"' file.txt
[NamesD]
Doris

Output 3, option 2 (print the 4th record from the input, whatever Name it has):

$ awk -v RS= -v ORS='\n\n' -F'\n' 'NR == 4' file.txt
[NamesD]
Doris

Also, regarding:

If I would get at least all following lines starting with [NamesB]

the following would do that:

$ awk -v RS= -v ORS='\n\n' -F'\n' '$1 == "[NamesB]"{f=1} f' file.txt
[NamesB]
Bernd
Bruno

[NamesC]
Casper

[NamesD]
Doris

There are, of course, many other scripts that could produce that output based on various criteria; the right one depends on your requirements for selecting the block(s) to output.

Using Raku (formerly known as Perl_6)

~$ raku -e 'for slurp.split("\n\n") { .put if / \[ NamesA \]  /};'   file

#OR

~$ raku -e '.put if / \[ NamesA \]  / for slurp.split("\n\n");'   file

Above are answers written in Raku, a member of the Perl family of programming languages. Briefly, the file is slurped into memory all at once and split on \n\n (two consecutive newlines). The resulting elements (records) are iterated over using for: if an element matches the desired regex, it is output.

Sample Input:

[NamesA]
Andreas
Alex

[NamesB]
Bernd
Bruno

[NamesC]
Casper

[NamesD]
Doris

Sample Output:

[NamesA]
Andreas
Alex

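Note: you can return multiple records by using the | (OR) alternation within the regex. Group the alternatives with [ ] (Raku's non-capturing group) so that the escaped brackets apply to both names; without the group, | has the lowest precedence and the pattern means \[ NamesA or NamesB \]. To separate the returned records, the $_.put or .put portion can be rewritten as put "$_\n" to pad each record with a trailing newline:

~$ raku -e 'put "$_\n" if / \[ [ NamesA | NamesB ] \] / for slurp.split("\n\n");'   file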
[NamesA]
Andreas
Alex

[NamesB]
Bernd
Bruno

Note: as written, the regex can match any line within the record. To match only the first line, use /^ \[ NamesA \] $$ /, where ^ denotes beginning-of-string and $$ denotes end-of-line.

https://docs.raku.org
https://raku.org

You can extract any of the blocks (e.g. NamesA) with this command:

$ awk '/^\[NamesA/{p=1; print; next} /^\[/{p=0}; p>0{print}' input_file
[NamesA]
Andreas
Alex

The [ at the start of each block header must be escaped, as shown in the code.

By changing the header pattern (for example /^\[Names[BC]/ to select both the NamesB and NamesC blocks) you can print any combination of blocks to suit your needs.
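
For a bash script it can help to pass the header pattern in instead of editing the one-liner. A sketch under that assumption (print_blocks_matching is a hypothetical name): the regex is handed over through the environment and read with ENVIRON, because awk interprets backslash escapes in a -v assignment and could mangle the \[. The block boundary rule is the same as above: a line starting with [ starts a new block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print_blocks_matching ERE [FILE...]
# Prints every block whose header line matches the extended regex ERE,
# e.g. '^\[Names[BC]]$' to select the NamesB and NamesC blocks.
# A block runs from its header up to the next line starting with "[" or EOF.
print_blocks_matching() {
    local re=$1
    shift
    RE=$re awk '
        /^\[/ { p = ($0 ~ ENVIRON["RE"]) }   # header line: decide whether to print
        p                                    # print lines of selected blocks
    ' "$@"
}
```

For example, print_blocks_matching '^\[NamesD]$' file.txt prints output 3.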

  • Or more simply with a range pattern: awk '/^\[NamesA/,/^$/' file (omitted action = default to print $0) (may need /^[[:blank:]]*$/ or similar if visibly-empty lines are not really empty). sed can do the same: sed -n '/^\[NamesA/,/^$/p' file Commented Feb 7 at 5:28
  • You assume that blank lines are delimiters. If a line looks blank but actually contains invisible whitespace, your version fails. It is therefore safer to assume that the header lines are the block delimiters and that any 'blank' lines belong to the preceding block.
    – user167612
    Commented Feb 7 at 8:51
  • That's exactly why I said /^[[:blank:]]*$/. But both your and my versions fail if the [NamesA] line contains LRM, ZWJ, ZWNJ or similar (including a BOM, mostly on Windows), or if the apparent [NamesA] is really RLM ]AsemaN[. Commented Feb 7 at 10:57
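
A sketch of the range-pattern idea from these comments that copes with two of the invisible-character cases, assuming the file may have Windows CRLF line endings and 'blank' lines containing only spaces or tabs. It strips a trailing carriage return first, then prints from the exact header line through the first blank-looking line (or to end of file for the last block). The name print_until_blank is hypothetical, and it does not handle a BOM, LRM, ZWJ or similar characters:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print_until_blank NAME [FILE...]
# Prints from the line "[NAME]" through the first line that is empty or holds
# only spaces/tabs (that end line is printed too), or to end of input.
print_until_blank() {
    local hdr="[$1]"
    shift
    awk -v hdr="$hdr" '
        { sub(/\r$/, "") }           # drop a CR left over from CRLF line endings
        $0 == hdr, /^[ \t]*$/        # range: header line .. first blank-looking line
    ' "$@"
}
```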
