Find all lines between two patterns, EXCLUSIVE of the second pattern?

Question

Consider the file listed below. I need to select all lines from every line matching the regex pattern Word A up to, but not including, the next line matching the regex pattern Word D.

Word A
Word B
Word C
Word D
Word E
Word F
Word G
Word A
Word H
Word I
Word D
Word J
Word A
Word K
Word D
Word L
Word M
Word A
Word D

Note the variable number of rows between A and D. Sometimes, D is the very next row. Here's what I need the output to be:

Word A
Word B
Word C
Word A
Word H
Word I
Word A
Word K
Word A

Can be done with awk, perl, python, or sed. Doesn't matter as long as it's installed on the RHEL6 server where the file is.

What if the file contains 2 Word Ds after Word A - stop at the first or the last one? What if there's Word A but no Word D - print to end of file or not? Word D without Word A - print from start of file? OtherWord A - should that match Word A? If Word D appears mid-line should the part of the line before it be printed? What if both exist on the same line? etc., etc..... — Ed Morton, Commented Nov 22, 2023 at 13:00
See how-do-i-find-the-text-that-matches-a-pattern for considerations when asking a pattern matching question on how to make your requirements clear. — Ed Morton, Commented Nov 22, 2023 at 13:03
@EdMorton "What if the file contains 2 Word Ds after Word A"? Stop at the first one. — RonJohn, Commented Nov 22, 2023 at 15:26
Please edit your question to state your rainy-day requirements like that and include the rainy-day cases in your sample input/output so we have something to test a potential solution with. Regarding "That's a very low probability occurrence" - not handling those is where most software bugs show up. — Ed Morton, Commented Nov 22, 2023 at 15:28
The thing is, with any pattern matching problem it's always FAR easier to match what you want than it is to not match similar text you don't want so it's important to think through and state what those rainy-day "similar text I don't want" cases are and how they should be handled, and include them in your sample input/output. — Ed Morton, Commented Nov 22, 2023 at 19:35

Stephen Kitt · Accepted Answer · 2023-11-22 05:33:11Z

4

Using AWK:

awk '/Word A/ { m = 1 } /Word D/ { m = 0 } m'
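The rule order in this one-liner matters: the Word D rule clears the flag before the bare `m` pattern decides whether to print, which is why the end line is excluded. The following is a minimal sketch, not part of the original answer, of how the same program behaves on the rainy-day cases raised in the question's comments. The function name `flag_select` and the test lines are made up for illustration.

```shell
# The one-liner wrapped in a function; the name flag_select is made up for this sketch.
flag_select() {
    awk '
        /Word A/ { m = 1 }   # start pattern: switch printing on
        /Word D/ { m = 0 }   # end pattern: switch printing off (runs before the print rule)
        m                    # print the line while the flag is set
    '
}

# Two Word D lines after one Word A: output stops at the first.
printf '%s\n' 'Word A' 'x' 'Word D' 'y' 'Word D' 'z' | flag_select
# Word A with no Word D: everything up to end of input is printed.
printf '%s\n' 'Word A' 'x' | flag_select
# Word D with no preceding Word A: nothing is printed.
printf '%s\n' 'x' 'Word D' 'y' | flag_select
```

A line that matches both patterns sets and then clears the flag, so it is not printed and does not open a block.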

answered Nov 22, 2023 at 5:33

bxm · Accepted Answer · 2023-11-23 07:40:10Z

2

Here's an awk solution

awk \
  -vstart='Word A' \
  -vend='Word D' \
  '{
     if ($0==end  ) {flag=0;next};
     if ($0==start) {flag=1};
     if (flag==1) {print $0};
  }'

Only a minor change is required for regex handling:

awk \
  -vstart='Word[ ]A' \
  -vend='Word[ ]D' \
  '{
     if ($0 ~ end  ) {flag=0;next};
     if ($0 ~ start) {flag=1};
     if (flag==1) {print $0};
  }'
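One caveat with passing regexes through `-v`: awk applies string escape processing to `-v` assignments, so a backslash in the pattern (for example `\.`) may not arrive intact. A possible workaround, sketched here as an assumption rather than as part of the answer, is to hand the patterns over in the environment and read them from `ENVIRON`. The wrapper name `select_env` and the variable names `start_re` and `end_re` are hypothetical.

```shell
# Hypothetical wrapper: select_env START_REGEX END_REGEX, reading stdin.
select_env() {
    # Pass the regexes through the environment so awk sees them verbatim;
    # -v assignments would apply escape processing to backslashes.
    start_re="$1" end_re="$2" awk '
        $0 ~ ENVIRON["end_re"]   { flag = 0; next }  # end line: stop, do not print it
        $0 ~ ENVIRON["start_re"] { flag = 1 }        # start line: begin printing
        flag                                         # print while inside a block
    '
}

# A literal dot must be matched, so WordXA does not start a block.
printf '%s\n' 'WordXA' 'foo' 'Word.A' 'bar' 'Word.D' 'baz' |
    select_env 'Word\.A' 'Word\.D'
```

The rule order mirrors the answer: the end test comes first and uses `next`, so an end line is never printed.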

edited Nov 23, 2023 at 7:40

answered Nov 21, 2023 at 23:36

Close, but does not work for when Word D comes right after Word A. Also, I apparently wasn't explicit enough when I wrote that it must work on patterns; they'll be simple regex patterns.
– RonJohn
Commented Nov 22, 2023 at 2:37
Worked in my tests where A and D are on consecutive lines, e.g. tio.run/##S0oszvj/…
– bxm
Commented Nov 23, 2023 at 7:50
If you have a requirement where matches can occur on the same line in the input, this is not clear from the question as posed, so please amend accordingly.
– bxm
Commented Nov 23, 2023 at 12:02
I worked around the situation where Word D is on the same line as Word A (which happens every time) with a sed substitution that adds "marker text" to the beginning of every Word A line. Combined with @nezabudka's answer, my problem is solved.
– RonJohn
Commented Nov 24, 2023 at 2:05

jubilatious1 · Accepted Answer · 2023-12-26 12:18:18Z

Using Raku (formerly known as Perl_6)

~$ raku -ne '.put if / Word \h A / fff^ / Word \h D /;'  file

Raku is a programming language in the Perl-family. It's an "operator-rich" language that features a powerful Regex engine. Above, the -ne non-autoprinting linewise flags are used, in conjunction with Raku's sed-like fff "Flip-flop" operator.

Raku includes various 'flavors' of its sed-like fff infix operator, including fff^, ^fff and even ^fff^. While each Regex is recognized, the ^ caret indicates that the line matched on that side should be dropped from the output:

Sample Input:

Word A
Word B
Word C
Word D
Word E
Word F
Word G
Word A
Word H
Word I
Word D
Word J
Word A
Word K
Word D
Word L
Word M
Word A
Word D

Sample Output:

Word A
Word B
Word C
Word A
Word H
Word I
Word A
Word K
Word A

The above code solves the OP's test case. But what if the /start/ and /stop/ Regexes are actually on the same line? For that problem you could try Raku's awk-like ff operator:

~$ echo 'AB\nCD\nEF' | raku -ne 'say $_ if /A/ ff /B/;'
AB
~$ echo 'AB\nCD\nEF' | raku -ne 'say $_ if /A/ ff /C/;'
AB
CD

As compared to Raku's sed-like fff operator:

~$ echo 'AB\nCD\nEF' | raku -ne 'say $_ if /A/ fff /B/;'
AB
CD
EF
~$ echo 'AB\nCD\nEF' | raku -ne 'say $_ if /A/ fff /C/;'
AB
CD

https://docs.raku.org/routine/fff
https://docs.raku.org/routine/ff
https://raku.org

nezabudka · Accepted Answer · 2023-11-22 05:34:38Z

1

GNU sed only:

sed '/Word A/!d;:1;n;/Word D/d;b1' file
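The one-liner above is dense, so here is the same script spread over several lines with comments, as a reading aid rather than a different solution. The function name `sed_select` is made up for this sketch, and the behaviour described is that of GNU sed.

```shell
# Same commands as the one-liner, one per line; sed_select is a made-up name.
sed_select() {
    sed '
        # Outside a block: drop every line that does not start one.
        /Word A/!d
        :1
        # Print the current line (autoprint) and read the next one.
        n
        # The end pattern closes the block without being printed.
        /Word D/d
        # Otherwise keep looping inside the block.
        b1
    '
}

printf '%s\n' 'Word A' 'Word B' 'Word D' 'Word E' 'Word A' 'Word D' | sed_select
```

Because `d` starts a new cycle, the line after Word D is again tested against Word A, which is how consecutive blocks are picked up.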

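For more complex cases, such as invalid blocks (a Word A that is not closed by a Word D before the next Word A or the end of the file), which this version discards: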

sed -n '/Word A/!b;:1;/Word A/h;n;/Word D/{g;p;d};H;b1' file
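To see what the hold-space version does with an unterminated block, the script can be spelled out and run on a small input. This is only a sketch: the function name `sed_valid_blocks` and the test lines are invented, and the commands are the ones from the one-liner above.

```shell
# Same commands as the hold-space one-liner; sed_valid_blocks is a made-up name.
sed_valid_blocks() {
    sed -n '
        # Outside a block: skip lines that do not start one.
        /Word A/!b
        :1
        # A start line (re)initialises the hold space, discarding any unfinished block.
        /Word A/h
        # Read the next line; with -n nothing is printed here.
        n
        # End pattern: fetch the buffered block, print it, start a new cycle.
        /Word D/{
            g
            p
            d
        }
        # Any other line is appended to the buffer.
        H
        b1
    '
}

# The first Word A is never closed before the second one, and the last
# Word A is never closed at all; only the middle block is printed.
printf '%s\n' 'Word A' 'x' 'Word A' 'y' 'Word D' 'z' 'Word A' 'w' | sed_valid_blocks
```

The design choice is to buffer each block and print it only once its Word D has been seen, at the cost of holding the whole block in memory.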

answered Nov 22, 2023 at 5:34

Kaz · Accepted Answer · 2023-11-23 04:08:51Z

1

TXR Lisp's awk macro supports this directly; the rng (range) operator has nine variants for various ways of excluding records from the start or end of a range:

$ txr -e '(awk ((rng- #/Word A/ #/Word D/)))' data
Word A
Word B
Word C
Word A
Word H
Word I
Word A
Word K
Word A

Also, unlike Awk's range operator, it combines with other operators. E.g. suppose you wanted to print records which are simultaneously in a foo to bar range, and in a start to end range, no matter how those kinds of ranges overlap in the data:

(awk ((and (rng #/foo/ #/bar/)
           (rng #/start/ #/end/))))

answered Nov 23, 2023 at 4:08

I'll have to take your word for it.
– RonJohn
Commented Nov 24, 2023 at 2:00
Never heard of TXR Lisp. Will have to investigate.
– jubilatious1
Commented Dec 24, 2023 at 3:53
Your final example: tried similar with Raku (a.k.a Perl6) and it works! raku -ne '.put if (/ A / fff / C /) & (/ B / fff / D /);' file.
– jubilatious1
Commented Dec 24, 2023 at 3:54

Prabhjot Singh · Accepted Answer · 2023-11-24 02:20:31Z

0

Using awk:

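$ awk '
    $0 == "Word A" { f=1; rec=$0; next }      # start buffering a new block
    $0 == "Word D" && f { print rec; f=0 }    # print the block; ignore a stray Word D
    f { rec = rec ORS $0 }                    # append lines while inside a block
'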

# For regex pattern
$ awk '
    (/Word A/ && !/Word D/) { f=1; rec=$0; next }
    (/Word D/ && rec){ print rec; f=0; rec="" }
    f{rec = rec ORS $0}
'

If every Word A is always followed by a matching Word D, then the following simpler command may be used.

$ awk '/Word A/,/Word D/ { if (!/Word D/) print }'
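The practical difference between the buffering variants and the range variant shows up when a Word A is never closed. The sketch below wraps the regex variant and the range variant from this answer in two functions (the names `buffered` and `ranged` are made up) and feeds both the same input, whose last block has no Word D.

```shell
# Regex variant from the answer: a block is printed only once its Word D is seen.
buffered() {
    awk '
        (/Word A/ && !/Word D/) { f=1; rec=$0; next }
        (/Word D/ && rec) { print rec; f=0; rec="" }
        f { rec = rec ORS $0 }
    '
}

# Range variant from the answer: lines are printed as soon as they are read.
ranged() {
    awk '/Word A/,/Word D/ { if (!/Word D/) print }'
}

# The second Word A below has no closing Word D.
printf '%s\n' 'Word A' 'x' 'Word D' 'Word A' 'y' | buffered
echo '---'
printf '%s\n' 'Word A' 'x' 'Word D' 'Word A' 'y' | ranged
```

`buffered` prints only the closed block, while `ranged` also prints the unterminated one up to the end of input, so the simpler command is safe only when every block is known to be closed.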

edited Nov 24, 2023 at 2:20

answered Nov 22, 2023 at 2:36

waltinator · Accepted Answer · 2023-11-21 23:39:02Z

-2

sed lets one do arithmetic on line specifications:

sed -n -e '/Word A/,/Word D/-1p' The_File

Read man sed.

answered Nov 21, 2023 at 23:39

This doesn't seem to be supported by GNU sed - range addresses only appear to allow positive offsets relative to the start of the range (like /Word A/,+3p). However you could do /Word A/,/Word D/{/Word D/!p} I think.
– steeldriver
Commented Nov 22, 2023 at 1:06
Tested in GNU sed; does not work.
– RonJohn
Commented Nov 22, 2023 at 2:30
@steeldriver sed -n -e '/Word A/,/Word D/{/Word D/!p}' The_file works. Make this an answer, and I'll accept.
– RonJohn
Commented Nov 22, 2023 at 2:41
@RonJohn if you use range expressions then you end up having to specify the same regexp twice while if you use a flag you don't. That makes a flag solution better than a range solution. Sed doesn't have variables to use as flags but awk does.
– Ed Morton
Commented Nov 22, 2023 at 12:56
Not my downvote by the way.
– Ed Morton
Commented Nov 22, 2023 at 13:05
