Can grep output only specified groupings that match?

Question

Say I have a file:

# file: 'test.txt'
foobar bash 1
bash
foobar happy
foobar

I only want to know what words appear after "foobar", so I can use this regex:

"foobar \(\w\+\)"

The parentheses indicate that I have a special interest in the word right after foobar. But when I do a grep "foobar \(\w\+\)" test.txt, I get the entire lines that match the entire regex, rather than just "the word after foobar":

foobar bash 1
foobar happy

I would much prefer that the output of that command looked like this:

bash
happy

Is there a way to tell grep to only output the items that match the grouping (or a specific grouping) in a regular expression?

for those who do not need grep: perl -lne 'print $1 if /foobar (\w+)/' < test.txt — vault, Commented Dec 19, 2016 at 11:56
To indicate a regex grouping and to match 1 or more characters instead of searching for the actual characters (, ), and +. — Cory Klein, Commented Jan 29, 2020 at 15:54
@Sébastien You need \(, \), \+ in BRE (basic regular expressions). Use ERE = egrep, when you like (, ), + — Hrobky, Commented Oct 14, 2022 at 15:44

camh · Accepted Answer · 2011-05-20 01:50:11Z

640

GNU grep has the -P option for perl-style regexes, and the -o option to print only what matches the pattern. These can be combined using look-around assertions (described under Extended Patterns in the perlre manpage) to remove part of the grep pattern from what is determined to have matched for the purposes of -o.

$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$

The \K is the short-form (and more efficient form) of (?<=pattern) which you use as a zero-width look-behind assertion before the text you want to output. (?=pattern) can be used as a zero-width look-ahead assertion after the text you want to output.

For instance, if you wanted to match the word between foo and bar, you could use:

$ grep -oP 'foo \K\w+(?= bar)' test.txt

or (for symmetry)

$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
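
As a rough sketch of why \K is convenient: it discards whatever has been matched so far, so the prefix in front of it may be of variable length, which a (?<=...) look-behind does not allow here. The snippet assumes GNU grep built with PCRE support (-P); the temporary file and the variable name words are only for illustration and recreate the question's test.txt.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n' > "$tmp"

# "foobar" followed by one or more whitespace characters is a variable-length
# prefix; \K drops it from the reported match, so -o prints only the word.
words=$(grep -oP 'foobar\s+\K\w+' "$tmp")
printf '%s\n' "$words"

rm -f "$tmp"
```

On the sample input this prints bash and happy; the bare foobar line has no following word and is skipped.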

edited May 20, 2011 at 1:50

answered May 20, 2011 at 1:33

camh


12

How do you do it if your regex has more than one grouping? (as the title implied?)
– barracel
Commented Mar 21, 2013 at 7:52
10

@barracel: I don't believe you can. Time for sed(1)
– camh
Commented Mar 22, 2013 at 22:51
3

@camh I have just tested that grep -oP 'foobar \K\w+' test.txt outputs nothing with the OP's test.txt. The grep version is 2.5.1. What could be wrong ? O_O
– SOUser
Commented Jul 24, 2014 at 14:19
8

Great answer for mentioning the \K! When I used (?<=) grep complained about my look-behind not being of fixed length, but using \K solved the problem.
– Hai Zhang
Commented Mar 31, 2017 at 11:30
3

seems -P flag doesn't work on Mac El Capitan at least
– OZZIE
Commented Jan 25, 2018 at 10:39


jgshawkey · Accepted Answer · 2016-04-22 16:08:37Z

155

    sed -n "s/^.*foobar\s*\(\S*\).*$/\1/p"

-n     suppress printing
s      substitute
^.*    anything before foobar
foobar initial search match
\s*    any white space character (space)
\(     start capture group
\S*    capture any non-white space character (word)
\)     end capture group
.*$    anything after the capture group
\1     substitute everything with the 1st capture group
p      print it
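
A possible portable variant of the same idea, as a sketch only: \s and \S are GNU extensions, so this version swaps in POSIX character classes and \{1,\} intervals, which should also be accepted by BSD/macOS sed. Requiring at least one blank and at least one non-blank character is my own tweak (not part of the answer above) so that a line containing only foobar prints nothing instead of an empty line. The variable names input and words are illustrative.

```shell
# Sample input from the question.
input=$(printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n')

# [[:space:]] and [^[:space:]] stand in for \s and \S; \{1,\} means "one or more".
# Only lines where foobar is followed by blanks and a word are printed.
words=$(printf '%s\n' "$input" |
  sed -n 's/^.*foobar[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*$/\1/p')
printf '%s\n' "$words"
```

On the sample input this prints bash and happy.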

answered Apr 22, 2016 at 16:08

jgshawkey


8

+1 for the sed example, seems like a better tool for the job than grep. One comment, the ^ and $ are extraneous since .* is a greedy match. However, including them might help clarify the intent of the regex.
– Tony
Commented May 30, 2018 at 21:22
1

And for me it was essential to add .* at the beginning. Otherwise it also captured what's before foobar.
– aerijman
Commented Feb 19, 2020 at 18:37
3

For some reason this does not seem to work with macOS sed: echo "foobar bash 1" | sed -n "s/^.*foobar\s*\(\S*\).*$/\1/p" outputs nothing.
– Frederik
Commented Nov 27, 2020 at 15:57
4

I had to add "-r" as sed option in order for it to work.
– Roemer
Commented Jun 8, 2021 at 12:37
5

with sed -nr and ( ) instead of \( \) it worked for me (Ubuntu 20.04)
– Martin T.
Commented Feb 7, 2022 at 9:13


Community · Accepted Answer · 2017-04-13 12:36:44Z

74

Standard grep can't do this, but recent versions of GNU grep can. You can turn to sed, awk or perl. Here are a few examples that do what you want on your sample input; they behave slightly differently in corner cases.

Replace foobar word other stuff by word, print only if a replacement is done.

sed -n -e 's/^foobar \([[:alnum:]]\+\).*/\1/p'

If the first word is foobar, print the second word.

awk '$1 == "foobar" {print $2}'

Strip foobar if it's the first word, and skip the line otherwise; then strip everything after the first whitespace and print.

perl -lne 's/^foobar\s+// or next; s/\s.*//; print'
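
The sed form also extends to more than one group, since each \( \) pair gets its own back-reference. A minimal sketch, assuming POSIX sed and the question's sample input; the choice to print the second word before the first is arbitrary and only shows that the groups can be rearranged. The variable names input and pairs are illustrative.

```shell
# Sample input from the question.
input=$(printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n')

# Group 1 is the word after foobar, group 2 is the word after that.
# Only lines that contain both words match; the replacement picks the order.
pairs=$(printf '%s\n' "$input" |
  sed -n 's/^foobar \([[:alnum:]]\{1,\}\) \([[:alnum:]]\{1,\}\).*/\2 \1/p')
printf '%s\n' "$pairs"
```

Only the first line of the sample has two words after foobar, so the output is the single line "1 bash".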

edited Apr 13, 2017 at 12:36


answered May 19, 2011 at 23:17

Gilles 'SO- stop being evil'


Awesome! I thought I may be able to do this with sed, but I haven't used it before and was hoping I could use my familiar grep. But the syntax for these commands actually looks very familiar now that I am familiar with vim-style search & replace + regexes. Thanks a ton.
– Cory Klein
Commented May 19, 2011 at 23:51
1

Not true, Gilles. See my answer for a GNU grep solution.
– camh
Commented May 20, 2011 at 1:33
2

@camh: Ah, I didn't know GNU grep now had full PCRE support. I've corrected my answer, thanks.
– Gilles 'SO- stop being evil'
Commented May 20, 2011 at 7:14
3

This answer is especially useful for embedded Linux since Busybox grep doesn't have PCRE support.
– Craig McQueen
Commented Mar 17, 2016 at 0:12


G-Man Says 'Reinstate Monica' · Accepted Answer · 2018-04-14 07:29:46Z

52

pcregrep has a smarter -o option that lets you choose which capturing groups you want output. So, using your example file,

$ pcregrep -o1 "foobar (\w+)" test.txt
bash
happy

answered Apr 14, 2018 at 7:29

G-Man Says 'Reinstate Monica'


4

Wow, this was magical for me, thank you so much. I'm on MacOS, and was trying to use match-groups somehow. I had been trying zegrep because I was grepping a large zip-file, but also found that pcregrep will (from the pcregrep --help page): Files whose names end in .gz are read using zlib. So I could use it straight away on my zip file. Thanks again!
– samjewell
Commented Apr 6, 2020 at 15:11


Dave · Accepted Answer · 2011-05-20 01:07:12Z

33

Well, if you know that foobar is always the first word on the line, then you can use cut. Like so:

grep "foobar" test.file | cut -d" " -f2

answered May 20, 2011 at 1:07

Dave


The -o switch on grep is widely implemented (moreso than the Gnu grep extensions), so doing grep -o "foobar" test.file | cut -d" " -f2 will increase the effectiveness of this solution, which is more portable than using lookbehind assertions.
– dubiousjim
Commented Apr 19, 2012 at 21:04
1

I believe that you would need grep -o "foobar .*" or grep -o "foobar \w\+".
– G-Man Says 'Reinstate Monica'
Commented Apr 14, 2018 at 7:20
1

Breaks if there is another space in the value
– mvmn
Commented Dec 27, 2019 at 15:17


Community · Accepted Answer · 2020-06-11 14:16:50Z

26

Using grep is not cross-platform compatible, since -P/--perl-regexp is only available on GNU grep, not BSD grep.

Here is the solution using ripgrep:

$ rg -o "foobar (\w+)" -r '$1' <test.txt
bash
happy

As per man rg:

-r/--replace REPLACEMENT_TEXT Replace every match with the text given.

Capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.

Related: GH-462.

edited Jun 11, 2020 at 14:16


answered Apr 16, 2018 at 15:35

kenorb


It is possible to install gnugrep on BSD distros however.
– bparker
Commented May 13, 2020 at 20:50


Thor · Accepted Answer · 2013-10-08 12:38:10Z

10

If PCRE is not supported you can achieve the same result with two invocations of grep. For example to grab the word after foobar do this:

<test.txt grep -o 'foobar  *[^ ]*' | grep -o '[^ ]*$'

This can be expanded to an arbitrary word after foobar like this (with EREs for readability):

i=1
<test.txt egrep -o 'foobar +([^ ]+ +){'$i'}[^ ]+' | grep -o '[^ ]*$'

Output:

1

Note the index i is zero-based.
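
The two-grep pipeline can be wrapped in a small function so the index is passed as an argument. This is only a sketch: word_after is a hypothetical helper name, it assumes a grep that supports -o and -E (GNU and BSD grep both do), and the keyword is used as a regex fragment, so it should not contain regex metacharacters.

```shell
# word_after KEYWORD INDEX FILE (hypothetical helper, not from the answer)
# Prints the INDEX-th word (zero-based) that follows KEYWORD on each line.
word_after() {
  # First grep: keep KEYWORD plus INDEX skipped words plus the wanted word.
  # Second grep: keep only the last word of that match.
  grep -E -o "$1 +([^ ]+ +){$2}[^ ]+" "$3" | grep -o '[^ ]*$'
}

# Recreate the question's sample input in a temporary file.
tmp=$(mktemp)
printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n' > "$tmp"

first=$(word_after foobar 0 "$tmp")   # words directly after foobar
second=$(word_after foobar 1 "$tmp")  # words one position further on
printf '%s\n' "$first" "$second"

rm -f "$tmp"
```

With the sample input, index 0 yields bash and happy, and index 1 yields 1.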

answered Oct 8, 2013 at 12:38

Thor


Tim Richardson · Accepted Answer · 2019-01-29 08:29:37Z

4

I found the answer of @jgshawkey very helpful. grep is not such a good tool for this, but sed is, although here we have an example that uses grep to grab a relevant line.

Regex syntax of sed is idiosyncratic if you are not used to it.

Here is another example: this one parses output of xinput to get an ID integer

⎜   ↳ SynPS/2 Synaptics TouchPad                id=19   [slave  pointer  (2)]

and I want 19

export TouchPadID=$(xinput | grep 'TouchPad' | sed  -n "s/^.*id=\([[:digit:]]\+\).*$/\1/p")

Note the class syntax:

[[:digit:]]

and the need to escape the following +

I assume only one line matches.

answered Jan 29, 2019 at 8:29

Tim Richardson


Slightly simpler version without the extra grep, assuming 'TouchPad' is to the left of 'id' : echo "SynPS/2 Synaptics TouchPad id=19 [slave pointer (2)]" | sed -nE "s/.*TouchPad.+id=([0-9]+).*/\1/p"
– Amit Naidu
Commented May 19, 2019 at 5:10


jubilatious1 · Accepted Answer · 2024-03-30 22:23:18Z

Compare Perl and Raku solutions:

Using Perl (answers from @vault and @Gilles 'SO- stop being evil'):

~$ perl -lne 'print $1 if /^foobar (\w+)/;'  file

#OR:

~$ perl -lne 's/^foobar\s+// or next; s/\s.*//; print'  file

Using Raku (formerly known as Perl_6)

~$ raku -ne 'put $0 if /^foobar \s+ (\w+)/;'  file

#OR:

~$ raku -ne 's/^foobar\s+// or next; s/\s.*//; .put;'  file

A few more Raku answers (including Raku grep)

~$ raku -pe 's/^foobar \s+ ( \S+ ) [ \s+ .*?]? $ /$0/ or next;'  file

#OR:

~$ raku -ne '.grep(/^foobar\s+/) or next; .words[1].put;'  file

#OR:

~$ raku -ne '$_ .= words; if .[0] eq "foobar" { put .[1] // next };'  file

#OR:

~$ raku -ne 'put .[1] || next if $_.=words[0] eq "foobar";'  file

Note: for the last few examples using indexing, stray spaces can be cleaned up first if they are problematic. Try using any of the various trim routines in Raku: .trim, .trim-leading, or .trim-trailing, like so:

~$ raku -ne '.trim-trailing.grep(/foobar\s+/) or next; .words[1].put;'  file

(Of course, an advantage of solutions in Perl/Raku is that these languages are cross-platform, having binaries available for Windows, etc.).

Perl References:
https://perldoc.perl.org
https://www.perl.org

Raku References:
https://docs.raku.org
https://raku.org
