How to extract text from a string using sed?

Question

My example string is as follows:

This is 02G05 a test string 20-Jul-2012

Now, from the above string, I want to extract 02G05. For that, I tried the following regex with sed:

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'

But the above command prints nothing, I believe because the pattern I supplied to sed does not match anything.

So my question is: what am I doing wrong here, and how do I correct it?

When I try the above string and pattern with Python, I get my result:

>>> import re
>>> st = "This is 02G05 a test string 20-Jul-2012"
>>> re.findall(r'\d+G\d+', st)
['02G05']

Python is definitely not sed. Their regex flavors are quite different. — tripleee, Commented Dec 12, 2013 at 11:45

mVChr · Accepted Answer · 2020-05-29 17:21:30Z

137

How about using grep -E?

echo "This is 02G05 a test string 20-Jul-2012" | grep -Eo '[0-9]+G[0-9]+'
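As a rough illustration of what -E and -o contribute, the sketch below runs the same pattern over a line with two candidate tokens. The second token (11G22) and the variable names are made up for this example.

```shell
# Hypothetical input: the original string with a second token (11G22) added.
line='This is 02G05 a test 11G22 string 20-Jul-2012'

# -E enables extended regexps, so + needs no backslash.
# -o prints each match on its own line instead of the whole matching line.
matches=$(printf '%s\n' "$line" | grep -Eo '[0-9]+G[0-9]+')

printf '%s\n' "$matches"
```

Each match comes out on a separate line, and the date is skipped because it contains no G between digits.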

edited May 29, 2020 at 17:21

answered Jul 19, 2012 at 20:42

mVChr


4

+1 This is simpler, and will also correctly handle the case of multiple matches on the same line. A complex sed script could be devised for that case, but why bother?
– tripleee
Commented Jul 20, 2012 at 7:28
egrep uses extended regexps; sed and grep use basic regexps by default. egrep or grep -e or sed -E use extended regexps, and the Python code in the question uses Perl-style regular expressions. GNU grep can use PCRE (Perl Compatible Regular Expressions) with the -P option.
– Felipe Buccioni
Commented Aug 22, 2016 at 13:46
@FelipeBuccioni actually that should be egrep or grep -E or sed -r
– SensorSmith
Commented Apr 13, 2018 at 15:44
For a single (first) match, append ` | head -1` (without backticks), as per this answer to another question.
– SensorSmith
Commented Apr 13, 2018 at 15:55
2

grep has -m 1 to stop after the first match.
– tripleee
Commented Apr 20, 2018 at 3:42
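The two suggestions above are not quite equivalent when a line holds more than one match. A small sketch, using a made-up two-token line and a made-up second line, to show the difference:

```shell
# Hypothetical input with two tokens on the first line.
line='This is 02G05 a test 11G22 string 20-Jul-2012'

# head -n 1 keeps only the first match that grep -o emits.
first_match=$(printf '%s\n' "$line" | grep -Eo '[0-9]+G[0-9]+' | head -n 1)

# -m 1 stops after the first matching *line*; with -o, every match
# on that line is still printed, but later lines are not read.
first_line_matches=$(printf '%s\n' "$line" 'later 33G44 line' | grep -Eo -m 1 '[0-9]+G[0-9]+')

printf 'head -n 1: %s\n' "$first_match"
printf 'grep -m 1:\n%s\n' "$first_line_matches"
```

So head -n 1 is the safer choice if exactly one token is wanted, while -m 1 is mainly useful to avoid scanning the rest of a large input.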


tripleee · Accepted Answer · 2023-12-29 14:44:40Z

134

The pattern \d might not be supported by your sed. Try [0-9] or [[:digit:]] instead.

To only print the actual match (not the entire matching line), use a substitution.

sed -n 's/.*\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'

The parentheses capture the text they match into a back reference. Here, the first (and only) parentheses capture the string we want to keep, and we replace the entire line with just the captured string \1, and print the resulting line. (The p option says to print the resulting line after performing a successful substitution, and the -n option prevents sed from performing its normal printing of every other line.)
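To see what -n and the p flag each contribute, here is a minimal sketch with two made-up input lines. It assumes, for simplicity, that the token is delimited by spaces, which is why the pattern differs slightly from the command in the answer.

```shell
# Two hypothetical input lines: one without a token, one with the original string.
input=$(printf '%s\n' 'no token on this line' 'This is 02G05 a test string 20-Jul-2012')

# Without -n and p, sed prints every line, whether or not the substitution applied.
without_n=$(printf '%s\n' "$input" | sed 's/.* \([0-9][0-9]*G[0-9][0-9]*\) .*/\1/')

# With -n and the p flag, only lines where the substitution succeeded are printed.
with_n=$(printf '%s\n' "$input" | sed -n 's/.* \([0-9][0-9]*G[0-9][0-9]*\) .*/\1/p')

printf 'without -n:\n%s\n' "$without_n"
printf 'with -n and p:\n%s\n' "$with_n"
```

The first variant leaks the non-matching line through unchanged; the second prints only the extracted token.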

edited Dec 29, 2023 at 14:44

answered Jul 19, 2012 at 20:39

tripleee


6

Thanks, it worked fine. But I have a question: why is .* necessary in your regex? When I try sed -n 's/\([0-9]\+G[0-9]\+\)/\1/p' it just prints the entire line.
– RanRag
Commented Jul 19, 2012 at 20:47
7

That's why, isn't it? Replace whatever comes before and after the match with nothing, then print the whole line.
– tripleee
Commented Jul 19, 2012 at 21:01
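A short sketch of that point: s/// only rewrites the part of the line the regex actually matched, so anything the regex does not cover is left in place. The variable names are just for illustration.

```shell
line='This is 02G05 a test string 20-Jul-2012'

# The regex matches only the token and replaces it with itself,
# so the line is unchanged and printed in full.
unchanged=$(printf '%s\n' "$line" | sed -n 's/\([0-9][0-9]*G[0-9][0-9]*\)/\1/p')

# A trailing .* makes the match extend to the end of the line,
# so the text after the token is replaced away as well.
suffix_removed=$(printf '%s\n' "$line" | sed -n 's/\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p')

printf '%s\n' "$unchanged"
printf '%s\n' "$suffix_removed"
```

The same reasoning applies to the text before the token, which is why the answer also puts .* in front of the group.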
1

@tripleee This only prints 2G05 not 02G05. The expression that works is 's/.*\([0-9][0-9]G[0-9][0-9]*\).*/\1/p'
– Kshitiz Sharma
Commented Dec 12, 2013 at 10:06
1

That hard-codes it to exactly two digits. Something like sed -n 's/\(.*[^0-9]\)\?\([0-9][0-9]*G[0-9][0-9]*\).*/\2/p' would be more general. (I assume your sed supports \? for zero or one occurrence.)
– tripleee
Commented Dec 12, 2013 at 11:53
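The greediness issue raised in these comments can be reproduced directly. The sketch below compares the leading-.* form with an extended-regexp spelling of the fix from the comment above; it assumes a sed that accepts -E (older GNU sed spells it -r).

```shell
line='This is 02G05 a test string 20-Jul-2012'

# The leading .* is greedy: it also swallows the first digit,
# because a single digit is enough to satisfy [0-9][0-9]*.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p')

# Requiring the prefix, if there is one, to end in a non-digit
# keeps all leading digits inside the second group.
fixed=$(printf '%s\n' "$line" | sed -En 's/(.*[^0-9])?([0-9]+G[0-9]+).*/\2/p')

# The optional prefix group also covers a token at the very start of a line.
at_start=$(printf '%s\n' '02G05 leads the line' | sed -En 's/(.*[^0-9])?([0-9]+G[0-9]+).*/\2/p')

printf 'greedy:   %s\n' "$greedy"
printf 'fixed:    %s\n' "$fixed"
printf 'at start: %s\n' "$at_start"
```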
See also stackoverflow.com/a/48898886/874188 for how to replace various other common Perl escapes like \w, \s, etc.
– tripleee
Commented Aug 16, 2019 at 5:28


Zsolt Botykai · Accepted Answer · 2012-07-19 20:40:07Z

8

Try this instead:

echo "This is 02G05 a test string 20-Jul-2012" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'

But note that if there are two matches on one line, it will print the second.

answered Jul 19, 2012 at 20:40

Zsolt Botykai


Or more generally the last one if there are multiple matches.
– tripleee
Commented Jul 19, 2016 at 13:28
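A quick way to check the "last match wins" behaviour, using a made-up line with two space-delimited tokens. The sed command is the one from the answer, rewritten with -E so that + needs no backslash.

```shell
# Hypothetical input with two space-delimited tokens.
line='first 02G05 then 11G22 end'

# The greedy leading .* runs as far right as it can, so the captured
# group ends up being the last token on the line.
last_only=$(printf '%s\n' "$line" | sed -E 's/.* ([0-9]+G[0-9]+) .*/\1/')

# For comparison, grep -o reports every match, one per line.
every_match=$(printf '%s\n' "$line" | grep -Eo '[0-9]+G[0-9]+')

printf 'sed:  %s\n' "$last_only"
printf 'grep:\n%s\n' "$every_match"
```

Note that this sed form has no -n or p, so a line without a token would be printed unchanged.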


Dennis Williamson · Accepted Answer · 2012-07-19 20:37:52Z

6

sed doesn't recognize \d; use [[:digit:]] instead. You will also need to escape the + or use the -r switch (-E on OS X).

Note that [0-9] also works for Hindu-Arabic numerals (0 through 9).
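For reference, a sketch of both spellings with [[:digit:]]. The basic-regexp version uses the POSIX interval \{1,\} for "one or more" rather than \+, which is a GNU extension. Both commands assume the token is preceded by a non-digit character, which is an assumption of this example rather than something stated in the question.

```shell
line='This is 02G05 a test string 20-Jul-2012'

# Basic regexps: groups and intervals are written with backslashes.
bre=$(printf '%s\n' "$line" |
  sed -n 's/.*[^[:digit:]]\([[:digit:]]\{1,\}G[[:digit:]]\{1,\}\).*/\1/p')

# Extended regexps (-E): plain parentheses and + work directly.
ere=$(printf '%s\n' "$line" |
  sed -E -n 's/.*[^[:digit:]]([[:digit:]]+G[[:digit:]]+).*/\1/p')

printf 'BRE: %s\n' "$bre"
printf 'ERE: %s\n' "$ere"
```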

answered Jul 19, 2012 at 20:37

Dennis Williamson


I tried sed -n '/[0-9]\+G[0-9]\+/p'. Now it just prints the whole string
– RanRag
Commented Jul 19, 2012 at 20:43
@Noob: You will need to use substitution to exclude the parts you don't want to print.
– Dennis Williamson
Commented Jul 19, 2012 at 20:46


aotherix · Accepted Answer · 2023-03-17 22:20:56Z

1

We can use sed -En to simplify the regular expression, where:

-n: suppress automatic printing of the pattern space
-E: use extended regular expressions in the script

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -En 's/.*([0-9][0-9]+G[0-9]+).*/\1/p'

02G05
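One caveat worth checking with this pattern: [0-9][0-9]+ requires at least two digits before the G, and the greedy .* still hands over only the minimum. A sketch with made-up inputs of different digit counts:

```shell
# The sed script from the answer, stored in a variable for reuse.
pattern='s/.*([0-9][0-9]+G[0-9]+).*/\1/p'

# Exactly two digits before G: the whole token is captured.
two_digits=$(printf '%s\n' 'This is 02G05 a test' | sed -En "$pattern")

# One digit before G: the pattern does not match, so nothing is printed.
one_digit=$(printf '%s\n' 'This is 7G05 a test' | sed -En "$pattern")

# Three digits before G: the greedy .* takes the first digit.
three_digits=$(printf '%s\n' 'This is 123G05 a test' | sed -En "$pattern")

printf 'two digits:   [%s]\n' "$two_digits"
printf 'one digit:    [%s]\n' "$one_digit"
printf 'three digits: [%s]\n' "$three_digits"
```

So the command fits the example string, but it is effectively tied to two-digit prefixes.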

answered Mar 17, 2023 at 22:20

aotherix



Geoff · Accepted Answer · 2018-08-22 16:28:13Z

0

Try using rextract. It will let you extract text using a regular expression and reformat it.

Example:

$ echo "This is 02G05 a test string 20-Jul-2012" | ./rextract '([\d]+G[\d]+)' '${1}'

2G05

edited Aug 22, 2018 at 16:28

Geoff


answered Sep 13, 2016 at 3:03

Tim Savannah


If this uses standard regex, the square brackets around \d are completely superfluous.
– tripleee
Commented Nov 26, 2019 at 6:16
