capturing groups in sed

Question

I have many lines of the form

ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
ko04080 ko:GZMA

and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:

echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/'

which simply outputs the original string I echo'd. I'm very new to command line scripting, sed, pipes etc, so please don't be too angry if/when I'm doing something extremely dumb.

The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong or that I'm using sed wrong or that sed isn't printing the results of the substitution.

Any help would be greatly appreciated!

don't know Perl! learning sed now. Will learn perl, and anything else, as and when necessary... — Mike Dewar, Commented Jul 21, 2010 at 18:36

ninjalj · Accepted Answer · 2010-07-21 18:37:13Z

sed prints its input unchanged because the substitution doesn't match. Since you're probably using GNU sed, try this (printf is used so the tab between the columns is explicit):

printf 'ko05414\tko:ITGA4\n' | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'

\d -> [0-9], since GNU sed doesn't recognize \d.
{} -> \{\}, since GNU sed uses basic regular expressions by default.
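
For readers on OS X or another non-GNU sed, here is a sketch of a more portable form of the same substitution. It assumes the columns are separated by a tab or spaces and, unlike the command above, it puts a single space back between them. The variable names are only for illustration.

```shell
# Sample line with a real tab between the columns (printf expands \t).
line=$(printf 'ko05414\tko:ITGA4')

# Basic regular expression: [0-9] instead of \d, \{5\} for the repeat count,
# and [[:space:]] instead of \t so the pattern does not rely on GNU escapes.
result=$(printf '%s\n' "$line" |
  sed 's/^\(ko[0-9]\{5\}\)[[:space:]]\{1,\}ko:\(.*\)$/\1 \2/')

printf '%s\n' "$result"
```

Only POSIX bracket expressions and interval syntax are used here, so it should not matter which sed is installed.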

this still gives me the same error. I'm in OSX - not sure how to find out if I'm using GNU sed...
– Mike Dewar
Commented Jul 21, 2010 at 18:40
@Mike Dewar -- ooh, that's important information... I think OS X uses a BSD-like sed, whereas it's a common assumption here that folks use GNU sed
– Dan LaRocque
Commented Jul 21, 2010 at 18:46
that's important to know! Thanks so much!
– Mike Dewar
Commented Jul 21, 2010 at 19:42
On OS X, GNU sed is called gsed once you install it (for example via Homebrew or MacPorts); it is not included by default
– hd1
Commented Apr 15, 2013 at 18:25

Anders · Answer · 2010-07-21 19:55:43Z

This should do it. You could also skip the last group and use only \1, but since you're learning sed and regex, this is good practice. I wanted to use a non-capturing group in the middle, (?: ), but I could not get that to work with sed; it is a Perl-style extension that sed's POSIX regular expressions don't support.

sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result

And of course you can simply use

sed --posix 's/ko://'
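
To make the "skip the last group" remark concrete, here is a small sketch comparing a one-group substitution with the plain s/ko:// form. The sample line and variable names are assumptions for illustration, and --posix is left out so the commands are not tied to GNU sed.

```shell
line='ko05414 ko:ITGA4'

# One capture group is enough: the text after "ko:" is simply left untouched.
one_group=$(printf '%s\n' "$line" | sed 's/^\(ko[0-9]\{5\}\) ko:/\1 /')

# Plain substitution: deletes the first "ko:" found on each line.
plain=$(printf '%s\n' "$line" | sed 's/ko://')

printf '%s\n' "$one_group" "$plain"
```

Both commands give the same result on lines of this shape. The anchored version is stricter, because it only touches lines that start with a ko number.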

edited Jul 21, 2010 at 19:55

answered Jul 21, 2010 at 19:01

Thanks so much for this! I've upvoted your answer because you've totally nailed this, and the 's/ko://' is great (though what's that backtick doing?). I'm giving the tick to ninjalj cos his answer + comments has explained what I was doing wrong. But I'm definitely sticking with 's/ko://' or maybe even the string replace by getekha! I'll see which is faster...
– Mike Dewar
Commented Jul 21, 2010 at 19:47
My bad, leftover from a variable. Yeah, I would give it to him too; he actually bothered explaining.
– Anders
Commented Jul 21, 2010 at 19:57

getekha · Answer · 2010-07-21 19:03:11Z

You don't need sed for this

Here is how you can do it with bash:

var="ko05414 ko:ITGA4"
echo ${var//"ko:"}

${var//"ko:"} replaces every occurrence of "ko:" in var with the empty string.

See Manipulating Strings for more info
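
As a rough sketch of how the single-slash and double-slash forms differ (this needs bash, not plain sh; the sample string with two "ko:" occurrences is made up for illustration):

```shell
var="ko:CCL3 ko:CCL5"

# Single slash: only the first match is replaced (here, with nothing).
first_only=${var/ko:/}

# Double slash: every match is replaced.
all_matches=${var//ko:/}

printf '%s\n' "$first_only" "$all_matches"
```

Quoting the expansion when printing, as in printf '%s\n' "$all_matches", keeps any whitespace in the value intact.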

while I /am/ learning sed, this approach strikes me as brilliant and simple. I had no idea about this syntax. All this command line fu is awesome.
– Mike Dewar
Commented Jul 21, 2010 at 19:41

0zkr PM · Answer · 2014-03-28 18:47:55Z

@OP, if you just want to get rid of "ko:", then

$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 ko:GZMA

$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 GZMA

Just a note: while you can use pure bash string substitution, it's only more efficient when you are changing a single string. If you have a file, especially a big one, a bash while read loop is still slower than sed or awk.
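
To show why sub on $2 is the safer choice here, a small sketch comparing it with gsub on the whole line. The two-line input is a shortened, made-up version of the file above.

```shell
input='ko04062 ko:CXCR3
a legit ko: stays'

# sub() on $2 edits only the second field; the trailing 1 prints every line.
field_only=$(printf '%s\n' "$input" | awk '{ sub("ko:", "", $2) } 1')

# gsub() with no target works on the whole line and removes every "ko:".
whole_line=$(printf '%s\n' "$input" | awk '{ gsub("ko:", "") } 1')

printf '%s\n' "$field_only"
```

With the field-restricted form, the second line passes through unchanged; with gsub on the whole line, its "ko:" is deleted as well.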
