I'm trying to remove all instances of '_(one number)' from certain strings in a file, so tig00000003_1 should become tig00000003. This is what my test file looks like:

##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;

I've tried sed -E 's/(tig[0-9]{8}\_[0-9]{1})/ \1(tig[0-9]{8}) /' my_test.txt, which gives:

##sequence-region  tig00000001_7(tig[0-9]{8}) 32 1 630
 tig00000003_1(tig[0-9]{8}) Name=tig00000003_1;

and this is what I want:

##sequence-region tig00000001_732 1 630
tig00000003 Name=tig00000003;

How can I remove the matched pattern in the capture group, or alternatively keep only the match within the capture group?

2 Answers

You could simply replace the '_(one number)' with nothing on any lines that are not comments like so:

sed '/^[^#]/ s/_[0-9]//g' your_file

The way it works is as follows:

  • Non-comment lines are selected by the address /^[^#]/: lines that start (^) with any character other than # ([^#])
  • On those lines only, every underscore followed by a digit (_[0-9]) is replaced with nothing (//), and the g flag repeats this for each occurrence on the line
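
As a quick check, here is a minimal sketch that recreates the question's two-line test file and runs the command on it. The temporary file from mktemp and the variable names (sample, result) are assumptions made for illustration, not part of the original answer.

```shell
# Recreate the two-line test file from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  '##sequence-region tig00000001_732 1 630' \
  'tig00000003_1 Name=tig00000003_1;' > "$sample"

# Strip "_<digit>" on every line that does not start with '#'.
result=$(sed '/^[^#]/ s/_[0-9]//g' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

Note that the pattern is not tied to the tig prefix: on a non-comment line, an ID such as tig00000001_732 would lose "_7" and become tig0000000132, so this is safest when every suffix on those lines is a single digit.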

You're pretty close. Put the capturing parentheses around just the "tig" number, so that \1 in the replacement puts back only that part:

sed -E '/^#/!s/(tig[0-9]{8})_[0-9]/\1/g' my_test.txt
# .............^^^^^^^^^^^^^.......^^
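
To see the back-reference at work, here is a small sketch that runs a capture-group substitution on the question's sample plus one made-up line. The third input line and its "other_5" field are hypothetical, added only to show what is left alone. The sketch skips comment lines with /^#/!, which is one way to do it. Only an underscore and digit that directly follow a tig ID are dropped, because \1 puts the captured ID back.

```shell
# Sample data: two lines from the question, plus a hypothetical third line
# ("other_5" is made up for illustration).
input='##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;
tig00000004_2 other_5'

# On non-comment lines, match "tig" + 8 digits + "_" + one digit,
# and replace the whole match with just the captured tig ID (\1).
output=$(printf '%s\n' "$input" | sed -E '/^#/!s/(tig[0-9]{8})_[0-9]/\1/g')
printf '%s\n' "$output"
```

Anchoring the match on the tig prefix is the design choice here: unrelated underscore-digit pairs on the same line are not touched.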
