
How do you remove all the text between two characters using sed?

Eg:

[email protected]
[email protected]
[email protected]

I want to remove the text from + to @ in each email address. (The + must be deleted too, but the @ must be retained.)

I used the following command:

sed -e 's/\(+\).*\(@\)/\1\2/' FILE.txt > RESULT.txt

But the output still includes the "+" sign. Eg: [email protected]

I want the following output:

[email protected]
[email protected]
[email protected]

Can someone help me with modifying the above sed command?


2 Answers


The simple solution is to match the character(s) you want to keep at the boundary of the match, and put them back with nothing between them.

sed 's/+[^@+]*@/@/' FILE.txt >RESULT.txt

Your command put back \1, which is the + you wanted to remove, so the + stayed in the output.

You can capture the string you want to keep using \( ... \) grouping parentheses, but in this case, since it's a completely static string, I opted to keep the regex and the replacement string as simple as possible, and just hardcode @ as the replacement string.

Notice also how the negated character class [^@+] keeps the match from spanning multiple plus signs or @ signs. If you do want the match to span repeated + characters, take the plus out of the negated character class, leaving only [^@].
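To see the difference between the two character classes, here is a minimal sketch. The address with two plus signs is a made-up sample, not one from the question, and the variable names are only for illustration.

```shell
# Hypothetical sample address with two plus signs (not from the question).
addr='user+a+b@example.com'

# [^@+]* cannot cross a second +, so the match has to start at the last +
# before the @. Only "+b" is removed.
strict=$(printf '%s\n' "$addr" | sed 's/+[^@+]*@/@/')

# [^@]* may cross + characters, so the match starts at the first + and
# stops at the first @. The whole "+a+b" is removed.
loose=$(printf '%s\n' "$addr" | sed 's/+[^@]*@/@/')

printf '%s\n%s\n' "$strict" "$loose"
```

This should print user+a@example.com followed by user@example.com. Which one is "right" depends on what you expect your data to look like.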

  • Maybe avoid SHOUTING in your file names too.
    – tripleee
    Commented Jan 23, 2019 at 7:44

I will start with the original command rather than building from scratch. Building from scratch is an excellent approach in this case, but there is educational value in understanding the original command and the steps you can take to adjust it to your needs.

The core of the original command:

sed -e 's/\(+\).*\(@\)/\1\2/'

The expression has the form s/pattern/replacement/, which means "search for pattern and replace it with replacement". / is the separator here.

Your pattern is \(+\).*\(@\). It would match the same text if it were +.*@ (enclosing something in \( \) only matters for the replacement; we will get to it). In sed's default basic regular expressions an unescaped + is a literal plus sign, so a pattern of +.*@ means "literal + followed by (almost) any character (.) repeated zero or more times (*), followed by literal @".

Note that + matches the first possible + and * is greedy, so the match spans from the first + to the last @ on the line. This may not matter in your specific case, but sometimes it is very important.
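A quick way to see the greedy behaviour is to feed sed a line that contains more than one @. The line below is a made-up sample, not data from the question.

```shell
# Hypothetical line holding two addresses (not from the question).
line='a+x@example.com, b+y@example.org'

# .* is greedy: the match runs from the first + to the LAST @ on the line,
# so everything between them, including the first domain, is removed.
greedy=$(printf '%s\n' "$line" | sed -e 's/+.*@/@/')

printf '%s\n' "$greedy"
```

This should print a@example.org: the whole stretch "+x@example.com, b+y@" was matched and replaced by a single @.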

Your replacement is \1\2. It means "whatever was matched by the 1st \( \) followed by whatever was matched by the 2nd \( \)". Your first \( \) is in fact \(+\); it matches the + you want to get rid of.

To make it clear: the reason these \( \) groups appear in the pattern (so the pattern is not just +.*@) is that they define fragments referred to as \1 and \2 later.
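If you want to check what each group captured, one rough trick is to wrap the back-references in visible markers. The sample address and the square brackets are just an illustration, not something from the question.

```shell
# Hypothetical sample address (not from the question).
addr='user+tag@example.com'

# The original command: \1 holds "+", \2 holds "@". The text between
# them is dropped, but both captured characters are written back.
both=$(printf '%s\n' "$addr" | sed -e 's/\(+\).*\(@\)/\1\2/')

# Same pattern, with literal brackets around each back-reference
# to make the captured text visible.
shown=$(printf '%s\n' "$addr" | sed -e 's/\(+\).*\(@\)/[\1][\2]/')

printf '%s\n%s\n' "$both" "$shown"
```

This should print user+@example.com and then user[+][@]example.com, which shows that \1 is the + being put back.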

So if you don't want + to be printed, the minimal change to your original command will be to omit \1, because this is the exact part that prints + in your case.

sed -e 's/\(+\).*\(@\)/\2/'

But then you don't need \( \) around + in the pattern, therefore you can simplify:

sed -e 's/+.*\(@\)/\1/'

Note \2 became \1 because \(@\) is now the 1st \( \) group. Also, since it can only match @, you can use the literal @ instead of \1:

sed -e 's/+.*\(@\)/@/'

But now you don't need \( \) at all. The command becomes:

sed -e 's/+.*@/@/'

Then you recall that * is greedy, so .* may include extra + and/or @ characters. Suppose you don't want this. You need to turn . into something that matches anything but @ or +:

sed -e 's/+[^@+]*@/@/'

This is exactly what the other answer gave you. A somewhat experienced sed user would build this solution from scratch. As you can see, it's possible to reduce your original command step by step, in a logical manner, and arrive at the same solution.
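As a sanity check, the sketch below runs every step of the reduction on one simple address. The address is a made-up sample with a single + and a single @, so it only shows that the steps agree in the simple case; it says nothing about lines with extra + or @ characters.

```shell
# Hypothetical sample address (not from the question).
addr='user+tag@example.com'

# Each expression is one step of the reduction above, in order.
results=$(
    for expr in \
        's/\(+\).*\(@\)/\2/' \
        's/+.*\(@\)/\1/' \
        's/+.*\(@\)/@/' \
        's/+.*@/@/' \
        's/+[^@+]*@/@/'
    do
        printf '%s\n' "$addr" | sed -e "$expr"
    done
)

printf '%s\n' "$results"
```

All five lines of output should read user@example.com.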

  • What a nice step-by-step explanation. The end result was unexpected.
    – Yoric
    Commented Jan 23, 2019 at 9:51
  • Excellent explanation. This helped me a lot to understand what was happening. Thanks a ton.
    Commented Jan 23, 2019 at 11:07
