  • GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
  • grep (GNU grep) 2.20
  • grep-2.20-3.el7.x86_64

Can someone explain this puzzle? I'm getting false matches with grep/egrep.

echo "somestringthing" | egrep  '\bstring*'
(no output as expected)
echo "somestringthing" | egrep '\bsomestring*'
somestringthing
echo "somestringthing" | egrep '\bsomestringthingy*'
somestringthing
echo "somestringthing" | egrep '\bsomestringthing1*'
somestringthing
echo "somestringthing" | egrep '\bsomestringthingX*'
somestringthing

The last three should NOT match because of the single character before the wildcard. Experimenting, I've found that any string will match, as if the single character before the wildcard did not exist.

'\b' is a word boundary, FYI.

So am I missing something here, or is this a bug in grep? (Talk about hair-pulling madness trying to debug code you think is working properly.)

  • * in a regexp means zero-or-more, so y* means zero-or-more y characters. Use y+ (or y\+ in BRE, a GNU extension) if you mean "one-or-more y characters", or use .* if you mean "followed by zero-or-more of any other characters".
    – cas
    Commented Aug 30, 2019 at 3:17
  • btw, use grep -E, not egrep. egrep is deprecated
    – cas
    Commented Aug 30, 2019 at 3:19
  • also worth mentioning is that unless you're capturing the match (e.g. with grep's -o option), grep -E '\bstring*' is functionally identical to grep -E '\bstrin'.
    – cas
    Commented Aug 30, 2019 at 3:24
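To see what the comments above describe, it can help to print only the matched text. The following is a minimal sketch using the test string from the question; show_match is a hypothetical helper name, and it assumes a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX).

```shell
# show_match is an illustrative helper, not part of grep.
# It prints only the part of the test string that the pattern consumed.
show_match() {
    printf '%s\n' "somestringthing" | grep -oE -- "$1"
}

# X* is satisfied by zero X characters, so the rest of the pattern matches.
show_match 'somestringthingX*'

# X+ needs at least one X, so nothing matches and grep exits non-zero.
show_match 'somestringthingX+' || echo "(no match)"

# g* consumes the single g, so the match stops at "somestring".
show_match 'somestring*'
```

The first and third calls print the matched text, while the second falls through to the "(no match)" message.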

2 Answers

2

The y*, 1* and X* at the end of the last three regular expressions will match zero or more y, 1 and X respectively.

At the end of the input string somestringthing you do actually have zero or more of these characters (exactly zero), so all three expressions match.

If you want to match one or more y at the end of the string, use y+ or y{1,} in an extended regular expression, or yy* or y\{1,\} in a basic regular expression (grep without -E):

echo somestringthing | grep -E 'somestringthingy+'

(this produces no output)
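The four "one or more" spellings mentioned above can be compared side by side. This is only a sketch: try_ere and try_bre are hypothetical helper names introduced here, and the input is the test string from the question.

```shell
input="somestringthing"

# Illustrative helpers: report whether $input matches the given pattern.
# try_ere uses extended regular expressions (grep -E),
# try_bre uses basic regular expressions (plain grep).
try_ere() {
    if printf '%s\n' "$input" | grep -qE -- "$1"; then echo "match"; else echo "no match"; fi
}
try_bre() {
    if printf '%s\n' "$input" | grep -q -- "$1"; then echo "match"; else echo "no match"; fi
}

try_ere 'somestringthingy+'        # one or more y (ERE)
try_ere 'somestringthingy{1,}'     # one or more y (ERE, interval form)
try_bre 'somestringthingyy*'       # one y, then zero or more y (BRE)
try_bre 'somestringthingy\{1,\}'   # one or more y (BRE, interval form)
try_ere 'somestringthingy*'        # zero or more y, for contrast
```

The first four calls should report "no match", since the input contains no y after "thing"; the last one reports "match" because zero y characters are allowed.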

Also note that egrep is deprecated and you should be using grep -E. If you want to match complete words only, use grep -E -w (this would require a word boundary at the start and end of the match in the input).
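As a rough illustration of the -w behaviour described above, the sketch below runs the same kind of pattern against two inputs. whole_word is a hypothetical helper name, and the second input string ("some string thing") is made up for the example; -w is supported by GNU and BSD grep.

```shell
# whole_word PATTERN INPUT: illustrative helper that prints INPUT
# only if PATTERN matches a complete word in it.
whole_word() {
    printf '%s\n' "$2" | grep -E -w -- "$1"
}

# "somestring" is followed directly by "t", so it is not a whole word here.
whole_word 'somestring' 'somestringthing' || echo "(no match)"

# "string" stands alone between spaces, so the line is printed.
whole_word 'string' 'some string thing'
```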

0

Bahh..more messing around and it seems the character before the * wildcard is being treated as a .

The proper wildcard use for grep is apparently .* not just *

Also, the \b was not required once I used the .* as the wildcard. The -w flag works as expected:

echo "somestringthing" | egrep -w 'somestring.*'
somestringthing

echo "somestringthing" | egrep -w 'somestringy.*'
(no output as expected)
  • No, the character before the * is NOT treated as a . unless it IS a . character. It's treated as zero-or-more of whatever character it happens to be. .* isn't the "proper wildcard for grep"; it's a pattern that matches zero-or-more of any character (. matches any character). And unless you want to capture to the end of the line, you generally don't need a .* at the end of a regexp pattern. Regular expressions are not globs (shell/filename wildcards). They may look similar and share some common features, but they are not the same.
    – cas
    Commented Aug 30, 2019 at 13:11
  • see man 7 glob and man 7 regex
    – cas
    Commented Aug 30, 2019 at 13:13
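The difference between globs and regular expressions that these comments point to can be shown with the same pattern text interpreted both ways. This is a sketch only: glob_test and regex_test are hypothetical helpers, with the shell's case statement standing in for glob matching and grep -E for regex matching.

```shell
s="somestringthing"

# Illustrative helpers. glob_test treats its argument as a shell glob,
# which must match the whole string; regex_test treats it as an
# extended regular expression, which may match anywhere in the string.
glob_test() {
    case "$s" in
        $1) echo "match" ;;
        *)  echo "no match" ;;
    esac
}
regex_test() {
    if printf '%s\n' "$s" | grep -qE -- "$1"; then echo "match"; else echo "no match"; fi
}

# As a glob, * means "any run of characters", so a literal y is required.
glob_test  'somestringy*'

# As a regex, y* means "zero or more y", so the y is optional.
regex_test 'somestringy*'

# The glob somestring* corresponds roughly to the anchored regex below.
glob_test  'somestring*'
regex_test '^somestring.*$'
```

Only the first call should report "no match"; the other three report "match".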
