Skip to main content
added bold text to ephasize the importance of storing the regex in a variable instead of using the pattern as a string literal
Source Link
Trevor Boyd Smith
  • 18.9k
  • 33
  • 129
  • 185
files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*" # put the regex in a variable because some patterns won't work if included literally
for f in $files    # unquoted in order to allow the glob to expand
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

It's better to put the regex in a variable. Some patterns won't work if included literally.It's better to put the regex in a variable. Some patterns won't work if included literally.

 


side note #1 regarding regex anchors:

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.

side note #2 regarding grep and the \K operator:

files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files    # unquoted in order to allow the glob to expand
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

It's better to put the regex in a variable. Some patterns won't work if included literally.

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.

files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*" # put the regex in a variable because some patterns won't work if included literally
for f in $files    # unquoted in order to allow the glob to expand
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

It's better to put the regex in a variable. Some patterns won't work if included literally.

 


side note #1 regarding regex anchors:

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.

side note #2 regarding grep and the \K operator:

clarifying comment
Source Link
Dennis Williamson
  • 355.7k
  • 93
  • 379
  • 441
files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files    # unquoted in order to allow the glob to expand
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done
files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done
files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files    # unquoted in order to allow the glob to expand
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done
edited for readability and grammar and remove "Edit" and "Update".
Source Link
the Tin Man
  • 160k
  • 44
  • 218
  • 306

If you're using Bash, you don't even have to use grep:

files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

Edit: It's better to put the regex in a variable. Some patterns won't work if included literally.

This uses =~ which is Bash's regex match operator. The results of the match are saved to an array called $BASH_REMATCH. The first capture group is stored in index 1, the second (if any) in index 2, etc. Index zero is the full match.

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

123_abc_d4e5
xyz123_abc_d4e5
123_abc_d4e5.xyz
xyz123_abc_d4e5.xyz

To eliminate the second and fourth examples, make your regex like this:

^[0-9]+_([a-z]+)_[0-9a-z]*

which says the string must start with one or more digits. The carat represents the beginning of the string. If you add a dollar sign at the end of the regex, like this:

^[0-9]+_([a-z]+)_[0-9a-z]*$

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.

If you have GNU grep (around 2.5 or later, I think, when the \K operator was added):

name=$(echo "$f" | grep -Po '(?i)[0-9]+_\K[a-z]+(?=_[0-9a-z]*)').jpg

The \K operator (variable-length look-behind) causes the preceding pattern to match, but doesn't include the match in the result. The fixed-length equivalent is (?<=) - the pattern would be included before the closing parenthesis. You must use \K if quantifiers may match strings of different lengths (e.g. +, *, {2,4}).

The (?=) operator matches fixed or variable-length patterns and is called "look-ahead". It also does not include the matched string in the result.

In order to make the match case-insensitive, the (?i) operator is used. It affects the patterns that follow it so its position is significant.

The regex might need to be adjusted depending on whether there are other characters in the filename. You'll note that in this case, I show an example of concatenating a string at the same time that the substring is captured.

If you're using Bash, you don't even have to use grep:

files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

Edit: It's better to put the regex in a variable. Some patterns won't work if included literally.

This uses =~ which is Bash's regex match operator. The results of the match are saved to an array called $BASH_REMATCH. The first capture group is stored in index 1, the second (if any) in index 2, etc. Index zero is the full match.

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

123_abc_d4e5
xyz123_abc_d4e5
123_abc_d4e5.xyz
xyz123_abc_d4e5.xyz

To eliminate the second and fourth examples, make your regex like this:

^[0-9]+_([a-z]+)_[0-9a-z]*

which says the string must start with one or more digits. The carat represents the beginning of the string. If you add a dollar sign at the end of the regex, like this:

^[0-9]+_([a-z]+)_[0-9a-z]*$

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.

If you have GNU grep (around 2.5 or later, I think, when the \K operator was added):

name=$(echo "$f" | grep -Po '(?i)[0-9]+_\K[a-z]+(?=_[0-9a-z]*)').jpg

The \K operator (variable-length look-behind) causes the preceding pattern to match, but doesn't include the match in the result. The fixed-length equivalent is (?<=) - the pattern would be included before the closing parenthesis. You must use \K if quantifiers may match strings of different lengths (e.g. +, *, {2,4}).

The (?=) operator matches fixed or variable-length patterns and is called "look-ahead". It also does not include the matched string in the result.

In order to make the match case-insensitive, the (?i) operator is used. It affects the patterns that follow it so its position is significant.

The regex might need to be adjusted depending on whether there are other characters in the filename. You'll note that in this case, I show an example of concatenating a string at the same time that the substring is captured.

If you're using Bash, you don't even have to use grep:

files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

It's better to put the regex in a variable. Some patterns won't work if included literally.

This uses =~ which is Bash's regex match operator. The results of the match are saved to an array called $BASH_REMATCH. The first capture group is stored in index 1, the second (if any) in index 2, etc. Index zero is the full match.

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

123_abc_d4e5
xyz123_abc_d4e5
123_abc_d4e5.xyz
xyz123_abc_d4e5.xyz

To eliminate the second and fourth examples, make your regex like this:

^[0-9]+_([a-z]+)_[0-9a-z]*

which says the string must start with one or more digits. The carat represents the beginning of the string. If you add a dollar sign at the end of the regex, like this:

^[0-9]+_([a-z]+)_[0-9a-z]*$

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.

If you have GNU grep (around 2.5 or later, I think, when the \K operator was added):

name=$(echo "$f" | grep -Po '(?i)[0-9]+_\K[a-z]+(?=_[0-9a-z]*)').jpg

The \K operator (variable-length look-behind) causes the preceding pattern to match, but doesn't include the match in the result. The fixed-length equivalent is (?<=) - the pattern would be included before the closing parenthesis. You must use \K if quantifiers may match strings of different lengths (e.g. +, *, {2,4}).

The (?=) operator matches fixed or variable-length patterns and is called "look-ahead". It also does not include the matched string in the result.

In order to make the match case-insensitive, the (?i) operator is used. It affects the patterns that follow it so its position is significant.

The regex might need to be adjusted depending on whether there are other characters in the filename. You'll note that in this case, I show an example of concatenating a string at the same time that the substring is captured.

use an if statement for improved clarity
Source Link
Dennis Williamson
  • 355.7k
  • 93
  • 379
  • 441
Loading
Additional information
Source Link
Dennis Williamson
  • 355.7k
  • 93
  • 379
  • 441
Loading
Additional information
Source Link
Dennis Williamson
  • 355.7k
  • 93
  • 379
  • 441
Loading
Source Link
Dennis Williamson
  • 355.7k
  • 93
  • 379
  • 441
Loading