RegEx : Nested Groups and Quantifiers

Question

This is my string: file_1234_test.pdf
The task is to capture the filename without its extension, and the number inside it.
So the result should be:

> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 1234

I found Stack-58379142 but it does not answer my question.

I tested the following patterns on regex101 and regexstorm.

Step 1. as expected

> (.*)\.pdf
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test

Step 2. as expected : greedy '+' quantifier

> (\d+)
> Match 1 = 1234
> Group 1 = 1234

Step 3. still as expected

> ((\d+).*)
> Match 1 = 1234_test.pdf
> Group 1 = 1234_test.pdf
> Group 2 = 1234

Step 4. once again as expected

> ((\d+).*)\.pdf
> Match 1 = 1234_test.pdf
> Group 1 = 1234_test
> Group 2 = 1234

Step 5. '+' quantifier suddenly became lazy

> (.*(\d+).*)\.pdf
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 4

Of course, (.*(\d{4}).*)\.pdf or (.*_(\d+).*)\.pdf work:

> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 1234

But then the pattern is (as I see it) needlessly narrow and too specific. What if I have a list of hundreds and ...

So, the question: is there a solution?

Trung Duong · Accepted Answer · 2023-03-04 13:36:31Z


You could try this regex pattern: (.*?(\d+).*)\.pdf

It makes the first part, .*?, match lazily, so it stops as early as possible and leaves all the digits to (\d+).
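
For comparison, here is a minimal sketch in Python (my choice of language; the question only mentions regex101 and regexstorm) that runs the pattern from step 5 and the lazy variant against the string from the question. The variable names are only illustrative.

```python
import re

name = "file_1234_test.pdf"

# Step 5 pattern: the leading .* is greedy and leaves only one digit for (\d+).
greedy = re.search(r"(.*(\d+).*)\.pdf", name)

# Same pattern with a lazy leading .*? so (\d+) gets the whole number.
lazy = re.search(r"(.*?(\d+).*)\.pdf", name)

print(greedy.group(1), greedy.group(2))  # file_1234_test 4
print(lazy.group(1), lazy.group(2))      # file_1234_test 1234
```

Group 1 is the same in both cases; only the split between the leading part and the digits changes.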



To be clearer: in step 5 the + doesn't become lazy; it is always greedy. But the .* on the left is evaluated first (a pattern is tested from left to right) and is greedy too, so it consumes as many characters as possible and gives back only the single digit that \d+ needs in order to match. Making it lazy solves the problem.
– Casimir et Hippolyte
Commented Mar 4, 2023 at 14:01
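
One way to see this, sketched in Python under the assumption that an extra capture group around the leading .* is acceptable for debugging (that extra group is not part of the original patterns):

```python
import re

name = "file_1234_test.pdf"

# Group 2 now shows what the leading .* consumed; group 3 is the number.
greedy = re.search(r"((.*)(\d+).*)\.pdf", name)
lazy = re.search(r"((.*?)(\d+).*)\.pdf", name)

print(repr(greedy.group(2)), repr(greedy.group(3)))  # 'file_123' '4'
print(repr(lazy.group(2)), repr(lazy.group(3)))      # 'file_' '1234'
```

In the greedy case the leading .* has swallowed "file_123", so the only digit left for \d+ is "4"; \d+ itself still takes everything it can reach.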

Another possibility is to replace the dot in the leading .* with a character class that excludes digits.
– Casimir et Hippolyte
Commented Mar 4, 2023 at 14:06
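
A minimal sketch of that idea in Python, assuming \D (any non-digit) as the character class; [^0-9] would be an alternative. Apart from file_1234_test.pdf, the filenames are made up for illustration.

```python
import re

# \D* cannot run past the first digit, so (\d+) always starts at the first number.
pattern = re.compile(r"(\D*(\d+).*)\.pdf")

# Only the first name comes from the question; the others are hypothetical.
names = ["file_1234_test.pdf", "report_42_final.pdf", "scan7.pdf"]

results = {n: pattern.search(n).groups() for n in names}
for n, groups in results.items():
    print(n, groups)
```

Because \D* can never match a digit, no backtracking into the number is possible, whether the quantifier is greedy or lazy.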
@Trung Problem solved, so simple, should have found it myself
– biburepo
Commented Mar 4, 2023 at 16:35
@Casimir Appreciated your further clarification
– biburepo
Commented Mar 4, 2023 at 16:38
