REGEX Repeater "Or" Operator

Question

I am looking for a regex that matches either 2 [0-9] repeats (and then some other pattern)

[0-9]{2}[A-z]{4}

OR 6 [0-9] repeats (and then some other pattern)

[0-9]{6}[A-z]{4}

The following is too inclusive:

[0-9]{2,6}[A-z]{4}
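To see why {2,6} is too inclusive, here is a small Python sketch using the standard re module. It assumes [A-Za-z]{4} as the "other pattern" and uses re.fullmatch, so the whole string has to match:

```python
import re

# [A-Za-z]{4} is an assumed stand-in for "some other pattern".
too_broad = re.compile(r"[0-9]{2,6}[A-Za-z]{4}")

def digit_counts_accepted(pattern):
    """Return the digit counts n (1..7) for which '1'*n + 'abcd' fully matches."""
    return [n for n in range(1, 8) if pattern.fullmatch("1" * n + "abcd")]

print(digit_counts_accepted(too_broad))  # [2, 3, 4, 5, 6]
```

The counts 3, 4 and 5 are the unwanted matches.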

QUESTION

Is there a way that I can specify either 2 or 6 repeats?

DO NOT USE [A-z] IN A REGEX. To match any ASCII letter, uppercase or lowercase, use [A-Za-z]. [A-z] matches all those, plus several punctuation characters whose code points happen to lie between Z and a. — Alan Moore, Commented Jan 28, 2014 at 21:59
@AlanMoore thank you for this. As I mention in a comment below, [A-z] is not part of my regex I'm working with, I just used it as a stand in for the other parts of my lengthy regex so as to not detract from my question. Thank you for the teaching moment, however (no sarcasm, I actually am new-ish to regex and didn't realize this). — JSK NS, Commented Jan 29, 2014 at 13:17

dee-see · Accepted Answer · 2014-01-28 22:10:33Z

5

You can use the alternation operator | inside a non-capturing group, like this:

(?:[0-9]{2}|[0-9]{6})[A-z]{4}
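Here is a quick check of this pattern in Python. This is a sketch using the standard re module. It substitutes [A-Za-z] for [A-z] and uses re.fullmatch so the whole string has to match. The test strings are made up:

```python
import re

# Alternation inside a non-capturing group: exactly 2 OR exactly 6 digits.
# [A-Za-z] is used here instead of [A-z].
two_or_six = re.compile(r"(?:[0-9]{2}|[0-9]{6})[A-Za-z]{4}")

for s in ["12asdf", "123456asdf", "1234asdf"]:
    # fullmatch() requires the pattern to cover the entire string.
    print(s, bool(two_or_six.fullmatch(s)))
# 12asdf True
# 123456asdf True
# 1234asdf False
```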

Be aware that [A-z] doesn't only match lowercase and uppercase letters. It also matches [, \, ], ^, _ and ` (backtick), which lie between Z and a in the ASCII code points. Use [A-Za-z] for letters, as pointed out by @AlanMoore in his comment.
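This small Python sketch shows the difference in practice. The sample strings are made up:

```python
import re

loose = re.compile(r"[A-z]{4}")     # code-point range 'A'..'z', includes punctuation
strict = re.compile(r"[A-Za-z]{4}")  # ASCII letters only

# "ab_c" and "a^`b" contain characters that lie between 'Z' and 'a'.
for s in ["abcd", "ab_c", "a^`b"]:
    print(s, bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
# abcd True True
# ab_c True False
# a^`b True False
```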

edited Jan 28, 2014 at 22:10

answered Jan 28, 2014 at 19:28

dee-see


-1 for [A-z]. (See my comment under the question.)
– Alan Moore
Commented Jan 28, 2014 at 22:08

@AlanMoore I assumed that OP did this on purpose, but I suppose I could add a note.
– dee-see
Commented Jan 28, 2014 at 22:12
What's the initial ?: for?
– Ayush
Commented Jan 29, 2014 at 8:22
@xbonez: CaffGeek's answer explains the ?:. As a rule, it's best to use non-capturing groups whenever you can, saving capturing groups for those times when you actually want to capture something. It makes your regexes slightly more efficient, but the main reason is that it makes it easier to keep track of which capturing group captures what.
– Alan Moore
Commented Jan 29, 2014 at 9:24
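The following Python sketch illustrates the group-numbering point. The extra capture around [A-Za-z]{4} is hypothetical and was added only so there is something to capture:

```python
import re

text = "123456asdf"

# Capturing group: the digits become group 1, which pushes the letters to group 2.
capturing = re.fullmatch(r"([0-9]{2}|[0-9]{6})([A-Za-z]{4})", text)
print(capturing.groups())  # ('123456', 'asdf')

# Non-capturing group: only the letters are captured, so they are group 1.
non_capturing = re.fullmatch(r"(?:[0-9]{2}|[0-9]{6})([A-Za-z]{4})", text)
print(non_capturing.groups())  # ('asdf',)
```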


CaffGeek · Accepted Answer · 2014-01-28 22:39:49Z

3

This should work:

(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}

Do you have some test cases I can verify it with? Here are a few I tried:

12asdf - passes
123456asdf - passes
1234asdf - fails

However, if you don't anchor the start of the regex to a word boundary (\b) or the start of the line (^), 1234asdf will still produce 34asdf as a partial match.

So either

\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}

or

^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}
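Here is a Python sketch of the difference, using made-up inputs. search() looks for a match anywhere in the string. Note that \b can still match in the middle of a string (after a space, for instance), whereas ^ without a multiline flag only matches at the very start:

```python
import re

unanchored = re.compile(r"(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")
word_anchored = re.compile(r"\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")
line_anchored = re.compile(r"^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")

def first_match(pattern, s):
    """Return the first match found anywhere in s, or None."""
    m = pattern.search(s)
    return m.group() if m else None

for s in ["1234asdf", "code 12asdf"]:
    print(repr(s), first_match(unanchored, s),
          first_match(word_anchored, s), first_match(line_anchored, s))
# '1234asdf' 34asdf None None
# 'code 12asdf' 12asdf 12asdf None
```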

A quick rundown of the regex changes:

(?: ) creates a non-capturing group
| selects between the alternatives [0-9]{2} and [0-9]{6}
^ matches the start of the string (or of each line, in multiline mode)
$ matches the end of the string (or of each line, in multiline mode)
\b matches a word boundary
[a-zA-Z] is used instead of [A-z] because it's likely what was intended (all ASCII letters, regardless of case)
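To illustrate the ^ entry: in Python's re module, as in most flavors, ^ only matches at the start of the whole string unless multiline mode (re.MULTILINE) is on. Here is a small sketch with a made-up two-line input:

```python
import re

text = "xx\n12asdf"
pattern = r"^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}"

# Without MULTILINE, ^ only matches at the start of the whole string.
print(re.findall(pattern, text))                # []
# With MULTILINE, ^ also matches right after each newline.
print(re.findall(pattern, text, re.MULTILINE))  # ['12asdf']
```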

You can also replace your [0-9]s with \d, which is shorthand for any digit (though in some flavors \d also matches non-ASCII Unicode digits). The best way I can think of to write this without getting partial matches is as follows:

(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)
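The following Python sketch exercises this pattern on a few made-up inputs. A match always starts with a digit and ends with a letter, and both are word characters. So \b on its own already succeeds at the start and end of the string, and the shorter \b(?:\d{2}|\d{6})[a-zA-Z]{4}\b should behave the same. The loop checks that for these inputs:

```python
import re

full = re.compile(r"(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)")
short = re.compile(r"\b(?:\d{2}|\d{6})[a-zA-Z]{4}\b")

samples = ["12asdf", "123456asdf", "1234asdf", "12asdfg", "id 12asdf, 654321wxyz"]
for s in samples:
    # findall() returns every non-overlapping whole-pattern match.
    found = full.findall(s)
    assert found == short.findall(s), s
    print(repr(s), found)
# '12asdf' ['12asdf']
# '123456asdf' ['123456asdf']
# '1234asdf' []
# '12asdfg' []
# 'id 12asdf, 654321wxyz' ['12asdf', '654321wxyz']
```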

edited Jan 28, 2014 at 22:39

answered Jan 28, 2014 at 19:28

CaffGeek


-1 for [A-z]. (See my comment under the question.)
– Alan Moore
Commented Jan 28, 2014 at 22:06

@AlanMoore, I copied the "some other pattern" from the user's question. Who are you to judge the validity of that portion of his regex? Leave a comment for the OP, and us, pointing out the potential error, but downvoting for it is simply rude.
– CaffGeek
Commented Jan 28, 2014 at 22:09
Who am I? I'm the guy (apparently the only one) who knows [A-z] is always wrong. Maybe I overreacted, but I was a little shocked to see three people who seem to know something about regexes blindly repeating such a blatant, beginner's error.
– Alan Moore
Commented Jan 28, 2014 at 22:23

Honestly, I saw it, thought it looked odd, but it wasn't the part of the regex the OP was having an issue with and I didn't give it a second thought. I'll update my answer with [a-zA-Z] as was likely intended.
– CaffGeek
Commented Jan 28, 2014 at 22:38
@CaffGeek you are correct. [A-z] is not actually part of the regex I'm working with. I just threw that in as a sample, as the rest of my regex is long and would have side-tracked my original question. +1 to you for providing a good answer.
– JSK NS
Commented Jan 29, 2014 at 13:14


Jerry · Accepted Answer · 2014-01-29 08:08:37Z

3

The classic way would be:

(?:[0-9]{2}|[0-9]{6})[A-z]{4}

(Literally, [0-9]{2} OR [0-9]{6}.)

But you can also use this one, which should be a little more efficient than the above because it involves less potential backtracking:

[0-9]{2}(?:[0-9]{4})?[A-z]{4}

(Here, [0-9]{2} is followed by an optional group of 4 more [0-9], which makes a total of 6 [0-9] when that group matches.)
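This quick Python sketch checks that the two patterns accept the same strings for digit runs of length 1 to 8. It substitutes [A-Za-z] for [A-z]. It does not measure the efficiency claim; it only checks that the matches agree:

```python
import re

alternation = re.compile(r"(?:[0-9]{2}|[0-9]{6})[A-Za-z]{4}")
optional_tail = re.compile(r"[0-9]{2}(?:[0-9]{4})?[A-Za-z]{4}")

# Compare both patterns on n digits followed by four letters.
for n in range(1, 9):
    s = "7" * n + "abcd"
    a = bool(alternation.fullmatch(s))
    b = bool(optional_tail.fullmatch(s))
    assert a == b, s
    print(n, a)  # True only for n == 2 and n == 6
```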

You might not be aware of it, but [A-z] matches more than just letters.

The range [A-z] is effectively equivalent to:

[A-Z\[\\\]^_`a-z]

Notice that the additional characters that match are:

[ \ ] ^ _ `

(The spaces are only there for separation; they are not part of the matched characters.)

This is because those characters lie between the uppercase and lowercase letters in the ASCII (and Unicode) table.
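The extra characters can be listed with a few lines of Python. The sketch walks the code points from A (0x41) to z (0x7A) and keeps anything that is not a letter:

```python
# Every character in the code-point range 'A'..'z' that is not a letter.
extras = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(extras)                           # ['[', '\\', ']', '^', '_', '`']
print([hex(ord(ch)) for ch in extras])  # ['0x5b', '0x5c', '0x5d', '0x5e', '0x5f', '0x60']
```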

edited Jan 29, 2014 at 8:08

answered Jan 28, 2014 at 19:29

Jerry


Of course, depending on your language and the function you use, you may or may not need anchors to force the regex to match the full string being tested.
– Jerry
Commented Jan 28, 2014 at 19:34
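In Python, for instance, the choice of function decides the anchoring:

- re.search looks anywhere in the string.
- re.match is anchored at the start only.
- re.fullmatch must cover the entire string.

Here is a sketch using the optional-group pattern from the answer, with [A-Za-z] in place of [A-z]:

```python
import re

pattern = r"[0-9]{2}(?:[0-9]{4})?[A-Za-z]{4}"

for text in ["1234asdf", "12asdfg"]:
    s = re.search(pattern, text)
    m = re.match(pattern, text)
    f = re.fullmatch(pattern, text)
    # Show the matched text, or None when a function finds no match.
    print(repr(text), s and s.group(), m and m.group(), f and f.group())
# '1234asdf' 34asdf None None
# '12asdfg' 12asdf 12asdf None
```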
-1 for [A-z]. (See my comment under the question.)
– Alan Moore
Commented Jan 28, 2014 at 22:07
@AlanMoore I merely addressed the actual issue the OP was facing and assumed that they knew what [A-z] actually matches (in many cases, OP is right and has their reasons, in others, OP is wrong and not aware of it...). If you insist on being nitpicky, that's not a problem; I can add it to my answer.
– Jerry
Commented Jan 29, 2014 at 8:01
Given the basic level of the question, I think it's safe to assume the OP isn't aware of the underlying issues. He probably saw someone else use [A-z] in a regex and assumed it was a legitimate idiom. And by copying it in your answer you're effectively confirming that assumption. I know it's a trivial error that will almost never cause problems in actual practice, but that's all the more reason to make a lot of noise about it, so anyone who sees it here learns that it's wrong.
– Alan Moore
Commented Jan 29, 2014 at 9:10


Niet the Dark Absol · Accepted Answer · 2014-01-28 19:28:34Z

2

Not obvious, but yes:

(?:\d{2}|\d{6})

answered Jan 28, 2014 at 19:28

Niet the Dark Absol


\d is not guaranteed to be the same as [0-9]. Unicode is wild and wooly!
– Donal Fellows
Commented Jan 28, 2014 at 23:06
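For example, in Python 3, \d in a str pattern matches any Unicode decimal digit, while [0-9] stays ASCII-only, and the re.ASCII flag narrows \d back down. Behavior varies by regex flavor, so this is only a Python illustration:

```python
import re

arabic_indic = "\u0661\u0662"  # ARABIC-INDIC DIGIT ONE and ARABIC-INDIC DIGIT TWO

# In Python 3, \d in a str pattern matches any Unicode decimal digit.
print(bool(re.fullmatch(r"\d{2}", arabic_indic)))            # True
# [0-9] only matches the ASCII digits 0-9.
print(bool(re.fullmatch(r"[0-9]{2}", arabic_indic)))         # False
# The re.ASCII flag restricts \d to [0-9].
print(bool(re.fullmatch(r"\d{2}", arabic_indic, re.ASCII)))  # False
```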

