
I am looking for a regex that matches either 2 [0-9] repeats (and then some other pattern)

[0-9]{2}[A-z]{4}

OR 6 [0-9] repeats (and then some other pattern)

[0-9]{6}[A-z]{4}

The following is too inclusive:

[0-9]{2,6}[A-z]{4}

QUESTION

Is there a way that I can specify either 2 or 6 repeats?

  • 2
    DO NOT USE [A-z] IN A REGEX. To match any ASCII letter, uppercase or lowercase, use [A-Za-z]. [A-z] matches all those, plus several punctuation characters whose code points happen to lie between Z and a.
    – Alan Moore
    Commented Jan 28, 2014 at 21:59
  • @AlanMoore thank you for this. As I mention in a comment below, [A-z] is not part of my regex I'm working with, I just used it as a stand in for the other parts of my lengthy regex so as to not detract from my question. Thank you for the teaching moment, however (no sarcasm, I actually am new-ish to regex and didn't realize this).
    – JSK NS
    Commented Jan 29, 2014 at 13:17

4 Answers


You can use alternation (the "or" operator |) inside a non-capturing group, like this:

(?:[0-9]{2}|[0-9]{6})[A-z]{4}

Be aware that [A-z] doesn't only match uppercase and lowercase letters, but also [, \, ], ^, _, and ` (backtick), which lie between Z and a in ASCII. Use [A-Za-z] for letters, as @AlanMoore pointed out in his comment.
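To see why {2,6} is too inclusive while the alternation is not, here is a small sketch using Python's re module (the language is an assumption; the question doesn't name one). It uses re.fullmatch so the whole string must match, [A-Za-z] stands in for the letter part, and the sample strings are made up.

```python
import re

# Too inclusive: accepts any run of 2 to 6 digits before the letters.
too_wide = re.compile(r"[0-9]{2,6}[A-Za-z]{4}")
# Alternation inside a non-capturing group: exactly 2 or exactly 6 digits.
two_or_six = re.compile(r"(?:[0-9]{2}|[0-9]{6})[A-Za-z]{4}")

for s in ["12abcd", "1234abcd", "123456abcd"]:
    # fullmatch requires the entire string to match, so partial hits don't count.
    print(s, bool(too_wide.fullmatch(s)), bool(two_or_six.fullmatch(s)))
# 12abcd True True
# 1234abcd True False
# 123456abcd True True
```

Only the 4-digit string separates the two patterns.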

  • -1 for [A-z]. (See my comment under the question.)
    – Alan Moore
    Commented Jan 28, 2014 at 22:08
  • @AlanMoore I assumed that OP did this on purpose, but I suppose I could add a note.
    – dee-see
    Commented Jan 28, 2014 at 22:12
  • What's the initial ?: for?
    – Ayush
    Commented Jan 29, 2014 at 8:22
  • @xbonez: CaffGeek's answer explains the ?:. As a rule, it's best to use non-capturing groups whenever you can, saving capturing groups for those times when you actually want to capture something. It makes your regexes slightly more efficient, but the main reason is that it makes it easier to keep track of which capturing group captures what.
    – Alan Moore
    Commented Jan 29, 2014 at 9:24
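A minimal Python sketch (language assumed) of what ?: changes: with a plain group, the digit alternation shows up in groups(); with (?:...) it doesn't. The letters are captured in both patterns only so there is something to compare.

```python
import re

text = "123456abcd"

# Plain (capturing) group: the digit part becomes group 1.
capturing = re.fullmatch(r"([0-9]{2}|[0-9]{6})([A-Za-z]{4})", text)
# Non-capturing group: only the explicitly captured letters form a group.
non_capturing = re.fullmatch(r"(?:[0-9]{2}|[0-9]{6})([A-Za-z]{4})", text)

print(capturing.groups())      # ('123456', 'abcd')
print(non_capturing.groups())  # ('abcd',)
```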

This should work:

(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}

Do you have some test cases I can verify it with? Here are the ones I tried:

  • 12asdf - passes
  • 123456asdf - passes
  • 1234asdf - fails

However, if you don't anchor the start of the regex to a word boundary (\b) or the start of the line (^), the string 1234asdf will produce 34asdf as a partial match.

So either

\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}

or

^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}
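The partial-match point can be checked with Python's re.search, which looks for a match anywhere in the string (Python is an assumption here). The three test strings are the ones listed above.

```python
import re

unanchored = re.compile(r"(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")
word_anchored = re.compile(r"\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")
line_anchored = re.compile(r"^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")

for text in ["12asdf", "123456asdf", "1234asdf"]:
    # search() scans the whole string, so an unanchored pattern can match mid-string.
    hits = [p.search(text) for p in (unanchored, word_anchored, line_anchored)]
    print(text, [m.group() if m else None for m in hits])
# For 1234asdf: ['34asdf', None, None]
```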

A quick rundown of the regex changes:

  • (?: ) creates a non-capturing group
  • | selects between the alternatives [0-9]{2} and [0-9]{6}
  • ^ matches the start of the string (or of each line, in multiline mode)
  • $ matches the end of the string (or of each line, in multiline mode)
  • \b matches a word boundary
  • [a-zA-Z] is used instead of [A-z], as it's likely what was intended (all ASCII letters, regardless of case)

You can also replace each [0-9] with \d, which is shorthand for any digit. The best way I can think of to write this without getting partial matches is:

(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)
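As a rough check of this final pattern, Python's re.findall (language assumed) on a made-up sample string should return only the standalone tokens. Since the pattern has no capturing groups, findall returns the whole matches.

```python
import re

# The answer's final pattern: a boundary on both ends of the token.
pattern = re.compile(r"(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)")

# Made-up sample: two valid tokens, one with 4 digits, one with 5 letters.
text = "12asdf 1234asdf 123456asdf 12asdfg"

print(pattern.findall(text))  # ['12asdf', '123456asdf']
```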
  • -1 for [A-z]. (See my comment under the question.)
    – Alan Moore
    Commented Jan 28, 2014 at 22:06
  • @AlanMoore, I copied the "some other pattern" from the user's question. Who are you to judge the validity of that portion of his regex? Leave a comment for the OP, and us, pointing out the potential error, but downvoting for it is simply rude.
    – CaffGeek
    Commented Jan 28, 2014 at 22:09
  • Who am I? I'm the guy (apparently the only one) who knows [A-z] is always wrong. Maybe I overreacted, but I was a little shocked to see three people who seem to know something about regexes blindly repeating such a blatant, beginner's error.
    – Alan Moore
    Commented Jan 28, 2014 at 22:23
  • Honestly, I saw it, thought it looked odd, but it wasn't the part of the regex the OP was having an issue with and I didn't give it a second thought. I'll update my answer with [a-zA-Z] as was likely intended.
    – CaffGeek
    Commented Jan 28, 2014 at 22:38
  • @CaffGeek you are correct. [A-z] is not actually part of the regex I'm working with. I just threw that in as a sample, as the rest of my regex is long and would have side-tracked from my original question. +1 to you for providing a good answer.
    – JSK NS
    Commented Jan 29, 2014 at 13:14

The classic way would be:

(?:[0-9]{2}|[0-9]{6})[A-z]{4}

(Literally: [0-9]{2} OR [0-9]{6})

But you can also use this one, which should be a little more efficient than the above with less potential backtracking:

[0-9]{2}(?:[0-9]{4})?[A-z]{4}

(Here: [0-9]{2}, then optionally 4 more [0-9], which gives a total of 6 [0-9] when the optional part matches.)
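To check that the two forms accept the same strings, this Python sketch (language assumed; [A-Za-z] used in place of [A-z]) compares them with re.fullmatch on digit runs of length 0 to 8. It does not measure the efficiency claim, only equivalence on these inputs.

```python
import re

alternation = re.compile(r"(?:[0-9]{2}|[0-9]{6})[A-Za-z]{4}")
optional_tail = re.compile(r"[0-9]{2}(?:[0-9]{4})?[A-Za-z]{4}")

# Try n digits followed by four letters, for n = 0..8.
for n in range(9):
    text = "1" * n + "abcd"
    a = alternation.fullmatch(text) is not None
    b = optional_tail.fullmatch(text) is not None
    assert a == b  # both patterns agree on every input tried
    print(n, a)    # only n == 2 and n == 6 print True
```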


You might not be aware of it, but [A-z] matches some characters other than letters.

The range [A-z] is effectively equivalent to:

[A-Z\[\\\]^_`a-z]

Notice that the additional characters that match are:

[ \ ] ^ _ `

(The spaces are only there for separation; they are not part of the matched characters.)

This is because those characters lie between the uppercase and lowercase letters in the ASCII (and Unicode) table.
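The extra characters can be listed directly from their code points. A short Python sketch (language assumed):

```python
import re

# Characters whose code points lie strictly between 'Z' (90) and 'a' (97).
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

# Each of them is matched by [A-z] but not by [A-Za-z].
for ch in between:
    assert re.fullmatch(r"[A-z]", ch)
    assert not re.fullmatch(r"[A-Za-z]", ch)
```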

  • Of course, depending on your language and the function you use, you might need anchors to enforce the application of the regex on the full string to be tested or not.
    – Jerry
    Commented Jan 28, 2014 at 19:34
  • -1 for [A-z]. (See my comment under the question.)
    – Alan Moore
    Commented Jan 28, 2014 at 22:07
  • @AlanMoore I merely addressed the actual issue the OP was facing and assumed that they knew what [A-z] actually matches (in many cases, OP is right and has their reasons, in others, OP is wrong and not aware of it...). If you insist on being nitpicky, that's not a problem, I can add it to my answer.
    – Jerry
    Commented Jan 29, 2014 at 8:01
  • Given the basic level of the question, I think it's safe to assume the OP isn't aware of the underlying issues. He probably saw someone else use [A-z] in a regex and assumed it was a legitimate idiom. And by copying it in your answer you're effectively confirming that assumption. I know it's a trivial error that will almost never cause problems in actual practice, but that's all the more reason to make a lot of noise about it, so anyone who sees it here learns that it's wrong.
    – Alan Moore
    Commented Jan 29, 2014 at 9:10

Not obvious, but yes:

(?:\d{2}|\d{6})
  • \d is not guaranteed to be the same as [0-9]. Unicode is wild and wooly!
    Commented Jan 28, 2014 at 23:06
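To illustrate the comment: in Python 3 (behavior differs between regex engines), \d in a str pattern matches any Unicode decimal digit, such as the Arabic-Indic digits below, unless the re.ASCII flag is set.

```python
import re

# Arabic-Indic digits one and two (U+0661, U+0662), followed by ASCII letters.
text = "\u0661\u0662asdf"

print(bool(re.fullmatch(r"\d{2}[a-zA-Z]{4}", text)))            # True
print(bool(re.fullmatch(r"[0-9]{2}[a-zA-Z]{4}", text)))         # False
print(bool(re.fullmatch(r"\d{2}[a-zA-Z]{4}", text, re.ASCII)))  # False
```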
