
I am new to regular expressions and have been given the following regular expression:

(\p{L}|\p{N}|_|-|\.)*

I know what * means, that | means "or", and that \ escapes.

But I don't know what \p{L} and \p{N} mean. I have searched Google for it without finding anything...

Can someone help me?


2 Answers


\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.

Source: regular-expressions.info

If you're going to work with regular expressions a lot, I'd suggest bookmarking that site; it's very useful.
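A minimal sketch of those two definitions, assuming Go as the host language (the question never names one) because its standard regexp package accepts the same \p{L} and \p{N} escapes. classify is a hypothetical helper written only for this illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// Each pattern matches exactly one code point (rune): one from the Unicode
// general category "Letter" (L), the other from "Number" (N). The ^ and $
// anchors force MatchString to consume the whole input.
var (
	letterRe = regexp.MustCompile(`^\p{L}$`)
	numberRe = regexp.MustCompile(`^\p{N}$`)
)

// classify is a hypothetical helper: it reports which of the two
// categories a single-character string belongs to, if any.
func classify(s string) string {
	switch {
	case letterRe.MatchString(s):
		return "letter"
	case numberRe.MatchString(s):
		return "number"
	default:
		return "other"
	}
}

func main() {
	// Latin, accented Latin, Greek and CJK letters; ASCII, Arabic-Indic and
	// vulgar-fraction numerics; and three punctuation characters.
	for _, s := range []string{"a", "é", "Ω", "中", "7", "٣", "½", "_", "-", "."} {
		fmt.Printf("%q -> %s\n", s, classify(s))
	}
}
```

Run as-is, it should report a, é, Ω and 中 as letters, 7, ٣ and ½ as numbers, and _, - and . as neither, which is why the regex in the question has to list those three punctuation characters separately.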

  • Thanks for the fast answer :). But shouldn't the regex then match 10? I tried an online regex tester: regexpal.com
    – Diemauerdk
    Commented Feb 15, 2013 at 9:10
  • @user1093774: I don't think regexpal supports \p{}, but yes, it should match.
    – Cerbrus
    Commented Feb 15, 2013 at 9:12
  • This syntax is specific to modern Unicode regex implementations, which not all interpreters recognize. You can safely replace \p{L} by [a-zA-Z] (ASCII notation) or \w (Perl/vim notation), and \p{N} by [0-9] (ASCII) or \d (Perl/vim). If you want to match all of them, just use [a-zA-Z0-9]+ or [\w\d]+ Commented Aug 18, 2015 at 2:46
  • Rafael, I don't agree that you can safely replace \p{L} by [a-zA-Z]. [a-zA-Z], for example, will not match any accented character, such as é, which is used all over French. So these are only safely replaceable if you are sure that you will only be processing English, and nothing else.
    – Rolf
    Commented Nov 8, 2017 at 12:19
  • Does it match a code point or a code unit? stackoverflow.com/a/27331885/4928642
    – Qwertiy
    Commented Nov 7, 2018 at 15:39
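The points raised in these comments can be checked side by side in a small sketch. It again assumes Go; asciiRe is a hypothetical ASCII-only variant standing in for the [a-zA-Z]/[0-9] substitution, and both patterns are anchored with ^ and $ so that a match must cover the whole input (an unanchored (...)* can always succeed on an empty match, which makes results harder to read):

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern from the question, anchored so the whole input must match.
var unicodeRe = regexp.MustCompile(`^(\p{L}|\p{N}|_|-|\.)*$`)

// Hypothetical ASCII-only approximation: a-z/A-Z and 0-9 in place of
// \p{L} and \p{N}; everything else is unchanged.
var asciiRe = regexp.MustCompile(`^([a-zA-Z]|[0-9]|_|-|\.)*$`)

func main() {
	inputs := []string{
		"10",      // plain ASCII digits
		"café",    // é is a letter, but outside a-z
		"x_1.2-b", // letters, digits and the three allowed punctuation marks
		"𝒜",       // U+1D49C: one code point, two UTF-16 code units
		"a b",     // a space is allowed by neither pattern
	}
	for _, s := range inputs {
		fmt.Printf("%q: unicode=%v ascii=%v\n", s, unicodeRe.MatchString(s), asciiRe.MatchString(s))
	}
}
```

In Go, "10" and "x_1.2-b" match both patterns, "café" matches only the Unicode one, and "𝒜" matches as a single \p{L} because Go's regexp works on decoded code points (runes). That last result is specific to this engine; the code point versus code unit question may have a different answer in other flavors.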

These are Unicode property shortcuts (\p{L} for Unicode letters, \p{N} for Unicode numeric characters). They are supported by .NET, Perl, Java, PCRE, XML, XPath, JGSoft, Ruby (1.9 and higher) and PHP (since 5.1.0).

At any rate, that's a very strange regex. You should not be using alternation when a character class would suffice:

[\p{L}\p{N}_.-]*
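One way to see that the character class behaves like the original alternation is to run both on a few sample strings. This is another Go sketch (the language and the sample strings are assumptions); checking a handful of inputs illustrates the equivalence but does not prove it:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Original form: a repeated capturing group of five alternatives.
	alternation = regexp.MustCompile(`^(\p{L}|\p{N}|_|-|\.)*$`)
	// Suggested form: a single character class. Inside [...] the dot is
	// literal, and a hyphen placed last is literal rather than a range.
	charClass = regexp.MustCompile(`^[\p{L}\p{N}_.-]*$`)
)

func main() {
	for _, s := range []string{"", "abc", "Ünïcödé_123", "a-b.c", "a/b", "x y", "١٢٣"} {
		a, c := alternation.MatchString(s), charClass.MatchString(s)
		fmt.Printf("%q: alternation=%v class=%v same=%v\n", s, a, c, a == c)
	}
	// Number of capturing groups in each pattern.
	fmt.Println(alternation.NumSubexp(), charClass.NumSubexp())
}
```

Every sample should print same=true, and the last line should print 1 0: the only observable difference is that the original has one capturing group and the character class version has none.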
  • It's a regex in XML; I did not construct the regex myself :)
    – Diemauerdk
    Commented Feb 15, 2013 at 9:13
  • Apart from the fact that capturing parentheses were used, the REs will actually compile to the same thing (well, in any optimizing RE engine that supports the \p{…} escape sequence style in the first place). Commented Feb 15, 2013 at 9:34
  • That looks like the XRegExp Unicode plugin, in which case it would match any alphanumeric character in any language.
    – Tim
    Commented Oct 30, 2015 at 19:10
  • Thanks, listing the supporting languages was useful; I was unaware there were limitations there (most regex-y things being "universal"). Commented Jul 19, 2018 at 20:42
  • @HoldOffHunger: Far from it, unfortunately. That's why there is a market for tools like RegexBuddy. Take a look at regular-expressions.info/refbasic.html; you'll be amazed at the subtle and not-so-subtle differences between regex flavors... Commented Jul 20, 2018 at 6:23
