Regular expression for a hexadecimal number?

Question

How do I create a regular expression that detects hexadecimal numbers in a text?

For example, ‘0x0f4’, ‘0acdadecf822eeff32aca5830e438cb54aa722e3’, and ‘8BADF00D’.

Regex doesn't really parse. Try extracting all number-like things and sift out the ones that aren't hexadecimals. — Blender, Commented Feb 10, 2012 at 1:10

Steven Schroeder · Accepted Answer · 2012-02-10 01:10:12Z

293

How about the following?

0[xX][0-9a-fA-F]+

Matches an expression starting with a 0, followed by either a lowercase or uppercase x, followed by one or more characters in the ranges 0-9, a-f, or A-F.
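As a rough illustration of the pattern above, here is a minimal Python sketch. Python and the sample text are assumptions for the example, not part of the answer. It scans a string built from the question's three examples and shows that only the 0x-prefixed one is found.

```python
import re

# Pattern from the answer: a 0, then x or X, then one or more hex digits.
PREFIXED_HEX = re.compile(r"0[xX][0-9a-fA-F]+")

# Hypothetical sample text built from the question's three examples.
text = "0x0f4 0acdadecf822eeff32aca5830e438cb54aa722e3 8BADF00D"

matches = PREFIXED_HEX.findall(text)
print(matches)  # ['0x0f4']
```

The unprefixed examples are skipped because the pattern requires the literal 0x or 0X prefix.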

answered Feb 10, 2012 at 1:10

Steven Schroeder


54

That could be shortened to /0x[\da-f]+/i, but otherwise, +1.
– Niklas B.
Commented Feb 10, 2012 at 1:13
31

@NiklasB. Your shorthand is only valid with Perl-style regex; with POSIX regex, Steven's solution is the shortest. Either way, Steven's solution works for both Perl and POSIX regex.
– David M. Syzdek
Commented Feb 10, 2012 at 1:39
Got it! Steven's solution is good if the hex number starts with 0x or 0X. For unprefixed uppercase hex, this one should work better: ^[0-9A-F]+$ It also rejects non-hex strings like '535GH0G73', since G and H are not hex digits. In Java, we can use e.g. String.matches() to check this. Thank you guys for the responses :)
– saurcery
Commented Feb 10, 2012 at 2:23
2

'0x[\da-f]{2}' is probably better, to limit the number of digits too.
– Yazan Rawashdeh
Commented Apr 11, 2016 at 12:54
2

Would that match the second and third example numbers, 0acdadecf822eeff32aca5830e438cb54aa722e3 and 8BADF00D?
– Peter Mortensen
Commented Aug 11, 2016 at 12:34


SimonMayer · Accepted Answer · 2012-02-10 01:40:45Z

63

The exact syntax depends on your exact requirements and programming language, but basically:

/[0-9a-fA-F]+/

or, more simply, with the i flag, which makes it case-insensitive:

/[0-9a-f]+/i

If you are lucky enough to be using Ruby, you can do:

/\h+/

EDIT - Steven Schroeder's answer made me realise my understanding of the 0x bit was wrong, so I've updated my suggestions accordingly. If you also want to match 0x, the equivalents are

/0[xX][0-9a-fA-F]+/
/0x[0-9a-f]+/i
/0x[\h]+/i

ADDED MORE - If 0x needs to be optional (as the question implies):

/(0x)?[0-9a-f]+/i
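To make the optional-prefix variant concrete, here is a minimal Python sketch. The choice of Python, the helper name is_hex, and the use of fullmatch to validate a whole string are assumptions for illustration; the answer itself is language-neutral.

```python
import re

# Optional 0x prefix, case-insensitive, as in the last pattern above.
HEX = re.compile(r"(0x)?[0-9a-f]+", re.IGNORECASE)


def is_hex(s: str) -> bool:
    """Return True if the whole string is a hex number (hypothetical helper)."""
    return HEX.fullmatch(s) is not None


candidates = [
    "0x0f4",
    "0acdadecf822eeff32aca5830e438cb54aa722e3",
    "8BADF00D",
    "0xZZ",
    "hello",
]
for candidate in candidates:
    print(candidate, is_hex(candidate))
```

All three examples from the question are accepted, while "0xZZ" and "hello" are rejected.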

edited Feb 10, 2012 at 1:40

answered Feb 10, 2012 at 1:11

SimonMayer


Can you explain the reasoning behind the above RE?
– saurcery
Commented Feb 10, 2012 at 1:16
4

@noobDroid What specifically would you like me to explain?
– SimonMayer
Commented Feb 10, 2012 at 1:19


smathy · Accepted Answer · 2014-02-26 21:13:54Z

30

Not a big deal, but many regex engines (POSIX tools, Perl, PCRE, and Ruby, for example) support the POSIX character classes, and there's [:xdigit:] for matching hex characters, which is simpler than the common 0-9a-fA-F stuff.

So, the regex as requested (i.e. with optional 0x) is: /(0x)?[[:xdigit:]]+/
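Python's built-in re module is one engine that has no POSIX character classes. A possible workaround, sketched here as an assumption rather than anything from the answer, is to build the class from string.hexdigits.

```python
import re
import string

# string.hexdigits is '0123456789abcdefABCDEF'; none of these characters
# is special inside a character class, so it can be embedded directly.
XDIGIT = f"[{string.hexdigits}]"

# Stand-in for /(0x)?[[:xdigit:]]+/ in an engine without POSIX classes.
HEX = re.compile(rf"(?:0x)?{XDIGIT}+")

print(bool(HEX.fullmatch("0x1F")))  # True
print(bool(HEX.fullmatch("beef")))  # True
print(bool(HEX.fullmatch("0x1G")))  # False
```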

answered Feb 26, 2014 at 21:13

smathy


Not powershell or .net unfortunately.
– js2010
Commented May 8, 2021 at 14:07


Adaddinsane · Accepted Answer · 2014-09-08 13:00:51Z

18

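It's worth mentioning that an MD5 hash (32 hex digits) can be detected with the pattern below; the 40-digit example in the question has the length of a SHA-1 hash, which would need {40} instead: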

[0-9a-fA-F]{32}

answered Sep 8, 2014 at 13:00

Adaddinsane


1

+1 for simplicity
– Gray Programmerz
Commented Jul 30, 2022 at 9:36
This could be a problem if there are more than 32 hex digits. You should include some boundary check.
– hochl
Commented Dec 7, 2023 at 10:15
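One possible boundary check, sketched in Python as an assumption (the lookarounds and the sample strings are illustrative, not from the answer or the comment), is to require that no hex digit sits directly before or after the 32-digit run.

```python
import re

# Exactly 32 hex digits, not preceded or followed by another hex digit.
# Lookarounds are one option; \b would be an alternative boundary check.
MD5_LIKE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{32}(?![0-9a-fA-F])")

md5 = "d41d8cd98f00b204e9800998ecf8427e"           # 32 hex digits
sha1 = "0acdadecf822eeff32aca5830e438cb54aa722e3"  # 40 hex digits (from the question)

found = MD5_LIKE.findall(f"md5={md5} sha1={sha1}")
print(found)  # ['d41d8cd98f00b204e9800998ecf8427e']
```

Without the lookarounds, the plain {32} pattern would also match the first 32 digits of the 40-digit string.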


Pawel Furmaniak · Accepted Answer · 2013-07-01 13:35:04Z

12

This will match with or without 0x prefix

(?:0[xX])?[0-9a-fA-F]+
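The non-capturing group matters in practice in some APIs. As a small Python sketch (Python and the sample text are assumptions), re.findall returns the group contents instead of the whole match as soon as the pattern contains a capturing group.

```python
import re

text = "0x1F 7E 300"

# Capturing group: findall returns what the group matched ('' if it did not take part).
with_group = re.findall(r"(0[xX])?[0-9a-fA-F]+", text)
print(with_group)     # ['0x', '', '']

# Non-capturing group: findall returns the whole match.
without_group = re.findall(r"(?:0[xX])?[0-9a-fA-F]+", text)
print(without_group)  # ['0x1F', '7E', '300']
```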

answered Jul 1, 2013 at 13:35

Pawel Furmaniak


joachim · Accepted Answer · 2015-08-12 14:41:53Z

8

If you're using Perl or PHP, you can replace

[0-9a-fA-F]

with:

[[:xdigit:]]

answered Aug 12, 2015 at 14:41

joachim


This ought to be a self-contained answer.
– Peter Mortensen
Commented Aug 15, 2016 at 14:10
Already covered by this answer from 2014.
– snakecharmerb
Commented Oct 10, 2021 at 10:16


Rimian · Accepted Answer · 2017-09-15 11:19:24Z

6

Just for the record I would specify the following:

/^[xX]?[0-9a-fA-F]{6}$/

This differs in that it requires exactly six valid hex characters, optionally preceded by a lowercase or uppercase x.

edited Sep 15, 2017 at 11:19

Rimian


answered Jul 6, 2016 at 21:16

batspy


Tommy Vasquez · Accepted Answer · 2020-06-20 15:18:29Z

5

Another example: hexadecimal values for CSS colors start with a pound sign, or hash (#), followed by six characters, each either a digit or a letter between A and F, inclusive.

^#[0-9a-fA-F]{6}
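A minimal Python sketch of this check follows. The helper name is_css_hex_color is hypothetical, and fullmatch is used here as an assumption so that both ends are anchored, since the pattern above only anchors the start.

```python
import re

# '#' followed by exactly six hex digits; fullmatch anchors both ends,
# so longer strings such as '#1234567' are rejected.
CSS_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def is_css_hex_color(value: str) -> bool:
    """Hypothetical helper: True for strings like '#ff8800'."""
    return CSS_HEX_COLOR.fullmatch(value) is not None


print(is_css_hex_color("#ff8800"))   # True
print(is_css_hex_color("#1234567"))  # False
print(is_css_hex_color("ff8800"))    # False
```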

answered Jun 20, 2020 at 15:18

Tommy Vasquez


Fábio Borges · Accepted Answer · 2017-12-07 00:24:24Z

4

If you are looking for a specific hex character in the middle of a string, you can use "\xhh", where hh is the character code in hexadecimal. I've tried it and it works. I use the Qt framework for C++, but it can solve problems in other cases too, depending on the regex flavor you need to use (PHP, JavaScript, Python, Go, etc.).

This answer was taken from: http://ult-tex.net/info/perl/

edited Dec 7, 2017 at 0:24

answered Jan 2, 2017 at 15:29

Fábio Borges


Hey! While this might be true for perl, it doesn't seem to be the case for Regular Expressions in all programming languages. According to this \x is the equivalent to \u in other languages.
– Maurice
Commented Jan 2, 2017 at 16:07


Local Needs · Accepted Answer · 2014-09-18 17:53:17Z

1

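This one matches exactly three valid pairs:

(([a-fA-F]|[0-9]){2}){3}

Fewer than three pairs of valid characters fail to match. Because the pattern is unanchored, it will still match inside a longer run of hex digits unless you add anchors such as ^ and $.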
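A quick check in Python (an assumption for illustration; the answer names no language) shows how this pattern behaves with and without anchoring. Here search looks anywhere in the string, while fullmatch requires the whole string to match.

```python
import re

THREE_PAIRS = re.compile(r"(([a-fA-F]|[0-9]){2}){3}")

print(bool(THREE_PAIRS.search("a1b2")))         # False: only two pairs
print(bool(THREE_PAIRS.search("a1b2c3d4")))     # True: six of the eight digits match
print(bool(THREE_PAIRS.fullmatch("a1b2c3d4")))  # False: four pairs, anchored
print(bool(THREE_PAIRS.fullmatch("a1b2c3")))    # True: exactly three pairs
```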

answered Sep 18, 2014 at 17:53

Local Needs


Sven · Accepted Answer · 2021-10-10 09:40:20Z

1

In Java this is allowed:

(?:0x?)?[\p{XDigit}]+$

As you can see, the 0x is optional (even the x is optional) in a non-capturing group.

edited Oct 10, 2021 at 9:40

answered Oct 10, 2021 at 7:26

Sven


IndigoLily · Accepted Answer · 2023-05-17 22:42:32Z

If your regex engine has Extended Unicode Support, you can match a character that has the Hex_Digit property with \p{Hex_Digit}. Therefore, to match a hex number optionally prefixed with 0x, the regex would be (0x)?\p{Hex_Digit}+.

However, as @d512 points out in their comment on another answer, this is still a bit naïve, and will also match hex numbers concatenated with non-hex strings. To avoid this, surround the expression with word boundary anchors like so: \b(0x)?\p{Hex_Digit}+\b.

You can see this in action here. Unfortunately, it appears JavaScript doesn't properly support fullwidth characters together with word boundaries, but Rust's main regex crate, and Python with the regex module, do.
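The effect of the word boundaries can be sketched with Python's built-in re module. Note the assumptions: re has no \p{...} support, so an ASCII class [0-9a-fA-F] stands in for \p{Hex_Digit} (fullwidth digits are therefore not covered), and the sample text is made up.

```python
import re

# ASCII-only stand-in for \b(0x)?\p{Hex_Digit}+\b.
HEX_WORD = re.compile(r"\b(?:0x)?[0-9a-fA-F]+\b")

text = "addr 0x1F next to cafe, but not decaf_latte or 12xyz"
found = [m.group(0) for m in HEX_WORD.finditer(text)]
print(found)  # ['0x1F', 'cafe']
```

Hex runs glued to other word characters ("decaf_latte", "12xyz") are skipped, but an ordinary word made only of hex letters, such as "cafe", still matches.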

Paul Razvan Berg · Accepted Answer · 2019-11-24 23:12:46Z

0

In case you need this within an input where the user can type 0 and 0x too but not a hex number without the 0x prefix:

^0?[xX]?[0-9a-fA-F]*$

edited Nov 24, 2019 at 23:12

answered Nov 24, 2019 at 22:43

Paul Razvan Berg


With this solution you allow x without the leading 0. You should group them, to make sure you don't get the x without a 0. So in the example it should be ^(?:0[xX]?)?[0-9a-fA-F]*$.
– Sven
Commented Oct 10, 2021 at 9:40
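The difference between the two patterns can be checked with a short Python sketch (Python and the sample inputs are assumptions for illustration).

```python
import re

ORIGINAL = re.compile(r"^0?[xX]?[0-9a-fA-F]*$")
GROUPED = re.compile(r"^(?:0[xX]?)?[0-9a-fA-F]*$")

# Each row prints the input, then whether ORIGINAL and GROUPED accept it.
for s in ["", "0", "0x", "0x1f", "x1f", "1f"]:
    print(repr(s), bool(ORIGINAL.match(s)), bool(GROUPED.match(s)))
```

Only "x1f" is treated differently: the original pattern accepts it and the grouped one rejects it. Both patterns still accept unprefixed digits such as "1f", because the prefix is optional in each.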


Michał Kawiecki · Accepted Answer · 2022-07-11 18:51:18Z

First, instead of ^ and $, use \b, as it is a word delimiter and helps when the hash is not the only string on the line.

I came here looking for a similar but more specialized regex and came up with this:

\b(\d+[a-f]+\d+[\da-f]*|[a-f]+\d+[a-f]+[\da-f]*)\b

I needed to detect hashes like git commit identifiers (and similar) in console output. More than matching all possible hashes, I prioritize NOT matching random words or numbers like EB or 12345678.

So my heuristic is to assume that a hash alternates between digits and letters reasonably often, and that runs of only digits or only letters are short.

Another important fact is that an MD5 hash is 32 characters long (as mentioned by @Adaddinsane) and git displays a shortened version with only 10 characters, so the above example can be modified as follows:

For 10-character hashes, I assume the groups will be at most 3 characters long:

\b(\d+[a-f]+\d+[\da-f]{1,7}|[a-f]+\d+[a-f]+[\da-f]{1,7})\b

For hashes up to 32 characters long, I assume the groups will be at most 5 characters long:

\b(\d+[a-f]+\d+[\da-f]{17,29}|[a-f]+\d+[a-f]+[\da-f]{17,29})\b

You can easily change a-f to a-fA-F for case insensitivity, or add 0[xX] at the front to match the 0x prefix.

These examples will obviously not match exotic but valid hashes that start with very long runs of only digits or only letters, or extreme hashes such as all 0s. But this way I can match hashes and significantly reduce accidental false positives, such as directory names or line numbers.
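As a sketch of how the first, unrestricted heuristic behaves, here it is applied in Python to a made-up console line (both the language and the sample line are assumptions, not from the answer).

```python
import re

# First heuristic pattern from the answer: the run must switch between
# digits and letters at least twice near its start.
HASH_LIKE = re.compile(r"\b(\d+[a-f]+\d+[\da-f]*|[a-f]+\d+[a-f]+[\da-f]*)\b")

line = "commit 3f2a9c1 fixed line 12345678 in dir EB"
found = [m.group(0) for m in HASH_LIKE.finditer(line)]
print(found)  # ['3f2a9c1']
```

Plain numbers, ordinary words, and letter-only hex words such as "deadbeef" are skipped, which is the trade-off the answer describes.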

Fabian Röling · Accepted Answer · 2023-07-31 23:57:31Z

I took the idea from this answer to ignore words by introducing more conditions and took it to the extreme until I had created this 1000 character monster:

(?<![\dA-ZÄÖÜẞa-zäöüß])\#?(?#phone numbers)(?:\+[\d ]*)?(?:\(\+?[\d ]+\) ?)?(?#AND blocks)(?=(?:(?#first)[a-f]*\d[\da-f]*(?:(?#mid)(?:(?: |\.|\,|\_|\/| ?[\/\-] ?)[\da-f]+)*(?#last)(?: |\.|\,|\_|\/| ?[\/\-] ?)[a-f]*\d[\da-f]*(?#no date+time)(?!\:))?|(?#again for capitals)[A-F]*\d[\dA-F]*(?:(?:(?: |\.|\,|\_|\/| ?[\/\-] ?)[\dA-F]+)*(?: |\.|\,|\_|\/| ?[\/\-] ?)[A-F]*\d[\dA-F]*(?!\:))?)(?#same length)(?![\dA-Fa-f])(?#anchor to end)([^\dA-ZÄÖÜẞa-zäöüß]?.*)$)(?#NAND date)(?!\d{1,4}([\.\_\/\-])\d{1,2}\2\d{1,4}(?#IP)(?!\2?[\da-fA-F]))(?#NAND part date)(?!\d\d([\.\_\/\-])\d{4}(?![\.\_\/\-]?[\da-fA-F]))(?#NAND year+time)(?!\d\d(?:\d\d)? \d{1,2}\:)(?#NAND house+city)(?!\d{1,3}[a-f]? \d{5} [A-Z])(?#AND length>5)[\da-fA-F](?:(?: |\.|\,|\_|\/| ?[\/\-] ?)?[\da-fA-F]){5}(?:(?#1 block)[\da-fA-F]*|(?#mid)[\da-fA-F \.\,\_\/\-]*(?: |\.|\,|\_|\/| ?[\/\-] ?)(?#last)[a-fA-F]*\d[\da-fA-F]*)(?#anchor to same end)(?=\1$)|(?#0x allows more)0x[\da-f]+(?:(?: |\.|\,|\_|\/| ?[\/\-] ?)[\da-f]+)*(?=[^\dA-ZÄÖÜẞa-zäöüß])

My goal was actually slightly different: I wanted to exclude a bunch of annoying unreadable strings from notification texts and TTS. With the explanation below, it should hopefully be reasonably easy to adjust. This regex matches hex numbers, phone numbers, IP addresses and more. It allows grouping like 123 456 789, specifically excludes things like regular words, addresses or dates, and contains what is basically an AND operator, which doesn't really exist in regex. I don't know if anyone invented this before; I couldn't find anything online.

Some example matches: 0x1, #123ABC, ab1 abc 123, +49 (0)12 / 34 - 56, 127.0.0.1
Some example non-matches: 1Abcde (mixed case), 12345 (need 6+ chars, except with "0x"), x123456x, 2023-01-01 00:00, Street 12a 34567 City, decade
Some potentially unintended matches: 100 1/10 b1 (might technically be a valid house number), Eva-Zilcher-Gasse 1a 1100 Vienna (I focused the address exclusion on Germany), cafe420, 2023-01-01 123456

Explanation of the AND/NAND operator

If you want to match either one of two conditions, that's easy: ([ab]|[bc]) matches a, b and c.
But what if you want to match both conditions? Something like ([ab]&[bc]) that matches only b doesn't exist.
Searching for it online results in lots of people actually meaning "a, then b or b, then a" (so both on the same line), which is not AND.
But it is actually possible: (?=[ab])[bc] matches only b!
This works with a "positive lookahead". That is a zero-width group that just checks whether something exists after the current position, without extending the match to include it. To the left of it in this example there is nothing. It first checks whether there is an a or b at the current position, but the cursor stays where it is. Then it checks for b or c at the same position. Before this project, I only used lookaheads at the end of a regex, but they work everywhere.
It gets much more complicated if a condition can have a variable length. For example, a(?=.{2})[bc]+ will match abbbc, even though the first condition only wants 2 characters. That's because both things exist after the a: 2 characters, and a bunch of bs and cs. They are just not the same string. To prevent this, you have to check that everything after each condition is the same string, which anchors the two ends to the same point.
Example: a(?=.{2}(.*)$)[bc]+(?=\1) will only match the abb part of abbbc. Here, .* captures the rest of the line, and the $ ensures that it's actually all of it. (?=\1) then looks ahead to see if the rest of the line after the other condition is the same (or rather, it tries to find a spot where that is the case), without including it in the match. (?=) is not needed for the first occurrence, because it's already inside a lookahead.
In some cases something even more complicated is necessary, because it seems like the regex engine doesn't always like to recalculate capture groups (()) for the backreference (\1) to match. In that case, something similar to the end of the last condition might have to be repeated in the first condition (like a(?=.{1}[bc](.*)$)[bc]+(?=\1)). I don't fully understand that yet, and because of this I'm unsure whether a generalised AND is even possible in all cases, but I managed to make it work in this project, at least. During development, I even found some strange cases where (x|y) matched something, but (y|x) didn't (x and y stand for more complicated expressions here).
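The small examples from this explanation can be tried out directly. The sketch below uses Python's re module, which is an assumption (the answer does not say which engine was used), and sticks to the exact patterns and strings quoted above.

```python
import re

# OR: either class may match.
print(re.findall(r"[ab]|[bc]", "abc"))     # ['a', 'b', 'c']

# AND via lookahead: the character must be in both classes.
print(re.findall(r"(?=[ab])[bc]", "abc"))  # ['b']

# Variable length without a shared end: the lookahead is satisfied by the
# first two characters while [bc]+ runs on to the end.
loose = re.search(r"a(?=.{2})[bc]+", "abbbc").group(0)
print(loose)                               # abbbc

# Shared end: group 1 captures the rest of the line after the two characters,
# and (?=\1) forces [bc]+ to stop where that same rest begins.
anchored = re.search(r"a(?=.{2}(.*)$)[bc]+(?=\1)", "abbbc").group(0)
print(anchored)                            # abb
```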

Explanation of components

(These explanations assume that you already know the most common regex elements, explaining everything from the start would take way too long.)

(?<![\dA-ZÄÖÜẞa-zäöüß]) checks for something to not include before this, so that it doesn't select something at the end of a word.
\#?: # occurs quite often before case numbers or so.
(?#phone) is a comment. I didn't know before this project that comments were possible, but it's really nice to spend less time looking for the right spot to modify in such a giant expression.
(?:) is a non-capturing group. It acts identically to (), except that it can't be referenced with \1 etc., which is nice, because I would have gone past \9 otherwise and the syntax for higher numbers or named references seems to depend on the platform.
(?:\+[\d ]*)?(?:\(\+?[\d ]+\) ?) matches phone numbers, including spaces and potentially one set of brackets and/or a plus before or inside them. Technically it also matches 1(+2), but there's a limit to how much complexity I wanted to implement for increasingly unlikely cases.
Everything marked with (?#AND) or (?#NAND) is a condition for the main part, they all act on the same bit of text.
The first block is [a-f]*\d[\da-f]*, so it must include at least one digit, this must always exist.
(?: |\.|\,|\_|\/| ?[\/\-] ?) is a list of all the possible separators between digits, this occurs a bunch of times in the regex. ␣ is useful for lots of things, . for example for IP addresses, decimals or large numbers in German, , for large numbers or decimals in German, _ for file names with version numbers, / for case numbers, ␣/␣ and ␣-␣ for phone numbers.
(?:(?: |\.|\,|\_|\/| ?[\/\-] ?)[\da-f]+)* means "a separator and then more hex digits, arbitrarily many times". It's inside an optional group (()?), so a single block with no separators also works.
(?: |\.|\,|\_|\/| ?[\/\-] ?)[a-f]*\d[\da-f]* is a mix of the previous two, it's any separator and then a block containing at least one digit. The middle blocks and the last block are together in the optional group, so the last block needs to exist. That means that 1 a 2 can be matched, but not 1 a, because a could theoretically be a word.
(?!\:) checks that there is no : after a multi-block string, to prevent matching e.g. 01-01 00 in the string 2023-01-01 00:00.
The previous five bullet points are then repeated again for capital letters, making sure that lowercase or uppercase is matched, but not a mix of both.
(?![\dA-Fa-f]) makes sure that the maximum length is matched and not some other substring that would work with this, but not another condition, before ([^\dA-ZÄÖÜẞa-zäöüß]?.*)$ selects the entire rest of the line and stores it in \1 for later verification that another condition actually matched the same text.
(?!\d{1,4}([\.\_\/\-])\d{1,2}\2\d{1,4}) is easier compared to the rest, it just checks for three groups of digits with constrained lengths and two of the same separator. This excludes dates in various formats, like 2001-01-01, 1.1.2001 or 01/01/01.
A little addition to that is (?!\2?[\da-fA-F]), which looks if there's another one of that separator and more (hex) digits after it and then includes it in the match again due to the double negative. This is meant for IP addresses or other longer arrangements of number groups.
It's a bit annoying that I had to include (?!\d\d([\.\_\/\-])\d{4}(?![\.\_\/\-]?[\da-fA-F])), which is very similar to the previous condition, but matches just 2 and then 4 digits, not followed by another group, to exclude e.g. a partial date followed by a word. I'm pretty sure there's no way to integrate that into the previous condition. I can't just make the first group of 1-4 digits optional, because then the capturing group for the first separator is not initialised and the backreference doesn't work. Forward-references also don't exist.
Similarly, (?!\d\d(?:\d\d)? \d{1,2}\:) excludes just the day or year and then the hour of a time written directly after it. It's very similar to an earlier date+time-excluding part, but catches slightly different cases.
Normally I would expect people to write addresses like "Street 12a, 34567 City", but they very often don't include the comma, so I wrote a special case for this: (?!\d{1,3}[a-f]? \d{5} [A-Z]). I actually read Wikipedia's article on (German) house numbering for this, but quickly decided to not cover all the madness that's possible in rare cases, because it would exclude way too many intended matches. This part of the regex is also quite German-centric, it doesn't match e.g. Austrian 4-digit postal codes or anything related to USA's numbered roads.
Now the last AND-linked condition: [\da-fA-F](?:(?: |\.|\,|\_|\/| ?[\/\-] ?)?[\da-fA-F]){5} looks simply for exactly 6 hex digits with optional separators, which can then be followed by…
… more hex digits or nothing: [\da-fA-F]*
… or some more blocks, the last of which needs to include a digit: [\da-fA-F \.\,\_\/\-]*(?: |\.|\,|\_|\/| ?[\/\-] ?)[a-fA-F]*\d[\da-fA-F]*
That's quite a bit of repetition of the first condition, as I said in the AND explanation.
And finally for the main part, (?=\1$) anchors this condition to the same end as the first condition, as explained above. It randomly happened that none of the other conditions needed this, they could just look for things at the start of the matched string.
And actually finally, 0x[\da-f]+(?:(?: |\.|\,|\_|\/| ?[\/\-] ?)[\da-f]+)* bypasses all of those rules, because it allows many more cases if the hex number is prefixed with 0x, which I consider to be a good enough indicator that it's actually intended to be a hex number, even if most of it is letters. (?=[^\dA-ZÄÖÜẞa-zäöüß]) makes sure that it matches the maximum possible length, but I'm not sure if this is even necessary, because * and + try to find as much as possible anyway. I also learned that | is enough for OR if you want to include everything on either side; you don't actually need (|).

I learned a lot in this project, which took me multiple days to finish. It was a lot of fun problem solving, interrupted by occasional frustration. Now I'll integrate this into my mail notification macro, hopefully that app supports all these fancy regex features…
