29

What is the default operator precedence in Oracle's regular expressions when they don't contain parentheses?

For example, given

 H|ha+

would it be evaluated as H|h and then concatenated to a as in ((H|h)a), or would the H be alternated with ha as in (H|(ha))?

Also, when does the + kick in, etc.?


2 Answers

24

Using capturing groups to demonstrate the order of evaluation, the regex H|ha+ is equivalent to the following:

(H|(h(a+)))

This is because the precedence rules (as seen below) are applied in order from the highest precedence (the lowest numbered) one to the lowest precedence (the highest numbered) one:

  • Rule 5 → (a+) The + is grouped with the a because this operator works on the preceding single character, back-reference, group (a "marked sub-expression" in Oracle parlance), or bracket expression (character class).

  • Rule 6 → (h(a+)) The h is then concatenated with the group in the preceding step.

  • Rule 8 → (H|(h(a+))) The H is then alternated with the group in the preceding step.
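The grouping above can be checked empirically. The following is only a sketch: it uses Python's `re` module as a stand-in, on the assumption that it applies the same precedence to `|`, concatenation and `+` as the POSIX ERE table does; it is not Oracle's engine. The `wrong` pattern is a hypothetical reading in which `|` would bind tighter than concatenation.

```python
import re

# Assumption: Python's re is used only to illustrate precedence,
# it is not Oracle's regex engine.
implicit = re.compile(r"H|ha+")          # the pattern from the question
explicit = re.compile(r"(H|(h(a+)))")    # fully parenthesised reading
wrong = re.compile(r"(H|h)a+")           # hypothetical reading if | bound tighter

samples = ["H", "ha", "haaa", "Ha", "h", "a"]


def full(pattern, text):
    """Return True when the pattern matches the whole string."""
    return pattern.fullmatch(text) is not None


# For each sample: (implicit, explicit, wrong)
results = {s: (full(implicit, s), full(explicit, s), full(wrong, s)) for s in samples}

for sample, row in results.items():
    print(sample, row)
```

The first two columns agree for every sample, while the third differs on `H` (which needs no trailing `a`) and on `Ha` (which the original pattern rejects as a whole-string match).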



Precedence table from section 9.4.8 of the POSIX docs for regular expressions (there doesn't seem to be an official Oracle table):

+---+----------------------------------------------------------+
|   |             ERE Precedence (from high to low)            |
+---+----------------------------------------------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+

The table above is for Extended Regular Expressions. For Basic Regular Expressions see 9.3.7.
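Rows 7 and 8 of the table have a consequence that is easy to miss: because anchoring binds tighter than alternation, `^a|b$` reads as `(^a)|(b$)` rather than `^(a|b)$`. A minimal sketch of that difference, again assuming Python's `re` module as a stand-in with the same relative precedence (it is not Oracle's engine):

```python
import re

# Assumption: Python's re illustrates the precedence only.
loose = re.compile(r"^a|b$")     # parsed as (^a)|(b$)
tight = re.compile(r"^(a|b)$")   # anchors apply to both alternatives

samples = ["a", "ax", "xb", "xax"]


def found(pattern, text):
    """Return True when the pattern matches anywhere in the string."""
    return pattern.search(text) is not None


anchoring = {s: (found(loose, s), found(tight, s)) for s in samples}

for sample, row in anchoring.items():
    print(sample, row)
```

Here `ax` and `xb` match the unparenthesised pattern, because each alternative carries only one anchor, but neither matches the parenthesised one.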

3
  • Does Java regex support collation symbols?
    Commented Oct 11, 2018 at 3:21
  • @VictorGrazi Collation symbols are defined in the POSIX standard, but unfortunately they are not widely supported. The Java regex package implements a "Perl-like" regex engine, missing a few features (e.g. conditional expressions and comments), but including some extra ones (e.g. possessive quantifiers and variable-length, but finite, look-behind assertions). Neither Perl nor Java supports collating sequences or character equivalences. However, Perl does support "proper" POSIX character classes, although Java only supports them using the \p operator (with a few caveats).
    – robinCTS
    Commented Oct 21, 2018 at 2:59
  • Where is the place for back referencing if we include it in the ERE precedence table?
    – Peng
    Commented Apr 13, 2020 at 13:34
17

Given the Oracle doc:

Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.

And taking a look at the | value in that table:

The expression a|b matches character a or character b.

Plus taking a look at the POSIX doc:

Operator precedence: The order of precedence of operators is as follows:

  1. Collation-related bracket symbols [==] [::] [..]

  2. Escaped characters \

  3. Character set (bracket expression) []

  4. Grouping ()

  5. Single-character-ERE duplication * + ? {m,n}

  6. Concatenation

  7. Anchoring ^ $

  8. Alternation |

I would say that H|ha+ would be the same as (?:H|ha+).
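To illustrate what the non-capturing form is meant to convey, here is a small sketch in Python, whose `re` module accepts the Perl-style `(?:...)` syntax (as the comments note, Oracle does not). It assumes Python's precedence matches the table above and only shows that wrapping the whole pattern in a group changes the number of capturing groups, not what is matched.

```python
import re

# Assumption: Python's re is a stand-in; (?:...) is Perl-style syntax.
bare = re.compile(r"H|ha+")
non_capturing = re.compile(r"(?:H|ha+)")
capturing = re.compile(r"(H|ha+)")

# Number of capturing groups each pattern defines.
group_counts = {p.pattern: p.groups for p in (bare, non_capturing, capturing)}

samples = ["H", "ha", "haaa", "Ha", "h"]

# All three patterns accept or reject each whole string identically.
same_matches = all(
    (bare.fullmatch(s) is None)
    == (non_capturing.fullmatch(s) is None)
    == (capturing.fullmatch(s) is None)
    for s in samples
)

print(group_counts)
print(same_matches)
```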

11
  • ?: in (?: ...) is not Oracle syntax.
    – user5683823
    Commented Apr 26, 2016 at 17:20
  • 2
    @mathguy I agree, but I wanted to show that it's not the same as (H|ha+), because there is no capturing group.
    Commented Apr 26, 2016 at 17:26
  • 2
    As Thomas has shown, alternation (the | operator) has the lowest precedence. A regexp search with H|ha+ will try H first, and only if it can't find a match (after trying all the combinations indicated by other operators) will it, as a last resort, try again with ha+ instead of H. The + is a unary operator; it attaches to a only, and greediness applies.
    – user5683823
    Commented Apr 26, 2016 at 17:35
  • Interesting question regarding precedence: if you have two "alternation" groups, in which order are they evaluated? Answer - the first choice in the first alternation is tried with all the choices in the second alternation, and if that doesn't find any match, then the second choice in the first alternation is tried. select regexp_substr('adibcd', '(a|b)(c|d)') from dual; returns 'ad' (the other choice would have been 'bc', but after checking for ac, the regexp checks for ad before trying bc).
    – user5683823
    Commented Apr 26, 2016 at 17:40
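The 'adibcd' example from the comment above can be reproduced with a backtracking engine. This sketch uses Python's `re` module as a stand-in for Oracle's `regexp_substr`, on the assumption that both try alternatives left to right at the leftmost possible starting position; the variable names are illustrative only.

```python
import re

# Assumption: Python's re is a stand-in for Oracle's regexp_substr.
pattern = re.compile(r"(a|b)(c|d)")
text = "adibcd"

# First match: starting at position 0, 'a' is taken, 'c' fails, 'd' succeeds.
first = pattern.search(text).group(0)

# Every non-overlapping match, scanning left to right.
all_matches = [m.group(0) for m in pattern.finditer(text)]

print(first)
print(all_matches)
```

The first match is `ad`, and `bc` is only found later in the string when the scan continues past it.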
  • 1
    @ThomasAyoub What would be the precedence of the dot operator (".")? Same as the other wildcard operators? I can't seem to find the answer to that anywhere.
    – Venom
    Commented Jul 21, 2017 at 12:46
