I have a string in format @@@substring1@@@substring2, that comes from a black-box.

substring1 could be empty or not, substring2 is always non-empty. @@@ is a delimiter and I could change it via black-box settings. substring1 and substring2 never contain @@@ inside of them.

I need to get the first non-empty substring from this string, e.g. from @@@substring1@@@substring2 I need to get substring1, and from @@@@@@substring2 I need to get substring2.

My black box lets me process the string with an RE2 regex. I can't use external tools such as cut, sed, or awk. Is it possible to do this with a regex alone?

My thoughts are as follows:

regex @@@([^@]+)

  • will produce 1 match with 1 group for @@@@@@substring2 - that is what I need
  • will produce 2 matches with 1 group each for @@@substring1@@@substring2 - that is not what I need, I need only 1 match
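
The behaviour described in the two bullets above can be reproduced with a short sketch. It assumes Python's built-in `re` module as a stand-in for the black box; `re` is not RE2, but the pattern uses only syntax that RE2 also accepts. The variable names are illustrative.

```python
import re

# Pattern from the question: a triple '@' followed by a captured
# non-empty run of non-'@' characters.
pattern = re.compile(r"@@@([^@]+)")

two_parts = "@@@substring1@@@substring2"
one_part = "@@@@@@substring2"

# findall returns the content of group 1 for every match in the input.
matches_two = pattern.findall(two_parts)
matches_one = pattern.findall(one_part)

print(matches_two)  # ['substring1', 'substring2'] (two matches, one too many)
print(matches_one)  # ['substring2'] (a single match, as wanted)
```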

Lookahead / lookbehind assertions (?=re), (?!re), (?<=re), (?<!re) and \K syntax are not supported in RE2 regex.

  • Right, there is no lookahead support in RE2, but one thing is not clear yet: can there be more substrings? If yes, then something like ^.*?(@@@[^@]*)?(@@@[^@]*)$ can work to get one or two captures at the end of the string. Can you use code to parse the regex match result? If yes, something like (?:@@@[^@]*)+$ can work; you would just need to remove the first @@@ and then split on @@@.
    Commented Sep 18, 2023 at 7:53
  • ^.*?(@@@[^@]*)?(@@@[^@]*)$ matches the whole string @@@substring1@@@substring2 - regex101.com/r/HtmcNK/1
    – AntonioK
    Commented Sep 18, 2023 at 8:06
  • There are always 1 (@@@@@@substring2) or 2 (@@@substring1@@@substring2) substrings, cannot be more than 2. No, I can't parse matches with additional "after-regex" logic.
    – AntonioK
    Commented Sep 18, 2023 at 8:08
  • It is not important what is matched; what matters is what is captured. And if you have no access to code, you are stuck: this problem has no plain regex solution with RE2.
    Commented Sep 18, 2023 at 8:12
  • Yes, you are right, whole-string matching is okay here. The problem is that "substring1" would be in capturing-group 1, and "substring2" would be in capturing-group 2, even if "substring1" is empty. So I cannot just "perform regex and use ${1} as a result".
    – AntonioK
    Commented Sep 18, 2023 at 10:31

3 Answers

Match the trailing delimiters as well so that substring2 would not be able to match if substring1 matched:

@@@           # Match triple '@'
([^@]+)       # followed by a non-empty sequence of non-'@' characters, which we capture,
(?:@@@|$)     # then another triple '@' or the end of string.

Try it on regex101.com.

This, of course, relies on a capturing group. If you cannot use capturing groups, then there is no answer.
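
As a quick check of the pattern above, here is a minimal sketch that again assumes Python's `re` module in place of the RE2 black box. The pattern is written on one line without the explanatory comments, and the variable names are illustrative.

```python
import re

# Same pattern as above: because the trailing '@@@' is consumed by the
# first match, no '@@@' is left in front of substring2.
pattern = re.compile(r"@@@([^@]+)(?:@@@|$)")

captures_two = pattern.findall("@@@substring1@@@substring2")
captures_one = pattern.findall("@@@@@@substring2")

print(captures_two)  # ['substring1']
print(captures_one)  # ['substring2']
```

In both cases there is exactly one match, and group 1 holds the first non-empty substring.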

Also, just for fun, here's a PCRE solution:

^                      # Match at the start of the string
(?(?=@@@(.+?)@@@.+)    #                     if it exists
  @@@\1                # the first substring
|                      # or
  @{6}\K.+             # the second substring (preceded by 6 '@' which we forfeit).
)                      #

Try it on regex101.com.

...and an extension of the first regex above which accepts substrings containing fewer than three consecutive @ (see my explanation for the middle expression here):

^(?:@@@)?@@@
((?:@(?:@(?:[^@]|$)|[^@]|$)|[^\n@])+)
(?:@@@|$)

Try it on regex101.com.
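
A small sketch of how this extended pattern behaves on substrings that contain one or two consecutive @, assuming Python's `re` module rather than RE2 itself. The three pattern lines are joined into one string, and the sample inputs are made up for illustration.

```python
import re

# The three lines of the extended pattern, concatenated.
pattern = re.compile(
    r"^(?:@@@)?@@@"
    r"((?:@(?:@(?:[^@]|$)|[^@]|$)|[^\n@])+)"
    r"(?:@@@|$)"
)

# Hypothetical inputs: the wanted substring contains a single '@' or a double '@@'.
single_at = pattern.search("@@@a@b@@@c").group(1)
double_at = pattern.search("@@@@@@x@@y").group(1)

print(single_at)  # a@b
print(double_at)  # x@@y
```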

  • Unfortunately, the ?= syntax is not supported in RE2: "(?=re) before text matching re (NOT SUPPORTED)" github.com/google/re2/wiki/Syntax. But the first example helped me find a working solution for my problem. I will post it separately. Thank you!
    – AntonioK
    Commented Sep 18, 2023 at 9:46
  • @AntonioK I said the second is for PCRE. I know it doesn't work.
    – InSync
    Commented Sep 18, 2023 at 20:19

"... I need to get the first substring from this string, e.g. from @@@substring1@@@substring2 I need to get substring1, from @@@@@@substring2 I need to get substring2. ...

... Is it possible to do that with regex only?"

Yes, you can use the following pattern.

@{3,6}(.+?)(?:@|$)

Yours is also correct; you just need to define where the capture stops.

@@@([^@]+?)(?:@|$)
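
Both patterns from this answer can be tried on the two sample strings with a short sketch. It assumes Python's `re` module as a stand-in for RE2 (neither pattern uses lookaround), and the names are illustrative.

```python
import re

samples = ["@@@substring1@@@substring2", "@@@@@@substring2"]

# First pattern: 3 to 6 '@', then a lazy capture that stops at the next '@' or at the end.
greedy_delims = re.compile(r"@{3,6}(.+?)(?:@|$)")
# Second pattern: the question's regex with an explicit stop condition added.
explicit_stop = re.compile(r"@@@([^@]+?)(?:@|$)")

results_first = [greedy_delims.findall(s) for s in samples]
results_second = [explicit_stop.findall(s) for s in samples]

print(results_first)   # [['substring1'], ['substring2']]
print(results_second)  # [['substring1'], ['substring2']]
```

Each match consumes one @ of the following delimiter, so fewer than three @ remain in front of substring2 and no second match is found.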

Working RE2-flavored solution based on @InSync's answer:

(?:^@@@|^)@@@([^@]+).*$

  • for @@@substring1@@@substring2 it matches the whole string with just one capturing group ${1} containing substring1
  • for @@@@@@substring2 it matches the whole string with just one capturing group ${1} containing substring2
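
The "replace the whole match with ${1}" usage described above can be sketched as follows, assuming Python's `re` module in place of the black box. Python writes the group reference as `\g<1>` instead of `${1}`, and the function name is hypothetical.

```python
import re

# The pattern matches the whole string and has exactly one capturing group.
pattern = re.compile(r"(?:^@@@|^)@@@([^@]+).*$")

def first_non_empty(text: str) -> str:
    # Replace the entire match with the content of group 1.
    return pattern.sub(r"\g<1>", text)

result_two = first_non_empty("@@@substring1@@@substring2")
result_one = first_non_empty("@@@@@@substring2")

print(result_two)  # substring1
print(result_one)  # substring2
```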