28

In other words, I have a string like:

"anything, escaped double-quotes: \", yep" anything here NOT to be matched.

How do I match everything inside the quotes?

I'm thinking

^"((?<!\\)[^"]+)"

But my head spins: should that be a positive or a negative lookbehind? Or does it work at all?

How do I match any characters except a double-quote NOT preceded by a backslash?

3 Answers

56

No lookbehind necessary:

"(\\"|[^"])*"

So: match quotes, and inside them: either an escaped quote (\\") or any character except a quote ([^"]), arbitrarily many times (*).
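
For anyone who wants to try it, here is a minimal sketch in Python (the question names no host language, so Python is an assumption). The helper name `first_quoted` is made up for illustration, and an extra capture group is added so the text between the quotes can be read back:

```python
import re

# The pattern from above. The alternation sits in a non-capturing group and
# the whole repetition is wrapped in one capture group (an addition for this
# sketch) so that group(1) holds the text between the quotes.
QUOTED = re.compile(r'"((?:\\"|[^"])*)"')

def first_quoted(text):
    """Return the contents of the first quoted string, or None (hypothetical helper)."""
    m = QUOTED.search(text)
    return m.group(1) if m else None

sample = r'"anything, escaped double-quotes: \", yep" anything here NOT to be matched.'
print(first_quoted(sample))  # anything, escaped double-quotes: \", yep
```

The escaped-quote branch comes first in the alternation, so a backslash followed by a quote is consumed as a pair before the character class gets a chance to take the backslash on its own.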

8
  • 5
    As chaos mentioned, you probably also want to handle double-backslashes separately (although that wasn't specified by the OP). Commented Aug 29, 2009 at 19:08
  • Hah, here I go again, over-complicating the problem. Didn't think of such a simple solution at all, thanks!
    – Core Xii
    Commented Aug 29, 2009 at 19:11
  • 1
    I'd probably use '\\.' to allow the backslash to escape any single following character, which prevents the regex from being confused by backslash, backslash, (close) double quote. Clearly, you need a more complex expression in place of the dot if you want to handle octal or hex escapes, or Unicode escapes, or ... Commented Aug 29, 2009 at 21:03
  • I only need to escape " and \. Thanks for your help.
    – Core Xii
    Commented Aug 29, 2009 at 23:22
  • 9
    In Ruby it only works in the inverse order: "(\\"|[^"])*"
    – fotanus
    Commented Dec 2, 2014 at 12:42
5

"Not preceded by" translates directly to "negative lookbehind", so you'd want (?<!\\)".

Though here's a question that may ruin your day: what about the string "foo\\"? That is, a double-quote preceded by two backslashes. In most escaping syntaxes the first backslash escapes the second, so the quote is not escaped and should close the string, yet the lookbehind still sees a backslash in front of it.

That sort of thing is kind of why regexes aren't a substitute for parsers.
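
To make the problem concrete, here is a small Python sketch (Python and the helper name `match_quoted` are assumptions, not part of the question). It builds the lookbehind into one plausible full pattern, an opening quote, a lazy run of anything, then a quote not preceded by a backslash, and runs it on both an easy and a tricky input:

```python
import re

# One possible way to use the lookbehind: the closing quote is the first
# quote that is NOT directly preceded by a backslash.
LOOKBEHIND = re.compile(r'".*?(?<!\\)"')

def match_quoted(text):
    """Return the first match including its quotes, or None (hypothetical helper)."""
    m = LOOKBEHIND.search(text)
    return m.group(0) if m else None

ok = r'"say \"hi\"" rest'
tricky = r'"foo\\" and "bar"'

print(match_quoted(ok))      # "say \"hi\""
print(match_quoted(tricky))  # "foo\\" and "
```

On the second input the quote after the two backslashes is skipped, because the lookbehind only checks the single character before it, and the match runs on to the next quote.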

4
  • I’m pretty sure though that a negative lookbehind is more expensive than my solution which uses a negative character class and an alternation. That’s a trivial case for regex engines. Commented Aug 29, 2009 at 19:04
  • What about this? ^"([^"]|(?<!\\)\\")"
    – Core Xii
    Commented Aug 29, 2009 at 19:15
  • Great. Now three backslashes. :)
    – chaos
    Commented Aug 29, 2009 at 19:27
  • 1
    But that's already valid: "foo\\\" The first double-backslash escapes to \, leaving the \" at the end invalid. In the middle of the string: "foo\\\"bar" it again parses correctly, doesn't it?
    – Core Xii
    Commented Aug 29, 2009 at 20:38
0

Here's a variation which permits backslash + anything pairs generally, and then disallows backslashes which are not part of this construct.

^"([^\"]+|\\.)*"

In many regex dialects, you only need a single backslash inside a character class; but what exactly works depends on the regex dialect and on the host language.
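
As an illustration of the dialect point, here is a hedged sketch for Python's `re` module, where the backslash inside the character class has to be doubled. The helper name `quoted_body` and the extra capture group are assumptions added for this example. `re.match` anchors at the start of the string, which stands in for the `^`:

```python
import re

# Same idea as the pattern above, written for Python: the class excludes
# both backslash and quote ([^\\"]), and any backslash must be followed by
# one more character (\\.). The outer capture group is added for this sketch.
PAIRS = re.compile(r'"((?:[^\\"]+|\\.)*)"')

def quoted_body(text):
    """Return the text between the leading quotes, or None (hypothetical helper)."""
    m = PAIRS.match(text)  # match() anchors at the start, like ^
    return m.group(1) if m else None

print(quoted_body(r'"a \"b\" c" tail'))   # a \"b\" c
print(quoted_body(r'"foo\\" and "bar"'))  # foo\\
```

Because backslashes are always consumed in pairs with the following character, the doubled backslash before the closing quote no longer confuses the match. One caveat: the `+` inside the repeated group can cause heavy backtracking on long input that never closes its quote, and dropping the `+` avoids that without changing what matches.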
