95

Is there a way in Python to access match groups without explicitly creating a match object (or another way to beautify the example below)?

Here is an example to clarify my motivation for the question:

The following Perl code

if    ($statement =~ /I love (\w+)/) {
  print "He loves $1\n";
}
elsif ($statement =~ /Ich liebe (\w+)/) {
  print "Er liebt $1\n";
}
elsif ($statement =~ /Je t\'aime (\w+)/) {
  print "Il aime $1\n";
}

translated into Python

m = re.search(r"I love (\w+)", statement)
if m:
  print("He loves", m.group(1))
else:
  m = re.search(r"Ich liebe (\w+)", statement)
  if m:
    print("Er liebt", m.group(1))
  else:
    m = re.search(r"Je t'aime (\w+)", statement)
    if m:
      print("Il aime", m.group(1))

looks very awkward (if-else-cascade, match object creation).

5
  • 1
    Duplicate: stackoverflow.com/questions/122277/…
    – S.Lott
    Commented Mar 31, 2010 at 17:12
  • 3
    Caveat: Python re.match() specifically matches against the beginning of the target. Thus re.match("I love (\w+)", "Oh! How I love thee") would NOT match. You either want to use re.search() or explicitly prefix the regex with appropriate wildcard patterns for re.match(".* I love (\w+)", ...)
    – Jim Dennis
    Commented Mar 31, 2010 at 17:32
  • @Jim Dennis: thanks for pointing that out; I adapted the Python example accordingly
    – Curd
    Commented Mar 31, 2010 at 19:11
  • @S.Lott: oops, you are right. I didn't see it, even though I searched before posting; nevertheless, there are valuable new answers here
    – Curd
    Commented Mar 31, 2010 at 19:18
  • 3
    Possible duplicate of How do you translate this regular-expression idiom from Perl into Python?
    – Brian H.
    Commented Sep 6, 2017 at 13:36
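
The re.match() versus re.search() caveat raised in the comments above can be checked with a short sketch. The variable names here are illustrative only and do not come from the question:

```python
import re

text = "Oh! How I love thee"
pattern = r"I love (\w+)"

# re.match() only tries to match at the very start of the string.
anchored = re.match(pattern, text)

# re.search() scans the string for the first position where the pattern matches.
scanned = re.search(pattern, text)

print(anchored)          # None
print(scanned.group(1))  # thee
```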

5 Answers

76

You could create a little class that returns the boolean result of calling match, and retains the matched groups for subsequent retrieval:

import re

class REMatcher(object):
    def __init__(self, matchstring):
        self.matchstring = matchstring

    def match(self,regexp):
        self.rematch = re.match(regexp, self.matchstring)
        return bool(self.rematch)

    def group(self,i):
        return self.rematch.group(i)


for statement in ("I love Mary", 
                  "Ich liebe Margot", 
                  "Je t'aime Marie", 
                  "Te amo Maria"):

    m = REMatcher(statement)

    if m.match(r"I love (\w+)"): 
        print "He loves",m.group(1) 

    elif m.match(r"Ich liebe (\w+)"):
        print "Er liebt",m.group(1) 

    elif m.match(r"Je t'aime (\w+)"):
        print "Il aime",m.group(1) 

    else: 
        print "???"

Update for Python 3 print as a function, and Python 3.8 assignment expressions - no need for a REMatcher class now:

import re

for statement in ("I love Mary",
                  "Ich liebe Margot",
                  "Je t'aime Marie",
                  "Te amo Maria"):

    if m := re.match(r"I love (\w+)", statement):
        print("He loves", m.group(1))

    elif m := re.match(r"Ich liebe (\w+)", statement):
        print("Er liebt", m.group(1))

    elif m := re.match(r"Je t'aime (\w+)", statement):
        print("Il aime", m.group(1))

    else:
        print("???")
10
  • 1
    It might be verbose, but you'll put the REMatcher class in a nice module which you'll import whenever needed. You wouldn't ask this question for an issue that won't come up again in the future, would you?
    – tzot
    Commented Mar 31, 2010 at 22:10
  • 4
    @ΤΖΩΤΖΙΟΥ: I agree; but, why isn't such a class in module re yet?
    – Curd
    Commented Apr 1, 2010 at 8:34
  • @Curd: because you're the one to bring it up. Thousands of other submitters to the Python code base have lived fine without it, so why should there be such a class in the re module? In any case, if you think such functionality belongs to the re module, you're more than welcome to supply a patch. Otherwise, please refrain from asking "why aren't things like I think they should be?" questions, because they are non-productive.
    – tzot
    Commented Apr 1, 2010 at 12:49
  • 18
    @ΤΖΩΤΖΙΟΥ: I disagree. Being satisfied by the fact that "thousands of others" didn't consider introducing it is just silly. How can I be sure that there is no good reason not to have such a class if I don't ask "Why"? I don't see one, but maybe somebody else does and can explain it (and thus give a better insight into the philosophy of Python). Here is a good example that such questions are productive: stackoverflow.com/questions/837265/…
    – Curd
    Commented Apr 4, 2010 at 20:54
  • 1
    “Why” questions are generally productive, but your question falls in the subcategory “Why not how I like” (emphasis on “how I like”), which cannot be answered. You consider that such a function/class would be most useful, and then ask why others haven't acted upon it. For a change to occur, the motivated (here: you) has to justify the change to the rest of the community (here: the Python community). It's quite self-centered and non-productive to ask the community why your desired change hasn't already been introduced.
    – tzot
    Commented Apr 5, 2010 at 8:30
33

Less efficient, but simpler-looking:

m0 = re.match(r"I love (\w+)", statement)
m1 = re.match(r"Ich liebe (\w+)", statement)
m2 = re.match(r"Je t'aime (\w+)", statement)
if m0:
  print("He loves", m0.group(1))
elif m1:
  print("Er liebt", m1.group(1))
elif m2:
  print("Il aime", m2.group(1))

The problem with the Perl stuff is the implicit updating of some hidden variable. That's simply hard to achieve in Python because you need to have an assignment statement to actually update any variables.

The version with less repetition (and better efficiency) is this:

pats = [
    (r"I love (\w+)", "He loves {0}"),
    (r"Ich liebe (\w+)", "Er liebt {0}"),
    (r"Je t'aime (\w+)", "Il aime {0}"),
]
for pattern, template in pats:
    m = re.match(pattern, statement)
    if m:
        print(template.format(m.group(1)))
        break

A minor variation that some Perl folk prefer:

pats = {
    r"I love (\w+)": "He loves {0}",
    r"Ich liebe (\w+)": "Er liebt {0}",
    r"Je t'aime (\w+)": "Il aime {0}",
}
for pattern in pats:
    m = re.match(pattern, statement)
    if m:
        print(pats[pattern].format(m.group(1)))
        break

This is hardly worth mentioning except it does come up sometimes from Perl programmers.
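
As a rough sketch of the table-driven version above, the loop can be wrapped in a function with precompiled patterns. The names PATTERNS and translate are hypothetical, chosen only for illustration, and returning None for "no match" is an assumption about what the caller wants:

```python
import re

# Hypothetical table: (compiled pattern, output template) pairs, tried in order.
PATTERNS = [
    (re.compile(r"I love (\w+)"), "He loves {0}"),
    (re.compile(r"Ich liebe (\w+)"), "Er liebt {0}"),
    (re.compile(r"Je t'aime (\w+)"), "Il aime {0}"),
]


def translate(statement):
    """Return the formatted text for the first matching pattern, or None."""
    for pattern, template in PATTERNS:
        m = pattern.match(statement)
        if m:
            # Stop at the first hit, so later patterns are never evaluated.
            return template.format(m.group(1))
    return None


print(translate("Ich liebe Margot"))  # Er liebt Margot
print(translate("Te amo Maria"))      # None
```

Because the function returns on the first hit, no pattern after the matching one is tried, which also addresses the concern about unnecessary matches.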

2
  • 4
    @S.Lott: OK, your solution avoids the if-else cascade, but at the expense of doing unnecessary matches (m1 and m2 are not needed if m0 matches); that's why I am not really satisfied with this solution.
    – Curd
    Commented Mar 31, 2010 at 15:34
  • If the key order in your last variation is significant, be sure to tell the OP to use an OrderedDict.
    – PaulMcG
    Commented May 18, 2014 at 19:43
24

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can capture the value of the condition re.search(pattern, statement) in a variable (let's call it match), check that it is not None, and then reuse it within the body of the condition:

if match := re.search(r"I love (\w+)", statement):
  print(f'He loves {match.group(1)}')
elif match := re.search(r"Ich liebe (\w+)", statement):
  print(f'Er liebt {match.group(1)}')
elif match := re.search(r"Je t'aime (\w+)", statement):
  print(f'Il aime {match.group(1)}')
2

This is not a regex solution.

alist = {"I love ": "He loves", "Je t'aime ": "Il aime", "Ich liebe ": "Er liebt"}
for k in alist:
    if k in statement:
        print(alist[k], statement.split(k)[1:])
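
A variant of the same non-regex idea, sketched with str.partition so that the remainder comes back as a string rather than a list. PHRASES and rephrase are hypothetical names used only for this illustration. Unlike the (\w+) group in the regex versions, this keeps everything after the phrase, not just the next word:

```python
# Hypothetical phrase table: input phrase -> output phrase.
PHRASES = {
    "I love ": "He loves",
    "Ich liebe ": "Er liebt",
    "Je t'aime ": "Il aime",
}


def rephrase(statement):
    """Return the rewritten statement for the first phrase found, or None."""
    for phrase, replacement in PHRASES.items():
        # partition() returns (before, separator, after); the separator is
        # an empty string when the phrase does not occur in the statement.
        _, found, rest = statement.partition(phrase)
        if found:
            return f"{replacement} {rest}"
    return None


print(rephrase("Je t'aime Marie"))  # Il aime Marie
```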
1

You could create a helper function:

import re

def re_match_group(pattern, string, out_groups):
    # Overwrite the caller's list in place with the captured groups;
    # the list is left empty when there is no match.
    result = re.match(pattern, string)
    out_groups[:] = result.groups() if result else []
    return result

And then use it like this:

groups = []
if re_match_group(r"I love (\w+)", statement, groups):
    print("He loves", groups[0])
elif re_match_group(r"Ich liebe (\w+)", statement, groups):
    print("Er liebt", groups[0])
elif re_match_group(r"Je t'aime (\w+)", statement, groups):
    print("Il aime", groups[0])

It's a little clunky, but it gets the job done.
