I am trying to match the source when the time is between 12 PM and 4 PM.

st = "error 1/23/2020 11:53:41 PM Microsoft-Windows-DistributedCOM 10010 None The server {4AA0A5C4-1B9B-4F2E-99D7-99C6AEC83474} did not register with DCOM within the required timeout."

".+/2020 (?<=12:\d\d:\d\d PM)|(?<=[1-3]:\d\d:\d\d PM) (.+?) \d+"

Why does the above regex capture Microsoft-Windows-DistributedCOM, given that the time is 11:53:41 PM? Isn't it supposed to ignore it?

    "1:53:41" is a part of "11:53:41", and you don't check that there's a space before it.
    – Grismar
    Commented Jun 15, 2022 at 3:25

1 Answer

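The problem with your pattern is that (?<=[1-3]:\d\d:\d\d PM) only inspects the ten characters immediately before the current position, so it is satisfied by 1:53:41 PM, the tail of 11:53:41 PM. Nothing in the pattern requires a space before the hour. Note also that the top-level | splits the whole pattern into two alternatives, so the .+/2020 prefix applies only to the 12 branch. I suggest using re.findall here and doing away with the lookbehinds, which are clumsy and error-prone.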

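import re

st = "error 1/23/2020 11:53:41 PM Microsoft-Windows-DistributedCOM 10010 None The server {4AA0A5C4-1B9B-4F2E-99D7-99C6AEC83474} did not register with DCOM within the required timeout."
# The hour must directly follow the date, so 11 cannot pass as 1.
# The space before \d+ keeps the captured source free of a trailing space.
matches = re.findall(r'\d+/\d+/2020 (?:[1-3]|12):\d{2}:\d{2} PM (.+?) \d+', st)
print(matches)  # []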

Note that if the timestamp were 1/23/2020 12:53:41 PM, then the match would be Microsoft-Windows-DistributedCOM.
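
To see both behaviours side by side, here is a minimal sketch that runs the pattern from the question and an anchored findall pattern against an 11 PM line and a 12 PM line. The variable names are only illustrative, and the anchored pattern assumes a single space between the source and the event ID, as in the sample line.

```python
import re

late = ("error 1/23/2020 11:53:41 PM Microsoft-Windows-DistributedCOM 10010 None "
        "The server {4AA0A5C4-1B9B-4F2E-99D7-99C6AEC83474} did not register "
        "with DCOM within the required timeout.")
# Same line with the hour changed to 12, as in the note above.
noon = late.replace("11:53:41", "12:53:41")

# Pattern from the question: the second lookbehind is satisfied by
# "1:53:41 PM", which is the tail of "11:53:41 PM".
original = r".+/2020 (?<=12:\d\d:\d\d PM)|(?<=[1-3]:\d\d:\d\d PM) (.+?) \d+"
original_hit = re.search(original, late)
print(original_hit.group(1))  # Microsoft-Windows-DistributedCOM

# Anchored pattern: the hour must come right after the date and a space,
# so "11" can no longer be read as "1".
fixed = r"\d+/\d+/2020 (?:[1-3]|12):\d{2}:\d{2} PM (.+?) \d+"
fixed_late = re.findall(fixed, late)
fixed_noon = re.findall(fixed, noon)
print(fixed_late)  # []
print(fixed_noon)  # ['Microsoft-Windows-DistributedCOM']
```

If you prefer to keep a lookbehind-style check, the same idea applies: whatever precedes the hour has to be pinned down (a space, the date, or a word boundary), otherwise a two-digit hour can match on its last digit.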
