
I am trying to set up a good log analyzer for our Tomcat application. I was able to create a basic parser and analyzer with Python regex and the pandas statistics features.

The parser basically extracts the timestamp, log level, thread name, class name, and error part.
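A minimal sketch of such a line parser, assuming a typical Tomcat/Confluence log layout (`timestamp LEVEL [thread] [class] message`); the exact format and the sample class name are assumptions and would need adjusting to your actual logs:

```python
import re

# Assumed line layout (adjust to your actual log format):
# 2022-03-16 14:23:45,123 ERROR [http-nio-8090-exec-5] [com.example.SomeClass] message...
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"\[(?P<cls>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

sample = (
    "2022-03-16 14:23:45,123 ERROR [http-nio-8090-exec-5] "
    "[com.example.PermissionCheck] logAncestorsTableFailure "
    "Detected ancestors table corruption for pageId: 715588532."
)
rec = parse_line(sample)
```

Parsed records like `rec` can then be loaded into a pandas DataFrame for the statistics step.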

However, the error part is not uniform and doesn't follow a specific pattern. Even when ignoring the stack trace and parsing only the main error message, there is still no consistent pattern, because we use plugins from different vendors and each follows different rules for formatting its errors.

One thing that can be done is to extensively identify and group errors manually and create a reference file of sub-parsing rules. We already did that using a reference XML (based on one provided by the vendor to identify known errors), but it requires a lot of additional effort to add new rules for unknown errors.

I am wondering: if we can parse these manually, is it possible to do the same with a parser alone, without a reference sheet?

Example:

logAncestorsTableFailure Detected ancestors table corruption for pageId: 715588532. Access to this page is blocked for all users as inherited permissions cannot be determined. To resolve this, rebuild the ancestors table. See https://confluence.atlassian.com/display/DOC/Rebuilding+the+Ancestor+Table
logAncestorsTableFailure Detected ancestors table corruption for pageId: 685814402. Access to this page is blocked for all users as inherited permissions cannot be determined. To resolve this, rebuild the ancestors table. See https://confluence.atlassian.com/display/DOC/Rebuilding+the+Ancestor+Table

To identify the above kind of error, I could create a reference rule for it: logAncestorsTableFailure Detected ancestors table corruption for pageId
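Such a reference rule might be applied roughly like this; the rule table and the error identifier are hypothetical stand-ins for the vendor XML:

```python
# Hypothetical reference rules: each maps a known message prefix to an error id.
# In practice these would be loaded from the vendor-provided reference XML.
RULES = {
    "logAncestorsTableFailure Detected ancestors table corruption for pageId":
        "ANCESTORS_TABLE_CORRUPTION",
}

def classify(message):
    """Return the error id of the first matching rule, or 'UNKNOWN'."""
    for prefix, error_id in RULES.items():
        if message.startswith(prefix):
            return error_id
    return "UNKNOWN"
```

Everything falling through to `UNKNOWN` is exactly the gap described below.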

There are two problems with this approach:

  1. We don't know whether the remaining part of the error message is similar or different. With this method, we may therefore miss important errors that were not identified before.
  2. As discussed above, it requires significant upfront effort to identify all possible error messages and patterns.

So, is it possible to parse these without such a reference, or do we need to step into AI for this?

If I look at the errors myself, I can spot the pattern. Is it human intelligence that is doing the job there?

In other words, can some sort of aggregator be used to automatically identify similar patterns?

1 Answer


Is it theoretically possible to create a log parser/analyzer with the quality of manual parsing without the use of AI?

Absolutely. All you have to do is grab a copy of the latest edition of the Cambridge Grammar of the English Language and implement all the rules described in its 1,860 pages. You'll also need a decent dictionary of English words: a complete database thereof, which may consume a lot of space.

In all seriousness, you'll have to simplify the task. That means periodically going through the logs, identifying keywords and sequences that are meaningful to you, and continuing to refine your filtering or sorting algorithm.

  • Yes, that will be the most accurate. Thinking about it now, I have one idea for parsing errors: usually, similar errors differ in only one thing, a URL or some numeric value. Maybe I could create a parser that converts URLs and numeric values to a single identifier and then extract the distinct patterns.
    – GP92
    Commented Mar 16, 2022 at 16:10
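The masking idea from the comment can be sketched as follows: substitute URLs and numbers with placeholder tokens so that messages differing only in those variable parts collapse to one template, then count templates. The placeholder names and regexes are assumptions, not part of any library:

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
NUM_RE = re.compile(r"\b\d+\b")

def template(message):
    """Mask variable parts (URLs first, then bare numbers) to get a stable pattern."""
    msg = URL_RE.sub("<URL>", message)
    msg = NUM_RE.sub("<NUM>", msg)
    return msg

# The two example messages from the question differ only in pageId.
messages = [
    "logAncestorsTableFailure Detected ancestors table corruption for pageId: 715588532. "
    "Access to this page is blocked for all users as inherited permissions cannot be "
    "determined. To resolve this, rebuild the ancestors table. "
    "See https://confluence.atlassian.com/display/DOC/Rebuilding+the+Ancestor+Table",
    "logAncestorsTableFailure Detected ancestors table corruption for pageId: 685814402. "
    "Access to this page is blocked for all users as inherited permissions cannot be "
    "determined. To resolve this, rebuild the ancestors table. "
    "See https://confluence.atlassian.com/display/DOC/Rebuilding+the+Ancestor+Table",
]
counts = Counter(template(m) for m in messages)
```

Both messages map to one template, so `counts` has a single key with count 2; sorting such counts (e.g. via `pandas.Series.value_counts` on the templated column) surfaces the dominant error patterns without a reference sheet. Rare or singleton templates are then the candidates worth reviewing manually.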
