
Let's say I have a folder called logs that contains N subfolders. Each subfolder holds the events for one specific event type and contains N .log files, where each file has multiple lines of JSON (one JSON object per line).

Example:

event1.1.log

{"id":1, "name": "ABCD"}
{"id":2, "name": "EFGH"}
{"id":5, "name": "IJKL"}
{"id":7, "name": "MNOP"}

event1.2.log

{"id":3, "name": "ABCD"}
{"id":4, "name": "EFGH"}
{"id":6, "name": "IFKL"}
{"id":8, "name": "ABED"}

Each event type can have its own structure, but it's guaranteed that every log line for the same event type always has the same structure.

Now, I need a way to run ad hoc queries on these files: get a list of students, get the top ten students, etc.

I thought of loading them into a temporary table and then running queries against it, but I was wondering whether there is another way to do this.
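For what it's worth, here is a minimal sketch of that temporary-table idea using SQLite from Python. The folder layout, table name, and columns (id, name) are only illustrative and would need to match the real event structure; the point is that the database file, once built, can be re-queried later without re-parsing the logs.

import glob
import json
import sqlite3

# Build (or reuse) an on-disk database so later queries don't re-parse the logs.
con = sqlite3.connect("events.db")
con.execute("CREATE TABLE IF NOT EXISTS event1 (id INTEGER, name TEXT)")

# Load every log line of one event type; each line is a single JSON object.
for path in glob.glob("logs/event1/*.log"):
    with open(path) as f:
        rows = (json.loads(line) for line in f if line.strip())
        con.executemany("INSERT INTO event1 (id, name) VALUES (:id, :name)", rows)
con.commit()

# Ad hoc queries are then plain SQL, e.g. a "top ten" query.
top_ten = con.execute("SELECT id, name FROM event1 ORDER BY id DESC LIMIT 10").fetchall()
print(top_ten)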

I could write an application that parses the files in memory, but the amount of data could be too large to hold and compute on in memory. And every time I want to run a different query on the same dataset within the next few days, it would have to parse all the files into memory again.
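(To be fair, an application doesn't necessarily have to hold the whole dataset at once; the files can be streamed line by line and only the running result kept. A rough sketch of the "top ten" case, with illustrative paths and field names:)

import glob
import heapq
import json

def records(pattern):
    # Yield one parsed JSON object at a time; nothing is kept in memory
    # beyond the current line and the running result.
    for path in glob.glob(pattern):
        with open(path) as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

# Top ten records by id without materializing the whole dataset.
top_ten = heapq.nlargest(10, records("logs/event1/*.log"), key=lambda r: r["id"])
print(top_ten)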

Any approaches on this?

  • There are tools such as jq, but it only supports comparatively simple queries.
    – amon
    Commented May 9, 2021 at 21:19
  • What amount of data do you consider "huge"? Is reading the data into memory really not an option? For ad hoc queries, flexibility is often more important than performance, so some Python code that reads the files, performs the selection and sorting, and writes out results would most likely be the simplest solution. Commented May 10, 2021 at 12:51

1 Answer


You want to upload these to a "query-time schema" (schema-on-read) database such as Splunk, or use the ELK stack if the data has some structure.

https://aws.amazon.com/elasticsearch-service/the-elk-stack/#:~:text=The%20ELK%20stack%20is%20an,Elasticsearch%2C%20Logstash%2C%20and%20Kibana.
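As a rough illustration of the Elasticsearch route, the log lines can be bulk-indexed and then queried through Kibana or the search API. This is only a sketch using the Python client (elasticsearch-py); the host, index name, and sort field are assumptions, and the exact keyword arguments can vary between client versions.

import glob
import json
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def actions(pattern, index):
    # One bulk action per JSON log line.
    for path in glob.glob(pattern):
        with open(path) as f:
            for line in f:
                if line.strip():
                    yield {"_index": index, "_source": json.loads(line)}

helpers.bulk(es, actions("logs/event1/*.log", "event1"))

# Example ad hoc query: top ten documents sorted by id.
hits = es.search(index="event1", sort=[{"id": "desc"}], size=10)
print(hits["hits"]["hits"])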
