© 2023 VictoriaMetrics
VictoriaLogs: preview
Aliaksandr Valialkin, CTO VictoriaMetrics
Existing open source log management systems
● ELK stack - https://www.elastic.co/observability/log-monitoring
● Grafana Loki - https://grafana.com/oss/loki/
ELK stack
● Strong points:
○ Fast full-text search via ElasticSearch
○ Widespread use of Logstash and Fluentbit for log ingestion
○ Advanced log analysis via Kibana
● Weak points:
○ Slow data ingestion
○ High CPU and RAM usage during data ingestion
○ Poor on-disk compression for stored logs
○ No log stream concept
○ No way to query advanced stats from access logs
○ No CLI integration (e.g. $ elk "some query" | grep … | sort … | tail)
Grafana Loki
● Strong points:
○ Lower CPU and RAM usage during data ingestion
○ Good on-disk compression for stored logs
○ Log stream concept - https://grafana.com/docs/loki/latest/fundamentals/overview/
○ Ability to query advanced stats from access logs
○ CLI integration - https://grafana.com/docs/loki/latest/tools/logcli/
● Weak points:
○ LogQL is non-trivial to use for typical queries - https://grafana.com/docs/loki/latest/logql/
○ Full-text search queries may be slow
○ The Grafana UI for logs is weaker than Kibana
○ No way to set individual labels per log line (ip, user_id, trace_id, etc.) during data
ingestion - https://grafana.com/docs/loki/latest/fundamentals/labels/ . This frequently leads to
high-cardinality issues.
What is VictoriaLogs?
● Open source log management system from VictoriaMetrics
● Easy to set up and operate
● Scales vertically and horizontally
● Optimized for low resource usage (CPU, RAM, disk space)
● Accepts data from Logstash and Fluentbit in Elasticsearch format
● Accepts data from Promtail in Loki format
● Supports the stream concept from Loki
● Provides an easy-to-use yet powerful query language - LogsQL
● It is in active development right now
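As a minimal sketch of what ingestion could look like - the endpoint path, port and use of the Elasticsearch bulk protocol below are assumptions, since this preview deck does not specify them:

  # Hypothetical ingestion call: endpoint, port and payload shape are assumptions.
  $ curl -X POST 'http://localhost:9428/insert/elasticsearch/_bulk' \
      -H 'Content-Type: application/json' \
      --data-binary '{"create":{}}
  {"_msg":"failed to open /foo/bar","level":"error"}
  '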
LogsQL examples
LogsQL examples: search by time
● _time:-5m.. - search for logs from the last 5 minutes
● _time:-1h..-5m - search for logs on the time range [now()-1h … now()-5m]
● _time:2023-03-01..2023-03-31 - search for logs in March 2023
● _time:2023-03 - the same as above
● _time:2023-03-20T22 - search for logs from the 22:00 UTC hour on March 20, 2023
● It is recommended to specify a _time filter in order to narrow down the search
scope and speed up the query
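For illustration, a time-bounded query could be run with the vlogs-cli tool shown later in this deck (its interface is a preview, so treat this as a sketch):

  # Fetch logs from the 22:00 UTC hour on March 20, 2023 and show the first ones.
  $ vlogs-cli -q '_time:2023-03-20T22' | head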
LogsQL examples: full-text search
● error - case-insensitive search for log messages with the "error" word. Messages with
"Error", "ERROR", "eRRoR", etc. will also be found
● fail* - case-insensitive search for log messages with words starting with "fail",
such as "fail", "failure", "FAILED", etc.
● "Error" - case-sensitive search for log messages with the "Error" word
● "failed to open" - search for logs containing the "failed to open" phrase
● exact("foo bar") - search for logs with the exact "foo bar" message
● re("https?://[^\s]+") - search for logs matching the given regexp, e.g. logs with
http or https URLs
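As a sketch (assuming the vlogs-cli preview interface, as above), a phrase filter behaves much like a case-sensitive fixed-string grep over the log store:

  # Roughly what `grep -F 'failed to open'` would do against raw log files.
  $ vlogs-cli -q '"failed to open"' | head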
LogsQL examples: combining search queries
● error OR warning - search for log messages with either the "error" or "warning"
word
● err* AND fail* - search for logs containing both a word starting with "err" and a
word starting with "fail". For example, "ERROR: the file /foo/bar failed to open"
● error AND NOT "/foo/bar" - search for logs containing the "error" word, but without
the "/foo/bar" string
● _time:-1h.. AND (error OR warning) AND NOT debug - search for logs for the
last hour with either the "error" or "warning" word, but without the "debug" word
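Such a combined query can be counted directly from the command line (vlogs-cli usage is a sketch, as before):

  # Count matching log entries from the last hour.
  $ vlogs-cli -q '_time:-1h.. AND (error OR warning) AND NOT debug' | wc -l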
LogsQL examples: searching arbitrary labels
● By default, the search is performed over the log message
● Every log entry can contain additional labels. For example, level, ip, user_id,
trace_id, etc.
● LogsQL allows searching in any label via the label_name:query syntax
LogsQL examples: searching arbitrary labels
● level:(error OR warning) - case-insensitive search for logs with the level label
containing "error" or "warning"
● trace_id:"012345-6789ab-cdef" AND error - search for logs with the given
trace_id label, which also contain the "error" word
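A sketch combining a label filter with the jq-based post-processing used later in this deck:

  # Print only the log messages for the given trace (vlogs-cli interface is a preview).
  $ vlogs-cli -q 'trace_id:"012345-6789ab-cdef" AND error' | jq ._msg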
Log streams
What is a log stream?
● A log stream is the logs generated by a single instance of some application:
○ A Linux process (Unix daemon)
○ A Docker container
○ A Kubernetes container running in a pod
● Logs belonging to a single stream are traditionally written to a single file and
investigated with cat, grep, sort, cut, uniq, tail, etc. commands, as in the sketch below.
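For example, a typical single-file investigation (the file path is illustrative):

  # Show the most recent error lines from a single application's log file.
  $ grep -i error /var/log/myapp.log | tail -n 20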
What is a log stream?
● A stream in a distributed system can be uniquely identified by the instance location,
such as its TCP address (aka the instance label in the Prometheus ecosystem).
● Multiple instances of a single application (aka shards or replicas) can be identified
by the application name (aka the job label in the Prometheus ecosystem).
● Additional labels can be attached to log streams, so they can be used during log
analysis. For example, environment, datacenter, zone, namespace, etc.
What is a log stream?
● ELK lacks the concept of log streams, so it may be non-trivial to perform
stream-based log analysis there.
● Grafana Loki has supported the log stream concept from the beginning.
● VictoriaLogs provides support for log streams.
LogsQL examples: querying log streams
● VictoriaLogs allows querying streams via the _stream label with Prometheus label
filters
● _stream:{job="nginx"} - search for logs from nginx streams
● _stream:{env=~"qa|staging",zone!="us-east"} - search for log streams from the qa or
staging environments in all zones except us-east
● _time:-1h.. AND _stream:{job="nginx"} AND level:error - search for logs for the
last hour from nginx streams with the level label containing the "error" word
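A sketch of running a stream-scoped query from the command line (same vlogs-cli caveat as above):

  # Recent error logs from nginx streams only.
  $ vlogs-cli -q '_time:-1h.. AND _stream:{job="nginx"} AND level:error' | tail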
Stream labels vs log labels
● Stream labels remain static, i.e. they do not change across logs belonging to the
same stream.
● The recommended stream labels are instance and job. These labels simplify
correlation between Prometheus metrics and logs.
● Stream labels allow grouping logs by individual streams during querying, which
can simplify log analysis.
● Stream labels can be used for narrowing down the amount of logs to search and
optimizing the query speed.
Stream labels vs log labels
● Log labels can change inside the same stream. For example, level, trace_id, ip,
user_id, response_duration, etc.
● Log labels are known as log fields in structured logging (see the sketch below).
● Searching via log labels simplifies narrowing down the search results.
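For instance, a structured (JSON) log entry carrying such fields might look like this - the values are illustrative, and _msg is the message field used elsewhere in this deck:

  {"_msg":"failed to open /foo/bar","level":"error","trace_id":"012345-6789ab-cdef","user_id":"42","ip":"10.0.0.1","response_duration":"0.254s"}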
LogsQL: stats over access logs
LogsQL: stats over access logs
● It is quite common to collect access logs (e.g. nginx or apache logs)
● It is also quite common to analyze access logs with grep, cut, sort, uniq, tail, etc.
commands (see the sketch below)
● Examples:
○ Get the top 10 paths with the biggest number of 404 HTTP errors
○ Calculate the per-domain p99 response duration and the number of requests
○ Get the number of unique IPs that requested a given URL
● ELK and Grafana Loki do not provide functionality to efficiently perform these
tasks :(
● VictoriaLogs comes to the rescue!
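For comparison, here is how the first task is typically done against a raw nginx access log with classic tools, assuming the common/combined log format (path in field 7, status code in field 9):

  # Top 10 paths with the most 404 responses.
  $ awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -n 10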
LogsQL: stats over access logs: example
_time:-1h.. AND _stream:{job="nginx",env="prod"} |
extract '<ip> <*> "<*> <path> <*>" <status>' |
filter status:404 |
stats by (path) (
count() as requests,
uniq(ip) as uniq_ips,
) |
sort by requests desc |
limit 10
Top 10 paths from nginx streams at prod for the last hour with the biggest
number of requests that led to a 404 error.
Additionally, calculate the number of unique ip addresses seen per path.
Keep only the first 10 entries
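Since the query spans several lines, one sketch for running it from a shell is to store it in a file first (the file name is illustrative; vlogs-cli is described next):

  # Save the query above as top404.logsql, then run it.
  $ vlogs-cli -q "$(cat top404.logsql)"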
VictoriaLogs: CLI integration
VictoriaLogs: CLI integration
VictoriaLogs comes with the vlogs-cli tool, which can be combined with traditional CLI
commands during log investigation.
Example: obtain logs from nginx streams for the last hour and then feed the results to
standard CLI tools - jq, grep and tail - for further processing:
vlogs-cli -q '_time:-1h.. AND _stream:{job="nginx"}' | jq ._msg | grep 404 | tail
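Another pipeline in the same spirit (a sketch under the same assumptions): rank the most frequent error messages from the last hour with standard tools:

  $ vlogs-cli -q '_time:-1h.. AND error' | jq -r ._msg | sort | uniq -c | sort -rn | head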
VictoriaLogs: recap
● Open source solution for log management
● Easy to set up and operate
● Optimized for low resource usage (CPU, RAM, disk space)
● Easy yet powerful query language - LogsQL
● Scales both vertically and horizontally
● Supports data ingestion from Logstash, Fluentd and Promtail
FAQ
When will VictoriaLogs be ready to use?
Soon - it is in active development now
Is VictoriaLogs open source?
Yes!
How about enterprise features?
Sure! GDPR, security, auth, rate limiting and anomaly
detection will be available in VictoriaLogs Enterprise!
How does VictoriaLogs compare to ClickHouse for logs?
VictoriaLogs uses core optimizations similar to ClickHouse's
VictoriaLogs is easier to set up and operate than ClickHouse
Will VictoriaLogs provide datasources for Grafana and Kibana?
Yes, eventually
How about a cloud version of VictoriaLogs?
Yes, eventually
Will VictoriaLogs support JSON and structured logs?
Yes, from day one!
Will data partitioning be supported?
Yes, VictoriaLogs partitions data by week
Partitions are self-contained and can be removed / moved /
archived independently
VictoriaLogs: questions?
Aliaksandr Valialkin, CTO VictoriaMetrics