David Pilato
Developer | Evangelist, @dadoonet
Managing your
Black Friday Logs
Managing your black friday logs - Code Europe
Data Platform
Architectures
_search? outside the box
life:universe
user:soulmate
city:restaurant
car:model
fridge:leftovers
work:dreamjob
Logging
Metrics
Security Analytics
APM
@dadoonet sli.do/elastic!10
The Elastic Journey of Data
Beats (Log Files, Metrics, Wire Data, your{beat})
Other sources: Data Store, Web APIs, Social, Sensors
Logstash Nodes (X)
Messaging Queue: Kafka, Redis
Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes Hot (X), Data Nodes Warm (X)
Kibana Instances (X)
Outputs: Notification, Queues, Storage, Metrics
X-Pack (in Elasticsearch, Logstash, Kibana)
Provision and manage multiple Elastic Stack
environments and provide
search-aaS, logging-aaS, BI-aaS, data-aaS
to your entire organization
Hosted Elasticsearch & Kibana
Includes X-Pack features
Starts at $45/mo
Available in
Amazon Web Services
Google Cloud Platform
Elasticsearch

Cluster Sizing
Terminology
Cluster my_cluster
Server 1, Node A
Index twitter: documents d1..d12
Index logs: documents d1..d6
Partition
Cluster my_cluster
Server 1, Node A
Index twitter: documents d1..d12, partitioned into shards 0, 1, 2, 3, 4
Index logs: documents d1..d6, partitioned into shards 0, 1
Distribution
Cluster my_cluster
Server 1, Node A: twitter shards P1, P2, P4; logs shard P0
Server 2, Node B: twitter shards P0, P3; logs shard P1
(documents d1..d12 are distributed across the shards)
Replication
Cluster my_cluster
Server 1, Node A: primaries twitter P1, P2, P4 and logs P0, plus replica copies of the shards hosted on Node B
Server 2, Node B: primaries twitter P0, P3 and logs P1, plus replica copies of the shards hosted on Node A
• Primaries
• Replicas
Scaling
Data
Scaling
Big Data
... ...
Scaling
• In Elasticsearch, shards are the working unit
• More data -> More shards
Big Data
... ...
But how many shards?
How much data?
• ~1000 events per second
• 60s * 60m * 24h * 1000 events => ~87M events per day
• 1kb per event => ~82GB per day
• 3 months => ~7TB
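The arithmetic above can be sketched in a few lines of Python (a sketch using the slide's own assumptions: 1KB per event and roughly 90 days of retention):

```python
# ~1000 events/s, ~1 KB per event, ~3 months (90 days) retention
events_per_day = 1000 * 60 * 60 * 24           # 86_400_000 -> ~87M events/day
gb_per_day = events_per_day / 1024 ** 2        # 1 KB each -> ~82 GB/day
tb_total = gb_per_day * 90 / 1024              # -> ~7 TB

print(round(gb_per_day, 1), round(tb_total, 1))  # 82.4 7.2
```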
Shard Size
• It depends on many different factors
‒ document size, mapping, use case, kinds of queries being executed,
desired response time, peak indexing rate, budget, ...
• After running a shard sizing exercise*, plan for each shard to handle about 45GB
• Up to 10 shards per machine
* https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
How many shards?
• Data size: ~7TB
• Shard Size: ~45GB*
• Total Shards: ~160
• Shards per machine: 10*
• Total Servers: 16
* https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
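The shard and server counts follow from the same back-of-the-envelope math (a sketch; 45GB per shard and 10 shards per machine are the slide's working figures):

```python
import math

data_gb = 7 * 1024                      # ~7 TB of logs, in GB
shard_gb = 45                           # target shard size from the sizing exercise
shards = math.ceil(data_gb / shard_gb)  # ~160 shards
servers = math.ceil(shards / 10)        # 10 shards per machine -> 16 servers

print(shards, servers)  # 160 16
```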
Cluster my_cluster
3 months of logs
...
But...
• How many indices?
• What do you do if the daily data grows?
• What do you do if you want to delete old data?
Time-Based Data
• Logs, social media streams, time-based events
• Timestamp + Data
• Documents do not change once written
• Typically search for recent events
• Older documents become less important
• Hard to predict the data size
Time-Based Data
• Time-based indices are the best option
‒ create a new index each day, week, month, year, ...
‒ search the indices you need in the same request
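For example, one search request can span several daily indices, either by listing them or with a wildcard (index names here are illustrative):

```
GET logs-2018-04-10,logs-2018-04-11/_search
GET logs-*/_search
```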
Daily Indices
Cluster my_cluster
Day 1: logs-2018-04-10
Day 2: logs-2018-04-10, logs-2018-04-11
Day 3: logs-2018-04-10, logs-2018-04-11, logs-2018-04-12
Templates
• Every newly created index whose name starts with 'logs-' will have
‒ 2 shards
‒ 1 replica (for each primary shard)
‒ 60 seconds refresh interval
PUT _template/logs
{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1,
    "refresh_interval": "60s"
  }
}
More on that later
Alias
Cluster my_cluster
Users and the Application talk only to the aliases logs-write and logs-read
Day 1: both aliases point to logs-2018-04-10
Day 2: logs-2018-04-11 is created; logs-write points to it, logs-read covers both indices
Day 3: logs-2018-04-12 is created; logs-write points to it, logs-read covers all three indices
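The alias flip sketched above can be done atomically with the `_aliases` API (index names are illustrative):

```
POST _aliases
{
  "actions": [
    { "add":    { "index": "logs-2018-04-11", "alias": "logs-write" } },
    { "remove": { "index": "logs-2018-04-10", "alias": "logs-write" } },
    { "add":    { "index": "logs-2018-04-11", "alias": "logs-read" } }
  ]
}
```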
Detour: Rollover API
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/indices-rollover-index.html
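A minimal sketch of a rollover call against the write alias (the condition values are illustrative): when any condition is met, a new index is created and the alias switches to it:

```
POST logs-write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 100000000
  }
}
```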
Do not Overshard
• 3 different logs
• 1 index per day each
• 1GB each
• 5 shards (default): ~200MB per shard vs the ~45GB target
• 6 months retention
• ~900 shards for ~180GB
• we needed ~4 shards!
Don't keep the default values!
Cluster my_cluster: daily indices access-..., application-..., mysql-...
Scaling
Big Data
... ...1M users
But what happens if we have 2M users?
... ...1M users
... ...1M users
... ...1M users
Users
Shards are the working unit
• Primaries
‒ More data -> More shards
‒ write throughput (More writes -> More primary shards)
• Replicas
‒ high availability (1 replica is the default)
‒ read throughput (More reads -> More replicas)
Detour: Shrink API
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-shrink-index.html
Detour: Split API
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-split-index.html
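Minimal sketches of the two linked APIs (target index names and shard counts are illustrative). Shrink requires the source index to be read-only with all shards on one node first; split requires the target shard count to be a multiple of the source's:

```
POST logs-2018-04-10/_shrink/logs-2018-04-10-shrunk
{ "settings": { "index.number_of_shards": 1 } }

POST logs-2018-04-10/_split/logs-2018-04-10-split
{ "settings": { "index.number_of_shards": 4 } }
```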
Optimal Bulk Size
What is Bulk?
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Nodes
Warm (X)
X-Pack
1000 log events
Beats
Logstash
Application
1000 index requests with 1 document each
vs. 1 bulk request with 1000 documents
What is the optimal bulk size?
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Nodes
Warm (X)
X-Pack
1000 log events
Beats
Logstash
Application
4 * 250?
2 * 500?
1 * 1000?
It depends...
• on your application (language, libraries, ...)
• document size (100b, 1kb, 100kb, 1mb, ...)
• number of nodes
• node size
• number of shards
• shards distribution
Test it ;)
Elasticsearch
Master
Nodes (3)
Ingest
Nodes (X)
Data Nodes
Hot (X)
Data Nodes
Warm (X)
X-Pack
1000000 log events
Beats
Logstash
Application
4000 * 250 -> 160s
1000 * 1000 -> 155s
2000 * 500 -> 164s
Test it ;)
DATE=`date +%Y.%m.%d`
LOG=logs/logs.txt

exec_test () {
  curl -s -XDELETE "http://USER:PASS@HOST:9200/logstash-$DATE"
  sleep 10
  export SIZE=$1
  time cat $LOG | ./bin/logstash -f logstash.conf
}

for SIZE in 100 500 1000 3000 5000 10000; do
  for i in {1..20}; do
    exec_test $SIZE
  done
done

# logstash.conf
input { stdin {} }
filter {}
output {
  elasticsearch {
    hosts => ["10.12.145.189"]
    flush_size => "${SIZE}"
  }
}
In Beats, set "bulk_max_size" in the elasticsearch output.
Test it ;)
• 2 node cluster (m3.large)
‒ 2 vCPU, 7.5GB Memory, 1x32GB SSD
• 1 index server (m3.large)
‒ logstash
‒ kibana
# docs   100    500    1000   3000   5000   10000
time(s)  191.7  161.9  163.5  160.7  160.7  161.5
Distribute the Load
Avoid Bottlenecks
Elasticsearch
X-Pack
1000000 log events
Beats
Logstash
Application
single node
Node 1
Node 2
round robin
output {
  elasticsearch {
    hosts => ["node1","node2"]
  }
}
Load Balancer
Elasticsearch
X-Pack
1000000 log events
Beats
Logstash
Application
LB
Node 2
Node 1
Coordinating-only Node
Elasticsearch
X-Pack
1000000 log events
Beats
Logstash
Application
Node 3 (coordinating-only node)
Node 2
Node 1
Test it ;)
#docs per bulk   1000   5000   10000
time(s), no RR   163.5  160.7  161.5
time(s), RR      161.3  158.2  159.4
• 2 node cluster (m3.large)
‒ 2 vCPU, 7.5GB Memory, 1x32GB SSD
• 1 index server (m3.large)
‒ logstash (round robin configured)
‒ hosts => ["10.12.145.189", "10.121.140.167"]
‒ kibana
Optimizing Disk IO
Durability
Diagram: as docs are indexed over time, they accumulate in an in-memory buffer; a periodic Lucene flush writes them out as a new segment.
refresh_interval
• Dynamic per-index setting
• Increase to get better write throughput to an index
• New documents will take more time to be available for search
PUT logstash-2017.05.16/_settings
{
  "refresh_interval": "60s"
}
#docs per bulk        1000   5000   10000
time(s), 1s refresh   161.3  158.2  159.4
time(s), 60s refresh  156.7  152.1  152.6
Durability
Diagram: each indexed doc goes into the in-memory buffer and its operation is appended to the transaction log; a Lucene flush creates segments, and an Elasticsearch flush performs a Lucene commit and clears the translog.
Translog fsync every 5s (1.7)
Diagram: on both the Primary and the Replica, each indexed doc is buffered in memory and its operation appended to the local translog.
redundancy doesn’t help if all nodes lose power
Async Transaction Log
• index.translog.durability
‒ request (default)
‒ async
• index.translog.sync_interval (only if async is set)
• Dynamic per-index settings
• Be careful, you are relaxing the safety guarantees
#docs per bulk          1000   5000   10000
time(s), request fsync  161.3  158.2  159.4
time(s), 5s sync        152.4  149.1  150.3
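Applied per index, the relaxed durability settings look like this (index name is illustrative):

```
PUT logs-2018-04-10/_settings
{
  "index.translog.durability": "async",
  "index.translog.sync_interval": "5s"
}
```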
Final Remarks
Final Remarks
Beats (Log Files, Metrics, Wire Data, your{beat})
Other sources: Data Store, Web APIs, Social, Sensors
Logstash Nodes (X)
Messaging Queue: Kafka, Redis
Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes Hot (X), Data Nodes Warm (X)
Kibana Instances (X)
Outputs: Notification, Queues, Storage, Metrics
X-Pack (in Elasticsearch, Logstash, Kibana)
Final Remarks
• Primaries
‒ More data -> More shards
‒ Do not overshard!
• Replicas
‒ high availability (1 replica is the default)
‒ read throughput (More reads -> More replicas)
Big Data
... ...
... ...
... ...
Users
Final Remarks
• Bulk and Test
• Distribute the Load
• Refresh Interval
• Async Trans Log (careful)
#docs per bulk           1000   5000   10000
time(s), default         163.5  160.7  161.5
time(s), RR+60s+Async5s  152.4  149.1  150.3
David Pilato
Developer | Evangelist, @dadoonet
Managing your
Black Friday Logs
dzięki! (thank you!)
