Monitoring. Again, RootConf 2016

MONITORING.
AGAIN.
Vsevolod Polyakov

Platform Engineer, Grammarly
ctrlok.com

What are metrics?

Internal processes

System metrics

Why do we need metrics?

what?
• RRD-like (gram.ly/gfsx)
• so.it.is.my.metric → /so/it/is/my/metric.wsp
• Fixed retention (by name pattern)
• Fixed size (actually no)
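The name-to-file mapping above can be sketched in a few lines. This is a minimal illustration, not code from Whisper or carbon; the function name `metricPath` and the storage root `/` are assumptions chosen to reproduce the example on the slide.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// metricPath maps a dotted metric name to a Whisper file path under root.
// Both the function name and the root directory are hypothetical.
func metricPath(root, metric string) string {
	// Every dot-separated component becomes one directory level;
	// the last component becomes the file name with a .wsp suffix.
	parts := strings.Split(metric, ".")
	return path.Join(root, path.Join(parts...)) + ".wsp"
}

func main() {
	fmt.Println(metricPath("/", "so.it.is.my.metric")) // /so/it/is/my/metric.wsp
}
```

One metric therefore means one file, which is why the number of metrics translates directly into file count and disk IO.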

Retention and size
• 1s:1d → 1 036 828 bytes
• 10s:10d → 1 036 828 bytes
• 1s:365d → 378 432 028 bytes (~3 000 metrics per 1 TB)
• 10s:365d → 37 843 228 bytes (~30 000 metrics per 1 TB)
whisper calc
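The single-archive sizes above follow from the Whisper on-disk layout: a 16-byte file header, a 12-byte header per archive, and 12 bytes per data point. A minimal sketch of that arithmetic; the type and function names are made up for illustration, and 1 TB is assumed to mean 2^40 bytes.

```go
package main

import "fmt"

const (
	metadataSize    = 16 // file header
	archiveInfoSize = 12 // header of each archive
	pointSize       = 12 // 4-byte timestamp + 8-byte value
)

// Archive describes one retention rule such as 10s:365d (hypothetical type).
type Archive struct {
	SecondsPerPoint  int64
	RetentionSeconds int64
}

// whisperSize returns the file size in bytes for the given archives.
func whisperSize(archives []Archive) int64 {
	size := int64(metadataSize + archiveInfoSize*len(archives))
	for _, a := range archives {
		points := a.RetentionSeconds / a.SecondsPerPoint
		size += points * pointSize
	}
	return size
}

func main() {
	const day = 86400
	const tb = int64(1) << 40 // assumption: 1 TB = 2^40 bytes

	fmt.Println(whisperSize([]Archive{{1, day}}))       // 1036828
	fmt.Println(whisperSize([]Archive{{10, 10 * day}})) // 1036828

	year1s := whisperSize([]Archive{{1, 365 * day}})
	year10s := whisperSize([]Archive{{10, 365 * day}})
	fmt.Println(year1s, tb/year1s)   // 378432028 2905
	fmt.Println(year10s, tb/year10s) // 37843228 29054
}
```

The size depends only on the number of points, which is why 1s:1d and 10s:10d produce identical files.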

Retention and size
• 10s:30d,1m:120d,10m:365d → 4 564 864 bytes
• 240 864 metrics in 1 TB
• aggregation: average, sum, min, max, and last.
• can be assigned per metric
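To make the aggregation methods concrete, here is a small sketch of rolling fine-grained points up into a coarser archive. It is an illustration only, not Whisper's code: the function names are hypothetical, and it assumes every slot holds a value.

```go
package main

import (
	"errors"
	"fmt"
)

// aggregate collapses the points of one coarse interval into a single value.
func aggregate(method string, points []float64) (float64, error) {
	if len(points) == 0 {
		return 0, errors.New("no points to aggregate")
	}
	result := points[0]
	switch method {
	case "average", "sum":
		for _, p := range points[1:] {
			result += p
		}
		if method == "average" {
			result /= float64(len(points))
		}
	case "min":
		for _, p := range points[1:] {
			if p < result {
				result = p
			}
		}
	case "max":
		for _, p := range points[1:] {
			if p > result {
				result = p
			}
		}
	case "last":
		result = points[len(points)-1]
	default:
		return 0, fmt.Errorf("unknown aggregation method %q", method)
	}
	return result, nil
}

// downsample aggregates every `factor` consecutive points into one.
func downsample(method string, points []float64, factor int) ([]float64, error) {
	if factor <= 0 {
		return nil, errors.New("factor must be positive")
	}
	var out []float64
	for start := 0; start < len(points); start += factor {
		end := start + factor
		if end > len(points) {
			end = len(points)
		}
		v, err := aggregate(method, points[start:end])
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	// Twelve 10 s points rolled up into two 1 m points (factor 6).
	points := []float64{1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1}
	for _, m := range []string{"average", "sum", "min", "max", "last"} {
		out, err := downsample(m, points, 6)
		if err != nil {
			panic(err)
		}
		fmt.Println(m, out)
	}
}
```

The choice matters per metric: counters usually want sum, gauges average or last, and latency peaks max.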

How
• terraform (https://www.terraform.io/)
• docker (https://www.docker.com/)
• ansible (https://www.ansible.com/)
• rocker (https://github.com/grammarly/rocker)
• rocker-compose (https://github.com/grammarly/rocker-compose)

carbon-cache.py
• single-core
• many options in config file
• default
link

Start load testing
• m4.xlarge instance (4 CPU, 16 GB ram, 256 GB disk EBS gp2)
• retentions = 1s:1d
• MAX_CACHE_SIZE, MAX_UPDATES_PER_SECOND, MAX_CREATES_PER_MINUTE = inf
• defaults
• almost 1.5 h to reach the limit :(

carbon-cache.py cache size → 75k metrics/s

results
• 75 000 metrics/s max
• 60 000 metrics/s sustained speed
• IO :(

Try to tune!
• WHISPER_SPARSE_CREATE = true (don’t allocate space on creation): non-linear IO load.
• CACHE_WRITE_STRATEGY = sorted (default)

results
• cache flush problem :(

Try to tune!
• CACHE_WRITE_STRATEGY = max: will give a strong flush preference to frequently updated metrics and will also reduce random file IO.

results
• cache flush problem :(

Try to tune!
• CACHE_WRITE_STRATEGY = naive: just flush. Better with random IO.
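The three CACHE_WRITE_STRATEGY values tried above differ only in how the writer picks the next metric to flush. The sketch below models that choice on a toy cache; it is not carbon-cache.py's code, and the `Cache` type and function names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Cache maps a metric name to the number of datapoints waiting in memory.
type Cache map[string]int

// sortedOrder models "sorted": take one snapshot, order metrics by the
// number of cached points (largest first), then flush in that fixed order.
func sortedOrder(c Cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if c[names[i]] != c[names[j]] {
			return c[names[i]] > c[names[j]]
		}
		return names[i] < names[j] // tie-break only to keep the demo stable
	})
	return names
}

// popMax models "max": before every flush, pick whichever metric has the
// most cached points right now, and remove it from the cache.
func popMax(c Cache) (string, bool) {
	best, found := "", false
	for name, n := range c {
		if !found || n > c[best] || (n == c[best] && name < best) {
			best, found = name, true
		}
	}
	if found {
		delete(c, best)
	}
	return best, found
}

// naiveOrder models "naive": flush in whatever order the cache yields.
func naiveOrder(c Cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	return names
}

func main() {
	cache := Cache{"a": 5, "b": 3, "c": 1}
	fmt.Println("sorted:", sortedOrder(cache)) // sorted: [a b c]

	// With "max", a hot metric that refills while others wait wins again.
	first, _ := popMax(cache)
	cache["a"] = 4 // four new points for "a" arrive after its flush
	second, _ := popMax(cache)
	fmt.Println("max:", first, second) // max: a a

	fmt.Println("naive:", naiveOrder(Cache{"a": 5, "b": 3, "c": 1}))
}
```

In this model "max" keeps returning to the busiest metric, which batches more points per write but can leave rarely updated metrics in the cache for a long time.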

results
• still CPU

• Maybe it’s an EBS IO limitation? → tried a 512 GB disk.
• No.

go-carbon
• multi-core single daemon
• written in golang
• not many options to tune :(
link

Start load testing
• m4.xlarge instance (4 CPU, 16 GB ram, 256 GB disk EBS gp2)
• retentions = 1s:1d
• max-size = 0
• max-updates-per-second = 0
• almost 1 h to reach the limit :(

results
• but that is without sparse file creation.
• so let’s try to implement it

try to tune!
// Sparse create: instead of filling the data section with zeros,
// seek to the last byte of the file and write a single zero there.
if _, err = whisper.file.Seek(int64(whisper.Size()-1), 0); err != nil {
	return nil, err
}
if _, err = whisper.file.Write([]byte{0}); err != nil {
	return nil, err
}
// The original pre-allocation loop is no longer needed:
// remaining := whisper.Size() - whisper.MetadataSize()
// chunkSize := 16384
// zeros := make([]byte, chunkSize)
// for remaining > chunkSize { write zeros; remaining -= chunkSize }
// write zeros[:remaining]

try to tune!
• max-updates-per-second = 1500

results
• TL;DR: 210 000 - 240 000 metrics/s sustained speed
• cache size of 31 000 000!

try to tune!
• max-updates-per-second = 0
• input-buffer = 400 000

results
• cache size of 10-20 million!

try to tune!
• vm.dirty_background_ratio=40
• vm.dirty_ratio=60

results
• 180k+ metrics/s, more or less without cache

carbon-relay.py
• Twisted-based
• native

Start load testing
• c4.xlarge instance (4 CPU, 7.5 GB ram)
• ~1 Gbit/s LAN
• default parameters
• hashing
• 10 connections
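Assuming "hashing" in the list above refers to the relay's consistent-hashing mode, the idea can be sketched as a hash ring: each destination owns many points on a circle, and a metric goes to the owner of the first point at or after the metric's hash. This is a generic illustration, not carbon-relay's algorithm; the hash function (FNV-1a), the replica count, and the destination names are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring (hypothetical type).
type Ring struct {
	keys  []uint32          // sorted positions on the ring
	nodes map[uint32]string // position -> destination
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` points on the ring for every destination.
func NewRing(destinations []string, replicas int) *Ring {
	r := &Ring{nodes: make(map[uint32]string)}
	for _, d := range destinations {
		for i := 0; i < replicas; i++ {
			k := hashKey(fmt.Sprintf("%s:%d", d, i))
			if _, taken := r.nodes[k]; taken {
				continue // hash collision: keep the first owner
			}
			r.nodes[k] = d
			r.keys = append(r.keys, k)
		}
	}
	sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
	return r
}

// Pick returns the destination for a metric, or "" if the ring is empty.
func (r *Ring) Pick(metric string) string {
	if len(r.keys) == 0 {
		return ""
	}
	h := hashKey(metric)
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0 // wrap around the circle
	}
	return r.nodes[r.keys[i]]
}

func main() {
	// Hypothetical carbon-cache destinations.
	ring := NewRing([]string{"cache-a:2003", "cache-b:2003", "cache-c:2003"}, 100)
	for _, m := range []string{"so.it.is.my.metric", "so.it.is.other.metric"} {
		fmt.Println(m, "->", ring.Pick(m))
	}
}
```

The same metric always lands on the same destination, and removing one destination only remaps the metrics it owned.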

carbon-relay-ng
• golang-based
• web-panel
• live-updates
• aggregators
• spooling
link

carbon-c-relay
• written in C
• advanced cluster management

from 100 000 to 1 600 000 reqs

1 400 000 sustained speed. Or not?

So…
go-carbon + carbon-c-relay = ♡

Differences
• Environment
• Role
• Track (modifier)
• IP
• Datacenter
• Anything else

TSDBs with tags
• InfluxDB
• OpenTSDB (HBase)
• Cyanite (Cassandra)
• Newts (Cassandra)
• Prometheus

InfluxDB (cluster), 130k metrics
(enlarge the chart)

openTSDB
single instance + HBase cluster = up to 150k metrics

Find what is unique

Zipper
• https://github.com/grobian/carbonserver
• https://github.com/dgryski/carbonzipper
• https://github.com/dgryski/carbonapi

ALSO
• https://github.com/jssjr/carbonate
• https://github.com/jjneely/buckytools
• https://github.com/dgryski/carbonmem
• https://github.com/grobian/carbonwriter

Plans
• Patch: statsd → ES
• Patch: carbonserver → carbonlink

feel free to ask
• Vsevolod Polyakov
• ctrlok@gmail.com
• skype: ctrlok1987
• github.com/ctrlok
• twitter.com/ctrlok
• slack: HangOps
• Gitter: dev_ua/devops
• skype: DevOps from Ukraine
• slack.ukrops.club
We’re hiring!
