CASSANDRA DAY ATLANTA 2016
MONITORING CASSANDRA
Aaron Morton
@aaronmorton
CEO
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra
based solutions.
Apache Cassandra Committer and DataStax MVPs.
Based in New Zealand, Australia, France & USA.
Metrics
Monitoring & Alerting
Insights
Codahale / Yammer / Dropwizard
Metrics
<dependency>
  <groupId>io.dropwizard.metrics</groupId>
  <artifactId>metrics-core</artifactId>
  <version>3.1.0</version>
</dependency>
Metrics
Separate Collection from
Reporting.
Metrics Collection
Metrics are always collected.
Metrics
Metrics have a dotted
notation name, timestamp, and
value, e.g.
com.thelastpickle.presenters.count=2
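A minimal sketch of building a dotted name like the one above and printing it with a timestamp and value. The name() helper is hypothetical, written only for illustration; Dropwizard's MetricRegistry.name(...) is similar in spirit, but this is not that API.

```java
import java.util.StringJoiner;

// Hypothetical helper (not the Dropwizard API): join name parts with '.',
// skipping null or empty parts.
final class DottedNames {
    static String name(String first, String... rest) {
        StringJoiner joiner = new StringJoiner(".");
        if (first != null && !first.isEmpty()) {
            joiner.add(first);
        }
        for (String part : rest) {
            if (part != null && !part.isEmpty()) {
                joiner.add(part);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String metric = name("com", "thelastpickle", "presenters", "count");
        long timestamp = System.currentTimeMillis() / 1000; // epoch seconds
        // One measurement: timestamp, dotted name and value.
        System.out.println(timestamp + " " + metric + "=2");
    }
}
```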
Metric Types
Gauge.
A simple value.
Metric Types
Ratio Gauge.
A ratio between two values.
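A JDK-only sketch of the two gauge types, assuming a gauge is simply "something that returns the current value when read". The Gauge interface and ratio() helper here are simplified stand-ins, not Dropwizard's Gauge and RatioGauge classes; the NaN-on-zero-denominator behaviour follows what we understand RatioGauge to do. The key cache example mirrors a hit-rate style ratio.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class Gauges {
    // A gauge reports whatever the current value is when it is read.
    interface Gauge<T> extends Supplier<T> {}

    // A ratio gauge divides two current values. Returns NaN when the
    // denominator is zero, NaN or infinite instead of a misleading number.
    static Gauge<Double> ratio(Supplier<Double> numerator, Supplier<Double> denominator) {
        return () -> {
            double d = denominator.get();
            if (d == 0 || Double.isNaN(d) || Double.isInfinite(d)) {
                return Double.NaN;
            }
            return numerator.get() / d;
        };
    }

    public static void main(String[] args) {
        AtomicLong hits = new AtomicLong(3);     // e.g. key cache hits
        AtomicLong requests = new AtomicLong(4); // e.g. key cache requests
        Gauge<Long> pending = () -> 42L;         // a simple gauge
        Gauge<Double> hitRate = ratio(() -> (double) hits.get(), () -> (double) requests.get());
        System.out.println(pending.get() + " " + hitRate.get()); // prints "42 0.75"
    }
}
```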
Metric Types
Histograms.
The distribution of values in a
stream of data.
Histograms
Quantiles (e.g. 75th, 95th)
calculated using reservoir
sampling.
(Check docs.)
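To make "reservoir sampling" concrete, here is a minimal sketch of a uniform reservoir (Vitter's Algorithm R) plus an interpolated quantile over the sorted sample. This is a swapped-in, simpler technique: it is not Dropwizard's default reservoir, and the quantile interpolation is only similar in spirit to Dropwizard's snapshots.

```java
import java.util.Arrays;
import java.util.Random;

// Uniform reservoir (Algorithm R): a fixed-size sample in which every value
// seen so far has the same chance of being kept.
final class UniformReservoir {
    private final long[] values;
    private final Random random;
    private long count = 0;

    UniformReservoir(int size, long seed) {
        this.values = new long[size];
        this.random = new Random(seed);
    }

    void update(long value) {
        count++;
        if (count <= values.length) {
            values[(int) (count - 1)] = value; // fill the reservoir first
        } else {
            // Replace a random slot with probability size / count.
            long r = (long) (random.nextDouble() * count); // uniform in [0, count)
            if (r < values.length) {
                values[(int) r] = value;
            }
        }
    }

    int size() {
        return (int) Math.min(count, values.length);
    }

    // Quantile by linear interpolation between neighbouring sorted samples.
    double quantile(double q) {
        int n = size();
        if (n == 0) {
            return 0.0;
        }
        long[] sorted = Arrays.copyOf(values, n);
        Arrays.sort(sorted);
        double pos = q * (n + 1);
        int index = (int) pos;
        if (index < 1) {
            return sorted[0];
        }
        if (index >= n) {
            return sorted[n - 1];
        }
        double lower = sorted[index - 1];
        double upper = sorted[index];
        return lower + (pos - Math.floor(pos)) * (upper - lower);
    }
}
```

With the values 1 to 100 in a large enough reservoir, quantile(0.5) is 50.5 and quantile(0.95) is about 95.95.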
Histograms
Default: Exponentially Decaying
Reservoir, biased towards (roughly) the last
five minutes of data, with exponential
weighting towards newer data.
(Check docs.)
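The "exponential weighting towards newer data" can be illustrated with forward decay: a sample recorded at time t gets weight exp(alpha * (t - landmark)), and quantiles are taken over the weights. This is a simplified sketch under stated assumptions: alpha = 0.015 follows Dropwizard's documented default, but the real reservoir also samples down to a fixed size and periodically rescales its landmark, which this code does not do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Forward-decay weighting only: no sampling, no landmark rescaling.
final class DecayingQuantiles {
    static final double ALPHA = 0.015; // larger alpha = stronger bias to recent data
    private final long landmarkSeconds;
    private final List<double[]> samples = new ArrayList<>(); // {value, weight}

    DecayingQuantiles(long landmarkSeconds) {
        this.landmarkSeconds = landmarkSeconds;
    }

    void update(double value, long timestampSeconds) {
        double weight = Math.exp(ALPHA * (timestampSeconds - landmarkSeconds));
        samples.add(new double[] {value, weight});
    }

    // Smallest value whose cumulative normalised weight reaches q.
    double quantile(double q) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        List<double[]> sorted = new ArrayList<>(samples);
        sorted.sort(Comparator.comparingDouble(s -> s[0]));
        double total = 0;
        for (double[] s : sorted) {
            total += s[1];
        }
        double cumulative = 0;
        for (double[] s : sorted) {
            cumulative += s[1];
            if (cumulative / total >= q) {
                return s[0];
            }
        }
        return sorted.get(sorted.size() - 1)[0];
    }
}
```

Ten samples of 100 at t = 0 and ten samples of 1 at t = 300 s give a median of 1: the newer samples each weigh about exp(4.5), roughly 90 times more.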
Metric Types
Meter
Measures the per-second rate
at which a set of events occurs.
Meter
Three different exponentially weighted
moving average
rates: 1, 5, and 15 minutes.
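A sketch of the exponentially weighted moving average behind the 1, 5 and 15 minute rates. It follows the approach we understand Dropwizard's EWMA to take (fold in the events every 5 seconds, alpha = 1 - exp(-5 / 60 / minutes)), but it is an illustration, not that class.

```java
// One EWMA per window: new Ewma(1), new Ewma(5), new Ewma(15).
final class Ewma {
    static final int TICK_SECONDS = 5;
    private final double alpha;
    private double ratePerSecond = 0.0;
    private boolean initialised = false;
    private long uncounted = 0;

    Ewma(int minutes) {
        this.alpha = 1 - Math.exp(-TICK_SECONDS / 60.0 / minutes);
    }

    // Record n events since the last tick.
    void mark(long n) {
        uncounted += n;
    }

    // Called every TICK_SECONDS: blend the latest instant rate into the average.
    void tick() {
        double instantRate = (double) uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialised) {
            ratePerSecond += alpha * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate; // first tick seeds the average
            initialised = true;
        }
    }

    double rate() {
        return ratePerSecond;
    }
}
```

After a burst of 50 events in one tick (10 events/s) and then a quiet minute, the 1 minute rate has decayed to about 10 / e, roughly 3.68, while the 15 minute rate is still above 9: the longer window reacts more slowly.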
Metric Types
Timer.
Histogram of durations and
rate of events.
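A minimal sketch of what a timer combines: it times each event (the histogram half) and counts events (the rate half). For simplicity it keeps durations in an unbounded list, in microseconds to match Cassandra's latency unit; a real timer would feed a reservoir and a meter instead. The SimpleTimer class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class SimpleTimer {
    private final List<Long> durationsMicros = new ArrayList<>();

    // Run the work, record how long it took, and return its result.
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedNanos = System.nanoTime() - start;
            durationsMicros.add(TimeUnit.NANOSECONDS.toMicros(elapsedNanos));
        }
    }

    // Number of timed events: the basis for the rate half of a timer.
    int count() {
        return durationsMicros.size();
    }

    // Largest recorded duration: one statistic from the histogram half.
    long maxMicros() {
        return durationsMicros.stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}
```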
Reporting
Reporters run in the
Cassandra process, pushing
metrics to external services.
Reporters
ConsoleReporter,
GraphiteReporter,
InfluxDBReporter,
RiemannReporter,
…
Reporters In Cassandra
Configuration file:
metrics-reporter-config-sample.yaml
Reporters In Cassandra
graphite:
  -
    period: 10
    timeunit: 'SECONDS'
    prefix: 'cassandra.prod.ip_1_2_3_4.'
    hosts:
      - host: '1.2.3.4'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.+"
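What a Graphite reporter ultimately pushes to port 2003 is one plaintext line per measure: "<path> <value> <epoch seconds>". A JDK-only sketch under these assumptions: the path is simply the configured prefix concatenated with the metric name (the exact path a real reporter builds may differ), and the regex plays the role of the `patterns` whitelist. Nothing here is the reporter-config API.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class GraphiteLine {
    // Mirrors the `patterns` entry: only Cassandra metric names are reported.
    static final Pattern CASSANDRA_METRICS = Pattern.compile("^org.apache.cassandra.metrics.+");

    // Graphite plaintext protocol: "<path> <value> <epoch seconds>\n".
    static String format(String prefix, String name, double value, long epochSeconds) {
        return prefix + name + " " + value + " " + epochSeconds + "\n";
    }

    // Send one line to a Graphite (carbon) plaintext listener.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.cassandra.metrics.Storage.Load.count";
        if (CASSANDRA_METRICS.matcher(name).matches()) {
            long now = System.currentTimeMillis() / 1000;
            // Print instead of send("1.2.3.4", 2003, ...) so the demo needs no server.
            System.out.print(format("cassandra.prod.ip_1_2_3_4.", name, 1024.0, now));
        }
    }
}
```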
metrics-reporter-config
Configures Metrics reporters.
github.com/addthis/metrics-reporter-config
metrics-reporter-config
Supports:
Ganglia
Graphite
Riemann
JMX
Cassandra creates JMX
MBeans for each Metric.
JMX
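Because every metric is an MBean, you can read one directly over JMX. A sketch using only javax.management, assuming a node listening on Cassandra's default JMX port 7199 with authentication disabled, and object names of the form org.apache.cassandra.metrics:type=...,scope=...,name=... (per-table metrics add a keyspace key, which this helper does not cover). The attribute names come from the Percentiles and Rates lists later in this deck.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class CassandraJmx {
    // Build e.g. org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
    static ObjectName metric(String type, String scope, String name) {
        String s = "org.apache.cassandra.metrics:type=" + type
                + (scope == null ? "" : ",scope=" + scope)
                + ",name=" + name;
        try {
            return new ObjectName(s);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(s, e);
        }
    }

    // Connect to a node (default 127.0.0.1) and print two Read latency attributes.
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            ObjectName readLatency = metric("ClientRequest", "Read", "Latency");
            Object p95 = mbeans.getAttribute(readLatency, "95thPercentile");
            Object rate = mbeans.getAttribute(readLatency, "OneMinuteRate");
            System.out.println("Read p95 (microseconds): " + p95 + ", requests/s: " + rate);
        }
    }
}
```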
Reporters
Reporters may change the
name of measures, e.g.
95thPercentile == p95
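When dashboards are fed by one reporter and ad hoc checks use JMX, a small lookup table keeps the names straight. A sketch; the suffixes are what we understand Dropwizard 3.x's GraphiteReporter to emit (p95, m1_rate and so on), so verify them against your own reporter's output before relying on them.

```java
import java.util.Locale;
import java.util.Map;

final class ReporterNames {
    // JMX attribute name -> Graphite suffix (assumed Dropwizard 3.x naming).
    static final Map<String, String> JMX_TO_GRAPHITE = Map.ofEntries(
            Map.entry("50thPercentile", "p50"),
            Map.entry("75thPercentile", "p75"),
            Map.entry("95thPercentile", "p95"),
            Map.entry("98thPercentile", "p98"),
            Map.entry("99thPercentile", "p99"),
            Map.entry("999thPercentile", "p999"),
            Map.entry("OneMinuteRate", "m1_rate"),
            Map.entry("FiveMinuteRate", "m5_rate"),
            Map.entry("FifteenMinuteRate", "m15_rate"),
            Map.entry("MeanRate", "mean_rate"));

    // Fall back to the lower-cased attribute (Count -> count, Max -> max).
    static String toGraphite(String jmxAttribute) {
        return JMX_TO_GRAPHITE.getOrDefault(jmxAttribute, jmxAttribute.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(toGraphite("95thPercentile")); // prints "p95"
    }
}
```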
Metrics
Monitoring & Alerting
Insights
Monitoring and Alerting
Use what you like and what
works for you.
Monitoring Platforms
OpsCenter, Grafana &
Graphite, Datadog, Riemann
Metrics
Monitoring & Alerting
Insights
Names?
All under
org.apache.cassandra.metrics
Scale?
Latency? microseconds
Rates? per second
Data? bytes
Percentiles?
75thPercentile
95thPercentile
99thPercentile
Rates?
OneMinuteRate
Request Throughput - All Requests
ClientRequest.
$REQUEST.Latency.OneMinuteRate
CASRead, CASWrite,
RangeSlice, Read, ViewWrite,
Write
A Note On Requests
We will focus on
Read, Write
But there are others
CAS*, RangeSlice, ViewWrite
Request Throughput - Per Table
Table.$KEYSPACE.$TABLE.
ReadLatency.OneMinuteRate
WriteLatency.OneMinuteRate
Request Latency - All Requests
ClientRequest.
Write.Latency.95thPercentile
Read.Latency.95thPercentile
Request Latency - Per Table
Table.$KEYSPACE.$TABLE.
CoordinatorReadLatency.95thPercentile
Local Latency - Per Table
Table.$KEYSPACE.$TABLE.
WriteLatency.95thPercentile
ReadLatency.95thPercentile
Local Read Path
Table.$KEYSPACE.$TABLE.
KeyCacheHitRate.value
BloomFilterFalseRatio.value
LiveScannedHistogram.95thPercentile
TombstoneScannedHistogram.95thPercentile
SSTablesPerReadHistogram.95thPercentile
Memory Usage
Table.$KEYSPACE.$TABLE.
BloomFilterOffHeapMemoryUsed.value
IndexSummaryOffHeapMemoryUsed.value
MemtableOnHeapSize.value
MemtableOffHeapSize.value
Clients
Client.connectedNativeClients.value
CQL.PreparedStatementsRatio.value
CQL.PreparedStatementsEvicted.value
Client Errors
ClientRequest.
$REQUEST.Unavailables.OneMinuteRate
$REQUEST.Timeouts.OneMinuteRate
$REQUEST.Failures.OneMinuteRate
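One way to turn these rates into an alert is to compare the combined error rate with request throughput (ClientRequest.$REQUEST.Latency.OneMinuteRate). A minimal sketch: the 1% threshold and the "no traffic" rule are hypothetical choices for illustration, not recommendations.

```java
final class ClientErrorAlert {
    // Hypothetical threshold: alert when errors exceed 1% of requests.
    static final double MAX_ERROR_FRACTION = 0.01;

    // Errors per request, from one-minute rates (events per second).
    static double errorFraction(double requestsPerSec, double unavailablesPerSec,
                                double timeoutsPerSec, double failuresPerSec) {
        double errorsPerSec = unavailablesPerSec + timeoutsPerSec + failuresPerSec;
        if (requestsPerSec <= 0) {
            // No traffic: treat any error at all as worth a look.
            return errorsPerSec > 0 ? 1.0 : 0.0;
        }
        return errorsPerSec / requestsPerSec;
    }

    static boolean shouldAlert(double requestsPerSec, double unavailablesPerSec,
                               double timeoutsPerSec, double failuresPerSec) {
        return errorFraction(requestsPerSec, unavailablesPerSec, timeoutsPerSec, failuresPerSec)
                > MAX_ERROR_FRACTION;
    }

    public static void main(String[] args) {
        // 1000 reads/s with 15 timeouts/s is 1.5% errors: prints "true".
        System.out.println(shouldAlert(1000, 0, 15, 0));
    }
}
```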
Inconsistency
Storage.TotalHints.count
HintedHandOffManager.
Hints_created-$IP_ADDRESS.count
Connection.TotalTimeouts.OneMinuteRate
Connection.$IP_ADDRESS.Timeouts.
OneMinuteRate
Inconsistency
We will also want to monitor
dropped messages, covered later.
Eventual Consistency
ReadRepair.Attempted.OneMinuteRate
ReadRepair.RepairedBackground.
OneMinuteRate
ReadRepair.RepairedBlocking.OneMinuteRate
Server Errors
Storage.Exceptions.count
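Storage.Exceptions.count is a cumulative counter, so the useful signal is how much it grew between two polls. A sketch of that delta logic; treating a drop in the counter as a restart assumes the in-memory metrics start again from zero when the node restarts. The CounterDelta class is hypothetical.

```java
final class CounterDelta {
    private Long previous = null;

    // Returns how much the counter grew since the last poll.
    long update(long current) {
        long delta;
        if (previous == null) {
            delta = 0;                  // first poll: no baseline yet
        } else if (current < previous) {
            delta = current;            // counter went backwards: assume a restart
        } else {
            delta = current - previous;
        }
        previous = current;
        return delta;
    }

    public static void main(String[] args) {
        CounterDelta exceptions = new CounterDelta();
        exceptions.update(5);
        // Any growth since the last poll is worth investigating.
        System.out.println(exceptions.update(7) > 0); // prints "true"
    }
}
```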
Disk Usage
Storage.Load.count
Table.$KEYSPACE.$TABLE.
TotalDiskSpaceUsed.count
Compactions
Compaction.PendingTasks.value
Compaction.TotalCompactionsCompleted.
OneMinuteRate
Table.$KEYSPACE.$TABLE.PendingCompactions
.value
Thread Pool Performance
ThreadPools.request.
MutationStage.PendingTasks.value
ReadStage.PendingTasks.value
CounterMutationStage.PendingTasks.value
RequestResponseStage.PendingTasks.value
ViewMutationStage.PendingTasks.value
Thread Pool Performance
DroppedMessage.
MUTATION.Dropped.OneMinuteRate
READ.Dropped.OneMinuteRate
Thread Pool Performance
DroppedMessage.
$VERB.InternalDroppedLatency
.95thPercentile
$VERB.CrossNodeDroppedLatency
.95thPercentile
Commit Log Performance
CommitLog.
PendingTasks.value
WaitingOnSegmentAllocation.
95thPercentile
WaitingOnCommit.95thPercentile
Thanks.
Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
www.thelastpickle.com
