Time Series Data With
Apache Cassandra
Berlin Buzzwords
May 27, 2014
Eric Evans

OpenNMS: What It Is
● Network Management System
○ Discovery and Provisioning
○ Service monitoring
○ Data collection
○ Event management, notifications
● Java, open source, GPLv3
● Since 1999
Time series: RRDTool
● Round Robin Database
● First released 1999
● Time series storage
● File-based, constant-size, self-maintaining
● Automatic, incremental aggregation

… and oh yeah, graphing
● 5+ IOPS per update (read-modify-write)!
● 100,000s of metrics, 1,000s of IOPS
● 1,000,000s of metrics, 10,000s of IOPS
● 15,000 RPM SAS drive, ~175-200 IOPS
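
The figures above can be reproduced with some rough arithmetic. The sketch below assumes each metric is updated once every 300 seconds; that interval is an assumption made for illustration, not a number from the slides. The 5 I/O operations per update and the 200 IOPS per drive come from the bullets above. The function names are hypothetical.

```python
import math


def required_iops(metrics, ios_per_update=5, interval_s=300):
    """Sustained IOPS needed to update every metric once per interval.

    interval_s=300 is an assumed collection interval (not from the slides).
    """
    return metrics * ios_per_update / interval_s


def drives_needed(iops, iops_per_drive=200):
    """Whole drives needed to sustain the given IOPS (upper slide figure)."""
    return math.ceil(iops / iops_per_drive)


if __name__ == "__main__":
    for metrics in (100_000, 1_000_000):
        iops = required_iops(metrics)
        print(f"{metrics:>9,} metrics: {iops:,.0f} IOPS, "
              f"{drives_needed(iops)} drives at 200 IOPS each")
```

Under these assumptions, 100,000 metrics land in the thousands of IOPS and 1,000,000 metrics in the tens of thousands, which matches the orders of magnitude on the slide and is far more than a single spinning drive can deliver.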
We collect and write a great deal; we read (graph) relatively little.
So why are we aggregating everything?

● Not everything is a graph
● Inflexible
● Incremental backups impractical
● Availability subject to filesystem access
Metrics typically appear in groups that are accessed together.
Optimizing storage for grouped access is a great idea!
What OpenNMS needs:
● High throughput
● High availability
● Late aggregation
● Grouped storage/retrieval
Apache Cassandra
● Apache top-level project
● Distributed database
● Highly available
● High throughput
● Tunable consistency

Write Properties
● Optimized for write throughput
● Sorted on disk
● Perfect for time series!
[Figure: three entries labeled "Key: Apple"]
CAP Theorem
Partition tolerance
R+W > N
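
The inequality can be checked mechanically. The sketch below uses the usual reading of the symbols, which the slide itself does not spell out: N is the number of replicas of a piece of data, W is the number of replicas that must acknowledge a write, and R is the number of replicas consulted on a read. When R + W > N, every read set overlaps every write set in at least one replica. The function names are hypothetical.

```python
def read_sees_latest_write(n, r, w):
    """True when any R replicas must overlap any W replicas out of N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (1, n), (quorum(n), quorum(n))]:
        print(f"N={n} R={r} W={w}: overlap={read_sees_latest_write(n, r, w)}")
```

With N = 3, reading and writing at a majority (2 and 2) satisfies the inequality, while reading and writing a single replica does not. That trade-off is what "tunable consistency" refers to.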

Distribution Properties
● Symmetrical
● Linearly scalable
● Redundant
● Highly available
Data Model
T1 T2 T3

Data Model
T1        T2        T3
M1  M2    M1  M2    M1  M2
V1  V2    V1  V2    V1  V2
Data Model
CREATE TABLE samples (
    T timestamp,
    M text,
    V double,
    resource text,
    PRIMARY KEY(resource, T, M)
);
Data model
resource | T1 M1: V1 | T1 M2: V2 | T1 M3: V3

Data model
SELECT * FROM samples
WHERE resource = 'resource'
AND T = 'T1';
resource | T1 M1: V1 | T1 M2: V2 | T1 M3: V3
Data model
T1 M1 V1
T1 M2 V2
T1 M3 V3
resource | T1 M1: V1 | T1 M2: V2 | T1 M3: V3

Data model
SELECT * FROM samples
WHERE resource = 'resource'
AND T >= 'T1' AND T <= 'T3';
resource | T1 M1: V1 | T2 M1: V1 | T3 M1: V1
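
To illustrate why this primary key suits the two queries shown, here is a toy in-memory model. It is not how Cassandra is implemented. It only mimics the layout implied by PRIMARY KEY(resource, T, M): one partition per resource, with cells kept sorted by (T, M), so a time range is one contiguous slice. All class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class Partition:
    """Toy model of one partition: cells kept sorted by (T, M)."""

    def __init__(self):
        self._keys = []   # sorted list of (T, M) clustering keys
        self._vals = {}   # (T, M) -> V

    def insert(self, t, m, v):
        key = (t, m)
        if key not in self._vals:
            # Keep clustering keys ordered no matter the arrival order.
            self._keys.insert(bisect_left(self._keys, key), key)
        self._vals[key] = v

    def slice(self, t_start, t_end):
        """All (T, M, V) with t_start <= T <= t_end, in clustering order."""
        ts = [t for t, _ in self._keys]
        lo = bisect_left(ts, t_start)
        hi = bisect_right(ts, t_end)
        return [(t, m, self._vals[(t, m)]) for t, m in self._keys[lo:hi]]


samples = {}  # resource (partition key) -> Partition


def write(resource, t, m, v):
    samples.setdefault(resource, Partition()).insert(t, m, v)


def read(resource, t_start, t_end):
    part = samples.get(resource)
    return part.slice(t_start, t_end) if part else []


if __name__ == "__main__":
    write("resource", 2, "M1", 1.0)
    write("resource", 1, "M2", 2.0)
    write("resource", 1, "M1", 3.0)
    print(read("resource", 1, 1))   # every metric at one timestamp
    print(read("resource", 1, 2))   # a time range
```

Reading with equal start and end corresponds to the T = 'T1' query, which returns every metric in the group at that timestamp. A wider range corresponds to the T >= 'T1' AND T <= 'T3' query.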
● Standalone time series data-store
● Raw sample storage and retrieval
● Flexible aggregations (computed at read)
○ Rate (counter types)
○ Functions pluggable
○ Arbitrary calculations
● Cassandra-speed
● Java API
● REST interface
● Apache licensed
● Github (
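
As an illustration of an aggregation computed at read time, the sketch below derives a per-second rate from raw counter samples. It is not taken from the project's code. The function name, the input shape (a list of (timestamp in seconds, counter value) pairs), and the handling of counter resets are all assumptions made for this example.

```python
def rates(samples):
    """Per-second rate between consecutive (timestamp_s, counter) samples.

    Returns (timestamp_s, rate) pairs, one per step. The rate is None when
    time does not advance or the counter goes backwards (assumed to be a
    reset), since no meaningful rate can be computed for that step.
    """
    out = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0 or c1 < c0:
            out.append((t1, None))
        else:
            out.append((t1, (c1 - c0) / (t1 - t0)))
    return out


if __name__ == "__main__":
    raw = [(0, 100), (300, 400), (600, 1000), (900, 50)]
    for ts, rate in rates(raw):
        print(ts, rate)
```

Because the raw samples are stored unchanged, a different function (an average, a maximum, some arbitrary calculation) can be applied to the same data later without rewriting anything on disk.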
