© 2018 Cisco Systems, Inc. All rights reserved.
MongoDB for High Volume
Time Series Data Streams
MongoDB World 2018
Gabriel Ng, Tom Monk, Kollivakkam Raghavan
Cisco Systems, Inc.
The Problem – Network Assurance
Network assurance is the guarantee that the network is
doing what the operator(s) intended it to do.
The Ecosystem - IT Demands on the Modern Datacenter
Agility
Months -> Hours
Security
Mobility
Scale
10K -> 100K -> 1B
• Network Engineers
• Application Developers
• Security Architects
• Server Engineers
• Network Operations
• Data Scientists
• Dev Ops
Policy-Based Data Center
• Controller with end-to-end application awareness
• IP fabric connecting all physical and virtual workloads and services
• Application Network Profile (ANP) pushed to all components
[Diagram labels: Database Tier, Application Tier, Web Tier, Profiles, Controller]
The Constraints
• Must work with multiple form factors
  • Personal laptop – single VM (limited memory, SSD disk)
  • Lab environment – multiple VMs (HDD disks – limited storage)
  • Production – multiple VMs (HDD/SSD disks)
• Solution must work with limited memory
  • Constrained WiredTiger cache
• Need to maximize write throughput without compromising reads
• Querying flexibility
  • Extensive use of the aggregation pipeline
Cisco Network Assurance Engine
Cisco NAE
Cisco Network Assurance Engine: How It Works
Data Collection
Captures all non-packet data: intent, policy, and state across the data center network
Comprehensive Network Modeling
Using formal methods (area of Comp Sci) to mathematically compute consistency
Intelligent Analysis
Analyze the results and recommend remediation steps for problems
User Interface: Search and Visualization
User Interface: Change Management Events
Events: What, Where, Why, and How
User Interface: Incident and Problem Management
High Volume Time Series Data Stream App
• ~12 million time series data points per hour for the largest fabrics
• Proper analysis of the data stream requires keeping several hours of recent context data on hand
• Streaming data platform with 3-tier web stack
[Figure: relative size of the WiredTiger (WT) cache compared with the random access index on the event collection]
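For a rough sense of scale, the quoted ingest rate can be converted to a per-second figure. The sketch below is plain arithmetic on the ~12 million points per hour stated on the slide; the 4-hour context window is an assumed value standing in for "several hours" and is not from the talk.

```python
POINTS_PER_HOUR = 12_000_000  # from the slide: largest fabrics
CONTEXT_HOURS = 4             # assumption: stands in for "several hours"

# Sustained insert rate the database has to absorb.
points_per_second = POINTS_PER_HOUR / 3600

# Number of recent data points that must stay cheaply queryable.
points_in_context = POINTS_PER_HOUR * CONTEXT_HOURS

print(f"{points_per_second:,.0f} points/sec sustained")
print(f"{points_in_context:,} points in a {CONTEXT_HOURS}-hour context window")
```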
Agenda
• Incremental Optimizations
  • B-tree Indexes
  • Date Interval Partitioning
  • System Configs
  • Pre-aggregation
  • Replication Factor
• Final Design
Sample Time Series Data
• The time at which the data was produced is a critical property
• Other examples:
  • Stock ticker data
  • HTTP logs
  • Twitter firehose
{
"timestamp": ISODate("2018-05-14T08:20:27.433Z"),
"rule": "sys/actrl/scope-16777200/rule-16777200-s-any-d-any-f-implarp",
"leaf": "topology/pod-1/node-1023",
"hitcount": NumberDecimal("934852839479")
}
B-tree Indexes: Refresher
• MongoDB indexes use a B-tree data structure
• A B-tree is a self-balancing search tree that generalizes the binary search tree (each node holds many keys), so index keys are kept in sorted order
Sample index: {_id: 1}
Document data:
{ _id: 1 … }
{ _id: 4 … }
{ _id: 9 … }
{ _id: 10 … }
{ _id: 11 … }
{ _id: 12 … }
{ _id: 13 … }
{ _id: 15 … }
{ _id: 16 … }
{ _id: 20 … }
{ _id: 25 … }
B-tree Indexes: Typical Access Pattern
• Many workloads require supporting random access patterns in indexes
• This is why it is a “law” in database design to make sure your indexes fit in memory
[Figure: index pages numbered 1 to 11 in the order they are accessed, scattered across the B-tree]
B-tree Indexes: Time Series Writes Pattern
• For time series data, including the timestamp as the first field in a compound index yields beautiful properties: inserts touch the index pages in sequential order rather than at random
[Figure: data access pattern for compound index { timestamp: 1, rule: 1 } on an insert-only time series workload; index pages are accessed in order, 1 to 11]
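The insert-only access pattern for a timestamp-first compound index can be imitated with a sorted Python list. This is only a sketch: the sorted list stands in for the key ordering of a B-tree, and the event data and the `insert_positions` helper are made up for illustration. It compares where new keys land for `(timestamp, rule)` ordering versus `(rule, timestamp)` ordering.

```python
import bisect
from datetime import datetime, timedelta, timezone


def insert_positions(keys):
    """Insert keys into a sorted list one at a time.

    Returns, for each insert, where the key landed as a fraction of the
    index size at that moment (1.0 means "appended at the right edge").
    """
    index, positions = [], []
    for key in keys:
        pos = bisect.bisect_right(index, key)
        positions.append(pos / len(index) if index else 1.0)
        index.insert(pos, key)
    return positions


# Hypothetical stream: one event per second, cycling over three rules.
start = datetime(2018, 5, 14, 8, 20, 27, tzinfo=timezone.utc)
events = [(start + timedelta(seconds=i), f"rule-{i % 3}") for i in range(12)]

timestamp_first = insert_positions([(ts, rule) for ts, rule in events])
rule_first = insert_positions([(rule, ts) for ts, rule in events])

print("timestamp first:", [round(p, 2) for p in timestamp_first])
print("rule first:     ", [round(p, 2) for p in rule_first])
```

With the timestamp first, every insert lands at the right edge, so only the newest part of the index is touched. With the rule first, inserts land throughout the index, which is the random access pattern that requires the whole index to be in memory.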
B-tree Indexes: Breaking the law!
Reference: MongoDB Manual. Ensure Indexes Fit in RAM. Section “Indexes that Hold Only Recent Values in RAM.”
https://docs.mongodb.com/manual/tutorial/ensure-indexes-fit-ram/
[Figure legend: Page on disk! / Page in RAM]
B-tree Indexes: Prefix Compression
• Putting timestamp first in a compound index also allows WiredTiger to do prefix compression on the key values
• Timestamp values in hex:
  • 0x5ab92260
  • 0x5ab93070
  • 0x5ab93e80
  • 0x5ab94c90
  • 0x5ab95aa0
• Does it look familiar?
  • "_id" : ObjectId("5ab95aa0f32f9359485f8bb3")
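The shared leading bytes can be checked directly. The sketch below uses the hex values from the slide and the fact that the first 4 bytes of an ObjectId are a seconds-since-epoch timestamp. It works on hex strings purely for illustration; WiredTiger compresses its own encoded index keys, whose byte layout is not shown here.

```python
import os
from datetime import datetime, timezone

# Hex timestamp values from the slide (seconds since the Unix epoch).
timestamps_hex = ["5ab92260", "5ab93070", "5ab93e80", "5ab94c90", "5ab95aa0"]

# Leading characters shared by every key: the part a prefix-compressing
# index does not need to store again for each entry.
shared_prefix = os.path.commonprefix(timestamps_hex)

# The first 8 hex characters (4 bytes) of an ObjectId are its timestamp.
object_id = "5ab95aa0f32f9359485f8bb3"
embedded_seconds = int(object_id[:8], 16)
created = datetime.fromtimestamp(embedded_seconds, tz=timezone.utc)

print("shared prefix:", shared_prefix)
print("ObjectId timestamp:", created.isoformat())
```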
Before Date Interval Partitioning
[Figure labels: “Right-sized” index; event collection; relative size of WiredTiger (WT) cache]
After Date Interval Partitioning
One “logical” event collection, stored as many “physical” event collections:
…
event_may_1_2018
event_may_2_2018
event_may_3_2018
event_may_4_2018
event_may_5_2018
event_may_6_2018
…
event_june_1_2018
event_june_2_2018
event_june_3_2018
event_june_4_2018
event_june_5_2018
event_june_6_2018
[Figure: relative size of the WiredTiger (WT) cache]
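One way an application might route reads and writes across the daily collections is to derive the collection name from the timestamp. The following is a minimal sketch that follows the naming pattern shown on the slide (`event_<month>_<day>_<year>`); the helper names `collection_for` and `collections_for_range` are hypothetical and are not taken from the talk.

```python
from datetime import datetime, timedelta, timezone

MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)


def collection_for(ts):
    """Name of the daily collection that holds a data point at time ts."""
    return f"event_{MONTHS[ts.month - 1]}_{ts.day}_{ts.year}"


def collections_for_range(start, end):
    """Names of every daily collection that overlaps [start, end]."""
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    names = []
    while day <= end:
        names.append(collection_for(day))
        day += timedelta(days=1)
    return names


utc = timezone.utc
print(collection_for(datetime(2018, 5, 1, 12, 0, tzinfo=utc)))
print(collections_for_range(datetime(2018, 5, 31, 22, 0, tzinfo=utc),
                            datetime(2018, 6, 2, 1, 0, tzinfo=utc)))
```

Writes go only to the collection for the current day, so only that collection's index is being updated. A query over a time range would need to visit each collection returned by `collections_for_range` and merge the results in the application.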
MongoDB Performance Configs
• Engage with MongoDB support
• Read: https://docs.mongodb.com/manual/administration/production-notes
• Run mdiag:
  • https://github.com/mongodb/support-tools/blob/master/mdiag/mdiag.sh
• XFS performs better than EXT4
• TCP keepalive setting recommendations
• Enable swap
• Configs specifically for our product:
  • WiredTiger cache size
Pre-aggregation
• For trend queries that must span a large range of time, use pre-aggregation to create a collection of summarized trend stats
Raw data point:
{
"timestamp": ISODate("2018-05-14T08:20:27.433Z"),
"rule": "sys/actrl/scope-16777200/rule-16777200-s-any-d-any-f-implarp",
"leaf": "topology/pod-1/node-1023",
"hitcount": NumberDecimal("934852839479")
}
Daily aggregation:
{
"timestamp": ISODate("2018-05-14T00:00:00.000Z"),
"rule": "sys/actrl/scope-16777200/rule-16777200-s-any-d-any-f-implarp",
"leaf": "topology/pod-1/node-1023",
"hitcounts": {
"270" : NumberDecimal("934852348595"),
...
"28947" : NumberDecimal("934857839479")
}
}
Reference: Sandeep Parikh & Kelly Stirman, Schema Design for Time Series Data in MongoDB.
https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb
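The folding step behind a daily aggregation document can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: it assumes the keys of `hitcounts` are whole-second offsets from the start of the day (the slide does not say what the keys mean), it uses `Decimal` in place of `NumberDecimal`, and the `fold_into_daily` helper and the in-memory `daily_docs` dict are hypothetical.

```python
from datetime import datetime, timezone
from decimal import Decimal


def fold_into_daily(daily_docs, point):
    """Fold one raw data point into its per-day summary document.

    daily_docs maps (day, rule, leaf) to a summary document shaped like
    the daily aggregation example; it is created on first use.
    """
    ts = point["timestamp"]
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    key = (day, point["rule"], point["leaf"])
    doc = daily_docs.setdefault(key, {
        "timestamp": day,
        "rule": point["rule"],
        "leaf": point["leaf"],
        "hitcounts": {},
    })
    # Assumed key scheme: whole seconds elapsed since the start of the day.
    offset = int((ts - day).total_seconds())
    doc["hitcounts"][str(offset)] = point["hitcount"]
    return doc


daily_docs = {}
raw_point = {
    "timestamp": datetime(2018, 5, 14, 8, 20, 27, 433000, tzinfo=timezone.utc),
    "rule": "sys/actrl/scope-16777200/rule-16777200-s-any-d-any-f-implarp",
    "leaf": "topology/pod-1/node-1023",
    "hitcount": Decimal("934852839479"),
}
summary = fold_into_daily(daily_docs, raw_point)
print(summary["timestamp"].isoformat(), summary["hitcounts"])
```

Against a real database, the same fold would typically be an upsert that sets one `hitcounts.<offset>` field on the matching daily document, so a trend query over many days reads one document per day, rule, and leaf instead of every raw point.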
Replication Factor 2 or 3?
Reference: MongoDB Manual. Three Member Replica Sets.
https://docs.mongodb.com/manual/core/replica-set-architecture-three-members/
• Write concern 1 (w1) writes block until acknowledged by primary.
• Write concern 2 (w2) writes block until acknowledged by primary and at least one secondary.
• w1 writes are not 100% durable in the case of primary failure – durability vs. performance
[Diagram: Primary with Two Secondary Members, “PSS,” replication factor 3]
[Diagram: Primary with a Secondary and an Arbiter, “PSA,” replication factor 2]
Replication Factor 2 or 3?
[Bar chart: Write throughput on single shard (docs/sec), axis from 0 to 15,000, comparing PSS with w2 writes, PSS with w1 writes*, and PSA with w1 writes]
* PSS, w1 writes resulted in the replication state of the secondary nodes drifting minutes apart under heavy write loads
Reference: Mike LaSpina. Benchmarks on three node replica sets. August 10, 2017.
Replication Factor 2 or 3?
• CPU and disk load are high on the Primary and Secondary during heavy periods of writes
• Secondary lags behind during these periods
• Experiment with PSA (Primary, Secondary, and Arbiter)
  • Higher write throughput because Primary does not need to service two Secondary nodes
• Tradeoff/judgement call: 100% durable writes or write throughput?
Final Design
• Date interval partitioning in favor of right-sized B-tree indexes
• Pre-aggregate expensive queries
• Replication factor 2 with write concern 1 for max write throughput
Questions?
• More information on Cisco Network Assurance Engine: http://cs.co/9007D3dWL
• Cisco is hiring! http://jobs.cisco.com/

Editor's Notes

  1. Let’s double-click to see how it works.
     1. Starting from the left: what data do we collect? Candid goes to every leaf and every spine in the network and collects all the configurations, control-plane state, data-plane state, and even hardware state such as TCAM tables and VLAN tables. From the controller we pick up the entire policy and configs and a representation of the intent. In addition, we have the implicit intent based on the expected network behavior.
     2. With all this we now build the comprehensive network model: underlay, overlay, and tenancy layers.
     3. Against this model we run checks based on 30+ years of Cisco operational domain experience. These checks are based on 3 things: i) our expertise on how networks and our hardware should correctly operate: there should be no routing loops, no overlapping subnets in a VRF, no duplicate IPs, and so on; ii) best design practices that we learn from our AS teams: if you want a subnet to talk externally, what are all the BD and L3out configs required, or all the access policies required to correctly deploy an EPG; iii) finally, our TAC cases: the 10% of failure scenarios that cause 90% of failures in the field. We bring this collective knowledge to all our customers.
     Every 15 minutes or so, the engine builds the most real-time model of the network and runs these checks against that model, like an intelligent robot watching your back, always checking the network for correctness.
  2. Narrative: Discuss smart events, discuss the drilling down into human readable suggested next steps. The “Assurance Engine” talks to you…