Achieving Scale with MongoDB
David Lutz
Senior Solutions Architect
Agenda
• Optimize for Scale
– Design the Schema
– Build Indexes
– Use Monitoring Tools
– Use WiredTiger Storage
• Vertical Scaling
• Horizontal Scaling
Don’t Substitute Scaling for Optimization
• Make sure you are solving the right problem
• Remedy schema and index problems first
• We’ll discuss both …
Optimization Tips:
Schema Design
Documents are Rich Data Structures
{
  customer_id : 123,
  first_name : "John",
  last_name : "Smith",
  address : {
    location : [45.123, 47.232],
    street : "123 Main Street",
    city : "Houston",
    state : "TX",
    zip_code : "77027"
  },
  profession : ["banking", "trader"],
  policies : [ {
      policy_number : 13,
      description : "short term",
      deductible : 500
    },
    { policy_number : 14,
      description : "dental",
      visits : […]
    } ]
}
Fields can contain an array of sub-documents
Fields
Typed field values
Fields can contain arrays
The Document Data Model
Matches Application Objects
– Eases development
Flexible
– Evolves with application
High performance
– Designed for access pattern
{ customer_id : 123,
  first_name : "John",
  last_name : "Smith",
  address : {
    street : "123 Main Street",
    city : "Houston",
    state : "TX",
    zip_code : "77027"
  },
  policies : [ {
      policy_number : 13,
      description : "short term",
      deductible : 500
    },
    { policy_number : 14,
      description : "dental",
      visits : […]
    } ]
}
The Importance of Schema Design
• Very different from RDBMS schema design
• With MongoDB Schema:
– denormalize the data
– create a schema with prior knowledge of your
actual query patterns, then …
– write simple queries
Real World Example
Product catalog for retailer selling in 20 countries
{
_id: 375,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
<… and so on for other locales …>
}
• What's good about this schema?
– Each document contains all the data about the product across all possible locales.
– It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc.).
Real World Example
But that’s not how the data was accessed!
db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );
… and so forth for other locales
The data model did not fit the access pattern.
Real World Example
Inefficient use of resources
Data in BLUE are being
used. Data in RED take
up memory but are not in
demand.
{
_id: 375,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
<… and so on for other locales …>
}
{
_id: 42,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
<… and so on for other locales …>
}
Consequences of Schema Redesign
• Queries induced minimal memory overhead
• 20x as many products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced
{
_id: "375-en_GB",
name: …,
description: …,
<… the rest of the document …>
}
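The redesign can be sketched in plain JavaScript. This is a minimal illustration, not the retailer's actual migration: `splitByLocale` is a hypothetical helper, and the product names and descriptions are made-up sample data. It turns one multi-locale document into one small document per locale, using the "<product id>-<locale>" `_id` convention shown on the slide.

```javascript
// Hypothetical helper: split one multi-locale product document into
// one document per locale, keyed as "<productId>-<locale>".
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Illustrative data only; the field values are invented for this sketch.
const original = {
  _id: 375,
  en_US: { name: "Sweater", description: "Warm wool sweater" },
  en_GB: { name: "Jumper", description: "Warm wool jumper" },
  fr_FR: { name: "Pull", description: "Pull en laine" },
};

const perLocale = splitByLocale(original);
console.log(perLocale.map((doc) => doc._id));
```

With documents shaped this way, a read for one locale only has to load that locale's document, which is where the memory savings on this slide come from.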
Schema Design Patterns
• Pattern: pre-computing interesting quantities,
ideally with each write operation
• Pattern: putting unrelated items in different
collections to take advantage of indexing
• Anti-pattern: appending to arrays ad infinitum
• Anti-pattern: importing relational schemas
(3NF) directly into MongoDB
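The pre-computing pattern can be illustrated with a small sketch. The field names `rating_count` and `rating_sum` are assumptions made for this example, and `applyInc` is only an in-memory stand-in for what the server does with the `$inc` update operator; it is not a MongoDB API.

```javascript
// Hypothetical update builder: each new rating also bumps running totals,
// so reads never have to aggregate over all ratings.
function buildRatingUpdate(rating) {
  return { $inc: { rating_count: 1, rating_sum: rating } };
}

// In-memory stand-in for applying a $inc update, for illustration only.
function applyInc(doc, update) {
  const result = { ...doc };
  for (const [field, delta] of Object.entries(update.$inc)) {
    result[field] = (result[field] || 0) + delta;
  }
  return result;
}

let product = { _id: "375-en_GB", rating_count: 0, rating_sum: 0 };
for (const rating of [5, 4, 3]) {
  product = applyInc(product, buildRatingUpdate(rating));
}

// The average is derived from two pre-computed fields at read time.
const average = product.rating_sum / product.rating_count;
console.log(product.rating_count, average);
```

The design choice is to pay a small, constant cost on every write so that the common read stays a single-document lookup.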
Schema Design Resources
• The docs! Data Model Design, Patterns & Examples
https://docs.mongodb.org/manual/core/data-model-design/
https://docs.mongodb.org/manual/applications/data-models/
https://docs.mongodb.org/manual/MongoDB-data-models-guide.pdf
• The blogs! http://blog.mongodb.org
"6 Rules of Thumb for Schema Design"
– Part 1: http://goo.gl/TFJ3dr
– Part 2: http://goo.gl/qTdGhP
– Part 3: http://goo.gl/JFO1pI
• Webinars, training, consulting, etc…
Optimization Tips:
Indexing
Indexes
• Single biggest tunable performance factor
• Tree-structured references to your documents
• Indexing and schema design go hand in hand
)))))))))))))
)
Indexing Mistakes and Their Fixes
• Failing to build necessary indexes
– Run .explain(), examine slow query log and
system.profile collection, download mtools
• Building unnecessary indexes
– Talk to your application developers about usage
• Running ad-hoc queries in production
– Use a staging environment, use secondaries
Indexing Strategies
• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with compound indexes
– db.collection.createIndex({A:1, B:1, C:1})
– allows queries using leftmost prefix
• Order index columns to support scans & sorts
• Create indexes that support covered queries
• Prevent collection scans in pre-production environments
db.adminCommand( { setParameter: 1, notablescan: 1 } )
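The leftmost-prefix rule for a compound index on A, B, C can be sketched as a simple check. This is a deliberately simplified model written for this deck, assuming equality-only queries: it only answers whether the queried fields form exactly a leftmost prefix of the index. The real query planner handles more cases (for example, it can use the A prefix and then filter on C).

```javascript
// Simplified model: true when the queried fields are exactly a leftmost
// prefix of the index fields (field order inside the query does not matter).
function usesLeftmostPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) {
    return false;
  }
  const prefix = new Set(indexFields.slice(0, queryFields.length));
  return queryFields.every((field) => prefix.has(field));
}

const index = ["A", "B", "C"];
console.log(usesLeftmostPrefix(index, ["A"]));      // prefix {A}
console.log(usesLeftmostPrefix(index, ["B", "A"])); // prefix {A, B}
console.log(usesLeftmostPrefix(index, ["B", "C"])); // not a prefix
```

Under this model, separate indexes on {A:1} and {A:1, B:1} are redundant once the compound index on A, B, C exists, which is the "eliminate duplicate indexes" point above.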
Indexing Example – Before an Index
db.tweets.explain("executionStats").find( {"user.lang":"ja"} )
{"winningPlan" : {
"inputStage" : {
"stage" : "COLLSCAN" } },
"executionStats" : {
"nReturned" : 3560,
"executionTimeMillis" : 56,
"totalKeysExamined" : 0,
"totalDocsExamined" : 51428 } }
Indexing Example – After an Index
db.tweets.explain("executionStats").find( {"user.lang":"ja"} )
{"winningPlan" : {
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"user.lang" : 1 } } },
"executionStats" : {
"nReturned" : 3560,
"executionTimeMillis" : 8,
"totalKeysExamined" : 3560,
"totalDocsExamined" : 3560 } }
Optimization Tips:
Monitoring
The Best Way to Run MongoDB
Ops Manager allows you to
leverage and automate the best practices
we’ve learned from thousands of
deployments in a comprehensive
application that helps you run MongoDB
safely and reliably.
Benefits include:
10x-20x more efficient operations
Complete performance visibility
Assisted performance optimization
Ops Manager Provides:
for Developers
• Visual Query Profiler
for Administrators
• Index Suggestions
• Automated Index Builds
• Monitoring and Alerting
for Operations
• APM Integration
• Database Automation
• Backup with Point-In-Time
Recovery
Fast and simple
query optimization
with the Visual
Query Profiler
Query Visualization and Optimization
Example Deployment – 12 Servers
Install, Configure
150+ steps
…Error handling, throttling, alerts
Scale out, move servers, resize oplog, etc.
10-180+ steps
Upgrades, downgrades
100+ steps
Without Ops Manager
With Ops Manager
Also Available in the Cloud
Cloud Manager
allows you to leverage and
automate the best practices
we’ve learned from thousands of
deployments in a comprehensive
application that helps you run
MongoDB safely and reliably …
in the cloud!
http://cloud.mongodb.com
Manual Monitoring Tools
mongod
log file
profiler (collection)
query engine
Review log files, or
Use mtools to visualize them –
http://github.com/rueckstiess/mtools
.explain() is your friend
• queryPlanner
• executionStats
• allPlansExecution
ENABLE
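One quick way to read executionStats output is to compare documents examined with documents returned. The sketch below uses the figures from the tweets indexing example earlier in this deck; `scanRatio` is a hypothetical helper for illustration, not part of the shell.

```javascript
// Hypothetical helper: documents examined per document returned.
// A ratio near 1 suggests the query is well supported by an index.
function scanRatio(stats) {
  if (stats.nReturned === 0) {
    return stats.totalDocsExamined === 0 ? 1 : Infinity;
  }
  return stats.totalDocsExamined / stats.nReturned;
}

// Figures copied from the before/after explain output in this deck.
const beforeIndex = { nReturned: 3560, totalKeysExamined: 0, totalDocsExamined: 51428 };
const afterIndex = { nReturned: 3560, totalKeysExamined: 3560, totalDocsExamined: 3560 };

console.log(scanRatio(beforeIndex).toFixed(1)); // "14.4"
console.log(scanRatio(afterIndex).toFixed(1));  // "1.0"
```

Before the index, the server examined about 14 documents for each one it returned; after, it examined only the documents it returned.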
WiredTiger Storage Engine
7x - 10x Performance & 50% - 80% Less Storage
• 100% backwards compatible
• Non-disruptive upgrade
• Same data model, query language, ops
• WRITE performance gains driven by document-level concurrency control
• Storage savings driven by native compression
[Chart: Performance, MongoDB 3.0/3.2 vs. MongoDB 2.6]
Vertical Scaling
Factors:
– RAM
– Disk
– CPU
– Network
We are Here to Pump you Up
[Diagram: two replica sets, each with one Primary and two Secondaries]
Before you add hardware....
• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
– schema and index problems can look like hardware
problems
• Tune the Operating System
– ulimits, swap, NUMA, NOOP scheduler with
hypervisors
• Tune the IO subsystem
– ext4 or XFS vs SAN, RAID10,
readahead, noatime
• See MongoDB “Production Checklist”
• Heed logfile startup warnings
Working Set Exceeds Physical Memory
Initial Architecture
4-Way Cluster backed by spinning disk
Application / mongos
mongod
Vertical Scaling
Scaling random IOPS with SSDs
Application / mongos
mongod SSD
Horizontal Scaling
Horizontal Scaling
Rapidly growing business means more shards
Application w/ driver
& mongos
…16 more shards…
mongod
What is a Shard Key?
• Shard key must be indexed
• Shard key is used to partition your collection
• Shard key must exist in every document
• Shard key is immutable
• Shard key values are immutable
• Shard key is used to route requests to shards
See How to Choose a Shard Key: The Card Game
https://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
Shard Key Characteristics
• A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query, if possible
– Scatter-gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later can be expensive
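The difference between a targeted read and a scatter-gather read can be sketched as a check on the query's fields. This is a simplified model, assuming a query is targeted only when it specifies every shard key field; `classifyQuery` and the `customer_id` field are hypothetical names chosen for this example.

```javascript
// Simplified model: a query that names every shard key field can be routed
// to specific shards; anything else must be sent to all shards.
function classifyQuery(shardKeyFields, query) {
  const hasAllKeyFields = shardKeyFields.every((field) =>
    Object.prototype.hasOwnProperty.call(query, field)
  );
  return hasAllKeyFields ? "targeted" : "scatter-gather";
}

const shardKey = ["customer_id"]; // assumed shard key for this sketch

console.log(classifyQuery(shardKey, { customer_id: 123 }));
console.log(classifyQuery(shardKey, { last_name: "Smith" }));
```

This is why the slide recommends including the shard key in every query where possible: the second query above would have to be answered by every shard.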
Range-based Sharding
[Diagram: three shards, each holding one shard key range: A - C, D - O, P - Z. Example shard keys: Bagpipes (A - C), Iceberg (D - O), Snow Cone (P - Z).]
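Balancing
[Diagram: new shard keys Dates and Dragons both fall in the D - O range. Shard key ranges before balancing: A - C, D - O, P - Z. After balancing: A - De, Df - O, P - Z.]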
Background process balances data across shards
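Range-based routing and the effect of balancing can be sketched with a small lookup. This is an illustrative model only: the shard names are invented, each range is represented by its lower bound, and plain string comparison stands in for the server's key ordering. The range boundaries and example keys are the ones from the slides.

```javascript
// Each entry owns the keys from its `min` up to the next entry's `min`.
// Entries must be sorted by ascending `min`.
function routeByRange(ranges, key) {
  let owner = null;
  for (const range of ranges) {
    if (key >= range.min) owner = range.shard;
  }
  if (owner === null) throw new RangeError(`no range covers key: ${key}`);
  return owner;
}

// Ranges before balancing: A - C, D - O, P - Z.
const beforeBalancing = [
  { shard: "shard0", min: "A" },
  { shard: "shard1", min: "D" },
  { shard: "shard2", min: "P" },
];

// Ranges after balancing: A - De, Df - O, P - Z.
const afterBalancing = [
  { shard: "shard0", min: "A" },
  { shard: "shard1", min: "Df" },
  { shard: "shard2", min: "P" },
];

for (const key of ["Bagpipes", "Iceberg", "Snow Cone", "Dates", "Dragons"]) {
  console.log(key, routeByRange(beforeBalancing, key), routeByRange(afterBalancing, key));
}
```

In this model, moving the boundary from D to Df shifts "Dates" to the first shard while "Dragons" stays on the second, which is the kind of adjustment the balancer makes in the background.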
Other Forms of Sharding
There are more advanced types of sharding that
are discussed in our sharding webinars.
• Tag-aware, aka zone partitioning, is a special case of
range-based sharding that allows for data locality
• Hash-based, aka hash partitioning, uses a hashed
value derived from the shard key(s) for assignment
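Hash-based assignment can be sketched as follows. Two loud assumptions: the FNV-1a function below is a stand-in and is not the hash MongoDB uses, and taking the hash modulo the shard count is a simplification (the server assigns ranges of hashed values to shards rather than using modulo). The `customer-<n>` keys are invented sample data.

```javascript
// Stand-in hash (32-bit FNV-1a over UTF-16 code units); NOT MongoDB's hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Simplified assignment: hashed value modulo the number of shards.
function routeByHash(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Sequential keys would all land in one range under range-based sharding;
// hashing spreads them across shards.
const counts = [0, 0, 0];
for (let id = 0; id < 1000; id++) {
  counts[routeByHash(`customer-${id}`, 3)] += 1;
}
console.log(counts);
```

The trade-off is that hashing spreads monotonically increasing keys across shards for writes, but range queries on the original key can no longer be targeted to a single shard.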
Examples of Scale
Cluster, Performance & Data Scale
                      Cluster Scale    Performance Scale    Data Scale
Entertain Co.         1400 servers     250M Ticks / Sec     Petabytes
Asian Internet Co.    1000+ servers    300K+ Ops / Sec      10s of billions of objects
Fed Agency            250+ servers     500K+ Ops / Sec      13B documents


Webinar: Scaling MongoDB


Editor's Notes

  1. MongoDB Ops Manager can do a lot for [ops teams]. Best Practices, Automated. Ops Manager takes best practices for running MongoDB and automates them. So you run ops the way MongoDB engineers would do it. This not only makes it more fool-proof, but it also helps you… Cut Management Overhead. No custom scripting or special setup needed. You can spend less time running and managing manual tasks because Ops Manager takes care of a lot of the work for you, letting you focus on other tasks. Meet SLAs. Automating critical management tasks makes it easier to meet uptime SLAs. This includes managing failover as well as doing rolling upgrades with no downtime. Scale Easily. Provision new nodes and systems with a single click.
  3. Key Takeaway: Save time and effort, and reduce risk, with fast and simple query optimization via the new Visual Query Profiler. Talking points: Query and write latency are consolidated and displayed visually; your ops teams can easily identify slower queries and latency spikes. The Visual Query Profiler analyzes the data it displays and provides recommendations for new indexes that can be created to improve query performance. Ops Manager and Cloud Manager can automate the rollout of new indexes, reducing risk and your team’s operational overhead.
  4. It is, of course, possible to do these things without MMS. But it takes work. Typically manual work, or custom scripting. In either case, these things take time, require you to check for mistakes and are more prone to having things go wrong.
  6. The figures above are examples. Your application will govern your performance.