MongoDB Performance
Manosh Malai
CTO, Mydbops
3rd September 2020
7th Mydbops Database Meetup
Interested in Open Source technologies
Interested in MongoDB, DevOps & DevOpSec Practices
Tech Speaker/Blogger
CTO, Mydbops IT Solution
Manosh Malai
About Me
Consulting
Services
Managed
Services
Focuses on MySQL, MongoDB and PostgreSQL
Mydbops Services
250 + Clients In 4 Yrs. of Operations
Our Clients
MongoDB Performance Best Practices
MongoDB Performance Analysis Tool
Introduction
Agenda
INTRODUCTION
Why MongoDB
Ad Hoc Queries
Schema-less Database
Indexing
Aggregation
GridFS
Sharding
Replication
Document Oriented
PERFORMANCE BEST PRACTICES
1. SCHEMA DESIGN
2. INDEXING
3. LINUX TUNING
{
{
Modelling Approach RDBMS
[Cycle diagram: Define/Re-Define Data Model → Develop Application and Queries → Production → Denormalize/Poor Performance]
Design a normalized data model/schema
Develop the application
The data model dictates how queries are written for application operations
The application evolves and the data becomes denormalized
The data model is restructured and re-normalized
This causes poor performance and requires downtime
Modelling Approach MongoDB
[Cycle diagram: Define Data Model → Develop Application and Queries (MQL) → Production → New Requirement]
Define the data model
Develop the application
The application evolves
Improve the data model
Application evolution and data-model improvement happen recursively, without downtime or complication
Design is part of each phase of the application lifetime
Strategy of Modelling
Goal 1: Gain deep knowledge of the application's behaviour
Goal 2: Predict the C, U, R, D operations performed on the database, and their priorities
Goal 3: Based on that prediction, map the relationships between entities and the C, U, R, D operations
Goal 4: Finalize the data model that best suits the application
RDBMS MongoDB
Employee table:
Eid | FName | LName | Email | Mobile | JobName
101 | Manosh | Malai | abc@mydboxxxxxxxxxx | xxx
102 | Kabilesh | P.R | def@mydboxxxxxxxxxx | xxx
Skill table:
id | Eid | Skill
1 | 101 | Linux
2 | 101 | MongoDB
Certification table:
id | Eid | CertName | CertNo
1 | 101 | RHCSS | xxx
2 | 101 | AWS | xxx
3 | 101 | MongoDB | xxx
{
Eid: "101",
FName: "Manosh",
LName: "Malai",
Email: "abc@mydbops.com",
Mobile: xxxxxxxxxx,
JobName: "xxx",
Skills: ["Linux", "MongoDB"],
Certifications: [
{
CertName: "RHCSS",
CertNo: "xxx"
},
{
CertName: "AWS",
CertNo: "xxx"
},
{
CertName: "MongoDB",
CertNo: "xxx"
}
]
}
Data Model Type
Embedded Model Link/Reference/Normalized Model
Emp_Collection:
{
Eid: "101",
FName: "Manosh",
LName: "Malai",
Email: "abc@mydbops.com",
Mobile: xxxxxxxxxx,
JobName: "xxx",
Skills: ["Linux", "MongoDB"],
Certifications: [
{ CertName: "RHCSS", CertNo: "xxx" },
{ CertName: "AWS", CertNo: "xxx" },
{ CertName: "MongoDB", CertNo: "xxx" }
]
}
Emp_Collection:
{
Eid: "101",
FName: "Manosh",
LName: "Malai",
Email: "abc@mydbops.com",
Mobile: xxxxxxxxxx,
JobName: "xxx",
Skills: ["Linux", "MongoDB"]
}
Emp_Certification_Collection:
{
CertName: "RHCSS",
CertNo: "xxx",
Eid: "101",
}
Choose Embedded VS Reference
How frequently does the embedded data get
accessed?
Does the embedded information change/update
often?
Is the data queried using the embedded
information?
Design Pattern
Understand your application’s query patterns, Design your data model, Select the appropriate
indexes.
MongoDB's flexible schema does not mean you can ignore schema design.
Prioritize embedding, unless there is an unavoidable reason.
Don't be afraid of application-level joins: If the index is built correctly and the returned results
are limited by projection conditions, then application-level joins will not be much more
expensive than joins in relational databases.
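The point above can be sketched server-free: the snippet below simulates an application-level join in Python, with plain lists of dicts standing in for the Emp_Collection and Emp_Certification_Collection documents shown later; the employee_with_certs helper is purely illustrative, not a driver API.

```python
# Illustrative application-level join: two "queries" (here, list scans that
# would be indexed find() calls against a real server) merged in the app.

employees = [
    {"Eid": "101", "FName": "Manosh", "Skills": ["Linux", "MongoDB"]},
]
certifications = [
    {"Eid": "101", "CertName": "RHCSS", "CertNo": "xxx"},
    {"Eid": "101", "CertName": "AWS", "CertNo": "xxx"},
]

def employee_with_certs(eid):
    # First query: fetch the employee document (would hit an index on Eid).
    emp = dict(next(e for e in employees if e["Eid"] == eid))
    # Second query: fetch only the projected fields from the referenced
    # collection (would hit an index on Eid there as well).
    emp["Certifications"] = [
        {"CertName": c["CertName"], "CertNo": c["CertNo"]}
        for c in certifications if c["Eid"] == eid
    ]
    return emp

doc = employee_with_certs("101")
print(len(doc["Certifications"]))  # → 2
```

With an index on Eid in both collections and a projection limiting the returned fields, the second lookup stays cheap, which is the slide's argument.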
Key Consideration(RECAP TOO)
Arrays should not grow without bound
When an array grows unbounded, index performance on that array degrades
Avoid lookups ($lookup) if they can be avoided
Avoid a huge number of collections
Avoid the default _id field where possible: 12 bytes is large and carries some computational cost
Optimize key names: every document stores its own key names, so long key names consume more space.
Key Consideration(RECAP TOO)
In the database world, indexes play a vital role in performance, and MongoDB is no exception
Indexing
Single Field Indexes
Compound Indexes
Multikey Indexes
Text Indexes
Wildcard Indexes
2dsphere Indexes
2d Indexes
geoHaystack Indexes
Hashed Indexes
Index Type Index Properties
TTL Indexes
Unique Indexes
Partial Indexes
Case Insensitive Indexes
Hidden Indexes
Sparse Indexes
Follow ESR Rule in Compound Indexes
Use Covered Queries as much as possible
How Prefix Compression improves query performance and Disk usage
Indexing Strategies
and so on.
Follow ESR Rule in Compound Indexes
EQUAL SORT
RANGE
With a single-field index, the index can satisfy either an ascending or a descending sort regardless of the physical
ordering of the index keys
ESR is not a strict rule; it is a guideline that helps produce better query performance
Putting the equality key first limits the amount of data we have to look at
Avoid blocking/in-memory sorting
Failing to follow the ESR guideline in index creation drives up totalKeysExamined and totalDocsExamined,
puts stress on memory and CPU resources, and pushes the query's executionTimeMillis well beyond
the advised value.
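The ESR guideline can be expressed as a tiny ordering rule. The helper below is hypothetical (not a MongoDB API); it only illustrates how to derive a compound-index key order from the predicate types in a query.

```python
# Illustrative ESR helper: equality fields first, then sort fields,
# then range fields, yielding the compound-index key order.

def esr_key_order(equality, sort, range_):
    # ESR: Equality → Sort → Range.
    return list(equality) + list(sort) + list(range_)

# The slide's query: find {role: "mongodb-dba", exp: {$gt: 5}},
# sorted by location.
keys = esr_key_order(equality=["role"], sort=["location"], range_=["exp"])
print(keys)  # → ['role', 'location', 'exp']
```

That ordering corresponds to createIndex({role: 1, location: 1, exp: 1}), the index shown on the second ESR slide.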
ESR db.emp.find({role: "mongodb-dba", exp: {$gt: 5}}).sort({location: 1})
ROOT db.emp.createIndex({role: 1, exp: 1, location: 1})
[Index tree: ROLE (MongoDB-DBA, MySQL-DBA) → EXP (10, 7, 5, 5) → LOCATION (Bangalore, Chennai, Hyderabad)]
Key order is E, R, S: the range key (exp) comes before the sort key (location), so matching entries come back
in exp order (Chennai, Bangalore, Hyderabad) and a BLOCKING SORT on location is required to produce the RESULT.
ESR db.emp.find({role: "mongodb-dba", exp: {$gt: 5}}).sort({location: 1})
ROOT db.emp.createIndex({role: 1, location: 1, exp: 1})
[Index tree: ROLE (MongoDB-DBA, MySQL-DBA) → LOCATION (Bangalore, Chennai, Hyderabad) → EXP (5, 7, 5, 10)]
Key order is E, S, R: equality (role), sort (location), range (exp), so entries are read already in location
order (Bangalore, Chennai, Hyderabad) and no blocking sort is needed.
Key Consideration
Foreground index creation takes a collection-level lock.
Background index creation overcomes the locking bottleneck but decreases the efficiency of index
traversal.
In MongoDB 4.2 the index build system was reworked; the new hybrid build method overcomes
both of the above inefficiencies.
Recommend that developers write covered queries. Such a query is satisfied entirely by an index,
so zero documents need to be inspected, which makes the query run a lot faster. All the
projected keys need to be part of the index.
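A toy Python sketch of why covered queries are fast: with a hypothetical index on (role, exp), a query filtering on role and projecting only role and exp can be answered from the index entries alone, touching zero documents. The tuple list below is hand-made, not real WiredTiger data.

```python
# Hypothetical sorted index entries for a compound index on (role, exp).
index_role_exp = [
    ("mongodb-dba", 5),
    ("mongodb-dba", 10),
    ("mysql-dba", 7),
]

def covered_find(role):
    # docs_examined stays 0: the index alone covers the projection,
    # so no document fetch is simulated.
    docs_examined = 0
    results = [{"role": r, "exp": e} for r, e in index_role_exp if r == role]
    return results, docs_examined

rows, examined = covered_find("mongodb-dba")
print(len(rows), examined)  # → 2 0
```

In a real explain() output a covered query shows totalDocsExamined: 0 with a PROJECTION_COVERED / IXSCAN plan.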
Cont...
Combining operators can possibly produce a range
Use an index to sort the result and avoid blocking sorts.
Remove duplicate and unused indexes; this also improves disk throughput and memory usage.
Operator names can mislead between equality and range; check the index bounds to confirm whether the operator
you are using is a range or an equality
Treated as ranges: $ne, $nin, regex
$in alone: equality match
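The operator classification above can be sketched as a small lookup. This classifier is illustrative only (it is not a MongoDB API, and real index-bound analysis is more nuanced); it just mirrors the slide's point that some equality-looking operators produce range bounds.

```python
# Operators that produce range (possibly multi-interval) index bounds,
# per the slide: $ne, $nin and regex, alongside the ordinary comparisons.
RANGE_OPERATORS = {"$ne", "$nin", "$regex", "$gt", "$gte", "$lt", "$lte"}

def bound_type(operator):
    # "$in" used alone is treated as an equality match.
    if operator == "$in":
        return "equality"
    return "range" if operator in RANGE_OPERATORS else "equality"

print(bound_type("$in"), bound_type("$ne"), bound_type("$regex"))
# → equality range range
```

When in doubt, run the query with explain() and read the indexBounds section rather than guessing from the operator name.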
B-Tree & Prefix Compression: Query performance & Disk usage
In B-tree indexes, low-cardinality values actually harm performance
For low-cardinality values, prefer a partial index
Prefix index compression: a repeated prefix value is not written again
WITHOUT PREFIX COMPRESSION
MongoDB-DBA,Bangalore,10
MongoDB-DBA,Chennai,5
MongoDB-DBA,Hyderabad,5
MySQL-DBA,Bangalore,7
WITH PREFIX COMPRESSION
MongoDB-DBA,Bangalore,10
,Chennai,5
,Hyderabad,5
MySQL-DBA,Bangalore,7
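The saving can be approximated by counting only the suffix bytes that differ from the previous key, using the slide's own index entries. This is a simplification of what WiredTiger actually does inside a page, kept only to illustrate the counting.

```python
# Rough sketch of prefix compression: store only the bytes of each key
# that differ from the shared prefix with the previous key.

entries = [
    "MongoDB-DBA,Bangalore,10",
    "MongoDB-DBA,Chennai,5",
    "MongoDB-DBA,Hyderabad,5",
    "MySQL-DBA,Bangalore,7",
]

def stored_bytes(keys):
    total, prev = 0, ""
    for key in keys:
        # Length of the common prefix with the previous key.
        common = 0
        while common < min(len(prev), len(key)) and prev[common] == key[common]:
            common += 1
        total += len(key) - common  # only the differing suffix is stored
        prev = key
    return total

raw = sum(len(k) for k in entries)
compressed = stored_bytes(entries)
print(raw, compressed)  # → 89 64
```

The repeated "MongoDB-DBA," prefix is written once, which is exactly the effect shown in the with/without comparison above.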
Linux Tuning
Swappiness: sysctl -w vm.swappiness=1
Dirty Ratio:
sysctl -w vm.dirty_ratio=15
sysctl -w vm.dirty_background_ratio=5
zone_reclaim_mode: sysctl -w vm.zone_reclaim_mode=0
Linux Tuning
# Edit the file
/etc/systemd/system/multi-user.target.wants/mongod.service
ExecStart=/usr/bin/mongod --config /etc/mongod.conf
To
ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf
systemctl daemon-reload
systemctl stop mongod
systemctl start mongod
NUMA
Linux Tuning
# Verifying
$ cat /sys/block/xvda/queue/scheduler
noop [deadline] cfq
# Adjusting the value dynamically
$ echo "noop" > /sys/block/xvda/queue/scheduler
$ vim /etc/sysconfig/grub
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200 elevator=noop"
$ grub2-mkconfig -o /boot/grub2/grub.cfg
IO Scheduler
Linux Tuning
$ echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
$ echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
$ vim /etc/sysconfig/grub
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200 elevator=noop
transparent_hugepage=never"
$ grub2-mkconfig -o /boot/grub2/grub.cfg
Transparent Huge Pages
Linux Tuning
$ vi /etc/systemd/system/multi-user.target.wants/mongod.service
# (file size)
LimitFSIZE=infinity
# (cpu time)
LimitCPU=infinity
# (virtual memory size)
LimitAS=infinity
# (locked-in-memory size)
LimitMEMLOCK=infinity
# (open files)
LimitNOFILE=64000
# (processes/threads)
LimitNPROC=64000
ulimit Settings
Linux Tuning
vim /etc/security/limits.conf
mongo hard cpu unlimited
mongo soft cpu unlimited
mongo hard memlock unlimited
mongo soft memlock unlimited
mongo hard nofile 64000
mongo soft nofile 64000
mongo hard nproc 192276
mongo soft nproc 192276
mongo hard fsize unlimited
mongo soft fsize unlimited
mongo hard as unlimited
mongo soft as unlimited
ulimit Settings
Linux Tuning
$ vi /etc/sysctl.conf
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_probes = 6
Network Stack
MONGODB PERFORMANCE ANALYSIS TOOL
MongoDB Explain
queryPlanner
executionStats
allPlansExecution
db.<collection name>.find({}).explain()
Important Parameter
queryPlanner.winningPlan.inputStage.stage
executionStats.nReturned
executionStats.totalKeysExamined
executionStats.totalDocsExamined
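The counters above are most useful when compared to each other; a healthy index scan keeps totalKeysExamined and totalDocsExamined close to nReturned. The sketch below uses a hand-written dict standing in for explain("executionStats") output, and the 10x cutoff in is_selective is an arbitrary illustrative threshold, not a MongoDB rule.

```python
# Hypothetical fragment of an explain("executionStats") result.
explain_stats = {
    "nReturned": 2,
    "totalKeysExamined": 2,
    "totalDocsExamined": 2,
}

def is_selective(stats):
    # Flag queries that examine far more documents than they return;
    # the 10x ratio here is an illustrative cutoff.
    n = max(stats["nReturned"], 1)
    return stats["totalDocsExamined"] / n <= 10

print(is_selective(explain_stats))  # → True
print(is_selective({"nReturned": 2, "totalKeysExamined": 5000,
                    "totalDocsExamined": 5000}))  # → False
```

A COLLSCAN in queryPlanner.winningPlan.inputStage.stage together with a high examined/returned ratio usually means a missing index.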
MongoDB Mtools
mtools is a collection of helper scripts to parse, filter, and visualize MongoDB log files. For every DBA it is a
Swiss-army-knife tool.
mlogfilter
mloginfo
mlaunch
mlogfilter
mlogfilter mongod.log --slow --json | mongoimport -d test -c mycoll
mlogfilter mongod.log --namespace admin.$cmd --slow 1000
mlogfilter mongod.log --operation <query, insert, update, delete, command, getmore>
mlogfilter mongod.log --pattern '{"_id": 1, "host": 1, "ns": 1}'
mlogfilter mongod.log --from FROM [FROM ...], --to TO [TO ...]
mlogfilter mongod.log --from Aug --to Sep
mloginfo
mloginfo mongod.log --queries
mloginfo mongod.log --restarts
mloginfo mongod.log --connections
mloginfo mongod.log --rsstate
Keyhole
Keyhole helps produce performance analytics summaries. The information includes MongoDB
configurations, cluster statistics, database schema, indexes, and index usage.
It also analyzes mongod logs and Full Time Diagnostic Data Capture (FTDC).
Cluster Info:
keyhole --allinfo "mongodb://user:secret@host.local/test?replicaSet=rs"
FTDC Data and Grafana Integration:
keyhole --web --diag /data/db/diagnostic.data
Logs Analytics:
keyhole --loginfo -v /var/log/mongodb/mongod.log.2018-06-07T11-08-32.gz
QUESTIONS ?
Thank You
Reference
https://medium.com/swlh/mongodb-indexes-deep-dive-understanding-indexes-9bcec6ed7aa6
https://www.mongodb.com/blog/post/performance-best-practices-hardware-and-os-configuration
https://www.slideshare.net/mongodb/mongodb-local-toronto-2019-tips-and-tricks-for-effective-indexing
https://www.mongodb.com/blog/post/performance-best-practices-indexing
https://github.com/rueckstiess/mtools
https://github.com/simagix/keyhole
https://www.youtube.com/watch?v=Mj2YM8t2G2w
https://mydbops.wordpress.com/category/mongodb/