Sig Narváez
Principal Solution Architect, SOCAL (@SigNarvaez)
Migrate Anything* to MongoDB Atlas
#MDBlocal
Agenda
Why MongoDB? Why Atlas?
Prep Items
Which Migration Path? (Options)
Post steps
Migrating Other Data Stores
Q&A ⇒ db.SigNarvaez.find({}).explain()
Why MongoDB? Why Atlas?
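Why MongoDB? A: Next-Gen Multi-Model Data Platform
MongoDB is the most powerful data management platform in the market today
Flexible Multi-Structured Schema is designed to adapt to changes
● Document: Rich JSON Data Structures, Flexible Schema
● Relational: Left-Outer Join, Views, Schema Validation
● Key/Value: Horizontal Scale, In-Memory
● GeoSpatial: GeoJSON, 2D & 2DSphere
● Search: Text Search, Multiple Languages, Faceted Search
● Graph: Graph & Hierarchical Recursive Lookups
● Binaries: Files & Metadata, Encrypted
● Mobile Apps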
MongoDB Atlas Data Platform
Migration Prep
Self-Managed MongoDB to Fully Managed MongoDB Atlas
Prep Items
Prep Items: Atlas Cluster Sizing
What is the current cluster hardware like?
● RAM
● Disk (size & speed)
● CPUs
What is the workload like?
● Reads / sec?
● Writes / sec?
● Docs / sec?
● Peak connections?
How to measure:
● APM: Datadog, New Relic, etc.
● Command line: mongostat, mongotop, iostat, top, free, vmstat, etc.
● MongoDB Shell: db.serverStatus().connections
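One way to answer the workload questions with numbers is to take two `db.serverStatus()` snapshots a known interval apart and diff the cumulative `opcounters`. The sketch below assumes that approach (for example in the shell: `const a = db.serverStatus(); sleep(60000); const b = db.serverStatus();`). The helper names `toNum` and `opsPerSecond` are hypothetical. `opcounters` counts operations, not documents, and aggregations are counted under `command`, so treat the read/write split as rough. Docs/sec still comes from mongostat or `metrics.document`.

```javascript
// Sketch: turn two db.serverStatus() snapshots into per-second workload rates.
// opcounters are cumulative since mongod started, so rate = delta / interval.
function toNum(v) {
  // The shell/driver may return 64-bit Long values; those expose toNumber().
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

function opsPerSecond(before, after, intervalSec) {
  const fields = ["insert", "query", "update", "delete", "getmore", "command"];
  const rates = {};
  for (const f of fields) {
    rates[f] = (toNum(after.opcounters[f]) - toNum(before.opcounters[f])) / intervalSec;
  }
  // Rough split for sizing conversations.
  rates.reads = rates.query + rates.getmore;
  rates.writes = rates.insert + rates.update + rates.delete;
  // connections.current is a point-in-time gauge, so keep the larger sample.
  rates.peakConnectionsSeen = Math.max(
    toNum(before.connections.current),
    toNum(after.connections.current)
  );
  return rates;
}

// Made-up snapshots taken 60 seconds apart.
const before = {
  opcounters: { insert: 1000, query: 5000, update: 800, delete: 10, getmore: 200, command: 9000 },
  connections: { current: 120 },
};
const after = {
  opcounters: { insert: 7000, query: 35000, update: 3200, delete: 70, getmore: 1400, command: 15000 },
  connections: { current: 150 },
};
console.log(opsPerSecond(before, after, 60));
```

Sample during peak hours, not just once. A single quiet interval will undersize the cluster.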
Prep Items: Atlas Cluster Sizing
On-prem or cloud reserved instances are most likely overprovisioned: let Atlas auto-scale figure it out!
● Match the current hardware
● Run performance tests for hours / days
● Upscale: CPU or RAM > 75% (1 hr)
● Downscale: CPU and RAM < 50% (72 hrs)
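The upscale/downscale thresholds can be read as a rule over utilization samples. This hypothetical sketch makes two assumptions: the function name, and a sample format of `{ tMin, cpu, ram }`. It treats "for 1 hr" as every sample in the trailing window breaching the threshold, with the samples covering the whole window. Atlas's own evaluation may average or otherwise differ. The thresholds are the ones on the slide (Feb 2020).

```javascript
// Sketch of the slide's scaling rules. Each sample: { tMin: minutes since start, cpu: %, ram: % }.
function scalingDecision(samples, opts = {}) {
  const { upPct = 75, upWindowMin = 60, downPct = 50, downWindowMin = 72 * 60 } = opts;
  if (samples.length === 0) return "hold";
  const latest = samples[samples.length - 1].tMin;

  // True only if the samples span the whole trailing window and all of them satisfy pred.
  const sustained = (windowMin, pred) => {
    const inWindow = samples.filter((s) => s.tMin >= latest - windowMin);
    const covers = samples[0].tMin <= latest - windowMin;
    return covers && inWindow.length > 0 && inWindow.every(pred);
  };

  // Upscale: CPU *or* RAM above the threshold for the whole upscale window.
  if (sustained(upWindowMin, (s) => s.cpu > upPct || s.ram > upPct)) return "upscale";
  // Downscale: CPU *and* RAM below the threshold for the whole downscale window.
  if (sustained(downWindowMin, (s) => s.cpu < downPct && s.ram < downPct)) return "downscale";
  return "hold";
}
```

The asymmetry is deliberate: scaling up reacts within an hour, while scaling down waits three days so a quiet weekend does not shrink the cluster right before Monday's peak.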
Prep Items: Expert Atlas Cluster Sizing
#Shards by Storage = Total Storage ÷ Max Storage Per Shard
#Shards by RAM = Total RAM ÷ Max RAM Per Shard
#Shards by Cores = Total Cores ÷ Max Cores Per Shard
#Shards by IOPS = Total IOPS ÷ Max IOPS Per Shard
#Shards by Network Bandwidth = Peak Gbps ÷ Gbps Capacity Per Shard
#Shards by Disk Bandwidth = Peak Mbps ÷ Mbps Capacity Per Shard
Complete MongoDB Atlas Sizing Talk from MDBW19:
Work with your local MongoDB Solution Architect
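The "#Shards by X" formulas above each give a lower bound. A common reading is that the cluster needs enough shards for its most demanding dimension, so you round each up and take the maximum. The sketch below is hypothetical: the function name and all input numbers are made up, and real per-shard limits depend on the Atlas tier you choose.

```javascript
// Sketch: apply each "#Shards by X" formula, round up, and keep the largest.
function shardsNeeded(totals, perShard) {
  const byDimension = {};
  for (const dim of Object.keys(totals)) {
    if (!(dim in perShard)) throw new Error(`missing per-shard capacity for ${dim}`);
    byDimension[dim] = Math.ceil(totals[dim] / perShard[dim]);
  }
  // At least one shard, and enough for the most demanding dimension.
  const shards = Math.max(1, ...Object.values(byDimension));
  return { shards, byDimension };
}

// Hypothetical workload totals vs. hypothetical per-shard capacity of one tier.
const result = shardsNeeded(
  { storageGB: 9000, ramGB: 900, cores: 96, iops: 40000, networkGbps: 12, diskMBps: 1500 },
  { storageGB: 4000, ramGB: 256, cores: 32, iops: 16000, networkGbps: 10, diskMBps: 1000 }
);
console.log(result);
```

Keeping `byDimension` in the result shows which resource drives the shard count. Here it is RAM at 4 shards, so a tier with more RAM per shard may be cheaper than adding shards.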
Prep Items: Version, Driver & Retries
● Ensure your current driver is 3.6+ compatible
● As of Feb 2020, Atlas is 3.6+
● You can still migrate from 2.6+!!
Fault resiliency:
● 3.6: Retryable Writes
● 4.2: Retryable Reads
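Retryable writes (3.6+) and retryable reads (4.2+) are controlled by the `retryWrites` and `retryReads` connection string options. Recent drivers may already default both to true, but setting them explicitly documents the intent. This sketch shows a hypothetical helper, `withRetryOptions`, that adds them only if they are absent. `URLSearchParams` may percent-encode some existing option values, and drivers decode them.

```javascript
// Sketch: opt a connection string in to retryable writes/reads without
// overriding values that are already present.
function withRetryOptions(uri) {
  const qIndex = uri.indexOf("?");
  const base = qIndex === -1 ? uri : uri.slice(0, qIndex);
  const params = new URLSearchParams(qIndex === -1 ? "" : uri.slice(qIndex + 1));
  if (!params.has("retryWrites")) params.set("retryWrites", "true");
  if (!params.has("retryReads")) params.set("retryReads", "true");
  // Options must follow a "/" after the host list, even when no database is named.
  const hasPath = /^mongodb(\+srv)?:\/\/[^/]+\//.test(base);
  return `${hasPath ? base : base + "/"}?${params.toString()}`;
}

console.log(withRetryOptions("mongodb://h1:27017,h2:27017"));
```

Retries cover a single transient failure, such as a primary election during the cutover failover. They complement, rather than replace, the application's own error handling.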
Prep Items: Connectivity
● IP Whitelist | VPC Peer | Private Endpoint
● Create Users & Permissions
● Use SRV connection strings (3.6+)
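An SRV connection string replaces the seed list with one DNS name. The driver looks up the SRV record `_mongodb._tcp.<host>` to find the members, reads a TXT record for options such as `replicaSet` and `authSource`, and enables TLS by default. The sketch below illustrates that resolution step. The helper names and the example hostnames are hypothetical. `dns.promises.resolveSrv` is the Node.js API that returns `{ name, port, priority, weight }` records.

```javascript
const dns = require("dns");

// Build the DNS name a driver queries for a mongodb+srv:// URI.
function srvRecordName(srvUri) {
  const m = /^mongodb\+srv:\/\/(?:[^@/]*@)?([^/?]+)/.exec(srvUri);
  if (!m) throw new Error("not a mongodb+srv:// URI");
  return `_mongodb._tcp.${m[1]}`;
}

// Convert SRV answers into the equivalent seed-list host string (sorted for stable output).
function seedListFromSrv(records) {
  return records
    .map((r) => `${r.name}:${r.port}`)
    .sort()
    .join(",");
}

// Real lookup; needs network access and a real Atlas hostname, so it is not called here.
async function resolveSeedList(srvUri) {
  const records = await dns.promises.resolveSrv(srvRecordName(srvUri));
  return seedListFromSrv(records);
}
```

Because membership lives in DNS, the application's connection string does not change when nodes are replaced, which helps during and after the migration.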
Prep Items: Test Basic Ops
mgeneratejs '{
  "_id": "$objectid",
  "dateTime": "$date",
  "createdAt": "$date",
  "Action": "$string",
  "severityLevel": "$integer",
  "source": "$string",
  "display": "$string",
  "deviceServerIp": "$ip",
  "details": {
    "ipAddress": "$ip",
    "macAddress": "$string",
    "userId": "SYSTEM",
    "method": "method"
  }}' --jsonArray -n 1000000 |
  mongoimport --jsonArray --port 27017 --upsert -d atlas -c iot
Test, Test, Test
● Simulate Production Traffic
● Your own test suite
● POCDriver
● mgeneratejs
Prep Items: Increase OpLog on Source Cluster
● Initial sync scans every document and replicates it to the target cluster
● The source oplog must be large enough to contain the entire initial sync oplog window, in order to replicate data changes that occurred during initial sync
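The oplog rule reduces to simple arithmetic: the oplog window must be longer than the initial sync, with headroom. The sketch below assumes you have measured two figures yourself: peak oplog churn in GB/hour (for example, by watching how `rs.printReplicationInfo()` changes) and the initial sync duration from a dry run. The helper names and the 1.5× headroom are hypothetical. `replSetResizeOplog` takes a size in megabytes, does not accept values below 990 MB, and must be run on each member.

```javascript
// Required oplog size (MB) so the window outlasts the initial sync, with headroom.
function requiredOplogMB(churnGBPerHour, initialSyncHours, headroom = 1.5) {
  const mb = Math.ceil(churnGBPerHour * 1024 * initialSyncHours * headroom);
  return Math.max(990, mb); // replSetResizeOplog minimum
}

// Current window (hours) for a given oplog size and churn.
function oplogWindowHours(oplogSizeMB, churnGBPerHour) {
  return oplogSizeMB / (churnGBPerHour * 1024);
}

// Command document to run on each source member, e.g. db.adminCommand(resizeCommand(30720)).
function resizeCommand(sizeMB) {
  return { replSetResizeOplog: 1, size: sizeMB };
}

const needed = requiredOplogMB(2, 10); // 2 GB/hr churn, 10 hr initial sync
console.log(needed, resizeCommand(needed));
```

If the current window from `oplogWindowHours` is already comfortably above the measured sync time, no resize is needed.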
Prep Items: Upscale Target Cluster
● Recommend upscaling by 1+ tier higher
● Consider higher IOPS too; increasing disk size is a lower-cost alternative to provisioned IOPS
● Turn off Auto-Scale
● Force Failover before migration
Migration Options
Comparing Options
Live Migrate
● RS or sharded; built-in cutover
● Great for most customers
● Built-in Atlas UI
● Must temporarily allow network access (hop)
mongomirror
● RS only (sharded: via Professional Services)
● Can avoid the network hop
● Works with network peering
● User-controlled cutover
dump/restore or import
● All deployments
● Downtime proportional to data size
● Sharded -> RS
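The comparison can be turned into a first-pass suggestion. The helper below is hypothetical. It encodes only the trade-offs listed above (network hop, topology, tolerance for downtime). It is a conversation starter, not a substitute for reviewing your environment with your Solution Architect.

```javascript
// Hypothetical decision helper encoding the comparison of migration options.
function suggestMigrationPath({ sourceSharded, canOpenNetworkToAtlas, downtimeAcceptable }) {
  if (canOpenNetworkToAtlas) {
    return "Live Migrate"; // RS or sharded, built-in cutover from the Atlas UI
  }
  if (!sourceSharded) {
    return "mongomirror"; // RS only; avoids the hop, works with peering, user-controlled cutover
  }
  if (downtimeAcceptable) {
    return "mongodump/mongorestore"; // any deployment; downtime grows with data size
  }
  return "mongomirror via Professional Services"; // sharded source, no hop, minimal downtime
}
```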
Behind the scenes
1. Initial sync: copying documents and building indexes that already exist on the source deployment.
2. Oplog sync: tailing and applying entries from the oplog (delta).
○ “CDC”: continues replicating as live data is changing
○ Resumable from here
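To make the oplog-sync phase concrete, here is a toy model. It applies oplog-style entries in order and remembers the last applied timestamp, so a restart can replay from anywhere without double-applying. This is a simplification, not how Live Migrate or mongomirror are implemented. Timestamps are plain numbers here, and only inserts, `$set` updates and deletes are handled. Real oplog entries also include replacement updates, newer diff-format updates and commands.

```javascript
// Toy oplog applier: entries look like { ts, op: "i"|"u"|"d", o, o2 } (simplified).
function applyOplog(target, entries, state = { lastTs: 0 }) {
  for (const e of entries) {
    if (e.ts <= state.lastTs) continue; // already applied: replays are idempotent
    const id = e.op === "u" ? e.o2._id : e.o._id;
    if (e.op === "i") {
      target.set(id, { ...e.o });
    } else if (e.op === "u") {
      const doc = target.get(id) || { _id: id };
      Object.assign(doc, e.o.$set || {});
      target.set(id, doc);
    } else if (e.op === "d") {
      target.delete(id);
    }
    state.lastTs = e.ts; // the resume point after a restart
  }
  return state;
}
```

Persisting `lastTs` is what makes the process "resumable from here". The source oplog must still contain that timestamp when you resume, which is why its size matters.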
Migration Dry Run
Prod ⇒ Staging/QA Atlas Cluster
Dry run validates:
● Connectivity & Security
● Time to perform initial sync
● Restarting app(s) with the new connection
Run initial sync at least 2 times:
1) Build the staging site with initial sync but w/o cutover
   a) Measure the time
2) Repeat w/ cutover
   a) Let Live Migrate (LM) / mongomirror (MM) reach 0s replication lag
   b) Restart apps pointing to the new cluster
   c) Test, Test, Test
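Step 2a can be made explicit as a cutover gate: restart the apps only after replication lag has stayed at or under a target for several consecutive checks, not on the first zero reading. This is a hypothetical sketch. How you read the lag (for example, the Live Migrate status in the Atlas UI or the migration tool's output) is up to you, and the sample values are made up.

```javascript
// Cutover gate: true only if the last `consecutive` lag readings are all <= maxLagSec.
function readyForCutover(lagSamplesSec, { maxLagSec = 0, consecutive = 3 } = {}) {
  if (lagSamplesSec.length < consecutive) return false;
  return lagSamplesSec.slice(-consecutive).every((lag) => lag <= maxLagSec);
}

console.log(readyForCutover([12, 4, 0, 0, 0]));
```

Requiring several readings guards against a momentary zero during a write lull that is immediately followed by a burst.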
Migration Execution
New Prod
Live Migration
Post Migration
Monitor the deployment
Re-size oplog or instance size accordingly (72 hours recommended)
Update IP Whitelisting, if applicable
Set up backups, alerts, and other security settings
Extra Resources
Other Data Stores
Safe Harbor Statement
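This presentation contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933,
as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are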
subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of
certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that
could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities
and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events.
Furthermore, such forward-looking statements speak only as of the date of this presentation.
In particular, the development, release, and timing of any features or functionality described for MongoDB products
remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it
should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver
any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking
statements to reflect events or circumstances after the date of such statements.
All Other Data Stores … 350+!!!
Let’s choose a few
● MongoDB “compatible”: AWS DocumentDB, Azure CosmosDB
● Key-value stores: AWS DynamoDB
● Relational DBMS
AWS DocumentDB
● Compatible with MongoDB 3.6
● Use the same MongoDB Drivers/SDKs, Tools and Applications with Amazon DocumentDB
● Automatic Patching, Failover and Recovery
● Integrated with AWS services (CloudWatch, etc.)
● Functional Differences: see the Amazon DocumentDB documentation on functional differences from MongoDB
AWS DocumentDB Feature Gap vs. MongoDB
Fails > 60%* of MongoDB correctness tests
• Extensive testing, debugging & refactoring required to migrate to DocumentDB
Lags mainline features by 5 years
• No retryable reads + writes
• No transactions
• No support for storage or index compression
• Missing many aggregation stages that allow expressive data handling
• No lossless decimal type
• No search or geospatial queries
• Indexes are not copied over via the utilities (mongodump and mongorestore)
• No materialized views
MongoDB’s most important value is developer productivity. These limitations can significantly reduce that value.
* 60% for 3.6, 64% for 4.2
AWS DocumentDB Feature Gap vs. MongoDB
Not based on the MongoDB server: it emulates the MongoDB API and does not provide complete functionality.
Yet developers are directed to use the official MongoDB drivers, documentation and MongoDB University to learn how to connect and develop.
What is this experience like? ...
Possible Migration Options
Method: Considerations
● Offline (mongodump / mongorestore): does not dump the admin database; recreate user(s) (DocumentDB does not provide RBAC)
● Online (build-your-own): does not support Kinesis Streams, Data Pipeline, etc.; Change Streams (limited) could be used (likely very fragile)
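If you try the change-streams route, the core of it is translating each change event from the source into a write against Atlas. The sketch below maps events to operations in the Node.js driver's `bulkWrite` format. It assumes the stream was opened with `{ fullDocument: "updateLookup" }` where supported. DocumentDB's change streams are limited, so verify which event types and fields you actually receive. The function name is hypothetical.

```javascript
// Translate a change stream event into a bulkWrite operation for the target collection.
function changeEventToWriteOp(event) {
  // Collection-level events (drop, rename, invalidate, ...) carry no documentKey.
  if (!event.documentKey) return null;
  const filter = { _id: event.documentKey._id };
  switch (event.operationType) {
    case "insert":
    case "replace":
      // Upserting the full document keeps replays after a restart idempotent.
      return { replaceOne: { filter, replacement: event.fullDocument, upsert: true } };
    case "update": {
      if (event.fullDocument) {
        return { replaceOne: { filter, replacement: event.fullDocument, upsert: true } };
      }
      const { updatedFields = {}, removedFields = [] } = event.updateDescription || {};
      const update = {};
      if (Object.keys(updatedFields).length) update.$set = updatedFields;
      if (removedFields.length) update.$unset = Object.fromEntries(removedFields.map((f) => [f, ""]));
      return Object.keys(update).length ? { updateOne: { filter, update } } : null;
    }
    case "delete":
      return { deleteOne: { filter } };
    default:
      return null; // anything else needs manual handling
  }
}
```

In a real loop you would batch the non-null results into `targetCollection.bulkWrite(ops, { ordered: true })`. You would also persist each event's `_id` (the resume token) so `watch(pipeline, { resumeAfter: token })` can continue after a crash. That bookkeeping is where the fragility mentioned above usually shows up.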
mongodump succeeds against DocumentDB, but mongomirror fails to initialize its source connection:
[ec2-user@ip-172-31-1-79 dump]$ mongodump --host --username snarvaez --ssl --sslCAFile /home/ec2-user/rds-combined-ca-bundle.pem
2020-02-24T05:01:23.523+0000 writing SigsTest.coll to
2020-02-24T05:01:23.525+0000 done dumping SigsTest.coll (1 document)
[ec2-user@ip-172-31-1-79 bin]$ ./mongomirror --host rs0/ --username snarvaez --ssl --sslCAFile /home/ec2-user/rds-combined-ca-bundle.pem --destination Cluster0-shard-0/cluster0-shard-00-00-,,cluster0-shard-00-02- --destinationUsername snarvaez
mongomirror version: 0.9.1
git version: 0bc45282784aa74bc25c336412efca7f84749aa4
Go version: go1.12.13
os: linux
arch: amd64
compiler: gc
2020-02-24T05:02:56.564+0000 Error initializing mongomirror: could not initialize source connection: could not connect to server: server selection error: server selection timeout
current topology: Type: Single
Servers:
Addr:, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection([-121]) connection is closed
Azure CosmosDB
Advertised Strengths
1. Globally Distributed
2. Linearly Scalable
3. Schema-Agnostic Indexing
4. Multi-Model
5. Multi-API and Multi-Language Support
6. Multi-Consistency Support
7. Indexes Data Automatically
8. High Availability
9. Guaranteed Low Latency
10. Multi-Master Support
Azure CosmosDB Feature Gap vs. MongoDB
Also not based on the MongoDB server: it emulates the MongoDB API
Large feature gaps vs. mainline
● No multi-document ACID transactions, materialized views, retryable writes, lossless decimals, text search, schema validation, etc.
● 3.2 and 3.6 modes; 3.2 clusters cannot be upgraded to 3.6 at this time (Feb 2020)
● Numerous incompatibilities: many operations work differently and are not documented, leaving developers to figure them out
Scalability needs handling, plus rapid cost escalations
● Request Units (RUs) determine scalability; developers need error handling for when the provisioned RUs are exceeded
Azure only: lock-in
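Handling exceeded RUs in practice means retrying throttled requests with a delay. This sketch makes an assumption you should check against the errors your account actually returns: that throttling surfaces as error code 16500, optionally with a `RetryAfterMs=<n>` hint in the message. The names `withThrottleRetry` and `retryAfterMs` are hypothetical.

```javascript
// Assumed throttling error code for the API for MongoDB; verify for your account.
const THROTTLED = 16500;

function retryAfterMs(err, attempt, baseMs) {
  const m = /RetryAfterMs=(\d+)/.exec(String(err && err.message));
  // Prefer the server's hint; otherwise back off exponentially.
  return m ? Number(m[1]) : baseMs * 2 ** attempt;
}

async function withThrottleRetry(operation, { maxRetries = 5, baseMs = 100, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Non-throttling errors, or too many attempts, go straight back to the caller.
      if (err.code !== THROTTLED || attempt >= maxRetries) throw err;
      await wait(retryAfterMs(err, attempt, baseMs));
    }
  }
}
```

Usage would look like `await withThrottleRetry(() => coll.insertOne(doc))`. Every call site needs this wrapper, which is the "scalability needs handling" cost the slide refers to.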
Possible migration options
Method: Considerations
● Offline (mongodump / mongorestore): not an option; backups cannot be restored to another target
● Offline (via Azure Data Factory or the Azure DocumentDB Data Migration Tool)
● ETL: export to JSON / mongoimport
● Online (build-your-own): via Change Feed, similar to using Change Streams + Azure Functions to write to Atlas
AWS DynamoDB
DynamoDB is a wide-column key/value store. Each entry is called an Item and consists of Attributes.
Widely used in the AWS ecosystem ⇒ AWS only
Migration may be required due to:
● Increased / unpredictable cost
● Functionality insufficient for business or dev productivity: the app has outgrown the data store
● etc.
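Moving Items to documents is mostly a matter of unwrapping DynamoDB's typed attribute values (`{ "S": ... }`, `{ "N": ... }`, `{ "M": ... }`, `{ "L": ... }` and so on). This sketch does roughly what `unmarshall()` in `@aws-sdk/util-dynamodb` does, minus edge cases. Two design choices in it are assumptions worth revisiting. First, numbers become JS numbers; consider Decimal128 for exact values. Second, the partition/sort key pair becomes a compound `_id`; the key names `pk`/`sk` in the usage are hypothetical. Binary values are assumed to be base64 strings, as in DynamoDB JSON exports.

```javascript
// Convert one typed attribute value to a plain JS value.
function fromAttr(av) {
  const [type, v] = Object.entries(av)[0];
  switch (type) {
    case "S": return v;
    case "N": return Number(v); // lossy for very large/precise numbers
    case "BOOL": return v;
    case "NULL": return null;
    case "B": return Buffer.from(v, "base64");
    case "SS": return [...v]; // sets become arrays
    case "NS": return v.map(Number);
    case "BS": return v.map((b) => Buffer.from(b, "base64"));
    case "L": return v.map(fromAttr);
    case "M": return fromItem(v);
    default: throw new Error(`unsupported attribute type ${type}`);
  }
}

function fromItem(item) {
  const doc = {};
  for (const [name, av] of Object.entries(item)) doc[name] = fromAttr(av);
  return doc;
}

// Use the partition key (plus sort key, if any) as the document _id.
function toMongoDoc(item, partitionKey, sortKey) {
  const doc = fromItem(item);
  doc._id = sortKey
    ? { [partitionKey]: doc[partitionKey], [sortKey]: doc[sortKey] }
    : doc[partitionKey];
  return doc;
}
```

Usage: `toMongoDoc(item, "pk", "sk")` for a table with a composite key. After loading, it is often worth revisiting single-table designs, since items that share a partition key may belong in one embedded document.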
Possible migration options
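Method: Considerations
● Offline: export the table (e.g., with AWS Data Pipeline), then mongoimport
● Online (build-your-own): CUD operations via the MongoDB driver (e.g., driven by DynamoDB Streams + AWS Lambda)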
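For CUD operations via the MongoDB driver, one common feed is DynamoDB Streams consumed by AWS Lambda. This sketch maps stream records (`INSERT`/`MODIFY`/`REMOVE`) to `bulkWrite` operations. `unmarshall` is injected, so you can pass the one from `@aws-sdk/util-dynamodb` or your own converter. Keying `_id` on the partition key alone is a hypothetical choice. `NewImage` is only present when the stream view type includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`).

```javascript
// Map one DynamoDB Streams record to a MongoDB bulkWrite operation.
function streamRecordToWriteOp(record, unmarshall, partitionKey) {
  const { Keys, NewImage } = record.dynamodb;
  const filter = { _id: unmarshall(Keys)[partitionKey] };
  switch (record.eventName) {
    case "INSERT":
    case "MODIFY":
      // Full-image upsert: replaying the same record twice gives the same result.
      return { replaceOne: { filter, replacement: { ...unmarshall(NewImage), ...filter }, upsert: true } };
    case "REMOVE":
      return { deleteOne: { filter } };
    default:
      return null;
  }
}

// Shape of a Lambda-style batch handler; `collection` is a MongoDB driver collection
// created outside the handler (hypothetical wiring).
async function handleStreamBatch(event, collection, unmarshall, partitionKey) {
  const ops = event.Records
    .map((r) => streamRecordToWriteOp(r, unmarshall, partitionKey))
    .filter(Boolean);
  if (ops.length) await collection.bulkWrite(ops, { ordered: true });
}
```

`ordered: true` preserves the per-item order that the stream delivers within a shard. It trades some throughput for correctness during catch-up.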
RDBMS
Why?
• Modernization
• On-Prem to Cloud
• Monolith to MicroServices
• Oracle exit strategy
Who?
• Cisco migrated its $4B eCommerce platform
Possible migration options
Method: Tools & Patterns
● ETL & CDC
● Strangler Pattern
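For relational sources, the heart of the ETL step is usually denormalizing one-to-many rows into embedded arrays, so a single document holds what used to need a join. This is a hypothetical sketch: the table names (`orders`, `order_items`), column names and the helper `embedChildren` are made up. A CDC feed would re-run the same transform for each changed parent.

```javascript
// Fold child rows into their parent row as an embedded array.
function embedChildren(parents, children, { parentKey, foreignKey, as }) {
  const byParent = new Map();
  for (const child of children) {
    const key = child[foreignKey];
    if (!byParent.has(key)) byParent.set(key, []);
    const { [foreignKey]: _fk, ...rest } = child; // drop the now-redundant FK column
    byParent.get(key).push(rest);
  }
  return parents.map((p) => {
    const { [parentKey]: id, ...fields } = p;
    return { _id: id, ...fields, [as]: byParent.get(id) || [] };
  });
}

// Hypothetical rows from orders and order_items.
const orders = [{ order_id: 1, customer: "ACME" }, { order_id: 2, customer: "Initech" }];
const orderItems = [{ order_id: 1, sku: "A", qty: 2 }, { order_id: 1, sku: "B", qty: 1 }];
console.log(embedChildren(orders, orderItems, { parentKey: "order_id", foreignKey: "order_id", as: "items" }));
```

Embedding suits children that are read with their parent and bounded in number. Unbounded or independently queried children are usually better kept in their own collection.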
Q & A