Atlas Data Lake
Technical Deep-Dive
Craig Wilson, Senior Staff Engineer, MongoDB
State of Affairs
Businesses have a humongous amount of data
• IDC predicts that by 2025 global data will reach 175 zettabytes, and 49% of it will reside in the public cloud.
Cloud storage is cost-effective
Cloud storage is hard to operationalize
A New Service Offered by MongoDB Atlas
Access long-term data
Query long-term data
Analyze long-term data
Look and act like MongoDB
Access customer’s data securely
Handle queries over vast amounts of data
Handle long-running queries
Efficient use of resources

Emulating MongoDB
Must be able to communicate with our drivers
Written in Go
Implemented a TCP server
Used mongo-go-driver’s wireprotocol package
Used mongo-go-driver's bson package
Must have the same security as MongoDB
Users configured in Atlas
Implemented MongoDB’s security model
Require the use of TLS + SNI (Server Name Indication)
Must behave like MongoDB
Implemented commands for a read-only server
Used the server’s aggregation engine
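
To make "must be able to communicate with our drivers" concrete: every MongoDB wire protocol message starts with a 16-byte header of four little-endian int32 fields (messageLength, requestID, responseTo, opCode), and current drivers send commands as OP_MSG (opCode 2013). The sketch below parses that header in Python purely for illustration. The service itself is written in Go on top of mongo-go-driver's packages, and the names `MsgHeader` and `parse_header` are hypothetical, not part of any driver.

```python
import struct
from typing import NamedTuple

OP_MSG = 2013  # opcode used by current drivers to send commands

class MsgHeader(NamedTuple):
    """The standard 16-byte header that precedes every wire protocol message."""
    message_length: int  # total message size in bytes, header included
    request_id: int      # identifier chosen by the sender
    response_to: int     # request_id being answered (0 for a request)
    op_code: int         # message type, e.g. OP_MSG

# Four little-endian signed 32-bit integers.
_HEADER = struct.Struct("<iiii")

def parse_header(data: bytes) -> MsgHeader:
    """Parse the message header from the first 16 bytes read off the socket."""
    if len(data) < _HEADER.size:
        raise ValueError("need at least 16 bytes for a message header")
    return MsgHeader(*_HEADER.unpack_from(data))

# Build a header-only message and parse it back.
raw = struct.pack("<iiii", 16, 7, 0, OP_MSG)
header = parse_header(raw)
print(header.request_id, header.op_code == OP_MSG)
```

A TCP server would read these 16 bytes first, then read `message_length - 16` more bytes to obtain the message body before decoding it as BSON.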

Customer’s Data
Security: Customers
Customers have complete control
Provide us with an IAM Role
Configure your buckets
Configure your users in Atlas
Security: Atlas
Atlas controls access to your data
Storage of IAM Role
Temporary Credentials
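
One way to picture "Storage of IAM Role" plus "Temporary Credentials" is a small cache that stores only the role identifier and asks for fresh short-lived credentials shortly before the current ones expire. This is a sketch under assumptions: the slides do not say how Atlas obtains or refreshes credentials, `fetch` is a hypothetical stand-in for whatever call exchanges a role for credentials (for example, assuming the role through the cloud provider's token service), and the role ARN below is a made-up placeholder.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TempCredentials:
    access_key: str
    secret_key: str
    session_token: str
    expires_at: float  # Unix timestamp after which the credentials are invalid

class CredentialCache:
    """Caches temporary credentials per role and refreshes them before expiry."""

    def __init__(self,
                 fetch: Callable[[str], TempCredentials],
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # hypothetical: exchanges a role for credentials
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._cache: Dict[str, TempCredentials] = {}

    def get(self, role_arn: str) -> TempCredentials:
        cached = self._cache.get(role_arn)
        if cached is None or cached.expires_at - self._clock() <= self._margin:
            cached = self._fetch(role_arn)
            self._cache[role_arn] = cached
        return cached

# Demo with a fake fetcher that issues 15-minute credentials.
def demo_fetch(role_arn: str) -> TempCredentials:
    return TempCredentials("AK-EXAMPLE", "secret", "token", time.time() + 900)

demo_cache = CredentialCache(demo_fetch)
creds = demo_cache.get("arn:aws:iam::123456789012:role/example")
print(creds is demo_cache.get("arn:aws:iam::123456789012:role/example"))
```

The point of the design is that long-lived secrets never need to be stored: only the role reference persists, and anything that can read a bucket expires on its own.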
Customers control their data layout
Databases, Collections
Stores, DataSources

Configuration: File Formats
• BSON (gzipped)
• JSON (gzipped)
• Avro (gzipped)
• CSV/TSV (gzipped)
• Parquet
Configuration (S3 Bucket): ent-archive
- archive
  - customers
    - a-m.json
    - n-z.json
  - invoices
    - 2019
      - 1.parquet
      - 2.parquet
    - 2018
      - 1.parquet
    - 2017.json.gz
    - 2016.json.gz
Configuration: Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-east-1",
  prefix: "/archive/"
}
Configuration: Data
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
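
The `definition` strings above read like path templates: a `{name type}` segment captures part of the object key as a typed field, and `*` acts as a wildcard. The following Python sketch shows one way such a template could be matched against object keys. It is an illustration, not the service's matcher: the assumption that `*` matches any run of characters, the set of supported types, and the helper names `compile_definition` and `match_path` are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional, Pattern, Tuple

# Matches template fields such as "{year int}".
_FIELD = re.compile(r"\{(\w+) (\w+)\}")

# Assumed type table: regex for the captured text and a converter.
_TYPES: Dict[str, Tuple[str, Callable[[str], object]]] = {
    "int": (r"-?\d+", int),
    "string": (r"[^/]+", str),
}

def _literal(text: str) -> str:
    """Escape literal text, treating '*' as 'any run of characters' (assumption)."""
    return ".*".join(re.escape(part) for part in text.split("*"))

def compile_definition(definition: str) -> Tuple[Pattern[str], Dict[str, Callable]]:
    """Turn a definition template into an anchored regex plus field converters."""
    pattern, converters, pos = "", {}, 0
    for m in _FIELD.finditer(definition):
        name, type_name = m.group(1), m.group(2)
        regex, convert = _TYPES[type_name]
        pattern += _literal(definition[pos:m.start()]) + f"(?P<{name}>{regex})"
        converters[name] = convert
        pos = m.end()
    pattern += _literal(definition[pos:])
    return re.compile("^" + pattern + "$"), converters

def match_path(definition: str, path: str) -> Optional[dict]:
    """Return the typed fields captured from path, or None if it does not match."""
    regex, converters = compile_definition(definition)
    m = regex.match(path)
    if m is None:
        return None
    return {name: converters[name](value) for name, value in m.groupdict().items()}

print(match_path("/invoices/{year int}/*", "/invoices/2019/1.parquet"))
print(match_path("/invoices/{year int}.json.gz", "/invoices/2017.json.gz"))
print(match_path("/invoices/{year int}/*", "/invoices/2017.json.gz"))
```

Under these assumptions the two invoice definitions are complementary: the first matches files inside a year folder, the second matches the per-year `.json.gz` archives, and each yields an integer `year` taken from the path.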

Configuration: Data (Future)
history: {
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }, {
    store: "atlas",
    db: "customers",
    collection: "invoices"
  }]
}
MQL → Distributed MQL
Load Balancer

Query Example: $limit
Original pipeline:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Distributed, on each node:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Distributed, on merge:
{ $limit: 10 }
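
The $limit example can be read as a rewrite in which the filter and the limit are pushed down to every worker and the limit is then applied once more to the combined output. Here is a minimal Python sketch of that idea, assuming each worker holds one partition of the data; `partial_stage` and `distributed_match_limit` are hypothetical names, and plain Python lists stand in for files in a bucket.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, int]

def partial_stage(docs: Iterable[Doc],
                  predicate: Callable[[Doc], bool],
                  limit: int) -> List[Doc]:
    """Runs on each worker: filter, then stop after `limit` matching documents."""
    return list(islice((d for d in docs if predicate(d)), limit))

def distributed_match_limit(partitions: Iterable[Iterable[Doc]],
                            predicate: Callable[[Doc], bool],
                            limit: int) -> List[Doc]:
    """Runs on the merging side: concatenate partial results and limit again."""
    partials = (partial_stage(p, predicate, limit) for p in partitions)
    merged = (doc for part in partials for doc in part)
    return list(islice(merged, limit))

# Three partitions of 30 documents each, with years 1990 through 2019.
partitions = [[{"year": y} for y in range(1990, 2020)] for _ in range(3)]
result = distributed_match_limit(partitions, lambda d: d["year"] > 2000, 10)
print(len(result), all(d["year"] > 2000 for d in result))
```

No worker ever needs to return more than 10 documents, because the final answer cannot use more than 10 from any single partition. Without a sort stage, any 10 matching documents are a valid answer, which is what makes this pushdown safe.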
Query Example: $group
Original pipeline:
{ $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
Distributed, on each node:
{ $group: { _id: "$year",
    totalAvg_sum: { $sum: "$amount" },
    totalAvg_count: { $sum: 1 }
} }
Distributed, on merge:
{ $group: { _id: "$_id",
    totalAvg_sum: { $sum: "$totalAvg_sum" },
    totalAvg_count: { $sum: "$totalAvg_count" }
} }
{ $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
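
The $group example splits $avg into a sum and a count because averages cannot be merged directly: an average of per-node averages is wrong whenever nodes see different numbers of documents. The Python sketch below mirrors the three stages (partial $group, merging $group, final $project) on in-memory lists. The function names and the sample data are hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def partial_group(docs: Iterable[dict]) -> List[dict]:
    """Runs on each node: per-year sum and count instead of an average."""
    acc = defaultdict(lambda: {"totalAvg_sum": 0, "totalAvg_count": 0})
    for doc in docs:
        group = acc[doc["year"]]
        group["totalAvg_sum"] += doc["amount"]
        group["totalAvg_count"] += 1
    return [{"_id": year, **values} for year, values in acc.items()]

def merge_groups(partials: Iterable[List[dict]]) -> Dict[int, float]:
    """Runs on merge: add up partial sums and counts, then divide once."""
    acc = defaultdict(lambda: {"totalAvg_sum": 0, "totalAvg_count": 0})
    for part in partials:
        for row in part:
            group = acc[row["_id"]]
            group["totalAvg_sum"] += row["totalAvg_sum"]
            group["totalAvg_count"] += row["totalAvg_count"]
    return {year: g["totalAvg_sum"] / g["totalAvg_count"] for year, g in acc.items()}

node_a = [{"year": 2018, "amount": 10}, {"year": 2018, "amount": 20}]
node_b = [{"year": 2018, "amount": 60}]

averages = merge_groups([partial_group(node_a), partial_group(node_b)])
print(averages[2018])  # (10 + 20 + 60) / 3 = 30.0
```

Averaging the two per-node averages here would give (15 + 60) / 2 = 37.5 rather than the correct 30.0, which is why the rewritten pipeline carries `totalAvg_sum` and `totalAvg_count` separately until the final division.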

More supported MongoDB operators.
Geo operators
Full Text Search
File Formats
• Bzip2
• Snappy
• Zstd
Microsoft Azure
Google Cloud

Lots to do

