MongoDB .local Houston 2019: MongoDB Atlas Data Lake Technical Deep Dive
- 3. State of Affairs
Businesses have a humongous amount of data
• IDC predicts that by 2025 global data will reach 175 zettabytes, and 49% of it will reside in the public cloud.
Cloud storage is cost-effective
Cloud storage is hard to operationalize
- 4. A New Service Offered by MongoDB Atlas
Access long-term data
Query long-term data
Analyze long-term data
- 5. Requirements
Look and act like MongoDB
Access customer’s data securely
Handle queries over vast amounts of data
Handle long-running queries
Efficient use of resources
- 7. Language
Must be able to communicate with our drivers
Written in Go
Implemented a TCP server
Used mongo-go-driver’s wireprotocol package
Used mongo-go-driver's bson package
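As a rough illustration of what "implemented a TCP server" that talks to drivers involves: every MongoDB wire protocol message starts with a 16-byte header of four little-endian int32 fields (messageLength, requestID, responseTo, opCode). The sketch below decodes that header in Python purely for illustration. The service described here is written in Go and uses mongo-go-driver's packages, so the function name `parse_header` and the dictionary layout are assumptions, not the service's actual code.

```python
import struct

# Standard MongoDB wire protocol header: four little-endian int32 fields.
HEADER = struct.Struct("<iiii")
OP_MSG = 2013  # opcode of the OP_MSG message type used by modern drivers


def parse_header(data: bytes) -> dict:
    """Decode the 16-byte message header at the start of a buffer.

    Hypothetical helper for illustration only.
    """
    if len(data) < HEADER.size:
        raise ValueError("need at least 16 bytes for a message header")
    length, request_id, response_to, op_code = HEADER.unpack_from(data)
    return {
        "message_length": length,   # total size, header included
        "request_id": request_id,   # chosen by the sender
        "response_to": response_to, # request_id being answered, 0 for requests
        "op_code": op_code,
    }


# Round-trip a hand-built header to show the layout.
raw = HEADER.pack(16, 1, 0, OP_MSG)
header = parse_header(raw)
```

A server would read these 16 bytes first, then read `message_length - 16` more bytes as the BSON-bearing body.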
- 8. Security
Must have the same security as MongoDB
Users configured in Atlas
Implemented MongoDB’s security model
Require the use of TLS + SNI (Server Name Indication)
- 15. Configuration (S3 Bucket): ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
- 17. Configuration: Data
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
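One way to read the `definition` templates above is as patterns matched against object paths, with `{year int}` capturing an integer field from the path. The following is a minimal Python sketch under that assumption. The helper names `compile_definition` and `match_path` are hypothetical, `*` is assumed to match a single path segment, and paths are assumed to be relative to the store (with the `/archive` prefix already stripped). It is not the service's actual matcher.

```python
import re


def compile_definition(template: str) -> re.Pattern:
    """Turn a template such as '/invoices/{year int}/*' into a regex.

    Assumed semantics: '{name int}' captures digits, '*' matches one segment.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+) int\}|\*", template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        if m.group(1):
            parts.append(f"(?P<{m.group(1)}>\\d+)")       # typed capture
        else:
            parts.append("[^/]+")                          # wildcard segment
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def match_path(template: str, path: str):
    """Return the captured fields as ints, or None if the path does not match."""
    m = compile_definition(template).match(path)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


by_dir = match_path("/invoices/{year int}/*", "/invoices/2019/1.parquet")
by_file = match_path("/invoices/{year int}.json.gz", "/invoices/2017.json.gz")
no_match = match_path("/invoices/{year int}/*", "/invoices/2017.json.gz")
```

Under these assumptions, a captured `year` lets a query such as `{ $match: { year: { $gt: 2000 } } }` skip files whose path already rules them out.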
- 18. Configuration: Data (Future)
history: {
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }, {
    store: "atlas",
    db: "customers",
    collection: "invoices"
  }]
}
- 23. Query Example: $limit
Map:
  { $match: { year: { $gt: 2000 } } }
  { $limit: 10 }
Reduce:
  { $limit: 10 }
Original pipeline:
  { $match: { year: { $gt: 2000 } } }
  { $limit: 10 }
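The split works because each partition needs to return at most 10 matching documents, and the merge step then keeps only 10 of those. A small Python simulation, with made-up partitions and hypothetical function names `map_stage` and `reduce_stage`, sketches the idea. It is an illustration of the technique, not the service's execution engine.

```python
from itertools import islice


def map_stage(partition, limit=10):
    """Per-partition work: $match on year > 2000, then $limit."""
    matched = (doc for doc in partition if doc["year"] > 2000)
    return list(islice(matched, limit))


def reduce_stage(partials, limit=10):
    """Merge the partial results and apply $limit once more."""
    merged = (doc for part in partials for doc in part)
    return list(islice(merged, limit))


# Illustrative data: two partitions, each with more than 10 matching documents.
partitions = [
    [{"year": y} for y in range(1995, 2020)],
    [{"year": y} for y in range(2005, 2020)],
]

partials = [map_stage(p) for p in partitions]
result = reduce_stage(partials)
```

Applying the limit in the map stage keeps each partition from shipping more documents than the final result could ever use.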
- 24. Query Example: $group
Map:
  { $group: { _id: "$year",
      totalAvg_sum: { $sum: "$amount" },
      totalAvg_count: { $sum: 1 }
  } }
Reduce:
  { $group: { _id: "$_id",
      totalAvg_sum: { $sum: "$totalAvg_sum" },
      totalAvg_count: { $sum: "$totalAvg_count" }
  } }
Finalize:
  { $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
Original pipeline:
  { $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
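An average cannot be merged by averaging per-partition averages, which is why the map stage emits a sum and a count instead. The Python sketch below simulates the three stages on made-up data. The function names and the sample documents are assumptions for illustration only.

```python
from collections import defaultdict


def _new_group():
    return {"totalAvg_sum": 0, "totalAvg_count": 0}


def map_stage(partition):
    """Per-partition $group: partial sum and count of amount per year."""
    acc = defaultdict(_new_group)
    for doc in partition:
        group = acc[doc["year"]]
        group["totalAvg_sum"] += doc["amount"]
        group["totalAvg_count"] += 1
    return [{"_id": year, **vals} for year, vals in acc.items()]


def reduce_stage(partials):
    """Merge partial rows: add up the sums and the counts per _id."""
    acc = defaultdict(_new_group)
    for part in partials:
        for row in part:
            group = acc[row["_id"]]
            group["totalAvg_sum"] += row["totalAvg_sum"]
            group["totalAvg_count"] += row["totalAvg_count"]
    return [{"_id": key, **vals} for key, vals in acc.items()]


def finalize_stage(rows):
    """$project: divide the merged sum by the merged count."""
    return [
        {"_id": r["_id"], "totalAvg": r["totalAvg_sum"] / r["totalAvg_count"]}
        for r in rows
    ]


# Illustrative data: 2019 appears in both partitions with different counts.
partitions = [
    [{"year": 2018, "amount": 10}, {"year": 2019, "amount": 30}],
    [{"year": 2019, "amount": 50}, {"year": 2019, "amount": 70}],
]

merged = reduce_stage([map_stage(p) for p in partitions])
averages = {row["_id"]: row["totalAvg"] for row in finalize_stage(merged)}
```

For 2019 the per-partition averages are 30 and 60, whose mean is 45, while the true average of 30, 50, and 70 is 50. Carrying the sum and count through the reduce stage gives the correct value.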