Charles Sarrazin, MongoDB
Raiders of the Anti-Patterns:
A journey towards fixing schema mistakes in MongoDB
@csarrazi
Charles Sarrazin
Principal Consulting Engineer, Professional Services, Paris, FR
Our Journey
§ Packing
§ Anti-Patterns
§ Fixing schema issues gracefully
§ Conclusion
Packing

Our backpack
§ Design Patterns
§ Monitoring tools
§ Log analysis
§ Additional tools
Design Patterns
Representation
§ Attribute
§ Schema Versioning
§ Document Versioning
§ Tree
§ Polymorphism
§ Pre-allocation
Access Frequency
§ Subset
§ Approximation
§ Extended Reference
Grouping
§ Computed
§ Bucket
§ Outlier
https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Data Modeling
Patterns
Use Cases
https://university.mongodb.com/courses/M320/about
Monitoring tools
For example
• Ops/Cloud Manager
• MongoDB Compass

Log Analysis
mtools
• mlogfilter
• mloginfo
• mplotqueries
https://github.com/rueckstiess/mtools
Additional tools
• Oplog analysis
• db.currentOp()
• Profiler
• db.collection.explain()
Anti-Patterns
Understanding your data model and identifying
mistakes
The Fauna
a.k.a « One Collection Fits All »
or « Schemaless »

The Fauna
Symptoms
§ Slow writes
§ High number of indexes (>20-25)
The Fauna
The Anti-Pattern
§ Access patterns are actually
different based on document type
§ Each document type depends on a
specific index
§ No common access patterns
The Actual Reason
§ While indexes improve reads, they
might negatively impact writes
§ You may only have up to 64
indexes in a single collection
§ If you don’t use Partial or Sparse
indexes, null or absent values will
still be indexed
The Fauna
Takeaways
§ Documents with different access patterns or business logic
should be stored in separate collections
§ You can temporarily rely on Partial Indexes to reduce index
size and performance impact
§ Spending just a little time on schema design is important
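The Partial Index takeaway above can be sketched in Python. This is a minimal illustration, not the speaker's code: the collection name `everything`, the `type` discriminator field, and both helper functions are hypothetical. The command document follows the shape of MongoDB's `createIndexes` command with a `partialFilterExpression`; the second helper only simulates, in memory, which documents would receive an index entry.

```python
def partial_index_command(collection, field, doc_type):
    """Build a createIndexes command whose index covers a single document type."""
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "name": f"{doc_type}_{field}_partial",
                "key": {field: 1},
                # Only documents matching this filter get an index entry.
                "partialFilterExpression": {"type": doc_type},
            }
        ],
    }


def indexed_documents(docs, command):
    """Simulate which documents the partial index would contain (equality filters only)."""
    filt = command["indexes"][0]["partialFilterExpression"]
    return [d for d in docs if all(d.get(k) == v for k, v in filt.items())]


# Hypothetical "one collection fits all" content.
docs = [
    {"type": "order", "order_id": 1},
    {"type": "customer", "email": "a@example.com"},
    {"type": "order", "order_id": 2},
]
cmd = partial_index_command("everything", "order_id", "order")
print(len(indexed_documents(docs, cmd)))  # 2 of the 3 documents are indexed
```

With a driver, a command document of this shape could be sent to the server as a database command. Splitting the document types into separate collections remains the longer-term fix the slide recommends.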
The Squashed
Database
a.k.a « Flat documents » or
« The RDBMS schema »

The Squashed Database
Symptoms
§ High IOPS (random reads/writes)
§ Low throughput
§ High yields and/or nReturned
§ High index size
The Squashed Database
The Anti-Pattern
§ Flat documents stored in separate
collections
§ Only using root-level fields and no
hierarchy
The Actual Reason
§ In order to parse a flat document,
MongoDB will read each field
sequentially
§ Normalization also means
redundant data (relations)
§ Data needs to be consolidated
using JOINs ($lookup)
The Squashed Database
Takeaways
§ Simply transposing your data model from an RDBMS to MongoDB
won’t help much when you need to scale
§ Consider grouping data from multiple tables in a single collection,
by embedding the relations (1:1, 1:n) when data volume is
reasonable
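As a rough illustration of the embedding takeaway above, the sketch below folds three flat, table-like lists into one document per customer. The table names, field names, and the `embed_relations` helper are all hypothetical; it works on in-memory lists rather than a live database.

```python
def embed_relations(customers, addresses, orders):
    """Fold 1:1 (address) and 1:n (orders) rows into one document per customer."""
    docs = {
        c["id"]: {"_id": c["id"], "name": c["name"], "address": None, "orders": []}
        for c in customers
    }
    for a in addresses:  # 1:1 relation becomes an embedded sub-document
        docs[a["customer_id"]]["address"] = {"street": a["street"], "city": a["city"]}
    for o in orders:  # 1:n relation becomes an embedded array
        docs[o["customer_id"]]["orders"].append({"order_id": o["id"], "total": o["total"]})
    return list(docs.values())


# Hypothetical rows, as they might come out of three relational tables.
customers = [{"id": 1, "name": "Ada"}]
addresses = [{"customer_id": 1, "street": "1 Main St", "city": "Paris"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25},
    {"id": 11, "customer_id": 1, "total": 40},
]

embedded = embed_relations(customers, addresses, orders)
print(embedded[0]["address"]["city"], len(embedded[0]["orders"]))  # Paris 2
```

A single read of the customer document now returns what previously needed a $lookup across collections. This only stays reasonable while the embedded array is small, as the slide notes.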
$project the Elephant
a.k.a. « Bloated documents » or
« The $project »

$project the Elephant
Symptoms
§ High read IOPS
§ High cache activity (bytes read into cache)
§ High number of yields when reading a single document
§ Slow indexed queries when reading a single document
§ Result length lower than document size
§ Generally, big document size (> 200 KB)
$project the Elephant
The Anti-Pattern
§ Using big documents (> 100 KB)
while only projecting a few fields
The Actual Reason
§ Documents are the base level
transfer unit from disk to memory
§ Even when using a single field, the
whole document is loaded from
disk to the WiredTiger cache
$project the Elephant
Takeaways
§ Use smaller documents with more
frequently accessed data
§ Store less frequently accessed data
in another collection
Also known as the Subset Pattern
https://www.mongodb.com/blog/post/building-with-patterns-the-subset-pattern
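One way to picture the Subset Pattern is the sketch below. It assumes a hypothetical product document with a large `reviews` array; the field names and the `split_subset` helper are illustrative only. The frequently read fields and the most recent reviews stay in a small document, and every review is also written to a separate collection.

```python
def split_subset(product, hot_fields, recent_reviews=2):
    """Split one bloated document into a small 'hot' document and 'cold' review rows."""
    hot = {"_id": product["_id"]}
    hot.update({k: product[k] for k in hot_fields if k in product})
    reviews = product.get("reviews", [])
    # Keep only the newest reviews next to the frequently accessed fields.
    hot["recent_reviews"] = reviews[-recent_reviews:]
    # Every review goes to its own document in a separate collection.
    cold = [{"product_id": product["_id"], **r} for r in reviews]
    return hot, cold


product = {
    "_id": "p1",
    "name": "Whip",
    "price": 30,
    "manual_pdf": "x" * 1000,  # rarely read, bulky field
    "reviews": [{"stars": s} for s in (5, 4, 3, 5)],
}
hot, cold = split_subset(product, hot_fields=["name", "price"])
print(sorted(hot), len(cold))  # ['_id', 'name', 'price', 'recent_reviews'] 4
```

The hot document is what most reads load into the cache; the bulky field and the full review history are fetched only when needed.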
The Single-Person
Bridge
a.k.a. « The Auto-Incrementing
Counter » or « SQL in
MongoDB »

The Single-Person Bridge
Symptoms
§ Some updates seem to take a long time
§ MongoDB logs show writeConflicts>0 for these updates
§ The application seems to perform write operations sequentially
The Single-Person Bridge
The Anti-Pattern
§ Simulating a SQL sequence by
using a counter document and
findAndModify
The Actual Reason
§ WiredTiger uses document-level
concurrency control: concurrent
updates to a single document
conflict and are applied one
after another
The Single-Person Bridge
Takeaways
§ Do not try to simulate sequences in MongoDB
§ Instead, rely on ObjectIds, UUIDs or GUIDs
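The takeaway above can be illustrated with Python's standard `uuid` module. In this sketch each writer generates its own identifier, so no shared counter document is involved. The `new_order` helper and the payload are hypothetical; an ObjectId generated by the driver would serve the same purpose.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def new_order(payload):
    """Create a document whose _id is generated locally, without a shared counter."""
    return {"_id": str(uuid.uuid4()), **payload}


# Eight concurrent writers build 1000 documents with no coordination between them.
with ThreadPoolExecutor(max_workers=8) as pool:
    orders = list(pool.map(new_order, ({"n": i} for i in range(1000))))

ids = {o["_id"] for o in orders}
print(len(ids) == len(orders))  # True: every identifier is distinct
```

Because no writer waits on a single counter document, the writes no longer queue behind one another.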
Sorted Monkeys
a.k.a. « Sorted Array Push »

Sorted Monkeys
Symptoms
§ Very high Oplog churn (Oplog GB/Hour)
§ Low Oplog window with default Oplog size
§ Oplog size is very high compared to data size to ensure proper
operations (target Oplog window > 3 days)
Sorted Monkeys
The Anti-Pattern
§ Using $push on big arrays (>20
entries) with:
§ The $sort modifier
§ The $slice modifier
The Actual Reason
§ Oplog entries must be idempotent,
so a $push using these modifiers is
recorded as a $set statement that
rewrites the full array.
Sorted Monkeys
Takeaways
§ Only rely on the $slice and $sort modifiers when manipulating
small arrays
§ You can rely on in-memory or application-level sorts for medium-
sized result sets
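The sketch below emulates the effect described above in plain Python. It takes the slide's explanation at face value: the sorted, sliced push is assumed to be replicated as a $set of the whole array. Both helpers are hypothetical, and JSON text length is only a crude stand-in for entry size.

```python
import json


def push_sort_slice(array, new_items, sort_key, keep_last):
    """Emulate $push with $each, an ascending $sort, and $slice: -keep_last."""
    merged = sorted(array + new_items, key=lambda item: item[sort_key])
    return merged[-keep_last:] if keep_last > 0 else []


def replicated_set_bytes(field, array):
    """Rough size of a {$set: {field: <full array>}} entry, measured as JSON text."""
    return len(json.dumps({"$set": {field: array}}))


scores = [{"score": s} for s in range(50)]
updated = push_sort_slice(scores, [{"score": 7}], "score", keep_last=50)

pushed_bytes = len(json.dumps({"score": 7}))
full_bytes = replicated_set_bytes("scores", updated)
print(len(updated), full_bytes > pushed_bytes)  # 50 True
```

One small element was pushed, yet the replicated entry carries all 50 elements. That gap is why the pattern inflates Oplog churn as arrays grow.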
The Tree in the House
a.k.a. « Push until the End »

The Tree in the House
Symptoms
§ Your application worked fine for some period of time
§ After a while, some updates fail with:
Resulting document after update is larger than 16777216
The Tree in the House
The Anti-Pattern
§ Using unbounded arrays for
storing data (e.g. Audit logs for
tracing document updates)
The Actual Reason
§ MongoDB documents are limited
to 16MB
§ Depending on relationship, you
might reach maximum document
size if not careful
The Tree in the House
Takeaways
§ For 1:n relationships, you need to
consider cardinality
§ Differentiate 1 to few (<10k array
elements) from 1 to zillions
§ Consider using the Subset, Outlier
or Bucket patterns
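A minimal sketch of the Bucket Pattern for the audit-log case mentioned above, assuming hypothetical field names (`doc_id`, `seq`, `count`, `events`) and a hypothetical `bucket_events` helper. Each bucket is a separate, bounded document instead of one array that grows forever.

```python
def bucket_events(doc_id, events, bucket_size):
    """Group an unbounded event list into bounded bucket documents."""
    buckets = []
    for start in range(0, len(events), bucket_size):
        chunk = events[start:start + bucket_size]
        buckets.append(
            {
                "doc_id": doc_id,              # the audited document
                "seq": start // bucket_size,   # bucket order
                "count": len(chunk),           # never exceeds bucket_size
                "events": chunk,
            }
        )
    return buckets


audit_log = [{"change": i} for i in range(25)]
buckets = bucket_events("invoice-42", audit_log, bucket_size=10)
print([b["count"] for b in buckets])  # [10, 10, 5]
```

Since every bucket holds at most `bucket_size` entries, no single document can drift towards the 16MB limit.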
Fixing schema
issues gracefully

Considerations
§ Availability
§ Can your business afford scheduled downtime?
§ Do you need to keep multiple versions of your app online?
§ Performance
§ How does the migration affect performance?
§ Rollback Strategy
§ How do we go back if we run into a problem?
§ Risk
§ What is the impact of a failed migration?
Migration Strategies
§ One-Time
§ Blue/Green
§ Y-Write
§ Read & Upgrade
One-Time
Principles [diagram]
Pros
§ Fastest migration path
§ Immediate economies of
scale
Cons
§ High risk
§ Requires tremendous
coordination
§ Complex parallel testing
§ Labor intensive
YOLO!
Blue/Green
Principles [diagram]
Pros
§ Always available
§ Easy rollback: change
router to point to
previous version
Cons
§ You need to be able to
sync the two DBs
§ Use Change Streams
§ You need double the
hardware or resources

Y-Write
Principles [diagram]
Pros
§ Always available
§ Easy rollback: stop
writing to new schema
§ Legacy applications can
still read from the old
schema
Cons
§ You need to be able to
sync the two DBs
§ Write logic needs to be
centralized and migrated
before read logic
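The Y-Write idea can be sketched as a single, centralized write path that feeds both schemas. The two dictionaries below stand in for the old and new collections. The `YWriter` class, the `write_new` switch, and the name fields are hypothetical.

```python
class YWriter:
    """Centralized write path that writes the old schema and, optionally, the new one."""

    def __init__(self, old_store, new_store, write_new=True):
        self.old_store = old_store
        self.new_store = new_store
        self.write_new = write_new  # set to False to roll back

    def save(self, user_id, first, last):
        # Legacy applications keep reading this flat representation.
        self.old_store[user_id] = {"_id": user_id, "name": f"{first} {last}"}
        if self.write_new:
            # The new schema stores the name as a sub-document.
            self.new_store[user_id] = {"_id": user_id, "name": {"first": first, "last": last}}


old, new = {}, {}
writer = YWriter(old, new)
writer.save(1, "Marion", "Ravenwood")

writer.write_new = False  # rollback: stop writing to the new schema
writer.save(2, "Henry", "Jones")
print(sorted(old), sorted(new))  # [1, 2] [1]
```

Rollback is a single switch because reads have not yet moved to the new schema. That is why the slide asks for write logic to be migrated before read logic.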
Read & Upgrade
Principles [diagram]
Pros
§ Always available
§ Good performance
Cons
§ You need to consider
schema backward and
forward compatibility
§ Schema upgrade is part
of the application logic
§ Requires a deprecation
roadmap to remove
legacy code
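Read & Upgrade can be sketched as a lazy migration keyed on a version field, in the spirit of the Schema Versioning pattern listed earlier. The `schema_version` field, the name layout, and both helpers are hypothetical; a dictionary stands in for the collection.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return the document in the current schema; documents without a version are v1."""
    if doc.get("schema_version", 1) == 1:
        first, _, last = doc["name"].partition(" ")
        return {**doc, "name": {"first": first, "last": last}, "schema_version": CURRENT_VERSION}
    return doc


def read_and_upgrade(store, doc_id):
    """Read a document and persist the upgraded form the first time it is touched."""
    doc = store[doc_id]
    upgraded = upgrade(doc)
    if upgraded is not doc:
        store[doc_id] = upgraded  # write back only when something changed
    return upgraded


store = {1: {"_id": 1, "name": "Ada Lovelace"}}
print(read_and_upgrade(store, 1)["name"])  # {'first': 'Ada', 'last': 'Lovelace'}
print(store[1]["schema_version"])          # 2
```

The `upgrade` function is the legacy code that the deprecation roadmap eventually removes, once no version 1 documents remain.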
Ensuring backward compatibility
Do
§ Insert data in existing collections
§ Add new field
§ Create a new collection/database
Don’t
§ Rename/Remove field
§ Remove data
§ Change field type or format
§ Remove/Rename
collection/database
Summary
                 Availability  Performance  Risk  Cost
One-Time         ✗✗            ✓            ✗✗    ✓✓
Blue/Green       ✓             ✗            ✓✓    ✗✗
Y-Write          ✓✓            ✓            ✓✓    ✓✓
Read & Upgrade   ✓             ✓✓           ✗     ✓

Conclusion
Key takeaways
Regularly reassess your hypotheses
§ Your access patterns will change over time
§ Check your actual access patterns
Key takeaways
MongoDB provides flexible migration options
§ You can combine both online and offline schema migrations
§ Consider your development lifecycle and your release schedule to
choose your migration strategy
§ Use $jsonSchema to handle schema validation or check migration
status
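As a hedged sketch of the $jsonSchema takeaway above, the helpers below build two plain dictionaries. One is a `collMod` command that attaches a validator. The other is a query filter that matches documents which do not yet satisfy the target schema, so counting its matches gives the migration status. The schema contents and helper names are hypothetical; the operators (`$jsonSchema`, `$nor`) and the `collMod` fields are standard MongoDB ones.

```python
def target_schema():
    """Hypothetical schema that migrated documents are expected to satisfy."""
    return {
        "bsonType": "object",
        "required": ["schema_version", "name"],
        "properties": {
            "schema_version": {"bsonType": "int", "minimum": 2},
            "name": {"bsonType": "object", "required": ["first", "last"]},
        },
    }


def validator_command(collection):
    """collMod command enabling validation; 'moderate' skips existing invalid documents."""
    return {
        "collMod": collection,
        "validator": {"$jsonSchema": target_schema()},
        "validationLevel": "moderate",
    }


def unmigrated_filter():
    """Query filter matching documents that do NOT satisfy the target schema."""
    return {"$nor": [{"$jsonSchema": target_schema()}]}


print(validator_command("users")["validationLevel"])  # moderate
print(list(unmigrated_filter()))                      # ['$nor']
```

Counting the documents that match `unmigrated_filter()` before and after a migration batch shows how much work remains.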
But more importantly…

…Take some time to think
about your data model!
Questions?
Thank you for taking our FREE
MongoDB classes at
university.mongodb.com
Register Now!
https://university.mongodb.com/courses/M320/about

MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Schema Mistakes in MongoDB

 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 

MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Schema Mistakes in MongoDB

  • 1. Charles Sarrazin, MongoDB Raiders of the Anti-Patterns: A journey towards fixing schema mistakes in MongoDB @csarrazi
  • 2. Charles Sarrazin Principal Consulting Engineer, Professional Services, Paris, FR
  • 3. Our Journey § Packing § Anti-Patterns § Fixing schema issues gracefully § Conclusion
  • 5. Our backpack § Design Patterns § Monitoring tools § Log analysis § Additional tools
  • 6. Design Patterns § Representation: Attribute, Schema Versioning, Document Versioning, Tree, Polymorphism, Pre-allocation § Access Frequency: Subset, Approximation, Extended Reference § Grouping: Computed, Bucket, Outlier https://www.mongodb.com/blog/post/building-with-patterns-a-summary
  • 8. Monitoring tools For example • Ops/Cloud Manager • MongoDB Compass
  • 9. Log Analysis mtools • mlogfilter • mloginfo • mplotqueries https://github.com/rueckstiess/mtools
  • 10. Additional tools • Oplog analysis • db.currentOp() • Profiler • db.collection.explain()
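The output of `db.collection.explain()` can also be inspected programmatically. The following is a minimal sketch, assuming an explain document in the classic `queryPlanner` / `executionStats` shape; the helper names `plan_stages` and `examined_per_returned` are hypothetical, and the sample document is hand-written for illustration rather than real server output.

```python
from typing import Any, Dict, List


def plan_stages(plan: Dict[str, Any]) -> List[str]:
    """Collect stage names from a winning plan, outermost stage first."""
    stages = [plan["stage"]]
    # A stage has either one child ("inputStage") or several ("inputStages").
    children = plan.get("inputStages", [])
    if "inputStage" in plan:
        children = [plan["inputStage"]]
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def examined_per_returned(stats: Dict[str, Any]) -> float:
    """Documents examined per document returned (lower is better)."""
    return stats["totalDocsExamined"] / max(stats["nReturned"], 1)


# Hand-written sample, not real server output.
sample_explain = {
    "queryPlanner": {
        "winningPlan": {"stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}
    },
    "executionStats": {"nReturned": 10, "totalDocsExamined": 10},
}

stages = plan_stages(sample_explain["queryPlanner"]["winningPlan"])
uses_collection_scan = "COLLSCAN" in stages
```

A `COLLSCAN` stage, or a ratio far above 1, would be a hint that the query is not served well by an index.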
  • 11. Anti-Patterns Understanding your data model and identifying mistakes
  • 12. The Fauna a.k.a « One Collection Fits All » or « Schemaless »
  • 13. The Fauna Symptoms § Slow writes § High number of indexes (>20-25)
  • 14. The Fauna The Anti-Pattern § Access patterns are actually different based on document type § Each document type depends on a specific index § No common access patterns The Actual Reason § While indexes improve reads, they might negatively impact writes § You may only have up to 64 indexes in a single collection § If you don’t use Partial or Sparse indexes, null or absent values will still be indexed
  • 15. The Fauna Takeaways § Documents with different access patterns or business logic should be stored in separate collections § You can temporarily rely on Partial Indexes to reduce index size and the performance impact § Spending just a little time on schema design is important
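The first takeaway can be sketched as a routing step that groups documents by type before they are written. This is only an illustration: the `type` field and the helper name `split_by_type` are assumptions, and plain lists stand in for collections.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def split_by_type(
    docs: Iterable[Dict[str, Any]], type_field: str = "type"
) -> Dict[str, List[Dict[str, Any]]]:
    """Group documents by type so each group can live in its own collection."""
    collections: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        collections[doc[type_field]].append(doc)
    return dict(collections)


# A "one collection fits all" sample with two unrelated document types.
mixed = [
    {"type": "user", "email": "a@example.com"},
    {"type": "order", "total": 42},
    {"type": "user", "email": "b@example.com"},
]
per_collection = split_by_type(mixed)
```

Each resulting group then only needs the indexes that match its own access pattern.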
  • 16. The Squashed Database a.k.a « Flat documents » or « The RDBMS schema »
  • 17. The Squashed Database Symptoms § High IOPS (random reads/writes) § Low throughput § High yields and/or nReturned § High index size
  • 18. The Squashed Database The Anti-Pattern § Flat documents stored in separate collections § Only using root-level fields and no hierarchy The Actual Reason § In order to parse a flat document, MongoDB will read each field sequentially § Normalization also means redundant data (relations) § Data needs to be consolidated using JOINs ($lookup)
  • 19. The Squashed Database Takeaways § Simply transposing your data model from an RDBMS to MongoDB won’t help much with scaling up § Consider grouping data from multiple tables in a single collection, by embedding the relations (1:1, 1:n) when data volume is reasonable
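Embedding a 1:n relation might look like the sketch below, which merges two flat, table-like lists into one document per customer. The `customers` / `orders` shapes and the `customer_id` foreign key are assumptions made for the example.

```python
from typing import Any, Dict, List


def embed_orders(
    customers: List[Dict[str, Any]], orders: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Turn two flat lists into customer documents with embedded orders."""
    by_customer: Dict[Any, List[Dict[str, Any]]] = {}
    for order in orders:
        # Drop the foreign key: the parent document now carries the relation.
        embedded = {k: v for k, v in order.items() if k != "customer_id"}
        by_customer.setdefault(order["customer_id"], []).append(embedded)
    return [{**c, "orders": by_customer.get(c["_id"], [])} for c in customers]


customers = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Linus"}]
orders = [{"customer_id": 1, "sku": "A-1"}, {"customer_id": 1, "sku": "B-2"}]
documents = embed_orders(customers, orders)
```

Reading a customer together with its orders then becomes a single document fetch instead of a `$lookup`.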
  • 20. $project the Elephant a.k.a. « Bloated documents » or « The $project »
  • 21. $project the Elephant Symptoms § High read IOPS § High cache activity (bytes read into cache) § High number of yields when reading a single document § Slow indexed queries when reading a single document § Result length lower than document size § Generally, big document size (200+ KB)
  • 22. $project the Elephant The Anti-Pattern § Using big documents (>100 KB) while only projecting a few fields The Actual Reason § Documents are the base-level transfer unit from disk to memory § Even when using a single field, the whole document is loaded from disk to the WiredTiger cache
  • 23. $project the Elephant Takeaways § Use smaller documents with more frequently accessed data § Store less frequently accessed data in another collection Also known as the Subset Pattern https://www.mongodb.com/blog/post/building-with-patterns-the-subset-pattern
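A rough sketch of the Subset Pattern: the frequently read fields stay in a small document, and everything else moves to a companion document that shares the same `_id`. The field names and the helper `split_subset` are hypothetical.

```python
from typing import Any, Dict, Set, Tuple


def split_subset(
    doc: Dict[str, Any], hot_fields: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a large document into a hot summary and a cold details document."""
    summary = {k: v for k, v in doc.items() if k == "_id" or k in hot_fields}
    # The details document keeps _id so the two halves can be matched later.
    details = {k: v for k, v in doc.items() if k == "_id" or k not in hot_fields}
    return summary, details


movie = {
    "_id": 7,
    "title": "Raiders",
    "year": 1981,
    "full_cast": ["actor-1", "actor-2"],
    "reviews": ["review-1", "review-2", "review-3"],
}
summary, details = split_subset(movie, {"title", "year"})
```

Only `summary` would be stored in the frequently queried collection, so a read loads far fewer bytes into the cache.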
  • 24. The Single-Person Bridge a.k.a. « The Auto-Incrementing Counter » or « SQL in MongoDB »
  • 25. The Single-Person Bridge Symptoms § Some updates seem to take a long time § MongoDB logs show writeConflicts>0 for these updates § The application seems to perform write operations sequentially
  • 26. The Single-Person Bridge The Anti-Pattern § Simulating a SQL sequence by using a counter document and findAndModify The Actual Reason § As WiredTiger uses a document-level lock, concurrent updates to a single document will block other writes to the same document
  • 27. The Single-Person Bridge Takeaways § Do not try to simulate sequences in MongoDB § Instead, rely on ObjectIDs, UUIDs or GUIDs
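Generating identifiers on the client removes the shared counter document entirely. A minimal sketch using Python's standard `uuid` module; storing the UUID as a string in `_id` is an assumption of this example, and `new_order` is a hypothetical helper.

```python
import uuid
from typing import Any, Dict


def new_order(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a document whose _id is generated locally, with no shared counter."""
    return {**payload, "_id": str(uuid.uuid4())}


first = new_order({"total": 10})
second = new_order({"total": 25})
```

Because no document is shared between writers, concurrent inserts do not contend with one another.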
  • 28. Sorted Monkeys a.k.a. « Sorted Array Push »
  • 29. Sorted Monkeys Symptoms § Very high Oplog churn (Oplog GB/Hour) § Low Oplog window with default Oplog size § Oplog size is very high compared to data size to ensure proper operations (target Oplog window > 3 days)
  • 30. Sorted Monkeys The Anti-Pattern § Using $push on big arrays (>20 entries) with: § The $sort modifier § The $slice modifier The Actual Reason § Oplog operations must be idempotent, so these operations are recorded as a $set statement that replaces the full array
  • 31. Sorted Monkeys Takeaways § Only rely on the $slice and $sort modifiers when manipulating small arrays § You can rely on in-memory or application-level sorts for medium-sized result sets
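An application-level sort can be as small as the sketch below, which picks the top entries at read time instead of keeping the array sorted and sliced on every write. The `score` field and the helper `top_scores` are assumptions for illustration.

```python
import heapq
from typing import Any, Dict, List


def top_scores(scores: List[Dict[str, Any]], limit: int = 3) -> List[Dict[str, Any]]:
    """Return the highest-scoring entries, best first, without sorting on write."""
    return heapq.nlargest(limit, scores, key=lambda entry: entry["score"])


scores = [
    {"player": "a", "score": 10},
    {"player": "b", "score": 50},
    {"player": "c", "score": 30},
    {"player": "d", "score": 40},
]
best_players = [entry["player"] for entry in top_scores(scores)]
```

Writes can then use a plain `$push`, which keeps the corresponding oplog entries small.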
  • 32. The Tree in the House a.k.a. « Push until the End »
  • 33. The Tree in the House Symptoms § Your application worked fine for some period of time § After a while, some updates fail with: Resulting document after update is larger than 16777216
  • 34. The Tree in the House The Anti-Pattern § Using unbounded arrays for storing data (e.g. Audit logs for tracing document updates) The Actual Reason § MongoDB documents are limited to 16MB § Depending on relationship, you might reach maximum document size if not careful
  • 35. The Tree in the House Takeaways § For 1:n relationships, you need to consider cardinality § Differentiate 1 to few (<10k array elements) from 1 to zillions § Consider using the Subset, Outlier or Bucket patterns
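One common way to apply the Bucket pattern is to cap the number of array elements per document and let an upsert start a new bucket when the current one is full. The sketch below only builds the filter and update documents; the field names (`device_id`, `events`, `count`) and the helper `bucket_append` are assumptions, and the pair is meant to be passed to a driver update with upsert enabled (for example PyMongo's `update_one(filter, update, upsert=True)`).

```python
from typing import Any, Dict, Tuple


def bucket_append(
    device_id: str, event: Dict[str, Any], max_events: int = 200
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build a filter/update pair that appends to a bucket with room left."""
    # Only buckets that still have room match; with upsert enabled, a full
    # bucket means no match, so a fresh bucket document is created instead.
    filter_doc = {"device_id": device_id, "count": {"$lt": max_events}}
    update_doc = {"$push": {"events": event}, "$inc": {"count": 1}}
    return filter_doc, update_doc


filter_doc, update_doc = bucket_append("sensor-1", {"t": 1, "v": 20.5}, max_events=2)
```

The array in any single document stays bounded by `max_events`, which keeps it well away from the 16 MB limit as long as individual events are small.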
  • 37. Considerations § Availability § Can your business afford scheduled downtime? § Do you need to keep multiple versions of your app online? § Performance § How does the migration affect performance? § Rollback Strategy § How do we go back if we run into a problem? § Risk § What is the impact of a failed migration?
  • 38. Migration Strategies § One-Time § Blue/Green § Y-Write § Read & Upgrade
  • 39. One-Time Principles Pros § Fastest migration path § Immediate economies of scale Cons § High risk § Requires tremendous coordination § Complex parallel testing § Labor intensive YOLO!
  • 40. Blue/Green Principles Pros § Always available § Easy rollback: change router to point to previous version Cons § You need to be able to sync the two DBs § Use ChangeStreams § You need double the hardware or resources
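Keeping the two databases in sync with change streams comes down to replaying each change event against the other side. A minimal sketch, assuming events with the `operationType`, `documentKey` and `fullDocument` fields and a stream opened so that updates carry the full document; a plain dict stands in for the green database, and `to_new_schema` is a hypothetical transformation.

```python
from typing import Any, Dict


def to_new_schema(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transformation: split a flat name into first/last."""
    first, _, last = doc["name"].partition(" ")
    return {**doc, "name": {"first": first, "last": last}}


def apply_change(target: Dict[Any, Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Replay one change event from the blue side onto the green side."""
    key = event["documentKey"]["_id"]
    operation = event["operationType"]
    if operation in ("insert", "replace", "update"):
        # For updates this assumes the event includes the full document.
        target[key] = to_new_schema(event["fullDocument"])
    elif operation == "delete":
        target.pop(key, None)


green: Dict[Any, Dict[str, Any]] = {}
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "Ada Lovelace"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "Ada King"}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "name": "Alan Turing"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
for event in events:
    apply_change(green, event)
```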
  • 41. Y-Write Principles Pros § Always available § Easy rollback: stop writing to new schema § Legacy applications can still read from the old schema Cons § You need to be able to sync the two DBs § Write logic needs to be centralized and migrated before read logic
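Centralizing the write logic for a Y-Write could be sketched as a single writer object that stores every change in both shapes. The class name `YWriter`, the two dict-backed stores and the name fields are all assumptions for the example.

```python
from typing import Any, Dict


class YWriter:
    """Single entry point that writes each change to both schemas."""

    def __init__(self, old_store: Dict[Any, dict], new_store: Dict[Any, dict]) -> None:
        self.old_store = old_store
        self.new_store = new_store

    def save_user(self, user_id: int, first: str, last: str) -> None:
        # Legacy readers keep seeing the flat shape...
        self.old_store[user_id] = {"_id": user_id, "name": f"{first} {last}"}
        # ...while migrated readers get the structured shape.
        self.new_store[user_id] = {"_id": user_id, "name": {"first": first, "last": last}}


old_store: Dict[Any, dict] = {}
new_store: Dict[Any, dict] = {}
writer = YWriter(old_store, new_store)
writer.save_user(1, "Ada", "Lovelace")
```

Rolling back then means dropping the second write, since the old store never stopped being current.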
  • 42. Read & Upgrade Principles Pros § Always available § Good performance Cons § You need to consider schema backward and forward compatibility § Schema upgrade is part of the application logic § Requires a deprecation roadmap to remove legacy code
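With Read & Upgrade, the application upgrades a document the first time it reads it. A minimal sketch, assuming a `schema_version` field (absent meaning version 1) and a hypothetical change from a flat `name` string to a `first`/`last` sub-document.

```python
from typing import Any, Dict

CURRENT_VERSION = 2


def upgrade(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return the document in the current schema, upgrading it if needed."""
    version = doc.get("schema_version", 1)
    if version == 1:
        first, _, last = doc["name"].partition(" ")
        doc = {
            **doc,
            "name": {"first": first, "last": last},
            "schema_version": CURRENT_VERSION,
        }
    return doc


legacy = {"_id": 1, "name": "Ada Lovelace"}
upgraded = upgrade(legacy)
```

The caller would write `upgraded` back after the read; calling `upgrade` on an already upgraded document returns it unchanged, so the step is safe to repeat.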
  • 43. Ensuring backward compatibility Do § Insert data in existing collections § Add new field § Create a new collection/database Don’t § Rename/Remove field § Remove data § Change field type or format § Remove/Rename collection/database
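The Do/Don't list can be turned into a simple check that compares two schema descriptions. This sketch assumes each schema is given as a mapping from field name to a type name; the helper `breaking_changes` is hypothetical and cannot tell a rename from a removal.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes that would break readers of the old schema."""
    problems = []
    for field, type_name in old.items():
        if field not in new:
            problems.append(f"removed or renamed: {field}")
        elif new[field] != type_name:
            problems.append(f"type changed: {field}")
    # Fields that only exist in `new` are additions and are considered safe.
    return problems


old_schema = {"email": "string", "age": "int"}
safe_schema = {"email": "string", "age": "int", "nickname": "string"}
risky_schema = {"email": "string", "age": "string"}
```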
  • 44. Summary (Availability / Performance / Risk / Cost) § One Time: ✗✗ / ✓ / ✗✗ / ✓✓ § Blue/Green: ✓ / ✗ / ✓✓ / ✗✗ § Y-Write: ✓✓ / ✓ / ✓✓ / ✓✓ § Read & Upgrade: ✓ / ✓✓ / ✗ / ✓
  • 46. Key takeaways Regularly reassess your hypotheses § Your access patterns will change over time § Check your actual access patterns
  • 47. Key takeaways MongoDB provides flexible migration options § You can combine both online and offline schema migrations § Consider your development lifecycle and your release schedule to choose your migration strategy § Use $jsonSchema to handle schema validation or check migration status
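As a sketch of the last point, the same `$jsonSchema` document can serve both as a collection validator and, wrapped in `$nor`, as a query filter that finds documents not yet migrated. The schema itself (a structured `name` and a `schema_version` of at least 2) is an assumption made for the example.

```python
# Hypothetical target schema for migrated documents.
schema_v2 = {
    "bsonType": "object",
    "required": ["name", "schema_version"],
    "properties": {
        "name": {"bsonType": "object", "required": ["first", "last"]},
        "schema_version": {"bsonType": ["int", "long"], "minimum": 2},
    },
}

# Usable as a collection validator.
validator = {"$jsonSchema": schema_v2}

# Usable as a query filter: matches documents that do NOT satisfy the schema,
# which gives a rough measure of how much of the migration is left.
not_migrated_filter = {"$nor": [validator]}
```

Counting the documents that match `not_migrated_filter` over time shows whether the migration is converging.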
  • 49. …Take some time to think about your data model!
  • 51. Thank you for taking our FREE MongoDB classes at university.mongodb.com