Compare – DynamoDB vs. MongoDB
Higher Ed
2
Requirements
 Unstructured data storage
 ACID compliance not necessary
 Fast read/write
 Ability to index data and search
 Full text search (?)
 Java/Spring support
 JavaScript support
 REST API
 Community support
 Scaling up and maintenance
3
Sharding – when a database grows large
 Horizontal partitioning of a database, where rows are distributed across separate database servers
 Contrast with normalization or vertical partitioning, where a table is split by columns
 Advantages
• Reduces index size in each table in each database (performance +)
• Load can be spread out over multiple machines (performance ++)
 Disadvantages
• Increased reliance on interconnected servers
• Query latency when more than one shard is searched
• Issues with consistency and durability
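The trade-off above can be sketched in a few lines. This is a minimal illustration, assuming a simple hash-modulo placement scheme; `NUM_SHARDS`, `hashKey`, `shardFor`, `put`, `get` and `findAll` are hypothetical names, and neither DynamoDB nor MongoDB is claimed to place data exactly this way.

```javascript
// Hypothetical sketch: route each row to one of several shards by hashing its key.
const NUM_SHARDS = 3; // assumed cluster size, for illustration only

// Simple deterministic string hash (not cryptographic).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value in unsigned 32 bits
  }
  return h;
}

function shardFor(key) {
  return hashKey(key) % NUM_SHARDS;
}

// Each shard is an in-memory Map standing in for a separate database server.
const shards = Array.from({ length: NUM_SHARDS }, () => new Map());

function put(key, row) {
  shards[shardFor(key)].set(key, row);
}

// A lookup by key touches exactly one shard.
function get(key) {
  return shards[shardFor(key)].get(key);
}

// A search that cannot use the key has to visit every shard,
// which is the extra query latency listed under Disadvantages.
function findAll(predicate) {
  return shards.flatMap((s) => [...s.values()].filter(predicate));
}

put("student:1", { name: "Ada", dept: "Math" });
put("student:2", { name: "Lin", dept: "Physics" });
put("student:3", { name: "Sam", dept: "Math" });

console.log(get("student:2").name);
console.log(findAll((r) => r.dept === "Math").length);
```

Key lookups stay cheap because the hash picks the shard directly, while `findAll` shows why cross-shard searches cost more.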
4
DynamoDB Internals
 Key/Value Pair
• Uses JSON only as a transport protocol
• Data is not stored on disk in the JSON data format
• Applications that use DynamoDB must either implement their own JSON parsing or use a library such as one of the AWS SDKs to do the parsing for them
 Data Types
• Scalar – string, number and binary (arbitrary bytes, comparable to a BLOB)
• Multivalued – string set, number set and binary set
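To make the "JSON as transport" point concrete, here is a rough sketch of how plain values map onto the typed JSON that DynamoDB's low-level API exchanges, using the type descriptors S, N, B, SS, NS and BS. The helper names `toAttributeValue` and `marshalItem` are hypothetical; in practice an AWS SDK would do this conversion, and only the types listed on the slide are covered.

```javascript
// Sketch: convert plain JavaScript values into DynamoDB's typed JSON wire format.
// Only the types named on the slide are handled (S, N, B, SS, NS, BS).
function toAttributeValue(value) {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers travel as strings
  if (Buffer.isBuffer(value)) return { B: value.toString("base64") }; // binary is base64 text
  if (value instanceof Set) {
    const items = [...value];
    if (items.length === 0) throw new Error("sets must not be empty");
    if (items.every((v) => typeof v === "string")) return { SS: items };
    if (items.every((v) => typeof v === "number")) return { NS: items.map(String) };
    if (items.every((v) => Buffer.isBuffer(v))) {
      return { BS: items.map((b) => b.toString("base64")) };
    }
  }
  throw new Error("unsupported type in this sketch");
}

// Convert a whole item (attribute name -> plain value) into wire format.
function marshalItem(item) {
  return Object.fromEntries(
    Object.entries(item).map(([name, v]) => [name, toAttributeValue(v)])
  );
}

const wire = marshalItem({
  id: "course-101",
  credits: 3,
  tags: new Set(["math", "core"]),
});
console.log(JSON.stringify(wire));
// prints {"id":{"S":"course-101"},"credits":{"N":"3"},"tags":{"SS":["math","core"]}}
```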
5
DynamoDB Internals
 Data Model
• Table – no fixed schema (columns, data types, etc.)
 Needs a fixed primary key with its data type, plus secondary indexes if necessary
 Default limit of 256 tables per region per account
• Items – individual records in a table
 Limited to 400 KB each
• Attributes – the name/value pairs that make up an item
• Supports one-to-one, one-to-many and many-to-many relationships
6
DynamoDB Internals
 Keys – must be defined at table creation time
• Primary keys – hash only, or hash and range
• Local secondary indexes – each query is confined to a single partition
 Limit – 5 indexes per table / 20 attributes max
• Global secondary indexes – can query across all partitions
 Limit – 5 indexes per table
 When creating a secondary index, you define the alternate key for the index, along with any other attributes that you want projected into the index. DynamoDB copies these attributes into the index, along with the primary key attributes from the table
 Add, update and delete actions on the table are automatically reflected in the index
7
DynamoDB Internals
 Throughput
• One read capacity unit covers one strongly consistent read per second of an item up to 4 KB
• One write capacity unit covers one write per second of an item up to 1 KB
• Reading a 5 KB item therefore requires 2 read capacity units (5 KB rounded up to the next 4 KB step)
• These units are specified when the table is created
• AWS can send alerts when these limits are exceeded
• AWS also throttles further requests beyond the defined capacity
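The capacity arithmetic above can be written out as a short sketch. The function names `readUnits`, `writeUnits` and `provisionedReads` are hypothetical, and the calculation assumes strongly consistent reads, as on the slide.

```javascript
const KB = 1024;

// Read capacity units consumed by one strongly consistent read of an item
// of `bytes` size: item size is rounded up to the next 4 KB step.
function readUnits(bytes) {
  return Math.max(1, Math.ceil(bytes / (4 * KB)));
}

// Write capacity units consumed by one write: rounded up to the next 1 KB step.
function writeUnits(bytes) {
  return Math.max(1, Math.ceil(bytes / KB));
}

// Units to provision for a target rate of reads per second.
function provisionedReads(bytes, readsPerSecond) {
  return readUnits(bytes) * readsPerSecond;
}

console.log(readUnits(5 * KB));             // 2, matching the slide's example
console.log(writeUnits(5 * KB));            // 5
console.log(provisionedReads(5 * KB, 100)); // 200
```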
8
DynamoDB Operations
 Table level – create, update, delete, list, describe
 Item/attribute level – add, update, delete
 Query – queries a table by hash key and, optionally, range key. Each result set is limited to 1 MB
 Scan – reads all items from a table. Slower than a query
 Parallel scan is also available to speed things up
 Supports pagination
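Because each Query or Scan response is capped in size, callers usually loop until the service stops returning a continuation key. The sketch below shows that loop against a fake client. `makeFakeClient` and `queryAll` are hypothetical helpers, while `LastEvaluatedKey` and `ExclusiveStartKey` follow the field names used by the DynamoDB API. The fake client is synchronous to keep the example short; real SDK calls are asynchronous.

```javascript
// Hypothetical stand-in for a DynamoDB client. It returns at most `pageSize`
// items per call and sets LastEvaluatedKey while more items remain.
function makeFakeClient(allItems, pageSize) {
  let calls = 0;
  return {
    get calls() { return calls; },
    query({ ExclusiveStartKey } = {}) {
      calls += 1;
      const start = ExclusiveStartKey ? ExclusiveStartKey.index + 1 : 0;
      const Items = allItems.slice(start, start + pageSize);
      const last = start + Items.length - 1;
      return last < allItems.length - 1
        ? { Items, LastEvaluatedKey: { index: last } }
        : { Items };
    },
  };
}

// Keep requesting pages until a response carries no LastEvaluatedKey.
function queryAll(client, params = {}) {
  const items = [];
  let startKey; // undefined on the first request
  do {
    const page = client.query({ ...params, ExclusiveStartKey: startKey });
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}

const client = makeFakeClient(["a", "b", "c", "d", "e"], 2);
const all = queryAll(client);
console.log(all.length, client.calls); // 5 3
```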
9
DynamoDB Features
 Fully managed NoSQL database service – handles scaling, partitioning, upgrades
 Durable – automatically replicates to different availability zones
 Scalable – automatically distributes data across multiple servers as size grows
 Fast – single-digit millisecond latency from an EC2 instance for an item size of 1 KB
• 5 ms for read, 10 ms for write
 Simple Administration – Amazon Web Console
 Fault Tolerant – automatically replicates data
 Flexible – each item in a table can have different number of attributes
 Indexing – every item is indexed by its primary key. Global and local secondary indexes allow users to query non-primary-key attributes
 Secure – authentication, up-to-date cryptographic techniques, and integration with IAM (AWS Identity and Access Management)
10
DynamoDB Features
 Could be cost-effective – for items up to 1 KB, $0.01/hour for every 10 writes/sec
• $0.01/hour for every 50 strongly consistent reads/sec
• $0.28 per million writes
• $0.056 per million strongly consistent reads
• $1.00 per GB/month for indexed storage
 SDK – AWS SDK for Java, .NET, PHP, etc.
• Supports all table operations, queries and scans
 Service-oriented architecture – REST support – simple API, only 12 operations
• Data is transferred through simple HTTP requests
 Large items can be stored in S3 buckets, thereby reducing cost
 Monitoring – AWS Management Console, CloudWatch, command-line tools
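As a rough worked example of the throughput and storage prices quoted on this slide, the sketch below estimates a monthly bill. The prices are copied from the slide and may be out of date. Billing in whole blocks of 10 writes or 50 reads, the 730-hour month, and the name `monthlyCostCents` are all assumptions made for illustration.

```javascript
// Prices copied from the slide (possibly out of date); amounts are in US cents
// so that the arithmetic stays in whole numbers.
const CENTS_PER_HOUR_PER_10_WRITES = 1;
const CENTS_PER_HOUR_PER_50_READS = 1;
const CENTS_PER_GB_MONTH = 100;
const HOURS_PER_MONTH = 730; // assumption: average month length

function monthlyCostCents({ writesPerSec, readsPerSec, storageGb }) {
  // Assumption: capacity is billed in whole blocks, rounded up.
  const writeBlocks = Math.ceil(writesPerSec / 10);
  const readBlocks = Math.ceil(readsPerSec / 50);
  const throughput =
    (writeBlocks * CENTS_PER_HOUR_PER_10_WRITES +
      readBlocks * CENTS_PER_HOUR_PER_50_READS) * HOURS_PER_MONTH;
  return throughput + storageGb * CENTS_PER_GB_MONTH;
}

// 100 writes/sec and 500 strongly consistent reads/sec of 1 KB items, 20 GB stored.
const cents = monthlyCostCents({ writesPerSec: 100, readsPerSec: 500, storageGb: 20 });
console.log((cents / 100).toFixed(2)); // 166.00
```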
11
DynamoDB Features
 Can be integrated with Redshift – a data warehousing service
 DynamoDB Local – a small client-side database and server that mimics the DynamoDB service. Available as a .jar file
12
MongoDB Internals (name derived from "humongous")
 Document-oriented database
• Data is stored in BSON format (Binary JSON)
• Supports up to 100 levels of nesting
 Data Types – BSON
• String, Integer, Boolean, Double, Arrays, Date*, Timestamp, Binary*, Null
• Min/Max keys – compare lower and higher, respectively, than all other BSON elements
• Object – embedded documents
• ObjectId* – stores the document's ID
• Regular Expression*
• JavaScript code*
• Symbol – reserved for languages that use a specific symbol type
* Indicates non-JSON types
13
MongoDB Internals (name derived from "humongous")
 Data Model
• Collections – groups of documents that share a similar structure
• Document – similar to a row in an RDBMS
 Maximum BSON document size is 16 MB
• Field – similar to a column in an RDBMS
14
MongoDB Query
 Query
• Key/value – key can be any field in the document, including the primary key
• Range – greater than, less than, or equal to; between
• Geospatial – proximity criteria, intersection and inclusion
• Text search – results can be ranked by relevance
• Aggregation – count, min, max, average, etc.
• Map-reduce
 Covered queries – queries that can be answered entirely from indexed fields
 Query optimization – MongoDB performs automatic optimization
 When necessary, a query can draw on more than one index through index intersection
15
MongoDB Index
 Index
• Unique
• Compound
• Array
• Time-to-live (TTL)
• Geospatial
• Sparse
• Text search
 An index entry must be smaller than 1024 bytes
 A single collection can have no more than 64 indexes
16
MongoDB – Sample Query
 Return states with a population above 10 million
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
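To show what the two pipeline stages compute, here is a plain JavaScript sketch that mimics them in memory, without a MongoDB server. The sample documents and the helper names `groupByState` and `matchAtLeast` are made up for illustration; this is not how MongoDB executes the pipeline internally.

```javascript
// Sample documents shaped like the zipcodes collection (values are invented).
const zipcodes = [
  { _id: "10001", state: "NY", pop: 6000000 },
  { _id: "10002", state: "NY", pop: 5000000 },
  { _id: "94101", state: "CA", pop: 12000000 },
  { _id: "05001", state: "VT", pop: 600000 },
];

// Equivalent of the $group stage: sum `pop` per state.
function groupByState(docs) {
  const totals = new Map();
  for (const { state, pop } of docs) {
    totals.set(state, (totals.get(state) || 0) + pop);
  }
  return [...totals].map(([_id, totalPop]) => ({ _id, totalPop }));
}

// Equivalent of the $match stage: keep groups at or above the threshold ($gte).
function matchAtLeast(groups, min) {
  return groups.filter((g) => g.totalPop >= min);
}

const result = matchAtLeast(groupByState(zipcodes), 10 * 1000 * 1000);
console.log(result);
```

With this sample data only NY (11,000,000) and CA (12,000,000) pass the filter. Note that `$gte` keeps states at exactly 10 million as well.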
17
MongoDB Features
 Mongo Shell – JavaScript shell that supports nearly all MongoDB commands
 Auto-sharding – automatically balances data in the cluster
 Automatic replica failover
 Query router – for queries that don't use the shard key, the query router broadcasts the query to all shards, then aggregates and sorts the results
 ACID compliant at the document level
 Security – MongoDB Enterprise Advanced provides extensive support for authentication, authorization, auditing and encryption
 MongoDB Ops Manager – deploy, upgrade (no downtime), monitor, back up and scale MongoDB instances
• Hosted MongoDB Management Service also provides many of these capabilities
 Provides in-memory caching
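The query router behaviour described above (targeted when the shard key is present, broadcast otherwise) can be sketched as follows. The shard layout, the choice of `state` as shard key, and the names `shardsByRange`, `route` and `find` are hypothetical; the real router is considerably more involved.

```javascript
// Hypothetical shards, each owning a range of the shard key `state`.
const shardsByRange = [
  { range: ["A", "M"], docs: [
    { state: "CA", city: "Fresno", pop: 500 },
    { state: "FL", city: "Tampa", pop: 380 },
  ] },
  { range: ["N", "Z"], docs: [
    { state: "NY", city: "Albany", pop: 100 },
    { state: "TX", city: "Austin", pop: 950 },
  ] },
];

// Pick the shards a query must visit.
function route(filter) {
  if (filter.state !== undefined) {
    // Targeted query: the shard key tells the router where the data lives.
    const first = filter.state[0];
    return shardsByRange.filter(({ range }) => first >= range[0] && first <= range[1]);
  }
  return shardsByRange; // no shard key: broadcast to every shard
}

// Run the filter on each chosen shard, then merge and sort the partial results.
function find(filter, sortField) {
  const targets = route(filter);
  const partials = targets.map((s) =>
    s.docs.filter((d) => Object.entries(filter).every(([k, v]) => d[k] === v))
  );
  const docs = partials.flat().sort((a, b) => a[sortField] - b[sortField]);
  return { shardsContacted: targets.length, docs };
}

console.log(find({ state: "TX" }, "pop").shardsContacted);   // 1
console.log(find({ city: "Tampa" }, "pop").shardsContacted); // 2
```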
18
MongoDB Features
 Large community support; 4th most widely used database, right after the SQL databases
 Spring Data project for MongoDB
 Pluggable storage engines
• For low latency and high performance – WiredTiger or in-memory
• For analytical processing – HDFS storage engine
• Replica sets migrate data automatically, independent of storage format – no complex ETL
 Both Java and JavaScript APIs are available and documented
 MongoDB University provides free education
• https://university.mongodb.com/
 Third-party hosted support exists for MongoDB with various price plans
• https://mongolab.com/
• http://mongodirector.com/
19
References
 http://aws.amazon.com/dynamodb/
 http://www.mongodb.org/
 http://docs.aws.amazon.com/amazondynamodb/latest/developerguide
 http://db-engines.com/en/system/Amazon+DynamoDB%3BMongoDB – somewhat dated
 http://blog.cloudthat.in/5-reasons-why-dynamodb-is-better-than-mongodb/
 http://www.masonzhang.com/2013/08/7-reasons-you-should-use-mongodb-over.html
 http://www.mongodb.com/presentations/automate-mongodb-mongodb-management-service-0
 http://www.mongodb.com/presentations/webinar-enterprise-architects-view-mongodb-0
