Dynamo vs Mongo
Requirements
Unstructured data storage
ACID compliance not necessary
Fast read/write
Ability to index data and search
Full text search (?)
Java/Spring support
JavaScript support
REST API
Community support
Scaling up and maintenance
Shard – when database grows large
Horizontal partitioning of a database, where rows are held on separate database servers
Compare that to normalization or vertical partitioning, where data is split by columns
Advantages
• Reduces index size in each table in each database (performance +)
• Load can be spread out over multiple machines (performance ++)
Disadvantages
• Increased reliance on interconnected servers
• Query latency when more than one shard is searched
• Issues with consistency and durability
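A minimal sketch of how rows might be routed to shards by hashing a key. The shard names, the hash function and the helper names are assumptions for illustration only; real systems such as DynamoDB and MongoDB use their own partitioning schemes.

```javascript
// Hypothetical shard list; in a real cluster these would be separate servers.
const SHARDS = ["shard-0", "shard-1", "shard-2"];

// Simple deterministic string hash (djb2 variant), for illustration only.
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// The same key always maps to the same shard.
function shardFor(key) {
  return SHARDS[hashKey(key) % SHARDS.length];
}

// Horizontal partitioning: group whole rows by the shard their key maps to.
function partition(rows, keyField) {
  const byShard = {};
  for (const row of rows) {
    const shard = shardFor(row[keyField]);
    (byShard[shard] = byShard[shard] || []).push(row);
  }
  return byShard;
}
```

A lookup by key touches one shard; a query that cannot use the key has to visit every shard, which is the latency disadvantage listed above.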
DynamoDb Internals
Key/Value Pair
• Uses JSON only as a transport protocol
• Data is not being stored "on-disk" in the JSON data format
• Applications that use DynamoDB must either implement their own JSON
parsing or use a library like one of the AWS SDKs to do this parsing for them.
Data Types
• Scalar – string, number and binary (BLOB and CLOB)
• Multivalued – string set, number set and binary set
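To illustrate the JSON transport format, the sketch below hand-rolls the kind of conversion an AWS SDK performs, turning a plain object into typed attribute values (S, N, B, SS, NS, BS). The function names are hypothetical and only the types listed above are handled.

```javascript
// Convert one JavaScript value into a DynamoDB-style typed attribute value.
function toAttributeValue(value) {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers travel as strings
  if (Buffer.isBuffer(value)) return { B: value.toString("base64") }; // binary travels as base64
  if (value instanceof Set) {
    const items = [...value];
    if (items.length === 0) throw new Error("sets may not be empty");
    if (items.every((v) => typeof v === "string")) return { SS: items };
    if (items.every((v) => typeof v === "number")) return { NS: items.map(String) };
    if (items.every((v) => Buffer.isBuffer(v))) {
      return { BS: items.map((b) => b.toString("base64")) };
    }
    throw new Error("set members must all share one scalar type");
  }
  throw new Error("unsupported type: " + typeof value);
}

// Convert a whole item (attribute name -> value).
function toItem(obj) {
  const item = {};
  for (const [name, value] of Object.entries(obj)) {
    item[name] = toAttributeValue(value);
  }
  return item;
}
```

For example, `toItem({ name: "Ann", age: 30 })` yields `{ name: { S: "Ann" }, age: { N: "30" } }`.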
DynamoDb Internals
Data Model
• Table – no fixed schema (columns, datatype etc)
Needs a fixed primary key, its data type and secondary index (if necessary)
Limit to 256 tables per region per account
• Items - individual records in a table
Limited to 400 KB
• Attributes
• Supports one-to-one, one-to-many and many-to-many relationships
DynamoDb Internals
Keys – need to be defined at table creation time
• Primary Keys – Hash, Hash and Range keys
• Local Secondary Indexes – can access only a single partition
Limit – 5 indexes per table / 20 attributes max
• Global Secondary Indexes – can access any partition
Limit – 5 indexes per table
When creating a secondary index, you define the alternate key for the index, along with
any other attributes that you want projected into the index. DynamoDB copies
these attributes into the index, along with the primary key attributes from the table
Add, update and delete actions on the table are automatically reflected in the index
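The key and index definitions above could be expressed as a CreateTable request along the following lines. The table, attribute and index names are made up for illustration, and the throughput figures are arbitrary; the helper at the end is a hypothetical check, not part of any SDK.

```javascript
// Hypothetical table: forum threads keyed by forum name (hash) and subject (range).
const createTableParams = {
  TableName: "Thread",
  AttributeDefinitions: [
    { AttributeName: "ForumName", AttributeType: "S" },
    { AttributeName: "Subject", AttributeType: "S" },
    { AttributeName: "LastPostDateTime", AttributeType: "S" },
    { AttributeName: "Author", AttributeType: "S" },
  ],
  KeySchema: [
    { AttributeName: "ForumName", KeyType: "HASH" },
    { AttributeName: "Subject", KeyType: "RANGE" },
  ],
  // Local index: same hash key as the table, alternate range key.
  LocalSecondaryIndexes: [
    {
      IndexName: "LastPostIndex",
      KeySchema: [
        { AttributeName: "ForumName", KeyType: "HASH" },
        { AttributeName: "LastPostDateTime", KeyType: "RANGE" },
      ],
      // Extra non-key attribute copied (projected) into the index.
      Projection: { ProjectionType: "INCLUDE", NonKeyAttributes: ["Replies"] },
    },
  ],
  // Global index: a different hash key, so it can span partitions.
  GlobalSecondaryIndexes: [
    {
      IndexName: "AuthorIndex",
      KeySchema: [{ AttributeName: "Author", KeyType: "HASH" }],
      Projection: { ProjectionType: "KEYS_ONLY" },
      ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
    },
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
};

// Hypothetical sanity check: every local index must reuse the table's hash key.
function lsiSharesTableHashKey(params) {
  const hashOf = (schema) => schema.find((k) => k.KeyType === "HASH").AttributeName;
  const tableHash = hashOf(params.KeySchema);
  return (params.LocalSecondaryIndexes || []).every(
    (idx) => hashOf(idx.KeySchema) === tableHash
  );
}
```

Because a local index shares the table's hash key, it only ever looks inside one partition, while the global index uses its own hash key.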
DynamoDb Internals
Throughput
• A read capacity unit covers up to 4 KB
• A write capacity unit covers up to 1 KB
• Reading a 5 KB item requires 2 read capacity units (5 / 4, rounded up)
• These units are defined when creating a table
• AWS sends alerts when these limits are exceeded
• AWS also throttles further requests beyond the defined capacity
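The capacity arithmetic above can be sketched as a few small functions. The function names are hypothetical, and the per-second figure assumes one read per second per unit.

```javascript
const READ_UNIT_KB = 4;  // size covered by one read capacity unit
const WRITE_UNIT_KB = 1; // size covered by one write capacity unit

// Units consumed by a single read of an item, rounded up to whole units.
function readUnitsPerItem(itemSizeKB) {
  return Math.ceil(itemSizeKB / READ_UNIT_KB);
}

// Units consumed by a single write of an item, rounded up to whole units.
function writeUnitsPerItem(itemSizeKB) {
  return Math.ceil(itemSizeKB / WRITE_UNIT_KB);
}

// Units to provision for a target read rate (assumes one read per second per unit).
function provisionedReadUnits(itemSizeKB, readsPerSecond) {
  return readUnitsPerItem(itemSizeKB) * readsPerSecond;
}
```

With these definitions a 5 KB item needs `readUnitsPerItem(5)` = 2 units per read, matching the example above, and 5 units per write.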
DynamoDb Operations
Table level – create, update, delete, list, describe
Item/attribute level – add, update, delete
Query – query a table by hash key and range key. Results are limited to 1 MB per call
Scan – reads all items from a table. Slower than query
Parallel scan is also available to make things faster
Supports pagination
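Pagination can be sketched as a loop that keeps requesting pages until no continuation key comes back. Here `fetchPage` is a hypothetical async function standing in for an SDK Query or Scan call; it is assumed to return `Items` and, when more data remains, a `LastEvaluatedKey` that is passed back as `ExclusiveStartKey`.

```javascript
// Collect every page of a Query or Scan into one array.
// `fetchPage` is a hypothetical stand-in for the real SDK call.
async function fetchAllItems(fetchPage, params) {
  const items = [];
  let startKey; // undefined on the first request
  do {
    const request = startKey
      ? { ...params, ExclusiveStartKey: startKey }
      : { ...params };
    const page = await fetchPage(request);
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey; // absent once the last page is reached
  } while (startKey);
  return items;
}
```

Collecting everything in memory is only sensible for small result sets; for larger ones each page would normally be processed as it arrives.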
DynamoDb Features
Fully Managed NoSQL database service – handles scaling, partitioning, upgrades
Durable – automatically replicates to different availability zones
Scalable – automatically distributes data to multiple servers as size grows
Fast – single-digit millisecond latency from an EC2 instance for an item size of 1 KB
• 5 ms for read, 10 ms for write
Simple Administration – Amazon Web Console
Fault Tolerant – automatically replicates data
Flexible – each item in a table can have different number of attributes
Indexing – on the primary key of each item. Global and local secondary indexes allow
users to query non-primary key attributes
Secure – authentication, use of the latest cryptographic techniques, ability to integrate
with IAM (AWS Identity and Access Management)
DynamoDb Features
Could be Cost-Effective – per 1 KB item, $0.01/hour for every 10 writes/sec
• $0.01/hour for every 50 strongly consistent reads/sec
• $0.28 per million writes
• $0.056 per million strongly consistent reads
• $1.00 per GB/month for indexed storage
SDK – AWS SDK for Java/.NET/PHP etc.
• Supports all table operations, query and scans
Service Oriented Architecture – REST support – simple API, only 12 operations
• Data transfers as simple GET/POST/DELETE
Large items can be stored in S3 buckets, thereby reducing cost
Monitoring – AWS Management Console, CloudWatch, command line tool
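The per-million prices quoted above appear to follow directly from the hourly rates. The sketch below reproduces that arithmetic using only the figures on this slide; the function name is hypothetical and the prices are the slide's, not current AWS pricing.

```javascript
const SECONDS_PER_HOUR = 3600;

// Convert "hourlyPrice dollars per hour for opsPerSecond operations/sec"
// into a price per one million operations.
function costPerMillionOps(hourlyPrice, opsPerSecond) {
  const opsPerHour = opsPerSecond * SECONDS_PER_HOUR;
  return (hourlyPrice / opsPerHour) * 1e6;
}

// $0.01/hour for 10 writes/sec is 36,000 writes per cent.
const perMillionWrites = costPerMillionOps(0.01, 10); // about 0.28

// $0.01/hour for 50 strongly consistent reads/sec is 180,000 reads per cent.
const perMillionReads = costPerMillionOps(0.01, 50); // about 0.056
```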
DynamoDb Features
Can be integrated with Redshift – a data warehousing tool
DynamoDb Local - small client-side database and server that mimics the
DynamoDB service. Available as a .jar file
MongoDb Internals (name derived from "humongous")
Document Oriented database
• Data is stored in BSON format (Binary JSON)
• Supports up to 100 levels of nesting
Data Types – BSON
• String, Integer, Boolean, Double, Arrays, Date*, Timestamp, Binary*, Null
• Min/Max keys – compare against lowest and highest BSON elements
• Object – embedded documents
• ObjectId* – stores the document’s ID
• Regular Expression*
• JavaScript code*
• Symbol – reserved for languages that use specific symbol type
* Indicates non-JSON types
MongoDb Internals (name derived from "humongous")
Data Model
• Collections – documents that share similar structure
• Document – similar to rows in RDBMS
Maximum BSON document size is 16 MB
• Field – similar to columns in RDBMS
MongoDb Query
Query
• Key/value – key can be any field in the document, including the primary key
• Range – greater than, less than or equal to, between
• Geospatial – proximity criteria, intersection and inclusion
• Text search – results can be returned in relevance order
• Aggregation – count, min, max, average etc
• Map reduce
Covered Queries – queries that return only indexed fields
Query Optimization – MongoDB performs automatic optimization
When necessary, developers can use multiple indexes through index intersection
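A few of the query types above, written as plain filter documents of the kind passed to `find()` in the mongo shell. The collection and field names (`places`, `state`, `pop`, `loc`) and all values are hypothetical.

```javascript
// Key/value: exact match on any field.
const keyValue = { state: "NY" };

// Range: greater than 10,000 and at most 50,000.
const range = { pop: { $gt: 10000, $lte: 50000 } };

// Geospatial proximity: within 1000 metres of a point (needs a 2dsphere index).
const geoNear = {
  loc: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
      $maxDistance: 1000,
    },
  },
};

// Text search (needs a text index on the searched field).
const textSearch = { $text: { $search: "coffee shop" } };

// Projection and sort document used to order text results by relevance.
const byRelevance = { score: { $meta: "textScore" } };

// In the mongo shell these might be used as, for example:
//   db.places.find(range)
//   db.places.find(textSearch, byRelevance).sort(byRelevance)
```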
MongoDb Index
Index
• Unique
• Compound
• Array
• Time-to-live (TTL)
• Geospatial
• Sparse
• Text search
Size of index entry must be less than 1024 bytes
A single collection can have no more than 64 indexes
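The index types above might be declared with key and option documents like the following, as passed to `createIndex()` in the mongo shell. The collection name `people` and every field name are hypothetical, and the one-hour TTL is an arbitrary choice.

```javascript
// Limit taken from the slide above.
const MAX_INDEXES_PER_COLLECTION = 64;

const indexSpecs = [
  { keys: { email: 1 }, options: { unique: true } },                 // unique
  { keys: { state: 1, city: 1 }, options: {} },                      // compound
  { keys: { tags: 1 }, options: {} },                                // array, if tags holds arrays
  { keys: { createdAt: 1 }, options: { expireAfterSeconds: 3600 } }, // TTL, expires after an hour
  { keys: { loc: "2dsphere" }, options: {} },                        // geospatial
  { keys: { nickname: 1 }, options: { sparse: true } },              // sparse
  { keys: { description: "text" }, options: {} },                    // text search
];

// Hypothetical guard against exceeding the per-collection limit.
function withinIndexLimit(specs) {
  return specs.length <= MAX_INDEXES_PER_COLLECTION;
}

// In the mongo shell these might be applied as:
//   indexSpecs.forEach((s) => db.people.createIndex(s.keys, s.options));
```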
MongoDb – Sample Query
Return states with a population above 10 million
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
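To make the two pipeline stages concrete, here is a plain JavaScript sketch that performs the same grouping and filtering on an in-memory array. The sample documents are made up, and the function is only an illustration of what `$group` and `$match` compute, not how MongoDB executes them.

```javascript
// Made-up sample documents shaped like the zipcodes collection.
const zipcodes = [
  { _id: "10001", state: "NY", pop: 6000000 },
  { _id: "10002", state: "NY", pop: 5000000 },
  { _id: "90001", state: "CA", pop: 12000000 },
  { _id: "05001", state: "VT", pop: 600000 },
];

function statesAbove(docs, minPop) {
  // $group stage: sum pop per state.
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  // $match stage: keep only states whose total meets the threshold.
  return [...totals]
    .filter(([, totalPop]) => totalPop >= minPop)
    .map(([state, totalPop]) => ({ _id: state, totalPop }));
}

const bigStates = statesAbove(zipcodes, 10 * 1000 * 1000);
```

With the sample data, NY (11 million across two zip codes) and CA (12 million) pass the threshold, while VT is filtered out.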
MongoDb Features
Mongo Shell – JavaScript shell that supports nearly all MongoDB commands
Auto Shard – automatically balances data in the cluster
Automatic Replica Failover
Query Router – for queries that don’t use the shard key, the query router broadcasts
the query to all shards, then aggregates and sorts the results
ACID compliant at the document level
Security – MongoDB Enterprise Advanced provides extensive support for authentication,
authorization, auditing and encryption
MongoDB Ops Manager – deploy, upgrade (no downtime), monitor, back up and scale
MongoDB instances.
• Hosted MongoDB Management Service also provides many of these capabilities
Provides in-memory caching
MongoDb Features
Large community support, 4th most widely used database, right behind the SQL databases
Spring Data Project for MongoDB
Pluggable storage engine
• For low latency high performance – WiredTiger or in-memory
• Analytical process – HDFS storage engine
• Replica sets replicate data independent of storage format – no complex
ETL
Both Java and JavaScript APIs are available and documented
MongoDB University provides free education
• https://university.mongodb.com/
Third-party hosted support exists for MongoDB with various price plans
• https://mongolab.com/
• http://mongodirector.com/