Prepare Your Data For The Cloud
- 2. Agenda
Relational DBMSs: Pros & Cons
Non-Relational DBMSs: Pros & Cons
Types of Non-Relational DBMSs
Current Market State
Applicability of Different Databases in Different Environments
- 3. Relational DBMS - Pros
Data Integrity
ACID Capabilities
High Level Query Model
Data Normalization
Data Independence
- 4. Relational DBMS - Cons
Scaling Issues
By Duplication (Master-Slave)
By Sharding/Division (Not transparent)
Fixed Schema
Mostly disk-oriented (Performance)
May fare poorly with large data
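The "not transparent" point about sharding can be illustrated with a minimal sketch in which the application itself decides where each record lives. The shard names and the hash-modulo scheme are assumptions made for illustration, not something the slides prescribe.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ["db0", "db1", "db2"]


def shard_for(key: str) -> str:
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# The application, not the DBMS, has to route every query to the right shard.
for user_id in ["alice", "bob", "carol"]:
    print(user_id, "->", shard_for(user_id))
```

Note that with this simple scheme, adding or removing a shard changes the mapping for most keys, which is one reason sharding a relational database is rarely painless.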
- 5. Non-Relational DBMS - Pros
Scalability
Replication / Availability
Performance
Deployment Flexibility
Modelling Flexibility
Faster Development (?)
- 6. Non-Relational DBMS - Cons
Lack of Transactional Support
Data Integrity is Application's responsibility
Data Duplication / Application Dependent
Eventually Consistent (mostly)
No Standardization
New Technology
- 11. Key Value Data-Bases
Object is completely Opaque to DB
Mostly GET, PUT & DELETE operations are supported
There may be limits on size of Objects
Inspired by Amazon Dynamo Paper
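As a rough sketch of the key-value model described above: the store treats each value as opaque bytes and offers only GET, PUT and DELETE. The class name `KeyValueStore` and the 1 KB size limit are hypothetical, chosen only to illustrate the points on this slide.

```python
from typing import Dict, Optional


class KeyValueStore:
    """In-memory stand-in for a key-value database."""

    MAX_VALUE_BYTES = 1024  # assumed size limit, for illustration only

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only checks its size.
        if len(value) > self.MAX_VALUE_BYTES:
            raise ValueError("value exceeds size limit")
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", b'{"FirstName": "Bob"}')
print(store.get("user:1"))
store.delete("user:1")
print(store.get("user:1"))
```

Because the value is opaque, any query such as "find users named Bob" has to be done by the application after fetching the object.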
- 13. Document Data-Bases
Object is not completely opaque to DB
Every Object has its own schema
FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
FirstName="Jonathan", Children=("Michael,10", "Jennifer,8")
Can perform queries based on Object's attributes
Possible to describe relationships between Objects
Joins and Transactions are not supported
Good for XML or JSON objects
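The two example objects above can be used to sketch what "queries based on an object's attributes" might look like. This is a minimal in-memory illustration; the helper names `find` and `has_field` are hypothetical and do not correspond to any particular product's API.

```python
from typing import Any, Dict, List

# The two objects from the slide; each carries its own set of fields.
documents: List[Dict[str, Any]] = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Children": ["Michael,10", "Jennifer,8"]},
]


def find(docs: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose fields equal all the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def has_field(docs: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Return documents that define the given field at all."""
    return [d for d in docs if field in d]


print(find(documents, Hobby="sailing"))
print(has_field(documents, "Children"))
```

Unlike a key-value store, the database can see inside each object, so documents without a given field are simply skipped rather than causing a schema error.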
- 15. Column-Store Data-Bases
Richer than Document Stores
Multi-Dimensional Map
Tables
Row
Column
Time-Stamp
Supports Multiple Data Types
Usually use an Underlying DFS
Inspired by Google Big Table Paper
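The "multi-dimensional map" idea can be sketched as a nested mapping from row to column to timestamp to value. The class `ColumnTable`, the column name `info:city` and the integer timestamps are assumptions for illustration; real column stores add column families, persistence and distribution on top of this.

```python
from collections import defaultdict
from typing import Any, Optional


class ColumnTable:
    """One table: row key -> column name -> {timestamp: value}."""

    def __init__(self) -> None:
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, value: Any, timestamp: int) -> None:
        # Each write adds a new version instead of overwriting the old one.
        self._cells[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Any:
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = ColumnTable()
table.put("user1", "info:city", "Boston", timestamp=1)
table.put("user1", "info:city", "Chicago", timestamp=2)
print(table.get("user1", "info:city"))               # latest version
print(table.get("user1", "info:city", timestamp=1))  # version as of timestamp 1
```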
- 17. Key Factors while Making a Choice
Application Architecture Requirements
Platform choices
Non-Functional Requirements
Consistency
Availability
Partition Tolerance
Security
Data Redemption
- 19. Scenario 1
Feature First
Corporate Data
Consistency Requirements
Business Intelligence
Legacy Application
RDBMS on Amazon Cloud, Rackspace (IaaS) or
Microsoft Azure/Amazon RDS (PaaS)
- 20. Scenario 2
Consumer Facing Application
Big Files (Images, BLOBs, Files)
Geographically Distributed
Mostly writes
No heavy requirement for Rich Queries
Key-Value Data Stores (Amazon S3, Project
Voldemort, Redis)
- 21. Scenario 3
Hundreds Of Government Documents with
different schemas
Need to serve on Web
Data Mining
Document Data-Stores (Amazon SimpleDB,
Apache CouchDB, MongoDB)
- 22. Scenario 4
Scale First
Huge Data-Set
Analytical Requirements
Consumer Facing
High Availability over Consistency
Column Data-Stores (Google App Engine, HBase,
Cassandra)
- 23. Mix & Match of Earlier Scenarios
Polyglot Persistence
RDBMS for low-volume, high-value data
Key-Value DB for large files with few queries
Memcached DB for short-lived Data
Column DB for Analytics
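One way to picture polyglot persistence is a routing table that sends each kind of data to the store type suited to it. The data categories (`orders`, `images`, `sessions`, `clickstream`) and the choice of the RDBMS as the fallback are hypothetical, used only to make the mix-and-match idea concrete.

```python
from enum import Enum


class Store(Enum):
    RDBMS = "relational database"
    KEY_VALUE = "key-value store"
    CACHE = "in-memory cache"
    COLUMN = "column store"


# Hypothetical data categories mapped to the store types listed on the slide.
ROUTING = {
    "orders": Store.RDBMS,        # low-volume, high-value data
    "images": Store.KEY_VALUE,    # large files, few queries
    "sessions": Store.CACHE,      # short-lived data
    "clickstream": Store.COLUMN,  # analytics
}


def store_for(category: str) -> Store:
    """Return the store type for a category; unknown data falls back to the RDBMS."""
    return ROUTING.get(category, Store.RDBMS)


for category in ["orders", "images", "sessions", "clickstream"]:
    print(category, "->", store_for(category).value)
```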
- 25. Conclusions
One Size Does Not Fit All
Many choices
NoSQL DBs provide alternatives
RDBMSs still serve a useful purpose
- 27. References
http://nosql-database.org/
http://www.drdobbs.com/database/218900502
http://perspectives.mvdirona.com/2009/11/03/OneSizeDoesNotFitAll.aspx
http://blog.nahurst.com/visual-guide-to-nosql-systems
http://blog.heroku.com/archives/2010/7/20/nosql/
http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
http://project-voldemort.com/
http://code.google.com/p/redis/
http://memcachedb.org/
http://aws.amazon.com/simpledb/
http://couchdb.apache.org/
http://www.mongodb.org
http://riak.basho.com/