This document provides an overview of NoSQL databases, including a brief history, classifications, pros and cons of usage, and trends. It discusses how NoSQL technologies originated from distributed computing needs and were driven by scalability, parallelization, and costs. Major classifications of NoSQL databases are described as column-oriented stores, key-value stores, document stores, and graph databases. Examples like MongoDB, Cassandra, and Neo4j are outlined. Both benefits and limitations of NoSQL are presented. Emerging trends around SQL access and adoption of Hadoop are also noted.
Lviv EDGE 2 - NoSQL
1. NoSQL
By Zenyk Matchyshyn
Staff Engineer, Lohika
2. Agenda
• History
• Architecture vs Technology
• Classification
• Pros and Cons of usage
• Trends
• Q/A
5. History
• NoSQL Technologies are not new
• Many ideas originate from distributed computing, grid computing and parallel computing
• Main drivers:
• Scalability
• Parallelization
• Costs
6. Google
• In the beginning… there was Google!
• Google shared scientific papers:
• “The Google File System”, October 2003
• “MapReduce: Simplified Data Processing on Large Clusters”, December 2004
• “Bigtable: A Distributed Storage System for Structured Data”, November 2006
• “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, November 2006
7. Amazon
• … and Amazon!
• “Dynamo: Amazon’s Highly Available Key-value Store”, October 2007
8. New technologies!
• The creators of Lucene wanted to build a full search solution
• They ended up with Hadoop and the Hadoop Distributed File System (HDFS)
• Its success helped adoption, and new solutions emerged
10. Architecture vs Technology
• SQL is not bad, it’s just different
• You can use a SQL database in a NoSQL way, e.g. MySQL as a key-value store
• You can run SQL queries on Hadoop data
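The idea of using a SQL database as a key-value store can be sketched with a single two-column table. This is only an illustration under assumptions: SQLite (Python's built-in sqlite3 module) stands in for MySQL, and the table name kv and the class SqlKeyValueStore are hypothetical names, not part of any product.

```python
import sqlite3


class SqlKeyValueStore:
    """Key/value access layered over one two-column SQL table (hypothetical helper)."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE is SQLite syntax; on MySQL the equivalent would be
        # REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else default

    def delete(self, key):
        self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        self.conn.commit()


# Values are opaque strings to the database; here one happens to be JSON text.
store = SqlKeyValueStore(sqlite3.connect(":memory:"))
store.put("user:1", '{"name": "Ada"}')
```

No joins, no secondary indexes, no schema beyond key and value: the database engine is relational, but the access pattern is the key/value one.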
11. Architecture
• The way you store data
• The way you query data
• Technology environment
15. Chart
[Chart: Scalability & Performance plotted against Depth of Functionality. Labels: memcached, Key/value, Column oriented, Document store, RDBMS]
16. Column oriented
• Based on Google Bigtable
• Column oriented is the reverse of row oriented
• Assumes datacenters are transcontinental and connected over the standard Internet
• C and P from the CAP theorem
• Data is consistent and partition tolerant, at the cost of availability
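The difference between row-oriented and column-oriented layout can be sketched in a few lines. This is a simplification under assumptions: the sample records and the helper to_columns are hypothetical, every record is assumed to have the same fields, and real Bigtable-style stores group columns into families and keep them sorted on disk, which is not modelled here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Lviv", "visits": 10},
    {"id": 2, "city": "Kyiv", "visits": 25},
    {"id": 3, "city": "Lviv", "visits": 7},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: each field keeps all of its values together.
columns = to_columns(rows)

# An aggregate over one field now touches a single list
# instead of walking every full record.
total_visits = sum(columns["visits"])
```

Scans and aggregates over a few fields are where the column layout pays off; fetching one complete record is where the row layout is cheaper.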
17. HBase
• Spin off from Hadoop project -
http://hbase.apache.org/
• Written in Java
• A lot of interfaces – Thrift, REST, JRuby, etc.
• SQL-like access through Hive -
http://hive.apache.org/
• HBase ORM – Surus -
https://github.com/mushkevych/surus
• Used by Facebook, Hulu, Yahoo!, Ning, etc.
18. Hypertable
• Developed by Zvents, open sourced
• Written in C++
• Running on top of distributed file system
• Used by Baidu
19. Key/Value
• Key/Value Store – Oracle Berkeley DB (Oracle NoSQL), Redis, Kyoto Cabinet
• Can store strings, arrays, hashes
20. Oracle NoSQL
• Sign of things to come!
• http://www.oracle.com/technetwork/database/
nosqldb/overview/index.html
• Written in Java
• Configurable consistency
• Berkeley DB as a backend
• No single point of failure
• Transactions
21. Redis
• http://redis.io/
• Lots of bindings
• Written in C
• In-memory, with optional durability
• Values can be data structures (lists, sets, hashes), not only strings
22. Key/Value – eventual consistency
• Favors availability over consistency
• Inspired by Amazon Dynamo
• Dynamo assumes high-speed network links between datacenters that are close to each other
• A and P from the CAP theorem
• Replication and verification bring replicas to eventual consistency
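Eventual consistency through replication can be sketched with two replicas that accept writes independently and reconcile later. This is a minimal sketch under assumptions: conflicts are resolved with last-write-wins on a caller-supplied timestamp, which is a simpler technique than the vector clocks described in the Dynamo paper, and Replica and anti_entropy are hypothetical names.

```python
class Replica:
    """One node holding key -> (timestamp, value) pairs."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Last-write-wins: keep whichever entry carries the newest timestamp.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for source, target in ((a, b), (b, a)):
        for key, (timestamp, value) in list(source.data.items()):
            target.write(key, value, timestamp)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"], timestamp=1)
r2.write("cart", ["book", "pen"], timestamp=2)

stale = r1.read("cart")   # r1 has not seen the newer write yet
anti_entropy(r1, r2)      # after the exchange both replicas agree
```

Both replicas stayed writable the whole time (availability); the price is that a read between the write and the exchange can return old data.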
23. Cassandra
• http://cassandra.apache.org/
• Multidimensional map indexed by key
• No single point of failure
• Decentralized
• Tunable consistency
• Used by Facebook, Cisco, IBM, Rackspace
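Tunable consistency is commonly explained with quorum arithmetic from Dynamo-style systems: with N replicas, a read that contacts R of them and a write that contacts W of them are guaranteed to share at least one replica when R + W > N. A minimal sketch of that rule follows; the function name quorums_overlap is hypothetical and this is not Cassandra's API.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum must intersect every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Three replicas, three different tunings:
quorum_both = quorums_overlap(3, 2, 2)   # read 2, write 2
fast_both = quorums_overlap(3, 1, 1)     # read 1, write 1
write_all = quorums_overlap(3, 1, 3)     # read 1, write all
```

Lower R and W give faster, more available operations with weaker guarantees; raising them until the quorums overlap trades latency for reads that see the latest acknowledged write.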
24. Voldemort
• http://project-voldemort.com/
• Developed by LinkedIn
• Written in Java
• Developer-oriented – many modules are pluggable
• Strictly key/value
25. Document stores
• Document Databases
• Document oriented stores are semi structured
• Mostly JSON oriented
• Also called schema free rows
• Can query by field
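What "schema free" and "query by field" mean in practice can be sketched with plain dictionaries standing in for JSON documents. This is only an illustration: the sample documents and the helper find are hypothetical, and a real document store would use indexes rather than a linear scan.

```python
# Documents in one collection do not have to share the same fields.
documents = [
    {"_id": 1, "name": "Ada", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},
    {"_id": 3, "name": "Ada", "email": "ada@example.com"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    missing = object()  # sentinel so an absent field never matches
    return [
        doc
        for doc in collection
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


adas = find(documents, name="Ada")
ada_with_email = find(documents, name="Ada", email="ada@example.com")
```

A document that lacks a queried field simply does not match; nothing has to be declared up front or migrated when a new field appears.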
26. MongoDB
• http://www.mongodb.org/
• Schema-free, document-oriented
• Written in C++
• Lots of interfaces
• JSON documents
• Query language, supports indexing
• Map/Reduce
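The Map/Reduce model mentioned above can be sketched in memory: a mapper emits key/value pairs per document, the pairs are grouped by key, and a reducer folds each group. This is a single-process illustration, not MongoDB's API; map_reduce, tags_mapper, count_reducer and the sample posts are hypothetical.

```python
from collections import defaultdict


def map_reduce(documents, mapper, reducer):
    """Run mapper over every document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def tags_mapper(doc):
    # Emit (tag, 1) for every tag on the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def count_reducer(key, values):
    return sum(values)


posts = [
    {"title": "Intro", "tags": ["nosql", "history"]},
    {"title": "CAP", "tags": ["nosql", "theory"]},
    {"title": "Graphs", "tags": ["nosql"]},
]

tag_counts = map_reduce(posts, tags_mapper, count_reducer)
```

Because the mapper looks at one document at a time and the reducer at one key at a time, a real system can spread both phases across many machines.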
27. CouchDB
• http://couchdb.apache.org/
• RESTful API
• JSON documents
• Written in Erlang
• Supports ACID
• Map/Reduce
• Eventual consistency
28. Graph
• Store graphs natively, as nodes and relationships
• Provide graph traversal
• Offer graph-oriented functionality
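Traversal is the operation that sets graph stores apart, and it can be sketched with a breadth-first walk over an adjacency map. This is a minimal in-memory sketch: the sample graph and the function traverse are hypothetical and do not reflect any particular database's API.

```python
from collections import deque

# Adjacency map: who follows whom.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def traverse(adjacency, start, max_depth):
    """Breadth-first walk returning {node: depth} up to max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, []):
            if neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths


friends_of_friends = traverse(graph, "alice", max_depth=2)
```

The same "friends of friends" question in a relational schema needs one self-join per hop, which is the usual argument for a dedicated graph store.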
29. Neo4j
• http://neo4j.org/
• Written in Java
• Stores and navigates graphs
• Stable and proven
• Commercial and free licenses
31. Pros and Cons
• Scalability
• Transactional Integrity and Consistency
• Data Modeling
• Query Support
• Access and Interface Availability
32. Typical Usage
• Large amount of data
• Read/Write balanced?
• Read Heavy
• Write Heavy
• Scan
• Geospatial
• Map/Reduce
• Social data
33. Is it for you?
• Technology is still developing
• Be ready to patch
• SQL is easier
• Not all startups will end up being Facebooks
• Some problems can be solved only with NoSQL
35. Trends
• Oracle released Oracle NoSQL!
• Adoption of Hadoop soars
• SQL-like access to NoSQL stores is taking form – UnQL - http://www.unqlspec.org/display/UnQL/Home
• You can participate!
36. Opportunities
• Spring Data -
http://www.springsource.org/spring-data
• Cloud Foundry PaaS -
http://www.cloudfoundry.com/
• ORM/Simplification