Lviv EDGE 2 - NoSQL

NoSQL

By Zenyk Matchyshyn
Staff Engineer, Lohika

Agenda
• History
• Architecture vs Technology
• Classification
• Pros and Cons of usage
• Trends
• Q/A


History
• NoSQL Technologies are not new
• Many ideas originate from distributed computing, grid computing and parallel computing
• Main drivers:
• Scalability
• Parallelization
• Costs


Google
• In the beginning… there was Google!
• Google shared scientific papers:
• “The Google File System”, October 2003
• “MapReduce: Simplified Data Processing on Large Clusters”, December 2004
• “Bigtable: A Distributed Storage System for Structured Data”, November 2006
• “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, November 2006

Amazon

• … and Amazon!
• “Dynamo: Amazon’s Highly Available Key-value Store”, October 2007


New technologies!

• Creators of Lucene wanted to create a full search solution
• Ended up with Hadoop and Hadoop Distributed File System (HDFS)
• Success helped adoption and new solutions emerged


ARCHITECTURE VS TECHNOLOGY


Architecture vs Technology

• SQL is not bad, it’s just different
• You can use an SQL DB in a NoSQL way, e.g. MySQL as a key-value database
• You can do SQL queries on Hadoop data
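The point about using an SQL database in a NoSQL way can be sketched in a few lines. This is a minimal illustration only: SQLite (from the Python standard library) stands in for MySQL, and the class name `SqlKeyValueStore` and the table name `kv` are made up for the example.

```python
import sqlite3


class SqlKeyValueStore:
    """Key-value facade over a single two-column SQL table (hypothetical names)."""

    def __init__(self, path=":memory:"):
        # SQLite stands in for MySQL here; the SQL itself stays deliberately simple.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE gives upsert semantics: the last write for a key wins.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def delete(self, key):
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))


store = SqlKeyValueStore()
store.put("user:1", "Alice")
print(store.get("user:1"))
```

No joins, no secondary indexes, no schema beyond key and value: the architecture is key/value even though the technology is SQL.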


Architecture

• The way you store data
• The way you query data
• Technology environment


Terms

• ACID – Atomicity, Consistency, Isolation, Durability
• CAP Theorem – Consistency, Availability, Partition tolerance
• Eventual consistency
• Hashing
• Schema
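The slide only names hashing as a term. One common use in distributed stores is consistent hashing, which decides which node owns a key. The sketch below is a simplified illustration under assumptions: the class name `HashRing`, the use of MD5, and the number of virtual points per node are all choices made for this example, not taken from any particular product.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual points per node (hypothetical)."""

    def __init__(self, nodes, points_per_node=100):
        ring = []
        for node in nodes:
            # Several points per node spread the load more evenly.
            for i in range(points_per_node):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    def node_for(self, key):
        # The owner is the first point clockwise from the key's position,
        # wrapping around to the start of the ring when needed.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

The useful property: when a node is removed, only the keys that node owned move elsewhere; keys owned by the remaining nodes stay where they were.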


Classification

• Column oriented stores
• Key/Value stores
• Key/Value stores with configurable consistency
• Document stores
• Graph stores


Chart

[Figure: memcached, key/value, column oriented, document store and RDBMS plotted by Scalability & Performance against Depth of Functionality]


Column oriented
• Based on Google Bigtable
• Column-oriented is the reverse of row-oriented
• Assumption is that datacenters are transcontinental and connected over the standard Internet
• C and P from the CAP Theorem
• Data is consistent and partition-tolerant, but availability can suffer
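The "reverse of row-oriented" idea can be illustrated with a toy in-memory pivot. This is only a sketch of the concept, not how Bigtable or HBase lay out data on disk; the sample records and the helper `to_columns` are invented for the example, and it assumes every row has the same fields.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ann", "age": 31},
    {"id": 2, "name": "Bob", "age": 27},
    {"id": 3, "name": "Eve", "age": 45},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column-oriented: all values of one field sit together.
columns = to_columns(rows)

# An aggregate over one field now touches a single list
# instead of every full record.
average_age = sum(columns["age"]) / len(columns["age"])
print(columns["name"], average_age)
```

Scans and aggregates over a few fields favor the column layout; reading or writing one whole record favors the row layout.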


HBase
• Spin-off from the Hadoop project - http://hbase.apache.org/
• Written in Java
• A lot of interfaces – Thrift, REST, JRuby, etc.
• SQL-like access through Hive - http://hive.apache.org/
• HBase ORM – Surus - https://github.com/mushkevych/surus
• Used by Facebook, Hulu, Yahoo!, Ning, etc.


Hypertable
• Developed by Zvents, open sourced
• Written in C++
• Runs on top of a distributed file system
• Used by Baidu


Key/Value

• Key/Value Store – Oracle Berkeley DB (Oracle NoSQL), Redis, Kyoto Cabinet
• Can store strings, arrays, hashes


Oracle NoSQL
• Sign of things to come!
• http://www.oracle.com/technetwork/database/nosqldb/overview/index.html
• Written in Java
• Configurable consistency
• Berkeley DB as a backend
• No single point of failure
• Transactions


Redis

• http://redis.io/
• Lots of bindings
• Written in C
• In-memory, with optional durability
• Also a document store


Key/Value – eventual consistency
• Key/value stores that favor availability over consistency
• Inspired by Amazon Dynamo
• Dynamo assumes high-speed network links between data centers that are close to each other
• A and P from the CAP Theorem
• Achieve eventual consistency through replication and verification
• Consistency is eventual: replicas may briefly disagree but converge over time
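"Replication and verification" can be sketched as a read that compares replicas and repairs the stale ones. This is a deliberately simplified illustration: it uses a plain integer version with last-write-wins, whereas the Dynamo paper describes vector clocks, and the names `Replica` and `read_with_repair` are hypothetical.

```python
class Replica:
    """One copy of the data; stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, version, value):
        # Accept a write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)


def read_with_repair(replicas, key):
    """Return the newest value and push it to replicas that lag behind."""
    answers = [replica.read(key) for replica in replicas]
    known = [answer for answer in answers if answer is not None]
    if not known:
        return None
    newest = max(known, key=lambda answer: answer[0])
    for replica in replicas:
        replica.write(key, newest[0], newest[1])  # no-op if already current
    return newest[1]


a, b, c = Replica(), Replica(), Replica()
for replica in (a, b, c):
    replica.write("cart", 1, ["book"])
a.write("cart", 2, ["book", "pen"])  # update reached only one replica so far

print(read_with_repair([a, b, c], "cart"))
```

Until the repair runs, a read served by `b` or `c` alone would return the old cart; that window is the "eventual" part.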

Cassandra
• http://cassandra.apache.org/
• Multidimensional map indexed by key
• No single point of failure
• Decentralized
• Tunable consistency
• Used by Facebook, Cisco, IBM, Rackspace
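Tunable consistency is usually reasoned about with three numbers: N replicas per key, W acknowledgements required for a write, and R replicas consulted on a read. When R + W > N, every read set overlaps the latest successful write set. The helper names below are invented for this sketch and are not part of Cassandra's API.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(n, w, r):
    """True when any r replicas must include one of the w that took the write."""
    return r + w > n


n = 3
print(quorum(n))
print(read_sees_latest_write(n, w=quorum(n), r=quorum(n)))  # quorum writes and reads
print(read_sees_latest_write(n, w=1, r=1))                  # fastest, but may read stale data
```

Lower R and W buy latency and availability; higher values buy consistency. That trade-off is what "tunable" refers to.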


Voldemort
• http://project-voldemort.com/
• Developed by LinkedIn
• Written in Java
• Developer-oriented – a lot of modules are pluggable
• Strictly key/value


Document stores

• Document Databases
• Document-oriented stores are semi-structured
• Mostly JSON-oriented
• Also called schema-free rows
• Can query by field
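Semi-structured documents and querying by field can be shown with plain JSON and a filter. This is a toy in-memory sketch, not the query API of any document database; the sample documents and the function `find` are made up for illustration.

```python
import json

raw_documents = [
    '{"name": "Ann", "city": "Lviv", "tags": ["admin"]}',
    '{"name": "Bob", "city": "Kyiv"}',
    '{"name": "Eve", "city": "Lviv", "age": 45}',
]

# No shared schema: each document carries whatever fields it has.
documents = [json.loads(raw) for raw in raw_documents]


def find(docs, **criteria):
    """Return documents whose fields equal all the given criteria."""
    return [
        doc
        for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print([doc["name"] for doc in find(documents, city="Lviv")])
```

A real document store adds indexes on those fields so that the query does not scan every document.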


MongoDB

• http://www.mongodb.org/
• Schema-free, document-oriented
• Written in C++
• Lots of interfaces
• JSON documents
• Query language, supports indexing
• Map/Reduce
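The Map/Reduce model itself fits in a few lines. The sketch below is plain single-process Python that counts words; it is not MongoDB's map/reduce interface, and the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for document in documents for pair in map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(map_reduce(["NoSQL is not new", "SQL is not bad"]))
```

Because each map call sees one document and each reduce call sees one key, both phases can be spread across many machines.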


CouchDB

• http://couchdb.apache.org/
• RESTful API
• JSON documents
• Written in Erlang
• Supports ACID
• Map/Reduce
• Eventual consistency


Graph

• Provide ways to store graphs
• Provide graph traversal
• Graph-oriented functionality
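Storing a graph and traversing it can be sketched with an adjacency list and a breadth-first walk. This is an in-memory illustration only, not the API of any graph database; the sample "follows" graph and the function `bfs` are invented for the example.

```python
from collections import deque

# Adjacency list: who follows whom in a tiny social graph (sample data).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def bfs(graph, start):
    """Return nodes reachable from start, in breadth-first order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(bfs(follows, "alice"))
```

In a relational schema the same question ("who is reachable from alice?") needs one self-join per hop, which is the kind of query graph stores are built to avoid.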


Neo4j

• http://neo4j.org/
• Written in Java
• Stores and navigates graphs
• Stable and proven
• Commercial and free licenses


PROS AND CONS OF USAGE


Pros and Cons

• Scalability
• Transactional Integrity and Consistency
• Data Modeling
• Query Support
• Access and Interface Availability


Typical Usage

• Large amount of data
• Read/Write balanced?
• Read Heavy
• Write Heavy
• Scan
• Geospatial
• Map/Reduce
• Social data

Is it for you?

• Technology is still developing
• Be ready to patch
• SQL is easier
• Not all startups will end up being Facebooks
• Some problems can be solved only with NoSQL


Trends
• Oracle released Oracle NoSQL!
• Adoption of Hadoop soars
• SQL-like access to NoSQL stores is taking form – UnQL - http://www.unqlspec.org/display/UnQL/Home
• You can participate!


Opportunities

• Spring Data - http://www.springsource.org/spring-data
• Cloud Foundry PaaS - http://www.cloudfoundry.com/
• ORM/Simplification
