NoSQL


   By Zenyk Matchyshyn
   Staff Engineer, Lohika
                        1
Agenda
 •   History
 •   Architecture vs Technology
 •   Classification
 •   Pros and Cons of usage
 •   Trends
 •   Q/A




                                  2
HISTORY


          3
4
History
 •   NoSQL Technologies are not new
 •   Many ideas originate from distributed
     computing, grid computing and parallel
     computing
 •   Main drivers:
     •   Scalability
     •   Parallelization
     •   Costs


                                              5
Google
 •   In the beginning… there was Google!
 •   Google shared scientific papers:
     •   “The Google File System”, October 2003
     •   “MapReduce: Simplified Data Processing on
         Large Clusters”, December 2004
     •   “Bigtable: A Distributed Storage System for
         Structured Data”, November 2006
     •   “The Chubby Lock Service for Loosely-Coupled
         Distributed Systems”, November 2006
                                                       6
Amazon

 •   … and Amazon!
 •   “Dynamo: Amazon’s Highly Available Key-value
     Store”, October 2007




                                                  7
New technologies!


 •   The creator of Lucene set out to build a full
     web search solution (Nutch)
 •   The work ended up producing Hadoop and the
     Hadoop Distributed File System (HDFS)
 •   Their success drove adoption, and new
     solutions emerged




                                                  8
ARCHITECTURE VS TECHNOLOGY



                             9
Architecture vs Technology


 •   SQL is not bad, it’s just different
 •   You can use a SQL database in a NoSQL way,
     e.g. MySQL as a key-value database
 •   You can run SQL-like queries on Hadoop data
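
The "SQL database as a key-value store" idea can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL so the sketch runs without a server, and the table name kv and the helpers put and get are hypothetical names, not part of any product.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL so the sketch needs no server.
# The table name (kv) and helper names (put, get) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key, value):
    # INSERT OR REPLACE gives "set" semantics: the last write for a key wins.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    conn.commit()


def get(key, default=None):
    # A single primary-key lookup, with no joins and no secondary columns.
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default


put("user:1", '{"name": "Ann"}')
put("user:1", '{"name": "Ann B."}')  # overwrites the earlier value
```

The relational engine is still there, but the application only ever asks it for "the value under this key", which is the architectural choice the slide points at.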




                                             10
Architecture


 •   The way you store data
 •   The way you query data
 •   Technology environment




                              11
CLASSIFICATION


                 12
Terms


 •   ACID – Atomicity, Consistency, Isolation,
     Durability
 •   CAP Theorem – Consistency, Availability,
     Partition tolerance
 •   Eventual consistency
 •   Hashing
 •   Schema
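
Of the terms above, hashing is the one that most directly decides where data lives. Dynamo-style stores use consistent hashing for this; the sketch below is a minimal version of that technique, assuming MD5 as the hash function and hypothetical names (HashRing, node_for, vnodes).

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: a key belongs to the first node point clockwise."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring many times (virtual nodes)
        # so that keys spread more evenly. The count 64 is arbitrary.
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(text):
        # Assumption: MD5 is used only for spreading keys, not for security.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Find the first ring point after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
smaller = HashRing(["node-a", "node-b"])  # the same ring after node-c leaves
keys = [f"user:{i}" for i in range(200)]
```

The point of the design is that when a node leaves, only the keys that lived on that node move; every other key keeps its owner.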


                                                 13
Classification


 •   Column oriented stores
 •   Key/Value stores
 •   Key/Value stores with configurable
     consistency
 •   Document stores
 •   Graph stores



                                          14
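Chart

 Figure: stores plotted by Scalability & Performance (vertical
 axis) against Depth of Functionality (horizontal axis). From
 highest scalability to deepest functionality: memcached,
 Key/value, Column oriented, Document store, RDBMS.
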


                                                                                      15
Column oriented
 •   Based on Google Bigtable
 •   Column-oriented is the reverse of row-oriented
 •   Assumes that datacenters are spread across
     continents and connected over the standard
     Internet
 •   C and P from the CAP Theorem
 •   Data is consistent and partition-tolerant, at
     the cost of availability
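
The difference between row-oriented and column-oriented layouts can be shown with plain Python structures. This is a toy sketch: the sample records and the helper to_columns are hypothetical, every record is assumed to have the same fields, and real stores such as HBase group columns into families and are considerably more elaborate.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ann", "age": 31},
    {"id": 2, "name": "Bob", "age": 27},
    {"id": 3, "name": "Eve", "age": 45},
]


def to_columns(records):
    # Column-oriented: all values of one field are kept together.
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field now touches one list, not every record.
average_age = sum(columns["age"]) / len(columns["age"])
```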


                                                   16
HBase
 •   Spin-off from the Hadoop project -
     http://hbase.apache.org/
 •   Written in Java
 •   A lot of interfaces – Thrift, REST, JRuby, etc.
 •   SQL-like access through Hive -
     http://hive.apache.org/
 •   HBase ORM – Surus -
     https://github.com/mushkevych/surus
 •   Used by Facebook, Hulu, Yahoo!, Ning, etc.

                                                       17
Hypertable
 •   Developed by Zvents, open sourced
 •   Written in C++
 •   Running on top of distributed file system
 •   Used by Baidu




                                                 18
Key/Value


 •   Key/Value Store – Oracle Berkeley DB (Oracle
     NoSQL), Redis, Kyoto Cabinet
 •   Can store strings, arrays, hashes




                                               19
Oracle NoSQL
 •   Sign of things to come!
 •   http://www.oracle.com/technetwork/database/nosqldb/overview/index.html
 •   Written in Java
 •   Configurable consistency
 •   Berkeley DB as a backend
 •   No single point of failure
 •   Transactions

                                               20
Redis

 •   http://redis.io/
 •   Lots of bindings
 •   Written in C
 •   In-memory, with optional durability
 •   Also a document store




                                           21
Key/Value – eventual consistency
 •   Key/value stores that favor availability over
     consistency
 •   Inspired by Amazon Dynamo
 •   Dynamo assumes high-speed network links
     between datacenters that are close to each
     other
 •   A and P from the CAP Theorem
 •   Eventual consistency is achieved through
     replication and verification
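
How replicas drift apart and then converge can be sketched with two in-memory replicas. This is a simplification under stated assumptions: conflicts are resolved by "last write wins" on a caller-supplied timestamp, whereas Dynamo itself tracks versions with vector clocks; the names Replica and anti_entropy are hypothetical.

```python
class Replica:
    """One copy of the data; each key stores (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the incoming version only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange entries so both replicas end up with the newest versions."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("cart", "book", timestamp=1)      # accepted by replica 1 only
r2.write("cart", "book+pen", timestamp=2)  # accepted by replica 2 only
stale = r1.read("cart")                    # replica 1 still serves the old value
anti_entropy(r1, r2)                       # background sync between replicas
```

Both writes were accepted even though the replicas had not talked to each other, which is the availability side of the trade; the stale read in between is the price.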
                                                22
Cassandra
 •   http://cassandra.apache.org/
 •   Multidimensional map indexed by key
 •   No single point of failure
 •   Decentralized
 •   Tunable consistency
 •   Used by Facebook, Cisco, IBM, Rackspace
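
Tunable consistency is commonly explained with the quorum rule: with N replicas, W acknowledgements per write, and R replicas consulted per read, a read is guaranteed to see the latest acknowledged write when R + W > N. The helper below only checks that arithmetic; the function name overlaps is hypothetical and this is not Cassandra's API.

```python
def overlaps(n, w, r):
    """Return True when every read quorum intersects every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    # If r + w > n, any r replicas must include at least one of the
    # w replicas that acknowledged the latest write.
    return r + w > n


strong = overlaps(n=3, w=2, r=2)      # quorum writes and quorum reads
fast = overlaps(n=3, w=1, r=1)        # fastest, but reads may be stale
write_all = overlaps(n=3, w=3, r=1)   # slow writes, cheap consistent reads
```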




                                               23
Voldemort
 •   http://project-voldemort.com/
 •   Developed by LinkedIn
 •   Written in Java
 •   Developer-oriented: many modules are
     pluggable
 •   Strictly key/value




                                                  24
Document stores

 •   Document Databases
 •   Document-oriented stores are semi-structured
 •   Mostly JSON-oriented
 •   Also called schema-free rows
 •   Can query by field
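
"Schema-free" and "query by field" can be illustrated with a handful of JSON documents held in memory. The sample documents and the helper find are hypothetical; real document stores add indexes and a query language on top of the same idea.

```python
import json

# Documents in one collection do not have to share the same fields.
raw = [
    '{"type": "user", "name": "Ann", "city": "Lviv"}',
    '{"type": "user", "name": "Bob"}',
    '{"type": "order", "total": 42, "city": "Lviv"}',
]
docs = [json.loads(line) for line in raw]


def find(documents, **criteria):
    # A document matches when it has every requested field with that value.
    # Documents that lack a field are skipped rather than causing an error.
    return [
        d for d in documents
        if all(field in d and d[field] == value for field, value in criteria.items())
    ]


in_lviv = find(docs, city="Lviv")
users_in_lviv = find(docs, type="user", city="Lviv")
```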




                                                25
MongoDB

 •   http://www.mongodb.org/
 •   Schema-free, document-oriented
 •   Written in C++
 •   Lots of interfaces
 •   JSON documents
 •   Query language, supports indexing
 •   Map/Reduce
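
The Map/Reduce model itself is small enough to sketch in plain Python. This is not MongoDB's interface; map_reduce, emit_city_total, sum_values, and the sample orders are hypothetical and only show the three phases: map, group by key, reduce.

```python
from collections import defaultdict


def map_reduce(documents, mapper, reducer):
    # Map phase: each document emits zero or more (key, value) pairs.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)  # group (shuffle) by key
    # Reduce phase: collapse the values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [
    {"city": "Lviv", "total": 10},
    {"city": "Kyiv", "total": 5},
    {"city": "Lviv", "total": 7},
]


def emit_city_total(doc):
    yield doc["city"], doc["total"]


def sum_values(key, values):
    return sum(values)


totals = map_reduce(orders, emit_city_total, sum_values)
```

Because the mapper looks at one document at a time and the reducer at one key at a time, both phases can be spread across many machines, which is why the model suits partitioned stores.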


                                         26
CouchDB

 •   http://couchdb.apache.org/
 •   RESTful API
 •   JSON documents
 •   Written in Erlang
 •   ACID semantics for document updates
 •   Map/Reduce
 •   Eventual consistency

                                  27
Graph


 •   Provide ways to store graphs
 •   Provide traversing
 •   Graph oriented functionality
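
Storing a graph and traversing it can be sketched with an adjacency list and a breadth-first search. The sample "follows" graph and the function bfs are hypothetical; a graph database offers the same kind of traversal as a built-in, persistent operation.

```python
from collections import deque

# Adjacency list: each person maps to the people they follow.
graph = {
    "ann": ["bob", "eve"],
    "bob": ["dan"],
    "eve": ["dan"],
    "dan": [],
}


def bfs(graph, start):
    """Visit every node reachable from start, nearest nodes first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:  # never visit a node twice
                seen.add(neighbour)
                queue.append(neighbour)
    return order


reachable = bfs(graph, "ann")
```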




                                    28
Neo4j


 •   http://neo4j.org/
 •   Written in Java
 •   Stores and navigates graphs
 •   Stable and proven
 •   Commercial and free licenses




                                    29
PROS AND CONS OF USAGE


                         30
Pros and Cons


 •   Scalability
 •   Transactional Integrity and Consistency
 •   Data Modeling
 •   Query Support
 •   Access and Interface Availability




                                               31
Typical Usage

 •   Large amount of data
 •   Read/Write balanced?
 •   Read Heavy
 •   Write Heavy
 •   Scan
 •   Geospatial
 •   Map/Reduce
 •   Social data
                            32
Is it for you?


  •   Technology is still developing
  •   Be ready to patch
  •   SQL is easier
  •   Not all startups will end up being Facebooks
  •   Some problems can be solved only with
      NoSQL



                                                     33
TRENDS


         34
Trends
 •   Oracle released Oracle NoSQL!
 •   Adoption of Hadoop soars
 •   SQL-like access to NoSQL stores is taking form:
     UnQL -
     http://www.unqlspec.org/display/UnQL/Home
 •   You can participate!




                                               35
Opportunities


 •   Spring Data -
     http://www.springsource.org/spring-data
 •   Cloud Foundry PaaS -
     http://www.cloudfoundry.com/
 •   ORM/Simplification




                                               36
Q/A




      37

Lviv EDGE 2 - NoSQL
