Navigating NoSQL in cloudy skies
Presented at:
Chicago IT Architects Group

Jan 15, 2013
shankar ramachandran
works with:
Microsoft Web Stack of Love
Microsoft SQL Server


also works with:
Skipping
essential steps
just creates an
illusion of
speed &
growth.
simple.




Agenda
• What NoSQL is & What it is not
• Why NoSQL – 2 specific reasons
• Conceptual Fundamentals & Grounding
• 3 techniques to classify & choose
• Way ahead
What

• Variety of non-
  relational database
  systems
• Usually schema-less
• Mostly open-source



• Not anti-RDBMS
• Not a replacement
No – relational tables –
were harmed in the making of
      this presentation.
Why NoSQL?
Reason #1
Big Data
“Big Data”
4 Vs of Big Data
Volume                      Velocity
• Terabytes and Petabytes   • Time sensitive real-time
                              data processing & decision
                              making


Variety                     Value

• Of structured and         • Inherent value always
  unstructured data
RDBMS can handle all that. Right??
• Scaling up has a limit.

• Sharding          - spread data across servers.


• Denormalization - potentially duplicates data in the
                    database, requiring updates to multiple tables when a
                    duplicated data item is changed.


• Distributed Caching - caching recently accessed data in memory
                        and storing that data across any number of servers
                        or virtual machines. Think Memcached.
RDBMS tactics - Downside & Pitfalls
• Re-sharding is disruptive.


• Maintain schema on every server

• Distributed Caching accelerates just the reads


• You lose relational benefits anyway.
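
To see why re-sharding is disruptive, here is a minimal sketch, assuming the simplest possible placement scheme: hash the key and take it modulo the number of shards. The function names, the customer-N keys, and the choice of MD5 are illustrative assumptions, not something the talk prescribes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (illustrative scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical key set; any reasonably large set of distinct keys would do.
keys = [f"customer-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys move when going from 4 to 5 shards")
```

With modulo placement a key stays put only when its hash gives the same remainder for both shard counts, so adding a fifth shard relocates roughly four out of five keys. Schemes such as consistent hashing exist to reduce this movement.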
aggregate-oriented
        vs.
aggregate-ignorant
Aggregate-orientation
   • Unit of data can have a more complex
     structure than a set of simple tuples.

   • Excellent fit to run on a cluster.

   • Atomic manipulation of single
     aggregate.

   • Application code takes precedence.
Reason #2
Impedance Mismatch
• Difference between relational model & in-memory data
  structures

• Simple tuples

• ORMs provide a bridge, but complicate query
  performance.
{ product : "Tintin Statue",
  created : Date("11-16-2010"),
  title : "Brass replica of Tintin",
  tags : [ "tintin", "herge", "snowy" ],
  comments : [
    { author : "Shankar", comment : "I love it" },
    { author : "Skeet", comment : "me too!!" } ] }
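
The mismatch can be made concrete with a small sketch. It takes the same product aggregate as a nested in-memory structure and splits it into the rows a normalized relational design would need. The three-table layout, the `to_relational_rows` helper, and the ISO date string are assumptions made for illustration.

```python
# The aggregate as the application sees it: one nested structure.
product = {
    "product": "Tintin Statue",
    "created": "2010-11-16",
    "title": "Brass replica of Tintin",
    "tags": ["tintin", "herge", "snowy"],
    "comments": [
        {"author": "Shankar", "comment": "I love it"},
        {"author": "Skeet", "comment": "me too!!"},
    ],
}


def to_relational_rows(doc, product_id):
    """Split one aggregate into rows for three hypothetical tables."""
    product_row = (product_id, doc["product"], doc["created"], doc["title"])
    tag_rows = [(product_id, tag) for tag in doc["tags"]]
    comment_rows = [
        (product_id, c["author"], c["comment"]) for c in doc["comments"]
    ]
    return product_row, tag_rows, comment_rows


product_row, tag_rows, comment_rows = to_relational_rows(product, product_id=1)
total_rows = 1 + len(tag_rows) + len(comment_rows)
print(f"1 document becomes {total_rows} rows across 3 tables")
```

A document store can save and load the nested structure as a single unit, whereas the relational form has to be reassembled with joins, or by an ORM, on every read.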
Concepts
3 properties of distributed databases
• Consistency means that each client always has the
  same view of the data.
• Availability means that every node is always available
  for reads and writes.
• Partition tolerance means that the system works
  well across physical network partitions.
consistency   availability partition-tolerance only-2-out-of-3


              CAP Theorem
consistency   availability



partition-tolerance




                This is incorrect
consistency   availability




partition-tolerance
horizontal-partitioning   multiple-instances shared-nothing

                          sharding
commodity-hardware   distributed infinite-expansion


   horizontal-scalability
google-patented-framework   map: chop data   reduce: fold data


                   MapReduce
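
A single-process word count gives a feel for "chop" and "fold". This is only a sketch of the programming model: a real framework runs the phases in parallel across a cluster, and the `shuffle` helper and sample sentences here are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: chop one document into (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)


documents = ["NoSQL is not SQL", "SQL is here to stay"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread over many machines without coordination.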
low-latency   order-of-reads delayed-gratification


        eventual-consistency
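
The "delayed gratification" can be shown with a toy simulation. In this sketch a write is acknowledged by one replica immediately and reaches the others only when `sync()` runs. The class names and the explicit `sync()` step are assumptions for illustration; real systems replicate in the background.

```python
class Replica:
    """One node holding a full copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica and reach the others only on sync()."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Propagate queued writes, in order, to every other replica."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["tintin statue"])
stale = store.read("cart:42", replica_index=2)  # replica 2 has not caught up
store.sync()
fresh = store.read("cart:42", replica_index=2)  # now it has
print(stale, fresh)
```

The write returns quickly (low latency), but a read from another replica may see old data until replication catches up.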
For the academically inclined:

    Google BigTable - proprietary DB, high-performance, Google App. Engine

    Amazon Dynamo - proprietary system, high-availability, AWS key-value
quick shout-out
Object oriented
Faster and Declarative.

Lack of interoperability and recovery standards.



                          End-to-end development, database &
                          deployment platform


                          Embeddable and fast. Lack of querying
                          capabilities.
XML
Native XML database systems.

Typically XQuery used as querying mechanism.

Advantage or Disadvantage based on XML affinity.




      Sedna               Tamino
Choice By Data Model
aggregate-ignorant
Graph

Graph-data structure      associative-datasets node/edges
           Small records with complex interconnections.




                                                      GraphDB
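
As a rough illustration of "small records with complex interconnections", the sketch below stores a tiny graph as nodes with outgoing edges and walks it breadth-first. The names, the "follows" relationship, and the `within_hops` helper are hypothetical; a graph database would answer this kind of question with its own query language.

```python
from collections import deque

# Hypothetical data: small records (nodes) joined by "follows" edges.
edges = {
    "ann": ["bob", "cho"],
    "bob": ["dev"],
    "cho": ["dev", "eve"],
    "dev": [],
    "eve": ["ann"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in seen.items() if n != start}


reachable = within_hops(edges, "ann", max_hops=2)
print(reachable)
```

In a relational schema each extra hop typically costs another self-join, which is why relationship-heavy queries are the usual argument for a graph model.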
aggregate-oriented
Key/Value
in-memory processing/caching

hyper-efficient associative storage




             Voldemort
Wide-Column

horizontally-partition   fully distributed Dynamo + BigTable
Document-oriented

schema-less   collection-based-JSON-like   dynamic-indexing
Choice By CAP

   CA           AP            CP

   RDBMS        Riak          MongoDB
                Dynamo        HBase
                Cassandra     Redis
                CouchDB       Hypertable
                Voldemort
C       C++            C#        Erlang      Java

Redis   MongoDB        RavenDB   CouchDB     Cassandra
        Hypertable     GraphDB   Couchbase   Hadoop
        Kyoto Tycoon             Riak        HBase
                                 Scalaris    Neo4j
                                             Voldemort
Way Ahead
What are
Microsoft &
Oracle up to?
Microsoft Polybase
Oracle NoSQL
polyglot persistence
  … a highly possible future
We learnt that ...

RDBMSs are here to stay. NoSQL is not creating
a paradigm shift.

NoSQL provides a set of non-relational data
stores & technologies that have affinity for
being processed in a clustered environment.

Some NoSQL databases also offer a
solution to Impedance Mismatch, thus
increasing application developer productivity.

What Aggregate-Orientation in data modeling
means.

What the different types of databases
are.

And most importantly ... we now know that
RDBMS systems need DBAs - Database
Architects & Admins.
NoSQL systems need DBAs too - Developers
Beyond Awesome!
Twitter: @areshankar
Computers are useless. They can only
                      give you answers.
                           Pablo Picasso
           Cubist painter (1881 - 1973)




?
