Navigating NoSQL in cloudy skies
- 6. Agenda
• What NoSQL is & What it is not
• Why NoSQL – 2 specific reasons
• Conceptual Fundamentals & Grounding
• 3 techniques to classify & choose
• Way ahead
- 7. What
• Variety of non-relational database systems
• Usually schema-less
• Mostly open-source
• Not anti-RDBMS
• Not a replacement
- 8. No – relational tables –
were harmed in the making of
this presentation.
- 15. 4 Vs of Big Data
• Volume: terabytes and petabytes
• Velocity: time-sensitive real-time data processing & decision making
• Variety: structured and unstructured data
• Value: inherent value always
- 16. RDBMS can handle all that. Right??
• Scaling up has a limit.
• Sharding - spread data across servers.
• Denormalization - potentially duplicates data in the
database, requiring updates to multiple tables when a
duplicated data item is changed.
• Distributed Caching - caching recently accessed data in memory
and storing that data across any number of servers
or virtual machines. Think Memcached.
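To make the sharding tactic concrete, here is a minimal Python sketch, assuming a simple hash-modulo placement scheme. The names `shard_for`, `put` and `get` are hypothetical, and each "server" is just an in-memory dict standing in for a real database node.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical cluster of 4 servers; each dict stands in for one server.
shards = [dict() for _ in range(4)]

def put(key, value):
    # Route the write to the one shard that owns this key.
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    # Route the read to the same shard; returns None if the key is absent.
    return shards[shard_for(key, len(shards))].get(key)

put("user:42", {"name": "Tintin"})
put("user:43", {"name": "Snowy"})
print(get("user:42"))
```

The application, not the database, decides where each row lives, which is part of why this tactic adds operational work on top of an RDBMS.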
- 17. RDBMS tactics - Downside & Pitfalls
• Re-sharding is disruptive.
• Maintain schema on every server
• Distributed Caching accelerates just the reads
• You lose relational benefits anyway.
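Why re-sharding is disruptive can be illustrated with a small Python sketch. It assumes hash-modulo placement (an assumption of this example, not something the slides specify) and counts how many keys would have to move when a fifth server is added to a four-server cluster. The helper names are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash-modulo placement of a key on one of num_shards servers."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose owning shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys change shard when going from 4 to 5 servers")
```

With a uniform hash, a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, so under this scheme roughly four out of five keys are expected to move.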
- 19. Aggregate-orientation
• Unit of data can have a more complex
structure than a set of simple tuples.
• Excellent fit to run on a cluster.
• Atomic manipulation of single
aggregate.
• Application code takes precedence.
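A rough Python sketch of atomic manipulation of a single aggregate follows. `AggregateStore` is a hypothetical toy class, not the API of any particular NoSQL product, and the order contents and prices are made up for illustration.

```python
import copy
import threading

class AggregateStore:
    """Toy in-memory store: one key maps to one whole aggregate."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, fn):
        """Apply fn to a copy of the aggregate, then swap it in as one unit."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            fn(draft)  # if fn raises, the stored aggregate is left untouched
            self._data[key] = draft

store = AggregateStore()
store.put("order:1", {
    "customer": "Shankar",
    "lines": [{"sku": "tintin-statue", "qty": 1}],
    "total": 100,
})

def add_line(order):
    # Both changes belong to the same aggregate, so they land together.
    order["lines"].append({"sku": "snowy-plush", "qty": 2})
    order["total"] += 40

store.update("order:1", add_line)
print(store.get("order:1")["total"])
```

Because the whole order lives under one key, it can be placed on one node of a cluster and changed without a cross-node transaction.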
- 22. Impedance Mismatch
• Difference between relational model & in-memory data
structures
• Simple tuples
• ORMs provide a bridge, but complicate query
performance.
- 23. { product : "Tintin Statue",
created : Date('11-16-2010'),
title : "Brass replica of Tintin",
tags : [ "tintin", "herge", "snowy"],
comments : [
{ author : 'Shankar', comment : 'I love it' },
{ author : 'Skeet', comment : 'me too!!' } ] }
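To see the gap between this document and simple tuples, the sketch below holds the same product as a Python dict and splits it into the flat rows a normalized relational schema would need. The three-table layout and the `to_rows` helper are assumptions made for illustration.

```python
# The aggregate from the slide, as an in-memory structure.
product = {
    "product": "Tintin Statue",
    "created": "11-16-2010",
    "title": "Brass replica of Tintin",
    "tags": ["tintin", "herge", "snowy"],
    "comments": [
        {"author": "Shankar", "comment": "I love it"},
        {"author": "Skeet", "comment": "me too!!"},
    ],
}

def to_rows(doc, product_id):
    """Split one aggregate into flat tuples for three hypothetical tables."""
    products = [(product_id, doc["product"], doc["created"], doc["title"])]
    tags = [(product_id, tag) for tag in doc["tags"]]
    comments = [(product_id, c["author"], c["comment"]) for c in doc["comments"]]
    return products, tags, comments

products, tags, comments = to_rows(product, 1)
print(len(products) + len(tags) + len(comments), "rows across 3 tables")  # 6 rows across 3 tables
```

Reading the product back from such tables means joining the rows again, which is the translation work an ORM automates and an aggregate store avoids.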
- 25. 3 properties of distributed databases
• Consistency means that each client always has the
same view of the data.
• Availability means that a node is always available for
reads and writes.
• Partition tolerance means that the system works
well across physical network partitions.
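The trade-off between these properties during a network partition can be sketched with a toy two-replica store in Python. The class `TwoNodeStore` and the mode labels "CP" and "AP" are assumptions of this sketch. Real systems are far more nuanced (a real CP system may also refuse reads, for example).

```python
class Replica:
    def __init__(self):
        self.value = None

class TwoNodeStore:
    """Toy two-replica register; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False

    def write(self, node_id, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas may now disagree.
            self.nodes[node_id].value = value
            return
        # Healthy network: replicate synchronously to every node.
        for node in self.nodes:
            node.value = value

    def read(self, node_id):
        return self.nodes[node_id].value

ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
print(ap.read(0), ap.read(1))  # v2 v1

cp = TwoNodeStore("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
except RuntimeError as err:
    print("CP write rejected:", err)
print(cp.read(0), cp.read(1))  # v1 v1
```

The AP store keeps accepting writes but lets clients see different values. The CP store keeps one view of the data but turns writes away until the partition heals.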
- 26. CAP Theorem
[Diagram: consistency, availability, partition tolerance; only 2 out of 3]
- 27. [Diagram: consistency, availability, partition tolerance]
This is incorrect
- 32. Eventual consistency
[Diagram: low latency, order of reads, delayed gratification]
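Eventual consistency can be sketched in a few lines of Python, assuming a toy store that acknowledges a write after updating one replica and propagates it to the others later. `EventuallyConsistentStore` and its methods are hypothetical names for this illustration.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy store: writes are acknowledged early and replicated later."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        # Low latency: update one replica, queue the update for the rest.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica):
        # A read may hit a replica that has not received the update yet.
        return self.replicas[replica].get(key)

    def replicate(self):
        # Background propagation, triggered explicitly in this sketch.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value

store = EventuallyConsistentStore()
store.write("stock:tintin-statue", 9)
print(store.read("stock:tintin-statue", replica=0))  # 9
print(store.read("stock:tintin-statue", replica=2))  # None (stale)
store.replicate()
print(store.read("stock:tintin-statue", replica=2))  # 9
```

The write returns quickly, and every replica converges once propagation has run. In between, the answer depends on which replica serves the read.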
- 33. For the academically inclined:
• Google BigTable: proprietary DB, high-performance, Google App Engine
• Amazon Dynamo: proprietary system, high-availability, AWS, key-value
- 35. Object oriented
• Faster and declarative. Lack of interoperability and recovery standards.
• End-to-end development, database & deployment platform.
• Embeddable and fast. Lack of querying capabilities.
- 36. XML
Native XML database systems.
Typically XQuery used as querying mechanism.
Advantage or Disadvantage based on XML affinity.
Sedna Tamino
- 45. Choice By CAP
• CA: RDBMS
• AP: Riak, Dynamo, Cassandra, CouchDB, Voldemort
• CP: MongoDB, HBase, Redis, Hypertable
- 46. C, C++, C#, Erlang, Java
• C: Redis
• C++: MongoDB, Hypertable, Kyoto Tycoon
• C#: RavenDB, GraphDB
• Erlang: CouchDB, Couchbase, Riak, Scalaris
• Java: Cassandra, Hadoop, HBase, neo4J, Voldemort
- 53. We learnt that ...
RDBMSs are here to stay. NoSQL is not creating
a paradigm shift.
NoSQL provides a set of non-relational data
stores & technologies that have affinity for
being processed in a clustered environment.
Some NoSQL databases also offer a
solution to Impedance Mismatch, thus
increasing application developer productivity.
What Aggregate-Orientation in data modeling
means.
What the different types of databases
are.
And most importantly ... we now know that
RDBMS systems need DBAs - Database
Architects & Admins.
NoSQL systems need DBAs too - Developers
Beyond Awesome!