Data, data, data. I cannot make bricks without clay. Sherlock Holmes, Sherlock Holmes [2009]
Data
Qualitative or quantitative attributes of a variable or set of variables.
Lowest level of abstraction, from which information and then knowledge are derived.
Representation of a fact, figure and idea.
A well organized newspaper or a clumsy, cluttered one?
Data explosion
From gigabytes to terabytes to petabytes to perhaps (I’m out of nomenclature)-bytes
NoSQL
= Not Only SQL
!= No to SQL
!= Never SQL
Open Source
An abridged version of this presentation and notes will be available for everyone.
Distributed under no license
FREE AS IN SPEECH AND BEER
Necessity is the mother of invention
Web 2.0
DDBMS
RDBMS performance
OODB
R&D
Cloud computing
Multiple solutions
SQL Databases, the ‘Hammer’
It’s a wonderful tool
Commercial SQL Databases
Even Gods use it
Design
Power
Ergonomics
Ease of use
Features
Warranty
Upgrades
Apart from: hole in the pocket
A nail is a nail, a screw is a screw
Hammering a screw or screwdriving a nail is FOOLISHNESS!
What?
Non-relational, next-generation operational data stores and databases
NoSQL is a new look at data to deliver:
High Performance
Unlimited horizontal scalability
Economic, common, unreliable hardware
Auto Sharding
Support for wide range of data
Recursive, Hierarchical
Non-Rigid
High Availability
What? (Continued…)
Partly or completely independent of RDBMS concepts
No specific implementation
Breakthrough approaches
Key:
Non-relational approach
Non-ACIDness
A STEP BACKWARDS, THEN MANY STEPS FORWARD
NoSQL, the ‘screwdriver’
Yet another tool in our repository to go along with the hammer
NoSQL is about choice
Not all problems are nails.
Not all screws are the same.
GOOD PROGRAMMING PRACTICE: Know your tools and use them appropriately
SQL Databases
Data: relational, tabular (rows/columns)
Interface: SQL
Basic design inspiration: set theory
ACID design
Scale-up design
Oracle
MySQL
Teradata
SQLite
SQL Server
And many more
Why?
Is all data really relational?
If consistency is already ensured, do we have to enforce or check it again at the database level?
Are RDBMSs ready for the challenges of the future, such as:
Dynamic schema/metadata
Huge amounts of data
Through horizontal auto scaling
Ability to handle complex data types
Images, videos, audio and much more
Not really!
Why? (Continued…)
RDBMS drawbacks:
Scalability
CRUD
Performance
Write overhead
Limited by single-disk architecture
Lack of in-memory design
Rigid schema design
And more…
HAMMERS are under some hammering
DRAWBACKS: DEEP DIVE
Scalability
True scalability:
Horizontal scaling
Transparency to the application
No single point of failure
Problems with SQL databases:
Vertical scaling
Partitioning, aka sharding
Read slaves
Anti-patterns:
Normalized data
Joins
ACID transactions
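The partitioning (sharding) idea on this slide can be sketched in a few lines. This is a minimal illustration, not any particular database's scheme: the node names are hypothetical, and placement is plain hash-modulo. Real systems often prefer consistent hashing so that adding a node moves fewer keys.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def shard_for(key, nodes=NODES):
    """Pick the node that owns `key` using a stable hash and modulo placement."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place(keys, nodes=NODES):
    """Group keys by the node that owns them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    for node, owned in place(f"user:{i}" for i in range(10)).items():
        print(node, owned)
```

Because the application only calls `shard_for`, the routing stays transparent to the rest of the code, which is the property the slide asks for.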
No Breadcrumbs
CRUD is crude
Delete/Update strategy is improper
CRA!
Create, Read, Archive: the way to go ahead
Audit information is lost in CRUD but not in the case of CRA
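One way to picture Create, Read, Archive is an append-only store in which nothing is ever overwritten. The sketch below is an assumption about what CRA could look like in practice; the class name `ArchiveStore` and its methods are hypothetical and not taken from the presentation.

```python
import time


class ArchiveStore:
    """Create/Read/Archive: every change appends a new version; nothing is overwritten."""

    def __init__(self):
        # key -> list of (timestamp, value, archived_flag)
        self._versions = {}

    def create(self, key, value):
        # A "change" is simply a new version appended after the old ones.
        self._versions.setdefault(key, []).append((time.time(), value, False))

    def archive(self, key):
        # Instead of deleting, append a marker that retires the key.
        if key in self._versions:
            self._versions[key].append((time.time(), None, True))

    def read(self, key):
        history = self._versions.get(key)
        if not history or history[-1][2]:
            return None  # never created, or archived
        return history[-1][1]

    def history(self, key):
        # The full audit trail survives, unlike with in-place update/delete.
        return list(self._versions.get(key, []))
```

Reading returns only the latest live version, while `history` keeps the breadcrumbs that an in-place update or delete would have destroyed.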
Naive Data Support
Not designed for complex data structures
Recursive
Hierarchical
Ordered list
Circular
Dynamic metadata
Logical/physical separation concerns
Relational model -> logical model
RDBMSs implement it at the physical level
Using multiple indices
Artificial overhead in managing the database
Frequent dropping and re-creating of indexes to make the DB perform
Spinning Disk Storage
Design flaw for most RDBMS systems
With cheaper memory, a memory-based approach should also be included in the design
Defiance of Moore’s law
Disk reads grew only 12.5 times in about 50 years
Disk writes grew much less.
Disk write is expensive.
RDBMSs make things worse by writing more.
ACID rains are UNHEALTHY
Think ‘Out of the ROM’
At Snail’s pace
RDBMS engine growth: SLOW
Optimizations have been minor since the initial days
Majority of growth is due to Moore’s law
Faster hardware
Slightly faster storage
Faster memory
What happens when Moore’s law slows down due to external factors such as the heat generated?
Database size limits
RDBMSs are too slow over multi-terabyte and petabyte databases
Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.
RDBMS has been around for years and is a proven technology
What about NoSQL?
RDBMS: grew fast, but growth slowed down over time and might eventually stagnate
NoSQL: inarguably a new, immature tool, but it has been growing faster than RDBMS ever did and is being supported by the Big Players
Did you say BIG PLAYERS! WHO?
NoSQL Real World Implementations
Google – BigTable
Facebook – HBase
Digg – Cassandra
Amazon – Dynamo
Trend Micro – HBase
Netflix – Amazon SimpleDB
Shutterfly – MongoDB
LinkedIn – Voldemort
and more
Microsoft is considering NoSQL for Azure services as well, and so is Twitter
Are we next?
Major IT companies have implemented, or even better, created their own NoSQL databases to manage huge data stores which couldn’t be managed by SQL databases.
We are used to SQL and relatedness, so why can’t they just fix RDBMS to handle Big Data?
STORAGE SEEK RATES
Large writes and ACID are a huge limitation
Big Data can be handled via scale-out/partitionability across multiple nodes
CAP Theorem
Applies to distributed shared-data systems
CAP THEOREM
A Deeper look
Consistency: the system is in a consistent state after an operation
All clients see the same data
Strong consistency (ACID) vs. eventual consistency (BASE)
Availability: ‘always on’ mode, no downtime
All clients can find some available replica
Software/hardware upgrade tolerance
Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption)
Reads and writes combined
CP: some data may be inaccessible, but the rest is accurate/consistent
Sharded database
TERADATA comes here
CA
Single-site clusters
RDBMS
Paxos
NoSQL
AP: the system is still available under partitioning, but some of the data returned may be inaccurate
Atomicity: all of the operations in the transaction will complete, or none will.
Consistency: the database will be in a consistent state when the transaction begins and ends.
Isolation: the transaction will behave as if it is the only operation being performed upon the database.
Durability: upon completion of the transaction, the operation will not be reversed.
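The atomicity guarantee can be seen with Python's built-in `sqlite3` module. This is a small sketch under assumed names: the `accounts` table, the `transfer` helper and the sample balances are made up for illustration. Using a connection as a context manager commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account fails as a whole: the credit to the destination, which had already been applied inside the transaction, is undone as well.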
Basically Available
Soft state
Eventually consistent
When availability and partitionability are prioritized over consistency, think in terms of BASE
Eventual Consistency
If no new updates are made to the object, eventually all accesses will return the last updated value.
Ex: Domain Name System (DNS)
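The definition above can be made concrete with a toy in-process model. It is an illustrative assumption rather than how any real store replicates: writes land on one replica, and a queue stands in for asynchronous replication to the others.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes hit the first replica; the others catch up asynchronously."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        # Replication log not yet applied: (replica_index, key, value)
        self.pending = deque()

    def write(self, key, value):
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].data.get(key)

    def replicate(self, steps=None):
        """Apply up to `steps` queued updates (all of them if steps is None)."""
        while self.pending and (steps is None or steps > 0):
            i, key, value = self.pending.popleft()
            self.replicas[i].data[key] = value
            if steps is not None:
                steps -= 1
```

Between `write` and `replicate`, a read from another replica may return a stale value or nothing at all. Once the queue drains and no new writes arrive, every replica returns the last written value.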
Types of Eventual Consistency
Read-your-write consistency
Session consistency
Monotonic read consistency
Monotonic write consistency
Causal consistency
Practically, read-your-write consistency and monotonic read consistency are desirable in an eventually consistent system
Hash()
Different apps, different CAP requirements
Prioritize among:
Consistency – Availability
Availability – Partitionability
Consistency – Partitionability
WHERE?
So will NoSQL eventually replace RDBMSs everywhere?
No, RDBMSs are here to stay.
NoSQL is here to help.
Wherever you want to take advantage of NoSQL
Big Data
Denormalize
Shard
Scale out
And look no further than NoSQL
Write-Intensive Applications
IOPS of the best storage device <<< n × IOPS of relatively cheaper storage devices
In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’
Fast Key-Value Access
NoSQL – ‘User, you are looking for $value’
RDBMS – ‘Query executing ….’
An O(1) hash operation or O(log n) B/B+ tree traversals
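The contrast between the two access paths can be sketched as follows. A Python `dict` stands in for the hash-based store, and binary search over sorted keys stands in for a B/B+ tree descent. Both are simplifications chosen for illustration, and the sample records are made up.

```python
import bisect

# Made-up sample data.
records = {f"user:{i}": f"value-{i}" for i in range(1000)}


def hash_lookup(key):
    """Hash access: hash the key, go straight to its bucket (O(1) on average)."""
    return records.get(key)


# An ordered index: keys kept sorted, values stored in the same order.
_sorted_items = sorted(records.items())
_sorted_keys = [k for k, _ in _sorted_items]
_sorted_values = [v for _, v in _sorted_items]


def index_lookup(key):
    """Ordered-index access: binary search, O(log n) comparisons."""
    pos = bisect.bisect_left(_sorted_keys, key)
    if pos < len(_sorted_keys) and _sorted_keys[pos] == key:
        return _sorted_values[pos]
    return None
```

The ordered index costs more per point lookup, but it can answer range queries over keys, which a plain hash table cannot.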
Flexible Schema and Data types
Field – ‘I once was an integer, then a string, then a date; what am I?’
RDBMS – ‘WTH! Whatever you are, you are beyond my scope’
Transient Data
Data – ‘I’m here only for a while and want to get my work done fast’
RDBMS – ‘You are data and you shall be treated like the rest’
NoSQL – ‘Okay, I’ll allot you space in RAM using Memcached if available; otherwise you still have my cloud’
High Write Availability
Warning: incoming data ….
NoSQL – ‘Anytime you like, user’
RDBMS – ‘This is insane, I’m already busy with other things’
ECONOMICS
RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’
NoSQL – ‘I’m powered by many cute little hamsters’
No Single Point of Failure
Designed to run over economic, commonly available, unreliable hardware
Full table scan operations
MapReduce:
Map: split the problem into sub-problems which can be computed in parallel and reduced later
Reduce: merge the solutions of the sub-problems into the final result
Divide and conquer your way to victory
Powered by MapReduce! Or something similar
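The two phases can be sketched with a word count, the usual introductory example. This is a simplified in-process illustration: a thread pool stands in for cluster nodes, and the shuffle step that a real MapReduce framework performs between the phases is folded into the merge function. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map: count words in one chunk, independently of all other chunks."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def merge(left, right):
    """Reduce: merge two partial results into one."""
    merged = dict(left)
    for word, n in right.items():
        merged[word] = merged.get(word, 0) + n
    return merged


def word_count(lines, chunk_size=2):
    # Split the input into chunks that can be processed in parallel.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, {})
```

Because each chunk is counted without looking at any other chunk, the map step can be spread over as many workers as there are chunks; only the merge needs to see more than one partial result.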
Ability to restore, maintain and repair itself
‘No DBA required’ design
HOW?
Let us welcome keys, values, collections, data structures, objects, documents and graphs
NoSQL View
The basic approach to data:
Key/value store
Run on multiple machines
Partitioning and replication across these machines
Relax consistency
Aim at eventual consistency
Asynchronous replication
But not all NoSQL databases take the same path.
NoSQL
Document Store
Key-Value Store
Object
Multivalue
Graph Stores
BigTable Clones
Tuple Store
Key-Value Stores
One key, one value, no duplicates, and crazy fast
Distributed hash tables
The value is stored as a binary object (BLOB)
The DB doesn’t understand it and doesn’t want to
Ex: Amazon Dynamo, MemcacheDB
Key1, Key2, Key3, Key4: the key/value store doesn’t know what is in the values
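A minimal sketch of the key-value idea, assuming an in-memory store on a single machine (the class `KeyValueStore` is hypothetical and is not the API of Dynamo or MemcacheDB). The store only ever sees bytes; encoding and decoding are the application's job.

```python
import json


class KeyValueStore:
    """One key, one value. Values are stored as opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("the store only accepts raw bytes")
        self._data[key] = bytes(value)  # overwrites: no duplicate keys

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


# The application, not the store, decides how values are encoded and decoded.
store = KeyValueStore()
store.put("user:1", json.dumps({"name": "Ada", "langs": ["en", "fr"]}).encode("utf-8"))
profile = json.loads(store.get("user:1").decode("utf-8"))
```

Since the store cannot look inside a value, the only possible query is a lookup by key.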
Document Store
Key-value store, but the value is structured and understood by the DB
Querying data is possible, and not just on the key
Ex: MongoDB, CouchDB, Riak, etc.
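The difference from a plain key-value store can be sketched as follows. The class `DocumentStore` and its `find` method are hypothetical and deliberately tiny; this is not the MongoDB or CouchDB API, only an illustration of querying on fields inside the value.

```python
class DocumentStore:
    """Key -> document (a dict). The store can look inside its documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = dict(document)

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return documents that contain every given field with the given value."""
        return [
            doc
            for doc in self._docs.values()
            if all(
                field in doc and doc[field] == value
                for field, value in criteria.items()
            )
        ]


db = DocumentStore()
db.insert(1, {"name": "Ada", "city": "London"})
db.insert(2, {"name": "Linus", "city": "Helsinki", "os": "Linux"})
londoners = db.find(city="London")
```

The two documents do not share the same set of fields, which is the flexible-schema property discussed earlier in the deck, and `find` works on any field rather than on the key alone.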


NoSQL

Editor's Notes

  1. SQL databases approach data in the form of sets and tables. Incidentally, their strength soon becomes their weakness. Assumptions made: data is represented in the form of tables (rows and columns); data in each table can be related to data in another; data can be, or has to be, searchable through all columns. Strengths: data manipulation through set theory; relational constraints enforced by the management system. Weaknesses: relatedness becomes an overhead once the data becomes really huge; a large volume of writes in a SQL database puts a heavy burden on the DBMS as well as on the storage disk.
  2. NoSQL is a collection of databases that avoid the drawbacks of RDBMS without completely giving up on relational models. They are not strict about certain core RDBMS concepts such as ACID compliance and other integrity constraints. The priority is to support high levels of scalability through easy partitioning across many cheap, simple machines, at the cost of giving up the consistency that SQL databases aim to deliver, along with some of the relatedness of the data.
  3. The CAP theorem states that any shared-data system can achieve at most two of these three properties: consistency (all database clients see the same data, even with concurrent updates), availability (all database clients are able to access some version of the data) and partition tolerance (the database can be split over multiple servers). http://www.julianbrowne.com/article/viewer/brewers-cap-theorem http://devblog.streamy.com/2009/08/24/cap-theorem/ http://www.royans.net/arch/brewers-cap-theorem-on-distributed-systems/