NoSQL
- 1. "Data, data, data. I cannot make bricks without clay." Sherlock Holmes, Sherlock Holmes [2009]
- 2. Data: qualitative or quantitative attributes of a variable or set of variables. The lowest level of abstraction, from which information and then knowledge are derived. A representation of a fact, figure, or idea.
- 6. Open Source: an abridged version of this presentation and its notes will be available to everyone. Distributed under no license. FREE AS IN SPEECH AND BEER
- 10. A nail is a nail, a screw is a screw. Hammering a screw or screwdriving a nail is FOOLISHNESS!
- 18. High Availability. What? (Continued…): partly or completely independent of RDBMS concepts; no specific implementation; breakthrough approaches. Key: non-relational approach; non-ACIDness. A STEP BACKWARDS, THEN MANY STEPS FORWARD
- 20. NoSQL is about choice. Not all problems are nails. Not all screws are the same. GOOD PROGRAMMING PRACTICE: know your tools and use them appropriately.
- 27. If consistency is ensured, do we have to enforce or check it again at the database level?
- 28. Are RDBMS ready for challenges of the future like:
- 38. No Breadcrumbs: CRUD is crude. The Delete/Update strategy is improper. CRA! Create, Read, Archive is the way to go ahead. Audit information is lost in CRUD but not in CRA.
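The Create, Read, Archive idea on the slide above can be sketched in a few lines. This is a minimal illustration, not any particular database's API: the `ArchiveStore` class and its method names are hypothetical, and the store simply appends a new version instead of overwriting the old one.

```python
class ArchiveStore:
    """Hypothetical Create/Read/Archive store: a write never destroys data."""

    def __init__(self):
        # key -> list of versions, oldest first
        self._versions = {}

    def create(self, key, value):
        """Append a new version; earlier versions stay archived."""
        self._versions.setdefault(key, []).append(value)

    def read(self, key):
        """Return the most recent version of the key."""
        return self._versions[key][-1]

    def history(self, key):
        """Return every version ever written, i.e. the audit trail."""
        return list(self._versions[key])


store = ArchiveStore()
store.create("user:1", {"email": "old@example.com"})
store.create("user:1", {"email": "new@example.com"})  # an "update" is just another create
```

Here `store.read("user:1")` returns the newest value while `store.history("user:1")` still holds both, which is the audit information a destructive update would have lost.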
- 39. Naive Data Support: not designed for complex data structures such as recursive, hierarchical, ordered list, circular, and dynamic metadata.
- 41. Spinning Disk Storage: a design flaw for most RDBMSs. With cheaper memory, a memory-based approach should also be included in the design. Defiance of Moore’s law: disk reads grew only 12.5 times in about 50 years, and disk writes much less. Disk writes are expensive. RDBMSs make things worse by writing more. ACID rains are UNHEALTHY
- 43. At Snail’s Pace: RDBMS engine growth is SLOW. Optimizations have been minor since the initial days. The majority of growth is due to Moore’s law: faster hardware, slightly faster storage, faster memory. What happens when Moore’s law diminishes, thanks to external factors like the heat generated?
- 44. Database size limits: RDBMSs are too slow over multi-terabyte and petabyte databases. Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.
- 45. RDBMS has been around for years and is a proven technology. What about NoSQL?
- 46. RDBMS: grew fast, but growth slowed down over time and might eventually stagnate. NoSQL: inarguably a new, immature tool, it has been growing faster than RDBMS ever did and is being supported by the big players.
- 55. LinkedIn – Voldemort, and more. Microsoft is considering NoSQL for Azure services as well, and so is Twitter. Are we next? Major IT companies have implemented, or even better, created their own NoSQL systems to manage huge data stores which couldn’t be managed by SQL databases.
- 56. We are used to SQL and relatedness; why can’t they just fix RDBMS to handle Big Data? STORAGE SEEK RATES. Large writes and ACID are a huge limitation. Big Data can be handled via scale-out/partitionability across multiple nodes.
- 59. A Deeper Look. Consistency: the system is in a consistent state after an operation; all clients see the same data; strong consistency (ACID) vs. eventual (BASE). Availability: ‘always on’ mode, no downtime; all clients can find some available replica; software/hardware upgrade tolerance. Partition Tolerance: the system continues to function even when split into disconnected subsets (by a network disruption); reads and writes combined.
- 62. CA: single-site clusters, RDBMS (TERADATA comes here). Paxos. AP: NoSQL; the system is still available under partitioning, but some of the data returned may be inaccurate. Atomicity: all of the operations in the transaction will complete, or none will. Consistency: the database will be in a consistent state when the transaction begins and ends. Isolation: the transaction will behave as if it is the only operation being performed upon the database. Durability: upon completion of the transaction, the operation will not be reversed.
- 64. Eventual Consistency: if no new updates are made to the object, eventually all accesses will return the last updated value. Ex: Domain Name System (DNS)
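A toy simulation may make the definition above more concrete. It is only a sketch under simplifying assumptions: three in-process `Replica` objects stand in for servers, replication is a queue of undelivered messages, and conflicts are resolved by a last-writer-wins version number. All names are hypothetical.

```python
class Replica:
    """One copy of a single object, tagged with a version number."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Last-writer-wins: accept only strictly newer versions.
        if version > self.version:
            self.value, self.version = value, version


replicas = [Replica() for _ in range(3)]
pending = []  # replication messages that have not been delivered yet


def write(value, version):
    """Apply the write on one replica and queue it for the others."""
    replicas[0].apply(value, version)
    pending.extend((replica, value, version) for replica in replicas[1:])


def deliver_pending():
    """Simulate asynchronous replication finally catching up."""
    while pending:
        replica, value, version = pending.pop(0)
        replica.apply(value, version)


def read_all():
    return [replica.value for replica in replicas]


write("10.0.0.1", version=1)
stale_view = read_all()       # other replicas have not seen the write yet
deliver_pending()
converged_view = read_all()   # no new updates, so every replica now agrees
```

Between the write and the delivery, readers may see different answers depending on which replica they reach; once updates stop and delivery completes, all reads return the last written value.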
- 65. Types of Eventual Consistency: read-your-writes consistency; session consistency; monotonic read consistency; monotonic write consistency; causal consistency. Practically, read-your-writes consistency and monotonic read consistency are desirable in an eventually consistent system.
- 66. Hash(). Different apps, different CAP requirements. Prioritize among: Consistency – Availability; Availability – Partitionability; Consistency – Partitionability.
- 67. WHERE? So will NoSQL eventually replace RDBMSs everywhere? No, RDBMSs are here to stay. NoSQL is here to help.
- 70. Write-Intensive Applications: IOPS of the best storage device <<< n * IOPS of relatively cheaper storage devices. In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’
- 71. Fast Key-Value Access. NoSQL – ‘User, you are looking for $value’. RDBMS – ‘Query executing ….’ An O(1) hash operation versus O(log n) B/B+ tree traversals.
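The contrast between the two lookup costs can be illustrated roughly as follows. This sketch assumes a Python `dict` as the hash table and uses binary search over a sorted key list as a stand-in for walking a B/B+ tree; a real tree index is laid out differently, but the number of comparisons grows logarithmically in both cases. The function names are hypothetical.

```python
import bisect

# Hash-based access: the key is hashed straight to its slot, O(1) on average.
table = {f"user:{i}": i * i for i in range(1000)}


def hash_lookup(key):
    return table.get(key)


# Ordered-index access: keys are kept sorted and located by repeated halving,
# O(log n) comparisons (stand-in for a B/B+ tree traversal).
pairs = sorted(table.items())
keys = [k for k, _ in pairs]


def index_lookup(key):
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return pairs[i][1]
    return None
```

Both functions return the same value for the same key; the difference is only in how much work is done to find it, and the ordered index additionally supports range scans that a plain hash table does not.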
- 72. Flexible Schema and Data Types. Field – ‘I once was an integer, then a string, then a date; what am I?’ RDBMS – ‘WTH! Whatever you are, you are beyond my scope’
- 73. Transient Data. Data – ‘I’m here only for a while and want to get my work done fast’. RDBMS – ‘You are data and you shall be treated like the rest’. NoSQL – ‘Okay, I’ll allot you space in RAM using Memcached if available; otherwise you still have my cloud’
- 75. ECONOMICS. RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’. NoSQL – ‘I’m powered by many cute little hamsters’
- 76. No Single Point of Failure: designed to run over economical, commonly available, unreliable hardware.
- 77. Full table scan operations. MapReduce. Map: divide your problem into sub-problems which can be computed in parallel and reduced later. Reduce: merge the sub-problem solutions into the result. Divide and conquer your way to victory. Powered by MapReduce! Or something similar.
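The map and reduce steps described above can be sketched with a word count, the customary example. This is a single-process illustration under stated assumptions: the function names are hypothetical, the grouping step (`shuffle`) is done in memory, and a real framework would run the map calls on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of input into (key, value) pairs, independently of other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: merge the partial results for each key into the final answer."""
    return {key: sum(values) for key, values in groups.items()}


chunks = ["nosql is about choice", "choice is good"]
counts = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
```

Because each `map_phase` call touches only its own chunk, the chunks can be scanned in parallel, which is what makes this style suitable for full-table-scan workloads.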
- 79. HOW? Let us welcome Keys, Values, Collections, Data Structures, Objects, Documents, Graphs.
- 80. NoSQL View. The basic approach to data: key/value store; run on multiple machines; partitioning and replication across these machines; relax consistency; aim at eventual consistency; asynchronous replication. But not all NoSQL systems take the same path.
- 82. Key-Value Stores: one key, one value, no duplicates, and crazy fast. Distributed hash tables. The value is stored as a binary object (BLOB); the DB doesn’t understand it and doesn’t want to. Ex: Amazon Dynamo, MemcacheDB
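A distributed hash table of opaque values might be sketched like this. The node names, the `KeyValueCluster` class, and the modulo placement rule are all assumptions made for illustration; systems such as Dynamo use consistent hashing instead, so that adding a node does not remap most keys.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical machines


def node_for(key):
    """Pick the node that owns a key by hashing the key (simplified modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class KeyValueCluster:
    """Each node holds a plain dict; values are bytes the store never interprets."""

    def __init__(self):
        self.storage = {node: {} for node in NODES}

    def put(self, key, value):
        self.storage[node_for(key)][key] = value

    def get(self, key):
        return self.storage[node_for(key)].get(key)


cluster = KeyValueCluster()
cluster.put("session:42", b"\x00\x01opaque-blob")
blob = cluster.get("session:42")
```

The store can find the value from the key alone and never needs to look inside the blob, which is where both the speed and the lack of query ability come from.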
- 84. Document Store: a key-value store, but the value is structured and understood by the DB. Querying data is possible, and not just on the key. Ex: MongoDB, CouchDB, Riak, etc.
- 85. Each database has collections. Each collection has a set of documents. They are well designed for access through applications. Suitable for web applications. A few document databases now provide a SQL-like query interface.
- 87. BigTable & its Clones: database, tables, rows, columns, and ‘SuperColumn’. A row consists of columns and SuperColumns. A few SuperColumns can be made mandatory. Each SuperColumn holds an arbitrary set of columns. Rows are typically versioned by a system-assigned timestamp.
- 88. Intended for tables with a huge number of columns; millions can be supported very easily. ‘A sparse, distributed multi-dimensional sorted map’. Also referred to as wide column stores. Ex: Google BigTable, Cassandra, HBase, Azure Tables
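The phrase ‘sparse, distributed multi-dimensional sorted map’ can be unpacked with a small single-machine sketch that leaves out the distribution. The `SparseTable` class is hypothetical; it maps (row, column, timestamp) to a value, stores only the cells that exist, and takes timestamps from the caller, whereas a real system would typically assign them itself.

```python
class SparseTable:
    """Toy map of (row, column, timestamp) -> value; absent cells cost nothing."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), kept sorted by timestamp
        self.cells = {}

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, row, column, at=None):
        """Return the newest value, or the newest one written at or before `at`."""
        versions = self.cells.get((row, column), [])
        if at is not None:
            versions = [tv for tv in versions if tv[0] <= at]
        return versions[-1][1] if versions else None

    def scan_row(self, row):
        """Return the latest value of every column the row has, in sorted column order."""
        return {c: self.get(r, c) for (r, c) in sorted(self.cells) if r == row}


table = SparseTable()
table.put("com.example", "title", "Old title", timestamp=1)
table.put("com.example", "title", "New title", timestamp=2)
table.put("com.example", "lang", "en", timestamp=1)
```

Two rows in such a table need not share any columns, and older versions of a cell remain readable by timestamp.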
- 90. Graph Databases: nodes, edges, and properties replace traditional tables, columns, and rows. A graph database can be implemented in different ways: key/value store, columnar, BigTable clone, or even a combination of these. Fields are used to directly store the id of another entity, forming the edge.
- 91. A graph database is a multi-relational graph. No need for secondary indexes. Relationships in RDBMS are ‘weak’. Relationships in graphs are ‘strong’. The rest don’t really care about relations at the DB level.
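The idea of storing another entity's id directly in a field to form an edge can be shown with plain dictionaries. This is a hedged sketch rather than any graph database's API: the sample data, the `FOLLOWS` and `BLOCKS` relation names, and the `reachable` helper are all invented for illustration.

```python
from collections import deque

# Nodes carry properties; each edge is a (relation, target id) pair stored on the source node.
nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
}
edges = {
    1: [("FOLLOWS", 2)],
    2: [("FOLLOWS", 3), ("BLOCKS", 1)],
    3: [],
}


def reachable(start, relation):
    """Breadth-first traversal that follows only edges of the given relation."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for rel, target in edges.get(current, []):
            if rel == relation and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen - {start}


followed = reachable(1, "FOLLOWS")
```

Traversal follows stored ids from node to node, so no join or secondary index lookup is involved, and several relation types can coexist on the same nodes.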
- 94. Too Many Cooks and Recipes. No specific recipe! Major implementations: graph; document store; tabular; key-value store; eventually consistent; hierarchical; ordered. Other known recipes: multivalue; object; tuple store.
- 95. The Menu. On disk: BigTable, Membase, Tokyo Cabinet. In RAM: Memcached, Velocity. Eventually consistent: Cassandra, Dynamo, Riak. Hierarchical: GT.M. Ordered: Berkeley DB, NMDB, C-ISAM. Multivalue: eXe, OpenQM. Document store: CouchDB, Lotus Notes, MongoDB. Graph: AllegroGraph, Neo4j, DEX. Tabular: BigTable, HBase, HyperTable. The list isn’t even a quarter of the whole.
- 96. _theOpenSourceIssue: most of them are open source, thus forkable, like Linux. The first of the lot: Google’s BigTable, Amazon’s Dynamo. All in all, there are about 10 roots, with 4 major ones.
- 100. MongoDB: document store; JSON storage; REST ….. not out of the box; Map/Reduce; master-slave replication; strong suite of query APIs; good support for SQL-like queries. Work in progress: autosharding-based scalability; failover support. Open source; non-relational; scalable; schemaless; queryable.
- 101. Document Oriented: Mongo stores documents in collections. Documents are slightly enhanced JSON objects. Complex data structures are very much possible. Data modelling is a more natural process.
- 102. Embeddable Objects. Complexity.begin(). Embed objects within a single document. A document is an enhanced form of object, as mentioned earlier. The same thing can be achieved in an RDBMS using multiple tables and joining them together. Consider a requirement to store a blog post with this information: post content; post title; post author; comments (comment order, comment content, comment author).
- 104. MongoDB Solution: documents, each one of them a post: { Name: $name, Author: $author, Comment: [ { Author: $author1, Comment: $comment1 }, { Author: $author2, Comment: $comment2, Replies: [ { Author: $author3, Comment: $comment3 } ] } ] }
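The document shape on the slide above maps naturally onto nested dictionaries and lists. The sketch below uses plain Python and the standard `json` module rather than a MongoDB driver; the sample values and the `count_comments` helper are made up for illustration, while the field names follow the slide.

```python
import json

# One blog post as a single document, with comments and replies embedded.
post = {
    "Name": "Why NoSQL?",
    "Author": "alice",
    "Comment": [
        {"Author": "bob", "Comment": "Nice post"},
        {
            "Author": "carol",
            "Comment": "I disagree",
            "Replies": [{"Author": "alice", "Comment": "Why?"}],
        },
    ],
}


def count_comments(comments):
    """Count comments and nested replies by walking the embedded lists."""
    return sum(1 + count_comments(c.get("Replies", [])) for c in comments)


total = count_comments(post["Comment"])
serialized = json.dumps(post)  # the whole post travels as one JSON document
```

Comment order is simply list order, and reading the post with all its comments is one document fetch instead of a join across post and comment tables.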
- 108. Schema-less: no database-enforced schema. Adding and deleting columns is simple. It’s about how the application uses the APIs. Data definitions need not be specified up front.
- 110. MongoDB – Why Not? Lacks transactions. Doesn’t completely support SQL. Lacks a built-in revisioning system like CouchDB’s. Lacks full-text search features.
Editor's Notes
- SQL databases approach data in the form of sets and tables. Incidentally, this strength soon becomes their weakness. Assumptions made: data is represented in the form of tables (rows and columns); data in each table can be related to data in another; data can be, or has to be, searchable through all columns. Strengths: data manipulation through set theory; relational constraints enforced by the management system. Weaknesses: relational-ness becomes an overhead once the data becomes really huge; a large volume of writes in a SQL database is a heavy burden on the DBMS as well as on the storage disk.
- NoSQL is a collection of databases that avoid the drawbacks of RDBMS without completely giving up on relational models. They are not strict about certain core RDBMS concepts such as ACID compliance and other integrity constraints. The priority is to support high levels of scalability through easy partitioning across many cheap commodity machines, by giving up on the consistency, and some of the relatedness of data, that SQL databases aim to deliver.
- The CAP theorem states that any shared-data system can achieve at most two of these three properties. Consistency (all database clients see the same data, even with concurrent updates). Availability (all database clients are able to access some version of the data). Partition tolerance (the system continues to function when a network disruption splits its servers into disconnected subsets). See http://www.julianbrowne.com/article/viewer/brewers-cap-theorem http://devblog.streamy.com/2009/08/24/cap-theorem/ http://www.royans.net/arch/brewers-cap-theorem-on-distributed-systems/