NoSQL

1. Data, data, data. I cannot make bricks without clay. Sherlock Holmes, Sherlock Holmes [2009]

2. Data: Qualitative or quantitative attributes of a variable or set of variables. Lowest level of abstraction from which information and then knowledge are derived. Representation of a fact, figure, or idea.

3. A well organized newspaper or a clumsy, cluttered one?

4. Data explosion: From gigabytes to terabytes to petabytes to perhaps (I’m out of nomenclature)-bytes

5. NoSQL = Not Only SQL; != No to SQL; != Never SQL

6. Open Source: An abridged version of this presentation and notes will be available for everyone. Distributed under no license. FREE AS IN SPEECH AND BEER

7. Web 2.0; DDBMS; RDBMS performance; OODB; R&D; Cloud Computing; Multiple Solutions. Necessity is the mother of invention

8. SQL Databases, the ‘Hammer’: It’s a wonderful tool

9. Commercial SQL Databases: Even Gods use it. Design; Power; Ergonomics; Ease of use; Features; Warranty; Upgrades. Apart from: Hole in the Pocket

10. A nail is a nail, a screw is a screw. Hammering a screw or screwdriving a nail is FOOLISHNESS!

11. What? Non-relational, next-generation operational data stores and databases. NoSQL is a new look at data to deliver: High Performance

12. Unlimited horizontal scalability

13. Economic, common, unreliable hardware

14. Auto Sharding

15. Support for wide range of data

16. Recursive, Hierarchical

17. Non-Rigid

18. High Availability. What? (Continued…): Partly or completely independent of RDBMS concepts. No specific implementation. Breakthrough approaches. Key: Non-relational approach; Non-ACIDness. A STEP BACKWARDS, THEN MANY STEPS FORWARD

19. NoSQL, the ‘screwdriver’: Yet another tool in our repository to go along with the hammer

20. NoSQL is about choice. Not all problems are nails. Not all screws are the same. GOOD PROGRAMMING PRACTICE: Know your tools and use them appropriately

21. SQL Databases. Data: Relational; Tabular – Rows/Columns. Interface: SQL. Basic Design Inspiration: Set Theory. ACID Design. Scale Up Design. Examples: Oracle

22. MySQL

23. Teradata

24. SQLite

25. SQL Server. And many more

26. Why? Is all data really relational?

27. If consistency is ensured, do we have to enforce/check it again at the database level?

28. Are RDBMS ready for challenges of the future like:

29. Dynamic schema/metadata

30. Huge amounts of data

31. Through horizontal auto scaling

32. Ability to handle complex data types

33. Images, videos, audio and much more. Not really!

34. Why? (Continued…) RDBMS drawbacks: Scalability; CRUD; Performance; Write overhead; Limited by single disk architecture; Lack of in-memory design; Rigid schema design; and more …

35. HAMMERS are under some hammering

36. DRAWBACKS: DEEP DIVE

37. Scalability. True Scalability: Horizontal scaling; Transparency to the application; No single point of failure. Problems with SQL databases: Vertical scaling; Partitioning aka sharding; Read slaves. Anti Patterns: Normalized data; Joins; ACID transactions

38. No Breadcrumbs. CRUD is crude: the Delete/Update strategy is improper. CRA! Create, Read, Archive is the way to go ahead. Audit information is lost in CRUD but not in the case of CRA
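The Create, Read, Archive idea can be sketched as an append-only store in which an update adds a new version and a "delete" only adds an archive marker, so the audit trail survives. This is a minimal in-memory illustration; the `ArchiveStore` class and its method names are hypothetical and not taken from any particular NoSQL product.

```python
import time
from collections import defaultdict


class ArchiveStore:
    """Append-only store: nothing is overwritten or physically deleted."""

    def __init__(self):
        # key -> list of (timestamp, value, archived_flag), oldest first
        self._versions = defaultdict(list)

    def create(self, key, value):
        # A "change" is just another version appended to the history.
        self._versions[key].append((time.time(), value, False))

    def archive(self, key):
        # Instead of deleting, record that the key was retired.
        self._versions[key].append((time.time(), None, True))

    def read(self, key):
        # Return the latest live value, or None if missing or archived.
        history = self._versions.get(key)
        if not history or history[-1][2]:
            return None
        return history[-1][1]

    def history(self, key):
        # The full audit trail as (value, archived_flag) pairs.
        return [(value, archived) for _, value, archived in self._versions.get(key, [])]


store = ArchiveStore()
store.create("user:1", {"name": "Matt"})
store.create("user:1", {"name": "Matthew"})
store.archive("user:1")
print(store.read("user:1"))     # the key is archived, so no live value
print(store.history("user:1"))  # every earlier version is still there
```

The trade-off is storage: keeping every version costs space, which is why this style pairs naturally with cheap, horizontally scaled disks.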

39. Naive Data Support. Not designed for complex data structures: Recursive; Hierarchical; Ordered list; Circular; Dynamic metadata

40. Logical/Physical separation concerns. Relational model -> Logical model. RDBMS implement it at the physical level using multiple indices. Artificial overhead in managing the database. Frequent drop and create of indexes to make the DB perform

41. Spinning Disk Storage. Design flaw for most RDBMS systems. With cheaper memory, a memory-based approach should also be included in the design. Defiance of Moore’s law: disk reads grew only 12.5 times in about 50 years; disk writes much less. Disk write is expensive. RDBMS make things worse by writing more. ACID rains are UNHEALTHY

42. Think ‘Out of the ROM’

43. At Snail’s pace. RDBMS engine growth: SLOW. Optimizations have been minor since the initial days. The majority of growth is due to Moore’s law: Faster hardware; Slightly faster storage; Faster memory. What happens when Moore’s law diminishes thanks to external factors like the heat generated?

44. Database size limits. RDBMS are too slow over multi-terabyte and petabyte databases. Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.

45. RDBMS has been around for years and is a proven technology. What about NoSQL?

46. RDBMS grew fast, but growth slowed down over time and might eventually reach a stale point. NoSQL, unarguably a new and immature tool, has been growing faster than RDBMS ever did and is being supported by the Big Players

47. Did you say BIG PLAYERS! WHO?

48. NoSQL Real World Implementations: Google – BigTable

49. Facebook – HBase

50. Digg – Cassandra

51. Amazon – Dynamo

52. Trend Micro – HBase

53. Netflix – Amazon SimpleDB

54. Shutterfly – MongoDB

55. LinkedIn – Voldemort, and more. Microsoft is considering NoSQL as well for Azure services, and so is Twitter. Are we next? Major IT companies have implemented, or even better, created their own NoSQL to manage huge data stores which couldn’t be managed by SQL databases.

56. We are used to SQL and relatedness, so why can’t they just fix RDBMS to handle Big Data? STORAGE SEEK RATES. Large writes and ACID are a huge limitation. Big Data can be handled via Scale Out/Partitionability across multiple nodes

57. CAP Theorem: Applies to distributed shared data systems

58. CAP THEOREM

59. A Deeper look. Consistency: The system is in a consistent state after an operation; all clients see the same data; Strong Consistency (ACID) vs. Eventual (BASE). Availability: ‘Always On’ mode, no downtime; all clients can find some available replica; software/hardware upgrade tolerance. Partition Tolerance: The system continues to function even when split into disconnected subsets (by a network disruption); reads and writes combined

60. CP: Some data may be inaccessible but the rest is accurate/consistent

61. Sharded database

62. TERADATA comes here. CA: Single Site Clusters; RDBMS; Paxos; NoSQL. AP: System is still available under partitioning but some of the data returned may be inaccurate. ACID: Atomicity: All of the operations in the transaction will complete, or none will. Consistency: The database will be in a consistent state when the transaction begins and ends. Isolation: The transaction will behave as if it is the only operation being performed upon the database. Durability: Upon completion of the transaction, the operation will not be reversed.
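Atomicity ("all of the operations complete, or none will") is easy to see with a small experiment. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in for an ACID database; the `accounts` table, the `transfer` helper and the sample balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(conn, "alice", "bob", 500))  # overdraft attempt is rejected
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

After the failed transfer both balances are unchanged, even though the credit to `bob` had already been executed inside the transaction.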

63. BASE: Basically Available; Soft State; Eventually Consistent. When Availability and Partitionability are prioritized over Consistency, think in terms of BASE

64. Eventual Consistency: If no new updates are made to the object, eventually all accesses will return the last updated value. Ex: Domain Name System (DNS)
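A toy simulation can make the definition concrete: one replica accepts a write, the others are briefly stale, and background synchronisation brings them all to the last written value once updates stop. The gossip scheme, the `Replica` class and the version counter below are assumptions chosen for illustration, not the mechanism of any specific system.

```python
import random


class Replica:
    """One copy of a single object, tagged with a version number."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


def write(replica, value, version):
    # The write is accepted by one replica only; the others are now stale.
    replica.value, replica.version = value, version


def gossip_round(replicas, rng):
    # Each replica syncs with one randomly chosen peer; the higher version wins.
    for replica in replicas:
        peer = rng.choice(replicas)
        if peer.version > replica.version:
            replica.value, replica.version = peer.value, peer.version
        elif replica.version > peer.version:
            peer.value, peer.version = replica.value, replica.version


def rounds_until_consistent(replicas, rng, max_rounds=1000):
    # Keep gossiping until every replica reports the same version.
    rounds = 0
    while len({r.version for r in replicas}) > 1 and rounds < max_rounds:
        gossip_round(replicas, rng)
        rounds += 1
    return rounds


replicas = [Replica(f"node{i}") for i in range(5)]
write(replicas[0], "v1", version=1)
stale_read = replicas[4].value  # this replica has not seen the write yet
rounds = rounds_until_consistent(replicas, random.Random(42))
print(stale_read, [r.value for r in replicas], rounds)
```

The stale read in the middle is the price of the model: a client that hits `node4` before gossip reaches it sees the old state, much like a DNS resolver serving a cached record.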

65. Types of Eventual Consistency: Read-your-write consistency; Session consistency; Monotonic read consistency; Monotonic write consistency; Causal consistency. Practically, read-your-write consistency and monotonic read consistency are desirable in an eventually consistent system

66. Hash(). Different apps, different CAP requirements. Prioritize among: Consistency – Availability; Availability – Partitionability; Consistency – Partitionability

67. WHERE? So will NoSQL eventually replace RDBMSs everywhere? No, RDBMS are here to stay. NoSQL is here to help.

68. Wherever you want to take Advantage of NoSQL

69. Big Data: Denormalize; Shard; Scale Out; and look no further than NoSQL
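Sharding can be sketched as routing each key to one of several independent stores by hashing the key. The example below is a deliberately simple, single-process illustration; `shard_for` and `ShardedStore` are hypothetical names, and plain dictionaries stand in for separate machines.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Spreads keys over several dicts, each standing in for one node."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
print([len(shard) for shard in store.shards])  # keys spread across the shards
```

Modulo hashing like this remaps most keys whenever the shard count changes, so stores that add and remove nodes often use consistent hashing instead.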

70. Write Intensive Applications: IOPS of the best storage device <<< n * IOPS of relatively cheaper storage devices. In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’

71. Fast Key-Value Access. NoSQL – ‘User, you are looking for $value’. RDBMS – ‘Query executing ….’ An O(1) hash operation or O(log n) B+/B tree traversals
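The difference between the two access paths can be felt with a small experiment: a hash table jumps to the value in one step on average, while an ordered index narrows the search range step by step. In this sketch a binary search over a sorted list stands in for a B-tree traversal, which is an assumption made for brevity; the key format and function names are hypothetical.

```python
def hash_lookup(index, key):
    """Average O(1): the hash of the key points at the slot directly."""
    return index.get(key)


def tree_like_lookup(sorted_keys, values, key):
    """O(log n): halve the search range until the key is located.

    Returns (value_or_None, number_of_steps_taken).
    """
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_keys) and sorted_keys[lo] == key:
        return values[lo], steps
    return None, steps


n = 1024
# Zero-padded so that string order matches numeric order.
keys = [f"user:{i:04d}" for i in range(n)]
values = [f"value-{i}" for i in range(n)]
index = dict(zip(keys, values))

print(hash_lookup(index, "user:0700"))
print(tree_like_lookup(keys, values, "user:0700"))
```

For 1024 keys the ordered search needs about ten steps, and each doubling of the data adds one more, whereas the hash lookup stays flat. The ordered index earns its cost back on range scans, which a plain hash table cannot serve.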

72. Flexible Schema and Data Types. ‘I once was an integer, then a string, then a date; what am I?’ - Field. RDBMS – ‘WTH! Whatever you are, you are beyond my scope’

73. Transient Data. Data – ‘I’m here only for a while and want to get my work done fast’. RDBMS – ‘You are data and you shall be treated like the rest’. NoSQL – ‘Okay, I’ll allot you space in RAM using Memcached if available, otherwise you still have my cloud’

74. High Write Availability. Warning - Incoming data …. NoSQL – ‘Anytime you like, user’. RDBMS – ‘This is insane, I’m already busy with other things’

75. ECONOMICS. RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’. NoSQL – ‘I’m powered by many cute little hamsters’

76. No Single Point of Failure. Designed to run over: Economic; Commonly available; Unreliable hardware

77. Full table scan operations. MapReduce: Map: To break your problem into sub-problems which can be computed in parallel and reduced later. Reduce: To merge the solutions of the sub-problems into the result. Divide and Conquer your way to Victory. Powered by MapReduce! Or something similar
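The Map and Reduce steps can be sketched with the classic word count. This is a single-process illustration of the idea only; real frameworks distribute the map and reduce calls across machines, and the function names below are hypothetical.

```python
from collections import defaultdict


def map_words(chunk):
    """Map: turn one chunk of text into (word, 1) pairs, independently of other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce: merge the partial results for one key into a single result."""
    return word, sum(counts)


def word_count(chunks):
    # Each map_words call could run on a different node, since chunks share no state.
    mapped = [pair for chunk in chunks for pair in map_words(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


print(word_count(["a nail is a nail", "a screw is a screw"]))
```

Because every chunk is mapped independently, a full scan over a very large table becomes many small scans that run side by side.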

78. Ability to restore, maintain, repair itself. ‘No DBA required’ design

79. HOW? Let us welcome Keys, Values, Collections, Data Structures, Objects, Documents, Graphs

80. NoSQL View. The basic approach to data: Key/Value store; Run on multiple machines; Partition and replicate across these machines; Relax consistency; Aim at eventual consistency; Asynchronous replication. But not all NoSQL take the same path.

81. NoSQL: Document Store; Key-Value Store; Object; Multivalue; Graph Stores; BigTable Clones; Tuple Store

82. Key-Value Stores. One key, one value, no duplicates and crazy fast. Distributed hash tables. The value is stored as a binary object (BLOB). The DB doesn’t understand it and doesn’t want to. Ex: Amazon Dynamo, MemcacheDB

83. Key1; Key2; Key3; Key4. Key/Value store doesn’t know what is in here
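The "opaque value" property can be sketched in a few lines: the store accepts only bytes and never looks inside them, so serialisation is entirely the client's job. `KeyValueStore` is a hypothetical in-memory stand-in, and JSON is just one encoding a client might pick.

```python
import json


class KeyValueStore:
    """One key, one value; the value is an uninterpreted blob of bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, blob: bytes) -> None:
        if not isinstance(blob, bytes):
            raise TypeError("the store only accepts bytes; serialise on the client")
        self._data[key] = blob  # writing an existing key replaces its value

    def get(self, key: str):
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


kv = KeyValueStore()
# The client decides the encoding; the store cannot query by "user" or "cart".
kv.put("session:42", json.dumps({"user": "matt", "cart": [1, 2]}).encode("utf-8"))
session = json.loads(kv.get("session:42").decode("utf-8"))
print(session["user"])
```

The only query the store can answer is "give me the value for this key", which is what keeps it simple to partition and fast.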

84. Document Store. Key-value store, but the value is structured and understood by the DB. Querying data is possible, and not just on the key. Ex: MongoDB, CouchDB, Riak, etc.

85. Each database has collections. Each collection has a set of documents. They are well-designed for access through applications. Suitable for web applications. A few document databases now provide a SQL-like query interface

86. Key1; Key2; Key3; Key4. Name: $Name; Value: $Value; Version: $Version; Type: $Type. Emb Object1; Emb Object2. Objects inside Objects: CRAZY!
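Since a document store understands the value, it can filter on fields inside it. The sketch below models a collection as a set of dictionaries with a tiny equality-only `find`; the `Collection` class is hypothetical and far simpler than the query language of MongoDB or CouchDB.

```python
class Collection:
    """A bag of documents (dicts); documents need not share the same fields."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)  # store a copy under a generated id
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        # Match documents whose fields equal all of the given values.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


posts = Collection()
posts.insert({"title": "Hammers", "author": "Matt", "tags": ["sql"]})
posts.insert({"title": "Screwdrivers", "author": "April"})  # no tags field
posts.insert({"title": "Toolboxes", "author": "Matt", "draft": True})
print([doc["title"] for doc in posts.find(author="Matt")])
```

Note that the three documents carry different fields and the collection accepts them all, which is the schema-less behaviour a key-value store cannot query and a relational table would reject.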

87. BigTable & its Clones. Database, tables, rows, columns and ‘SuperColumn’. A row consists of columns and SuperColumns. A few supercolumns can be made mandatory. Each supercolumn: an arbitrary set of columns. Rows are typically versioned by a system-assigned timestamp.

88. Intended for tables with a huge number of columns. Millions can also be supported very easily. ‘a sparse, distributed multi-dimensional sorted map’. Also referred to as Wide Column stores. Ex: Google BigTable, Cassandra, HBase, Voldemort, Azure Tables

89. Key1; Key2; Key3
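The "sparse, multi-dimensional sorted map" description can be sketched as nested dictionaries keyed by row, supercolumn and column, with timestamped values. This is an illustrative single-process model; `WideColumnTable` is a hypothetical name, and versioning each cell (rather than the whole row) is an assumption of the sketch.

```python
class WideColumnTable:
    """row key -> supercolumn -> column -> list of (timestamp, value), newest first."""

    def __init__(self):
        self._rows = {}

    def put(self, row, supercolumn, column, value, timestamp):
        cells = (
            self._rows.setdefault(row, {})
            .setdefault(supercolumn, {})
            .setdefault(column, [])
        )
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # keep newest first

    def get(self, row, supercolumn, column):
        # Return the most recent value, or None if the cell was never written.
        cells = self._rows.get(row, {}).get(supercolumn, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start, stop):
        # Rows come back in sorted key order, which makes range scans natural.
        return [key for key in sorted(self._rows) if start <= key < stop]


table = WideColumnTable()
table.put("user:002", "profile", "name", "April", timestamp=1)
table.put("user:001", "profile", "name", "Matt", timestamp=1)
table.put("user:001", "profile", "name", "Matthew", timestamp=2)
table.put("user:001", "vehicle", "model", "Honda", timestamp=1)  # only this row has it
print(table.get("user:001", "profile", "name"))
print(table.scan("user:000", "user:999"))
```

Sparseness falls out of the structure: a row stores only the columns it actually has, so millions of possible columns cost nothing for rows that do not use them.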

90. Graph Databases. Nodes, Edges, Properties replace traditional tables, columns, rows. A graph database can be implemented in different ways: key/value store, columnar, BigTable clone or even a combination of these. Fields are used to directly store the id of another entity, forming the edge

91. A graph database is a multi-relational graph. No need for secondary indexes. Relationships in RDBMS are ‘weak’. Relationships in graphs are ‘strong’. The rest don’t really care about relations at the DB level

92. Graph example (diagram labels): Matt; April; Honda; Age: 32; Address; Mobile; SSN; Model; City; registration; Is related to; Spouse; owns; Drives
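Nodes, edges and properties can be sketched with two dictionaries: one holding node properties, one holding labelled edges that store the id of the target node directly. The names Matt, April and Honda echo the slide's diagram, but which edge connects which node is a guess made for illustration; the `Graph` class is hypothetical.

```python
from collections import defaultdict, deque


class Graph:
    """Property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # source id -> [(label, target id), ...]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        # The edge stores the target's id directly; no join table is involved.
        self.edges[source].append((label, target))

    def neighbors(self, source, label=None):
        return [
            target
            for edge_label, target in self.edges.get(source, [])
            if label is None or edge_label == label
        ]

    def reachable(self, start):
        # Breadth-first traversal following edges of any label.
        seen, queue = {start}, deque([start])
        while queue:
            for target in self.neighbors(queue.popleft()):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen


g = Graph()
g.add_node("matt", age=32)
g.add_node("april")
g.add_node("honda", kind="car")
g.add_edge("matt", "spouse", "april")
g.add_edge("matt", "owns", "honda")
g.add_edge("april", "drives", "honda")
print(g.neighbors("matt", label="owns"))
print(sorted(g.reachable("april")))
```

Following a relationship is a direct hop from one node to the next, which is the sense in which relationships in a graph are "strong".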

93. Size vs. Complexity (chart): Key-Value Store; Document Store; BigTable Clone; Graph Databases

94. Too Many Cooks and Recipes. No specific recipe! Major implementations: Graph; Document store; Tabular; Key value store; Eventually consistent; Hierarchical; Ordered. Other Known Recipes: Multivalue; Object; Tuple Store

95. The Menu. On Disk: BigTable; Membase; Tokyo Cabinet. In RAM: Memcached; Velocity. Eventually Consistent: Cassandra; Dynamo; Riak. Hierarchical: GT.M. Ordered: Berkeley DB; NMDB; C-ISAM. Multivalue: eXe; OpenQM. Document Store: CouchDB; Lotus Notes; MongoDB. Graph: AllegroGraph; Neo4j; DEX. Tabular: BigTable; HBase; HyperTable. The list isn’t even a quarter of the whole

96. _theOpenSourceIssue. Most of them are open source, thus fork-able like Linux. The first of the lot: Google’s BigTable; Amazon’s Dynamo. All in all, there are about 10 roots with 4 major ones.

97. No single database to rule them all

98. Real World Implementations: Digg’s 3TB for Green Badges [CASSANDRA]; Facebook’s 50TB for Inbox Search [HBASE]; eBay’s 2PB overall data; Google’s

99. Naïve Recipe

100. MongoDB: Document Store; JSON Storage; REST … not out of the box; Map/Reduce; Master-slave replication; Strong suite of query APIs; Good support for SQL. Work in Progress: Autosharding-based scalability; Failover support. Open Source; Non-Relational; Scalable; Schemaless; Queryable

101. Document Oriented. Mongo stores documents in collections. Documents are slightly enhanced JSON objects. Complex data structures are very much possible. Data modelling is a more natural process

102. Embeddable Objects. Complexity.begin(). Embed objects within a single document. A document is an enhanced form of object, as mentioned earlier. The same thing in an RDBMS can be achieved using multiple tables and joining them together. Consider that our requirement is to store a blog post with this information: Post Content; Post Title; Post Author; Comments: Comment order; Comment content; Comment author

103. RDBMS solution

104. MongoDB Solution. Documents …. Each one of them is a post: { Name: $name, Author: $author, Comment: [ { Author: $author1, Comment: $comment1 }, { Author: $author2, Comment: $comment2, Replies: [ { Author: $author3, Comment: $comment3 } ] } ] }
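The same document shape can be written as a plain Python dictionary to show why no joins are needed: the comments and their replies travel inside the post, and their order is simply the list order. The field names follow the slide; the sample values and the helper functions are hypothetical, and no MongoDB driver is involved.

```python
post = {
    "Name": "Hammers and screwdrivers",  # sample values are placeholders
    "Author": "author",
    "Comment": [
        {"Author": "author1", "Comment": "comment1"},
        {
            "Author": "author2",
            "Comment": "comment2",
            "Replies": [{"Author": "author3", "Comment": "comment3"}],
        },
    ],
}


def count_comments(comments):
    """Count comments including nested replies, at any depth."""
    return sum(1 + count_comments(c.get("Replies", [])) for c in comments)


def comment_authors(comments):
    """List authors in thread order: each comment, then its replies."""
    authors = []
    for comment in comments:
        authors.append(comment["Author"])
        authors.extend(comment_authors(comment.get("Replies", [])))
    return authors


print(count_comments(post["Comment"]))
print(comment_authors(post["Comment"]))
```

One read fetches the whole thread. The cost shows up elsewhere: a question such as "all comments by author3 across every post" has to look inside each document instead of scanning a single comments table.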

105. RDBMS Viewpoint

106. ODF Mongodb’ed

108. Schema-less. No database-enforced schema. Addition and deletion of columns are simple. It’s about how the application uses APIs. Data definition need not be defined up front.

109. Other Features: Data Tagging; Caching; Real Time Analytics; Image Storage; Dynamic Queries; Binary Storage

110. MongoDB - Why Not? Lacks transactions. Doesn’t completely support SQL. Lacks a built-in revisioning system like CouchDB’s. Lacks full text searching features

111. Try MongoDB @ http://try.mongodb.org/

112. EOL

113. Calm down! Eventually Answered System: All your questions will be answered eventually
