JSON Data
Modeling
Matthew D. Groves, @mgroves
Modeling Data in a Relational World
[Diagram: relational tables Customer, Billing, Connections, Purchases, Contacts]
Where am I?
• GDG Indy
• https://www.meetup.com/indy-gdg
Who am I?
• Matthew D. Groves
• Developer Advocate for Couchbase
• @mgroves on Twitter
• Podcast and blog: https://crosscuttingconcerns.com
• "I am not an expert, but I am an enthusiast." – Alan Stevens
by @natelovett
AGENDA
01/ Why NoSQL?
02/ JSON Data Modeling
03/ Accessing Data
04/ Migrating Data
05/ Summary / Q&A
Why NoSQL?
NoSQL Landscape
Document
• Couchbase
• MongoDB
• DynamoDB
• Firestore
Graph
• OrientDB
• Neo4J
• CosmosDB
Key-Value
• Couchbase
• Riak
• BerkeleyDB
• Redis
Wide Column
• HBase
• Cassandra
• Hypertable
NoSQL Landscape
• Get by key(s)
• Set by key(s)
• Replace by key(s)
• Delete by key(s)
• Map/Reduce
Document
• Couchbase
• MongoDB
• DynamoDB
• Firestore
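The key-based operations listed on this slide (get, set, replace, delete) can be pictured with a small in-memory sketch. This is an illustration only, not any product's API: `TinyKeyValueStore` and its method names are made up for the example, and values are kept as JSON strings to mimic "a value in a known format".

```python
import json


class TinyKeyValueStore:
    """Hypothetical in-memory stand-in for a document store (not a real SDK)."""

    def __init__(self):
        self._data = {}  # key -> JSON string

    def set(self, key, document):
        # Create or overwrite the document stored under this key.
        self._data[key] = json.dumps(document)

    def get(self, key):
        # Return the parsed document, or None when the key is unknown.
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def replace(self, key, document):
        # Unlike set, replace only succeeds when the key already exists.
        if key not in self._data:
            raise KeyError(key)
        self._data[key] = json.dumps(document)

    def delete(self, key):
        # Deleting a missing key is treated as a no-op in this sketch.
        self._data.pop(key, None)


store = TinyKeyValueStore()
store.set("CBL2015", {"Name": "Jane Smith", "DOB": "1990-01-30"})
customer = store.get("CBL2015")
```

Every operation starts from a key, which is why the later slides spend time on key design.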
What's NoSQL?
Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved.
Why NoSQL? Scalability
Why NoSQL? Flexibility
Why NoSQL? Availability
Why NoSQL? Performance
Use Cases for NoSQL
• Communication
• Gaming
• Advertising
• Travel booking
• Loyalty programs
• Fraud monitoring
• Social media
• Finance
• Caching
• Session
• User profile
• Catalog
• Content management
• Personalization
• Customer 360
• IoT
https://www.couchbase.com/customers
Use Cases
JSON Data
Modeling
Properties of Real-World Data
Modeling Data in a Relational World
[Diagram: relational tables Customer, Billing, Connections, Purchases, Contacts]
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30"
}
Customer DocumentKey: CBL2015
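A minimal Python sketch of the mapping shown on this slide, assuming the relational row is already available as a dictionary. The helper name `row_to_document` is hypothetical; the point is only that the primary key moves out of the row to become the document key, and the remaining columns become key-value pairs.

```python
def row_to_document(row, key_column):
    """Split a relational row into (document key, JSON-style document)."""
    document = dict(row)             # copy so the caller's row is untouched
    key = document.pop(key_column)   # the primary key becomes the document key
    return key, document


row = {"CustomerID": "CBL2015", "Name": "Jane Smith", "DOB": "1990-01-30"}
key, doc = row_to_document(row, "CustomerID")
```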
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
}
]
}
Customer DocumentKey: CBL2015
CustomerID Item Amount Date
CBL2015 laptop 1499.99 2019-03
Table: Purchases
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Customer DocumentKey: CBL2015
CustomerID Item Amount Date
CBL2015 laptop 1499.99 2019-03
CBL2015 phone 99.99 2018-12
Table: Purchases
CustomerID ConnId Relation
CBL2015 XYZ987 Brother
CBL2015 SKR007 Father
Table: Connections
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-...",
"expiry" : "2019-03"
}, ...
],
"Connections" : [
{
"ConnId" : "XYZ987",
"Relation" : "Brother"
},
{
"ConnId" : "SKR007",
"Relation" : "Father"
}
]
}
Customer DocumentKey: CBL201
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"cardnum" : "5827-2842…",
"expiry" : "2019-03",
"cardType" : "visa",
"Connections" : [
{
"CustId" : "XYZ987",
"Relation" : "Brother"
},
{
"CustId" : "SKR007",
"Relation" : "Father"
}
],
"Purchases" : [
{ "id": 12, "item": "mac", "amt": 2823.52 },
{ "id": 19, "item": "ipad2", "amt": 623.52 }
]
}
DocumentKey: CBL2015
CustomerID Name DOB Cardnum Expiry CardType
CBL2015 Jane Smith 1990-01-30 5827-2842… 2019-03 visa
Table: Customer
CustomerID ConnId Relation
CBL2015 XYZ987 Brother
CBL2015 SKR007 Father
Table: Connections
CustomerID item amt
CBL2015 mac 2823.52
CBL2015 ipad2 623.52
Table: Purchases
CustomerID ConnId Name
CBL2015 XYZ987 Joe Smith
CBL2015 SKR007 Sam Smith
Table: Contacts
{
"Name" : "Bob Jones",
"DOB" : "1980-01-29",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5927-2842-2847-3909",
"expiry" : "2020-03"
},
{
"type" : "master",
"cardnum" : "6273-2842-2847-3909",
"expiry" : "2019-11"
}
],
"Connections" : [
{
"CustId" : "XYZ987",
"Relation" : "Brother"
},
{
"CustId" : "PQR823",
"Relation" : "Father"
}
],
"Purchases" : [
{ "id": 12, "item": "mac", "amt": 2823.52 },
{ "id": 19, "item": "ipad2", "amt": 623.52 }
]
}
DocumentKey: CBL2016
CustomerID Name DOB
CBL2016 Bob Jones 1980-01-29
Table: Customer
CustomerID Type Cardnum Expiry
CBL2016 visa 5927… 2020-03
CBL2016 master 6273… 2019-11
Table: Billing
CustomerID ConnId Relation
CBL2016 XYZ987 Brother
CBL2016 SKR007 Father
Table: Connections
CustomerID item amt
CBL2016 mac 2823.52
CBL2016 ipad2 623.52
Table: Purchases
CustomerID ConnId Name
CBL2016 XYZ987 Joe Smith
CBL2016 SKR007 Sam Smith
Table: Contacts
Relationship is one-to-one or one-to-many
Store related data as nested objects
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Modeling your data: Strategies / rules of thumb
Relationship is many-to-one or many-to-many
Store related data as separate documents
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Connections" : [
"XYZ987",
"PQR823",
"PQR828"
]
}
Modeling your data: Strategies / rules of thumb
Modeling tools
• Hackolade
• Erwin DM NoSQL
• Idera ER/Studio
Accessing Data
Data reads are mostly parent fields
Store children as separate documents
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Connections" : [
"XYZ987",
"PQR823",
"PQR828"
]
}
Modeling your data: Strategies / rules of thumb
Data reads are mostly parent + child fields
Store children as nested objects
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Modeling your data: Strategies / rules of thumb
Data writes are mostly parent or child (not both)
Store children as separate documents
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Connections" : [
"XYZ987",
"PQR823",
"PQR828"
]
}
Modeling your data: Strategies / rules of thumb
Data writes are mostly parent and child (both)
Store children as nested objects
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Modeling your data: Strategies / rules of thumb
If … | Then …
Relationship is one-to-one or one-to-many | Store related data as nested objects
Relationship is many-to-one or many-to-many | Store related data as separate documents
Data reads are mostly parent fields | Store children as separate documents
Data reads are mostly parent + child fields | Store children as nested objects
Data writes are mostly parent or child (not both) | Store children as separate documents
Data writes are mostly parent and child (both) | Store children as nested objects
Modeling your data: Strategies / rules of thumb
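To make the two outcomes in these rules of thumb concrete, here is a rough Python sketch in which a plain dictionary stands in for the database. The function names are hypothetical, and the name attached to PQR823 is invented for the example. Nested purchases are read together with the parent in one lookup, while referenced connections cost one extra lookup per key.

```python
# In-memory stand-in for a document store: document key -> document.
store = {
    "CBL2015": {
        "Name": "Jane Smith",
        "DOB": "1990-01-30",
        # One-to-many data that is read with the parent: nested objects.
        "Purchases": [
            {"item": "laptop", "amount": 1499.99, "date": "2019-03"},
            {"item": "phone", "amount": 99.99, "date": "2018-12"},
        ],
        # Many-to-many data: store only the keys of the related documents.
        "Connections": ["XYZ987", "PQR823"],
    },
    "XYZ987": {"Name": "Joe Smith"},
    "PQR823": {"Name": "Pat Smith"},  # invented name, for illustration only
}


def total_spent(customer_key):
    """Nested children: one lookup returns parent and purchases together."""
    doc = store[customer_key]
    return round(sum(p["amount"] for p in doc["Purchases"]), 2)


def connection_names(customer_key):
    """Referenced children: one extra lookup per connection key."""
    doc = store[customer_key]
    return [store[k]["Name"] for k in doc["Connections"] if k in store]
```

The trade-off is visible in the lookups: nesting saves round trips when parent and children travel together, while references keep shared documents in one place.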
Accessing your data (Couchbase)
[Diagram: Documents are reached through Key-Value (CRUD), N1QL (SQL query, backed by indexes), Full Text (search, backed by indexes), Views (JS query, MapReduce), and Analytics (query, SQL++)]
Key/Value
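// Assumption: _bucket is a bucket object from the Couchbase .NET SDK 2.x.
// Read a document by its key and return the deserialized content.
public ShoppingCart GetCartById(string id)
{
    return _bucket.Get<ShoppingCart>(id).Value;
}

// Insert a new document under a known, human-readable key.
public void CreateShoppingCart()
{
    _bucket.Insert(new Document<ShoppingCart>
    {
        Id = "shopping-cart-1",
        Content = new ShoppingCart { . . . }
    });
}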
Key/Value: Recommendations for keys
•Natural Keys
•Human Readable
•Deterministic
•Semantic
Key/Value: Example keys
• author::matt
• author::matt::blogs
• blog::csharp_8_features
• blog::csharp_8_features::comments
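The example keys above are deterministic, so related documents can be reached by building the next key from what is already known. Below is a small Python sketch of that walk, using a dictionary as a stand-in for key/value gets. The document contents and the function name are assumptions; only the key patterns come from the slide.

```python
# Dictionary standing in for key/value lookups; contents are invented.
documents = {
    "author::matt": {"name": "Matt"},
    "author::matt::blogs": ["csharp_8_features"],
    "blog::csharp_8_features": {"title": "C# 8 features"},
    "blog::csharp_8_features::comments": [
        {"by": "reader1", "text": "Nice post"},
    ],
}


def comments_for_author(author_id):
    """Walk author -> blogs -> comments using only key lookups."""
    blog_ids = documents.get(f"author::{author_id}::blogs", [])
    result = {}
    for blog_id in blog_ids:
        # Each key is derived from a known prefix plus an id, so no query is needed.
        result[blog_id] = documents.get(f"blog::{blog_id}::comments", [])
    return result


matt_comments = comments_for_author("matt")
```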
Subdocument access
{
"username": "mgroves",
"profile": {
"phoneNumber": "123-456-7890",
"address": {
"street": "123 main st",
"city": "Grove City",
"state": "Ohio"
}
}
}
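The idea behind subdocument access is to address one part of a document by path instead of handling the whole thing. The Python sketch below shows only the path addressing, on a local dictionary; the dotted path syntax and the helper names `sub_get` and `sub_set` are assumptions for illustration. In a real subdocument API the lookup happens on the server, which is what avoids sending the entire document over the wire.

```python
def sub_get(document, path):
    """Return the value at a dotted path such as 'profile.address.city'."""
    node = document
    for part in path.split("."):
        node = node[part]
    return node


def sub_set(document, path, value):
    """Set the value at a dotted path, leaving the rest of the document alone."""
    *parents, leaf = path.split(".")
    node = document
    for part in parents:
        node = node[part]
    node[leaf] = value


user = {
    "username": "mgroves",
    "profile": {
        "phoneNumber": "123-456-7890",
        "address": {"street": "123 main st", "city": "Grove City", "state": "Ohio"},
    },
}

city = sub_get(user, "profile.address.city")
```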
Subcollection (Firestore)
N1QL
Understanding your Query Plan
Full Text Search
Concept | Strategies & Recommendations
Key-Value operations provide the best possible performance | Create an effective key naming strategy; create an optimized data model
Full Text Search is well-suited to text | Facets / ranges / geography; language aware
N1QL queries provide the most flexibility (everything else) | Query data regardless of how it is modeled; good indexing is vital
Accessing your data: Strategies and recommendations
Migrating Data
Migration options: Requirements
ETL / data cleanse / data enrichment
Migration options: Tools
Migration options: BYO
Migration options: KISS
[Diagram: Relational → Export → Import into Staging → Transform → NoSQL]
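The KISS flow (export to CSV, import into a staging area, transform, insert into the target) can be sketched end to end in a few lines. The speaker notes suggest doing the transform step with N1QL; plain Python is swapped in here so the example is self-contained. The CSV contents mirror the Customer and Purchases tables from the modeling slides, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Step 1: "export" - CSV text as it might come out of the relational tables.
CUSTOMERS_CSV = """CustomerID,Name,DOB
CBL2015,Jane Smith,1990-01-30
"""
PURCHASES_CSV = """CustomerID,Item,Amount,Date
CBL2015,laptop,1499.99,2019-03
CBL2015,phone,99.99,2018-12
"""


def load_staging(csv_text):
    """Step 2: import each CSV row as a flat staging document."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(customers, purchases):
    """Step 3: nest each customer's purchases inside the customer document."""
    by_customer = defaultdict(list)
    for p in purchases:
        by_customer[p["CustomerID"]].append(
            {"item": p["Item"], "amount": float(p["Amount"]), "date": p["Date"]}
        )
    target = {}
    for c in customers:
        target[c["CustomerID"]] = {
            "Name": c["Name"],
            "DOB": c["DOB"],
            "Purchases": by_customer[c["CustomerID"]],
        }
    return target


# Step 4: the result plays the role of the new bucket, keyed by CustomerID.
target_bucket = transform(load_staging(CUSTOMERS_CSV), load_staging(PURCHASES_CSV))
```

Keeping the staging copy flat means the transform can be rerun with a different target shape without exporting again.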
Migration Recommendations: Align
Migration Recommendations: Expect Failure
Migration Recommendations: Ensure
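The speaker notes spell out "Ensure" as interruptible, restartable, and logged. One possible shape for that, as a sketch under assumptions rather than a prescribed design: record a checkpoint after every successful write so that a rerun resumes where the last run stopped. The `migrate` function, the `write` callback, and the checkpoint dictionary are all hypothetical.

```python
import logging

logger = logging.getLogger("migration")


def migrate(rows, write, checkpoint):
    """Copy (key, document) pairs, resuming from checkpoint['next'].

    rows: list of (key, document) tuples in a stable order.
    write: callable(key, document) that stores one document.
    checkpoint: dict persisted by the caller between runs.
    Returns True when every row has been written, False on failure.
    """
    start = checkpoint.get("next", 0)
    for index in range(start, len(rows)):
        key, document = rows[index]
        try:
            write(key, document)
        except Exception:
            # Log and stop; the checkpoint still points at the failed row.
            logger.exception("failed at row %d (key=%s); rerun to resume", index, key)
            return False
        checkpoint["next"] = index + 1
        logger.info("migrated %s", key)
    return True
```

In a real run the checkpoint would be saved somewhere durable (a file or a document); it is an in-memory dictionary here to keep the sketch short.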
Sync NoSQL and relational? Automatic Replication
[Diagram: Couchbase → DCP Stream → Kafka (Source / Sink) → RDBMS]
How can you sync NoSQL and relational?
[Diagram: RDBMS → GoldenGate → Handler / Eventing → Couchbase]
https://github.com/mahurtado/CouchbaseGoldenGateAdapter
Sync NoSQL and relational? Manual.
Summary
Pick the right application
Summary
Proof of Concept
Summary
Match the data access method to requirements
Summary
https://blog.couchbase.com/proof-of-concept-move-relational/
https://blog.couchbase.com/json-data-modeling-rdbms-users/
Resources: Blog posts
Resources: Me!
•@mgroves
•twitch.tv/matthewdgroves
•forums.couchbase.com
Frequently Asked Questions
1. How is Couchbase different than Mongo?
2. Is Couchbase the same thing as CouchDB?
3. How tall are you? Do you play basketball?
4. What is the Couchbase licensing situation?
5. Is Couchbase a Managed Cloud Service (DBaaS)?
Managed Cloud Server (DBaaS)
https://www.couchbase.com/products/cloud
MongoDB vs Couchbase
• Architecture
• Memory first architecture
• Master-master architecture
• Auto-sharding
• Features
• SQL (N1QL)
• Full Text Search
• Analytics (NoETL)
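Auto-sharding, as described in the speaker notes, assigns each key to a vBucket by CRC32 instead of relying on a hand-written sharding scheme. The Python sketch below shows the general idea of hashing a key onto a fixed number of partitions. The partition count of 1024 and the plain modulo are assumptions for illustration; the exact computation Couchbase performs is not reproduced here.

```python
import zlib

NUM_PARTITIONS = 1024  # assumed partition count, for illustration only


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a document key to a partition number using CRC32."""
    checksum = zlib.crc32(key.encode("utf-8"))
    return checksum % num_partitions


# The same key always lands on the same partition, so any client can
# work out where a document lives without asking a central directory.
first = partition_for("CBL2015")
second = partition_for("CBL2015")
```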
Licensing
Couchbase Server Community
• Source code is Open Source (Apache 2)
• Binary release is one release behind Enterprise (except major versions)
• Free to use in dev/test/qa/prod
• Forum support only
Couchbase Server Enterprise
• Source code is mostly Open Source (Apache 2)
• Some features not available on Community (XDCR TLS, MDS, Rack Zone, etc.)
• Free to use in dev/test/qa
• Need commercial license for prod
• Paid support provided
CouchDB and Couchbase
memcached

JSON Data Modeling - GDG Indy - April 2020

Editor's Notes

  1. Most developers are probably already familiar with the relational way of modeling data. But if you want the benefits of a non-relational database, you have to think differently about modeling.
  2. Spend just a little time on why people are using NoSQL. Talk about how data is modeled differently in JSON. Talk about why SQL is good and why SQL for JSON is needed. Talk about accessing data, since that has an effect on modeling. Maybe we'll get to migrating/syncing data from relational to NoSQL.
  3. SQL (relational) databases are great. They give you a LOT OF functionality: a great set of abstractions (tables, columns, data types, constraints, triggers, SQL, ACID TRANSACTIONS, stored procedures, and more) at a highly reasonable cost. Change is inevitable, and one thing an RDBMS does not handle well is CHANGE: change of schema (both logical and physical), change of hardware, change of capacity. NoSQL databases, ESPECIALLY ONES DESIGNED TO BE DISTRIBUTED, tend to help solve problems with agility, scalability, performance, and availability.
  4. Let’s talk about what NoSQL is, first. NoSQL generally refers to databases which lack SQL or don’t use a relational model. Once the SQL language and transactions became optional, a flurry of databases was created using distinct approaches for common use cases. Key-value databases simply provide quick access to data for a given KEY. Wide column databases can store a large number of arbitrary columns in each row. Graph databases store data and relationships as first-class concepts. Document databases aggregate data into a hierarchical structure; JSON is a means to that end. Document databases provide flexible schema, built-in data types, rich structure, and implicit relationships using JSON.
  5. When we look at document databases, they originally came with a minimal set of APIs and features. But as they continue to mature, we’re seeing more features being added, and generally I’m seeing a convergent trend between relational and NoSQL. But anyway, this set of minimal features, lacking a SQL language and tables, gives us the buzzword “NoSQL”.
  6. Think of a document database, at its simplest, as a type of key/value store where the value is in a known format. You write code where you start with a key, and you ask the database to return the document that corresponds to that key. The same goes for creating/updating.
  7. If you are using a cloud-based DBaaS, this is basically what's going on behind the scenes. Elastic scaling: size your cluster for today and scale out on demand. Cost-effective scaling: commodity hardware, on premises or in the cloud. Scale OUT instead of scale UP. [example: changing the channel to a soccer game or Game of Thrones, everyone makes the same API request in the same 5 minutes] [example: TV show lets watchers vote during some period of the week, so you can scale up during that period of time] [example: Black Friday]
  8. Schema flexibility: easier management of change in the business requirements, and easier management of change in the structure of the data. Sometimes you're pulling together data, integrating from different sources (e.g. ELT), and that flexibility helps. A document database means that you have no rigid schema. You can do whatever the heck you want. That being said, you SHOULDN’T. You should still have discipline about your data.
  9. If one machine goes down, customers can still use the other. Or if you need to perform maintenance, upgrades, etc., you don't have to take the whole system down. This is related to scaling. Built-in replication and failover: no application downtime when hardware fails. Online maintenance & upgrade: no application downtime.
  10. NoSQL systems are optimized for specific access patterns. Low response time for web & mobile user experience: millisecond latency. Consistently high throughput to handle growth. [perf measures can be subjective – talk about architecture, integrated cache, maybe mention MDS too]
  11. NoSQL is very versatile and can be used for a wide variety of use cases, including, but NOT limited to, these. If you're exploring NoSQL, make sure you have the right project or right use case. Using a NoSQL database does NOT mean you have to abandon relational databases. Most large websites use a combination. And it's also worth pointing out that plenty of companies are doing (most) of these use cases with relational as well. Relational usually is at least mediocre; NoSQL may be BETTER. But usually the catalyst is one of the earlier reasons: performance, flexibility, scale.
  12. Let’s talk about data modeling a bit, because storing data in JSON is different than storing it in tables.
  13. Let’s look at modeling Customer data. This is an example of what a customer might look like. You might do this as part of a proof of concept, discovery, requirements gathering, planning, etc. There is a rich structure: attributes, and potentially sub-attributes (first name and last name). Relationships: to other data (other customers, to products perhaps). Value evolution: maybe we’d start with one purchase, then add more as Helen makes more purchases. Structure evolution: maybe we start with billing information being properties of Helen, then evolve later to multiple billing options.
  14. Let’s look at modeling Customer data. This is an example of what a customer might look like. There is a rich structure: attributes, and potentially sub-attributes (first name and last name). Relationships: to other data (other customers, to products perhaps). Value evolution: maybe we’d start with one purchase, then add more as Helen makes more purchases. Structure evolution: maybe we start with billing information being properties of Helen, then evolve later to multiple billing options.
  15. Let’s see how to represent customer data in JSON. The primary key (CustomerID) becomes the DocumentKey. Each column name and column value becomes a KEY-VALUE pair.
  16. We aren’t in normal form anymore. Rich structure & relationships: billing information is stored as a sub-document. There could be more than a single credit card, so use an array.
  17. Value evolution: simply add an additional array element or update a value.
  18. Structure evolution: simply add new key-value pairs. There is no downtime to add new KV pairs. Applications can validate data. Structure evolves over time. Relations via reference.
  19. So, finally, you have a JSON document that represents a CUSTOMER. In a single JSON document, the relationship between the data is implicit through the use of sub-structures, arrays, and arrays of sub-structures.
  20. So, finally, you have a JSON document that represents a CUSTOMER. In a single JSON document, the relationship between the data is implicit through the use of sub-structures, arrays, and arrays of sub-structures.
  21. Hackolade supports Couchbase, MongoDB, Elastic, Cassandra, DynamoDB, and Firebase/Firestore. Erwin supports MongoDB and Couchbase. Idera supports just MongoDB.
  22. The way you plan to get data out can also affect the way you model your data.
  23. What types of relationships are being modeled? How are the relationships accessed?
  24. I've mostly been talking about key/value access. Most NoSQL databases will have at least one other way to access data besides key/value. What does this have to do with modeling? Modeling doesn't exist in a vacuum: you have to think about how you are going to interact with your data. I'm going to show you some examples from Couchbase. In Couchbase, we have N1QL, which is ANSI SQL for JSON. I'm also going to briefly cover FTS today. There are other options I'm not going to cover today, including Analytics and Views/MapReduce.
  25. Just to reiterate on key/value: if you know the key already, it's really simple and extremely fast to access that piece of data.
  26. Since key/value is so fast and easy, it would benefit us to use it as much as possible. Here are some tips to maximize your key/value usage.
  27. Starting from "matt", you can walk through this chain of documents with ONLY key/value access.
  28. Another thing to consider is whether or not your NoSQL database has a "subdocument" API. If it does, then you have even more flexibility. Not all of them do, and some databases may call this something different. But the idea is this: if you only need "address", without a subdocument API you'd have to pull the entire document over the wire when doing reads/writes. With a subdocument API, you can specify just a specific part to read/write. This can be very helpful if you have large documents, or if you are doing a lot of reads and writes that only need a small portion of the data. Firestore: there is a different concept of "subcollection". You have collections that contain documents, and then documents themselves can contain collections. Those documents have their own keys. If you DELETE a parent, then this subcollection will stick around. But when it comes to modeling considerations, with Firestore you can access these subcollections and look up individual keys in them. This is where we diverge a little bit from JSON modeling. But the idea is that you don't have to get the ENTIRE document; you can target individual pieces.
  29. In Firestore, you have a collection of documents (rooms, in this case). An individual document (roomA) can itself have a collection (messages), and so on. You can do this in plain JSON in just about every other document database: you can nest as deep as you want in JSON. Much like the subdocument I mentioned before, you can do a similar thing here. If you want to make a change to one message, you can address it directly with "rooms/roomA/messages/message1", EXCEPT that Firestore treats the documents in the subcollection kinda like separate documents. Gotcha: "Deleting a document does not delete its subcollections". So you should be careful when using this; you could end up with something kinda like orphan documents.
  30. N1QL is powerful in its flexibility, declarative nature, familiarity to developers, JOINs, etc. But note, once we step out of key/value access, we need to involve other processes: we gotta parse the query, most likely use an index service, and in the end we'll get a bunch of keys to look up the data. There is overhead involved, but sometimes this is a necessity.
  31. When you're outside of key/value access, you must understand the query plan. This is true for ANY database, relational or NoSQL. As an example, here's a Couchbase SQL query. I executed this and it ran in 1.2 seconds. It's using AN index on the TYPE field, but notice that name field on there. I can bring up a visualization of the query plan to see which parts are taking up the most time. In Couchbase, there is an index advisor. It suggested an index for me. After creating the index, the same query went to 146ms (about 8 times faster). A covering index could make this even faster (note the *).
  32. This is a search that revolves around text: things like stemming, language awareness, facets, ranking, etc. This is, again, a very simple example. I'm searching for the keyword "submarine". In my application, this query may be limited to a certain search radius, or it may be limited to a certain facet, etc. But the end results are language-aware, ranked matches. You would use this INSTEAD OF a SQL 'like', for instance.
  33. Just to sum up: build a proof of concept, which will help you see if NoSQL is the right fit, and it will also help you understand the access patterns better. Also note that in Couchbase, you can combine FTS and N1QL.
  34. Migrating or syncing, because often Couchbase and relational are complementary: Couchbase can be used for engagement, relational for transactions.
  35. Are you going to take the time to clean up the data? Do you need to? Do you need to enrich or restructure the data to take advantage of JSON? Also, I'm using the term MIGRATION, but it might NOT be the case that you are abandoning a database in favor of another. You might want to sync data, you might want to make a copy of data into a more suitable database for your use case, etc.
  36. Informatica, Talend, DART, ODBC, Kafka, Debezium, Spark, NiFi
  37. Node / Python / bash / PowerShell / curl / REST, etc. GoldenGate / DTS / SSIS. CLI: cbimport, mongoimport, etc.
  38. Again, proof of concept. KISS: export to CSV and use N1QL to do any ETL that’s required. Export to CSV, import as documents into a 'staging' bucket, use N1QL to transform, and insert into a new bucket.
  39. Align with your data model. The modeling step is vital. If you don't model your data to take advantage of JSON, then you are not going to see the advantages of using JSON. Don't treat a JSON database as if it were a relational database. Basically, keep it as simple as you can and plan for failure. Developers often think of the migration process as “One and Done”, but the reality is that data migration is often an ongoing headache that DevOps needs to monitor and manage in a production environment. Make everyone’s life easier by thinking about the long game as much as possible.
  40. Plan for failure: bad source data, hardware failure, resource limitations (proof of concept vs MVP). Developers often think of the migration process as “One and Done”, but the reality is that data migration is often an ongoing headache that DevOps needs to monitor and manage in a production environment.
  41. Ensure: interruptible, restartable, logged -> predictable. Make everyone’s life easier by thinking about the long game as much as possible.
  42. From NoSQL to relational. You can also turn this around and use Kafka in the other direction.
  43. From relational to NoSQL: GoldenGate is from Oracle. CData for SSIS and Couchbase. https://github.com/mahurtado/CouchbaseGoldenGateAdapter https://www.cdata.com/drivers/couchbase
  44. Make it part of your application directly. It may or may not be reusable. This is a lot of work, so make sure you have a good reason.
  45. Focus on SOA, microservices, application/use case specific
  46. Modeling, Focus, Success Criteria, Review Architecture. Consider using a tool like Hackolade to define models rigorously and collaboratively.
  47. Use Document type, Versionid. Create optimized, understandable keys. Weigh nested, referenced, or mixed designs. Add indexes: Simple, Compound, Functional, Partial, Array, Covering, Memory Optimized. N1QL, Key-value, Views.
  48. This is my family. My enormous head barely fits in the picture.
  49. Couchbase Cloud is currently in limited beta
  50. Memory first: integrated cache, you don't need to put Redis on top of Couchbase. Master-master: easier scaling, better scaling. Auto-sharding: we call them vBuckets; you don't have to come up with a sharding scheme, it's done by crc32. N1QL: SQL; Mongo has a more limited query language and it's not SQL-like. Full Text Search: using the bleve search engine, language-aware FTS capabilities are built in. Mobile & sync: Mongo has nothing like the offline-first and sync capabilities Couchbase offers. Mongo DOES have a DBaaS cloud provider.
  51. Everything I've shown you today is available in Community edition. The only N1QL features I can think of that are not in Community are INFER and the Query Plan Visualizer. You probably don't need the Enterprise features unless you are an Enterprise developer.