JSON Data
Modeling
Matthew D. Groves, @mgroves
AGENDA
01/ Why NoSQL?
02/ JSON Data Modeling
03/ Accessing Data
04/ Migrating Data
05/ Summary / Q&A
Where am I?
• Tulsa Tech Fest
• https://grouplings.com/TulsaTechFest
• https://twitter.com/TulsaTechFest
Who am I?
• Matthew D. Groves
• Developer Advocate for Couchbase
• @mgroves on Twitter
• Podcast and blog: https://crosscuttingconcerns.com
• "I am not an expert, but I am an enthusiast." – Alan Stevens
by @natelovett
Major Enterprises Across Industries are Adopting NoSQL
• Communications
• Technology
• Travel & Hospitality
• Media & Entertainment
• E-Commerce & Digital Advertising
• Retail & Apparel
• Games & Gaming
• Finance & Business Services
Why NoSQL?
NoSQL Landscape
Document
• Couchbase
• MongoDB
• DynamoDB
• CosmosDB
Graph
• OrientDB
• Neo4J
• DEX
• GraphBase
Key-Value
• Couchbase
• Riak
• BerkeleyDB
• Redis
Wide Column
• HBase
• Cassandra
• Hypertable
NoSQL Landscape
• Get by key(s)
• Set by key(s)
• Replace by key(s)
• Delete by key(s)
• Map/Reduce
Document
• Couchbase
• MongoDB
• DynamoDB
• CosmosDB
Why NoSQL? Scalability
Why NoSQL? Flexibility
Why NoSQL? Availability
Why NoSQL? Performance
JSON Data
Modeling
Models for Representing Data
Data Concern Relational Model JSON Document Model
Rich Structure
Relationships
Value Evolution
Structure Evolution
Properties of Real-World Data
Modeling Data in a Relational World
Billing
Connections
Purchases
Contacts
Customer
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30"
}
Customer DocumentKey: CBL2015
©2017 Couchbase Inc.
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
}
]
}
Customer DocumentKey: CBL2015
CustomerID Item Amount Date
CBL2015 laptop 1499.99 2019-03
Table: Purchases
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Customer DocumentKey: CBL2015
CustomerID Item Amount Date
CBL2015 laptop 1499.99 2019-03
CBL2015 phone 99.99 2018-12
Table: Purchases
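The row-to-document mapping shown on these slides can be sketched in plain Python. This is only an illustration under stated assumptions: `build_customer_document` and the in-memory row dictionaries are hypothetical names, not part of any Couchbase API.

```python
import json

# Hypothetical rows, as they might come back from the Customer and Purchases tables.
customer_row = {"CustomerID": "CBL2015", "Name": "Jane Smith", "DOB": "1990-01-30"}
purchase_rows = [
    {"CustomerID": "CBL2015", "Item": "laptop", "Amount": 1499.99, "Date": "2019-03"},
    {"CustomerID": "CBL2015", "Item": "phone", "Amount": 99.99, "Date": "2018-12"},
]


def build_customer_document(customer, purchases):
    """Return (key, document): the primary key becomes the document key,
    the remaining columns become fields, and child rows become a nested array."""
    key = customer["CustomerID"]
    doc = {name: value for name, value in customer.items() if name != "CustomerID"}
    doc["Purchases"] = [
        {"item": p["Item"], "amount": p["Amount"], "date": p["Date"]}
        for p in purchases
        if p["CustomerID"] == key
    ]
    return key, doc


key, doc = build_customer_document(customer_row, purchase_rows)
print(key)
print(json.dumps(doc, indent=2))
```

The join between the two tables disappears: the purchases travel with the customer and are read back with a single key lookup.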
CustomerID ConnId Relation
CBL2015 XYZ987 Brother
CBL2015 SKR007 Father
Table: Connections
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-...",
"expiry" : "2019-03"
}, ...
],
"Connections" : [
{
"ConnId" : "XYZ987",
"Relation" : "Brother"
},
{
"ConnId" : "SKR007",
"Relation" : "Father"
}
]
}
Customer DocumentKey: CBL201
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"cardnum" : "5827-2842…",
"expiry" : "2019-03",
"cardType" : "visa",
"Connections" : [
{
"CustId" : "XYZ987",
"Relation" : "Brother"
},
{
"CustId" : "SKR007",
"Relation" : "Father"
}
],
"Purchases" : [
{ "id":12, "item": "mac", "amt": 2823.52 },
{ "id":19, "item": "ipad2", "amt": 623.52 }
]
}
DocumentKey: CBL2015
CustomerID Name DOB Cardnum Expiry CardType
CBL2015 Jane Smith 1990-01-30 5827-2842… 2019-03 visa
Table: Customer
CustomerID ConnId Relation
CBL2015 XYZ987 Brother
CBL2015 SKR007 Father
Table: Connections
CustomerID item amt
CBL2015 mac 2823.52
CBL2015 ipad2 623.52
Table: Purchases
CustomerID ConnId Name
CBL2015 XYZ987 Joe Smith
CBL2015 SKR007 Sam Smith
Table: Contacts
{
"Name" : "Bob Jones",
"DOB" : "1980-01-29",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5927-2842-2847-3909",
"expiry" : "2020-03"
},
{
"type" : "master",
"cardnum" : "6273-2842-2847-3909",
"expiry" : "2019-11"
}
],
"Connections" : [
{
"CustId" : "XYZ987",
"Relation" : "Brother"
},
{
"CustId" : "PQR823",
"Relation" : "Father"
}
],
"Purchases" : [
{ "id":12, "item": "mac", "amt": 2823.52 },
{ "id":19, "item": "ipad2", "amt": 623.52 }
]
}
DocumentKey: CBL2016
CustomerID Name DOB
CBL2016 Bob Jones 1980-01-29
Table: Customer
CustomerID Type Cardnum Expiry
CBL2016 visa 5927… 2020-03
CBL2016 master 6273… 2019-11
Table: Billing
CustomerID ConnId Relation
CBL2016 XYZ987 Brother
CBL2016 SKR007 Father
Table: Connections
CustomerID item amt
CBL2016 mac 2823.52
CBL2016 ipad2 623.52
Table: Purchases
CustomerID ConnId Name
CBL2016 XYZ987 Joe Smith
CBL2016 SKR007 Sam Smith
Table: Contacts
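The two customer documents above store billing differently: CBL2015 keeps one card as top-level fields, CBL2016 keeps a "Billing" array. With no enforced schema, the application has to cope with both shapes. A minimal sketch of one way to do that, assuming a hypothetical helper `normalize_billing` that runs when a document is read:

```python
def normalize_billing(doc):
    """Return a copy of doc whose billing data always uses the "Billing" array."""
    result = dict(doc)  # shallow copy, so the caller's document is not modified
    if "Billing" not in result and "cardnum" in result:
        # Older shape: a single card stored as top-level fields.
        result["Billing"] = [{
            "type": result.pop("cardType", None),
            "cardnum": result.pop("cardnum"),
            "expiry": result.pop("expiry", None),
        }]
    result.setdefault("Billing", [])  # documents with no billing data at all
    return result


old_shape = {"Name": "Jane Smith", "cardnum": "5827-2842-...",
             "expiry": "2019-03", "cardType": "visa"}
new_shape = {"Name": "Bob Jones", "Billing": [
    {"type": "visa", "cardnum": "5927-2842-2847-3909", "expiry": "2020-03"}]}

for doc in (old_shape, new_shape):
    print(normalize_billing(doc)["Billing"])
```

Normalizing on read lets old and new documents coexist, so no bulk rewrite is needed when the structure changes. Whether that is preferable to a one-off migration depends on the application.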
Models for Representing Data
Data Concern | Relational Model | JSON Document Model
Rich Structure | Multiple flat tables; assembly / disassembly | Documents; no (or less) assembly required
Relationships | Represented; queries with SQL | Represented; queried…with?
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform, rigid, enforced; manual disruptive change | Flexible; dynamic change; increased app responsibility
Relationship is one-to-one or one-to-many
Store related data as nested objects
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Modeling your data: Strategies / rules of thumb
Relationship is many-to-one or many-to-many
Store related data as separate documents
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Connections" : [
"XYZ987",
"PQR823",
"PQR828"
]
}
Modeling your data: Strategies / rules of thumb
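When related data lives in separate documents, the parent holds only keys and the application follows them. The sketch below uses a plain dictionary as a stand-in for a key-value store. The store contents, the name given to PQR823, and the helper `load_connections` are made up for illustration.

```python
# Hypothetical in-memory stand-in for a key-value document store.
documents = {
    "CBL2015": {"Name": "Jane Smith", "Connections": ["XYZ987", "PQR823", "PQR828"]},
    "XYZ987": {"Name": "Joe Smith", "Connections": ["CBL2015"]},
    "PQR823": {"Name": "Pat Smith", "Connections": []},
    # "PQR828" is deliberately absent to show a dangling reference.
}


def load_connections(store, key):
    """Follow the referenced keys; return (found documents, missing keys)."""
    found, missing = {}, []
    for ref in store[key].get("Connections", []):
        if ref in store:
            found[ref] = store[ref]
        else:
            missing.append(ref)
    return found, missing


found, missing = load_connections(documents, "CBL2015")
print(sorted(found), missing)
```

Nothing in the store enforces that a referenced key exists, so the sketch reports missing references instead of assuming they resolve.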
Modeling tools
• Hackolade
• Erwin DM NoSQL
• Idera ER/Studio
Accessing Data
Data reads are mostly parent fields
Store children as separate documents
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Connections" : [
"XYZ987",
"PQR823",
"PQR828"
]
}
Modeling your data: Strategies / rules of thumb
Data reads are mostly parent + child fields
Store children as nested objects
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Modeling your data: Strategies / rules of thumb
Data writes are mostly parent or child (not both)
Store children as separate documents
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Connections" : [
"XYZ987",
"PQR823",
"PQR828"
]
}
Modeling your data: Strategies / rules of thumb
Data writes are mostly parent and child (both)
Store children as nested objects
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Purchases" : [
{
"item" : "laptop",
"amount" : 1499.99,
"date" : "2019-03"
},
{
"item" : "phone",
"amount" : 99.99,
"date" : "2018-12"
}
]
}
Modeling your data: Strategies / rules of thumb
If … | Then …
Relationship is one-to-one or one-to-many | Store related data as nested objects
Relationship is many-to-one or many-to-many | Store related data as separate documents
Data reads are mostly parent fields | Store children as separate documents
Data reads are mostly parent + child fields | Store children as nested objects
Data writes are mostly parent or child (not both) | Store children as separate documents
Data writes are mostly parent and child (both) | Store children as nested objects
Modeling your data: Strategies / rules of thumb
Subdocument access
{
"username": "mgroves",
"profile": {
"phoneNumber": "123-456-7890",
"address": {
"street": "123 main st",
"city": "Grove City",
"state": "Ohio"
}
}
}
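The idea behind subdocument access is addressing one fragment of a document by path. The sketch below shows only that path idea on a local dictionary. A real subdocument API would resolve the path on the server so that only the fragment crosses the wire. `get_subdocument` is a hypothetical helper, not an SDK call.

```python
profile_doc = {
    "username": "mgroves",
    "profile": {
        "phoneNumber": "123-456-7890",
        "address": {"street": "123 main st", "city": "Grove City", "state": "Ohio"},
    },
}


def get_subdocument(doc, path):
    """Walk a dotted path such as "profile.address" and return that fragment."""
    node = doc
    for part in path.split("."):
        node = node[part]  # raises KeyError if the path does not exist
    return node


print(get_subdocument(profile_doc, "profile.address"))
```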
Accessing your data (Couchbase)
• Key-Value (CRUD)
• N1QL (Query)
• Views (Query)
• Full Text (Search)
• Geospatial (Search)
Backed by: Documents, Indexes, MapReduce
Key/Value
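// Read: fetch a document by its key and deserialize it.
public ShoppingCart GetCartById(Guid id)
{
    return _bucket.Get<ShoppingCart>(id.ToString()).Value;
}

// Create: insert a new document under a freshly generated key.
public void CreateShoppingCart()
{
    _bucket.Insert(new Document<dynamic>
    {
        Id = Guid.NewGuid().ToString(),
        Content = new { . . . }
    });
}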
Key/Value: Recommendations for keys
• Natural Keys
• Human Readable
• Deterministic
• Semantic
Key/Value: Example keys
• author::matt
• author::matt::blogs
• blog::csharp_7_features
• blog::csharp_7_features::comments
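Because these keys are deterministic, an application can build the next key from the previous document and reach related data with key lookups alone. A minimal sketch, assuming a dictionary in place of the bucket and made-up document contents (only the key names come from the slide):

```python
# Hypothetical contents; only the key naming scheme is taken from the slide.
store = {
    "author::matt": {"name": "Matt"},
    "author::matt::blogs": ["csharp_7_features"],
    "blog::csharp_7_features": {"title": "C# 7 features"},
    "blog::csharp_7_features::comments": [{"by": "reader", "text": "Nice post"}],
}


def comments_for_author(kv, author_id):
    """Collect the comments for every blog of an author using key lookups only."""
    result = {}
    for blog_id in kv.get(f"author::{author_id}::blogs", []):
        result[blog_id] = kv.get(f"blog::{blog_id}::comments", [])
    return result


print(comments_for_author(store, "matt"))
```

No query or index is involved: each step is a get by key, which the deck describes as the fastest access path.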
N1QL
Understanding your Query Plan
Map/Reduce
Concept | Strategies & Recommendations
Key-Value operations provide the best possible performance | Create an effective key naming strategy; create an optimized data model
Incremental MapReduce (Views) are well suited to aggregation | Ideal for large data sets; data set can be used to create complex view indexes
N1QL queries provide the most flexibility – everything else | Query data regardless of how it is modeled; good indexing is vital
Accessing your data: Strategies and recommendations
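The aggregation case for MapReduce can be illustrated with a map step that emits key/value pairs and a reduce step that combines them per key. Couchbase views are written in JavaScript and maintained incrementally by the server. The Python below is only a local sketch of the concept, with hypothetical function names and sample data.

```python
from collections import defaultdict

docs = [
    {"Name": "Jane Smith", "Purchases": [
        {"item": "laptop", "amount": 1499.99, "date": "2019-03"},
        {"item": "phone", "amount": 99.99, "date": "2018-12"}]},
    {"Name": "Bob Jones", "Purchases": [
        {"item": "ipad2", "amount": 623.52, "date": "2019-01"}]},
]


def map_purchases(doc):
    """Map step: emit one (year, amount) pair per purchase."""
    for purchase in doc.get("Purchases", []):
        yield purchase["date"][:4], purchase["amount"]


def reduce_sum(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = [pair for doc in docs for pair in map_purchases(doc)]
totals = reduce_sum(pairs)
print(totals)
```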
Migrating Data
Migration options: Requirements
ETL / data cleanse / data enrichment
Duration vs. Resources
Migration options: Requirements
Migration options: Requirements
Data governance
• Batch vs. Incremental
• Single threaded vs. multi-threaded
Migration options: Pick your strategy
Data migration tools:
Informatica, Looker, Talend, DART, ODBC, CData
BYO-tool
• C# / bash / Powershell / curl / REST etc
• GoldenGate / DTS / SSIS
• Hadoop, Spark, Kafka, Nifi
• CLI: cbimport, mongoimport, etc
Migration options: Pick your tools
Migration options: KISS
• CSV:
• Export to CSV
• Import as documents into a 'staging' bucket
• Use N1QL to transform
• Insert into new bucket
• SQL:
• Transform
• Export
• Insert into document database
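The CSV route above can be sketched with the standard library: read the exported rows and turn each into a key plus a JSON-ready document. The staging "bucket" here is just a dictionary, and `rows_to_documents` is a hypothetical helper. A real import would more likely use a tool such as cbimport.

```python
import csv
import io
import json

# Stand-in for a CSV file exported from the relational Customer table.
csv_export = io.StringIO(
    "CustomerID,Name,DOB\n"
    "CBL2015,Jane Smith,1990-01-30\n"
    "CBL2016,Bob Jones,1980-01-29\n"
)


def rows_to_documents(csv_file, key_column):
    """Yield (key, document) pairs, one per CSV row."""
    for row in csv.DictReader(csv_file):
        key = row.pop(key_column)  # the primary key becomes the document key
        yield key, dict(row)


staged = dict(rows_to_documents(csv_export, "CustomerID"))
print(json.dumps(staged, indent=2))
```

Every value arrives as a string. Converting types and nesting child rows is the "transform" step that the slide leaves to N1QL.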
Migration options: Recommendations
• Align with your data model
• Plan for failure
• Bad source data
• Hardware failure
• Resource limitations
• Ensure: Interruptible, restartable, logged, predictable
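One way to read "interruptible, restartable, logged, predictable" is a batch loop that records how far it got and skips bad rows instead of aborting. The sketch below is an assumption-laden illustration: `migrate`, `write` and the checkpoint dictionary are hypothetical, and a real run would persist the checkpoint outside the process.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("migration")


def migrate(rows, write, checkpoint, batch_size=2):
    """Copy rows in batches, starting at the last committed offset.
    Bad rows are logged and skipped rather than stopping the run."""
    offset = checkpoint.get("offset", 0)
    while offset < len(rows):
        batch = rows[offset:offset + batch_size]
        for row in batch:
            try:
                write(row)
            except ValueError as err:
                log.warning("skipping bad row %r: %s", row, err)
        offset += len(batch)
        checkpoint["offset"] = offset  # commit progress only after the whole batch
        log.info("migrated up to offset %d", offset)
    return checkpoint


target = {}


def write(row):
    if not row.get("CustomerID"):
        raise ValueError("missing CustomerID")
    # Upsert by key, so replaying a batch after a crash is harmless.
    target[row["CustomerID"]] = {"Name": row["Name"]}


rows = [
    {"CustomerID": "CBL2015", "Name": "Jane Smith"},
    {"CustomerID": "", "Name": "bad row"},
    {"CustomerID": "CBL2016", "Name": "Bob Jones"},
]
checkpoint = migrate(rows, write, {"offset": 0})
print(checkpoint, sorted(target))
```

Because progress is committed per batch, an interrupted run repeats at most one batch on restart, which is why the write is keyed and idempotent.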
Sync NoSQL and relational? Automatic Replication
[Diagram: Couchbase → DCP Stream → Producer → Kafka Queue → Consumer → RDBMS]
How can you sync NoSQL and relational?
[Diagram components: RDBMS, GoldenGate, Handler, Couchbase]
https://github.com/mahurtado/CouchbaseGoldenGateAdapter
Data Flow with NiFi
https://blog.couchbase.com/nifi-processing-flow-couchbase-server/
Sync NoSQL and relational? Manual.
Summary
• Pick the right application
• Drive data model from data access patterns
• Match the data access method to requirements
• Proof of Concept
Resources
• https://blog.couchbase.com/proof-of-concept-move-relational/
• https://blog.couchbase.com/json-data-modeling-rdbms-users/
Couchbase Plug
• Go to Couchbase.com to download Couchbase
• Enter to win a $100 gift card here:
https://bit.ly/FEST2018 (use code FEST2018)
Where do you find us?
• blog.couchbase.com
• @mgroves
• @couchbasedev
Frequently Asked Questions
1. How is Couchbase different than Mongo?
2. Is Couchbase the same thing as CouchDB?
3. How tall are you? Do you play basketball?
4. What is the Couchbase licensing situation?
5. Is Couchbase a Managed Cloud Service (DBaaS)?
Managed Cloud Server (DBaaS)
MongoDB vs Couchbase
• Architecture
• Memory first architecture
• Master-master architecture
• Auto-sharding
• Features
• SQL (N1QL)
• Full Text Search
• Mobile & Sync
Licensing
Couchbase Server Community
• Open source (Apache 2)
• Binary release is one release behind Enterprise (except major versions)
• Free to use in dev/test/qa/prod
• Forum support only
Couchbase Server Enterprise
• Mostly open source (Apache 2)
• Some features not available on Community (XDCR TLS, MDS, Rack Zone, etc)
• Free to use in dev/test/qa
• Need commercial license for prod
• Paid support provided
CouchDB and Couchbase
memcached

JSON Data Modeling - July 2018 - Tulsa Techfest


Editor's Notes

  1. Spend just a little time on why people are using NoSQL. Talk about how data is modeled differently in JSON. Let’s talk about why SQL is good and why SQL for JSON is needed. Let’s talk about the exciting stuff happening in the database ecosystem, including but not limited to the stuff Couchbase is doing. If we have time, we’ll look at how a .NET developer (or Java developer, etc.) would interact with SQL for JSON.
  2. What’s also interesting is that we’re seeing the use of NoSQL expand inside many of these companies. Orbitz, the online travel company, is a great example: they started using Couchbase to store their hotel rate data, and now they use Couchbase in many other ways. Same with eBay: they recently presented at the Couchbase conference with a chart tracking how many instances of various NoSQL databases are in use. We see growth in Cassandra and MongoDB, and Couchbase has actually surpassed them within eBay.
  3. SQL (relational) databases are great. They give you a lot of functionality: a great set of abstractions (tables, columns, data types, constraints, triggers, SQL, ACID transactions, stored procedures, and more) at a highly reasonable cost. Change is inevitable, and one thing an RDBMS does not handle well is change: change of schema (both logical and physical), change of hardware, change of capacity. NoSQL databases, especially ones designed to be distributed, tend to help solve problems with agility, scalability, performance, and availability.
  4. Let’s talk about what NoSQL is, first. NoSQL generally refers to databases which lack SQL or don’t use a relational model. Once the SQL language and transactions became optional, a flurry of databases were created using distinct approaches for common use cases. Key-value databases simply provide quick access to data for a given key. Wide column databases can store a large number of arbitrary columns in each row. Graph databases store data and relationships as first-class concepts. Document databases aggregate data into a hierarchical structure; JSON is a means to that end. Document databases provide flexible schema, built-in data types, rich structure, and implicit relationships using JSON.
  5. When we look at document databases, they originally came with a minimal set of APIs and features. But as they continue to mature, we’re seeing more features being added, and generally I’m seeing a convergent trend between SQL and NoSQL. But anyway, this set of minimal features, lacking a SQL language and tables, gives us the buzzword “NoSQL”.
  6. Elastic scaling: size your cluster for today, scale out on demand. Cost-effective scaling: commodity hardware, on premise or in the cloud, scale OUT instead of scale UP. [example: changing the channel to a soccer game or Game of Thrones, everyone makes the same API request in the same 5 minutes] [example: TV show lets watchers vote during some period of the week, so you can scale up during that period of time] [example: Black Friday]
  7. Schema flexibility: easier management of change in the business requirements, easier management of change in the structure of the data. Sometimes you're pulling together data, integrating from different sources (e.g. ELT), and that flexibility helps. Document database means that you have no rigid schema. You can do whatever the heck you want. That being said, you SHOULDN’T. You should still have discipline about your data.
  8. If one machine goes down, customers can still use the other. Or if you need to perform maintenance, upgrade, etc., you don't have to take the whole system down. This is related to scaling. Built-in replication and fail-over: no application downtime when hardware fails. Online maintenance & upgrade: no application downtime.
  9. NoSQL systems are optimized for specific access patterns. Low response time for web & mobile user experience: millisecond latency. Consistently high throughput to handle growth. [perf measures can be subjective – talk about architecture, integrated cache, maybe mention MDS too]
  10. Let’s talk about data modeling a bit, because storing data in JSON is different than storing in tables.
  11. So I want to compare the approaches over 4 key areas. I’m going to fill in this table, traditional SQL on the left and JSON on the right
  12. Let’s look at modeling Customer data. This is an example of what a customer might look like. There is a rich structure: attributes, potentially sub-attributes (first name and last name). Relationships: to other data (other customers, to products perhaps). Value evolution: maybe we’d start with one purchase, then add more as Helen makes more purchases. Structure evolution: maybe we start with billing information being properties of Helen, then evolve later to multiple billing options.
  13. Let’s look at modeling Customer data. This is an example of what a customer might look like. There is a rich structure: attributes, potentially sub-attributes (first name and last name). Relationships: to other data (other customers, to products perhaps). Value evolution: maybe we’d start with one purchase, then add more as Helen makes more purchases. Structure evolution: maybe we start with billing information being properties of Helen, then evolve later to multiple billing options.
  14. Let’s see how to represent customer data in JSON. The primary key (CustomerID) becomes the DocumentKey. Each column name and column value becomes a key-value pair.
  15. We aren’t in normal form anymore. Rich structure & relationships: billing information is stored as a sub-document. There could be more than a single credit card, so use an array.
  16. Value evolution: simply add an additional array element or update a value.
  17. Structure evolution: simply add new key-value pairs. No downtime to add new KV pairs. Applications can validate data. Structure evolves over time. Relations via reference.
  18. So, finally, you have a JSON document that represents a CUSTOMER. In a single JSON document, the relationships between the data are implicit through the use of sub-structures, arrays, and arrays of sub-structures.
  19. So, finally, you have a JSON document that represents a CUSTOMER. In a single JSON document, the relationships between the data are implicit through the use of sub-structures, arrays, and arrays of sub-structures.
  20. So I want to compare the approaches over 4 key areas. I’m going to fill in this table, traditional SQL on the left and JSON on the right
  21. Hackolade supports Couchbase, MongoDB, Elasticsearch, Cassandra, DynamoDB. Erwin supports MongoDB; Couchbase soon? Idera supports MongoDB; Couchbase soon?
  22. The way you plan to get data out can also affect the way you model your data
  23. What types of relationships are being modeled? How are the relationships accessed?
  24. Another thing to consider is whether or not your NoSQL database has a "subdocument" API. Not all of them do, and some databases may call this something different. But the idea is this: if you only need "address", without a subdocument API you'd have to pull the entire document over the wire when doing reads/writes. With a subdocument API, you can specify just a specific part to read/write. This can be very helpful if you have large documents, or if you are doing a lot of reads and writes that only need a small portion of the data.
  25. I've mostly been talking about key/value access, but document databases have a wide range of ways to access data other than key/value. Other databases like MongoDB have their own JavaScript query API, but I find N1QL to be very compelling because I'm used to writing SQL to interact with data. In Couchbase, we have N1QL, which is ANSI SQL for JSON. Most document databases have MapReduce capabilities, including Couchbase. We're kinda leaning away from M/R these days in favor of N1QL, but it can still be useful in some cases. There are other ways to access data, like FTS and Geospatial, which I'm not going to cover today.
  26. Notice I’m using Guid. That may not be a good idea. This is C#, and you already saw how to do this with Go. Also supported: Java, Node, Python, PHP, C, and many more.
  27. Starting from "matt" you can walk through this chain of documents without having to do a query. You can access them with key/value operations only if you'd like.
  28. N1QL is powerful in its flexibility, declarative nature, familiarity to developers, JOINs, etc. Indexing is very important, as it's not as performant as key/value or map/reduce. (Maybe talk about indexing on a SQL table vs indexing on a whole bucket.)
  29. Couchbase 5.0 has introduced some tools for analyzing query performance, so you can see what indexes are being used, where the biggest costs are in the query, and so on. There are a lot of different types of indexes for N1QL.
  30. This is kinda like a materialized view. It's powerful in that it can be run in parallel, can use JavaScript to do filtering/mapping, and is great for aggregation. It's limited in that it can't do anything like a JOIN, can't get input from other views, and can only order lexicographically.
  31. Migrating or syncing, because often Couchbase and relational are complementary: Couchbase can be used for engagement, relational for transactions.
  32. Are you going to take the time to clean up the data? Do you need to? Do you need to enrich or restructure the data to take advantage of JSON?
  33. Duration v resources: how long is it going to take? What tools and resources are available to you? What’s your biggest constraint – time or resources? Do you need to get the migration done in 1 hr (and have it use as many parallel resources as needed) or do you need to minimize/manage the resource impact on the existing system and it doesn’t matter how long it takes?
  34. You must obey the claw Data governance: what are the rules for moving data, auditing, etc? Do you need to keep track of where the data came from and who is allowed to access it? Many newer systems need to track where sensitive data originated. 
  35. A whole bunch at a time, or one at a time. Single-threaded: easier. Multi-threaded: faster, but more complicated. Is the migration a one-time event, or does it need to happen incrementally (every day, or over a 2-3 month period where the old system and new system are operating in parallel)? Do you plan to do the data migration as a single thread (read all the data, write all of the data), or using a multi-threaded or multi-process approach where each thread or process reads some percentage of the data?
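The single-threaded vs. multi-threaded choice can be sketched like this. This is a minimal Python sketch under stated assumptions: the source rows are an in-memory list, the target is a plain dict standing in for the document database, and `migrate` and `migrate_chunk` are hypothetical helpers. A real migration would call the database client inside each worker and would need retry and error handling.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_chunk(rows, target):
    """Write one worker's share of the rows into the target store."""
    for row in rows:
        target[row["id"]] = {"name": row["name"]}
    return len(rows)


def migrate(rows, target, workers=1):
    """Split the rows into `workers` slices and migrate each slice in its own thread."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda chunk: migrate_chunk(chunk, target), chunks)
        return sum(counts)


source = [{"id": f"row::{n}", "name": f"name {n}"} for n in range(100)]

single, multi = {}, {}
migrated_single = migrate(source, single, workers=1)  # read all, write all
migrated_multi = migrate(source, multi, workers=4)    # each thread takes a quarter
```

With `workers=1` this is the easy "read all, write all" version. Raising `workers` gives each thread a percentage of the data, and that is where the complications start: partial failures, ordering, and load on the source system.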
  36. If you're writing your own, Entity Framework can be helpful, because it can do the mapping of aggregate root C# objects for you, which you can then write to a document database. So if you already have EF mappings created, you're part way there.
  37. KISS: either export to CSV and use N1QL to do any ETL that’s required (assuming that it’s simple), or use SQL to do simple ETL on export and then just import into Couchbase.
  38. Basically keep it as simple as you can and plan for failure. Developers often think of the migration process as “One and Done”, but the reality is that data migration is often an ongoing headache that DevOps needs to monitor and manage in a production environment. Make everyone’s life easier by thinking about the long game as much as possible.
  39. From NoSQL to relational
  40. From relational to NoSQL: GoldenGate is from Oracle; CData for SSIS and Couchbase. https://github.com/mahurtado/CouchbaseGoldenGateAdapter https://www.cdata.com/drivers/couchbase
  41. A tool I recently discovered that helps you move, process, and transform data all around your enterprise. It's called NiFi. It can connect to Couchbase, SQL Server, just about any source or destination. I wrote a blog post showing how to move data from SQL Server to Couchbase using NiFi. I did this as a proof of concept for the Cincinnati Reds. I'm far from a NiFi expert, but I really like what I've seen. It allows you to make a flow that's repeatable, auditable, debuggable, and can be easily stopped, started, and monitored, with notifications, error handling, and more. And it's visual. Great for migration or syncing or both. https://blog.couchbase.com/nifi-processing-flow-couchbase-server/
  42. Make it part of your application directly. Maybe use Akka. May or may not be reusable. This is a lot of work, so make sure you have a good reason.
  43. Focus on SOA, application/use case specific
  44. Use document type, version ID. Create optimized, understandable keys. Weigh nested, referenced, or mixed designs. Add indexes: simple, compound, functional, partial, array, covering, memory-optimized.
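The "document type, version ID, understandable keys" advice could look something like this. This is a minimal Python sketch, not a Couchbase convention: the `type` and `version` field names, the `::` key separator, and the v1-to-v2 rename of `DOB` are all assumptions made for illustration.

```python
def make_key(doc_type, natural_id):
    """Build a readable key such as 'customer::CBL2015'."""
    return f"{doc_type}::{natural_id}"


def make_document(doc_type, version, body):
    """Stamp a document with its type and schema version."""
    return {"type": doc_type, "version": version, **body}


def upgrade(doc):
    """Hypothetical v1 -> v2 change: rename 'DOB' to 'dateOfBirth'."""
    if doc["version"] == 1:
        upgraded = {k: v for k, v in doc.items() if k != "DOB"}
        upgraded["dateOfBirth"] = doc["DOB"]
        upgraded["version"] = 2
        return upgraded
    return doc


key = make_key("customer", "CBL2015")
doc_v1 = make_document("customer", 1, {"Name": "Jane Smith", "DOB": "1990-01-30"})
doc_v2 = upgrade(doc_v1)
```

The type field gives queries and indexes something to filter on when many kinds of documents share a bucket, and the version field lets the application upgrade old documents as it reads them instead of migrating everything at once.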
  45. N1QL, Key-value, Views
  46. Focus, success criteria, review architecture. Consider using a tool like Hackolade to define models rigorously and collaboratively.
  47. All I ask is that you give Couchbase a chance. Free download. You can also take it for a free test drive on the major cloud providers. Also, this is something new for me this year: please go to this URL to enter to win a $100 gift card. It is literally a 1-question survey, and it helps me out a lot.
  48. This is my family. My enormous head barely fits in the picture.
  49. Not yet. We've been talking about it at least as long as I've been with Couchbase. It's partly a technical problem, may need additional features for multi-tenant. It's partly (mostly) a business problem. Would this be worth it? Couchbase IS in the Azure and AWS marketplaces, and there are some wizards to make config easy, but it runs on your VMs.
  50. Memory first: integrated cache, you don't need to put Redis on top of Couchbase. Master-master: easier scaling, better scaling. Auto-sharding: we call them vBuckets; you don't have to come up with a sharding scheme, it's done by CRC32. N1QL: SQL; Mongo has a more limited query language and it's not SQL-like. Full Text Search: using the Bleve search engine, language-aware FTS capabilities are built in. Mobile & sync: Mongo has nothing like the offline-first and sync capabilities Couchbase offers. Mongo DOES have a DBaaS cloud provider.
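To give a feel for hash-based auto-sharding, here is a deliberately simplified Python sketch. It is not Couchbase's exact key-to-vBucket mapping: the `vbucket_for` helper and the plain modulo step are assumptions for illustration, and 1024 is used as the vBucket count.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vBucket count for this sketch


def vbucket_for(key):
    """Simplified illustration: hash the key with CRC32, then pick a bucket by modulo."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


placement = {key: vbucket_for(key) for key in ["matt", "customer::CBL2015"]}
```

The point is that placement is a pure function of the key, so every client computes the same answer and nobody has to design a sharding scheme by hand.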
  51. Everything I've shown you today is available in Community edition. The only N1QL features I can think of that are not in Community are INFER and Query Plan Visualizer. You probably don't need the Enterprise features unless you are an Enterprise developer.