Don't Drop ACID
Transactions in Distributed NoSQL
Matthew D. Groves | Developer Advocate
Who am I? Where am I?
• Matthew Groves
• (Technical) Product Marketing Manager at Couchbase
• Microsoft MVP
• Pluralsight Author
• Father of 2, Husband
• THAT Conference
• https://that.us
AGENDA
01/ Transactions in Relational
02/ Why NoSQL?
03/ What is ACID? And why is it hard?
04/ Demo
05/ Summary / Questions / Resources
01/ Transactions in Relational
Third Normal Form
Table: ShoppingCart
ID   DateCreated  Item1          Item2          Item3
100  2020-05-27   smartphone     charger cable  case
101  2020-04-24   case
102  2020-05-25   charger cable  case
Third Normal Form
Table: ShoppingCart
ID   DateCreated
100  2020-05-27
101  2020-04-24
102  2020-05-25
Table: ShoppingCartItems
CartID  Name
100     smartphone
100     charger cable
100     case
101     case
102     charger cable
102     case
Why Transactions?
Save a new shopping cart:
1. Insert one row into ShoppingCart
2. Insert one row into ShoppingCartItems
3. Insert another row into ShoppingCartItems
4. Done.
Table: ShoppingCart
ID   DateCreated
100  2020-06-09
Table: ShoppingCartItems
CartID  Name
100     case
100     charger cable
Why Transactions?
Save a new shopping cart:
1. Insert one row into ShoppingCart
2. Insert one row into ShoppingCartItems
Table: ShoppingCart
ID   DateCreated
100  2020-06-09
Table: ShoppingCartItems
CartID  Name
100     case
100     charger cable
Why Transactions?
Save a new shopping cart:
1. Insert one row into ShoppingCart
2. Insert one row into ShoppingCartItems
3. Crash!
4. Rollback! (phew)
Table: ShoppingCart
ID   DateCreated
100  2020-06-09
Table: ShoppingCartItems
CartID  Name
100     case
100     charger cable
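The crash-then-rollback sequence above can be sketched in Python with the standard library's sqlite3 module. This is a minimal illustration, not the talk's demo code: the save_cart and row_counts helpers and the crash_after knob are hypothetical names made up for this sketch, and SQLite stands in for whichever relational database you use.

```python
import sqlite3


def save_cart(conn, cart_id, date_created, items, crash_after=None):
    """Insert a cart and its items as one all-or-nothing unit.

    crash_after is a hypothetical knob: it simulates a failure after
    that many item rows have been inserted.
    """
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "INSERT INTO ShoppingCart (ID, DateCreated) VALUES (?, ?)",
                (cart_id, date_created),
            )
            for count, name in enumerate(items):
                if crash_after is not None and count == crash_after:
                    raise RuntimeError("Crash!")
                conn.execute(
                    "INSERT INTO ShoppingCartItems (CartID, Name) VALUES (?, ?)",
                    (cart_id, name),
                )
        return True
    except RuntimeError:
        return False


def row_counts(conn):
    """Return (number of carts, number of cart items)."""
    carts = conn.execute("SELECT COUNT(*) FROM ShoppingCart").fetchone()[0]
    items = conn.execute("SELECT COUNT(*) FROM ShoppingCartItems").fetchone()[0]
    return carts, items


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ShoppingCart (ID INTEGER PRIMARY KEY, DateCreated TEXT)")
conn.execute("CREATE TABLE ShoppingCartItems (CartID INTEGER, Name TEXT)")
conn.commit()

# Steps 1 and 2 run, then the simulated crash hits before the second item.
crashed_ok = save_cart(conn, 100, "2020-06-09", ["case", "charger cable"], crash_after=1)
after_crash = row_counts(conn)

# Retrying without a crash stores the cart and both items together.
retry_ok = save_cart(conn, 100, "2020-06-09", ["case", "charger cable"])
after_retry = row_counts(conn)

print(crashed_ok, after_crash, retry_ok, after_retry)
```

Because the rollback removes the half-saved cart, the retry can reuse ID 100 without hitting a primary key conflict.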
First Normal Form
Table: ShoppingCart
ID   DateCreated  Item1          Item2          Item3
100  2020-05-27   smartphone     charger cable  case
101  2020-04-24   case
102  2020-05-25   charger cable  case
JSON Document Database
Document: ShoppingCart1
{
  "key" : "100",
  "dateCreated" : "2020-05-27"
}
Document: ShoppingCartItem1
{
  "item" : "smartphone",
  "cartKey" : "100"
}
Document: ShoppingCartItem2
{
  "item" : "charger cable",
  "cartKey" : "100"
}
Document: ShoppingCartItem3
{
  "item" : "case",
  "cartKey" : "100"
}
JSON Document Database
key: 100
{
  "dateCreated" : "2020-05-27",
  "items" : [
    "smartphone",
    "charger cable",
    "case"
  ]
}
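As a rough sketch of the modeling step shown on these slides, the Python below folds the normalized cart and item rows into one document per cart key. The to_documents helper and the plain dict standing in for a document store are assumptions for illustration only. The point is that saving a cart becomes a single write to a single key, which document databases typically apply atomically, so this case needs no multi-document transaction.

```python
import json

# Normalized rows, as in the relational ShoppingCart / ShoppingCartItems tables.
cart_rows = [(100, "2020-05-27")]
item_rows = [(100, "smartphone"), (100, "charger cable"), (100, "case")]


def to_documents(cart_rows, item_rows):
    """Fold cart rows and item rows into one document per cart key."""
    docs = {}
    for cart_id, date_created in cart_rows:
        docs[str(cart_id)] = {"dateCreated": date_created, "items": []}
    for cart_id, name in item_rows:
        docs[str(cart_id)]["items"].append(name)
    return docs


documents = to_documents(cart_rows, item_rows)
print(json.dumps(documents["100"], indent=2))
```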
• Domain-Driven Design
• By Eric Evans
• https://domainlanguage.com/ddd
https://bit.ly/fowlerAgg
02/ Why NoSQL?
NoSQL: The Big Four
Scaling
• Designed to be distributed
• Easy clustering
• Sharding is built-in, automatic
Performance
• Fewer operations for complex data
• Memory-first or memory-only
• Distributed systems can handle concurrency
High Availability
• Fault tolerance
• Distributed systems can withstand damage
• Maintenance / upgrades / planned outages don't have to be "outages"
Flexibility
• Data is isolated, accepting of many data models
• JSON
• "implied" schema
• Polyglot Persistence
03/ Why is ACID hard?
ACID
• A - Atomicity
• C - Consistency
• I - Isolation
• D - Durability
A is for Atomicity
• A group of operations either all succeed or all fail
C is for Consistency
• Data will never be in an invalid state
• "Dirty reads", "dirty writes", "phantom reads", etc.
• What is "eventual consistency"?
• http://jepsen.io/consistency
I is for Isolation
• Ensure that an operation is independent of other concurrent operations
• Optimistic/pessimistic locking
• Timeouts
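Optimistic locking, mentioned above, can be sketched as a compare-and-swap retry loop. Everything named here (VersionedStore, CasMismatch, add_item, the interfere hook) is hypothetical and in-memory; it is not any database's SDK. It only shows the idea: read a version, write back only if that version is unchanged, and retry on conflict.

```python
class CasMismatch(Exception):
    """Raised when a document changed between our read and our write."""


class VersionedStore:
    """Hypothetical in-memory store that tracks a version per key."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def insert(self, key, value):
        self._data[key] = (value, 1)

    def get(self, key):
        return self._data[key]

    def replace(self, key, value, expected_version):
        _, current = self._data[key]
        if current != expected_version:
            raise CasMismatch(key)
        self._data[key] = (value, current + 1)


def add_item(store, key, item, max_retries=3, interfere=None):
    """Append an item optimistically; return how many attempts were needed."""
    for attempt in range(max_retries):
        cart, version = store.get(key)
        updated = {**cart, "items": cart["items"] + [item]}
        if interfere is not None and attempt == 0:
            interfere()  # simulate a concurrent writer between read and write
        try:
            store.replace(key, updated, version)
            return attempt + 1
        except CasMismatch:
            continue  # someone else won the race: re-read and try again
    raise RuntimeError("gave up after retries")


store = VersionedStore()
store.insert("100", {"dateCreated": "2020-05-27", "items": ["smartphone"]})


def other_writer():
    cart, version = store.get("100")
    store.replace("100", {**cart, "items": cart["items"] + ["case"]}, version)


attempts = add_item(store, "100", "charger cable", interfere=other_writer)
final_cart, final_version = store.get("100")
print(attempts, final_cart["items"], final_version)
```

No lock is held while the caller works, so conflicts cost a retry rather than a wait. Pessimistic locking makes the opposite trade, which is where the timeouts on the slide come in.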
D is for Durability
• Data is safely stored in case of a system failure
• What is "durable enough"?
  • Disk?
  • Memory?
  • Data center?
  • Planet?
Challenges of ACID transactions in a distributed database
"Split Brain" (aka network problems)
Challenge:
• What happens if one or more of the machines in the cluster crashes?
• Do uncommitted transactions leave behind artifacts?
• Identifying edge cases
Solutions:
• Consensus requirements
• Cooperative model / Paxos
• Mitigation
Latency
Challenge:
• Performance: we don't want to just reinvent a relational database
• How does an ACID transaction affect performance and high availability?
Solutions:
• Only apply ACID transactions when necessary
• Use data modeling to solve when possible
Correctness
Challenge:
• Testing
• How do we verify all those edge cases?
Solutions:
• "Solve" with Jepsen guidelines
• Jepsen testing
• Jepsen Disputes MongoDB's Data Consistency Claims (InfoQ) - https://bit.ly/jepsenMongo
😀 😐 😞
👍 👌 👎
Client-side vs Server-side
Server-side:
• Pros
  • Light SDK work
• Cons
  • Global co-ordinator
  • Global lock manager
  • Global scheduler
Client-side:
• Pros
  • None of those global things
  • Quick iteration
  • Nothing new to configure on the server
• Cons
  • Major SDK work
  • All SDKs must use the same algorithm
04/ Demo
New to Couchbase 7 (beta)
BEGIN WORK;
UPDATE x1 SET a = a + 1 WHERE b < 10;
UPDATE x1 SET a = a + 15 WHERE b < 10;
SELECT a, b, c FROM x1 WHERE b < 20;
COMMIT WORK;
What are the tradeoffs?
• ACID transactions give you the ability to treat multiple operations as a single all-or-nothing operation
• Use only when necessary
• Remember the overhead
• Solve with data modeling when possible
• Don't be afraid to use a transaction when you need to
Cutting Edge
Other NoSQL ACID Transactions efforts
• Introduced (limited) transactions in 2018
• Announced true distributed transactions in 2019
• Read Committed
• Server-side implementation
• https://dl.acm.org/doi/pdf/10.1145/3299869.3314049
Other NoSQL ACID Transactions efforts
• Previously was "stored procedure only"
• "TransactionBatch" introduced in 2020
• Limited to one partition key
• Limited to 2 MB, 5 seconds, 100 ops
• https://is.gd/r71mas
Other NewSQL ACID Transactions efforts
• Inspired by Google Spanner
• Distributed relational database
• Server-side implementation
• Clock syncing
• https://www.youtube.com/watch?v=OJySfiMKXLs (13:42)
05/ Conclusion
Data Modeling vs ACID Transactions
NoSQL and ACID are not mutually exclusive!
Why NoSQL?
Resources/Slides:
https://resources.couchbase.com/dont-drop-acid
Contact me!
• Matthew Groves
• @mgroves on Twitter
• me@mgroves.com
• https://github.com/mgroves/dont-drop-acid
THANK YOU


Don't Drop ACID (July 2021)
