NoSQL and NewSQL: Tradeoffs Between Scalable Performance & Consistency
Avishai Ish-Shalom

A brief history of databases
+ 1970s, mainframes: inception of the relational model
+ 1980s: SQL, relational databases become the de facto standard
+ 1990s, LAN age: replication, external caching, ORMs
+ 2000s, Web 2.0: NoSQL databases for scale
+ 2010s, cloud age: commoditization of NoSQL, inception of NewSQL
[Timeline markers: 1978, 1995, 1996, 2008, 2014, 2015]

NoSQL – Not Only SQL
Umbrella term for everything NOT SQL
On a spectrum from simplicity to power:
+ K/V stores
+ “Wide rows”
+ Document stores
Also, lots of special-purpose databases: graph, metrics, search…

Why NoSQL?
Traditional RDBMS had severe scaling limitations
+ Cost
+ Scalability
+ Performance
+ Simplicity, predictability

NewSQL
Distributed relational database with SQL support
+ OLTP workloads
+ Distributed transactions
+ ACID guarantees

Why NewSQL?
+ Compatibility, familiar model
+ Better scalability than traditional RDBMS
+ Rich, powerful query language

Data modeling
Relationships are hard

“Relational” modeling (normalization)
SQL is based on “relational algebra” (with some deviations)
+ Tables are relations between columns
+ Tables relate to each other using foreign keys
+ No duplicate tuples, no missing values
Many primitive tables; data is joined at read time

Normalization
Source records:
Company | Employees
{Name: ScyllaDB, founded: 2013} | {Name: Dave, phone: 989723, role: engineer}
{Name: Cloudius Systems, founded: 2013} | {Name: John, phone: 32132, role: engineer}
Companies table:
CompanyID | Name | Founded
23123 | ScyllaDB | 2013
78934233 | Cloudius Systems | 2013
Employees table (CompanyID is the foreign key):
EmployeeID | CompanyID | Name | Phone | Role
753 | 23123 | Dave | 989723 | engineer
4765 | 78934233 | John | 32132 | engineer
Query (CompanyID is qualified in the WHERE clause because the column exists in both tables):
SELECT * FROM Companies
JOIN Employees ON Companies.CompanyID = Employees.CompanyID
WHERE Companies.CompanyID = 23123
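
The read-time join on this slide can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for any relational database; the column types are guesses, since the slide only shows names and values.

```python
import sqlite3

# In-memory database; table and column names follow the slide,
# the column types are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE Companies (
        CompanyID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Founded   INTEGER NOT NULL
    );
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        CompanyID  INTEGER NOT NULL REFERENCES Companies(CompanyID),
        Name       TEXT NOT NULL,
        Phone      TEXT NOT NULL,
        Role       TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO Companies VALUES (?, ?, ?)",
    [(23123, "ScyllaDB", 2013), (78934233, "Cloudius Systems", 2013)],
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [(753, 23123, "Dave", "989723", "engineer"),
     (4765, 78934233, "John", "32132", "engineer")],
)

# The join is computed at read time, on every query.
rows = conn.execute("""
    SELECT Companies.Name, Employees.Name, Employees.Role
    FROM Companies
    JOIN Employees ON Companies.CompanyID = Employees.CompanyID
    WHERE Companies.CompanyID = ?
""", (23123,)).fetchall()
print(rows)  # [('ScyllaDB', 'Dave', 'engineer')]
```

Each fact is stored once, and the database reassembles the company/employee shape for every query.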

Denormalized data models
+ Keep the data together
+ Minimal constraints on data structure
+ Possible duplication of values
+ Minimal reshaping of data
+ No joins
GET Companies/23123
{
  Name: ScyllaDB,
  Founded: 2013,
  Employees: [
    {Name: Dave, Phone: 989723, Role: engineer}
  ]
}
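
A rough sketch of the same idea in code: the join is done once, at write time, and a read becomes a single key lookup. The dictionary `store` and the helper names `build_documents` and `get_company` are hypothetical stand-ins for a document or K/V store, not any particular database's API.

```python
from collections import defaultdict

companies = [
    {"CompanyID": 23123, "Name": "ScyllaDB", "Founded": 2013},
    {"CompanyID": 78934233, "Name": "Cloudius Systems", "Founded": 2013},
]
employees = [
    {"EmployeeID": 753, "CompanyID": 23123, "Name": "Dave",
     "Phone": "989723", "Role": "engineer"},
    {"EmployeeID": 4765, "CompanyID": 78934233, "Name": "John",
     "Phone": "32132", "Role": "engineer"},
]


def build_documents(companies, employees):
    """Pre-join at write time: embed each company's employees in its document."""
    by_company = defaultdict(list)
    for e in employees:
        by_company[e["CompanyID"]].append(
            {"Name": e["Name"], "Phone": e["Phone"], "Role": e["Role"]}
        )
    return {
        c["CompanyID"]: {
            "Name": c["Name"],
            "Founded": c["Founded"],
            "Employees": by_company[c["CompanyID"]],
        }
        for c in companies
    }


# Hypothetical in-memory stand-in for a document store keyed by company ID.
store = build_documents(companies, employees)


def get_company(company_id):
    """Rough equivalent of GET Companies/<id>: one key lookup, no join."""
    return store[company_id]


print(get_company(23123))
```

The cost moves to the write path: if an employee's details are embedded in several documents, every copy has to be updated by the application.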

Data models comparison
Aspect | Normalized | Denormalized
Data shaping | On the fly, read-time | Pre-shape, write-time
Joins | On the fly, read-time | Pre-join, write-time
Constraints | On the fly, write-time | None
Consistency | No duplication, consistent | Duplication, no consistency
Locking | Read/write (multi item) locks | Minimal/no (item) locks
DB Complexity | High | Low

ACID
Atomicity - no intermediate states. Transactions complete or roll back
Consistency - data constraints are preserved
Isolation - transactions do not impact each other
Durability - written data will be persisted
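
Atomicity and consistency can be seen in a few lines. The sketch below assumes Python's built-in sqlite3 module and a made-up `accounts` table; the `transfer` and `balances` helpers are illustrative names, not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "C": a balance may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically; True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# The credit to bob succeeds, the debit from alice violates the constraint,
# and the rollback undoes the credit as well: no intermediate state survives.
print(transfer(conn, "alice", "bob", 150), balances(conn))
```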

ACID
ACID is about concurrency, mutability
+ Pessimistic model
+ Predates distributed databases
+ Designed for monolithic databases
+ Transactional
+ Database centric programming
“No, Mr. Database, you can not have my business logic. Your procedural ambitions will bear no fruit and you'll have to pry that logic from my dead, cold object-oriented hands.”
- DHH (Rails author)

ACID: The bad stuff
+ Poorly defined and understood (A Critique of ANSI SQL Isolation Levels)
+ Still allows for anomalies (unless using Serializable)
+ Susceptible to various exploits (ACIDRain)
+ Cache, replica unfriendly
+ App must be transactional

CAP/PACELC theorems
Basic reality of all distributed systems
There’s a tradeoff between availability and consistency
+ Normally, Latency/Consistency tradeoff
+ Trade offs, not dichotomy
+ Extreme latency == outage
+ CAP Consistency != ACID consistency
+ Single item, single statement
+ Harvest/Yield model suggested
[Figure labels: “Gotta have this!”, “Choose 2”]
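
The harvest/yield model mentioned above can be illustrated with simple arithmetic. This sketch assumes the usual definitions (yield: fraction of requests answered; harvest: fraction of the data reflected in an answer) and an invented scenario of a 4-shard cluster where 1 shard is down during 2 of 10 queries. The function names are hypothetical.

```python
def answer_query(shards_up, shards_total, allow_partial):
    """Return (answered, harvest) for one query that needs every shard."""
    if shards_up == shards_total:
        return True, 1.0
    if allow_partial:
        # Degrade the answer instead of failing the request.
        return True, shards_up / shards_total
    return False, 0.0


def summarize(results):
    """Return (yield, mean harvest of the answered queries)."""
    answered = [harvest for ok, harvest in results if ok]
    yield_ = len(answered) / len(results)
    harvest = sum(answered) / len(answered) if answered else 0.0
    return yield_, harvest


# 8 queries see all 4 shards, 2 queries see only 3 of 4.
shards_seen = [4] * 8 + [3] * 2

strict = summarize([answer_query(up, 4, allow_partial=False) for up in shards_seen])
partial = summarize([answer_query(up, 4, allow_partial=True) for up in shards_seen])

print("strict :", strict)   # full answers only: lower yield, harvest stays 1.0
print("partial:", partial)  # every request answered: yield 1.0, lower harvest
```

The same outage shows up either as failed requests or as incomplete answers, which is the "trade offs, not dichotomy" point of the slide.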

NoSQL – By availability vs consistency
[Figure: “Pick Two” diagram of Availability, Partition Tolerance, Consistency]
Poll time!

Distributed ACID
+ ACID cannot be highly available (for isolation > READ COMMITTED)
+ CAP doesn’t actually apply to transactions (HAT, Highly Available Transactions, not CAP)
+ ACID implies distributed locking (can be minimized)
+ Higher latency, lower scalability
Most NoSQL databases can provide ACID-equivalent isolation levels*!
* But transactional semantics are rarely built in. It’s not ACID on purpose 🤷
(New)SQL is about semantics
NoSQL is about dynamics

Semantics vs dynamics
Semantics – the “what”
+ What data
+ What shape
+ Schema
Focus on query power, integrity
Dynamics – the “how”
+ How is a query executed
+ How is data located
+ How is data stored
Focus on performance, scale

Scaling semantics
NewSQL
+ SQL abstracts over scale
+ No semantics for yield/harvest
+ Joins conflict with sharding
+ Consistency elevates replication latency
NoSQL
+ Explicit control over yield/harvest
+ Partition/shard friendly queries
+ Tunable/eventual consistency
cqlsh> CONSISTENCY ONE;
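
What "tunable" buys can be sketched with the quorum-overlap rule used by Dynamo-style stores: a read is guaranteed to touch at least one replica that acknowledged the latest successful write when R + W > N. The sketch below is a simplification that assumes a single datacenter and ignores failed or concurrent writes; the function names are hypothetical, and only the level names ONE, QUORUM and ALL mirror CQL.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level, read_level, replication_factor):
    """True when every read set must intersect every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With 3 replicas: ONE/ONE is fast but may return stale data,
# QUORUM/QUORUM pays extra latency for overlapping read and write sets.
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, read_overlaps_write(write_level, read_level, 3))
```

The application picks the level per query, which is the explicit control over the latency/consistency trade-off that the NoSQL column refers to.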

Query semantics
NewSQL
+ Ad-hoc
+ Mix reads and writes
+ Let the database worry
Performance hints, but limited
NoSQL
+ Schema designed per query
+ Duplicate data
+ Separate reads from writes
It’s all about performance

Performance
[Chart] YCSB workload A: 50% reads, 50% updates; high contention
[Chart] YCSB workload D: 95% reads, 5% inserts; no contention
[Chart] YCSB workload F: 50% reads, 50% read-modify-write updates; pseudo transactional

Why so slow?
+ NoSQL designed for simple, fast queries
+ Consistency is expensive
+ Contention is expensive
+ Complex query processing
+ Transactional semantics are expensive
  + Even when not actually using transactions!
Poll time!

Performance
NewSQL crunches the CPU
+ Compute results from storage
+ On the fly - each query
+ Indexed writes
+ Constraints verification
+ Transaction accounting
NoSQL crunches the disk
+ Multiple precomputed shapes stored
+ With duplication
+ With replicas

20 years of hardware evolution in 2 slides
[Chart: RAM price by year]
[Chart: 35 years of microprocessor trend data]
What happened?
+ Per thread performance plateaued
+ Cores: 1 => 256
+ RAM: 2GB => 2TB
+ Disk space: 10GB => 10TB
+ Disk seek time: 10-20ms => 20µs
+ Network throughput: 1Gbps => 100Gbps
AWS u-24tb1.metal: 224 cores, 448 threads, 24TB RAM

We have lots of cores, disks, machines
But can we use them?

How do we scale things?
Three major strategies:
+ Larger/faster hardware
+ Replicate data (increase throughput => CAP/PACELC)
+ Shard data (increase throughput => limit data semantics)
We combine strategies, each has its own problems
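
How sharding limits data semantics can be shown with a toy store. This is a minimal sketch assuming hash partitioning on the key; `ShardedStore` and `shard_for` are invented for illustration, and real systems typically use consistent hashing or token ranges rather than a plain modulo.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard using a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        """Lookup by partition key: touches exactly one shard."""
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def scan(self, predicate):
        """Query by anything else: has to visit every shard."""
        matches = []
        for shard in self.shards:
            matches.extend(v for v in shard.values() if predicate(v))
        return matches, len(self.shards)


store = ShardedStore(num_shards=4)
for i in range(20):
    store.put(f"company:{i}", {"id": i, "founded": 2000 + i})

print(store.get("company:7"))
matches, shards_touched = store.scan(lambda v: v["founded"] >= 2015)
print(len(matches), "matches, shards touched:", shards_touched)
```

Lookups by key scale with the number of shards; anything that is not keyed by the partition key (filters, joins, multi-key transactions) fans out to all of them.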

Universal Scalability Law
+ Coherency, consensus, synchronization
+ Row/partition contention, locks, transactions
Avoid cross shard/replica interactions at all costs
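
For reference, the Universal Scalability Law models relative throughput at concurrency N as C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α is the contention penalty and β the coherency penalty. The sketch below evaluates it; the α and β values are arbitrary illustrations, not measurements, and mapping the slide's two labels onto α and β is this sketch's reading of the figure.

```python
import math


def usl_throughput(n, alpha, beta):
    """Relative capacity at concurrency n under the Universal Scalability Law.

    alpha: contention (serialized work such as locks and hot rows/partitions)
    beta:  coherency (pairwise coordination such as consensus and synchronization)
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_concurrency(alpha, beta):
    """Concurrency at which throughput peaks (requires beta > 0)."""
    return math.sqrt((1 - alpha) / beta)


# Illustrative parameters only.
alpha, beta = 0.05, 0.001
for n in (1, 8, 31, 100, 256):
    print(n, round(usl_throughput(n, alpha, beta), 2))
print("throughput peaks near n =", round(peak_concurrency(alpha, beta), 1))
```

With any non-zero β, adding nodes or cores eventually reduces throughput, which is why cross shard/replica coordination is the term to minimize.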

What about transactions?
+ In reality there are very few cases requiring atomic transactions
  + No, finances mostly don’t need transactions - e.g. chargebacks
+ Transactions are very expensive (and sometimes impractical)
+ Immutable data models with reconciliation are almost always better
  + And sometimes mandated by law, e.g. ledgers
+ Database transactions can fail and need recovery, so do business transactions
That said, they are convenient and familiar
Remember: the real world is not globally consistent
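
One way to read "immutable data models with reconciliation" is an append-only ledger in which mistakes are corrected by compensating entries rather than by updating rows inside a transaction. The sketch below follows that reading; the `Ledger` and `Entry` classes are hypothetical, and amounts are integer cents by assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    entry_id: int
    account: str
    amount: int                     # cents; positive = credit, negative = debit
    reverses: Optional[int] = None  # entry_id this entry compensates, if any


class Ledger:
    """Append-only: entries are never updated or deleted."""

    def __init__(self):
        self._entries = []

    def append(self, account, amount, reverses=None):
        entry = Entry(len(self._entries) + 1, account, amount, reverses)
        self._entries.append(entry)
        return entry

    def chargeback(self, original):
        """Correct a past entry by appending its opposite."""
        return self.append(original.account, -original.amount,
                           reverses=original.entry_id)

    def balance(self, account):
        """Derived on read by folding over the history."""
        return sum(e.amount for e in self._entries if e.account == account)

    def history(self):
        return tuple(self._entries)


ledger = Ledger()
payment = ledger.append("merchant", 5000)
ledger.append("merchant", 1200)
ledger.chargeback(payment)          # the disputed payment is reversed later
print(ledger.balance("merchant"))   # 1200
print(len(ledger.history()))        # 3 entries; the original payment is still visible
```

Each write is a single-item append, so no multi-row transaction is needed, and the full history stays available for audit.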

Choosing between apples and oranges
NewSQL
+ Generic data model
+ Strong consistency
+ Transactions
+ Limited scalability
+ Slow
Query power, data integrity, DB centric code
NoSQL
+ Query specific data model
+ Tunable/weak consistency
+ No transactions
+ Scalable
+ Fast
Performance, scale, app centric code
United States
2445 Faber St, Suite #200
Palo Alto, CA USA 94303
Israel
Maskit 4
Herzliya, Israel 4673304
www.scylladb.com
@scylladb
Thank You!
