NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency

Avishai Ish-Shalom

A brief history of databases
+ 1970s - Mainframes: inception of the relational model
+ 1980s - SQL, relational databases become de-facto standard
+ 1990s - LAN age: replication, external caching, ORMs
+ 2000s - Web 2.0: NoSQL databases for scale
+ 2010s - Cloud age: commoditization of NoSQL, NewSQL inception
Timeline markers: 1978, 1995, 1996, 2008, 2014, 2015

NoSQL – Not Only SQL
Umbrella term for everything NOT SQL
From simplicity to power:
+ K/V stores
+ “Wide rows”
+ Document stores
Also, lots of special-purpose databases: graph, metrics, search...

Why NoSQL?
Traditional RDBMS had severe scaling limitations
+ Cost
+ Scalability
+ Performance
+ Simplicity, predictability

NewSQL
Distributed relational database with SQL support
+ OLTP workloads
+ Distributed transactions
+ ACID guarantees

Why NewSQL?
+ Compatibility, familiar model
+ Better scalability than traditional RDBMS
+ Rich, powerful query language

Data modeling
Relationships are hard

“Relational” modeling (normalization)
SQL is based on “relational algebra” (with some deviations)
+ Tables are relations between columns
+ Tables relate to each other using foreign keys
+ No duplicate tuples, no missing values
Many primitive tables; data is joined at read time

Normalization

Company: {Name: ScyllaDB, Founded: 2013}, {Name: Cloudius Systems, Founded: 2013}
Employees: {Name: Dave, Phone: 989723, Role: engineer}, {Name: John, Phone: 32132, Role: engineer}

Companies
CompanyID | Name             | Founded
23123     | ScyllaDB         | 2013
78934233  | Cloudius Systems | 2013

Employees (CompanyID is the foreign key)
EmployeeID | CompanyID | Name | Phone  | Role
753        | 23123     | Dave | 989723 | engineer
4765       | 78934233  | John | 32132  | engineer

    SELECT * FROM Companies
    JOIN Employees ON Companies.CompanyID = Employees.CompanyID
    WHERE Companies.CompanyID = 23123
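A minimal sketch of the normalized layout, using Python's built-in `sqlite3` module as a stand-in for any relational database. The table and column names follow the slide; the column types and the in-memory database are assumptions made for illustration.

```python
import sqlite3

# In-memory database; column types are assumed, names follow the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Companies (
        CompanyID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Founded   INTEGER NOT NULL
    );
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        CompanyID  INTEGER NOT NULL REFERENCES Companies(CompanyID),
        Name       TEXT NOT NULL,
        Phone      TEXT NOT NULL,
        Role       TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO Companies VALUES (?, ?, ?)",
    [(23123, "ScyllaDB", 2013), (78934233, "Cloudius Systems", 2013)],
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [(753, 23123, "Dave", "989723", "engineer"),
     (4765, 78934233, "John", "32132", "engineer")],
)

# The join happens at read time; the column in WHERE is qualified because
# CompanyID exists in both tables.
rows = conn.execute("""
    SELECT Companies.Name, Employees.Name, Employees.Role
    FROM Companies
    JOIN Employees ON Companies.CompanyID = Employees.CompanyID
    WHERE Companies.CompanyID = ?
""", (23123,)).fetchall()
print(rows)
```

Each fact is stored once, and the database reassembles the company with its employees on every read.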

Denormalized data models
+ Keep the data together
+ Minimal constraints on data structure
+ Possible duplication of values
+ Minimal reshaping of data
+ No joins
GET Companies/23123
{
  Name: ScyllaDB, Founded: 2013,
  Employees: [
    {Name: Dave, Phone: 989723, Role: engineer}
  ]
}
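The same data in denormalized form can be sketched with a plain Python dict standing in for a document or key/value store. The `companies` dict and `get_company` helper are hypothetical names used only for illustration.

```python
# A dict keyed by company ID stands in for the document store (assumption).
companies = {
    23123: {
        "Name": "ScyllaDB",
        "Founded": 2013,
        "Employees": [
            {"Name": "Dave", "Phone": "989723", "Role": "engineer"},
        ],
    },
}

def get_company(company_id):
    """Rough equivalent of GET Companies/<id>: one key lookup, no join."""
    return companies[company_id]

doc = get_company(23123)
employee_names = [e["Name"] for e in doc["Employees"]]
print(doc["Name"], employee_names)
```

The read is a single lookup because the employees were embedded at write time. The cost is that any employee detail copied elsewhere has to be kept in sync by the application.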

Data models comparison

               | Normalized                     | Denormalized
---------------+--------------------------------+-----------------------------
Data shaping   | On the fly, read-time          | Pre-shape, write-time
Joins          | On the fly, read-time          | Pre-join, write-time
Constraints    | On the fly, write-time         | None
Consistency    | No duplication, consistent     | Duplication, no consistency
Locking        | Read/write (multi item) locks  | Minimal/no (item) locks
DB Complexity  | High                           | Low

ACID
Atomicity - no intermediate states. Transactions complete or roll back
Consistency - data constraints are preserved
Isolation - transactions do not impact each other
Durability - written data will be persisted
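Atomicity and consistency can be illustrated with a small transfer example. This is a sketch using Python's `sqlite3` module; the `accounts` table, the `CHECK` constraint, and the `transfer` helper are assumptions for illustration, not something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the data constraint
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Overdraft: the credit to bob is applied first, then rolled back when the
# debit violates the CHECK constraint.
failed = transfer(conn, "alice", "bob", 150)
after_failure = balances(conn)

succeeded = transfer(conn, "alice", "bob", 40)
after_success = balances(conn)
print(failed, after_failure, succeeded, after_success)
```

The failed transfer leaves no intermediate state behind: the credit that had already been applied is undone together with the rejected debit.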

ACID
ACID is about concurrency, mutability
+ Pessimistic model
+ Predates distributed databases
+ Designed for monolithic databases
+ Transactional
+ Database centric programming
“No, Mr. Database, you can not have my business logic. Your procedural ambitions will bear no fruit and you'll have to pry that logic from my dead, cold object-oriented hands.”
- DHH (Rails author)

ACID: The bad stuff
+ Poorly defined and understood (A Critique of ANSI SQL Isolation Levels)
+ Still allows for anomalies (unless using Serializable)
+ Susceptible to various exploits (ACIDRain)
+ Cache, replica unfriendly
+ App must be transactional

CAP/PACELC theorems
Basic reality of all distributed systems
There’s a tradeoff between availability and consistency
+ Normally, latency/consistency tradeoff
+ Tradeoffs, not a dichotomy
+ Extreme latency == outage
+ CAP consistency != ACID consistency
+ Single item, single statement
+ Harvest/yield model suggested
[Figure: CAP “Choose 2” diagram with a “Gotta have this!” callout]
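The harvest/yield model mentioned above can be made concrete with two ratios. In the usual formulation, yield is the fraction of requests that complete and harvest is the fraction of the data reflected in a response. The function names and the ten-shard scenario below are assumptions for illustration.

```python
def yield_fraction(completed_requests, total_requests):
    """Yield: share of requests that were answered at all."""
    return completed_requests / total_requests

def harvest_fraction(shards_answering, total_shards):
    """Harvest: share of the data reflected in a single answer."""
    return shards_answering / total_shards

# A query fans out to 10 shards and 1 is unreachable. The system can either
# fail the request (lower yield) or answer from the 9 shards it can reach
# (full yield for this request, reduced harvest).
degraded_harvest = harvest_fraction(shards_answering=9, total_shards=10)
strict_yield = yield_fraction(completed_requests=99, total_requests=100)
print(degraded_harvest, strict_yield)
```

Seen this way, the choice is a dial rather than a binary pick: a partial answer trades harvest for yield.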

NoSQL – By availability vs consistency
[Figure: CAP triangle, “Pick Two” of Availability, Partition Tolerance, Consistency]

Distributed ACID
+ ACID cannot be available (for isolation > READ COMMITTED)
+ CAP doesn’t actually apply to transactions (HAT not CAP)
+ ACID implies distributed locking (can be minimized)
+ Higher latency, lower scalability
Most NoSQL databases can provide ACID-equivalent isolation levels*!
* But transactional semantics are rarely built in. It’s not ACID on purpose 🤷

(New)SQL is about semantics
NoSQL is about dynamics

Semantics vs dynamics
Semantics – the “what”
+ What data
+ What shape
+ Schema
Focus on query power, integrity
Dynamics – the “how”
+ How is a query executed
+ How is data located
+ How is data stored
Focus on performance, scale

Scaling semantics
NewSQL
+ SQL abstracts over scale
+ No semantics for yield/harvest
+ Joins conflict with sharding
+ Consistency elevates replication latency
NoSQL
+ Explicit control over yield/harvest
+ Partition/shard friendly queries
+ Tunable/eventual consistency
cqlsh> CONSISTENCY ONE;
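Tunable consistency is often reasoned about with a replica-overlap rule: with N replicas, a read is guaranteed to contact at least one replica that acknowledged the latest write when the write and read acknowledgement counts sum to more than N. The helper names below are hypothetical; this is a sketch of the arithmetic, not of any database's implementation.

```python
def quorum(replicas):
    """Majority of replicas, e.g. 2 out of 3."""
    return replicas // 2 + 1

def read_sees_latest_write(replicas, write_acks, read_acks):
    """True when every read set must intersect every write set."""
    return write_acks + read_acks > replicas

rf = 3  # replication factor, assumed for the example

one_one = read_sees_latest_write(rf, write_acks=1, read_acks=1)
quorum_quorum = read_sees_latest_write(rf, quorum(rf), quorum(rf))
all_one = read_sees_latest_write(rf, write_acks=rf, read_acks=1)
print(one_one, quorum_quorum, all_one)
```

Consistency level ONE on both reads and writes gives the lowest latency but no overlap guarantee, which is the tradeoff the slide points at.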

Query semantics
NewSQL
+ Ad-hoc
+ Mix reads and writes
+ Let the database worry
Performance hints, but limited
NoSQL
+ Schema designed per query
+ Duplicate data
+ Separate reads from writes
It’s all about performance
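"Schema designed per query" with duplicated data can be sketched as one write path that feeds several query-specific structures. The two dicts below stand in for two tables, and all names are hypothetical.

```python
from collections import defaultdict

# One structure per query the application needs to serve (assumption).
employees_by_company = defaultdict(list)  # "list the employees of company X"
employee_by_phone = {}                    # "who owns phone number Y"

def add_employee(company_id, name, phone, role):
    """Write path: shape and duplicate the record once, at write time."""
    record = {"CompanyID": company_id, "Name": name,
              "Phone": phone, "Role": role}
    employees_by_company[company_id].append(record)
    employee_by_phone[phone] = dict(record)  # deliberate duplicate copy

add_employee(23123, "Dave", "989723", "engineer")
add_employee(78934233, "John", "32132", "engineer")

# Read paths: each query is a single key lookup against its own structure.
scylla_staff = [e["Name"] for e in employees_by_company[23123]]
phone_owner = employee_by_phone["32132"]["Name"]
print(scylla_staff, phone_owner)
```

Reads stay cheap and predictable, while the write path does more work and the application owns the job of keeping the copies in agreement.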

Performance (YCSB benchmark charts)
+ YCSB workload A: 50% reads, 50% updates; high contention
+ YCSB workload D: 95% reads, 5% inserts; no contention
+ YCSB workload F: 50% reads, 50% read-modify-write updates; pseudo transactional

Why so slow?
+ NoSQL designed for simple, fast queries
+ Consistency is expensive
+ Contention is expensive
+ Complex query processing
+ Transactional semantics are expensive
+ Even when not actually using transactions!

Performance
NewSQL crunches the CPU
+ Compute results from storage
+ On the fly - each query
+ Indexed writes
+ Constraints verification
+ Transaction accounting
NoSQL crunches the disk
+ Multiple precomputed shapes stored
+ With duplication
+ With replicas

20 years of hardware evolution in 2 slides

35 years of microprocessor trend data

What happened?
+ Per thread performance plateaued
+ Cores: 1 => 256
+ RAM: 2GB => 2TB
+ Disk space: 10GB => 10TB
+ Disk seek time: 10-20ms => 20µs
+ Network throughput: 1Gbps => 100Gbps
AWS u-24tb1.metal: 224 cores, 448 threads, 24TB RAM

We have lots of cores, disks, machines
But can we use them?

How do we scale things?
Three major strategies:
+ Larger/faster hardware
+ Replicate data (increase throughput => CAP/PACELC)
+ Shard data (increase throughput => limit data semantics)
We combine strategies; each has its own problems
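The sharding strategy can be sketched as hash-based routing of keys to shards. The hash function, key format, and shard count below are assumptions; real systems typically use more elaborate schemes such as token rings or consistent hashing.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shards_touched(keys, num_shards):
    """Set of shards an operation over these keys has to contact."""
    return {shard_for(k, num_shards) for k in keys}

NUM_SHARDS = 4  # assumed cluster size

# A single-key operation always lands on exactly one shard.
single = shards_touched(["company:23123"], NUM_SHARDS)

# A multi-key operation (a join or a transaction) may span several shards,
# which is where sharding starts to limit data semantics.
multi = shards_touched(["company:23123", "employee:753", "employee:4765"],
                       NUM_SHARDS)
print(single, multi)
```

Operations that stay within one key scale with the number of shards; anything spanning keys needs coordination between shards.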

Universal Scalability Law
+ Coherency, consensus, synchronization
+ Row/partition contention, locks, transactions
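In its usual form the Universal Scalability Law models relative throughput at N nodes as C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α captures contention and β captures coherency cost. The coefficient values below are made up for illustration and are not measurements from the talk.

```python
import math

def usl_throughput(n, alpha, beta):
    """Relative throughput at n nodes; 1.0 is the single-node throughput."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

def peak_nodes(alpha, beta):
    """Node count where throughput peaks; adding nodes beyond it hurts."""
    return math.sqrt((1 - alpha) / beta)

ALPHA = 0.05   # contention: locks, hot rows/partitions (assumed value)
BETA = 0.001   # coherency: consensus, synchronization (assumed value)

for n in (1, 8, 31, 100):
    print(n, round(usl_throughput(n, ALPHA, BETA), 2))
print("peak at about", round(peak_nodes(ALPHA, BETA)), "nodes")
```

With these coefficients throughput peaks near 31 nodes and then declines, because the coherency term grows quadratically with the number of nodes.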

Avoid cross shard/replica interactions at all costs

What about transactions?
+ In reality there are very few cases requiring atomic transactions
+ No, finances mostly don’t need transactions - e.g. chargebacks
+ Transactions are very expensive (and sometimes impractical)
+ Immutable data models with reconciliation are almost always better
+ And sometimes mandated by law, e.g. ledgers
+ Database transactions can fail and need recovery, so do business transactions
That said, they are convenient and familiar
Remember: the real world is not globally consistent
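An immutable data model with reconciliation can be sketched as an append-only ledger in which a chargeback is a new compensating entry rather than an update to the original one. The `Entry` type and helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)  # entries cannot be modified once created
class Entry:
    entry_id: int
    account: str
    amount_cents: int
    reverses: Optional[int] = None  # ID of the entry this one compensates

ledger: List[Entry] = []  # append-only: nothing is updated or deleted

def record(account, amount_cents, reverses=None):
    entry = Entry(len(ledger) + 1, account, amount_cents, reverses)
    ledger.append(entry)
    return entry

def chargeback(entry_id):
    """Undo a payment by appending its negation instead of rewriting it."""
    original = next(e for e in ledger if e.entry_id == entry_id)
    return record(original.account, -original.amount_cents, reverses=entry_id)

def balance(account):
    """Reconciliation: the balance is derived by summing the history."""
    return sum(e.amount_cents for e in ledger if e.account == account)

payment = record("merchant-1", 5000)
record("merchant-1", 1200)
reversal = chargeback(payment.entry_id)
print(balance("merchant-1"), len(ledger))
```

No write ever needs to lock or modify an existing row, and the full history stays available for audit.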

Choosing between apples and oranges
NewSQL
+ Generic data model
+ Strong consistency
+ Transactions
+ Limited scalability
+ Slow
Query power, data integrity, DB centric code
NoSQL
+ Query specific data model
+ Tunable/weak consistency
+ No transactions
+ Scalable
+ Fast
Performance, scale, app centric code

United States
2445 Faber St, Suite #200
Palo Alto, CA USA 94303
Israel
Maskit 4
Herzliya, Israel 4673304
www.scylladb.com
@scylladb
Thank You!
