Scylla Summit 2017: Keynote, Looking back, looking ahead

Looking back
Looking ahead
ScyllaDB
Avi Kivity

Past accomplishments
and future plans for Scylla
CTO, ScyllaDB
Avi Kivity

Avi Kivity
KVM hypervisor author and ex-maintainer
ScyllaDB co-founder and CTO

Large Partitions

Partitions and rows
▪ A table is composed of partitions, indexed by a partition key
▪ A partition is composed of rows, indexed by a row (clustering) key
▪ Can have one row in a partition, or a million
▪ Partitions are units of distribution
▪ Rows are units of access
[Diagram: a table divided into partitions by partition key; each partition divided into rows by clustering key]
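The table / partition / row hierarchy above can be pictured as a nested map. This is a minimal sketch for illustration only: the names `table`, `write` and `read_partition` are hypothetical, and it says nothing about how Scylla stores data internally.

```python
# Hypothetical in-memory model: partition key -> {clustering key -> row}
table = {}

def write(partition_key, clustering_key, row):
    """Store one row; the partition is created on first write."""
    table.setdefault(partition_key, {})[clustering_key] = row

def read_partition(partition_key):
    """Return the rows of one partition, ordered by clustering key."""
    partition = table.get(partition_key, {})
    return [partition[ck] for ck in sorted(partition)]

write("sensor-1", 2, "second reading")
write("sensor-1", 1, "first reading")
write("sensor-2", 1, "other sensor")

print(read_partition("sensor-1"))  # ['first reading', 'second reading']
```

A partition here may hold one row or many; the partition key decides where the data lives, while individual rows are what a query reads or writes.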

Partition vs. Row Orientation
Partition Orientation
▪ Partitions are the basic managed unit
▪ Large (tens of megabytes) partitions cause hiccups
Row Orientation
▪ Rows are the basic managed unit
▪ Partitions can be larger than memory with no ill effect

Large partitions: file format
[Diagram: Summary, Index, and a large partition in the data file]

Large partitions: btree
[Diagram: a three-level btree index (Level 1, Level 2, Level 3) over the data file, mapping partitions and mapping rows in partitions]

Row-oriented Repair
▪ Current repair
o 100-partition granularity
• 1 row of mismatch causes 100 partitions to be synced
• Even a single partition can be large
o Repair master fetch / merge / push
• Cannot send the delta between nodes
▪ Row-oriented repair
o Single-row granularity
o Row-level mismatch can be detected
o Only the mismatched rows are synced between nodes
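The difference in granularity can be illustrated with a small simulation. This is a sketch under stated assumptions, not Scylla's repair protocol: the replica layout, the SHA-256 digests and all function names are made up for the example.

```python
import hashlib

def digest(rows):
    """Hash a sorted list of ((partition, row), value) items."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()

def make_replica(partitions=1000, rows_per_partition=10):
    return {(p, r): f"v{p}-{r}"
            for p in range(partitions) for r in range(rows_per_partition)}

def rows_in_group(replica, start, group):
    return sorted(item for item in replica.items()
                  if start <= item[0][0] < start + group)

def synced_by_partition_group(a, b, partitions=1000, group=100):
    """Rows transferred when any mismatch resyncs a whole group of partitions."""
    synced = 0
    for start in range(0, partitions, group):
        rows_a = rows_in_group(a, start, group)
        if digest(rows_a) != digest(rows_in_group(b, start, group)):
            synced += len(rows_a)
    return synced

def synced_by_row(a, b):
    """Rows transferred when mismatches are detected per row."""
    return sum(1 for key in a.keys() | b.keys() if a.get(key) != b.get(key))

replica_a = make_replica()
replica_b = dict(replica_a)
replica_b[(42, 3)] = "stale"  # a single mismatched row

print(synced_by_partition_group(replica_a, replica_b))  # 1000 (100 partitions x 10 rows)
print(synced_by_row(replica_a, replica_b))              # 1
```

With 10 rows per partition, one stale row costs 1000 rows of transfer at 100-partition granularity and a single row at row granularity; larger partitions widen the gap.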

Row orientation - long term effort
▪ Basic support, streaming: 1.3
▪ Cache: 2.0, 2.1
▪ Repair: 2.3
▪ SSTable Index: 2.4

Increasing Disk/Memory ratio

Big data must be affordable data
▪ Common to see 1 TB/node in other databases
o But with 4:1 Disk:Memory ratios
▪ Scylla supports 30 TB/node today
▪ Currently 30:1 Disk:Memory ratio is achievable
o Goal is to support 100:1
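To make the ratios concrete, here is the arithmetic for a 30 TB node at each ratio mentioned above. The helper name is hypothetical and the calculation assumes decimal units (1 TB = 1000 GB).

```python
def memory_needed_gb(disk_tb, disk_to_memory_ratio):
    """Memory (GB) needed for a node holding disk_tb of data at the given ratio."""
    return disk_tb * 1000 / disk_to_memory_ratio  # assumes 1 TB = 1000 GB

for ratio in (4, 30, 100):
    print(f"30 TB at {ratio}:1 -> {memory_needed_gb(30, ratio):,.0f} GB of memory")
```

At 4:1 a 30 TB node would need 7,500 GB of memory, at 30:1 it needs 1,000 GB, and at the 100:1 goal it would need 300 GB.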

Why large nodes?
▪ Large nodes = small clusters
o Easier to administer
o Cheaper
▪ SSDs deliver 100s of thousands of IOPS
o Can rely less on cache and more on disk
▪ 10/20/40 Gbps networking
▪ 32+ cores/node
o More than enough compute

Large disk challenges
▪ Memory-resident files
o CompressionInfo.db - used when decompressing SSTable data blocks
o Summary.db - used to locate Index blocks
o Filter.db - used to quickly eliminate SSTables from query
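A rough estimate shows why a memory-resident filter becomes a challenge as disks grow. The sketch assumes Filter.db behaves like a standard Bloom filter sized by the textbook formula; the 1% false-positive rate and the partition sizes are illustrative assumptions, not Scylla defaults.

```python
import math

def bloom_filter_bytes(num_keys, false_positive_rate):
    """Textbook Bloom filter size: -ln(p) / (ln 2)^2 bits per key."""
    bits_per_key = -math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(num_keys * bits_per_key / 8)

DATA_BYTES = 30 * 10**12  # 30 TB of data on one node

for partition_bytes in (1_000, 1_000_000):
    partitions = DATA_BYTES // partition_bytes
    size_gb = bloom_filter_bytes(partitions, 0.01) / 10**9
    print(f"{partition_bytes:>9} B partitions -> {size_gb:.2f} GB of filter")
```

Under these assumptions, 1 KB partitions need tens of gigabytes of filter memory, while 1 MB partitions need only tens of megabytes: filter cost scales with the number of partitions, not with the amount of data.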

Large disk solutions
▪ CompressionInfo.db
o Compress 3X (2.1)
o Switch to cell-level compression (2.4)
▪ Filter.db
o Mostly important for very small partitions
o Automatic sampling (2.4)
▪ Summary.db
o Automatic sampling (2.1)
o Replace with btree (2.4)
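The slide does not describe how automatic sampling works, so the following only illustrates the general trade-off behind a sampled summary: keep fewer index entries in memory and scan more of the on-disk index per lookup. The function names and the list-based "index" are assumptions for the example, not the Summary.db format.

```python
from bisect import bisect_right

def build_summary(index_keys, sampling):
    """Keep every `sampling`-th index entry in memory as (key, index position)."""
    return [(index_keys[i], i) for i in range(0, len(index_keys), sampling)]

def lookup(index_keys, summary, key):
    """Return (index position of key or None, number of index entries scanned)."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None, 0
    start = summary[slot][1]
    end = summary[slot + 1][1] if slot + 1 < len(summary) else len(index_keys)
    for scanned, pos in enumerate(range(start, end), 1):
        if index_keys[pos] == key:
            return pos, scanned
    return None, end - start

index_keys = list(range(0, 1000, 2))      # 500 sorted keys in the "on-disk" index
summary = build_summary(index_keys, 50)   # 10 entries kept in memory

print(len(summary))                        # 10
print(lookup(index_keys, summary, 198))    # (99, 50)
```

Doubling the sampling interval halves the in-memory summary and roughly doubles the worst-case scan, which is why the interval is worth tuning automatically.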

Reducing free disk space reserves
▪ Currently, required to have ~50% disk space free for compaction
o Compaction = copy all input to new file
▪ New compaction strategy for reduced free space reservations
o Able to incrementally delete input sstables before compaction completes
▪ Free space taken into account when deciding to compact
o Low free space -> compact earlier and more aggressively
See Nadav’s Compaction Strategy session
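A simplified model shows where the ~50% figure comes from and why deleting inputs early helps. It assumes the output is as large as the inputs (nothing is overwritten or expired) and that inputs can be released one at a time; both are simplifications, and the function names are hypothetical.

```python
def peak_usage_copy_all(input_sizes):
    """Inputs stay on disk until the whole output has been written."""
    return sum(input_sizes) + sum(input_sizes)

def peak_usage_incremental(input_sizes):
    """Each input is deleted as soon as its data has been rewritten."""
    used = sum(input_sizes)
    peak = used
    for size in input_sizes:
        used += size              # output written for this input
        peak = max(peak, used)
        used -= size              # input deleted before compaction completes
    return peak

sstables_gb = [10, 10, 10, 10]
print(peak_usage_copy_all(sstables_gb))     # 80
print(peak_usage_incremental(sstables_gb))  # 50
```

In this model copy-all compaction temporarily doubles the space used, while incremental deletion only needs headroom for the largest single input.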

Handling Node Restarts

Heat-weighted load balancing
Attend Gleb’s talk for more

Monitoring

From Collectd to Prometheus
Collectd/graphite
▪ Hard to set up
▪ No preset dashboards
▪ Slow, clunky

From Collectd to Prometheus
Prometheus/Grafana
▪ Simple docker setup
▪ Preset dashboards
▪ Drill down to node/shard level
▪ Smooth and beautiful
▪ Very configurable
▪ Alerts
Attend Tzach’s Monitoring talk for more

Indexing

Materialized views
▪ New (experimental) in 2.0
▪ More ways to access your data efficiently
Base table:
uid (pk)  email             last_login
7742      avi@scylladb.com  yesterday
8012      foo@example.com   never
Materialized view:
email (pk)        uid   last_login
avi@scylladb.com  7742  yesterday
foo@example.com   8012  never
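The two tables above hold the same data under different keys. As a conceptual sketch only (plain dictionaries, hypothetical names, and an assumption that emails are unique), a view can be thought of as a second map that is updated on every write to the base table:

```python
users_by_uid = {}    # base table, keyed by uid
users_by_email = {}  # view, keyed by email

def upsert_user(uid, email, last_login):
    """Write to the base table and keep the view in step with it."""
    old = users_by_uid.get(uid)
    if old is not None and old["email"] != email:
        # The view key changed, so the old view row must go away.
        users_by_email.pop(old["email"], None)
    users_by_uid[uid] = {"email": email, "last_login": last_login}
    users_by_email[email] = {"uid": uid, "last_login": last_login}

upsert_user(7742, "avi@scylladb.com", "yesterday")
upsert_user(8012, "foo@example.com", "never")

print(users_by_email["avi@scylladb.com"]["uid"])  # 7742
```

The application writes only to the base table; lookups by email become a direct key access on the view instead of a scan.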

Secondary indexing
▪ Transparently based on Materialized Views
▪ Global index
▪ Coming in 2.2/2.3

Compaction Strategies

Compaction strategies = query patterns
▪ Size Tiered = general purpose
▪ Leveled = read intensive
▪ Date Tiered (1.3) = real-time data ordered by time
▪ Time Window (2.1) = real-time data ordered by time (but better)
Attend Nadav’s talk on compaction strategies

Hybrid compaction strategy
▪ Mixes some characteristics of Leveled and Size-tiered
▪ Solves Size-tiered space amplification problem
Attend Nadav’s talk on compaction strategies

Growing Ecosystem

Ecosystem - drivers
▪ gocql talk by Chris Bannister
▪ gocqlx talk by Michał Matczuk

Ecosystem - layered offerings
Talks:
▪ JanusGraph - Chin Huang and Ted Chang
▪ KairosDB - Brian Hawkins
▪ Spark - Burak Yavuz

Ecosystem - Database as a Service
Talks:
▪ Compose: David Pitera
▪ Samsung SDS: Kuyul Noh / Junghyun Park

Ecosystem - Scylla Management Console
▪ Ignite talk by Yuval Zholkover

Ecosystem - orchestration
▪ Planning to support orchestration environments
o Mesos, DC/OS
o Kubernetes

Ecosystem - debugging tools

Ecosystem - Seastar
▪ SMF - Seastar based log broker
▪ Pedis - Parallel Redis
[Diagram: Seastar's sharded architecture. Each core runs its own userspace stack (Application, TCP/IP, Task Scheduler, DPDK) on its own NIC queue; cores communicate over smp queues; the kernel isn't involved]
Attend Alex’ talk for more crazy low-latency

Ecosystem targeting ScyllaDB
● Targeting Cassandra (with Scylla as a side effect of compatibility)
● Dual target Cassandra/Scylla
○ Testing on both Cassandra and Scylla
● Dual target Scylla/Cassandra
○ Main target is Scylla, Cassandra by compatibility
● Around Seastar
○ Exploiting the parallel engine behind Scylla

Design as Investment

More bang for your design buck
● Using NoSQL is a significant design and ops effort
○ Select keys for good partitioning
○ Design a data model that works with your database
○ Write an application that talks to multiple database nodes in parallel
● Choosing Scylla rewards you for your effort
○ Good partitioning -> spread partitions over cores, not just nodes
○ Application parallelism -> more performance from a database that exploits it
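The idea of spreading partitions over cores as well as nodes can be sketched with a two-step mapping from partition key to (node, shard). This is an illustration only: SHA-256 and the modulo arithmetic stand in for Scylla's actual token and sharding algorithms, and `owner` is a hypothetical helper.

```python
import hashlib
from collections import Counter

def owner(partition_key, nodes, shards_per_node):
    """Map a partition key to a (node, shard) pair via a stand-in hash."""
    raw = hashlib.sha256(partition_key.encode()).digest()
    token = int.from_bytes(raw[:8], "big")
    node = token % nodes
    shard = (token // nodes) % shards_per_node
    return node, shard

# 10,000 well-distributed keys over 3 nodes with 8 cores each
load = Counter(owner(f"user-{i}", 3, 8) for i in range(10_000))
print(len(load))                              # 24 (every core on every node owns data)
print(min(load.values()), max(load.values()))
```

A key that partitions well spreads work evenly across all 24 cores; a skewed key would leave most of them idle no matter how many nodes are added.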

Migrations

Where are people migrating from?
Attend Alexander Sicular’s talk about no-downtime migrations

Leading edge of technology

Technology innovation
▪ Multiple CPU architectures
o x86
o ARM
o POWER
o System Z
▪ Taking advantage of multi-socket, many-core, many-thread

Technology innovation
▪ Integrating with non-volatile storage
o Intel Optane
o Samsung Z-SSD
Frank Ober’s Optane talk
Arash Rezaei’s Samsung Z-SSD talk

Summary

Scylla: a database with momentum
▪ Many improvements over the last year
▪ A lot of work still remains to be done
▪ Established NoSQL performance leader

THANK YOU
avi@scylladb.com
@AviKivity
Please stay in touch
Any questions?
