SCYLLA’S COMPACTION STRATEGIES
OR
HOW TO RUIN YOUR WORKLOAD'S PERFORMANCE
BY CHOOSING THE WRONG COMPACTION STRATEGY
Nadav Har’El, Raphael Carvalho
Nadav Har’El
Nadav Har’El has had a diverse 20-year career in
computer programming and computer science.
In the past he worked on scientific computing,
networking software, and information retrieval.
In recent years his focus has been on virtualization
and operating systems. He also worked on nested
virtualization and exit-less I/O in KVM. Today, he
maintains the OSv kernel and also works on Seastar
and Scylla.
Raphael Carvalho
Raphael S. Carvalho is a computer programmer who
loves file systems and has developed a huge
interest in distributed systems since he started
working on Scylla. Previously, he worked on ZFS
support for OSv and also drivers for the Syslinux
project. At ScyllaDB, Raphael has been mostly
working on compaction and compaction strategies.
Agenda
▪ What is compaction?
▪ Scylla’s compaction strategies:
o Size Tier
o Leveled
o Hybrid
o Date Tier
o Time Window
▪ Which should I use for my workload and why?
▪ Examples!
What is compaction?
Scylla’s write path:
[Figure: write path diagram with labels “Writes”, “commit log”, “compaction”]
(What is compaction?)
▪ Scylla’s write path:
o Updates are inserted into a memory table (“memtable”)
o Memtables are periodically flushed to a new sorted file (“sstable”)
▪ After a while, we have many separate sstables
o Different sstables may contain old and new values of the same cell
o Or different rows in the same partition
o Wastes disk space
o Slows down reads
▪ Compaction: read several sstables and output one (or more)
containing the merged and most recent information
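A minimal sketch of the merge step described above, assuming an sstable can be modeled as a key-sorted list of (key, write timestamp, value) tuples. The tuple layout and the `compact` helper are illustrative assumptions, not Scylla's actual on-disk format or API.

```python
import heapq
from itertools import groupby

# Hypothetical in-memory stand-in for an sstable: a list of
# (key, write_timestamp, value) tuples sorted by key.
def compact(sstables):
    """Merge key-sorted sstables, keeping only the newest cell per key."""
    merged = heapq.merge(*sstables, key=lambda cell: cell[0])
    output = []
    for _, cells in groupby(merged, key=lambda cell: cell[0]):
        # Among all versions of the same key, the highest timestamp wins.
        output.append(max(cells, key=lambda cell: cell[1]))
    return output

old = [("a", 1, "apple"), ("b", 1, "banana")]
new = [("b", 2, "blueberry"), ("c", 2, "cherry")]
compacted = compact([old, new])
```

Because every input is already sorted, the merge reads each sstable sequentially and writes the output sequentially, and the old value of "b" disappears from disk.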
What is compaction? (cont.)
▪ This technique of keeping sorted files and merging them is
well-known and often called Log-Structured Merge (LSM) Tree
▪ Published in 1996; the earliest popular application that I know of is the Lucene search engine (1999)
o High-performance writes.
o Data is immediately readable.
o Reasonable read performance.
(Compaction efficiency requirements)
▪ Sstable merge is efficient
o Merging sorted sstables is cheap and uses contiguous I/O for both reads and writes
▪ Background compaction does not increase request tail latency
o Scylla breaks compaction work into small pieces
▪ Background compaction does not make request throughput fluctuate
o “Workload conditioning”: compaction runs no faster than needed
Compaction Strategy
▪ Which sstables to compact, and when?
▪ This is called the compaction strategy
▪ The goal of the strategy is low amplification:
o Avoid read requests needing many sstables.
• read amplification
o Avoid overwritten/deleted/expired data staying on disk.
o Avoid excessive temporary disk space needs (scary!)
• space amplification
o Avoid compacting the same data again and again.
• write amplification
Which compaction strategy shall I choose?
Strategy #1: Size-Tiered Compaction
▪ Cassandra’s oldest, and still default, compaction strategy
▪ Dates back to Google’s BigTable paper (2006)
o Idea used even earlier (e.g., Lucene, 1999)
Size-Tiered compaction strategy
▪ Each time enough data accumulates in the memtable, flush it to a
small sstable
▪ When several small sstables exist, compact them into one bigger
sstable
▪ When several bigger sstables exist, compact them into one very big
sstable
▪ …
▪ Each time one “size tier” has enough sstables, compact them into
one sstable in the (usually) next size tier
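The cascade above can be sketched as a small counter simulation. It assumes a threshold of 4 sstables per tier and no overwritten data (so a merged sstable is exactly as large as its inputs combined); `size_tiered_flush` and the unit-based accounting are illustrative, not Scylla's implementation.

```python
def size_tiered_flush(tiers, threshold=4):
    """Add one flushed sstable and cascade compactions upward.

    tiers[i] is the number of sstables currently in size tier i.
    Returns how much was written, in units of one flushed sstable,
    counting the flush itself and every compaction it triggers.
    """
    if not tiers:
        tiers.append(0)
    tiers[0] += 1
    written = 1  # the flush itself
    level = 0
    while tiers[level] >= threshold:
        # Merge `threshold` sstables of this tier into one sstable of the next tier.
        tiers[level] -= threshold
        if level + 1 == len(tiers):
            tiers.append(0)
        tiers[level + 1] += 1
        written += threshold ** (level + 1)  # size of the merged sstable
        level += 1
    return written

tiers = []
total_written = sum(size_tiered_flush(tiers) for _ in range(64))
```

After 64 flushes everything sits in a single sstable three tiers up, and each flushed unit has been written 4 times in total: once at flush time and once per tier.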
Size-Tiered compaction - amplification
▪ Write amplification: O(logN)
o Where “N” is (data size) / (flushed sstable size).
o Most data is in the highest tier, and it had to pass through O(logN) tiers to get there
o This is asymptotically optimal
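A rough derivation of the O(logN) bound, under the assumption that each compaction merges T similar-sized sstables (T is not named on the slide; T = 4 is a typical value), with s the flushed sstable size and D the total data size:

```latex
\[
\text{size of a tier-}i\text{ sstable} \approx T^{i}\, s,
\qquad
k \approx \log_T \frac{D}{s} = \log_T N,
\qquad
\text{write amplification} \approx 1 + k = O(\log N)
\]
```

Each byte is written once when flushed and once more for every tier it is promoted through, which gives the 1 + k; the commit log write comes on top of that.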
Size-Tiered compaction - amplification
What is read amplification? O(logN) sstables, but:
▪ If workload writes a partition once and never modifies it:
o Eventually each partition’s data will be compacted into one sstable
o In-memory bloom filter will usually allow reading only one sstable
o Optimal
▪ But if workload continues to update a partition:
o All sstables will contain updates to the same partition
o O(logN) reads per read request
o Reasonable, but not great
Size-Tiered compaction - amplification
▪ Space amplification:
o Obsolete data in a huge sstable will remain for a very long time
o Compaction needs a lot of temporary space:
• Worst-case, needs to merge all existing sstables into one and may need
half the disk to be empty for the merged result. (2x)
• Less of a problem in Scylla than Cassandra because of sharding
o When the workload is overwrite-intensive, it is even worse:
• We wait until there are 4 large sstables
• All 4 overwrote the same data, so the merged result is the same size as 1 sstable
• 5-fold space amplification! (4 inputs plus 1 output on disk, for 1 sstable’s worth of data)
• Or worse: if compaction is behind, the same data will sit in several
tiers, in sstables of unequal sizes
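The 2x and 5-fold figures above come out of the same arithmetic. A small sketch, assuming equally sized inputs and a hypothetical `overlap` parameter for how much of each later sstable overwrites data already present in the first:

```python
def peak_space_amplification(num_inputs, overlap):
    """Peak disk usage during one compaction, relative to the merged result.

    num_inputs: equally sized sstables being merged (size 1.0 each).
    overlap: fraction of each later sstable that overwrites data already
             in the first one (0.0 = all distinct, 1.0 = same keys).
    """
    merged = 1.0 + (num_inputs - 1) * (1.0 - overlap)
    peak = num_inputs * 1.0 + merged  # inputs stay on disk until the merge ends
    return peak / merged

no_overwrites = peak_space_amplification(4, 0.0)
all_overwrites = peak_space_amplification(4, 1.0)
```

With no overwrites the peak is twice the useful data; when all four sstables hold the same keys it is five times.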
Strategy #2: Leveled Compaction
▪ Introduced in Cassandra 1.0, in 2011.
▪ Based on Google’s LevelDB (itself based on Google’s BigTable)
▪ No longer has size-tiered’s huge sstables
▪ Instead, we have runs:
o A run is a collection of small (160 MB by default) SSTables
o The SSTables in a run have non-overlapping key ranges
o A huge SSTable must be rewritten as a whole, but in a run we can modify only
parts of it (individual sstables) while keeping the disjoint key requirement
▪ In leveled compaction strategy:
Leveled compaction strategy
[Figure: Level 0; Level 1 (run of 10 sstables); Level 2 (run of 100 sstables); ...]
(Leveled compaction strategy)
▪ SSTables are divided into “levels”:
o New SSTables (dumped from memtables) are created in Level 0
o Each other level is a run of SSTables of exponentially increasing size:
• Level 1 is a run of 10 SSTables (of 160 MB each)
• Level 2 is a run of 100 SSTables (of 160 MB each)
• etc.
▪ When we have enough (e.g., 4) sstables in Level 0, we compact
them with all 10 sstables in Level 1
o We don't create one large sstable - rather, a run: we write one sstable and
when we reach the size limit (160 MB), we start a new sstable
(Leveled compaction strategy)
▪ After the compaction of level 0 into level 1, level 1 may have more
than 10 sstables. We pick one and compact it into level 2:
o Take one sstable from level 1
o Look at its key range and find all sstables in level 2 which overlap with it
o Typically, there are about 12 of these
• The level 1 sstable spans roughly 1/10th of the keys, while each level 2
sstable spans 1/100th of the keys; so a level-1 sstable’s range roughly
overlaps 10 level-2 sstables plus two more on the edges
o As before, we compact the one sstable from level 1 and the 12 sstables from
level 2 and replace all of those with new sstables in level 2
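Finding the overlapping sstables in the next level can be sketched with two binary searches, since a run is sorted and its key ranges are disjoint. This assumes integer keys and (min_key, max_key) pairs as the only sstable metadata; `overlapping` is an illustrative helper, not Scylla's code.

```python
from bisect import bisect_left, bisect_right

def overlapping(run, first_key, last_key):
    """Return the sstables in `run` whose range intersects [first_key, last_key].

    `run` is a list of (min_key, max_key) pairs, sorted, with disjoint ranges.
    """
    min_keys = [lo for lo, _ in run]
    max_keys = [hi for _, hi in run]
    start = bisect_left(max_keys, first_key)  # first sstable ending at or after first_key
    end = bisect_right(min_keys, last_key)    # one past the last sstable starting at or before last_key
    return run[start:end]

# Level 2: 100 sstables, each covering 10 consecutive integer keys.
level2 = [(i * 10, i * 10 + 9) for i in range(100)]
# A level-1 sstable covering 1/10th of the keys, not aligned to level-2 boundaries.
picked = overlapping(level2, 205, 304)
```

In this toy layout the level-1 sstable touches 11 level-2 sstables (9 fully covered plus one partial on each edge), the same order of magnitude as the "about 12" estimate above.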
(Leveled compaction strategy)
▪ After this compaction of level 1 into level 2, we may now have
excess sstables in level 2, so we merge them into level 3. Again, one
sstable from level 2 will need to be compacted with the roughly 12
sstables from level 3 that overlap it.
Leveled compaction - amplification
▪ Space amplification:
o Because of the sstable counts, 90% of the data is in the deepest level (if full!)
o The sstables in a level do not overlap, so that level can’t contain duplicate data!
o So at most, 10% of the space is wasted
o Also, each compaction needs a constant (~12*160MB) amount of temporary space
o Nearly optimal
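The 90% figure follows from the level sizes alone. A quick check, assuming every level is full and each level holds 10 times as many sstables as the one above it (the helper name is made up for this sketch):

```python
def deepest_level_fraction(levels, fanout=10):
    """Fraction of all sstables that sit in the deepest level when every level is full."""
    counts = [fanout ** i for i in range(1, levels + 1)]  # L1 = 10, L2 = 100, ...
    return counts[-1] / sum(counts)

fraction = deepest_level_fraction(3)  # 1000 / (10 + 100 + 1000)
```

For three full levels this is about 0.90, so at most roughly 10% of the space can hold data that is duplicated in the deepest level.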
Leveled compaction - amplification
▪ Read amplification:
o We have O(N) tables!
o But in each level sstables have disjoint ranges (cached in memory)
o Worst-case, O(logN) sstables are relevant to a partition, plus the sstables in L0.
o Under some assumptions (updates of complete rows, of similar sizes), the
space amplification argument implies that 90% of the reads will need just one sstable!
o Nearly optimal
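Why a read touches at most one sstable per level can be sketched as follows, assuming each sstable is described only by an in-memory (min_key, max_key) range. `candidate_sstables` is an illustrative helper and ignores bloom filters.

```python
from bisect import bisect_right

def candidate_sstables(key, level0, levels):
    """Count the sstables that may hold `key`.

    level0: list of (min_key, max_key) ranges that may overlap each other.
    levels: list of runs; each run is a sorted list of disjoint ranges.
    """
    count = sum(1 for lo, hi in level0 if lo <= key <= hi)
    for run in levels:
        min_keys = [lo for lo, _ in run]
        i = bisect_right(min_keys, key) - 1  # last sstable starting at or before key
        if i >= 0 and key <= run[i][1]:
            count += 1  # disjoint ranges: at most one sstable per level
    return count

level0 = [(0, 999), (0, 999)]
level1 = [(i * 100, i * 100 + 99) for i in range(10)]
level2 = [(i * 10, i * 10 + 9) for i in range(100)]
needed = candidate_sstables(555, level0, [level1, level2])
```

There are 112 sstables in this example, but a read of key 555 only has 4 candidates: both L0 sstables plus one in each deeper level.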
Leveled compaction - amplification
▪ Write amplification:
o Again, most of the data is in the deepest level k
• E.g., k=3 is enough for 160 GB of data (per shard!)
• All data was written once in L0, then compacted into L1, … then to Lk
• So each row written k+1 times
o For each input sstable we compact (from level i ≥ 1), we compact it with ~12
overlapping sstables in level i+1, writing ~13 output sstables (fewer for L0)
o Worst-case, write amplification is around 13k
o Also O(logN), but with a higher constant factor than size-tiered...
o If there is enough writing that LCS can’t keep up, its read and space advantages are
lost
o If there are also cache-miss reads, they will get less disk bandwidth
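The numbers on this slide can be reproduced with a few lines of arithmetic, assuming 160 MB sstables, a fan-out of 10 between levels, and ~12 overlapping sstables per promotion. Both helper names are made up for this sketch.

```python
def levels_needed(data_mb, sstable_mb=160, fanout=10):
    """Smallest k such that level k (fanout**k sstables) can hold all the data."""
    k = 1
    while fanout ** k * sstable_mb < data_mb:
        k += 1
    return k

def worst_case_write_amplification(k, overlaps=12):
    """Each promotion rewrites the input sstable plus ~`overlaps` sstables below it."""
    return (overlaps + 1) * k

k = levels_needed(160_000)  # 160 GB per shard
worst = worst_case_write_amplification(k)
```

For 160 GB this gives k = 3 and a worst-case write amplification of about 39, compared with roughly 1 + logN for size-tiered.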
Example 1 - write-only workload
▪ Write-only workload
o cassandra-stress writing 30 million partitions (about 9 GB of data)
o Constant write rate 10,000 writes/second
o One shard
Example 1 - write-only workload
▪ Size-tiered compaction:
at some points needs twice the disk space
o In Scylla with many shards, the shards “usually” do not reach their maximum space use at the same time
▪ Leveled compaction:
more than double the amount of disk I/O
o Test used smaller-than-default sstables (10 MB) to illustrate the problem
o Same problem with the default sstable size (160 MB), with larger workloads
Example 1 (space amplification)
[Figure: space usage plot with two annotations: “x2 space amplification” and “constant multiple of flushed memtable & sstable size”]
Example 1 (write amplification)
▪ Amount of actual data collected: 8.8 GB
▪ Size-tiered compaction: 50 GB writes (4 tiers + commit log)
▪ Leveled compaction: 111 GB writes
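Expressed as write amplification factors (bytes written to disk divided by bytes of actual data), the measurements above work out as follows. Only the three figures from the slide are used.

```python
data_gb = 8.8
written_gb = {"size-tiered": 50, "leveled": 111}

# Write amplification = total bytes written / bytes of actual data.
amplification = {
    name: round(total / data_gb, 1) for name, total in written_gb.items()
}
```

That is roughly 5.7x for size-tiered and 12.6x for leveled, so leveled wrote a bit more than twice as much for the same data.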
Example 1 - note
▪ Leveled compaction’s write amplification is not only a problem with
100% write workloads...
▪ A workload can have just 10% writes and still see amplified write traffic so high
that
o Uncached reads are slowed down because the disk is busy writing
o Compaction can’t keep up, uncompacted sstables pile up, and reads get even slower
▪ Leveled compaction is unsuitable for many workloads with a
non-negligible amount of writes even if they seem “read mostly”
Can we create a new compaction strategy with
▪ Low write amplification of size-tiered compaction
▪ Without its high temporary disk space usage during compaction?
Strategy #3: Hybrid Compaction
▪ New in upcoming version of Scylla Enterprise
▪ Hybrid of Size-Tiered and Leveled strategies:
▪ Size-tiered compaction needs temporary space because we only
remove a huge sstable after we fully compact it.
▪ Let’s split each huge sstable into a run (a la LCS) of “fragments”:
o Treat the entire run (not individual sstables) as a file for STCS
o Remove individual sstables as compacted. Low temporary space.
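A simplified model of why fragments reduce temporary space. It assumes no overwritten data (the output is as large as the input), that all runs cover the same key range and are consumed in lockstep, and that a fragment is deleted as soon as it has been fully merged. Function names and the step granularity are illustrative, not Scylla's implementation.

```python
def peak_space_whole_sstables(num_inputs, sstable_size):
    """Inputs are deleted only once the merged output is complete."""
    total_input = num_inputs * sstable_size
    return total_input + total_input  # all inputs plus the full output

def peak_space_fragmented(num_runs, fragments_per_run, fragment_size,
                          steps_per_fragment=4):
    """Inputs are runs of fragments; a fragment is deleted once fully merged."""
    total_input = num_runs * fragments_per_run * fragment_size
    peak = total_input
    for step in range(1, fragments_per_run * steps_per_fragment + 1):
        # Bytes merged so far from each run (all runs advance in lockstep).
        read_per_run = step * fragment_size // steps_per_fragment
        # Fragments that were already released before this step began.
        deleted_per_run = (step - 1) // steps_per_fragment
        on_disk_input = num_runs * (fragments_per_run - deleted_per_run) * fragment_size
        output = num_runs * read_per_run
        peak = max(peak, on_disk_input + output)
    return peak

whole = peak_space_whole_sstables(4, 1600)      # four 1600 MB sstables
fragmented = peak_space_fragmented(4, 10, 160)  # four runs of ten 160 MB fragments
```

In this model, merging 6400 MB of input peaks at 12800 MB with whole sstables but only 7040 MB with fragments: the extra space is 4 fragments of 160 MB rather than a full copy of the data.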
Strategy #3: Hybrid Compaction
▪ Solve 4x worst-case in overwrite workloads with other techniques:
o Compact fewer sstables if the disk is getting full
• Not a risk, because temporary disk needs are small
o Compact fewer sstables if they have large overlaps
Hybrid compaction - amplification
▪ Space amplification:
o Small constant temporary space needs, even smaller than LCS
(M*S per parallel compaction, e.g., M=4, S=160 MB)
o Overwrite-mostly still a worst-case, but 2-fold instead of 5-fold
o Optimal.
▪ Write amplification:
o O(logN), small constant — same as Size-Tiered compaction
▪ Read amplification:
o Like Size-Tiered, at worst O(logN) if updating the same partitions
Example 1, with Hybrid compaction strategy
Example 2 - overwrite workload
▪ Write the same 4 million partitions 15 times
o cassandra-stress write n=4000000 -pop seq=1..4000000 -schema
"replication(strategy=org.apache.cassandra.locator.SimpleStrategy,factor=1)"
o In this test cassandra-stress is not rate limited
o Again, small (10 MB) LCS sstables
▪ Necessary amount of sstable data: 1.2 GB
▪ STCS space amplification: x7.7 !
▪ LCS space amplification lower, constant multiple of sstable size
▪ Hybrid will be around x2
Example 2
[Figure: space usage plot, annotated “x7.7 space amplification”]
Example 3 - read+updates workload
▪ When workloads are read-mostly, read amplification is important
▪ When workloads also have updates to existing partitions
o With STCS, each partition ends up in multiple sstables
o Read amplification
▪ An example to simulate this:
o Do a write-only update workload
• cassandra-stress write n=4000000 -pop seq=1..1000000
o Now run a read-only workload
• cassandra-stress read n=1000000 -pop seq=1..1000000
• Measure the average number of bytes read per request
Example 3 - read+updates workload
▪ Size-tiered: 46,915 bytes read per request
o Optimal after major compaction - 11,979
▪ Leveled: 11,982
o Equal to optimal because in this case all sstables fit in L1...
▪ Increasing the number of partitions 8-fold:
o Size-tiered: 29,794 (luckier this time)
o Leveled: 16,713 (unlucky: 0.5 of the data, not 0.9, is in L2)
▪ BUT: Remember that if we have a non-negligible amount of writes,
LCS write amplification may slow down reads
Example 3, and major compaction
▪ We saw that size-tiered major compaction reduces read
amplification
▪ It also reduces space amplification (expired/overwritten data)
▪ Major compaction only makes sense if there are very few writes
o But in that case, LCS’s write amplification is not a problem!
o So LCS is recommended instead of major compaction
• Easier to use
• No huge operations like major compaction (which you need to decide when to run)
• No 50%-free-disk worst-case requirement
• Good read amplification and space amplification
Why major compaction? Is it suboptimal?
(from STCS perspective)
▪ STCS is quite inefficient / slow at getting rid of obsolete data
(droppable tombstones, shadowed data).
o For droppable tombstones, there’s tombstone compaction. Suboptimal though.
o For shadowed (overwritten) data, there’s nothing to do. Just wait for the data and
the obsolete data to be compacted together after they reach the same tier.
Tombstone compaction
▪ Triggered when standard compaction has nothing to do
▪ Tombstone compaction selects an sstable whose percentage of
droppable tombstones is higher than N% and hopes space will be
released.
▪ That’s suboptimal though…
▪ A tombstone cannot be purged unless it’s compacted with the data it
deletes/shadows.
▪ CASSANDRA-7019 suggests improving the feature by compacting an
sstable with older overlapping sstables. That will be inefficient
with STCS though. What can we do instead?
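Why compacting a single sstable on its own often releases little space can be sketched like this. The model is deliberately simplified: sstables are dicts mapping a key to a (kind, timestamp) pair, every tombstone is assumed to be old enough to drop (grace periods are ignored), and `compact_single` is a made-up helper.

```python
def compact_single(sstable, others):
    """Rewrite one sstable alone, dropping a tombstone only when it is safe.

    sstable / others: dicts mapping key -> ("data" | "tombstone", timestamp).
    A tombstone must be kept while any sstable outside the compaction still
    holds older data for the same key, or that data would come back to life.
    """
    output = {}
    for key, (kind, ts) in sstable.items():
        if kind == "tombstone":
            shadows_elsewhere = any(
                key in other and other[key][1] < ts for other in others
            )
            if not shadows_elsewhere:
                continue  # nothing left to shadow: the tombstone can be purged
        output[key] = (kind, ts)
    return output

chosen = {"a": ("tombstone", 10), "b": ("tombstone", 10), "c": ("data", 5)}
older = [{"a": ("data", 3)}]
result = compact_single(chosen, older)
```

The tombstone for "b" is purged, but the one for "a" survives because the data it deletes lives in an older sstable that was not part of the compaction.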
Making improved tombstone compaction efficient with hybrid
▪ Hybrid can choose a fragment from the high tiers and compact it with
all overlapping fragments from sstable runs of the same tier or above.
▪ All sstable run(s) involved will have their (often only one) fragment
replaced by another with: (LIVE DATA) – (SHADOWED DATA) –
(DROPPABLE TOMBSTONES)
▪ Temporary space requirement of N * fragment size, N = number of
fragments involved
▪ Make it optional for regular scenarios, but use it when running out of
disk space.
Hybrid tombstone compaction - Example
[Figure sequence: a diagram of sstable runs and their fragments, in three steps]
▪ Choose an sstable run fragment with N% of droppable tombstones
▪ Include *older* fragment(s) that overlap with the one previously chosen
▪ Replace fragments by ones without shadowed data and droppable tombstones
Making hybrid take action when lots of duplicate data waste disk space
▪ Compact fewer tables of the same tier if they contain lots of duplicate
data. Affects only overwrite-intensive workloads.
▪ Cardinality information may help us estimate duplication
between tables. It works only at the partition level though…
▪ Nadav came up with the idea of doing a compaction sample to help
with estimation at the clustering level. Works due to the murmur
tokenizer.
▪ In the worst case (running out of space), Hybrid can afford to compact
the biggest tiers together to get rid of all obsolete data, with a low
temporary space requirement.
Conclusion on this hybrid strategy topic
▪ Goal is to have hybrid do the cleanup job itself rather than relying
on the sysadmin to run a manual (major) compaction at an interval.
▪ Hybrid can make smart decisions due to its nature: non-aggressive,
incremental steps towards improving space amplification without
hurting system performance like major compaction does.
▪ Trying to bring the best of both worlds.
Strategy #4: Time-Window Compaction
▪ Introduced in Cassandra 3.0.8, designed for time-series data
▪ Replaces Date-Tiered compaction strategy of Cassandra 2.1
(which is also supported by Scylla, but not recommended)
Time-Window compaction strategy (cont.)
In a time-series use case:
▪ Clustering key and write time are correlated
▪ Data is added in time order. There are only a few out-of-order writes, typically
off by just a few seconds
▪ Data is only deleted through expiration (TTL) or by deleting an
entire partition; usually all the data has the same TTL
▪ The rate at which data is written is nearly constant
▪ A query is a clustering-key range query on a given partition
Most common query: "values from the last hour/day/week"
Time-Window compaction strategy (cont.)
▪ Scylla remembers in memory the minimum and maximum
clustering key in each newly-flushed sstable
o Efficiently find only the sstables with data relevant to a query
▪ Other compaction strategies
o Destroy this feature by merging “old” and “new” sstables
o Move all rows of a partition to the same sstable…
• But time series queries don’t need all rows of a partition, just rows in a
given time range
• Makes it impossible to expire old sstables when everything in them has
expired
• Read and write amplification (needless compactions)
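The min/max clustering key filter, and how merging old and new sstables defeats it, can be sketched as below. Sstables are modeled as (name, min clustering key, max clustering key) tuples with clustering keys in hours; all names are illustrative.

```python
def sstables_for_query(sstables, query_start, query_end):
    """Keep only sstables whose [min_ck, max_ck] range intersects the query range."""
    return [
        name for name, min_ck, max_ck in sstables
        if min_ck <= query_end and max_ck >= query_start
    ]

# Clustering keys are hours since some starting point; one sstable per day.
by_day = [("day1", 0, 23), ("day2", 24, 47), ("day3", 48, 71)]
# After merging old and new data, a single sstable spans all three days.
merged = [("all", 0, 71)]

recent_from_days = sstables_for_query(by_day, 50, 60)
recent_from_merged = sstables_for_query(merged, 50, 60)
```

With per-day sstables a query for hours 50 to 60 only needs "day3"; after the merge it must open the one sstable that holds everything.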
Time-Window compaction strategy (cont.)
So TWCS:
▪ Divides time into “time windows”
o E.g., if typical query asks for 1 day of data, choose a time window of 1 day
▪ Divides sstables into time buckets, according to the time window
▪ Compacts using the Size-Tiered strategy inside each time bucket
o If the 2-day-old window has just one big sstable and a repair creates an
additional tiny “old” sstable, the two will not get compacted
o A tradeoff: slows reads but avoids the write amplification problem of DTCS
▪ When a time bucket exits the current window, does a major
compaction on it
o Except for small repair-produced sstables, we get 1 sstable per time window
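The bucketing step can be sketched as follows. It assumes each sstable is assigned to a window by the newest write timestamp it contains, with timestamps in seconds; `bucket_by_window` is an illustrative helper rather than the actual TWCS code.

```python
from collections import defaultdict

def bucket_by_window(sstables, window_seconds):
    """Group sstables into time buckets by the newest write timestamp they contain."""
    buckets = defaultdict(list)
    for name, max_timestamp in sstables:
        # Round the timestamp down to the start of its window.
        window_start = max_timestamp - max_timestamp % window_seconds
        buckets[window_start].append(name)
    return dict(buckets)

DAY = 24 * 3600
sstables = [("a", 100), ("b", 5000), ("c", DAY + 10), ("d", 2 * DAY + 7)]
buckets = bucket_by_window(sstables, DAY)
```

Size-tiered compaction then runs separately inside each bucket, so "a" and "b" may be merged with each other but never with "c" or "d".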
Summary
Workload | Size-Tiered | Leveled | Hybrid | Time-Window
Write-only | 2x peak space | 2x writes | Best | -
Overwrite | Huge peak space | write amplification | high peak space, but not like size-tiered | -
Read-mostly, few updates | read amplification | Best | read amplification | -
Read-mostly, but a lot of updates | read and space amplification | write amplification may overwhelm | read amplification | -
Time series | write, read, and space ampl. | write and space amplification | write and read amplification | Best
THANK YOU
nyh@scylladb.com
Please stay in touch
Any questions?

More Related Content

What's hot

Architecting for AWS
Architecting for AWSArchitecting for AWS
Architecting for AWS
Amazon Web Services
 
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Amazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
Amazon Web Services
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS Cloud
Amazon Web Services
 
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Amazon Web Services
 
AWS Direct Connect & VPN's - Pop-up Loft Tel Aviv
AWS Direct Connect & VPN's - Pop-up Loft Tel AvivAWS Direct Connect & VPN's - Pop-up Loft Tel Aviv
AWS Direct Connect & VPN's - Pop-up Loft Tel Aviv
Amazon Web Services
 
Cloud assessment approach
Cloud assessment approachCloud assessment approach
Cloud assessment approach
Balkrishna Heroor
 
AWS Technical Due Diligence Workshop Session One
AWS Technical Due Diligence Workshop Session OneAWS Technical Due Diligence Workshop Session One
AWS Technical Due Diligence Workshop Session One
Tom Laszewski
 
Failure is not an Option - Designing Highly Resilient AWS Systems
Failure is not an Option - Designing Highly Resilient AWS SystemsFailure is not an Option - Designing Highly Resilient AWS Systems
Failure is not an Option - Designing Highly Resilient AWS Systems
Amazon Web Services
 
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech TalksCloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Amazon Web Services
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon Web Services
 
深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具
Amazon Web Services
 
Apache solr教學介紹 20150501
Apache solr教學介紹 20150501Apache solr教學介紹 20150501
Apache solr教學介紹 20150501
Yung-Ting Chen
 
Auto Scaling on AWS
Auto Scaling on AWSAuto Scaling on AWS
Auto Scaling on AWS
AustinWebArch
 
Best Practices for Database Migration to the Cloud: Improve Application Perfo...
Best Practices for Database Migration to the Cloud: Improve Application Perfo...Best Practices for Database Migration to the Cloud: Improve Application Perfo...
Best Practices for Database Migration to the Cloud: Improve Application Perfo...
Amazon Web Services
 
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
Amazon Web Services
 
Graph Databases - RedisGraph and RedisInsight
Graph Databases - RedisGraph and RedisInsightGraph Databases - RedisGraph and RedisInsight
Graph Databases - RedisGraph and RedisInsight
Md. Farhan Memon
 
Databases on AWS: Scaling Applications & Modern Data Architectures
Databases on AWS: Scaling Applications & Modern Data ArchitecturesDatabases on AWS: Scaling Applications & Modern Data Architectures
Databases on AWS: Scaling Applications & Modern Data Architectures
Amazon Web Services
 
Journey Through The Cloud - Security Best Practices
Journey Through The Cloud - Security Best Practices Journey Through The Cloud - Security Best Practices
Journey Through The Cloud - Security Best Practices
Amazon Web Services
 
Automating document analysis and text extraction with Amazon Textract - AIM20...
Automating document analysis and text extraction with Amazon Textract - AIM20...Automating document analysis and text extraction with Amazon Textract - AIM20...
Automating document analysis and text extraction with Amazon Textract - AIM20...
Amazon Web Services
 

What's hot (20)

Architecting for AWS
Architecting for AWSArchitecting for AWS
Architecting for AWS
 
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS Cloud
 
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
 
AWS Direct Connect & VPN's - Pop-up Loft Tel Aviv
AWS Direct Connect & VPN's - Pop-up Loft Tel AvivAWS Direct Connect & VPN's - Pop-up Loft Tel Aviv
AWS Direct Connect & VPN's - Pop-up Loft Tel Aviv
 
Cloud assessment approach
Cloud assessment approachCloud assessment approach
Cloud assessment approach
 
AWS Technical Due Diligence Workshop Session One
AWS Technical Due Diligence Workshop Session OneAWS Technical Due Diligence Workshop Session One
AWS Technical Due Diligence Workshop Session One
 
Failure is not an Option - Designing Highly Resilient AWS Systems
Failure is not an Option - Designing Highly Resilient AWS SystemsFailure is not an Option - Designing Highly Resilient AWS Systems
Failure is not an Option - Designing Highly Resilient AWS Systems
 
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech TalksCloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
 
Data Protection in a Connected World: Sovereignty and Cyber Security
anupriti
 

Recently uploaded (20)

Dev Dives: Mining your data with AI-powered Continuous Discovery
Dev Dives: Mining your data with AI-powered Continuous DiscoveryDev Dives: Mining your data with AI-powered Continuous Discovery
Dev Dives: Mining your data with AI-powered Continuous Discovery
 
Chapter 3 - Static Testing (Review) V4.0
Chapter 3 - Static Testing (Review) V4.0Chapter 3 - Static Testing (Review) V4.0
Chapter 3 - Static Testing (Review) V4.0
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
MYIR Product Brochure - A Global Provider of Embedded SOMs & Solutions
MYIR Product Brochure - A Global Provider of Embedded SOMs & SolutionsMYIR Product Brochure - A Global Provider of Embedded SOMs & Solutions
MYIR Product Brochure - A Global Provider of Embedded SOMs & Solutions
 
Summer24-ReleaseOverviewDeck - Stephen Stanley 27 June 2024.pdf
Summer24-ReleaseOverviewDeck - Stephen Stanley 27 June 2024.pdfSummer24-ReleaseOverviewDeck - Stephen Stanley 27 June 2024.pdf
Summer24-ReleaseOverviewDeck - Stephen Stanley 27 June 2024.pdf
 
Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
 
Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
Database Management Myths for Developers
Database Management Myths for DevelopersDatabase Management Myths for Developers
Database Management Myths for Developers
 
Blockchain and Cyber Defense Strategies in new genre times
Blockchain and Cyber Defense Strategies in new genre timesBlockchain and Cyber Defense Strategies in new genre times
Blockchain and Cyber Defense Strategies in new genre times
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
Metadata Lakes for Next-Gen AI/ML - Datastrato
Metadata Lakes for Next-Gen AI/ML - DatastratoMetadata Lakes for Next-Gen AI/ML - Datastrato
Metadata Lakes for Next-Gen AI/ML - Datastrato
 
Supercomputing from the Desktop Workstation
Supercomputingfrom the Desktop WorkstationSupercomputingfrom the Desktop Workstation
Supercomputing from the Desktop Workstation
 
ASIMOV: Enterprise RAG at Dialog Axiata PLC
ASIMOV: Enterprise RAG at Dialog Axiata PLCASIMOV: Enterprise RAG at Dialog Axiata PLC
ASIMOV: Enterprise RAG at Dialog Axiata PLC
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
Data Protection in a Connected World: Sovereignty and Cyber Security
Data Protection in a Connected World: Sovereignty and Cyber SecurityData Protection in a Connected World: Sovereignty and Cyber Security
Data Protection in a Connected World: Sovereignty and Cyber Security
 

Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the Wrong Compaction Strategy

  • 1. SCYLLA’S COMPACTION STRATEGIES OR HOW TO RUIN YOUR WORKLOAD'S PERFORMANCE BY CHOOSING THE WRONG COMPACTION STRATEGY. Nadav Har’El, Raphael Carvalho
  • 2. Nadav Har’El
       Nadav Har’El has had a diverse 20-year career in computer programming and computer science. In the past he worked on scientific computing, networking software, and information retrieval. In recent years his focus has been on virtualization and operating systems. He also worked on nested virtualization and exit-less I/O in KVM. Today, he maintains the OSv kernel and also works on Seastar and Scylla.
  • 3. Raphael Carvalho
       Raphael S. Carvalho is a computer programmer who loves file systems and has developed a huge interest in distributed systems since he started working on Scylla. Previously, he worked on ZFS support for OSv and also drivers for the Syslinux project. At ScyllaDB, Raphael has been mostly working on compaction and compaction strategies.
  • 4. Agenda
       ▪ What is compaction?
       ▪ Scylla’s compaction strategies:
         o Size Tier
         o Leveled
         o Hybrid
         o Date Tier
         o Time Window
       ▪ Which should I use for my workload and why?
       ▪ Examples!
  • 5. What is compaction? Scylla’s write path: [diagram: writes, commit log, compaction]
  • 6. (What is compaction?)
       ▪ Scylla’s write path:
         o Updates are inserted into a memory table (“memtable”)
         o Memtables are periodically flushed to a new sorted file (“sstable”)
       ▪ After a while, we have many separate sstables
         o Different sstables may contain old and new values of the same cell
         o Or different rows in the same partition
         o Wastes disk space
         o Slows down reads
       ▪ Compaction: read several sstables and output one (or more) containing the merged and most recent information
  • 7. What is compaction? (cont.)
       ▪ This technique of keeping sorted files and merging them is well-known and often called a Log-Structured Merge (LSM) Tree
       ▪ Published in 1996; the earliest popular application that I know of is the Lucene search engine, 1999
         o High-performance writes.
         o Written data is immediately readable.
         o Reasonable read performance.
  • 8. (Compaction efficiency requirements)
       ▪ Sstable merge is efficient
         o Merging sorted sstables is efficient, with contiguous I/O for both read and write
       ▪ Background compaction does not increase request tail-latency
         o Scylla breaks compaction work into small pieces
       ▪ Background compaction does not cause request throughput to fluctuate
         o “Workload conditioning”: compaction is done no faster than needed
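The merge step at the heart of compaction can be sketched in Python. This is a minimal illustration, not Scylla's implementation: it assumes each sstable is an in-memory list of (key, timestamp, value) tuples sorted by key, and the function name `compact` is hypothetical. Because every input is already sorted, a streaming k-way merge reads each one sequentially, which is what makes the I/O contiguous.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(sstables):
    """Merge sorted sstables into one sorted sstable, keeping the newest value per key.

    Each sstable is a list of (key, timestamp, value) tuples sorted by key.
    (Hypothetical in-memory model; tombstones and TTLs are not handled.)
    """
    # Streaming k-way merge: each input is consumed front to back, once.
    merged = heapq.merge(*sstables, key=itemgetter(0))
    output = []
    for _, versions in groupby(merged, key=itemgetter(0)):
        # Several sstables may hold the same key; the highest timestamp wins.
        output.append(max(versions, key=itemgetter(1)))
    return output


old = [("a", 1, "a1"), ("b", 1, "b1")]
new = [("b", 2, "b2"), ("c", 2, "c2")]
result = compact([old, new])
```

Here `result` keeps `"a"` and `"c"` unchanged and only the newer version of `"b"`, so the obsolete value no longer takes up space or has to be consulted on reads.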
  • 9. Compaction Strategy
       ▪ Which sstables to compact, and when?
       ▪ This is called the compaction strategy
       ▪ The goal of the strategy is low amplification:
         o Avoid read requests needing many sstables.
           • read amplification
         o Avoid overwritten/deleted/expired data staying on disk.
         o Avoid excessive temporary disk space needs (scary!)
           • space amplification
         o Avoid compacting the same data again and again.
           • write amplification
       Which compaction strategy shall I choose?
  • 10. Strategy #1: Size-Tiered Compaction
       ▪ Cassandra’s oldest, and still default, compaction strategy
       ▪ Dates back to Google’s BigTable paper (2006)
         o Idea used even earlier (e.g., Lucene, 1999)
  • 11. Size-Tiered compaction strategy
  • 12. (Size-Tiered compaction strategy)
       ▪ Each time enough data is in the memory table, flush it to a small sstable
       ▪ When several small sstables exist, compact them into one bigger sstable
       ▪ When several bigger sstables exist, compact them into one very big sstable
       ▪ …
       ▪ Each time one “size tier” has enough sstables, compact them into one sstable in the (usually) next size tier
  • 13. Size-Tiered compaction - amplification
       ▪ Write amplification: O(logN)
         o Where “N” is (data size) / (flushed sstable size).
         o Most data is in the highest tier, and needed to pass through O(logN) tiers to get there
         o This is asymptotically optimal
  • 14. Size-Tiered compaction - amplification
       What is read amplification? O(logN) sstables, but:
       ▪ If the workload writes a partition once and never modifies it:
         o Eventually each partition’s data will be compacted into one sstable
         o The in-memory bloom filter will usually allow reading only one sstable
         o Optimal
       ▪ But if the workload continues to update a partition:
         o All sstables will contain updates to the same partition
         o O(logN) reads per read request
         o Reasonable, but not great
  • 15. Size-Tiered compaction - amplification ▪ Space amplification
  • 16. Size-Tiered compaction - amplification
       ▪ Space amplification:
         o Obsolete data in a huge sstable will remain for a very long time
         o Compaction needs a lot of temporary space:
           • Worst-case, it needs to merge all existing sstables into one, and may need half the disk to be empty for the merged result (2x)
           • Less of a problem in Scylla than in Cassandra because of sharding
         o When the workload is overwrite-intensive, it is even worse:
           • We wait until there are 4 large sstables
           • All 4 overwrote the same data, so the merged amount is the same as in 1 sstable
           • 5-fold space amplification!
           • Or worse: if compaction is behind, the same data will sit in several tiers, in sstables of unequal sizes
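The O(logN) write amplification and the 5-fold overwrite case can be checked with a toy model. The sketch below is an assumption-laden simplification, not Scylla's algorithm: flushed sstables all have size 1, a tier is compacted exactly when it holds `threshold` sstables (4, as in the slides), and nothing is overwritten, so a merged sstable is as large as the sum of its inputs. Both function names are hypothetical.

```python
def simulate_size_tiered(num_flushes, threshold=4):
    """Return total units written for num_flushes unit-size flushes (toy STCS model)."""
    tiers = [0]   # tiers[i] = number of sstables of size threshold**i
    written = 0
    for _ in range(num_flushes):
        written += 1          # the memtable flush itself
        tiers[0] += 1
        level = 0
        # Cascade: a full tier is merged into one sstable of the next tier.
        while tiers[level] == threshold:
            tiers[level] = 0
            written += threshold ** (level + 1)   # size of the merged output
            level += 1
            if level == len(tiers):
                tiers.append(0)
            tiers[level] += 1
    return written


def overwrite_peak_space_factor(threshold=4):
    """Peak disk use, in multiples of the live data, when every sstable holds the same data.

    The inputs stay on disk until the merged output (as large as one input)
    is fully written.
    """
    return threshold + 1


flushes = 4 ** 4                       # N = 256
total_written = simulate_size_tiered(flushes)
write_amplification = total_written / flushes
```

With N = 256 the data passes through log4(256) = 4 tiers after the initial flush, so each unit is written 5 times in this model; `overwrite_peak_space_factor()` reproduces the 5-fold figure from the slide.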
  • 17. Strategy #2: Leveled Compaction
       ▪ Introduced in Cassandra 1.0, in 2011.
       ▪ Based on Google’s LevelDB (itself based on Google’s BigTable)
       ▪ No longer has size-tiered’s huge sstables
       ▪ Instead we have runs:
         o A run is a collection of small (160 MB by default) SSTables
         o The SSTables in a run have non-overlapping key ranges
         o A huge SSTable must be rewritten as a whole, but in a run we can modify only parts of it (individual sstables) while keeping the disjoint key requirement
       ▪ In leveled compaction strategy:
  • 18. Leveled compaction strategy [diagram: Level 0; Level 1 (run of 10 sstables); Level 2 (run of 100 sstables); ...]
  • 19. (Leveled compaction strategy)
       ▪ SSTables are divided into “levels”:
         o New SSTables (dumped from memtables) are created in Level 0
         o Each other level is a run of SSTables of exponentially increasing size:
           • Level 1 is a run of 10 SSTables (of 160 MB each)
           • Level 2 is a run of 100 SSTables (of 160 MB each)
           • etc.
       ▪ When we have enough (e.g., 4) sstables in Level 0, we compact them with all 10 sstables in Level 1
         o We don't create one large sstable - rather, a run: we write one sstable and when we reach the size limit (160 MB), we start a new sstable
  • 20. (Leveled compaction strategy)
       ▪ After the compaction of level 0 into level 1, level 1 may have more than 10 sstables. We pick one and compact it into level 2:
         o Take one sstable from level 1
         o Look at its key range and find all sstables in level 2 which overlap with it
         o Typically, there are about 12 of these
           • The level 1 sstable spans roughly 1/10th of the keys, while each level 2 sstable spans 1/100th of the keys; so a level-1 sstable’s range roughly overlaps 10 level-2 sstables plus two more on the edges
         o As before, we compact the one sstable from level 1 and the 12 sstables from level 2 and replace all of those with new sstables in level 2
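The "about 12 overlapping sstables" estimate can be reproduced with a small key-range sketch. Everything concrete here is an assumption chosen for illustration: integer keys 0..999, a level 2 made of 100 equal sstables, and a level-1 sstable whose range is not aligned to level-2 boundaries. The helper name `overlapping` is hypothetical.

```python
def overlapping(candidate, level):
    """Return the sstables in `level` whose [first, last] key range intersects `candidate`."""
    lo, hi = candidate
    return [(first, last) for first, last in level if first <= hi and last >= lo]


# Level 2 as a run of 100 sstables with disjoint ranges of 10 keys each.
level2 = [(i * 10, i * 10 + 9) for i in range(100)]

# A level-1 sstable covering roughly 1/10th of the key space, unaligned:
# it fully covers 10 level-2 sstables and partially covers one on each edge.
level1_sstable = (95, 204)

to_compact = overlapping(level1_sstable, level2)
```

`to_compact` holds 12 level-2 sstables: the ten that lie entirely inside the range plus the two that straddle its ends.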
  • 21. (Leveled compaction strategy)
       ▪ After this compaction of level 1 into level 2, we may now have excess sstables in level 2, so we merge them into level 3. Again, one sstable from level 2 will need to be compacted with around 10 sstables from level 3.
  • 22. Leveled compaction - amplification
       ▪ Space amplification:
         o Because of the sstable counts, 90% of the data is in the deepest level (if full!)
         o The sstables in that level do not overlap, so it can’t contain duplicate data!
         o So at most, 10% of the space is wasted
         o Also, each compaction needs a constant (~12*160MB) amount of temporary space
         o Nearly optimal
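The 90% figure follows from the level sizes growing by a factor of 10. A quick check, under the simplifying assumptions that every level is full, all sstables have the same size, and Level 0 is ignored (the function name is hypothetical):

```python
def deepest_level_fraction(levels, fanout=10):
    """Fraction of all sstables that sit in the deepest level when every level is full."""
    counts = [fanout ** i for i in range(1, levels + 1)]   # 10, 100, 1000, ...
    return counts[-1] / sum(counts)


fraction_with_3_levels = deepest_level_fraction(3)   # 1000 / 1110
```

With three full levels the deepest one holds 1000 of 1110 sstables, just over 90%, which is why at most about 10% of the space can be duplicates.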
  • 23. Leveled compaction - amplification
       ▪ Read amplification:
         o We have O(N) tables!
         o But in each level, sstables have disjoint ranges (cached in memory)
         o Worst-case, O(logN) sstables are relevant to a partition - plus the L0 size.
         o Under some assumptions (updates of complete rows, of similar sizes), the space amplification bound implies that 90% of the reads will need just one sstable!
         o Nearly optimal
  • 24. Leveled compaction - amplification ▪ Write amplification:
  • 25. Leveled compaction - amplification
       ▪ Write amplification:
         o Again, most of the data is in the deepest level k
           • E.g., k=3 is enough for 160 GB of data (per shard!)
           • All data was written once in L0, then compacted into L1, … then to Lk
           • So each row is written k+1 times
         o For each input sstable (level i>1) we compact, we compact it with ~12 overlapping sstables in level i+1, writing ~13 output sstables (lower for L0)
         o Worst-case, write amplification is around 13k
         o Also O(logN), but with a higher constant factor than size-tiered...
         o If there is enough writing that LCS can’t keep up, its read and space advantages are lost
         o If there are also cache-miss reads, they will get less disk bandwidth
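The arithmetic behind "k=3 is enough for 160 GB" and "around 13k" can be written out as follows. This is a back-of-the-envelope sketch: it treats 160 GB as 160,000 MB, assumes level i holds 10^i sstables of 160 MB, and uses hypothetical function names.

```python
def levels_needed(data_mb, sstable_mb=160, fanout=10):
    """Smallest k such that level k (fanout**k sstables) can hold data_mb."""
    k = 1
    while sstable_mb * fanout ** k < data_mb:
        k += 1
    return k


def worst_case_write_amplification(k, outputs_per_compaction=13):
    """Each of the k level transitions rewrites ~13 sstables per input sstable."""
    return outputs_per_compaction * k


k = levels_needed(160_000)                     # 160 GB per shard
worst_case = worst_case_write_amplification(k)
```

Level 1 holds 1.6 GB, level 2 holds 16 GB and level 3 holds 160 GB, so `k` is 3 and the worst-case bound is about 39 writes per byte. The measured figure in Example 1 is much lower than this worst case.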
  • 26. Example 1 - write-only workload
       ▪ Write-only workload
         o Cassandra-stress writing 30 million partitions (about 9 GB of data)
         o Constant write rate of 10,000 writes/second
         o One shard
  • 27. Example 1 - write-only workload
       ▪ Size-tiered compaction: at some points needs twice the disk space
         o In Scylla with many shards, “usually” the shards do not hit their maximum space use concurrently
       ▪ Leveled compaction: more than double the amount of disk I/O
         o The test used smaller-than-default sstables (10 MB) to illustrate the problem
         o Same problem with the default sstable size (160 MB) - with larger workloads
  • 28. Example 1 (space amplification) [chart; annotations: “x2 space amplification” and “constant multiple of flushed memtable & sstable size”]
  • 29. Example 1 (write amplification)
       ▪ Amount of actual data collected: 8.8 GB
       ▪ Size-tiered compaction: 50 GB writes (4 tiers + commit log)
       ▪ Leveled compaction: 111 GB writes
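Dividing the measured writes by the amount of data gives the write amplification each strategy showed in this test. The snippet only restates the numbers from the slide; the variable names are illustrative.

```python
data_gb = 8.8
writes_gb = {"size-tiered": 50, "leveled": 111}

# Write amplification = bytes written to disk / bytes of actual data.
amplification = {name: round(total / data_gb, 1) for name, total in writes_gb.items()}

# How much more disk I/O leveled compaction did than size-tiered.
leveled_vs_size_tiered = round(writes_gb["leveled"] / writes_gb["size-tiered"], 2)
```

Size-tiered comes out at roughly 5.7x and leveled at roughly 12.6x, i.e. leveled wrote a little over twice as much as size-tiered, matching the "more than double the amount of disk I/O" remark.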
  • 30. Example 1 - note
       ▪ Leveled compaction’s write amplification is not only a problem with 100% writes...
       ▪ Even with just 10% writes, the amplified write workload can be so high that
         o Uncached reads are slowed down because the disk is busy writing
         o Compaction can’t keep up and uncompacted sstables pile up, making reads even slower
       ▪ Leveled compaction is unsuitable for many workloads with a non-negligible amount of writes, even if they seem “read mostly”
  • 31. Can we create a new compaction strategy with
       ▪ The low write amplification of size-tiered compaction
       ▪ But without its high temporary disk space usage during compaction?
  • 32. Strategy #3: Hybrid Compaction
       ▪ New in upcoming version of Scylla Enterprise
       ▪ Hybrid of Size-Tiered and Leveled strategies:
  • 33. Strategy #3: Hybrid Compaction
       ▪ Size-tiered compaction needs temporary space because we only remove a huge sstable after we fully compact it.
       ▪ Let’s split each huge sstable into a run (a la LCS) of “fragments”:
         o Treat the entire run (not individual sstables) as a file for STCS
         o Remove individual sstables as they are compacted. Low temporary space.
  • 34. Strategy #3: Hybrid Compaction
       ▪ Solve 4x worst-case in overwrite workloads with other techniques:
         o Compact fewer sstables if disk is getting full
           • Not a risk because of the small temporary disk needs
         o Compact fewer sstables if they have large overlaps
  • 35. Hybrid compaction - amplification
       ▪ Space amplification:
         o Small constant temporary space needs, even smaller than LCS (M*S per parallel compaction, e.g., M=4, S=160 MB)
         o Overwrite-mostly still a worst-case, but 2-fold instead of 5-fold
         o Optimal.
       ▪ Write amplification:
         o O(logN), small constant, same as Size-Tiered compaction
       ▪ Read amplification:
         o Like Size-Tiered, at worst O(logN) if updating the same partitions
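To put the temporary-space claim in numbers, the sketch below compares the hybrid figure M*S with the ~12*160 MB quoted earlier for leveled compaction. The constants come from the slides; the function names are hypothetical and the result is per parallel compaction.

```python
SSTABLE_MB = 160   # fragment / sstable size S


def hybrid_temp_space_mb(m=4, s=SSTABLE_MB):
    """Temporary space for one hybrid compaction: M fragments of size S."""
    return m * s


def leveled_temp_space_mb(overlapping=12, s=SSTABLE_MB):
    """Temporary space for one leveled compaction: ~12 overlapping sstables."""
    return overlapping * s


hybrid_mb = hybrid_temp_space_mb()      # 4 * 160
leveled_mb = leveled_temp_space_mb()    # 12 * 160
```

That is 640 MB versus 1920 MB per compaction, both constants that do not grow with the data size, unlike size-tiered's worst case of half the disk.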
  • 36. Example 1, with Hybrid compaction strategy
  • 37. Example 2 - overwrite workload
       ▪ Write the same 4 million partitions 15 times
         o cassandra-stress write n=4000000 -pop seq=1..4000000 -schema "replication(strategy=org.apache.cassandra.locator.SimpleStrategy,factor=1)"
         o In this test cassandra-stress is not rate limited
         o Again, small (10MB) LCS tables
       ▪ Necessary amount of sstable data: 1.2 GB
       ▪ STCS space amplification: x7.7 !
       ▪ LCS space amplification is lower, a constant multiple of the sstable size
       ▪ Hybrid will be around x2
  • 38. Example 2 [chart; annotation: “x7.7 space amplification”]
  • 39. Example 3 - read+updates workload
       ▪ When workloads are read-mostly, read amplification is important
       ▪ When workloads also have updates to existing partitions
         o With STCS, each partition ends up in multiple sstables
         o Read amplification
       ▪ An example to simulate this:
         o Do a write-only update workload
           • cassandra-stress write n=4,000,000 -pop seq=1..1,000,000
         o Now run a read-only workload
           • cassandra-stress read n=1,000,000 -pop seq=1..1,000,000
           • measure the average number of bytes read per request
  • 40. Example 3 - read+updates workload
       ▪ Size-tiered: 46,915 bytes read per request
         o Optimal after major compaction - 11,979
       ▪ Leveled: 11,982
         o Equal to optimal because in this case all sstables fit in L1...
       ▪ Increasing the number of partitions 8-fold:
         o Size-tiered: 29,794 (luckier this time)
         o Leveled: 16,713 (unlucky: 0.5 of the data, not 0.9, in L2)
       ▪ BUT: Remember that if we have a non-negligible amount of writes, LCS write amplification may slow down reads
  • 41. Example 3, and major compaction
       ▪ We saw that size-tiered major compaction reduces read amplification
       ▪ It also reduces space amplification (expired/overwritten data)
       ▪ Major compaction only makes sense if there are very few writes
         o But in that case, LCS’s write amplification is not a problem!
         o So LCS is recommended instead of major compaction
           • Easier to use
           • No huge operations like major compaction (where you need to decide when to run it)
           • No 50%-free-disk worst-case requirement
           • Good read amplification and space amplification
  • 42. Why major compaction? Is it suboptimal? (from STCS perspective)
       ▪ STCS is quite inefficient / slow at getting rid of obsolete data (droppable tombstones, shadowed data).
         o For droppable tombstones, there’s tombstone compaction. Suboptimal though.
         o For shadowed (overwritten) data, there’s nothing to do. Just wait for the data and the obsolete data to be compacted together after reaching the same tier.
  • 43. Tombstone compaction
       ▪ Triggered when standard compaction has nothing to do
       ▪ Tombstone compaction selects an sstable whose percentage of droppable tombstones is higher than N% and hopes space will be released.
       ▪ That’s suboptimal though…
       ▪ A tombstone cannot be purged unless it’s compacted with the data it deletes/shadows.
       ▪ CASSANDRA-7019 suggests improving the feature by compacting an sstable with older overlapping sstables. That will be inefficient with STCS though. What can we do instead?
  • 44. Making improved tombstone compaction efficient with hybrid
       ▪ Hybrid can choose a fragment from the high tiers and compact it with all overlapping fragments from sstable runs of the same tier or above.
       ▪ All sstable run(s) involved will have their (often only one) fragment replaced by another with: (LIVE DATA) – (SHADOWED DATA) – (DROPPABLE TOMBSTONES)
       ▪ Temporary space requirement of N * fragment size, where N = number of fragments involved
       ▪ Make it optional for regular scenarios, but use it if running out of disk space.
  • 45. Hybrid tombstone compaction - Example [diagram: sstable runs and their fragments] Choose an sstable run fragment with N% of droppable tombstones
  • 46. Hybrid tombstone compaction - Example [diagram: sstable runs and their fragments] Include *older* fragment(s) that overlap with the one previously chosen
  • 47. Hybrid tombstone compaction - Example [diagram: sstable runs and their fragments] Replace fragments by ones without shadowed data and droppable tombstones
  • 48. Making hybrid take action when lots of duplicate data waste disk space
       ▪ Compact fewer tables of the same tier if they contain lots of duplicate data. Affects only overwrite-intensive workloads.
       ▪ Cardinality information may help us estimate duplication between tables. It works only at the partition level though…
       ▪ Nadav came up with the idea of doing a compaction sample to help with estimation at the clustering level. Works due to the murmur tokenizer.
       ▪ In the worst case (running out of space), Hybrid can afford to compact the biggest tiers together to get rid of all obsolete data with a low temporary space requirement.
  • 49. Conclusion on this hybrid strategy topic
       ▪ The goal is to have hybrid do the cleanup job itself rather than relying on the sysadmin to run a manual major compaction at an interval.
       ▪ Hybrid can make smart decisions due to its nature: non-aggressive, incremental steps towards improving space amplification without hurting system performance like major compaction does.
       ▪ Trying to bring the best of both worlds.
  • 50. Strategy #4: Time-Window Compaction
       ▪ Introduced in Cassandra 3.0.8, designed for time-series data
       ▪ Replaces Date-Tiered compaction strategy of Cassandra 2.1 (which is also supported by Scylla, but not recommended)
  • 51. Time-Window compaction strategy (cont.)
       In a time-series use case:
       ▪ Clustering key and write time are correlated
       ▪ Data is added in time order. Only a few out-of-order writes, typically rearranged by just a few seconds
       ▪ Data is only deleted through expiration (TTL) or by deleting an entire partition; usually the same TTL is used on all the data
       ▪ The rate at which data is written is nearly constant
       ▪ A query is a clustering-key range query on a given partition
       Most common query: "values from the last hour/day/week"
  • 52. Time-Window compaction strategy (cont.)
       ▪ Scylla remembers in memory the minimum and maximum clustering key in each newly-flushed sstable
         o Efficiently find only the sstables with data relevant to a query
       ▪ Other compaction strategies
         o Destroy this feature by merging “old” and “new” sstables
         o Move all rows of a partition to the same sstable…
           • But time series queries don’t need all rows of a partition, just rows in a given time range
           • Makes it impossible to expire old sstables when everything in them has expired
           • Read and write amplification (needless compactions)
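The min/max clustering-key check can be illustrated with a small filter. This is only a sketch of the idea, not Scylla's data structures: `SSTableMeta` and `relevant_sstables` are hypothetical names, and clustering keys are modelled as integer timestamps.

```python
from dataclasses import dataclass


@dataclass
class SSTableMeta:
    name: str
    min_ck: int   # smallest clustering key (here: a timestamp) in the sstable
    max_ck: int   # largest clustering key in the sstable


def relevant_sstables(sstables, query_start, query_end):
    """Keep only sstables whose clustering-key range intersects the query range."""
    return [s for s in sstables
            if s.min_ck <= query_end and s.max_ck >= query_start]


sstables = [
    SSTableMeta("monday", 0, 99),
    SSTableMeta("tuesday", 100, 199),
    SSTableMeta("wednesday", 200, 299),
]
hits = relevant_sstables(sstables, query_start=150, query_end=250)
```

Only `tuesday` and `wednesday` need to be read for this query. If a compaction had merged all three into one sstable covering 0..299, the filter could no longer skip anything, which is the "destroyed feature" the slide refers to.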
  • 53. Time-Window compaction strategy (cont.)
       So TWCS:
       ▪ Divides time into “time windows”
         o E.g., if a typical query asks for 1 day of data, choose a time window of 1 day
       ▪ Divides sstables into time buckets, according to the time window
       ▪ Compacts using the Size-Tiered strategy inside each time bucket
         o If the 2-day old window has just one big sstable and a repair creates an additional tiny “old” sstable, the two will not get compacted
         o A tradeoff: slows reads but avoids the write amplification problem of DTCS
       ▪ When a time bucket exits the current window, does a major compaction
         o Except for small repair-produced sstables, we get 1 sstable per time window
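The bucketing step can be sketched as follows. The sketch assumes that an sstable is assigned to the window containing its newest write timestamp and that timestamps are plain seconds; both are assumptions made for illustration, and `bucket_by_window` is a hypothetical name rather than a Scylla or Cassandra function.

```python
from collections import defaultdict

DAY = 24 * 60 * 60   # a 1-day time window, in seconds


def bucket_by_window(sstables, window_seconds=DAY):
    """Group (name, max_timestamp) pairs by the start of their time window."""
    buckets = defaultdict(list)
    for name, max_timestamp in sstables:
        window_start = max_timestamp - max_timestamp % window_seconds
        buckets[window_start].append(name)
    return dict(buckets)


sstables = [("a", 10), ("b", 86_399), ("c", 86_400), ("d", 200_000)]
buckets = bucket_by_window(sstables)
```

Here `a` and `b` share the first day's bucket, while `c` and `d` land in later buckets of their own. Size-tiered compaction would then run only among sstables within one bucket, so old and new data are never merged together.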
  • 54. Summary
       Workload | Size-Tiered | Leveled | Hybrid | Time-Window
       Write-only | 2x peak space | 2x writes | Best | -
       Overwrite | Huge peak space | write amplification | high peak space, but not like size-tiered | -
       Read-mostly, few updates | read amplification | Best | read amplification | -
       Read-mostly, but a lot of updates | read and space amplification | write amplification may overwhelm | read amplification | -
       Time series | write, read, and space ampl. | write and space amplification | write and read amplification | Best
  • 55. THANK YOU. nyh@scylladb.com. Please stay in touch. Any questions?