Our journey into Cassandra performance optimisation
Code Motion 2019
April 2nd
Agenda
1. About us
2. Our Cassandra experience
3. The symptoms
4. The issue
5. What we learned
6. Q&A
About us
whoami
Andreea Marin
Senior Software Engineer
Who are we and what do we do?
● Amsterdam-based tech company
● Personalising communication between brands and consumers
● 25+ nationalities, 36% women
How do we do it?
● 35+ engineers
● 100s of EC2 instances
● 15 microservices
● 30k events per second
● 21 Cassandra instances
● ...and growing
Our Cassandra experience
Our Cassandra history
● 2010: v 0.6
● 2011: v 0.7, started using this version
● 2013: v 1.2, updated to this version
● 2015: v 3.0
● 2017: started new service using v 3.0
Why Cassandra?
• Huge amount of data (34 TiB) that we can easily identify
• Data we use is ephemeral
• Easy to remove by setting the TTL
• Challenge: following the data that is removed automatically and keeping the personalization up to date
The neighbourhood supermarket
Product validity
Shop assistant
category | product   | ttl
---------+-----------+--------
dairy    | cheese    | 22 days
dairy    | milk      | 2 days
pastry   | croissant | 1 day
Stock manager
expiration | product
-----------+----------
3.04.2019  | croissant
4.04.2019  | milk
24.04.2019 | cheese
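The stock manager's view can be derived from the shop assistant's view. A minimal Python sketch, assuming every product was stocked on 2 April 2019 (the slides do not state the stocking date; this one is chosen because it reproduces the dates shown). The names `STOCKED_ON` and `expiration_schedule` are illustrative only.

```python
from datetime import date, timedelta

# Assumption: everything was put on the shelf on this date.
STOCKED_ON = date(2019, 4, 2)

# TTL per product, in days, as the shop assistant sees it.
ttl_days = {"cheese": 22, "milk": 2, "croissant": 1}


def expiration_schedule(ttl_days, stocked_on):
    """Return (expiration date, product) pairs, soonest first."""
    rows = [
        (stocked_on + timedelta(days=ttl), product)
        for product, ttl in ttl_days.items()
    ]
    return sorted(rows)


for expires, product in expiration_schedule(ttl_days, STOCKED_ON):
    print(expires.isoformat(), product)
```

The shop assistant thinks in "how long is this valid", the stock manager in "what goes off the shelf on which day"; both describe the same data.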
Concepts
TTL
• Expiration time of a column
• An expired column is also a form of delete, so it results in a tombstone
Tombstones
• Cassandra treats everything as a write operation, deletes included
• A tombstone appears after each delete, each expired TTL, or each insertion of a null value
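The three sources of tombstones listed above can be illustrated with a toy in-memory model. This is only a sketch of the concept, not how Cassandra's storage engine works; `Cell` and `ToyTable` are hypothetical names, and time is a plain integer number of seconds.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: Optional[str]        # None marks a tombstone
    written_at: int             # seconds on a hypothetical clock
    ttl: Optional[int] = None   # seconds to live; None means no expiry


class ToyTable:
    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def write(self, key: str, value: Optional[str], now: int,
              ttl: Optional[int] = None) -> None:
        # Writing a null value is stored as a tombstone.
        self.cells[key] = Cell(value, now, ttl)

    def delete(self, key: str, now: int) -> None:
        # A delete is just another write: it records a tombstone.
        self.write(key, None, now)

    def is_tombstone(self, key: str, now: int) -> bool:
        cell = self.cells[key]
        expired = cell.ttl is not None and now >= cell.written_at + cell.ttl
        return cell.value is None or expired

    def read(self, key: str, now: int) -> Optional[str]:
        if key not in self.cells or self.is_tombstone(key, now):
            return None
        return self.cells[key].value

    def tombstone_count(self, now: int) -> int:
        return sum(self.is_tombstone(key, now) for key in self.cells)


table = ToyTable()
table.write("milk", "2 l", now=0, ttl=2)     # expires by TTL
table.write("cheese", "brie", now=0)
table.delete("cheese", now=1)                # explicit delete
table.write("croissant", None, now=1)        # null insertion
print(table.tombstone_count(now=2))
```

Note that nothing is physically removed in the model: every read still has to step over the tombstones, which is why a large number of them hurts latency.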
The symptoms
Just the usual checks
How we noticed the issues
● Increased latency
● Heap pressure
● Cassandra errors
Increased write latency
Increased read latency
Increased garbage collection
The issue
Organizing the data
Primary Key
Partition Key: Category, Shelf
Clustering Key: Expiration (UUID)
Product description
cqlsh:category_keyspace> select shelf, category, expiration, totimestamp(expiration), description from products_table;
shelf | category | expiration                           | system.totimestamp(expiration)  | description
------+----------+--------------------------------------+---------------------------------+-------------
23    | dairy    | 7375bfc0-eb39-11e8-f40f-2d43549360df | 2018-11-18 13:54:21.756000+0000 | milk
24    | dairy    | 34fce5a0-eb45-11e8-32f7-13e820421fb1 | 2018-11-18 15:18:30.906000+0000 | cheese
15    | fruit    | 7774bac0-fbf8-11e8-c934-37a2220ce03b | 2018-12-09 21:22:00.940000+0000 | apple
15    | fruit    | 77761a50-fbf8-11e8-9c1b-a4ecf015f7e1 | 2018-12-09 21:22:00.949000+0000 | pear
18    | veg      | 77761a50-fbf8-11e8-d293-0bf392d5cbf1 | 2018-12-09 21:22:00.949000+0000 | tomato
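The expiration column is a time-based (version 1) UUID, which is why `totimestamp` can turn it into a date. A short Python sketch of the same decoding, using only the standard library; the helper name `to_timestamp` is illustrative and simply mirrors the CQL function.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond ticks from the start of the
# Gregorian calendar (15 October 1582).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def to_timestamp(timeuuid: str) -> datetime:
    """Decode the timestamp embedded in a time-based UUID."""
    ticks = uuid.UUID(timeuuid).time  # 60-bit count of 100 ns ticks
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)


apple = to_timestamp("7774bac0-fbf8-11e8-c934-37a2220ce03b")
pear = to_timestamp("77761a50-fbf8-11e8-9c1b-a4ecf015f7e1")
print(apple.isoformat())
print(pear - apple)
```

Because the timestamp is part of the clustering key, rows inside a partition are ordered by expiration, and a range condition on `expiration` selects a contiguous slice of the partition.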
Know your queries
The problem
cqlsh:category_keyspace> delete from products_table
where shelf = 15 and category = 'fruit' and
expiration < 77761a50-fbf8-11e8-d293-0bf392d5cbf1;
Know your queries
What if?
cqlsh:category_keyspace> delete from products_table
where shelf = 15 and category = 'fruit' and
expiration < 7774bac0-fbf8-11e8-c934-37a2220ce03b;
cqlsh:category_keyspace> delete from products_table
where shelf = 15 and category = 'fruit' and
expiration < 77761a50-fbf8-11e8-d293-0bf392d5cbf1;
Know your queries
Initial fix
cqlsh:category_keyspace> delete from products_table
where shelf = 15 and category = 'fruit' and
expiration >= 7774bac0-fbf8-11e8-c934-37a2220ce03b and
expiration < 77761a50-fbf8-11e8-d293-0bf392d5cbf1;
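The slides do not spell out why the bounded delete helps. One plausible reading, assumed here, is that every `expiration < X` delete writes a range tombstone that starts at the beginning of the partition, so successive deletes pile up on top of each other, while a delete bounded on both sides only covers the new slice. The Python sketch below models range deletes as half-open intervals over integer clustering values; all names are hypothetical.

```python
from typing import List, Optional, Tuple

# (inclusive lower bound or None for "from the start", exclusive upper bound)
Range = Tuple[Optional[int], int]


def open_ended(bounds: List[int]) -> List[Range]:
    """Each delete is 'expiration < bound': no lower bound at all."""
    return [(None, upper) for upper in bounds]


def bounded(bounds: List[int]) -> List[Range]:
    """Each delete starts where the previous one stopped."""
    ranges: List[Range] = []
    lower: Optional[int] = None
    for upper in bounds:
        ranges.append((lower, upper))
        lower = upper
    return ranges


def covering(ranges: List[Range], point: int) -> int:
    """How many range deletes cover a given clustering value."""
    return sum(
        1
        for lower, upper in ranges
        if (lower is None or lower <= point) and point < upper
    )


bounds = [10, 20, 30]  # three clean-up runs with increasing bounds
print(covering(open_ended(bounds), 5))  # every run covers old data again
print(covering(bounded(bounds), 5))     # only the first run covers it
```

In this model the number of overlapping deletes on old data grows with every clean-up run in the open-ended case and stays at one in the bounded case.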
Re-organizing the data
Long term fix
Primary Key
Partition Key: Category, Shelf
Clustering Key: Expiration (UUID)
Timestamp
Product description
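The slide does not say what the new Timestamp column holds or which part of the key it joins. One common pattern, assumed here purely for illustration, is a coarse time bucket in the partition key, so that data expiring on the same day lands in the same partition and old partitions can be dropped as a whole. `day_bucket` and `partition_key` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Tuple


def day_bucket(expiration: datetime) -> str:
    """Coarse bucket: the UTC calendar day of the expiration time."""
    return expiration.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(category: str, shelf: int,
                  expiration: datetime) -> Tuple[str, int, str]:
    # Assumption: the bucket is appended to the existing partition key.
    return (category, shelf, day_bucket(expiration))


apple_expiration = datetime(2018, 12, 9, 21, 22, 0, tzinfo=timezone.utc)
print(partition_key("fruit", 15, apple_expiration))
```

The bucket size (a day here) is a design choice: it trades the number of partitions against how much expired data accumulates inside each one.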
What we learned
Trial and error
● Manual compaction of data
● Modifying gc_grace_seconds to speed up the removal of tombstones
● Scheduled repairs
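How gc_grace_seconds interacts with tombstone removal can be shown with simple arithmetic. A minimal Python sketch: a tombstone only becomes eligible for removal during compaction once gc_grace_seconds have passed since the delete. The default of 864000 seconds (10 days) is Cassandra's documented default; the function names are illustrative.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days


def purgeable_at(deleted_at: int,
                 gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> int:
    """Earliest time (in seconds) at which compaction may drop the tombstone."""
    return deleted_at + gc_grace_seconds


def can_purge(deleted_at: int, now: int,
              gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    return now >= purgeable_at(deleted_at, gc_grace_seconds)


one_hour = 3600
print(can_purge(deleted_at=0, now=one_hour))                             # default grace period
print(can_purge(deleted_at=0, now=one_hour, gc_grace_seconds=one_hour))  # shortened grace period
```

A shorter grace period clears tombstones sooner, but repairs then have to complete within that window on every node; otherwise a replica that missed the delete can bring the data back. That is why this item and scheduled repairs belong together.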
TAKEAWAYS
• It’s never only the code or only the configuration
• Always take data growth into consideration
• Be one step ahead
QUESTIONS?
Thank you
