ScyllaDB Topology on Raft: An Inside Look
- 3. Presentation Agenda
■ Raft recap
■ ScyllaDB path to consistency:
  ■ Schema
  ■ Topology
  ■ Manageability
- 6. Strong vs Eventual Consistency
Strong consistency (Node 1, Node 2):
1. Write from client
2. Write propagated through cluster
3. Internal acknowledgement
4. Acknowledged to client
● requires a live majority
● always returns the latest write
Eventual consistency (Node 1, Node 2):
1. Write from client
2. Acknowledged to client
3. Eventual write propagation
● highly available
● writes must commute
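A minimal Python sketch of the two trade-offs above, under simplifying assumptions: a strongly consistent write is acknowledged only once a live majority of replicas has it, while an eventually consistent store must merge writes in whatever order they arrive. The last-write-wins merge on `(timestamp, value)` pairs is an illustrative choice of commutative merge, and all function names are hypothetical.

```python
from functools import reduce
from itertools import permutations


def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def strong_write_acknowledged(live_replicas: int, total_replicas: int) -> bool:
    # Strong consistency: the client is acknowledged only after a majority
    # has accepted the write, so a minority of live nodes cannot make progress.
    return live_replicas >= majority(total_replicas)


def lww_merge(a: tuple[int, str], b: tuple[int, str]) -> tuple[int, str]:
    # Eventual consistency: replicas merge (timestamp, value) cells as they arrive.
    # max() is commutative and associative, so delivery order does not matter.
    return max(a, b)


writes = [(1, "x"), (3, "z"), (2, "y")]
# Every delivery order converges to the same final cell.
converged = {reduce(lww_merge, order) for order in permutations(writes)}
print(converged)  # {(3, 'z')}
```

Schema and topology changes do not commute like this (dropping a table and altering it cannot be applied in either order with the same result), which is the property the next slide attributes to metadata.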
- 7. Data vs metadata
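■ Contents: metadata is schema information (table, view, type definitions) and topology information (nodes, tokens); data is static and regular rows, and counters
■ Placement: metadata is replicated everywhere; data is partitioned
■ Commutativity: metadata changes are not commutative; data writes are commutative
■ Rate of change: metadata changes rarely; data changes frequently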
Consistency of Metadata
[Figure: a ScyllaDB cluster with replication_factor=2; data partitions 1, 2 and 3 are each stored on two nodes]
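To make the "partitioned vs replicated everywhere" contrast concrete, here is a hedged sketch of replica placement on a token ring. It assumes a simplified SimpleStrategy-style rule (walk clockwise from the key's token and take the first `rf` distinct nodes); the ring, tokens and function names are hypothetical, and vnodes, racks and NetworkTopologyStrategy are not modeled.

```python
import bisect


def data_replicas(token: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Pick rf distinct nodes clockwise from the token (SimpleStrategy-style).

    ring is sorted by token; the node with the first token >= `token` owns it,
    wrapping around past the last token.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    replicas: list[str] = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
            if len(replicas) == rf:
                break
    return replicas


def metadata_replicas(ring: list[tuple[int, str]]) -> list[str]:
    # Schema and topology are replicated on every node.
    return sorted({node for _, node in ring})


ring = [(100, "A"), (200, "B"), (300, "C"), (400, "D")]
print(data_replicas(150, ring, rf=2))  # ['B', 'C']
print(data_replicas(450, ring, rf=2))  # ['A', 'B'] (wraps around)
print(metadata_replicas(ring))         # ['A', 'B', 'C', 'D']
```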
- 8. Elements of the Raft State
■ Schema (5.2): keyspaces, tables, columns
■ Topology (6.0): topology, topology_requests, cdc_generations, tablets
■ auth, service_levels (6.0)
■ Backward compatibility (3.0): peers, local, scylla_local
- 9. The Centralized Topology Coordinator
■ Runs alongside the Raft leader
■ Highly available
■ Drives the progress of topology operations
■ Performs linearizable reads and writes of the topology
■ Request coordinators still use their local view of the topology
■ No extra coordination when executing user requests
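A hedged sketch of the coordinator idea: only the leader drives queued topology requests, one at a time, and every step is a committed change to the shared topology state. The request kinds, node states and class names are hypothetical illustrations, not ScyllaDB's internal state machine, and the Raft write is replaced by an in-memory update.

```python
from collections import deque

# Hypothetical request kinds and node states, for illustration only.
TRANSITIONS = {
    "join": ("bootstrapping", "normal"),
    "decommission": ("decommissioning", "left"),
}


class Topology:
    """Stand-in for the Raft-replicated topology state."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}                   # host ID -> node state
        self.requests: deque[tuple[str, str]] = deque()   # pending (host ID, kind)
        self.version = 0                                  # bumped on every change

    def commit(self, host_id: str, node_state: str) -> None:
        # In the real system this would be a linearizable write through Raft;
        # here it is a plain in-memory update.
        self.nodes[host_id] = node_state
        self.version += 1


def run_coordinator(topology: Topology, is_leader: bool) -> None:
    """Drive queued requests to completion, one at a time, on the leader only."""
    if not is_leader:
        return
    while topology.requests:
        host_id, kind = topology.requests.popleft()
        for node_state in TRANSITIONS[kind]:
            topology.commit(host_id, node_state)


t = Topology()
t.requests.extend([("n1", "join"), ("n2", "join"), ("n1", "decommission")])
run_coordinator(t, is_leader=True)
print(t.nodes, t.version)  # {'n1': 'left', 'n2': 'normal'} 6
```

User requests never pass through this loop: a request coordinator reads its own local copy of `nodes`, which is why no extra coordination is needed on the data path.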
- 13. Dedicated commit log on shard 0
No need to FLUSH the entire schema after changing it
10x less IO with large schemas!
[Figure: Node 1 and Node 2, each with shards 0 to 8; each node keeps its schema commit log on shard 0]
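Why a dedicated log removes the flush: a schema mutation is durable once it is appended and fsync'ed, and a restart rebuilds the schema by replaying the log. The sketch below assumes a JSON-lines file and hypothetical mutation fields (`op`, `table`, `columns`); it is not ScyllaDB's commit log format, and sharding, segments and batching are not modeled.

```python
import json
import os
import tempfile


class SchemaCommitLog:
    """Append-only log of schema mutations (hypothetical sketch)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, mutation: dict) -> None:
        # The mutation is durable once fsync() returns; the schema tables
        # themselves do not have to be flushed for it to survive a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(mutation) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self) -> dict[str, list[str]]:
        # On restart, rebuild the in-memory schema by re-applying the log in order.
        schema: dict[str, list[str]] = {}
        if not os.path.exists(self.path):
            return schema
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                m = json.loads(line)
                if m["op"] == "create":
                    schema[m["table"]] = m["columns"]
                elif m["op"] == "drop":
                    schema.pop(m["table"], None)
        return schema


with tempfile.TemporaryDirectory() as d:
    log = SchemaCommitLog(os.path.join(d, "schema.log"))
    log.append({"op": "create", "table": "ks.users", "columns": ["id", "name"]})
    log.append({"op": "create", "table": "ks.events", "columns": ["id", "ts"]})
    log.append({"op": "drop", "table": "ks.events"})
    recovered = log.replay()

print(recovered)  # {'ks.users': ['id', 'name']}
```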
- 14. Linearizable schema version
No re-hash of the entire schema on change
10x less CPU with large schemas.
5.x: hash-based schema version
6.x: TimeUUID-based schema version
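A sketch of the cost difference, assuming schema changes are already totally ordered (as they are when committed through Raft), so each change can simply be stamped with a new ID. The MD5 digest and the dictionary schema representation are illustrative choices, not the exact 5.x algorithm.

```python
import hashlib
import uuid


def hash_based_version(schema: dict[str, list[str]]) -> str:
    # 5.x-style idea: digest the whole schema, so each change costs time
    # proportional to the total schema size. MD5 is an illustrative choice.
    h = hashlib.md5()
    for table in sorted(schema):
        h.update(f"{table}:{','.join(schema[table])};".encode())
    return h.hexdigest()


def timeuuid_version() -> uuid.UUID:
    # 6.x-style idea: stamp each committed change with a fresh time-based
    # UUID; the cost does not depend on schema size.
    return uuid.uuid1()


schema = {"ks.users": ["id", "name"]}
v_before = hash_based_version(schema)
schema["ks.events"] = ["id", "ts"]
v_after = hash_based_version(schema)  # re-hashes every table, old and new
t1, t2 = timeuuid_version(), timeuuid_version()
print(v_before != v_after, t1 != t2)  # True True
```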
- 15. Authentication and service levels on Raft
ScyllaDB 5.x (manual configuration):
Set the system_auth keyspace replication factor to the number of nodes in the datacenter.
For production environments use only NetworkTopologyStrategy.
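For the 5.x manual step, a small helper can generate the statement from the node count of each datacenter. This is a sketch: the function name and the datacenter names are hypothetical, while the emitted `ALTER KEYSPACE ... WITH replication = {...}` form is standard CQL.

```python
def system_auth_replication_cql(nodes_per_dc: dict[str, int]) -> str:
    """Build the 5.x-era statement setting system_auth's RF per DC to its node count."""
    if not nodes_per_dc or any(n < 1 for n in nodes_per_dc.values()):
        raise ValueError("every datacenter needs at least one node")
    dcs = ", ".join(f"'{dc}': {n}" for dc, n in sorted(nodes_per_dc.items()))
    return ("ALTER KEYSPACE system_auth WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")


print(system_auth_replication_cql({"dc1": 3, "dc2": 2}))
# ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};
```

The statement has to be re-run every time a datacenter grows or shrinks, which is exactly the operational burden that 6.x removes.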
ScyllaDB 6.x:
■ Automatically replicated on every node
■ Linearizable with CREATE/DROP
■ No denial of service if a node is down
- 18. CDC generations on Raft
■ Quick & reliable propagation of CDC generation data at boot
■ The topology coordinator is responsible for changing the ring
■ Prerequisite for quick and concurrent boot
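A hedged sketch of why every node must learn new CDC generations quickly: a write with timestamp `t` belongs to the newest generation whose start timestamp is not after `t`, so a node that has not yet received a generation cannot place writes into it. The dataclass and stream IDs are hypothetical; real generations map token ranges to stream IDs.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class CdcGeneration:
    start_ts: int               # the generation takes effect at this timestamp
    streams: tuple[str, ...]    # hypothetical stream IDs for this generation


def generation_for(write_ts: int, generations: list[CdcGeneration]) -> CdcGeneration:
    """Return the newest generation whose start_ts <= write_ts.

    `generations` must be sorted by start_ts.
    """
    starts = [g.start_ts for g in generations]
    i = bisect.bisect_right(starts, write_ts) - 1
    if i < 0:
        raise LookupError(f"no CDC generation active at {write_ts}")
    return generations[i]


gens = [CdcGeneration(10, ("s1", "s2")), CdcGeneration(20, ("s3", "s4", "s5"))]
print(generation_for(15, gens).start_ts)  # 10
print(generation_for(20, gens).start_ts)  # 20
```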
- 19. Automated cleanup
■ No need to run nodetool cleanup: it runs automatically after topology operations
■ Automatic repair is planned with tablets
You should run nodetool cleanup whenever you scale out (expand) your cluster
and new nodes are added to the same DC.
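What cleanup removes, in a sketch that assumes RF=1 and one token per node for simplicity: after a new node joins, an existing node still stores data for token ranges it no longer owns. The ring, tokens and function names are hypothetical.

```python
import bisect


def owner(token: int, ring: list[tuple[int, str]]) -> str:
    """Node owning `token` with RF=1: first ring token >= token, wrapping around."""
    tokens = [t for t, _ in ring]
    return ring[bisect.bisect_left(tokens, token) % len(ring)][1]


def tokens_to_clean(node: str, stored_tokens: list[int],
                    new_ring: list[tuple[int, str]]) -> list[int]:
    """Tokens whose data `node` still stores but no longer owns after the ring changed."""
    return sorted(t for t in stored_tokens if owner(t, new_ring) != node)


old_ring = [(100, "A"), (300, "B")]
new_ring = [(100, "A"), (200, "C"), (300, "B")]  # node C joined between A and B
stored_on_b = [150, 250]                          # both were in B's range (100, 300]
print([owner(t, old_ring) for t in stored_on_b])    # ['B', 'B']
print(tokens_to_clean("B", stored_on_b, new_ring))  # [150]
```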
- 20. UUID-based host identification
Nodes are identified by host ID (UUID) rather than IP address in:
■ Token metadata
■ Hints
Increased safety:
■ Removed nodes are banned from the cluster
■ Live nodes can’t be removed, only decommissioned
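The two safety rules can be sketched as a membership registry keyed by host ID, where the IP address is only an attribute. The class and method names are hypothetical; the sketch shows that a removed node stays banned even if it comes back with a different IP, and that a live node cannot be removed.

```python
import uuid


class Membership:
    """Hypothetical sketch: nodes are keyed by host ID (UUID); the IP is just an attribute."""

    def __init__(self) -> None:
        self.members: dict[uuid.UUID, str] = {}  # host ID -> current IP address
        self.banned: set[uuid.UUID] = set()      # host IDs of removed nodes

    def join(self, host_id: uuid.UUID, ip: str) -> None:
        if host_id in self.banned:
            raise PermissionError(f"{host_id} was removed and is banned from the cluster")
        self.members[host_id] = ip

    def remove(self, host_id: uuid.UUID, is_alive: bool) -> None:
        # Removing is only for dead nodes; a live node has to be decommissioned.
        if is_alive:
            raise RuntimeError(f"{host_id} is alive: decommission it instead")
        self.members.pop(host_id, None)
        self.banned.add(host_id)


m = Membership()
node = uuid.uuid4()
m.join(node, "10.0.0.1")
m.remove(node, is_alive=False)
# Returning with a different IP does not help: the host ID itself is banned.
try:
    m.join(node, "10.0.0.2")
except PermissionError as e:
    print("rejected:", e)
```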
- 21. Fast and concurrent bootstrap
■ bootstrap as many nodes as you want, simultaneously
■ New cluster assembly takes seconds, not minutes/hours
# DEPRECATED/IGNORED
skip_wait_for_gossip_to_settle: 30
- 23. New system table for Raft state
cqlsh> select * from system.raft_state;
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
7b818380-e9f8-11ed-9316-7c72c96b4bfa | CURRENT | c3b8f01d-e87f-487f-8e6c-e2c86f8b898b | True
- 24. New REST APIs
■ localhost:9000/storage_service/cleanup_all
■ localhost:9000/raft/trigger_snapshot/{group_id}
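A sketch of calling the two endpoints from Python's standard library. It assumes both endpoints are invoked with POST and keeps the `localhost:9000` base URL from the slide; the helper names are hypothetical, and the group ID is the one shown in the `system.raft_state` output. The requests are only built here, not sent.

```python
import urllib.parse
import urllib.request

# Base URL as shown on the slide; adjust the host and port for your deployment.
API_BASE = "http://localhost:9000"


def cleanup_all_request(base: str = API_BASE) -> urllib.request.Request:
    # Assumption: the endpoint is invoked with POST.
    return urllib.request.Request(f"{base}/storage_service/cleanup_all", method="POST")


def trigger_snapshot_request(group_id: str, base: str = API_BASE) -> urllib.request.Request:
    # group_id is a Raft group ID, e.g. from `select * from system.raft_state`.
    gid = urllib.parse.quote(group_id, safe="")
    return urllib.request.Request(f"{base}/raft/trigger_snapshot/{gid}", method="POST")


req = trigger_snapshot_request("7b818380-e9f8-11ed-9316-7c72c96b4bfa")
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```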
- 25. Maintenance mode
./scylla --maintenance-mode=true --maintenance-socket=workdir
kostja@hulk:~/work/scylla/db$ cqlsh ./cql.m
Connected to at ./cql.m:9042
[cqlsh 6.2.0 | Scylla 5.5.0~dev-0.20240130.0cbf8f75f016 | CQL spec 3.3.1 |
Native protocol v4]
Use HELP for help.
cqlsh>
- 26. Enabling Raft
■ In 6.0 and up Raft is ALWAYS ON
# DEPRECATED/IGNORED
consistent_cluster_management: true
- 29. ScyllaDB Summit 2024 Styles