Doug Stuns: Cassandra/Scylla NoSQL Migrations Expert
Tomer Sandler: Solution Architect, ScyllaDB
+ 30+ years of experience architecting and implementing enterprise-grade databases
+ Published author on Oracle database
+ Has led many RDBMS → NoSQL migrations
+ Has worked in many different industries with unique design requirements
+ Understands the cost savings of minimizing RDBMS footprints, and where and how this is best done without sacrificing performance and scalability
RDBMS to NoSQL: Practical Advice from Successful Migrations
# | NoSQL                                     | Relational Databases
1 | Query-based: Application -> Data -> Model | Entity-based: Data -> Model -> Application
2 | Denormalization                           | Support for foreign keys, joins
3 | CAP Theorem, Eventual Consistency         | ACID Guarantee
4 | Distributed Architecture                  | Mostly single point of failure
+ CAP Theorem
+ ACID vs BASE
+ Structured Query Language (SQL) vs Cassandra Query Language (CQL)
Item                         | SQL    | CQL
Consistency                  | Strong | Eventual
Data Reference (Foreign key) | Yes    | Denormalized data
Join                         | Yes    | Use 3rd party tools
WHERE clauses                | Yes    | Yes, performance hits may apply when filtering on non-partition-key columns
ORDER BY clauses             | Yes    | Can only be applied to a clustering column
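The WHERE and ORDER BY rows above can be illustrated with a minimal CQL sketch. The readings table and its columns are hypothetical and are not part of the original slides.

```cql
-- Hypothetical table, for illustration only.
CREATE TABLE readings (
    sensor_id int,        -- partition key
    read_at   timestamp,  -- clustering column
    status    text,       -- regular column
    PRIMARY KEY ((sensor_id), read_at)
);

-- Restricting on the partition key targets a single partition.
SELECT * FROM readings WHERE sensor_id = 42;

-- ORDER BY is accepted on the clustering column, within one partition.
SELECT * FROM readings WHERE sensor_id = 42 ORDER BY read_at DESC;

-- Filtering on a regular, unindexed column has to be requested explicitly
-- with ALLOW FILTERING and may scan many partitions.
SELECT * FROM readings WHERE status = 'FAILED' ALLOW FILTERING;
```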
+ Strong vs Eventual Consistency
+ Normalized vs Denormalized Data
+ Data model flexibility and various data types
+ Traffic volume (consistency vs speed)
+ Economy of scale
Query Patterns from RDBMS Application
+ The primary key in the RDBMS should be matched in Scylla
+ Additional indexes can be handled with secondary indexes or additional tables
+ Customer table primary key in RDBMS: (customer_id, customer_desc)
+ Unique performance index in RDBMS: (customer_id, create_dt)
+ Partition key in Scylla: customer_id
+ Clustering (ordering) key in Scylla: (customer_desc, create_dt)
This provides the same data uniqueness as in the RDBMS.
select * from customer where customer_id = 505 and customer_desc = 'ACME' and create_dt = '01-JAN-20';
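As a minimal sketch, the key layout above could be declared in CQL as follows. The column types and the date literal are assumptions; the slides give only the column names.

```cql
-- Column types are assumed for illustration.
CREATE TABLE customer (
    customer_id   int,   -- partition key
    customer_desc text,  -- first clustering column
    create_dt     date,  -- second clustering column
    PRIMARY KEY ((customer_id), customer_desc, create_dt)
);

-- The full primary key is restricted, so this reads at most one row.
SELECT * FROM customer
WHERE customer_id = 505
  AND customer_desc = 'ACME'
  AND create_dt = '2020-01-01';
```

With this key, a row is identified by all three columns together, so two rows that differ only in create_dt are stored as separate rows.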
Query Patterns from RDBMS Application
+ Customer table primary key in RDBMS: (customer_id, customer_desc)
+ Unique performance index in RDBMS: (customer_id, create_dt)
In Scylla, the following provides the same data uniqueness as in the RDBMS:
+ Partition key in Scylla: customer_id, or (customer_id, customer_desc) as a composite partition key
+ Clustering (ordering) key in Scylla: customer_desc (when customer_id alone is the partition key)
In Scylla, you could create a secondary index, or an additional customer table or materialized view, on:
+ (customer_id, create_dt) as the partition key and clustering key, or as a secondary index
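Both options for the (customer_id, create_dt) access path can be sketched in CQL. The column types and the names customer_by_create_dt and customer_create_dt_idx are hypothetical.

```cql
-- Base table keyed as on the slide; column types are assumed.
CREATE TABLE customer (
    customer_id   int,
    customer_desc text,
    create_dt     date,
    PRIMARY KEY ((customer_id), customer_desc)
);

-- Option 1: a materialized view clustered by create_dt.
-- The view key must contain every base primary key column.
CREATE MATERIALIZED VIEW customer_by_create_dt AS
    SELECT * FROM customer
    WHERE customer_id IS NOT NULL
      AND create_dt IS NOT NULL
      AND customer_desc IS NOT NULL
    PRIMARY KEY ((customer_id), create_dt, customer_desc);

SELECT * FROM customer_by_create_dt
WHERE customer_id = 505 AND create_dt = '2020-01-01';

-- Option 2: a secondary index on create_dt, queried through the base table.
CREATE INDEX customer_create_dt_idx ON customer (create_dt);

SELECT * FROM customer WHERE create_dt = '2020-01-01';
```

In practice you would normally pick one of the two options rather than maintain both.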
Base Table: buildings
Name (Key)                 | City      | Height (m)
Shun Hing Square           | Shenzhen  | 384
Eton Place Dalian Tower 1  | Dalian    | 383
Logan Century Center 1     | Nanning   | 381
Burj Mohammed bin Rashid   | Abu Dhabi | 381

View: building_by_city
City (Key) | Name                      | Height (m)
Shenzhen   | Shun Hing Square          | 384
Dalian     | Eton Place Dalian Tower 1 | 383
Nanning    | Logan Century Center 1    | 381
Abu Dhabi  | Burj Mohammed bin Rashid  | 381

select * from buildings WHERE name = 'Tianjin CTF Finance Centre'; ✓
select * from building_by_city WHERE city = 'Shenzhen'; ✓
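A minimal sketch of the DDL that would produce the base table and view shown above. The column types and the column name height are assumptions; the slide shows only the column headings.

```cql
-- Base table keyed by building name; types are assumed.
CREATE TABLE buildings (
    name   text,
    city   text,
    height int,  -- metres
    PRIMARY KEY (name)
);

-- View keyed by city, with name as the clustering column so that
-- several buildings in one city remain distinct rows.
CREATE MATERIALIZED VIEW building_by_city AS
    SELECT * FROM buildings
    WHERE city IS NOT NULL AND name IS NOT NULL
    PRIMARY KEY (city, name);
```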
Base Table: buildings
Name (Key)                 | City      | Height (m)
Shun Hing Square           | Shenzhen  | 384
Eton Place Dalian Tower 1  | Dalian    | 383
Logan Century Center 1     | Nanning   | 381
Burj Mohammed bin Rashid   | Abu Dhabi | 381

Index: buildings_by_city_index
City (Key) | idx_token (Key)    | Name (Key)
Shenzhen   | 0x52dd3c1c6757d40b | Shun Hing Square
Dalian     | 0x831daa7f26301684 | Eton Place Dalian Tower 1
Nanning    | 0xe278a1fea85cff66 | Logan Century Center 1
Abu Dhabi  | 0xd17f9056c9caba94 | Burj Mohammed bin Rashid

CREATE INDEX buildings_by_city ON buildings (city);
SELECT * FROM buildings WHERE name = 'Tianjin CTF Finance Centre'; ✓
SELECT * FROM buildings WHERE city = 'New York City'; ✓
RI: Referential Integrity in RDBMS
The Product table contains a customer_id reference back to the Customer table through a foreign key relationship.
In Scylla, the relationship can be modeled, but without constraints as in the RDBMS.
There is no RI enforced at the database level, so the data in the Product table must match the Customer table, and the API must load only data that already exists in the Customer table to maintain integrity.
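One way the application-side check could look, as a sketch only. The table layouts, column types, and sample values are hypothetical, and the two statements are issued by the application as separate steps.

```cql
-- Hypothetical tables; no foreign key is declared or enforced.
CREATE TABLE customer (
    customer_id   int,
    customer_desc text,
    PRIMARY KEY (customer_id)
);

CREATE TABLE product (
    product_id  int,
    customer_id int,   -- refers to customer.customer_id by convention only
    name        text,
    PRIMARY KEY (product_id)
);

-- Step 1 (application): confirm that the parent row exists.
SELECT customer_id FROM customer WHERE customer_id = 505;

-- Step 2 (application): run only if step 1 returned a row.
INSERT INTO product (product_id, customer_id, name)
VALUES (9001, 505, 'Widget');
```

The check and the insert are not atomic, so a customer deleted between the two steps would leave an orphaned product row. The application or a periodic validation job has to account for that.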
Modeling tools in RDBMS
erwin, TOAD, and many others allow logical models to drive DDL generation for the physical implementation.
Modeling tools in NoSQL Scylla
Hackolade allows very similar modeling, CQL generation, model diffs, and model-based design in Scylla.
It also lets modelers collaborate through a JSON-driven raw model structure.
https://hackolade.com
Scylla with RDBMS or all by itself.
Hybrid
A hybrid migration would target your large, growing tables, such as IoT data, to be moved to Scylla while preserving your reference data in your existing RDBMS, significantly shrinking your RDBMS footprint.
Full
A full migration would target all RDBMS tables to Scylla, which would also shrink your RDBMS footprint but requires more careful thought about core table uniqueness and consistency across the cluster.
LWT: lightweight transactions
At the core of every RDBMS application is a table, or set of tables, that forms the primary updatable and near-immediately consistent core of the RDBMS.
Scylla supports this requirement with LWT.
LWT uses the Paxos algorithm to ensure that data in these core tables is read before it is written across the cluster, which gives these operations forced real-time consistency.
LWT is implemented with an IF clause (such as IF NOT EXISTS) on INSERT, UPDATE, or DELETE. IF NOT EXISTS ensures that the row does not exist before the change. The most current data can be read with selects using serial consistency.
LWT should be used sparingly, on your core primary tables only, because of the higher overhead of this operation in the cluster. An example is registering a new IP address or username in an ISP-type application. For a worked example, see:
https://docs.scylladb.com/using-scylla/lwt/
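The username case mentioned above can be sketched as follows. The users table, its columns, and the sample values are hypothetical.

```cql
-- Hypothetical table for unique usernames.
CREATE TABLE users (
    username text,
    email    text,
    PRIMARY KEY (username)
);

-- The row is written only if no row with this username exists yet.
INSERT INTO users (username, email)
VALUES ('alice', 'alice@example.com')
IF NOT EXISTS;

-- A second claim on the same username is rejected rather than overwriting:
-- the result carries an [applied] column set to false, plus the existing row.
INSERT INTO users (username, email)
VALUES ('alice', 'someone.else@example.com')
IF NOT EXISTS;
```

A plain INSERT without the IF clause would silently overwrite the first row, which is why the conditional form is used for this kind of uniqueness check.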
LWT lightweight transactions
+ An example of the Paxos read-ahead transaction: https://docs.scylladb.com/using-scylla/lwt/
[Figure: Client, R1, R2, R3]
+ Using an IF clause allows users to maintain record consistency
Any INSERT, UPDATE or DELETE can have an IF clause:
> UPDATE employees SET join_date = … IF EXISTS;
> INSERT INTO bookings (id, item, client, quantity) VALUES
(…) IF NOT EXISTS;
> UPDATE inventory SET state = 'Used' WHERE itemid = ?
IF state = 'Unused' AND check = 'Passed';
> DELETE FROM tasks WHERE project_id = ? AND task_id = ?
IF task['state'] IN ('Complete', 'Abandoned');
DR/HA: Disaster Recovery and High Availability requirements
+ DR/HA is built into the architecture of the cluster at instantiation.
+ Replication Factor
+ On prem and Cloud Data Centers (AZ / Rack separation)
+ Easy migration from/to Cloud and between Cloud vendors
+ Workload separation (Read, Write, Failover ops)
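As a sketch of how the replication factor and data center separation are declared when the keyspace is created. The keyspace name and the data center names dc_onprem and dc_cloud are hypothetical and must match the data center names configured in the actual cluster.

```cql
-- Three replicas in each of two data centers (names are assumed).
CREATE KEYSPACE app_data
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc_onprem': 3,
        'dc_cloud': 3
    };
```

With NetworkTopologyStrategy, each data center keeps its own replicas, which is what allows an on-prem and a cloud data center to hold full copies of the same keyspace.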
Node Density
+ Less hardware (store TBs of data per node)
+ Operational simplicity
+ High throughput with great performance
+ Lower costs
Spark
+ Spark is an open source programmatic tool that allows ad-hoc queries on non-partition-key ranges.
+ It can also handle data migration activities outside the basic Scylla tool set.
https://spark.apache.org/
Presto
+ Presto is an open source, ANSI SQL-92-like query tool that can be used for ad-hoc queries, data migration, and basic RDBMS functions such as joins and the other activities you are familiar with in your RDBMS.
https://prestosql.io/
+ Data models “translation”
+ Data forklifting
+ Data validation
United States Israel www.scylladb.com
@scylladb
