Scylla Summit 2017: Stretching Scylla Silly: The Datastore of a Graph Database in a Containerized HA Hosted Cloud Service


In this talk, we will survey the graph database landscape. We will talk about what it takes to run a highly available hosted service in the cloud while giving users seamless vertical and horizontal scaling, and share our experiences migrating from an Apache Cassandra-backed graph-database-as-a-service solution.

PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Graph Databases

▪ What is a Graph Database?
▪ Components of a property graph
▪ What is TinkerPop?

TinkerPop

Vertices
[Diagram: a single vertex labeled Person]

Edges
[Diagram: two Person vertices connected by a Knows edge]

Properties
[Diagram: two Person vertices connected by a Knows edge carrying the property through: IBM]

VertexProperties
Person: name: Keith, Lived_In: [Brighton, Boston]
Person: name: David, Lived_In: [NYC, Boston]
Knows: through: IBM

VertexProperties
Person: name: Keith, Lived_In: [{value: Brighton, properties: {from: 2010, to: 2014}}, {value: Boston, properties: {from: 2014}}]
Person: name: David, Lived_In: [NYC, Boston]
Knows: through: IBM
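The property graph pieces shown on the slides above (vertices, edges, edge properties, multi-valued vertex properties, and properties on a vertex property) can be sketched in a few lines of Python. This is only an illustrative in-memory model, not TinkerPop's or JanusGraph's API: the class names `Vertex`, `Edge`, and `VertexProperty`, the `add` and `values` helpers, and the direction of the Knows edge (Keith to David) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VertexProperty:
    """One value of a vertex property, with optional properties of its own."""
    value: Any
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Vertex:
    label: str
    # Each key maps to a list, so one property can hold several values.
    properties: Dict[str, List[VertexProperty]] = field(default_factory=dict)

    def add(self, key: str, value: Any, meta: Optional[Dict[str, Any]] = None) -> None:
        """Append a value (plus optional meta-properties) under the given key."""
        self.properties.setdefault(key, []).append(VertexProperty(value, meta or {}))

    def values(self, key: str) -> List[Any]:
        """Return the plain values stored under the given key."""
        return [p.value for p in self.properties.get(key, [])]


@dataclass
class Edge:
    label: str
    out_vertex: Vertex   # vertex the edge leaves (direction assumed here)
    in_vertex: Vertex    # vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


# Data taken from the slides.
keith = Vertex("Person")
keith.add("name", "Keith")
keith.add("Lived_In", "Brighton", {"from": 2010, "to": 2014})
keith.add("Lived_In", "Boston", {"from": 2014})

david = Vertex("Person")
david.add("name", "David")
david.add("Lived_In", "NYC")
david.add("Lived_In", "Boston")

knows = Edge("Knows", keith, david, {"through": "IBM"})
```

Storing each vertex property as a list of value objects is what lets `Lived_In` carry both several cities and a date range per city, while the edge keeps a flat key-value map.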

Gremlin

Example Gremlin Traversal

JanusGraph

▪ February 2012: Titan Graph Database started by Aurelius
▪ February 2015: Aurelius acquired by DataStax
▪ September 2015: Titan 1.0 released
▪ January 2017: JanusGraph established at the Linux Foundation with partners from Expero, Google, GRAKN.AI, Hortonworks, and IBM
▪ April 2017: JanusGraph has first release
▪ Soon: JanusGraph 0.2.0 release

Composite and Vertex-Centric Indexes
Mixed Indexes

Traversals Using Composite and Vertex-Centric Indexes
Person (name: Keith)
Bought (timestamp: 1508176904949)
Product (Type: Paper Towels)
g.V().has("name", "Keith");
g.V().has("name", "Keith").outE("bought").order().by("timestamp", incr).limit(20).inV();
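To make the role of the two index types more concrete, here is a conceptual Python sketch of the idea behind the traversals above. It is not how JanusGraph implements its indexes: `TinyGraph`, its methods, the vertex ids, and the second product ("Soap") are hypothetical names and data invented for illustration. The sketch assumes a composite index behaves like an exact-match lookup from a property value to vertices, and a vertex-centric index behaves like a per-vertex edge list kept sorted by timestamp.

```python
import bisect
import heapq
from collections import defaultdict
from itertools import islice


class TinyGraph:
    """Toy in-memory graph with two hypothetical index structures."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> property dict
        # Stand-in for a composite index: name -> list of vertex ids.
        self.by_name = defaultdict(list)
        # Stand-in for a vertex-centric index:
        # (vertex id, edge label) -> [(timestamp, target id)], sorted by timestamp.
        self.out_edges = defaultdict(list)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        if "name" in props:
            self.by_name[props["name"]].append(vid)

    def add_edge(self, src, label, dst, timestamp):
        # insort keeps the per-vertex edge list ordered as edges arrive.
        bisect.insort(self.out_edges[(src, label)], (timestamp, dst))

    def first_out(self, name, label, limit):
        """Rough analogue of has("name", name).outE(label)
        .order().by("timestamp", incr).limit(limit).inV()."""
        per_vertex = [self.out_edges.get((vid, label), [])
                      for vid in self.by_name.get(name, [])]
        # Merge the already-sorted lists and stop after `limit` edges,
        # so no full scan or sort of all edges is needed.
        merged = heapq.merge(*per_vertex)
        return [self.vertices[dst] for _, dst in islice(merged, limit)]


g = TinyGraph()
g.add_vertex("person-1", name="Keith")
g.add_vertex("product-1", type="Paper Towels")
g.add_vertex("product-2", type="Soap")            # hypothetical extra product
g.add_edge("person-1", "bought", "product-2", timestamp=1508176999999)
g.add_edge("person-1", "bought", "product-1", timestamp=1508176904949)

earliest = g.first_out("Keith", "bought", 20)
```

In this toy model the name lookup never touches vertices that are not named Keith, and the ordered, limited edge read only walks the front of a pre-sorted list, which is the intuition behind pairing a composite index with a vertex-centric index.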

Scylla and JanusGraph

JanusGraph Backend Data Model

THANK YOU
dpitera@us.ibm.com
krlohnes@us.ibm.com
@dpitera_
Please stay in touch
Any questions?
