Scylla Summit 2017: How to Optimize and Reduce Inter-DC Network Traffic and Stop Paying too Much!

This presentation covers how to optimize inter-data-center communication in Scylla: what that communication involves, what it costs, and best practices for configuring snitches, keyspaces, client drivers, and query consistency levels. It recommends NetworkTopologyStrategy over SimpleStrategy for multi-region deployments, setting load balancing and consistency levels appropriately in clients, and enabling internode compression to reduce the cost of traffic between data centers. It also encourages reviewing client locations and data access patterns (who is reading and who is writing) and having operations and development teams discuss the best use cases together.

PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
What Do You Pay for?

Snitch, Keyspace and Client Drivers Settings
- Simple vs NetworkTopology Strategies
Ec2MultiRegionSnitch - for AWS-based deployments,
GossipingPropertyFileSnitch - for all other deployments

CREATE KEYSPACE myks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '7'}
[Diagram: USA Data Center, Asia Data Center]
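
The statement above uses SimpleStrategy, which places replicas without regard to data center boundaries. As a minimal sketch of the NetworkTopologyStrategy alternative the talk recommends, the helper below builds a CREATE KEYSPACE statement with one replication factor per data center. The data center names (`usa_dc`, `asia_dc`) and the replication factors are illustrative assumptions, not values from the slides; in a real cluster the names must match what the configured snitch reports.

```python
def network_topology_keyspace(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to its replication factor.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        if rf < 0:
            raise ValueError("replication factor must not be negative")
        options.append("'%s': '%d'" % (dc, rf))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (name, ", ".join(options))


# Hypothetical data center names and replication factors.
statement = network_topology_keyspace("myks", {"usa_dc": 3, "asia_dc": 3})
print(statement)
```

A keyspace that is only needed in one region can simply omit the other data center from the mapping, so no replicas (and no replication traffic) are created there.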

Inter-Node Compression!
Default: none
Use: dc
In your scylla.yaml file, uncomment and set:
internode_compression: dc
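
To confirm that the setting is actually active rather than still commented out, a small check like the following can help. It is a sketch only: it assumes the option sits on a single top-level line of scylla.yaml and that the accepted values are `none`, `dc`, and `all`. The function name and the sample text are hypothetical.

```python
VALID_MODES = {"none", "dc", "all"}


def internode_compression_mode(yaml_text, default="none"):
    """Return the active internode_compression value found in scylla.yaml text.

    Commented-out lines are ignored; if the option is absent, the default applies.
    """
    mode = default
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        if key.strip() == "internode_compression":
            # Drop any trailing comment and surrounding quotes.
            value = value.split("#", 1)[0].strip().strip("'\"")
            if value not in VALID_MODES:
                raise ValueError("unexpected internode_compression value: %r" % value)
            mode = value
    return mode


# Hypothetical excerpt of a scylla.yaml file.
sample = """\
# internode_compression controls whether traffic between nodes is compressed.
# internode_compression: none
internode_compression: dc
"""
print(internode_compression_mode(sample))
```

In practice the text would be read from the node's scylla.yaml file; its location depends on the installation.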

Summary
- As expected, you always pay :)
- Create keyspaces and tables in the regions you need them
- Enable conversations between ops and dev for best use cases
- Review client geo locations and data access patterns
  - Who is writing? (Cheap)
  - Who is reading? (Expensive)
- Enable compression!
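
One way to see why client location and consistency level affect inter-DC traffic is to count how many replica responses a request must wait for, and how many of those must come from another data center. The sketch below does that arithmetic for `QUORUM` and `LOCAL_QUORUM`. It assumes a two-data-center layout with hypothetical names and a replication factor of 3 in each; none of these numbers come from the slides.

```python
def quorum(replicas):
    """Majority of a replica count."""
    return replicas // 2 + 1


def required_acks(consistency, replicas_per_dc, local_dc):
    """Number of replica responses the coordinator waits for."""
    if consistency == "QUORUM":
        return quorum(sum(replicas_per_dc.values()))
    if consistency == "LOCAL_QUORUM":
        return quorum(replicas_per_dc[local_dc])
    raise ValueError("unsupported consistency level: %s" % consistency)


def min_remote_acks(consistency, replicas_per_dc, local_dc):
    """Lower bound on responses that must come from other data centers."""
    if consistency == "LOCAL_QUORUM":
        return 0
    needed = required_acks(consistency, replicas_per_dc, local_dc)
    return max(0, needed - replicas_per_dc[local_dc])


# Hypothetical layout: three replicas in each of two data centers.
layout = {"usa_dc": 3, "asia_dc": 3}
for level in ("QUORUM", "LOCAL_QUORUM"):
    print(level,
          required_acks(level, layout, "usa_dc"),
          min_remote_acks(level, layout, "usa_dc"))
```

Under these assumptions, `QUORUM` needs four responses, so at least one must cross the inter-DC link before the request completes, while `LOCAL_QUORUM` needs two and can be satisfied entirely within the client's own data center.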

THANK YOU
eyal@scylladb.com
@gutkinde
Please stay in touch
Any questions?


5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

ScyllaDB

Linters are a type of database! They are a collection of lint rules — queries that look for rule violations to report — plus a way to execute those queries over a source code dataset. This is a case study about using database ideas to build a linter that looks for breaking changes in Rust library APIs. Maintainability and performance are key: new Rust releases tend to have mutually-incompatible ways of representing API information, and we cannot afford to reimplement and optimize dozens of rules for each Rust version separately. Fortunately, databases don't require rewriting queries when the underlying storage format or query plan changes! This allows us to ship massive optimizations and support multiple Rust versions without making any changes to the queries that describe lint rules. Ship now, optimize later"" can be a sustainable development practice after all — join us to see how!

How Netflix Builds High Performance Applications at Global Scale

ScyllaDB

Conquering Load Balancing: Experiences from ScyllaDB Drivers

ScyllaDB

Load balancing seems simple on the surface, with algorithms like round-robin, but the real world loves throwing curveballs. Join me in this session as we delve into the intricacies of load balancing within ScyllaDB Drivers. Discover firsthand experiences from our journey in driver development, where we employed the Power of Two Choices algorithm, optimized the implementation of load balancing in Rust Driver, mitigated cloud costs through zone-aware load balancing and combated the issue of overloading a particular core of ScyllaDB. Be prepared to delve into the practical and theoretical aspects of load balancing, gaining valuable insights along the way.

Interaction Latency: Square's User-Centric Mobile Performance Metric

ScyllaDB

Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and workload durations (how long a piece of code takes to run). However, mobile apps are used by humans and the app performance directly impacts their experience, so we should primarily track user-centric mobile performance metrics. Following the lead of tech giants, the mobile industry at large is now adopting the tracking of app launch time and smoothness (jank during motion). At Square, our customers spend most of their time in the app long after it's launched, and they don't scroll much, so app launch time and smoothness aren't critical metrics. What should we track instead? This talk will introduce you to Interaction Latency, a user-centric mobile performance metric inspired from the Web Vital metric Interaction to Next Paint"" (web.dev/inp). We'll go over why apps need to track this, how to properly implement its tracking (it's tricky!), how to aggregate this metric and what thresholds you should target.

How to Avoid Learning the Linux-Kernel Memory Model

ScyllaDB

The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a steep learning curve. Wouldn't it be great to get most of LKMM's benefits without the learning curve? This talk will describe how to do exactly that by using the standard Linux-kernel APIs (locking, reference counting, RCU) along with a simple rules of thumb, thus gaining most of LKMM's power with less learning. And the full LKMM is always there when you need it!

99.99% of Your Traces are Trash by Paige Cruz

ScyllaDB

Distributed tracing is still finding its footing in many organizations today, one challenge to overcome is the data volume - keeping 100% of your traces is expensive and unnecessary. Enter sampling - head vs tail how do you decide? Let’s look at the design of Sifter and get familiar with why tail-based sampling is the way to enact a cost-effective tracing solution while actually increasing the system’s observability.

Square's Lessons Learned from Implementing a Key-Value Store with Raft

ScyllaDB

To put it simply, Raft is used to make a use case (e.g., key-value store, indexing system) more fault tolerant to increase availability using replication (despite server and network failures). Raft has been gaining ground due to its simplicity without sacrificing consistency and performance. Although we'll cover Raft's building blocks, this is not about the Raft algorithm; it is more about the micro-lessons one can learn from building fault-tolerant, strongly consistent distributed systems using Raft. Things like majority agreement rule (quorum), write-ahead log, split votes & randomness to reduce contention, heartbeats, split-brain syndrome, snapshots & logs replay, client requests dedupe & idempotency, consistency guarantees (linearizability), leases & stale reads, batching & streaming, parallelizing persisting & broadcasting, version control, and more! And believe it or not, you might be using some of these techniques without even realizing it! This is inspired by Raft paper (raft.github.io), publications & courses on Raft, and an attempt to implement a key-value store using Raft as a side project.

Making Python 100x Faster with Less Than 100 Lines of Rust

ScyllaDB

A Deep Dive Into Concurrent React by Matheus Albuquerque

ScyllaDB

The Latency Stack: Discovering Surprising Sources of Latency

ScyllaDB

Usually, when an API call is slow, developers blame ourselves and our code. We held a lock too long, or used a blocking operation, or built an inefficient query. But often, the simple picture of latency as “the time a server takes to process a message” hides a great deal of end-to-end complexity. Debugging tail latencies requires unpacking the abstractions that we normally ignore: virtualization, hidden queues, and network behavior. In this talk, I’ll describe how developers can diagnose more sources of delay and failure by building a more realistic and broad understanding of networked services. I’ll give some real-world cases when high end-to-end latency or elevated failure rates occurred due to factors we ordinarily might not even measure. Some examples include TCP SYN retransmission; virtualization on the client; and surprising behavior from AWS load balancers. Unfortunately, many measurement techniques don’t cover anything but the portion most directly under developer control. But developers can do better by comparing multiple measurements, applying Little’s law, investing in eBPF probes, and paying attention to the network layer. Understanding API performance to find and fix issues faster ultimately means understanding the entire stack: the client, your code, and the underlying infrastructure.

Scylla Summit 2017: How to Optimize and Reduce Inter-DC Network Traffic and Stop Paying too Much!

1. How to Optimize Inter-DC Communication. Eyal Gutkind, Solution Architect, ScyllaDB

2. Eyal Gutkind. Head of the Solution Architects team at ScyllaDB. Previously, I held Product Management roles at Mirantis and DataStax. Prior to DataStax, I spent 12 years with Mellanox Technologies in various engineering management and product marketing roles. I have a BSc degree in Electrical and Computer Engineering from Ben Gurion University and an MBA from the Fuqua School of Business at Duke University.

3. What’s Inter-Datacenter Communication? [Diagram: USA Data Center, Asia Data Center]

4. What Do You Pay for?

5. What Do You Pay for?

6. Snitch, Keyspace and Client Drivers Settings - Simple vs NetworkTopology Strategies - Snitch: Ec2MultiRegionSnitch for AWS-based deployments, GossipingPropertyFileSnitch for all other deployments

7. Snitch, Keyspace and Client Drivers Settings - Simple vs NetworkTopology Strategies: CREATE KEYSPACE myks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '7'} [Diagram: USA Data Center, Asia Data Center]

8. Snitch, Keyspace and Client Drivers Settings - Simple vs NetworkTopology Strategies: CREATE KEYSPACE myks WITH replication = {'class': 'NetworkTopologyStrategy', 'usa': '3', 'asia': '3'} [Diagram: USA Data Center, Asia Data Center]
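The per-datacenter replication map on slide 8 can be generated rather than typed by hand, which helps avoid misspelling the strategy class name. The following is a minimal sketch, not part of the talk: the helper name `create_keyspace_cql` is hypothetical, and it assumes the datacenter names passed in match the names reported by the configured snitch.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, rf_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    rf_per_dc maps a datacenter name to its replication factor.
    (Hypothetical helper for illustration only.)
    """
    if not rf_per_dc:
        raise ValueError("at least one datacenter is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in rf_per_dc.items():
        if rf < 1:
            raise ValueError("replication factor must be >= 1 for " + dc)
        # Values are quoted to match the style used on the slide.
        options.append("'" + dc + "': '" + str(rf) + "'")
    return (
        "CREATE KEYSPACE " + keyspace
        + " WITH replication = {" + ", ".join(options) + "};"
    )


statement = create_keyspace_cql("myks", {"usa": 3, "asia": 3})
print(statement)
```

Unlike SimpleStrategy, this form states explicitly how many replicas live in each datacenter, so the replica count per region is a deliberate choice.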

9. Snitch, Keyspace and Client Drivers Settings - Simple vs NetworkTopology Strategies - Be Local! - Set load balancing correctly in your clients: load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='asia'))

10. Snitch, Keyspace and Client Drivers Settings - Simple vs NetworkTopology Strategies - Be Local! - Set load balancing correctly in your clients - Set consistency levels in your queries!: insert_query = SimpleStatement("INSERT INTO myks.mytable (user_id, name, address, zip_code) VALUES (%s, %s, %s, %s)", consistency_level=ConsistencyLevel.LOCAL_QUORUM) …
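The load balancing and consistency snippets on slides 9 and 10 can be combined into one client setup. The sketch below assumes the Python `cassandra-driver` package (or a compatible fork), which the slide snippets appear to use. The function name `connect_local` and its arguments are hypothetical, and the driver is imported inside the function so the file still loads where the driver is not installed.

```python
def connect_local(contact_points, local_dc):
    """Open a session whose default requests stay in local_dc.

    contact_points: list of node addresses in the local datacenter.
    local_dc: datacenter name as reported by the snitch, e.g. 'asia'.
    (Hypothetical helper; requires the cassandra-driver package.)
    """
    # Lazy imports: only needed when a connection is actually opened.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        # Prefer replicas that own the token, restricted to the local DC.
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)
        ),
        # Wait only for a quorum of replicas in the local DC.
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    cluster = Cluster(
        contact_points=contact_points,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
    return cluster.connect()
```

Setting the consistency level on the default execution profile means statements that do not set their own level still use LOCAL_QUORUM; a per-statement `consistency_level`, as on slide 10, overrides it.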

11. Snitch, Keyspace and Client Drivers Settings - Simple vs NetworkTopology Strategies - Be Local! - Set load balancing correctly in your clients - Set consistency levels in your queries!
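Why the consistency level matters for inter-DC traffic can be seen from the quorum arithmetic. The sketch below is an illustration added here, not from the slides; it reuses the replication factors from slide 8 (3 in 'usa', 3 in 'asia') and the standard quorum formula, floor(replicas / 2) + 1.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Replication factors per datacenter, as on slide 8.
rf = {"usa": 3, "asia": 3}
local_dc = "asia"

total_replicas = sum(rf.values())            # 6
global_quorum = quorum(total_replicas)       # 4 (QUORUM)
local_quorum = quorum(rf[local_dc])          # 2 (LOCAL_QUORUM)

# With QUORUM, the local DC alone cannot supply enough responses,
# so the coordinator must wait on at least this many remote replicas.
remote_responses_needed = max(0, global_quorum - rf[local_dc])  # 1

print("QUORUM needs", global_quorum, "of", total_replicas, "replicas")
print("LOCAL_QUORUM needs", local_quorum, "of", rf[local_dc], "local replicas")
print("QUORUM must wait on at least", remote_responses_needed, "remote replica(s)")
```

Note that the consistency level only controls how many responses the coordinator waits for. Writes are still replicated to every datacenter listed in the keyspace, so LOCAL_QUORUM mainly removes cross-DC round trips from the request path.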

12. Inter-Node Compression! Default: none. Use: dc. In your scylla.yaml file, uncomment and set internode_compression: dc
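To get a rough feel for what compression can save on row-like traffic, a payload can be compressed locally and the sizes compared. This is only a sketch under stated assumptions: it uses Python's standard `zlib` as a stand-in (the slides do not say which codec Scylla uses on the wire), and the sample rows are invented, with column names borrowed from slide 10.

```python
import json
import zlib

# Invented sample rows; real savings depend on your actual data.
rows = [
    {
        "user_id": i,
        "name": "user-" + str(i),
        "address": "1 Example Street",
        "zip_code": "00000",
    }
    for i in range(1000)
]

raw = json.dumps(rows).encode("utf-8")
compressed = zlib.compress(raw)  # stand-in codec, not Scylla's

ratio = len(compressed) / len(raw)
print("raw bytes:       ", len(raw))
print("compressed bytes:", len(compressed))
print("ratio:            %.2f" % ratio)
```

Repetitive data such as this compresses well; already-compressed or random payloads will not, so it is worth measuring with a sample of real traffic before estimating cost savings.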

13. Summary - As expected, you always pay :) - Create keyspaces and tables in the regions you need them - Enable conversations between ops and dev for best use cases - Review client geo locations and data access patterns - Who is writing? (Cheap) - Who is reading? (Expensive) - Enable compression!

14. Thank you. Any questions? Please stay in touch: eyal@scylladb.com, @gutkinde
