The session will cover the best practices to migrate existing data from Apache Cassandra to Scylla and how to do it while being online all of the time.
If You Care About Performance, Use User Defined TypesScyllaDB
Shlomi Livne, VP of R&D at ScyllaDB, presented on the performance benefits of using user-defined types (UDTs) in ScyllaDB. He explained that with traditional columns, each column has overhead and flexibility comes at a price. However, with frozen UDTs, the columns are treated as a single unit, sharing metadata and improving performance. Livne showed results of a test where UDTs with many fields outperformed traditional columns with the same number of fields. However, he noted that Scylla's row cache and Java driver performance need improvement for UDTs.
Scylla Summit 2017: Stateful Streaming Applications with Apache Spark ScyllaDB
When working with streaming data, stateful operations are a common use case. If you would like to perform data de-duplication, calculate aggregations over event-time windows, or track user activity over sessions, you are performing a stateful operation.
Apache Spark provides users with a high level, simple to use DataFrame/Dataset API to work with both batch and streaming data. The funny thing about batch workloads is that people tend to run these batch workloads over and over again. Structured Streaming allows users to run these same workloads, with the exact same business logic in a streaming fashion, helping users answer questions at lower latencies.
In this talk, we will focus on stateful operations with Structured Streaming and we will demonstrate through live demos, how NoSQL stores can be plugged in as a fault tolerant state store to store intermediate state, as well as used as a streaming sink, where the output data can be stored indefinitely for downstream applications.
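To make "stateful" concrete, here is a minimal sketch in plain Python of two of the operations named in this abstract: de-duplication and counting over event-time windows. It is not Spark or Structured Streaming code. The function name `dedup_and_window_counts`, the event tuple layout, and the 60-second tumbling window are all assumptions made for illustration. The point is only that both operations must remember something between events, and that remembered data is the state a fault-tolerant store would have to persist.

```python
from collections import defaultdict


def dedup_and_window_counts(events, window_seconds=60):
    """Count unique events per (window, key).

    `events` is an iterable of (event_id, event_time_seconds, key) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    seen_ids = set()            # state for de-duplication
    counts = defaultdict(int)   # state for windowed aggregation

    for event_id, event_time, key in events:
        if event_id in seen_ids:
            continue            # drop events that were already processed
        seen_ids.add(event_id)
        # Assign the event to a tumbling window by its event time,
        # not by the time it happened to arrive.
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, key)] += 1

    return dict(counts)


events = [
    ("e1", 5, "alice"),
    ("e2", 42, "alice"),
    ("e1", 5, "alice"),   # duplicate delivery of e1
    ("e3", 61, "alice"),
    ("e4", 70, "bob"),
]
result = dedup_and_window_counts(events)
```

In a real streaming job both `seen_ids` and `counts` grow with the stream, which is why such state usually needs expiry rules and durable storage rather than in-process dictionaries.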
Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...ScyllaDB
In my talk, I will present the different compaction strategies that Scylla provides, and demonstrate when it is appropriate and when it is inappropriate to use each one. I will then present a new compaction strategy that we designed as a lesson from the existing compaction strategies by picking the best features of the existing strategies while avoiding their problems.
Scylla Summit 2017: Saving Thousands by Running Scylla on EC2 Spot InstancesScyllaDB
Scylla and Spotinst together provide a strong combination of extreme performance and cost reduction. In this talk, we will present how a Scylla cluster can be used on AWS’s EC2 Spot without losing consistency with the help of Spotinst prediction technology and advanced stateful features. We will show a live demo on how to run Scylla on the Spotinst platform.
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data CenterScyllaDB
Frank will share the motivation behind 3D XPoint memory, describe the currently shipping Optane SSD product and the key reasons it is better than NAND-based SSDs, and present a few open source database use cases for Optane SSDs.
Scylla Summit 2017: How to Use Gocql to Execute Queries and What the Driver D...ScyllaDB
This document outlines a presentation on using the GoCQL driver to execute queries against Cassandra and Scylla databases. It discusses connecting to a Cassandra cluster, executing queries, iterating over results, and using asynchronous queries. It also mentions some additional Cassandra libraries built on top of GoCQL, including gocqlx for data binding and queries, and gocassa for queries and migrations. The presentation aims to explain how GoCQL works behind the scenes and how to get started with basic querying functionality.
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...ScyllaDB
Benchmarks are fun to do but when going to production, all sorts of things can happen: anything from hardware outages to human error bringing your database down. Even in a healthy database, a lot of maintenance operations have to periodically run. Do you have the tools necessary to make sure you are good to go?
Scylla Summit 2017: A Toolbox for Understanding Scylla in the FieldScyllaDB
In this talk, we will share useful tools and techniques that we are using in the field to understand Scylla clusters. Users will learn how to use those same tools to better understand their deployment.
Some of the questions that will be answered are:
- how to find out which queries are the slowest and why
- how we go about understanding the impact of the data model on a node's performance
- how to check which resources are the bottlenecks in the cluster
Kubernetes is a declarative system for automatically deploying, managing, and scaling applications and their dependencies. In this short talk, I'll demonstrate a small Scylla cluster running in Google Compute Engine via Kubernetes and our publicly-published Docker images.
Scylla Summit 2017: SMF: The Fastest RPC in the WestScyllaDB
On a quest to build the fastest durable log broker in the west, we had to rethink all of the components needed to deliver on this promise. First, we began by building the fastest RPC system in the west, SMF. SMF is a new RPC mechanism, IDL-compiler, and libraries that make using Seastar easy. In this talk, I will cover SMF in detail and show a live demo on how you can get started using it to build your next application so you can live in the future.
Scylla Summit 2017: How to Run Cassandra/Scylla from a MySQL DBA's Point of ViewScyllaDB
Are you a MySQL DBA or DevOps individual being asked to run Cassandra or Scylla? Feeling overwhelmed? In this talk, I will present Cassandra/Scylla operations in terms that directly relate to MySQL. I will show you comparisons between the Information Schema and the Cassandra/Scylla System keyspace(s). I will also talk about metrics available in MySQL versus Cassandra/Scylla and how to retrieve them. Finally, I will talk about how MySQL replication compares with Cassandra replication. Hopefully, when I am done you will be able to relate to Cassandra operations in a practical and useful way.
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...ScyllaDB
We will share Scylla adoption practices for equipment sensor data management in MES, along with data modeling tips, a data architecture using Scylla, configurations, and tuning.
Scylla Summit 2017: How to Optimize and Reduce Inter-DC Network Traffic and S...ScyllaDB
The document appears to be a presentation on optimizing inter-data center communication. It discusses key topics like what inter-data center communication involves, the costs associated with it, best practices for setting snitches, keyspaces, client drivers and consistency levels for queries to optimize performance between data centers. It recommends using network topology replication strategies over simple strategies for multi-region deployments, setting load balancing and consistency levels appropriately in clients, and enabling internode compression to reduce costs of communication between data centers. The presentation encourages reviewing client locations, data access patterns, who is reading/writing data, and having conversations between operations and development teams to determine the best use cases.
Scylla Summit 2017: Snapfish's Journey Towards ScyllaScyllaDB
Snapfish, a web-based photo and printing service, will walk through their evaluation process for a new database, discuss use cases, and how they plan to use Scylla in their production systems.
Scylla Summit 2017: Running a Soft Real-time Service at One Million QPSScyllaDB
AdGear runs an ad tech gateway at more than one million queries per second to Scylla and recently transitioned from Apache Cassandra. In this talk, we will highlight the tools and languages that we use (Erlang), how we do bulk imports, and how performance compares between the two database engines.
ScyllaDB CTO Avi Kivity gave a keynote on how Scylla has evolved. He discussed new features in Scylla 2.0—including Materialized Views and Heat-Weighted Load Balancing, changes in monitoring—and shared our product roadmap. He also talked about our recent acquisition of Seastar.io and how it will enable us to deliver a database-as-a-service offering.
Scylla Summit 2017: Welcome and Keynote - Nextgen NoSQLScyllaDB
Our CEO and co-founder Dor Laor and our chairman Benny Schnaider share their vision for Scylla. This was also our opportunity to announce Scylla 2.0. Our latest release is a big step toward the first autonomous NoSQL database: one that dynamically tunes itself to varying conditions while always maintaining a high level of performance.
Scylla Summit 2017: How Baidu Runs Scylla on a Petabyte-Level Big Data PlatformScyllaDB
In this presentation, I'll speak of the benefits of running Scylla on our Big Data environment which stores over 500TB of data as well as using Scylla as the indexing engine to replace MongoDB and Cassandra for our log data analysis platform.
Scylla Summit 2017 Keynote: NextGen NoSQL with Chairman Benny SchnaiderScyllaDB
The document summarizes the "NextGen NoSQL" keynote given by Chairman Benny Schnaider. It discusses the evolution of NoSQL databases, with early generations having inefficiencies and issues that required workarounds. The presentation introduces Scylla, a next-generation NoSQL database that was built from the ground up by storage and operating systems experts to massively scale modern applications. Scylla leverages 20 years of database evolution and is implemented in C++ to provide better performance, stability and the ability to scale out across infrastructure.
Scylla Summit 2017: The Upcoming HPC EvolutionScyllaDB
In this talk, I will explain how HPC is beginning to evolve and how we use supercomputers to monitor supercomputers. First we will look at how HPC is different from cloud computing in terms of infrastructure and application architecture. Then I will discuss how those things are changing and why. Finally, I will dive into a use case of monitoring supercomputers as an application area for Scylla.
RDBMS to NoSQL: Practical Advice from Successful MigrationsScyllaDB
When and how to migrate data from SQL to NoSQL are matters of much debate. It can certainly be a daunting task, but when your SQL systems hit architectural limits or your Aurora expenses skyrocket, it’s probably time to consider the move.
See a discussion of how best to migrate data from SQL to NoSQL, and how to get heterogeneous data systems to communicate with each other effectively in real time. Get important architectural considerations, tips and tricks and several real-world use cases.
From this webinar you will learn:
- Key differences between RDBMS and NoSQL, and how to know when it’s time to migrate
- How to harness the greatest strengths out of both classes of databases, SQL and NoSQL
- Migration techniques proven in the field
- Modeling differences between RDBMS and NoSQL
- Managing releases in NoSQL vs RDBMS
- Scylla features and services that help with migrating from a relational database
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQLScyllaDB
Apache Kafka is a high-throughput distributed streaming platform that is being adopted by hundreds of companies to manage their real-time data. KSQL is an open source streaming SQL engine that implements continuous, interactive queries against Apache Kafka™. KSQL makes it easy to read, write and process streaming data in real-time, at scale, using SQL-like semantics. In my talk, I will discuss streaming ETL from Kafka into stores like Apache Cassandra using KSQL.
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...HostedbyConfluent
Core banking systems are batch oriented, typically with heavy overnight batch cycles before business opens each morning. In this talk I will explain some of the common interface points between core-banking infrastructure and event streaming systems. Then I will focus on how to do stream processing using ksqlDB for core-banking shaped data, showing how to do common operations using various ksqlDB functions. The key features are Avro record keys and multi-key joins (ksqlDB 0.15), schema management and state store planning.
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...ScyllaDB
JanusGraph, a highly scalable graph database solution, has historically supported Cassandra and HBase as database backends. We decided to put Scylla in the mix in our search for the best performing backend. We ran test scenarios that cover high volume reads and writes. In this talk, we will show you the performance results of Scylla vs others and also share our lessons learned during the performance evaluation.
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
Tim Spann
Principal Developer Advocate in Data In Motion for Cloudera, Global
https://attend.cloudera.com/skillupseriesseptember14
Streaming Change Data Capture (CDC) Two Unique Ways
In this next session, learn how to use Debezium with Flink, Kafka, and NiFi for Change Data Capture using two different mechanisms: Kafka Connect and Flink SQL.
With the virtual nature of today's world, streaming data is more critical than ever. Join Cloudera Chief Data-In-Motion Principal, Tim Spann, and Partner Solution Engineer, Salvador Alamazan as they look closely at key CDC use cases, discuss why Debezium is the best option for handling CDC and use examples to show you how to demonstrate value.
This is a must-attend experience!
Zoom Webinar
September 14, 2023
10:00am–11:00am EDT
FLaNK Stack
Apache NiFi
Apache Flink
Apache Kafka
Kafka Connect
Flink SQL
Cloudera DataFlow
Cloudera SQL Stream Builder
Cloudera Streams Messages Manager
Debezium
Postgresql
IBM DB2
Oracle DB
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone of modern data analytics. Data flowing into Kafka often originates from native data streams such as social media streams, telemetry data, financial transactions and many others. But these data streams only contain part of the information. A lot of data necessary in stream processing is stored in traditional systems backed by relational databases. To implement new and modern, real-time solutions, an up-to-date view of that information is needed. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka as soon as possible, in near-real-time? In this session, we present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate and bridging Kafka with Oracle Advanced Queuing (AQ).
Next-generation databases mostly address some of these points: being non-relational, distributed, open-source and horizontally scalable. The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (which the community now mostly translates as "not only sql") should be seen as an alias for something like the definition above.
GumGum relies heavily on Cassandra for storing different kinds of metadata. Currently GumGum reaches 1 billion unique visitors per month using 3 Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the Datastax Driver? What are the different Cassandra use cases at GumGum? How did we integrate our Cassandra with Spark?
KSQL is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. KSQL is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing these messages in near-realtime with a SQL-like language and producing results back to a Kafka topic. This way, not a single line of Java code has to be written and you can reuse your SQL know-how. This lowers the bar for starting with stream processing significantly.
KSQL offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using KSQL for the most part. This will be done in a live demo on a fictitious IoT sample.
Witsml data processing with kafka and spark streamingMark Kerzner
This document summarizes a presentation about using Kafka and Spark Streaming to process real-time well data in WITSML format. It discusses WITSML data standards, using Kafka as a messaging system to ingest WITSML data from rigs and service companies, and Spark Streaming to consume Kafka topics and apply rules to detect anomalies and send alerts. Visualizing the data in real time using the Highcharts JavaScript library is also covered. Lessons learned focus on improving data partitioning and managing producer/consumer services.
Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10Altinity Ltd
This document summarizes Robert Hodges' presentation on integrating ClickHouse with remote data sources. It discusses how ClickHouse can be used as a polyglot database to access data from MySQL, Kafka, S3, Snowflake and other sources using database engines, table engines, table functions and dictionaries. Specific examples are provided on accessing MySQL data, consuming messages from Kafka topics, reading and writing to S3 files, and experimental connections to Snowflake via ODBC. The presentation emphasizes that ClickHouse's polyglot capabilities are improving continuously and encourages testing new integrations.
This document provides an overview of NoSQL databases, including:
- NoSQL databases are non-relational and do not require fixed schemas like SQL databases.
- They are useful for large, unstructured datasets and provide high scalability and availability.
- Cassandra is a popular open-source NoSQL database that uses a column-oriented data model and eventual consistency.
- Hector is a Java client that provides an API for Cassandra and handles connection pooling.
- NoSQL databases sacrifice features like joins and ACID transactions in exchange for scalability and high availability.
This document summarizes a comparison of indexing between Oracle and SQL Server databases. It describes how indexes are structured differently in each platform, with Oracle using PCTFREE to control free space in blocks and SQL Server using FILLFACTOR. Tests were conducted inserting and deleting data in each to observe how indexes are impacted. The results showed that Oracle indexes were less affected by fragmentation while SQL Server indexes experienced more page splits leading to fragmentation issues. Maintaining indexes also differed, with SQL Server potentially facing more challenges with its clustered index structure.
Apache Cassandra Lunch #74: ScyllaDB - Peter CorlessAnant Corporation
In Cassandra Lunch #74, Technical Marketing Manager at ScyllaDB, Peter Corless, presents on ScyllaDB and some of the advantages of using ScyllaDB over open-source Cassandra.
Accompanying Blog: Coming Soon!
Accompanying YouTube: https://youtu.be/9s83yDMGcbI
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
Event streaming applications unlock new benefits by combining various data feeds. However, getting actionable insights in a timely fashion has remained a challenge, as the data has been siloed in disparate systems. ksqlDB solves this by providing an interactive SQL interface that can seamlessly combine and transform data from various sources.
In this webinar, we will show how streaming queries of high throughput NoSQL systems can derive insights from various push/pull queries via ksqlDB's User-Defined Functions, Aggregate Functions and Table Functions.
CQRS: A More Effective Way of Writing the Same ApplicationsCodeFest
The document discusses CQRS (Command Query Responsibility Segregation), an architectural pattern that separates read and write operations into different models. It aims to address complexity that arises from having a single model handle all aspects of a business domain. CQRS keeps write and read operations separated using different data models and stacks, which can simplify design and improve scalability. The document outlines different flavors of CQRS implementations from basic to more advanced using events and event sourcing.
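The separation this summary describes can be sketched in a few lines of Python. This is a minimal illustration of the basic idea only, not the design from the presentation: the class names `OrderWriteModel` and `CustomerTotalsReadModel`, the order domain, and the in-memory event list are all assumptions made for the example. The write side validates commands and records what happened; the read side builds a separate, denormalized view that is cheap to query.

```python
from dataclasses import dataclass, field


@dataclass
class OrderWriteModel:
    """Command side (hypothetical): validates and applies state changes."""
    orders: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def place_order(self, order_id, customer, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if order_id in self.orders:
            raise ValueError("duplicate order id")
        self.orders[order_id] = {"customer": customer, "amount": amount}
        # Record the change so read models can be updated from it.
        self.events.append(("OrderPlaced", order_id, customer, amount))


@dataclass
class CustomerTotalsReadModel:
    """Query side (hypothetical): a view shaped for one specific question."""
    totals: dict = field(default_factory=dict)

    def apply(self, event):
        kind, _order_id, customer, amount = event
        if kind == "OrderPlaced":
            self.totals[customer] = self.totals.get(customer, 0) + amount

    def total_for(self, customer):
        return self.totals.get(customer, 0)


write_model = OrderWriteModel()
read_model = CustomerTotalsReadModel()

write_model.place_order("o1", "alice", 30)
write_model.place_order("o2", "alice", 12)
write_model.place_order("o3", "bob", 5)

# Propagate recorded changes to the read side.
for event in write_model.events:
    read_model.apply(event)
```

Here the read model is updated in a simple loop; in the more advanced flavors the summary mentions, the events would be persisted and delivered asynchronously, so the read side may lag slightly behind the write side.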
Event streaming applications unlock new benefits by combining various data feeds. However, getting actionable insights in a timely fashion has remained a challenge, as the data has been siloed in disparate systems. ksqlDB solves this by providing an interactive SQL interface that can seamlessly combine and transform data from various sources.
In this webinar, we will show how streaming queries of high throughput NoSQL systems can derive insights from various push/pull queries via ksqlDB's User-Defined Functions, Aggregate Functions and Table Functions.
Watch this to learn:
- Real-world examples of the benefits of using a streaming database like ksqlDB and seamlessly combining data from Kafka & Cassandra/Scylla (NoSQL).
- The functionality of ksqlDB via push/pull queries and UDFs/UDAFs/UDTFs.
- The ease with which data stored in a NoSQL database can be transformed using ksqlDB and then persisted back for long-term storage.
Netflix migrated to the cloud to avoid single points of failure and to focus on their core competencies. They chose Amazon Web Services and migrated non-sensitive data and applications to the cloud. Netflix picked SimpleDB and S3 as their data stores in the cloud. Migrating from an RDBMS required translating relational concepts like normalization to key-value stores and working around issues with SimpleDB like lack of data types and transactions.
Landoop presenting how to simplify your ETL process using Kafka Connect for (E) and (L). Introducing KCQL - the Kafka Connect Query Language & how it can simplify fast-data (ingress & egress) pipelines. How KCQL can be used to set up Kafka Connectors for popular in-memory and analytical systems and live demos with HazelCast, Redis and InfluxDB. How to get started with a fast-data docker kafka development environment. Enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.
This webinar introduces Project Alternator, ScyllaDB's open-source DynamoDB-compatible API. The presenters are Dor Laor, CEO of ScyllaDB, Avi Kivity, CTO of ScyllaDB, and Nadav Har'El. The agenda includes discussions of what Alternator is, live demos of running Alternator and migrating from DynamoDB to Alternator, and how the Alternator implementation works and its current compatibility. The goal of Alternator is to allow applications designed for DynamoDB to run on ScyllaDB for better performance and flexibility.
Similar to Scylla Summit 2017: Migrating to Scylla From Cassandra and Others With No Downtime (20)
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...ScyllaDB
In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data ingestion workflows. Instead, we showcase the power of simple yet clever methods that can uncover hidden performance limitations.
Attendees will discover unconventional techniques, including clever logging, targeted instrumentation, and specialized metrics, to pinpoint bottlenecks accurately. Real-world use cases will be presented to demonstrate the effectiveness of these methods. By the end of the session, attendees will be equipped with alternative approaches to identify bottlenecks and optimize their low-latency data ingestion workflows for high throughput.
Mitigating the Impact of State Management in Cloud Stream Processing SystemsScyllaDB
Stream processing is a crucial component of modern data infrastructure, but constructing an efficient and scalable stream processing system can be challenging. Decoupling compute and storage architecture has emerged as an effective solution to these challenges, but it can introduce high latency issues, especially when dealing with complex continuous queries that necessitate managing extra-large internal states.
In this talk, we focus on addressing the high latency issues associated with S3 storage in stream processing systems that employ a decoupled compute and storage architecture. We delve into the root causes of latency in this context and explore various techniques to minimize the impact of S3 latency on stream processing performance. Our proposed approach is to implement a tiered storage mechanism that leverages a blend of high-performance and low-cost storage tiers to reduce data movement between the compute and storage layers while maintaining efficient processing.
Throughout the talk, we will present experimental results that demonstrate the effectiveness of our approach in mitigating the impact of S3 latency on stream processing. By the end of the talk, attendees will have gained insights into how to optimize their stream processing systems for reduced latency and improved cost-efficiency.
Measuring the Impact of Network Latency at TwitterScyllaDB
Widya Salim and Victor Ma will outline the causal impact analysis, framework, and key learnings used to quantify the impact of reducing Twitter's network latency.
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...ScyllaDB
BlazingMQ is a new open source* distributed message queuing system developed at and published by Bloomberg. It provides highly-performant queues to applications for asynchronous, efficient, and reliable communication. This system has been used at scale at Bloomberg for eight years, where it moves terabytes of data and billions of messages across tens of thousands of queues in production every day.
BlazingMQ provides highly-available, fault-tolerant queues courtesy of replication based on the Raft consensus algorithm. In addition, it provides a rich set of enterprise message routing strategies, enabling users to implement a variety of scenarios for message processing.
Written in C++ from the ground up, BlazingMQ has been architected with low latency as one of its core requirements. This has resulted in some unique design and implementation choices at all levels of the system, such as its lock-free threading model, custom memory allocators, compact wire protocol, multi-hop network topology, and more.
This talk will provide an overview of BlazingMQ. We will then delve into the system’s core design principles, architecture, and implementation details in order to explore the crucial role they play in its performance and reliability.
*BlazingMQ will be released as open source between now and P99 (exact timing is still TBD)
Noise Canceling RUM by Tim Vereecke, AkamaiScyllaDB
Noisy Real User Monitoring (RUM) data can ruin your P99!
We introduce a fresh concept called "Human Visible Navigations" (HVN) to tackle this risk; we focus on the experiences you actually care about when talking about the speed of our sites:
- Human: We exclude noise coming from bots and synthetic measurements.
- Visible: We remove any partial or fully hidden experiences. These tend to be very slow but users don’t see this slowness.
- Navigations: We ignore lightning fast back-forward navigations which usually have few optimisation opportunities.
Adopting Human Visible Navigations provides you with these key benefits:
- Fewer changes staying below the radar
- Fewer data fluctuations
- Fewer blindspots when finding bottlenecks
- Better correlation with business metrics
This is supported by plenty of real world examples coming from the world's largest scale modeling site (6M Monthly visits) in combination with aggregated data from the brand new rumarchive.com (open source)
After attending this session, your P99 and other percentiles will become less noisy and easier to tune!
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...ScyllaDB
In this session, Tanel introduces a new open source eBPF tool for efficiently sampling both on-CPU events and off-CPU events for every thread (task) in the OS. Linux standard performance tools (like perf) allow you to easily profile on-CPU threads doing work, but if we want to include the off-CPU timing and reasons for the full picture, things get complicated. Combining eBPF task state arrays with periodic sampling for profiling gives us a system-level overview of where threads spend their time, even when blocked and sleeping, and lets us drill down to the individual thread level to understand why.
Performance Budgets for the Real World by Tammy EvertsScyllaDB
Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works, what doesn’t, and what we need to improve. In this session, Tammy revisits old assumptions about performance budgets and offers some new best practices. Topics include:
• Understanding performance budgets vs. performance goals
• Aligning budgets with user experience
• Pros and cons of Core Web Vitals
• How to stay on top of your budgets to fight regressions
Using Libtracecmd to Analyze Your Latency and Performance TroublesScyllaDB
Trying to figure out why your application is responding late can be difficult, especially if it is because of interference from the operating system. This talk will briefly go over how to write a C program that can analyze what in the Linux system is interfering with your application. It will use trace-cmd to enable kernel trace events as well as tracing lock functions, and it will then go over a quick tutorial on how to use libtracecmd to read the created trace.dat file to uncover the cause of interference to your application.
Reducing P99 Latencies with Generational ZGCScyllaDB
With the low-latency garbage collector ZGC, GC pause times are no longer a big problem in Java. With sub-millisecond pause times there are instead other things in the GC and JVM that can cause application threads to experience unexpected latencies. This talk will dig into a specific use case where the GC pauses are no longer the cause of unexpected latencies and look at how adding generations to ZGC helps lower the p99 application latencies.
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000XScyllaDB
Linters are a type of database! They are a collection of lint rules — queries that look for rule violations to report — plus a way to execute those queries over a source code dataset.
This is a case study about using database ideas to build a linter that looks for breaking changes in Rust library APIs. Maintainability and performance are key: new Rust releases tend to have mutually-incompatible ways of representing API information, and we cannot afford to reimplement and optimize dozens of rules for each Rust version separately. Fortunately, databases don't require rewriting queries when the underlying storage format or query plan changes! This allows us to ship massive optimizations and support multiple Rust versions without making any changes to the queries that describe lint rules.
Ship now, optimize later"" can be a sustainable development practice after all — join us to see how!
How Netflix Builds High Performance Applications at Global ScaleScyllaDB
We all want to build applications that are blazingly fast. We also want to scale them to users all over the world. Can the two happen together? Can users in the slowest of environments also get a fast experience? Learn how we do this at Netflix: how we understand every user's needs and preferences and build high performance applications that work for every user, every time.
Conquering Load Balancing: Experiences from ScyllaDB DriversScyllaDB
Load balancing seems simple on the surface, with algorithms like round-robin, but the real world loves throwing curveballs. Join me in this session as we delve into the intricacies of load balancing within ScyllaDB Drivers. Discover firsthand experiences from our journey in driver development, where we employed the Power of Two Choices algorithm, optimized the implementation of load balancing in Rust Driver, mitigated cloud costs through zone-aware load balancing and combated the issue of overloading a particular core of ScyllaDB. Be prepared to delve into the practical and theoretical aspects of load balancing, gaining valuable insights along the way.
Interaction Latency: Square's User-Centric Mobile Performance MetricScyllaDB
Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and workload durations (how long a piece of code takes to run).
However, mobile apps are used by humans and the app performance directly impacts their experience, so we should primarily track user-centric mobile performance metrics. Following the lead of tech giants, the mobile industry at large is now adopting the tracking of app launch time and smoothness (jank during motion).
At Square, our customers spend most of their time in the app long after it's launched, and they don't scroll much, so app launch time and smoothness aren't critical metrics. What should we track instead?
This talk will introduce you to Interaction Latency, a user-centric mobile performance metric inspired from the Web Vital metric Interaction to Next Paint"" (web.dev/inp). We'll go over why apps need to track this, how to properly implement its tracking (it's tricky!), how to aggregate this metric and what thresholds you should target.
How to Avoid Learning the Linux-Kernel Memory ModelScyllaDB
The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a steep learning curve. Wouldn't it be great to get most of LKMM's benefits without the learning curve?
This talk will describe how to do exactly that by using the standard Linux-kernel APIs (locking, reference counting, RCU) along with a simple rules of thumb, thus gaining most of LKMM's power with less learning. And the full LKMM is always there when you need it!
99.99% of Your Traces are Trash by Paige CruzScyllaDB
Distributed tracing is still finding its footing in many organizations today, one challenge to overcome is the data volume - keeping 100% of your traces is expensive and unnecessary. Enter sampling - head vs tail how do you decide? Let’s look at the design of Sifter and get familiar with why tail-based sampling is the way to enact a cost-effective tracing solution while actually increasing the system’s observability.
Square's Lessons Learned from Implementing a Key-Value Store with RaftScyllaDB
To put it simply, Raft is used to make a use case (e.g., key-value store, indexing system) more fault tolerant to increase availability using replication (despite server and network failures). Raft has been gaining ground due to its simplicity without sacrificing consistency and performance.
Although we'll cover Raft's building blocks, this is not about the Raft algorithm; it is more about the micro-lessons one can learn from building fault-tolerant, strongly consistent distributed systems using Raft. Things like majority agreement rule (quorum), write-ahead log, split votes & randomness to reduce contention, heartbeats, split-brain syndrome, snapshots & logs replay, client requests dedupe & idempotency, consistency guarantees (linearizability), leases & stale reads, batching & streaming, parallelizing persisting & broadcasting, version control, and more!
And believe it or not, you might be using some of these techniques without even realizing it!
This is inspired by Raft paper (raft.github.io), publications & courses on Raft, and an attempt to implement a key-value store using Raft as a side project.
A Deep Dive Into Concurrent React by Matheus AlbuquerqueScyllaDB
Writing fluid user interfaces becomes more and more challenging as the application complexity increases. In this talk, we’ll explore how proper scheduling improves your app’s experience by diving into some of the concurrent React features, understanding their rationales, and how they work under the hood.
The Latency Stack: Discovering Surprising Sources of LatencyScyllaDB
Usually, when an API call is slow, developers blame ourselves and our code. We held a lock too long, or used a blocking operation, or built an inefficient query. But often, the simple picture of latency as “the time a server takes to process a message” hides a great deal of end-to-end complexity. Debugging tail latencies requires unpacking the abstractions that we normally ignore: virtualization, hidden queues, and network behavior.
In this talk, I’ll describe how developers can diagnose more sources of delay and failure by building a more realistic and broad understanding of networked services. I’ll give some real-world cases when high end-to-end latency or elevated failure rates occurred due to factors we ordinarily might not even measure. Some examples include TCP SYN retransmission; virtualization on the client; and surprising behavior from AWS load balancers. Unfortunately, many measurement techniques don’t cover anything but the portion most directly under developer control. But developers can do better by comparing multiple measurements, applying Little’s law, investing in eBPF probes, and paying attention to the network layer.
Understanding API performance to find and fix issues faster ultimately means understanding the entire stack: the client, your code, and the underlying infrastructure.
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfNeo4j
Presented at Gartner Data & Analytics, London Maty 2024. BT Group has used the Neo4j Graph Database to enable impressive digital transformation programs over the last 6 years. By re-imagining their operational support systems to adopt self-serve and data lead principles they have substantially reduced the number of applications and complexity of their operations. The result has been a substantial reduction in risk and costs while improving time to value, innovation, and process automation. Join this session to hear their story, the lessons they learned along the way and how their future innovation plans include the exploration of uses of EKG + Generative AI.
Are you interested in dipping your toes in the cloud native observability waters, but as an engineer you are not sure where to get started with tracing problems through your microservices and application landscapes on Kubernetes? Then this is the session for you, where we take you on your first steps in an active open-source project that offers a buffet of languages, challenges, and opportunities for getting started with telemetry data.
The project is called openTelemetry, but before diving into the specifics, we’ll start with de-mystifying key concepts and terms such as observability, telemetry, instrumentation, cardinality, percentile to lay a foundation. After understanding the nuts and bolts of observability and distributed traces, we’ll explore the openTelemetry community; its Special Interest Groups (SIGs), repositories, and how to become not only an end-user, but possibly a contributor.We will wrap up with an overview of the components in this project, such as the Collector, the OpenTelemetry protocol (OTLP), its APIs, and its SDKs.
Attendees will leave with an understanding of key observability concepts, become grounded in distributed tracing terminology, be aware of the components of openTelemetry, and know how to take their first steps to an open-source contribution!
Key Takeaways: Open source, vendor neutral instrumentation is an exciting new reality as the industry standardizes on openTelemetry for observability. OpenTelemetry is on a mission to enable effective observability by making high-quality, portable telemetry ubiquitous. The world of observability and monitoring today has a steep learning curve and in order to achieve ubiquity, the project would benefit from growing our contributor community.
Support en anglais diffusé lors de l'événement 100% IA organisé dans les locaux parisiens d'Iguane Solutions, le mardi 2 juillet 2024 :
- Présentation de notre plateforme IA plug and play : ses fonctionnalités avancées, telles que son interface utilisateur intuitive, son copilot puissant et des outils de monitoring performants.
- REX client : Cyril Janssens, CTO d’ easybourse, partage son expérience d’utilisation de notre plateforme IA plug & play.
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionBert Blevins
Cybersecurity is a major concern in today's connected digital world. Threats to organizations are constantly evolving and have the potential to compromise sensitive information, disrupt operations, and lead to significant financial losses. Traditional cybersecurity techniques often fall short against modern attackers. Therefore, advanced techniques for cyber security analysis and anomaly detection are essential for protecting digital assets. This blog explores these cutting-edge methods, providing a comprehensive overview of their application and importance.
Quality Patents: Patents That Stand the Test of TimeAurora Consulting
Is your patent a vanity piece of paper for your office wall? Or is it a reliable, defendable, assertable, property right? The difference is often quality.
Is your patent simply a transactional cost and a large pile of legal bills for your startup? Or is it a leverageable asset worthy of attracting precious investment dollars, worth its cost in multiples of valuation? The difference is often quality.
Is your patent application only good enough to get through the examination process? Or has it been crafted to stand the tests of time and varied audiences if you later need to assert that document against an infringer, find yourself litigating with it in an Article 3 Court at the hands of a judge and jury, God forbid, end up having to defend its validity at the PTAB, or even needing to use it to block pirated imports at the International Trade Commission? The difference is often quality.
Quality will be our focus for a good chunk of the remainder of this season. What goes into a quality patent, and where possible, how do you get it without breaking the bank?
** Episode Overview **
In this first episode of our quality series, Kristen Hansen and the panel discuss:
⦿ What do we mean when we say patent quality?
⦿ Why is patent quality important?
⦿ How to balance quality and budget
⦿ The importance of searching, continuations, and draftsperson domain expertise
⦿ Very practical tips, tricks, examples, and Kristen’s Musts for drafting quality applications
https://www.aurorapatents.com/patently-strategic-podcast.html
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...Toru Tamaki
Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr "A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models" arXiv2023
https://arxiv.org/abs/2307.12980
Best Practices for Effectively Running dbt in Airflow.pdfTatiana Al-Chueyr
As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.
This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:
- Standard ways of running dbt (and when to utilize other methods)
- How Cosmos can be used to run and visualize your dbt projects in Airflow
- Common challenges and how to address them, including performance, dependency conflicts, and more
- How running dbt projects in Airflow helps with cost optimization
Webinar given on 9 July 2024
Best Programming Language for Civil EngineersAwais Yaseen
The integration of programming into civil engineering is transforming the industry. We can design complex infrastructure projects and analyse large datasets. Imagine revolutionizing the way we build our cities and infrastructure, all by the power of coding. Programming skills are no longer just a bonus—they’re a game changer in this era.
Technology is revolutionizing civil engineering by integrating advanced tools and techniques. Programming allows for the automation of repetitive tasks, enhancing the accuracy of designs, simulations, and analyses. With the advent of artificial intelligence and machine learning, engineers can now predict structural behaviors under various conditions, optimize material usage, and improve project planning.
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsMydbops
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Chris Swan
Have you noticed the OpenSSF Scorecard badges on the official Dart and Flutter repos? It's Google's way of showing that they care about security. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
You can do the same for your projects, and this presentation will show you how, with an emphasis on the unique challenges that come up when working with Dart and Flutter.
The session will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
Coordinate Systems in FME 101 - Webinar SlidesSafe Software
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
- Learn Practical Applications: Why we need datams and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
7 Most Powerful Solar Storms in the History of Earth.pdfEnterprise Wired
Solar Storms (Geo Magnetic Storms) are the motion of accelerated charged particles in the solar environment with high velocities due to the coronal mass ejection (CME).
20240705 QFM024 Irresponsible AI Reading List June 2024
Scylla Summit 2017: Migrating to Scylla From Cassandra and Others With No Downtime
1. Migration To Scylla From Cassandra
Alexander Sicular
Senior Solutions Architect, ScyllaDB
2. Alexander "Sasha" Sicular
● Over 16 years at Columbia University, the last seven as Director of Medical Informatics, working in the field of clinical informatics building EMRs, billing, data integration and research systems.
● Having extensive experience in relational, non-relational and distributed databases, Alexander helps customers get the most out of Scylla as a Senior Solutions Architect at ScyllaDB.
3. Agenda
+ Compatibility
+ DB Migration 101
+ Offline migration
+ Live migration
+ Migration From Cassandra to Scylla
+ Migration Tools
+ Best Practice
4. Compatibility
5. Scylla Compatibility
+ SSTable file format (compatible with Cassandra 2.1)
+ Configuration file format (compatible with Cassandra 2.1)
+ CQL language (CQL version 3.3.1)
+ CQL native protocol (CQL version 3.3.1)
+ JMX management protocol (compatible with Cassandra 2.1)
+ Management command line (nodetool from C* 3.0)
+ All drivers (Java, C++, Python, Node, Ruby, Go…)
6. DB Migration 101
7. DB Migration Steps
+ Schema Migration
+ Migrating Historical Data (Forklifting)
+ Migrating Live Data (Dual Writes)
+ Validation (Offline and/or Dual Reads)*
+ Fade out old DB
* Optional step
8. Offline Migration: From DB-OLD to DB-NEW
[Timeline diagram. Activity lanes: Read / Write to DB-OLD; Write to DB-NEW; Read from DB-NEW. Labels along the time axis: Migrate Schema; Forklifting Historical Data; Validation*; DBs in Sync; Fade out DB-OLD. A "Down Time" interval is marked.]
9. Live Migration: From DB-OLD to DB-NEW
[Timeline diagram. Activity lanes: Read from DB-OLD; Read from DB-NEW (overlapping as Dual Reads*); Write to DB-OLD; Write to DB-NEW (overlapping as Dual Writes). Labels along the time axis: Migrate Schema; Forklifting Historical Data; Validation*; DBs in Sync; Fade out DB-OLD.]
10. Migration Tools
11. Migrating a Multi-DC Cluster
[Diagram labels: SSTable Loader; SSTables; CQL; Internal communication; one cluster with DC A, DC B and DC C, and another with DC A and DC B.]
If every Cassandra DC holds the same information, uploading the SSTables from one of the DCs is sufficient.
Dual Write needs to be implemented in all regions.
The number of DCs and their RF do not have to be preserved.
12. Migrate Schema
+ Use DESCRIBE to export each Cassandra keyspace, table and UDT (not including system tables)
+ Cassandra
  + cqlsh -e "DESC SCHEMA" > schema.cql
+ Scylla
  + cqlsh --file 'schema.cql'
+ When migrating from Cassandra 3.x, some schema updates are required
13. Dual Write
+ Update the application logic to send each write to both clusters (Cassandra and Scylla) in parallel
+ Recommendations:
  + Compare the results and log inconsistencies, if any
  + Use client-side timestamps
  + Create knobs for each DB writer, allowing you to stop/start writing to each DB at runtime
  + Use a rolling application logic upgrade for zero downtime
+ Dual Read can follow the same logic
[Diagram: Client connected to both clusters over CQL]
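The recommendations on this slide (one client-side timestamp per write, a start/stop knob per DB writer, and a per-DB success record that can be logged) could be sketched roughly as follows. This is a minimal illustration, not code from the talk: the `DualWriter` class and the `Writer` callable signature are hypothetical, each writer is assumed to wrap a driver session for one cluster, and the writers are called one after another for brevity rather than in parallel.

```python
import time
from typing import Callable, Dict, Sequence

# Hypothetical writer signature: the bound values plus a client-side
# timestamp in microseconds. A real writer would wrap one cluster's session.
Writer = Callable[[Sequence, int], None]


class DualWriter:
    """Fan each write out to several DBs, with a runtime on/off knob per DB."""

    def __init__(self, writers: Dict[str, Writer]):
        self.writers = dict(writers)
        self.enabled = {name: True for name in self.writers}

    def set_enabled(self, name: str, on: bool) -> None:
        """Knob: start or stop writing to one DB at runtime."""
        if name not in self.writers:
            raise KeyError(name)
        self.enabled[name] = on

    def write(self, values: Sequence) -> Dict[str, bool]:
        """Send one write to every enabled DB and return per-DB success flags."""
        # One timestamp shared by all DBs, so every cluster records the
        # same write time for this mutation.
        ts = int(time.time() * 1_000_000)
        outcome = {}
        for name, writer in self.writers.items():
            if not self.enabled[name]:
                continue
            try:
                writer(values, ts)
                outcome[name] = True
            except Exception:
                outcome[name] = False
        return outcome
```

A caller would log any `False` entry in the returned dictionary, and could call `set_enabled("cassandra", False)` once the old cluster is being faded out.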
14. Dual Writes
Use two different cluster sessions.
import cassandra.cluster
# IP_C1 and IP_C2 are lists of contact points, one list per cluster
# connect to cluster 1
db1 = cassandra.cluster.Cluster(IP_C1).connect()
# connect to cluster 2
db2 = cassandra.cluster.Cluster(IP_C2).connect()
15. Dual Writes
Two prepared statements, one for each DB session.
# insert statement with explicit TIMESTAMP
# (keyspace.table, c1 and c2 are placeholder names)
insert_statement = ("INSERT INTO keyspace.table (c1,c2) "
                    "VALUES (?,?) USING TIMESTAMP ?")
# prepared statements
prepared_statement_1 = db1.prepare(insert_statement)
prepared_statement_2 = db2.prepare(insert_statement)
16. Dual Writes
Create sample values, execute async insert statements.
import random
import time
import uuid
# random values, with an explicit write time in microseconds
values = [random.randrange(0, 1000), str(uuid.uuid4()), int(time.time() * 1000000)]
# build a list of response futures
inserts = []
# send the 1st statement to the 1st session
inserts.append(db1.execute_async(prepared_statement_1, values))
# send the 2nd statement to the 2nd session
inserts.append(db2.execute_async(prepared_statement_2, values))
17. Dual Writes
Wait for results, log results and values in an array.
# loop over futures and record success (1) or failure (0)
results = []
for future in inserts:
    try:
        future.result()  # blocks until the write completes or raises
        results.append(1)
    except Exception:
        results.append(0)
# keep the written values alongside the outcomes
results.append(values)
18. Dual Writes
Check for failures in either write.
# did we have failures? (log stands for the application's own logging function)
if results[0] == 0:
    # do something
    log('Write to cluster 1 failed')
if results[1] == 0:
    # do something
    log('Write to cluster 2 failed')
19. Forklifting Historical Data
+ Install Scylla's sstableloader on Cassandra nodes, or on intermediate servers
+ Create a snapshot of each Cassandra node
+ Run sstableloader from each Cassandra node
  sstableloader -x -d [Scylla IP] .../[ks]/[table]
  Or, from intermediate servers, using a mount of the Cassandra filesystem
  sstableloader -x -d [scylla IP] .../[mount point] in /[ks]/[table] format
+ Watch for an effect on Cassandra nodes, and use throttling (-t) to limit the loader throughput
[Diagram labels: SSTable Loader; SSTables; CQL]
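When many tables have to be loaded, the command line from this slide can be assembled programmatically. The helper below is a small sketch that only builds the argument list; it uses just the flags shown on the slide (`-x`, `-d`, `-t`). The function name `sstableloader_cmd`, the example IP address and the example directory are all hypothetical, and the meaning of the throttle value is whatever the installed sstableloader defines.

```python
from typing import List, Optional


def sstableloader_cmd(scylla_ip: str, table_dir: str,
                      throttle: Optional[int] = None) -> List[str]:
    """Build an sstableloader argument list for one table directory.

    table_dir is assumed to be a path ending in /[ks]/[table], as on the slide.
    """
    cmd = ["sstableloader", "-x", "-d", scylla_ip]
    if throttle is not None:
        # -t limits loader throughput to reduce the impact on Cassandra nodes
        cmd += ["-t", str(throttle)]
    cmd.append(table_dir)
    return cmd


# Hypothetical values, for illustration only
print(" ".join(sstableloader_cmd("10.0.0.1", "/snapshots/ks1/users", throttle=50)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on the node or intermediate server that holds the snapshot.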
20. Best Practices
21. Best Practices
+ Clean up the origin database in advance. Don't waste time on old data!
  + More data = longer migration time
+ Migrate and validate iteratively: for example, one table, one region or one user prefix at a time. After validation, keep or delete/restart that dataset
+ At any point: verify and validate. You can always roll back to the origin DB for any reason
22. Best Practices, Continued
+ Make sure to have a monitoring stack in place for both DBs and the application during the entire migration
+ Validate the process by sampling data at different points
+ Before fading out the origin DB, make sure there are no live connections to it
+ Make sure all relevant users are aware of the process and limitations (don't update your schema!)
+ Get Scylla involved. We want to help!
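Validating by sampling can be as simple as reading a random subset of keys from both databases and recording any rows that differ. The sketch below assumes two hypothetical reader callables, `read_old` and `read_new`, each returning the row for a key (or `None` if it is missing); in practice each would wrap a SELECT against one cluster.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# Hypothetical reader signature: key in, row tuple (or None) out.
Reader = Callable[[Hashable], Optional[tuple]]
Mismatch = Tuple[Hashable, Optional[tuple], Optional[tuple]]


def sample_mismatches(keys: Sequence[Hashable], read_old: Reader,
                      read_new: Reader, sample_size: int,
                      seed: Optional[int] = None) -> List[Mismatch]:
    """Compare a random sample of keys across both DBs; return differing rows."""
    rng = random.Random(seed)
    sample = rng.sample(list(keys), min(sample_size, len(keys)))
    mismatches = []
    for key in sample:
        old_row, new_row = read_old(key), read_new(key)
        if old_row != new_row:
            mismatches.append((key, old_row, new_row))
    return mismatches
```

An empty result for a given sample is evidence, not proof, that the two databases agree; repeating the check at different points during the migration follows the advice above.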
23. THANK YOU!
Any questions?
Please stay in touch:
siculars@scylladb.com
@siculars