Performance Tuning RocksDB
for Kafka Streams’ State Store
Dhruba Borthakur (Rockset), Bruno Cadonna (Confluent)
About the Presenters
Dhruba Borthakur
CTO & Co-founder Rockset
rockset.com
2
Bruno Cadonna
Contributor to Apache Kafka &
Software Engineer at Confluent
confluent.io
Agenda
• Kafka Streams and State Stores
• Introduction to RocksDB
• Compaction Styles in RocksDB
• Possible Operational Issues
• Tuning RocksDB
• RocksDB Command Line Utilities
• Takeaways
3
Kafka Streams and State Stores

Kafka Streams
5
● Stateless and stateful processors
● Stateful processors use state stores

● Create one topology per input partition, i.e., task


State Stores in Kafka Streams
20
• A stateful processor may use one or more state stores
• Each partition has its own state store
• State stores are layered (from the outside in: Metrics & De-/Serialization, Caching, Changelogging, Restoration):
• collects metrics and de-/serializes records
• caches records
• writes records to changelog
• writes records to local state store
• State stores are restored from changelog topics
• Restoration is byte-based and bypasses the wrapping layers

RocksDB is the Default State Store
• Kafka Streams needed a write optimized state store
• Kafka Streams 2.6 uses RocksDB 5.18.4
• Kafka Streams provides metrics to monitor RocksDB state stores
• RocksDB can be configured by passing a class that implements interface
RocksDBConfigSetter to configuration rocksdb.config.setter
21
Example: Configuring RocksDB in Kafka Streams
22
public static class MyRocksDBConfig implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // e.g. set compaction style
        options.setCompactionStyle(CompactionStyle.LEVEL);
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
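To make Kafka Streams pick up such a class, its fully qualified name is passed under the rocksdb.config.setter configuration key (the key behind StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG). A minimal sketch; the application id, bootstrap server, and class name below are placeholder values:

```java
import java.util.Properties;

public class StreamsRocksDBSetup {
    // Builds a Streams configuration; "rocksdb.config.setter" is the key behind
    // StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG. The application id,
    // bootstrap server, and setter class name are hypothetical placeholders.
    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        props.put("rocksdb.config.setter", "com.example.MyRocksDBConfig");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsConfig().getProperty("rocksdb.config.setter"));
    }
}
```

These properties would then be handed to the KafkaStreams constructor together with the topology.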
Introduction to RocksDB
What is RocksDB?
• Key-value persistent store
• Embedded C++ & Java library
• Server workloads
24

What is it not?
• Not distributed
• No failover
• Not highly-available. If the machine
dies, you lose your data
• Focus on performance
Kafka Streams makes it fault-tolerant
25
RocksDB API
• Keys and values are byte arrays
• Data are stored sorted by key
• Update Operations: Put/Delete/Merge
• Queries: Get/Iterator
26
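The API surface above can be modeled with an ordinary sorted map. The sketch below is not the real RocksDB Java API; it only mimics Put/Delete/Get and key-ordered iteration over byte-array keys (Merge is omitted):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class SortedKvSketch {
    // Keys and values are byte arrays; a lexicographic comparator keeps the
    // entries sorted by key, as RocksDB does on disk.
    static final TreeMap<byte[], byte[]> store = new TreeMap<>(Arrays::compare);

    static void put(byte[] k, byte[] v) { store.put(k, v); }
    static void delete(byte[] k) { store.remove(k); }
    static byte[] get(byte[] k) { return store.get(k); }

    public static void main(String[] args) {
        put("b".getBytes(), "2".getBytes());
        put("a".getBytes(), "1".getBytes());
        delete("b".getBytes());
        // Iteration visits the remaining keys in sorted order
        for (Map.Entry<byte[], byte[]> e : store.entrySet()) {
            System.out.println(new String(e.getKey()) + "=" + new String(e.getValue()));
        }
    }
}
```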
Log Structured Merge Architecture
27
(Diagram: write requests from the application go to read-write data in RAM, backed by a transaction log; periodic compaction moves data into read-only files on SSD or disk; scan requests from the application read both.)
RocksDB Write Path
28
(Diagram: a write request goes to the active MemTable and its log; on a switch, the active MemTable becomes a read-only MemTable with its log, and the read-only MemTable is flushed to sst files, which LSM compaction then merges.)
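The switch-and-flush cycle above can be sketched as follows (illustrative types and a tiny flush threshold; a real MemTable is a sorted in-memory structure such as a skiplist, and a real flush writes an sst file to disk):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePathSketch {
    static final int MEMTABLE_LIMIT = 2;  // flush threshold (tiny, for illustration)
    static TreeMap<String, String> active = new TreeMap<>();
    static final List<TreeMap<String, String>> sstFiles = new ArrayList<>();

    // A write goes to the active MemTable; when it is full, it is switched out
    // and flushed as a new sorted "sst file".
    static void write(String key, String value) {
        active.put(key, value);
        if (active.size() >= MEMTABLE_LIMIT) {
            TreeMap<String, String> readOnly = active;  // switch: freeze the MemTable
            active = new TreeMap<>();                   // new active MemTable (and log)
            sstFiles.add(readOnly);                     // flush: persist as an sst file
        }
    }

    public static void main(String[] args) {
        write("a", "1"); write("b", "2"); write("c", "3");
        System.out.println(sstFiles.size() + " sst file(s), " + active.size() + " key(s) in MemTable");
    }
}
```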

RocksDB Reads
• Data can be in memory or disk
• Consult multiple files to find the latest
instance of the key
• Use bloom filters to reduce IO
• Every sst file has a bloom filter
• bloom filters are cached in memory
• default config: eliminates 99% of reads
29
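The effect of a bloom filter can be sketched with a small bit set and two hash probes (illustrative parameters, not RocksDB's actual filter implementation): a miss proves the key is absent from the file, so the read can be skipped, while a hit may be a false positive and still requires consulting the file.

```java
import java.util.BitSet;

public class BloomSketch {
    // Tiny bloom filter: k=2 hash probes over an m-bit set.
    static final int M = 1024;
    static final BitSet bits = new BitSet(M);

    static int h1(String key) { return Math.floorMod(key.hashCode(), M); }
    static int h2(String key) { return Math.floorMod(key.hashCode() * 31 + 17, M); }

    static void add(String key) { bits.set(h1(key)); bits.set(h2(key)); }

    // May return a false positive, but never a false negative.
    static boolean mightContain(String key) { return bits.get(h1(key)) && bits.get(h2(key)); }

    public static void main(String[] args) {
        add("user-42");
        System.out.println(mightContain("user-42"));  // always true for added keys
    }
}
```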
RocksDB Read Path
30
(Diagram: a read request Get(k) consults the active MemTable, then the read-only MemTable in memory, then the sst files on persistent storage; bloom filters limit which sst files need to be read.)
RocksDB Architecture
31
(Diagram: the combined read/write architecture: write requests go to the active MemTable and log; read-only MemTables are flushed to sst files on persistent storage and merged by LSM compaction; read requests are served from the MemTables, the BlockCache, and the sst files.)
RocksDB Open & Pluggable
32
• Pluggable compaction
• Pluggable sst data format on storage
• Pluggable MemTable format in RAM
• Customizable WAL
(Diagram also shows the transaction log and bloom filters, serving get/scan and write requests from the application.)

Compaction Styles in RocksDB
What is Compaction
• Multi-threaded
• Parallel compactions on different parts of the database
• Deletes overwritten keys
• Two types of compactions
• level compaction
• universal compaction
34
Level compaction
• RocksDB default compaction is Level Compaction (for read heavy workloads)
• Stores data in multiple levels
• More recent data stored in L0
• Older data stored in Lmax
• Files in L0
• overlapping keys, sorted by flush time
• Files in L1 to Lmax
• non overlapping keys, sorted by key
• Max space amplification = 10%
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
35
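With a size multiplier of 10 between levels (an assumption here; in RocksDB this is configurable via max_bytes_for_level_multiplier), level target sizes grow geometrically, which is where the small space-amplification bound of leveled compaction comes from. A sketch of the arithmetic:

```java
public class LevelSizes {
    // Target size of level n, given the target size of L1 and a per-level
    // multiplier (both hypothetical example values below).
    static long levelTargetBytes(long baseBytes, int multiplier, int level) {
        long size = baseBytes;
        for (int i = 1; i < level; i++) size *= multiplier;
        return size;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With L1 = 256 MB and multiplier 10: L2 = 2560 MB, L3 = 25600 MB, ...
        for (int level = 1; level <= 3; level++) {
            System.out.println("L" + level + " = "
                    + levelTargetBytes(256 * mb, 10, level) / mb + " MB");
        }
    }
}
```

Because each level is ~10x larger than the one above it, almost all data lives in the last level, so stale versions in the smaller levels add only a small fraction of extra space.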
Universal Compaction
• For write heavy workloads
• needed if level style compaction is bottlenecked by disk throughput
• Stores all files in L0
• All files are arranged in time order
• Decreases write amplification but increases space amplification
• Pick up files that are chronologically adjacent to one another
• merge them
• replace them with a new file in L0
36
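Merging two chronologically adjacent files can be sketched as a newest-wins merge of sorted runs (simplified: files are sorted maps, and entries from the newer run overwrite older ones, which is how overwritten keys get deleted):

```java
import java.util.TreeMap;

public class CompactionSketch {
    // Merge an older and a newer sorted run into one file; entries from the
    // newer run win, so overwritten versions of a key are dropped.
    static TreeMap<String, String> merge(TreeMap<String, String> older,
                                         TreeMap<String, String> newer) {
        TreeMap<String, String> out = new TreeMap<>(older);
        out.putAll(newer);
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>();
        older.put("a", "v1"); older.put("b", "v1");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("a", "v2");
        System.out.println(merge(older, newer));  // {a=v2, b=v1}
    }
}
```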

Possible Operational Issues
Operational Issues
• High memory usage
• Application gets slower or even crashes
• Operating system shows high memory usage
• Kafka Streams metrics for monitoring memory
usage of RocksDB (KIP-607, 2.7+), e.g.,
size-all-mem-tables + block-cache-usage +
block-cache-pinned-usage, show high values
• High disk usage
• Application crashes with I/O errors
• Operating system shows high disk usage
• total-sst-files-size (KIP-607, 2.7+)
39

Operational Issues
• High disk I/O
• Operating system shows high disk I/O
• Kafka Streams metrics with high values
• memtable-bytes-flushed-[rate | total]
• bytes-[read | written]-compaction-rate
• Kafka Streams metrics with low values
• memtable-hit-ratio
• block-cache-[data | index | filter]-hit-ratio
• Write stalls
• Processing latency of the application increases
• Kafka Streams client gets kicked out of the group
• Kafka Streams metric write-stall-duration-[avg | total] shows high values
41
Operational Issues
• Too many open files
• Application crashes with I/O errors
• Kafka Streams metric number-open-files shows high values
42
Tuning RocksDB
Debug Kafka Streams OOM
• Memory consumption
• memtable (for writes)
• memtable size, number of memtables
• block cache (for reads)
• configure the block cache to be shared among all RocksDB instances of the application
• Kafka Streams keeps index blocks in the block cache
• rocksdb-java bugs (https://github.com/facebook/rocksdb/issues/6247)
• High disk usage
• Use level compaction instead of universal compaction
• provision more disk space
https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
44
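The memory-management approach from the linked Confluent guide can be sketched as a `RocksDBConfigSetter` that shares one block cache and write buffer manager across all state stores of a Streams instance. This is a minimal sketch: the class name and the two size constants are placeholders, not values from the slides.

```java
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

import java.util.Map;

// Bound RocksDB memory across all state stores of one Streams instance:
// a single Cache and WriteBufferManager are shared by every RocksDB
// instance the application opens (sizes below are placeholders).
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    private static final long TOTAL_OFF_HEAP_BYTES = 256 * 1024 * 1024L;
    private static final long TOTAL_MEMTABLE_BYTES = 64 * 1024 * 1024L;

    // static, so the same objects are reused for every store
    private static final Cache CACHE =
        new LRUCache(TOTAL_OFF_HEAP_BYTES, -1, false, 0.1 /* high-prio pool for index/filter blocks */);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
        new WriteBufferManager(TOTAL_MEMTABLE_BYTES, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
            (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(CACHE);
        // count memtable memory against the shared cache
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);
        // keep index and filter blocks in the cache so they are accounted for
        tableConfig.setCacheIndexAndFilterBlocks(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // do not close CACHE or WRITE_BUFFER_MANAGER here:
        // they are static and reused by the other stores
    }
}
```

The class is registered via `StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG`.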

Debug write stalls
• Debug write stalls in RocksDB
• Is disk IO utilization at 100%?
• add more storage spindles
• use universal compaction
• Check number of background compaction threads
• Kafka Streams uses Max(2, number of available processors) by default
• Check memtable configuration
• AdvancedColumnFamilyOptions.max_write_buffer_number
• ColumnFamilyOptions.write_buffer_size
45
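The memtable settings above can be applied through a `RocksDBConfigSetter` as well; a sketch with illustrative values (the class name and the concrete numbers are examples, not recommendations from the slides):

```java
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

import java.util.Map;

// Illustrative write-stall tuning: more background jobs and
// more/larger memtables (all values are examples).
public class WriteStallTuningConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // allow more concurrent flushes/compactions than the default
        options.setMaxBackgroundJobs(4);
        // buffer more writes before flushing: 3 memtables of 32 MB each
        options.setMaxWriteBufferNumber(3);
        options.setWriteBufferSize(32 * 1024 * 1024L);
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
```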
Debugging file descriptor issues
• Too many open files
• DBOptions.max_open_files = -1 (default)
• opens all sst files at db open time
• good for performance but can run out of file descriptors
• Increase operating system number of open file descriptors
• Set DBOptions.max_open_files = 10000
• will open a max of 10000 files concurrently
• Decrease number of files by making each file larger
• AdvancedColumnFamilyOptions.target_file_size_base = 128 MB (default is 64 MB)
46
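The two file-descriptor settings above map directly onto RocksDB options; a minimal `RocksDBConfigSetter` sketch (the class name is illustrative):

```java
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

import java.util.Map;

// Apply the file-descriptor settings from the slide:
// cap open SST files and make each SST file larger.
public class FileDescriptorConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // open at most 10000 SST files concurrently (default -1 = all)
        options.setMaxOpenFiles(10_000);
        // fewer, larger files: raise the target SST size from 64 MB to 128 MB
        options.setTargetFileSizeBase(128 * 1024 * 1024L);
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
```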
RocksDB Command Line Utilities
Build rocksdb command line utilities
git clone git@github.com:facebook/rocksdb.git
cd rocksdb
make ldb sst_dump
cp ldb /usr/local/bin
cp sst_dump /usr/local/bin
48
Useful RocksDB command line tools:
https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool
# change these values accordingly
APP_ID=my-app
STATE_STORE=my-counts
STATE_STORE_DIR=/tmp/kafka-streams
TASKS=$(ls $STATE_STORE_DIR/$APP_ID)

Useful commands
# View all keys
for i in $TASKS; do
  ldb --db=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE scan 2>/dev/null;
done
# Show table properties
for i in $TASKS; do
  TABLE_PROPERTIES=$(sst_dump --file=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE --show_properties)
  echo -e "Table properties for task: $i\n$TABLE_PROPERTIES\n\n"
done
49
Useful commands - Example output
50
# example output
Table properties for task: 1_9
from [] to []
Process /tmp/kafka-streams/my-app/1_9/rocksdb/my-counts/000006.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 1
# entries: 2
raw key size: 18
raw average key size: 9.000000
raw value size: 88
raw average value size: 44.000000
data block size: 125
index block size: 35
filter block size: 0
(estimated) table size: 160
Takeaways
Takeaways
• RocksDB is the default state store in Kafka Streams
• Kafka Streams provides functionality to configure and monitor RocksDB
• RocksDB uses a log-structured merge-tree (LSM) architecture with different compaction styles
• You might run into operational issues, but you can solve them by debugging and tuning
RocksDB
• RocksDB offers command line utilities for analysing state stores
52

53
Thank you!
dhruba@rockset.com
bruno@confluent.io
cnfl.io/meetups cnfl.io/slack cnfl.io/blog
Learn how Rockset uses RocksDB
https://rockset.com/blog/how-we-use-rocksdb-at-rockset/
Confluent is hiring in Germany!
KSQL Engineer [Remote]
https://jobs.lever.co/confluent/ee6f1b72-37ad-44d9-990c-eaeac77bee8b
• Kafka Streams and ksqlDB
• fully remote
Questions?
Contact me via
• bruno@confluent.io or
• confluentcommunity.slack.com (member ID: UHJ3GBV6G)
54
Rockset is hiring in London
Rockset’s vision is to make the world more data driven. Rockset is built by experts with
decades of experience in web-scale data management and distributed systems. Our team
comprises engineers who helped create the online data and search infrastructure at
Facebook, founded the Hadoop Filesystem project at Yahoo, implemented the Gmail
backend and Kubernetes at Google, and built databases at Oracle. We are looking for
people who are excited by distributed systems and database technologies.
Visit our page at https://rockset.com
Job Openings: https://jobs.lever.co/rockset
55

 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 

Recently uploaded

Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
welrejdoall
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Bert Blevins
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
Stephanie Beckett
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
jackson110191
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 

Recently uploaded (20)

Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 

Performance Tuning RocksDB for Kafka Streams’ State Stores

  • 1. Performance Tuning RocksDB for Kafka Streams’ State Store Dhruba Borthakur (Rockset), Bruno Cadonna (Confluent)
  • 2. About the Presenters Dhruba Borthakur CTO & Co-founder Rockset rockset.com 2 Bruno Cadonna Contributor to Apache Kafka & Software Engineer at Confluent confluent.io
  • 3. Agenda • Kafka Streams and State Stores • Introduction to RocksDB • Compaction Styles in RocksDB • Possible Operational Issues • Tuning RocksDB • RocksDB Command Line Utilities • Takeaways 3
  • 4. Kafka Streams and State Stores
  • 11. Kafka Streams 11 ● Stateless and stateful processors ● Stateful processors use state stores ● Create one topology per input partition, i.e., task
  • 20. State Stores in Kafka Streams 20 • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: Metrics & De-/Serialization, Caching, Changelogging, Restoration • collects metrics and de-/serializes records • caches records • writes records to changelog • writes records to local state store • State stores are restored from changelog topics • Restoration is byte-based and by-passes wrapping layers
  • 21. RocksDB is the Default State Store • Kafka Streams needed a write optimized state store • Kafka Streams 2.6 uses RocksDB 5.18.4 • Kafka Streams provides metrics to monitor RocksDB state stores • RocksDB can be configured by passing a class that implements interface RocksDBConfigSetter to configuration rocksdb.config.setter 21
  • 22. Example: Configuring RocksDB in Kafka Streams 22

    public static class MyRocksDBConfig implements RocksDBConfigSetter {
      @Override
      public void setConfig(final String storeName, final Options options,
                            final Map<String, Object> configs) {
        // e.g. set compaction style
        options.setCompactionStyle(CompactionStyle.LEVEL);
      }

      @Override
      public void close(final String storeName, final Options options) {}
    }
  • 24. What is RocksDB? • Key-value persistent store • Embedded C++ & Java library • Server workloads 24
  • 25. What is it not? • Not distributed • No failover • Not highly available: if the machine dies, you lose your data • Focus on performance • Kafka Streams makes it fault-tolerant 25
  • 26. RocksDB API • Keys and values are byte arrays • Data are stored sorted by key • Update Operations: Put/Delete/Merge • Queries: Get/Iterator 26
  • 27. Log Structured Merge Architecture 27 [Diagram: write requests from the application go to read-write data in RAM and to the transaction log; read-only data resides on SSD or disk and is periodically compacted; scan requests read across both]
  • 28. RocksDB Write Path 28 [Diagram: a write request goes to the active MemTable and its log; a full MemTable is switched to read-only and flushed to an sst file; LS compaction merges sst files in the background]
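As a rough illustration of this write path (a toy sketch in plain Java, not RocksDB's actual internals): writes land in an active memtable; once it fills up it is switched to read-only and flushed to a sorted run standing in for an sst file; reads consult the memtable first, then the runs from newest to oldest.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

// Toy LSM write path: writes go to an active in-memory memtable; when it
// grows past a threshold it is "flushed" to an immutable sorted run.
public class ToyLsm {
    private TreeMap<String, String> active = new TreeMap<>();
    private final Deque<TreeMap<String, String>> runs = new ArrayDeque<>();
    private final int memtableLimit;

    public ToyLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    public void put(String key, String value) {
        active.put(key, value);
        if (active.size() >= memtableLimit) {
            runs.addFirst(active);    // newest run goes to the front
            active = new TreeMap<>(); // switch to a fresh memtable
        }
    }

    // Reads consult the memtable first, then runs from newest to oldest.
    public String get(String key) {
        if (active.containsKey(key)) return active.get(key);
        for (TreeMap<String, String> run : runs) {
            if (run.containsKey(key)) return run.get(key);
        }
        return null;
    }

    public int runCount() { return runs.size(); }

    public static void main(String[] args) {
        ToyLsm db = new ToyLsm(2);
        db.put("a", "1");
        db.put("b", "2"); // reaches the limit, triggers a flush
        db.put("a", "3"); // newer value lives in the active memtable
        System.out.println(db.get("a"));    // 3
        System.out.println(db.get("b"));    // 2
        System.out.println(db.runCount());  // 1
    }
}
```

In real RocksDB the flushed runs are persisted as sst files and merged by compaction; the sketch keeps everything in memory to show only the write/switch/flush flow.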
  • 29. RocksDB Reads • Data can be in memory or disk • Consult multiple files to find the latest instance of the key • Use bloom filters to reduce IO • Every sst file has a bloom filter • bloom filters are cached in memory • default config: eliminates 99% of reads 29
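The bloom-filter property that makes this work is easy to sketch. A minimal, hypothetical filter (not RocksDB's implementation): k probe positions are derived per key; a membership test can return false positives but never false negatives, so a negative answer lets the reader skip an sst file without any IO.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: k hash probes into a bit array.
// mightContain() may return false positives but never false negatives.
public class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public ToyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k probe positions from two hash values (double hashing).
    private int probe(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(probe(key, i));
    }

    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(probe(key, i))) return false; // definitely absent
        }
        return true; // present, or a false positive
    }
}
```

Sizing the bit array and the number of probes controls the false-positive rate; the slide's "eliminates 99% of reads" corresponds to a roughly 1% false-positive configuration.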
  • 30. RocksDB Read Path 30 [Diagram: a Get(k) request consults the active and read-only MemTables in memory, then sst files on persistent storage; per-file bloom filters let most sst files be skipped]
  • 31. RocksDB Architecture 31 [Diagram: the write path (MemTables, logs, flush, LS compaction) combined with the read path, which additionally serves reads from the BlockCache]
  • 32. RocksDB Open & Pluggable 32 [Diagram: pluggable MemTable format in RAM, pluggable sst data format on storage, pluggable compaction, customizable WAL, and bloom filters]
  • 34. What is Compaction • Multi-threaded • Parallel compactions on different parts of the database • Deletes overwritten keys • Two compaction styles • level compaction • universal compaction 34
  • 35. Level compaction • RocksDB default compaction is Level Compaction (for read heavy workloads) • Stores data in multiple levels • More recent data stored in L0 • Older data stored in Lmax • Files in L0 • overlapping keys, sorted by flush time • Files in L1 to Lmax • non overlapping keys, sorted by key • Max space amplification = 10% https://github.com/facebook/rocksdb/wiki/Leveled-Compaction 35
  • 36. Universal Compaction • For write heavy workloads • needed if level compaction is bottlenecked by disk throughput • Stores all files in L0 • All files are arranged in time order • Decreases write amplification but increases space amplification • Picks files that are chronologically adjacent to one another • merges them • replaces them with a new file in L0 36
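The merge step above can be sketched under the simplifying assumption that a sorted run is just a TreeMap: two chronologically adjacent runs are replaced by one, with the newer run winning on key collisions, which is how overwritten values get dropped.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of one universal-style compaction step: two chronologically
// adjacent sorted runs are merged into a single run; for keys present
// in both, the newer run's value wins and the older one is discarded.
public class UniversalCompactionStep {
    static TreeMap<String, String> merge(TreeMap<String, String> newer,
                                         TreeMap<String, String> older) {
        TreeMap<String, String> merged = new TreeMap<>(older);
        merged.putAll(newer); // newer values overwrite older ones
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>(Map.of("a", "1", "b", "2"));
        TreeMap<String, String> newer = new TreeMap<>(Map.of("b", "9", "c", "3"));
        System.out.println(merge(newer, older)); // {a=1, b=9, c=3}
    }
}
```

Write amplification drops because each key is rewritten fewer times, but until old runs are merged away the overlapping copies occupy extra space, which is the space-amplification trade-off the slide mentions.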
  • 39. Operational Issues • High memory usage • Application gets slower or even crashes • Operating system shows high memory usage • Kafka Streams metrics for monitoring memory usage of RocksDB (KIP-607, 2.7+), e.g., size-all-mem-tables + block-cache-usage + block-cache-pinned-usage, show high values • High disk usage • Application crashes with I/O errors • Operating system shows high disk usage • total-sst-files-size (KIP-607, 2.7+) 39
  • 41. Operational Issues • High disk I/O • Operating system shows high disk I/O • Kafka Streams metrics with high values • memtable-bytes-flushed-[rate | total] • bytes-[read | written]-compaction-rate • Kafka Streams metrics with low values • memtable-hit-ratio • block-cache-[data | index | filter]-hit-ratio • Write stalls • Processing latency of the application increases • Kafka Streams client gets kicked out of the group • Kafka Streams metric write-stall-duration-[avg | total] shows high values 41
  • 42. Operational Issues • Too many open files • Application crashes with I/O errors • Kafka Streams metric number-open-files shows high values 42
  • 44. Debug Kafka Streams OOM • Memory consumption • memtable (for writes) • memtable size, number of memtables • block cache (for reads) • configure the block cache to be shared among all state stores of the application • Kafka Streams keeps index blocks in the block cache • rocksdb-java bugs (https://github.com/facebook/rocksdb/issues/6247) • High disk usage • Use level compaction instead of universal compaction • provision more disk space https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html 44
  • 45. Debug write stalls • Debug write stalls in RocksDB • Is disk IO utilization at 100%? • add more storage spindles • use universal compaction • Check number of background compaction threads • Kafka Streams uses Max(2, number of available processors) by default • Check memtable configuration • AdvancedColumnFamilyOptions.max_write_buffer_number • ColumnFamilyOptions.write_buffer_size 45
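The memtable and thread checks above can be applied through the same RocksDBConfigSetter hook shown earlier. A hedged sketch, assuming the org.rocksdb Options setters available to Kafka Streams; the class name and concrete values are illustrative, not recommendations:

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Illustrative config setter for mitigating write stalls: allow more
// memtables before writes stall, size each memtable explicitly, and
// give RocksDB more background threads for flushes and compactions.
public class WriteStallTuningConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        options.setMaxWriteBufferNumber(3);            // extra memtable headroom
        options.setWriteBufferSize(32 * 1024 * 1024L); // 32 MB per memtable
        options.setIncreaseParallelism(
            Math.max(2, Runtime.getRuntime().availableProcessors()));
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
```

The class is registered via the rocksdb.config.setter configuration, as on the earlier slide; it is a configuration fragment, so the effect should be verified against the write-stall-duration metrics.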
  • 46. Debugging file descriptor issues • Too many open files • DBOptions.max_open_files = -1 (default) • opens all sst files at db open time • good for performance but can run out of file descriptors • Increase operating system number of open file descriptors • Set DBOptions.max_open_files = 10000 • will open a max of 10000 files concurrently • Decrease number of files by making each file larger • AdvancedColumnFamilyOptions.target_file_size_base = 128 MB (default is 64 MB) 46
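The effect of target_file_size_base on the file-descriptor budget is simple arithmetic: with the default max_open_files = -1 every sst file stays open, so the open-file count roughly tracks store size divided by file size. A toy estimate (ignoring per-level size multipliers):

```java
// Back-of-the-envelope check for the open-files budget: e.g. a 100 GB
// store split into 64 MB sst files yields 1600 files, versus 800 at 128 MB.
public class SstFileCountEstimate {
    static long estimateFiles(long storeBytes, long targetFileBytes) {
        return (storeBytes + targetFileBytes - 1) / targetFileBytes; // ceiling division
    }

    public static void main(String[] args) {
        long storeBytes = 100L * 1024 * 1024 * 1024;               // 100 GB state store
        System.out.println(estimateFiles(storeBytes, 64L << 20));  // 64 MB files -> 1600
        System.out.println(estimateFiles(storeBytes, 128L << 20)); // 128 MB files -> 800
    }
}
```

Remember this is per state store: a Kafka Streams instance hosting many tasks multiplies the total accordingly, which is why the operating system limit is often the first thing to raise.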
  • 47. RocksDB Command Line Utilities
  • 48. Build RocksDB command line utilities 48

    Useful RocksDB command line tools: https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool

    # Build
    git clone git@github.com:facebook/rocksdb.git
    cd rocksdb
    make ldb sst_dump
    cp ldb /usr/local/bin
    cp sst_dump /usr/local/bin

    # Change these values accordingly
    APP_ID=my-app
    STATE_STORE=my-counts
    STATE_STORE_DIR=/tmp/kafka-streams
    TASKS=$(ls $STATE_STORE_DIR/$APP_ID)
  • 49. Useful commands 49

    # View all keys
    for i in $TASKS; do
      ldb --db=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE scan 2>/dev/null
    done

    # Show table properties
    for i in $TASKS; do
      TABLE_PROPERTIES=$(sst_dump --file=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE --show_properties)
      echo -e "Table properties for task: $i\n$TABLE_PROPERTIES\n\n"
    done
  • 50. Useful commands - Example output 50

    # example output
    Table properties for task: 1_9
    from [] to []
    Process /tmp/kafka-streams/my-app/1_9/rocksdb/my-counts/000006.sst
    Sst file format: block-based
    Table Properties:
    ------------------------------
    # data blocks: 1
    # entries: 2
    raw key size: 18
    raw average key size: 9.000000
    raw value size: 88
    raw average value size: 44.000000
    data block size: 125
    index block size: 35
    filter block size: 0
    (estimated) table size: 160
  • 52. Takeaways • RocksDB is the default state store in Kafka Streams • Kafka Streams provides functionality to configure and monitor RocksDB • RocksDB uses a log structured merge (LSM) architecture with different compaction styles • You might run into operational issues, but you can solve them by debugging and tuning RocksDB • RocksDB offers command line utilities for analysing state stores 52
  • 53. 53 Thank you! dhruba@rockset.com bruno@confluent.io cnfl.io/meetups cnfl.io/slackcnfl.io/blog Learn how Rockset uses RocksDB https://rockset.com/blog/how-we-use-rocksdb-at-rockset/
  • 54. Confluent is hiring in Germany! KSQL Engineer [Remote] https://jobs.lever.co/confluent/ee6f1b72-37ad-44d9-990c-eaeac77bee8b • Kafka Streams and ksqlDB • fully remote Questions? Contact me via • bruno@confluent.io or • confluentcommunity.slack.com (member ID: UHJ3GBV6G) 54
  • 55. Rockset is hiring in London Rockset’s vision is to make the world more data driven. Rockset is built by experts with decades of experience in web-scale data management and distributed systems. Our team comprises engineers who helped create the online data and search infrastructure at Facebook, founded the Hadoop Filesystem project at Yahoo, implemented the Gmail backend and Kubernetes at Google, and built databases at Oracle. We are looking for people who are excited by distributed systems and database technologies. Visit our page at https://rockset.com Job Openings: https://jobs.lever.co/rockset 55