Decoupling data pipelines at scale
About Me
Husband, father of 2+
Co-founder and CTO of Coralogix
Love distributed systems
Kafka isn't only a platform, it is a design philosophy
Love observability
Our data pipeline stats
• 1GB±: 1,000 megabytes of log data ingested every second
• 500K+: 500,000 events per second (EPS)
• 50TB
• Realtime: near-realtime latency of less than a few seconds
Kafka Background
• Brokers form a cluster
• Topics are broken into partitions
• Partitions are replicated and spread across brokers
• Producers push messages to a topic
• Consumers pull messages from a topic
• Consumer groups track each consumer's offset in a partition
• Retention by size or age
• Keys decide how data is spread across partitions
• Compaction keeps only the latest message for each key
(Diagram: Broker A, Broker B, Broker C)
A note on keys
By default, when no key is specified, Kafka spreads messages evenly across all
partitions.
When you specify a key, all messages with that key are sent to the same
partition. This makes it possible to preserve per-key ordering, perform stateful
transformations, use compaction, and build stores in streams.
It is critical to choose your keys wisely: a skewed key will cause a hot
partition, which will cause a hot broker that could cripple your cluster.
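As an illustration, here is a minimal Java producer sketch; the topic name, broker address and the "tenant-42" key are made-up examples:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KeyedProducerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // No key: the default partitioner spreads records across all partitions
      producer.send(new ProducerRecord<>("logs", "a log line"));
      // With a key: every record for "tenant-42" lands on the same partition,
      // preserving per-key ordering and enabling compaction and stateful processing
      producer.send(new ProducerRecord<>("logs", "tenant-42", "another log line"));
    }
  }
}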
Old solution - Streamonolith
• Consume messages from Kafka, execute workload and produce back to Kafka
• Running in containers
• Communicate with external sources for pulling and updating state/metadata
(Diagram: RestServer/Logstash, Kafka topic with retention, service logic including pulling and manipulating state, memory store)
Healthy scalable real-time streams
“As we see it, a scalable healthy stream is bound to the machine’s
CPU. When traffic increases, the CPU usage should increase linearly”
Yoni Farin, Coralogix
In our case, we were forced to scale out our services despite the CPU
not being a bottleneck. At some point, one of our services had to run
on 240 containers (!). We then decided that we’re due for a redesign.
(Diagram: CPU-bound service vs. external side effects)
First try - naive approach
On-demand cache
A cache populated on demand with a configurable expiration time. For each
message, the mechanism checked whether the required data was cached and
retrieved it when necessary. It did improve read performance, but some issues
persisted.
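A rough sketch of that kind of on-demand cache, assuming a generic externalLookup callback that stands in for the call to the external source of truth:

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class OnDemandCache<K, V> {
  private static class Entry<V> {
    final V value;
    final Instant loadedAt;
    Entry(V value, Instant loadedAt) { this.value = value; this.loadedAt = loadedAt; }
  }

  private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
  private final Duration expiration;           // configurable expiration time
  private final Function<K, V> externalLookup; // hypothetical call to the external source

  public OnDemandCache(Duration expiration, Function<K, V> externalLookup) {
    this.expiration = expiration;
    this.externalLookup = externalLookup;
  }

  public V get(K key) {
    Entry<V> entry = cache.get(key);
    // Fall back to the external source when the entry is missing or expired
    if (entry == null || entry.loadedAt.plus(expiration).isBefore(Instant.now())) {
      entry = new Entry<>(externalLookup.apply(key), Instant.now());
      cache.put(key, entry);
    }
    return entry.value;
  }
}

Every miss or expiry still pays the external round-trip, which is where the p95/p99 issues below come from.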
Streamonolith issues
• X minutes delay from the source of truth by design
• Hard hit after a restart until the cache warms up
• Bad p99 and p95 performance, which affected the average processing
time
• Whenever the external source malfunctioned, the service was
crippled
• Due to updates, external I/O was still mixed with the in-memory
processing.
New design requirements
• Processing rate needs to be stable and predictable, including P95 and P99
• Immune to infrastructure glitches
• Decoupled from other technologies/ds
• Highly available: nodes can go down and come up without impact
• CPU-bound, scalable services that are efficient in both memory and CPU
• Observable
• Opinionated, to make sure others follow the same pattern
Why Kafka Streams
• Opinionated
• Easy to get started
• Overall easy to maintain
• JVM process
• Relies only on Kafka
• Easy to monitor (Kafka metrics, JVM)
• GlobalKTable/KTable provides an embedded, push-updated cache
• RocksDB batteries included
• Fits perfectly with Kubernetes StatefulSets (k8s STS)
The 3 S’s
Source
Stream
Sink
Source
A source is any process that receives data from an external system and uses
the producer API to produce data to Kafka: a REST server, RDS, Elasticsearch,
S3, etc.
Kafka Connect implements many of these sources and is an easy way to start
syncing data into Kafka.
Stream
Consume data from Kafka, transform the data, and produce the data back to
Kafka. A stream is built from many transformations (a minimal sketch follows
the lists below).
Stateless transformations
• N -> N: map, mapValues
• N -> M: flatMap
• N -> 0: peek
Stateful transformations
• Reduce
• Aggregate
• Count
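A minimal Kafka Streams sketch of that shape; the topic names and the parsing logic are made up for illustration:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ParserStream {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> raw = builder.stream("raw-logs");            // consume from Kafka

    raw.peek((key, value) -> System.out.println(value))                  // N -> 0: side effect only
       .mapValues(value -> value.trim())                                 // N -> N: transform each value
       .flatMapValues(value -> Arrays.asList(value.split("\n")))         // N -> M: one record per line
       .to("parsed-logs", Produced.with(Serdes.String(), Serdes.String())); // produce back to Kafka

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "parser-stream");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    new KafkaStreams(builder.build(), props).start();
  }
}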
Stream read/update of external state
Get data from external state
• Use a GlobalKTable to read a source topic and build a RocksDB-based store
inside the stream
• Use a join to enrich the main data with that state (sketched below)
Insert/update an external source
• Use a dedicated topic with a dedicated sync microservice for inserts/updates
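A hedged sketch of the GlobalKTable + join pattern; the topic names, store name and enrichment logic are illustrative assumptions:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();

// Push-updated, RocksDB-backed store built from a compacted metadata topic
GlobalKTable<String, String> metadata = builder.globalTable(
    "metadata-compact",
    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("metadata-store")
        .withKeySerde(Serdes.String())
        .withValueSerde(Serdes.String()));

KStream<String, String> logs = builder.stream("raw-logs");

// Enrich each record with the latest state for its key, with no external I/O involved
KStream<String, String> enriched = logs.join(
    metadata,
    (logKey, logValue) -> logKey,                  // map record -> store key
    (logValue, meta) -> logValue + " | " + meta);  // merge record and state
enriched.to("enriched-logs");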
Sink
Consume data from Kafka using the consumer API and write the data to an
external source: RDS, Elasticsearch, S3, etc.
Akka Streams (Alpakka) has a great back-pressure mechanism that is easy to
implement.
Kafka Connect implements different sinks and is an easy way to start flushing
data from Kafka to an external source.
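A bare-bones sink sketch using the consumer API; the topic name and writeToExternalStore are hypothetical placeholders for the real bulk write (e.g. to Elasticsearch or S3):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "es-sink");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
  consumer.subscribe(List.of("enriched-logs"));
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
      writeToExternalStore(record.value()); // hypothetical bulk write to the external source
    }
    consumer.commitSync(); // commit offsets only after the external write succeeded
  }
}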
Full architecture
(Diagram: Kafka Connect source, ingest stream, stream app with a GlobalKTable
store, retention and compacted topics, sink)
What did we accomplish
• Removed external I/O due to querying data from external sources
• Removed external I/O due to writing data to external sources
• Our services are now CPU bound, with Kafka as their only I/O action
• Increased resiliency against external source glitches. Whenever an
external source is experiencing degraded performance, the stream
will continue to work with the latest data it received
• Reduced the required resources for our services by 80%!
GlobalKTable, KTable and interactive queries
A GlobalKTable reads the entire topic on each of the workers and can serve as
a push-updated cache for your stream app via a join.
A KTable is similar, just sharded by key, so each worker receives only part of
the data.
To query the sharded data you can use interactive queries, which do the
bookkeeping of knowing which worker holds each key. Combine that with RocksDB
and you get a custom, sharded, scalable, DB-like experience. Cool!
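For example, querying a store by name; the store name and key are assumptions, and streams is a running KafkaStreams instance:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

KafkaStreams streams = ...; // the running stream app

// Interactive query against the local shard of the store
ReadOnlyKeyValueStore<String, String> store = streams.store(
    StoreQueryParameters.fromNameAndType("metadata-store", QueryableStoreTypes.keyValueStore()));
String value = store.get("tenant-42");

// For keys owned by another worker, Streams can tell you which instance to ask
// (see KafkaStreams#queryMetadataForKey) and you forward the request there yourself.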
Why use RocksDB
• Fast loading of large state
• Works great with k8s StatefulSets for short recovery times
• Able to keep large caches without consuming all the RAM
• Built-in Bloom filter
Tips and pitfalls
• Enable RocksDB compression
• Monitor disk lookups
• Use stateful transformations carefully (mind the keys)
• Change the defaults, they are meant for dev (see the sketch below):
• Repartition and changelog topics default to a replication factor of 1
• RocksDB config isn't optimized
• In-memory compaction: StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG
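A hedged sketch of what "change the defaults" can look like; the compression type, cache sizes and replication factor below are illustrative values, not recommendations from the talk:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;

public class TunedRocksDBConfig implements RocksDBConfigSetter {
  @Override
  public void setConfig(String storeName, Options options, Map<String, Object> configs) {
    options.setCompressionType(CompressionType.LZ4_COMPRESSION); // enable RocksDB compression
    BlockBasedTableConfig table = new BlockBasedTableConfig();
    table.setBlockCacheSize(64 * 1024 * 1024L); // illustrative block cache size
    options.setTableFormatConfig(table);
  }
}

Properties props = new Properties();
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, TunedRocksDBConfig.class);
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);                         // repartition/changelog topics default to 1
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);  // in-memory compaction buffer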
Kafka streams decoupling with stores
Appendix 1 – Selected stateless transformations from Kafka site
Branch
KStream → KStream[]
Branch (or split) a KStream based on the supplied predicates into one or
more KStream instances. (details)
Predicates are evaluated in order. A record is placed into one and only one
output stream on the first match: if the n-th predicate evaluates to true, the
record is placed into the n-th stream. If no predicate matches, the record is
dropped.
Branching is useful, for example, to route records to different downstream topics.
Filter
KStream → KStream
KTable → KTable
Evaluates a boolean function for each element and retains those for which the
function returns true. (KStream details, KTable details)
Map
KStream → KStream
Takes one record and produces one record. You can modify the record key and
value, including their types. (details)
Marks the stream for data re-partitioning: Applying a grouping or a join
after map will result in re-partitioning of the records. If possible
use mapValues instead, which will not cause data re-partitioning.
Appendix 2 – Selected stateful transformations
Aggregate
KGroupedStream
→ KTable
KGroupedTable →
KTable
Rolling aggregation. Aggregates the values of (non-
windowed) records by the grouped key. Aggregating is a
generalization of reduce and allows, for example, the
aggregate value to have a different type than the input values.
(KGroupedStream details, KGroupedTable details)
When aggregating a grouped stream, you must provide an
initializer (e.g., aggValue = 0) and an “adder” aggregator
(e.g., aggValue + curValue). When aggregating a grouped
table, you must provide a “subtractor” aggregator
(think: aggValue - oldValue).
Several variants of aggregate exist, see Javadocs for details.
KGroupedStream<byte[], String> groupedStream = ...;
KGroupedTable<byte[], String> groupedTable = ...;
// Java 8+ examples, using lambda expressions
// Aggregating a KGroupedStream (note how the value type changes from String to Long)
KTable<byte[], Long> aggregatedStream = groupedStream.aggregate(
() -> 0L, /* initializer */
(aggKey, newValue, aggValue) -> aggValue + newValue.length(), /* adder */
Materialized.as("aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.Long())); /* serde for aggregate value */
// Aggregating a KGroupedTable (note how the value type changes from String to Long)
KTable<byte[], Long> aggregatedTable = groupedTable.aggregate(
() -> 0L, /* initializer */
(aggKey, newValue, aggValue) -> aggValue + newValue.length(), /* adder */
(aggKey, oldValue, aggValue) -> aggValue - oldValue.length(), /* subtractor */
Materialized.as("aggregated-table-store") /* state store name */
.withValueSerde(Serdes.Long())); /* serde for aggregate value */
Aggregate
(windowed)
KGroupedStream
→ KTable
Windowed aggregation. Aggregates the values of
records, per window, by the grouped key. Aggregating is a
generalization of reduce and allows, for example, the
aggregate value to have a different type than the input values.
(TimeWindowedKStream details, SessionWindowedKStream
details)
You must provide an initializer (e.g., aggValue = 0), “adder”
aggregator (e.g., aggValue + curValue), and a window. When
windowing based on sessions, you must additionally provide
a “session merger” aggregator
(e.g., mergedAggValue = leftAggValue + rightAggValue).
The windowed aggregate turns a TimeWindowedKStream<K, V> or
SessionWindowedKStream<K, V> into a windowed KTable<Windowed<K>, V>.
Several variants of aggregate exist, see Javadocs for details.
import java.util.concurrent.TimeUnit;
KGroupedStream<String, Long> groupedStream = ...;
// Java 8+ examples, using lambda expressions
// Aggregating with time-based windowing (here: with 5-minute tumbling windows)
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream.windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(5)))
.aggregate(
() -> 0L, /* initializer */
(aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.Long())); /* serde for aggregate value */
// Aggregating with session-based windowing (here: with an inactivity gap of 5 minutes)
KTable<Windowed<String>, Long> sessionizedAggregatedStream =
groupedStream.windowedBy(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)))
.aggregate(
() -> 0L, /* initializer */
(aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
(aggKey, leftAggValue, rightAggValue) -> leftAggValue + rightAggValue, /* session merger */
Materialized.<String, Long, SessionStore<Bytes, byte[]>>as("sessionized-aggregated-stream-store") /* state store name */
.withValueSerde(Serdes.Long())); /* serde for aggregate value */