Kafka Streams: decoupling with stores
- 2. About Me
Husband, father of 2 +
Co-founder and CTO of Coralogix
Love distributed systems
Kafka isn’t only a platform; it is a
design philosophy
Love observability
- 3. Our data pipeline stats
1GB±
1000 Megabytes of log
data ingested every
second
500K+
500,000 events per
second (EPS)
50TB
Real time
Near-real-time latency
of less than a few
seconds
- 4. Kafka Background
• Brokers form a cluster
• Topics are split into partitions
• Partitions are replicated and spread across
brokers
• Producers push messages to a topic
• Consumers pull messages from a topic
• A consumer group tracks the committed offset
of its consumers in each partition
• Retention is bounded by size or age
• Keys decide how data is spread across
partitions
• Compaction keeps only the latest
message for each key
[Diagram: Broker A, Broker B, Broker C]
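The compaction bullet can be made concrete with a small model. This is only a sketch of the end state: real compaction runs in the background on the broker and works on log segments, and `CompactionSketch` and `compact` are hypothetical names used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of log compaction: keep only the latest value written for each key. */
class CompactionSketch {

    /**
     * Returns the compacted view of a log given as (key, value) entries in offset order.
     * The result is ordered by the position of each key's latest write.
     */
    static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Map.Entry<String, String> message : log) {
            latest.remove(message.getKey());                   // forget the older write, if any
            latest.put(message.getKey(), message.getValue());  // keep the newest one
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> log = List.of(
            Map.entry("user-1", "v1"),
            Map.entry("user-2", "v1"),
            Map.entry("user-1", "v2"));
        System.out.println(compact(log));
    }
}
```

A consumer that reads a compacted topic from the beginning therefore ends up with the current value per key, which is what makes such topics usable as a source for stores.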
- 5. A note on keys
By default, messages written to Kafka without a key are spread across all
partitions by the producer (round-robin or sticky batching, depending on the
client version).
When a key is specified, all messages with that key are sent to the same
partition. This preserves the order of messages that share an id and enables
stateful transformations, compaction and building stores in streams.
Choose your keys wisely: a skewed key causes a hot partition, which in turn
causes a hot broker that could cripple your cluster.
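The effect of a key on placement can be sketched in a few lines. This is a minimal illustration, not Kafka's code: the default partitioner hashes the serialized key with murmur2, while `Arrays.hashCode` stands in for it here, and `partitionFor` and `load` are hypothetical helper names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch of key-based partitioning and of the skew a dominant key produces. */
class KeyPartitioningSketch {

    /** Maps a key to a partition. Assumption: Arrays.hashCode replaces Kafka's murmur2 hash. */
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions); // floorMod keeps the result non-negative
    }

    /** Counts how many of the given message keys land on each partition. */
    static int[] load(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // One tenant producing all the traffic: every message hits the same partition.
        List<String> skewed = Collections.nCopies(1000, "tenant-42");
        System.out.println(Arrays.toString(load(skewed, 6)));
    }
}
```

Because the mapping is a pure function of the key, adding partitions does nothing for a single dominant key; the fix is a finer-grained key.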
- 6. Old solution - Streamonolith
• Consume messages from Kafka, execute workload and produce back to Kafka
• Running in containers
• Communicate with external sources for pulling and updating state/metadata
[Diagram: service logic, including pulling and manipulating state;
RestServer/Logstash; Retention; Memory Store]
- 7. Healthy scalable real-time streams
“As we see it, a scalable healthy stream is bound to the machine’s
CPU. When traffic increases, the CPU usage should increase linearly”
Yoni Farin, Coralogix
In our case, we were forced to scale out our services even though CPU was
not the bottleneck. At some point, one of our services had to run
on 240 containers (!). We then decided that we were due for a redesign.
- 9. First try - naive approach
On-demand cache
A cache populated on demand, with a configurable expiration time.
For each message, the mechanism checked whether the required data was
cached and fetched it from the external source when it was not. It did
improve read performance, but some issues persisted.
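A minimal sketch of such an on-demand cache, assuming a single-threaded caller, makes the weak spots visible. The class name, the `loader` function that stands in for the external call, and the injectable `clock` are all hypothetical and not taken from the original service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** On-demand cache with expiration: entries are loaded on first use and reloaded once stale. */
class OnDemandCache<K, V> {

    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;
        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // stand-in for the call to the external source
    private final long ttlMillis;          // configurable expiration time
    private final LongSupplier clock;      // injectable so that expiry can be tested
    private int loads = 0;

    OnDemandCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, calling the loader when the entry is missing or expired. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            // The external call sits on the processing path: this is the slow, fragile part.
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
            loads++;
        }
        return entry.value;
    }

    /** Number of times the external source was called. */
    int loads() {
        return loads;
    }
}
```

Every miss or expiry blocks message processing on the external source, and between reloads the cached value can be up to one expiration period old.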
- 10. Streamonolith issues
• By design, data lags the source of truth by up to X minutes (the cache
expiration time)
• Hard hit after a restart until the cache warms up
• Bad p95 and p99 performance, which also dragged down the average
processing time
• Whenever the external source malfunctioned, the service was
crippled
• Updates still required external I/O, so it remained mixed with the
in-memory processing
- 11. New design requirements
• Processing rate needs to be stable and predictable, including P95 and
P99
• Immune to infrastructure glitches
• Decoupled from other technologies/ds
• Highly available: nodes can go down and come back up without impact
• CPU-bound, scalable services that are efficient in both memory and CPU
• Observable
• Opinionated, so that others follow the same pattern
- 12. Why Kafka Streams
• Opinionated
• Easy to get started
• Overall easy to maintain
• JVM process
• Relies only on Kafka.
• Easy to monitor (Kafka metrics, JVM)
• GlobalKTable/KTable provide an embedded, push-updated cache
• RocksDB: batteries included
• Fits k8s StatefulSets (STS) perfectly
- 14. Source
A source is any process that receives data from
an external source and uses the producer API
to produce data to Kafka:
Rest Server, RDS, Elasticsearch, S3 etc.
Kafka Connect implements many of these
sources and is an easy way to start syncing
data to Kafka
KafkaConnect
- 15. Stream
Consume data from Kafka
Transform the data
Produce the data to Kafka
A stream has many transformations
Stateless transformation
• N -> N map, mapValues
• N -> M flatMap
• N -> 0 foreach (peek is the pass-through variant, N -> N)
Stateful transformation
• Reduce
• Aggregate
• Count
Stream App
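The shapes listed above can be illustrated without a cluster. The sketch below uses `java.util.stream` on a finite list purely as an analogy: it is not the Kafka Streams API, which processes unbounded topics and keeps the state for `count` in a store, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Analogy for stateless (N -> N, N -> M) and stateful (count) transformations. */
class TransformationShapes {

    /** N -> N: exactly one output per input, comparable to mapValues. */
    static List<String> upperCase(List<String> lines) {
        return lines.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    /** N -> M: each input yields zero or more outputs, comparable to flatMap. */
    static List<String> words(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toList());
    }

    /** Stateful: counting per key needs memory of earlier records, comparable to count. */
    static Map<String, Long> countByWord(List<String> lines) {
        return words(lines).stream()
            .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("error timeout", "error");
        System.out.println(upperCase(lines));
        System.out.println(words(lines));
        System.out.println(countByWord(lines));
    }
}
```

The stateless methods look at one record at a time, which is why they scale trivially; the count needs all records of a key in one place, which is where keys and repartitioning come in.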
- 16. Stream read/update of external state
Get data from external state
• Use GlobalKTable to read a
source topic and build a store in
the stream, backed by RocksDB
• Use a join to enrich the main data
with the state.
Insert/update an external source
• Use a dedicated topic with a
dedicated sync microservice for
insert/update
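The read and write paths described on this slide can be modelled with plain collections. This is a sketch of the data flow only: a `HashMap` plays the role of the store that GlobalKTable would build from the state topic, a list plays the role of the dedicated update topic, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Model of a push-updated local store, an enrichment join, and an outbound update topic. */
class EnrichmentSketch {

    // Local copy of the external state, kept current by change events.
    private final Map<String, String> store = new HashMap<>();
    // Stand-in for the dedicated topic that a separate sync service would consume.
    private final List<String> updatesTopic = new ArrayList<>();

    /** Applies a change event from the state topic; a null value deletes the key. */
    void onStateChange(String key, String value) {
        if (value == null) {
            store.remove(key);
        } else {
            store.put(key, value);
        }
    }

    /** Inner-join style enrichment: records without matching state produce no output. */
    Optional<String> enrich(String key, String payload) {
        String state = store.get(key);
        return state == null ? Optional.empty() : Optional.of(payload + " [" + state + "]");
    }

    /** Requests an update by publishing it, without touching the external system directly. */
    void requestUpdate(String key, String newValue) {
        updatesTopic.add(key + "=" + newValue);
    }

    List<String> pendingUpdates() {
        return new ArrayList<>(updatesTopic);
    }
}
```

Note that `requestUpdate` does not modify the local store: in this model the new value only becomes visible once the sync service has written it to the external source and the change has flowed back through the state topic.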
- 17. Sink
Consume data from Kafka using the
consumer API and write the data to an
external source: RDS, Elasticsearch,
S3 etc.
Akka Streams (Alpakka) has a great
back-pressure mechanism that is easy
to implement
Kafka Connect implements different
sinks and is an easy way to start
flushing data from Kafka to an
external source
KafkaConnect
- 19. What did we accomplish
• Removed external I/O due to querying data from external sources
• Removed external I/O due to writing data to external sources
• Our services are now CPU bound, with Kafka as their only I/O action
• Increased resiliency against external source glitches. Whenever an
external source is experiencing degraded performance, the stream
will continue to work with the latest data it received
• Reduced the required resources for our services by 80%!
- 20. GlobalKTable, KTable and interactive queries
GlobalKTable reads the entire topic in each of the workers and can
serve as a push-updated cache for your stream app using a join.
KTable is similar but sharded by key, so each worker receives
only a part of the data.
To query the sharded data, you can use interactive queries, which do
the bookkeeping of knowing on which worker each key is present.
Combine that with RocksDB and you get a custom, sharded, scalable
DB-like experience. Cool!
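The bookkeeping idea can be sketched as a router in front of per-worker stores. This is an in-process model under simplifying assumptions: Kafka Streams supplies the key-to-instance metadata but leaves the network hop between instances to the application, and `ShardedStoreSketch`, `ownerOf` and `query` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Model of a key-sharded store: each worker holds a slice, a router finds the owner. */
class ShardedStoreSketch {

    private final List<Map<String, String>> workers = new ArrayList<>();

    ShardedStoreSketch(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            workers.add(new HashMap<>()); // one local store per worker
        }
    }

    /** The bookkeeping step: which worker owns this key? */
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), workers.size());
    }

    void put(String key, String value) {
        workers.get(ownerOf(key)).put(key, value);
    }

    /** Looks the key up on its owner; in a real deployment this may be a remote call. */
    String query(String key) {
        return workers.get(ownerOf(key)).get(key);
    }

    /** Number of keys held by one worker, useful for spotting skew. */
    int keysOn(int worker) {
        return workers.get(worker).size();
    }
}
```

Each worker stores only its share of the keys, so total capacity grows with the number of workers, while any worker can still answer a query by forwarding it to the owner.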
- 21. Why use RocksDB
• Fast load of large state
• Works well with k8s STS for short recovery times
• Keeps large caches without consuming all the RAM
• Built-in Bloom filter
- 22. Tips and pitfalls
• Enable RocksDB compression
• Monitor disk lookups
• Use stateful transformations carefully (mind the keys)
• Change the defaults, they are for dev
  • Repartition and changelog topics default to a replication factor of 1
  • The RocksDB config isn’t optimized
• In-memory compaction
  • StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG
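As a starting point, the overrides named above could be collected like this. The sketch uses the string keys behind the `StreamsConfig` constants so that it compiles without the Kafka jars; the values are illustrative assumptions, not recommendations from the talk, and required settings such as `application.id` and `bootstrap.servers` are left out.

```java
import java.util.Properties;

/** Sketch of non-default Kafka Streams settings; the values are assumptions to be tuned. */
class StreamsDefaultsSketch {

    static Properties productionOverrides() {
        Properties props = new Properties();

        // StreamsConfig.REPLICATION_FACTOR_CONFIG: replication of the internal
        // repartition and changelog topics. Assumed value: 3.
        props.put("replication.factor", "3");

        // StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG: size of the record cache that
        // collapses successive updates to the same key before they are written downstream.
        // Assumed value: 64 MiB.
        props.put("cache.max.bytes.buffering", String.valueOf(64L * 1024 * 1024));

        // RocksDB tuning such as compression goes through a RocksDBConfigSetter class
        // named in "rocksdb.config.setter"; it is omitted from this sketch.
        return props;
    }

    public static void main(String[] args) {
        productionOverrides().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A larger record cache means fewer writes to the state store and to downstream topics, at the cost of more heap and of results being emitted less often.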
- 24. Appendix 1 – Selected stateless transformations from the Kafka site
Branch
KStream → KStream[]
Branch (or split) a KStream based on the supplied predicates into one or
more KStream instances.
Predicates are evaluated in order. A record is placed in one and only one output
stream, on the first match: if the n-th predicate evaluates to true, the record is
placed in the n-th stream. If no predicate matches, the record is dropped.
Branching is useful, for example, to route records to different downstream topics.
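The first-match rule is easy to get wrong when predicates overlap, so here is a small model of it on plain lists. It only mirrors the semantics described above and is not the Kafka Streams implementation; `BranchSketch.branch` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Model of branch semantics: ordered predicates, first match wins, no match drops. */
class BranchSketch {

    @SafeVarargs
    static <T> List<List<T>> branch(List<T> records, Predicate<T>... predicates) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            outputs.add(new ArrayList<>()); // one output per predicate, in the same order
        }
        for (T record : records) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(record)) {
                    outputs.get(i).add(record);
                    break; // first match only: later predicates never see this record
                }
            }
            // A record that matched no predicate is dropped.
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<List<Integer>> out = BranchSketch.<Integer>branch(
            List.of(1, 2, 3, 10, -1),
            x -> x > 5,   // checked first
            x -> x > 0);  // only sees what the first predicate rejected
        System.out.println(out);
    }
}
```

Ordering the predicates from most to least specific, with a catch-all last if nothing should be lost, follows directly from this rule.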
Filter
KStream → KStream
KTable → KTable
Evaluates a boolean function for each element and retains those for which the
function returns true.
Map
KStream → KStream
Takes one record and produces one record. You can modify the record key and
value, including their types.
Marks the stream for data re-partitioning: Applying a grouping or a join
after map will result in re-partitioning of the records. If possible
use mapValues instead, which will not cause data re-partitioning.
- 25. Appendix 2 – Selected stateful transformations
Aggregate
KGroupedStream → KTable
KGroupedTable → KTable
Rolling aggregation. Aggregates the values of (non-
windowed) records by the grouped key. Aggregating is a
generalization of reduce and allows, for example, the
aggregate value to have a different type than the input values.
When aggregating a grouped stream, you must provide an
initializer (e.g., aggValue = 0) and an “adder” aggregator
(e.g., aggValue + curValue). When aggregating a grouped
table, you must provide a “subtractor” aggregator
(think: aggValue - oldValue).
Several variants of aggregate exist, see Javadocs for details.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.KeyValueStore;

KGroupedStream<byte[], String> groupedStream = ...;
KGroupedTable<byte[], String> groupedTable = ...;

// Java 8+ examples, using lambda expressions
// Aggregating a KGroupedStream (note how the value type changes from String to Long)
KTable<byte[], Long> aggregatedStream = groupedStream.aggregate(
    () -> 0L, /* initializer */
    (aggKey, newValue, aggValue) -> aggValue + newValue.length(), /* adder */
    Materialized.<byte[], Long, KeyValueStore<Bytes, byte[]>>as("aggregated-stream-store") /* state store name */
        .withValueSerde(Serdes.Long())); /* serde for aggregate value */

// Aggregating a KGroupedTable (note how the value type changes from String to Long)
KTable<byte[], Long> aggregatedTable = groupedTable.aggregate(
    () -> 0L, /* initializer */
    (aggKey, newValue, aggValue) -> aggValue + newValue.length(), /* adder */
    (aggKey, oldValue, aggValue) -> aggValue - oldValue.length(), /* subtractor */
    Materialized.<byte[], Long, KeyValueStore<Bytes, byte[]>>as("aggregated-table-store") /* state store name */
        .withValueSerde(Serdes.Long())); /* serde for aggregate value */
Aggregate (windowed)
KGroupedStream → KTable
Windowed aggregation. Aggregates the values of
records, per window, by the grouped key. Aggregating is a
generalization of reduce and allows, for example, the
aggregate value to have a different type than the input values.
You must provide an initializer (e.g., aggValue = 0), “adder”
aggregator (e.g., aggValue + curValue), and a window. When
windowing based on sessions, you must additionally provide
a “session merger” aggregator
(e.g., mergedAggValue = leftAggValue + rightAggValue).
The windowed aggregate turns
a TimeWindowedKStream<K, V> or SessionWindowedKStream<K, V>
into a windowed KTable<Windowed<K>, V>.
Several variants of aggregate exist, see Javadocs for details.
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.SessionStore;
import org.apache.kafka.streams.state.WindowStore;

KGroupedStream<String, Long> groupedStream = ...;

// Java 8+ examples, using lambda expressions
// Note: the millisecond overloads used below come from older Kafka releases;
// newer releases take a java.time.Duration instead.

// Aggregating with time-based windowing (here: with 5-minute tumbling windows)
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream
    .windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(5)))
    .aggregate(
        () -> 0L, /* initializer */
        (aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
        Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
            .withValueSerde(Serdes.Long())); /* serde for aggregate value */

// Aggregating with session-based windowing (here: with an inactivity gap of 5 minutes)
KTable<Windowed<String>, Long> sessionizedAggregatedStream = groupedStream
    .windowedBy(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)))
    .aggregate(
        () -> 0L, /* initializer */
        (aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
        (aggKey, leftAggValue, rightAggValue) -> leftAggValue + rightAggValue, /* session merger */
        Materialized.<String, Long, SessionStore<Bytes, byte[]>>as("sessionized-aggregated-stream-store") /* state store name */
            .withValueSerde(Serdes.Long())); /* serde for aggregate value */