How a BEAM runner executes a pipeline
Javier Ramirez (@supercoco9)
Head of Engineering @teamdatatonic
▪ Why do I care?
▪ Pipeline basics
▪ Runner Overview
▪ Exploring the graph
▪ Implementing PCollections
▪ Watermark Propagation
▪ Implementing PTransforms: Read, ParDo, GroupByKey,
Window, and Flatten
▪ Optimising the execution Plan
▪ Persisting state (coders and snapshots)
▪ FnApi and RunnerApi for SDK independence
Why do I care?
▪ I started using Dataflow during its private alpha, when it was “just” a serverless runner
▪ Then Beam was born as a common layer on top of multiple runners
▪ I wanted to understand what is part of Beam and what is part of the runner
▪ Knowing the difference might help when choosing the right runner for the job
Pipeline overview
■ Write pipeline code in Java, Python, or Go
■ The abstraction is a Directed Acyclic Graph (DAG) where nodes are transforms and edges are data flowing as PCollections
■ Both PTransforms and PCollections can be distributed and parallelised, and the model is fault-tolerant, so
they need to be serializable to be sent across workers
■ Read data from one or more inputs, bounded or unbounded
■ Apply transforms, stateless or stateful
■ Write data to one or more outputs
■ Optionally, keep track of metrics

Runner overview
BEAM-compatible is a very flexible claim
▪ Can choose to support only some languages (The portability API will change this)
▪ Can choose to support only batch or streaming processing
▪ Can choose to what extent to support early triggers and late data, refinements, state…
▪ Needs to translate from BEAM code to runner-native code
▪ Is responsible for submitting and monitoring the pipeline
▪ Must serialize/deserialize data and functions across workers and stages
▪ Is responsible for performance, scalability, optimisations, and enforcing the BEAM
model guarantees (some methods will be called exactly once; a transform will not be
executed by more than one thread at a time within a worker; if a bundle of data is
processed by a transform more than once, it will not generate duplicates…)
Runner entrypoint: exploring the DAG
■ Beam provides a method to traverse (visit) the graph. Runners need to walk the graph to:
■ Validate the pipeline
■ Get insights to choose the best execution strategy
❏ Example: Spark Runner
❏ Chooses between the batch and the streaming engine by visiting the graph and checking whether any
PCollection is unbounded
❏ Detects PCollections that are used in more than one transform, and creates internal caches to store
those collections
■ Translate the BEAM transforms into native transforms
■ Optimise the graph execution (to minimize serialization and shuffling)
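The two Spark Runner decisions above can be sketched in a few lines. This is a minimal illustration, not the Spark Runner's code: `PCollectionNode`, `TransformNode` and `plan` are hypothetical names, and the graph is assumed to be a plain list of transforms rather than Beam's real pipeline object.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PCollectionNode:
    """Hypothetical stand-in for a PCollection in the graph."""
    name: str
    bounded: bool


@dataclass
class TransformNode:
    """Hypothetical stand-in for an applied transform."""
    name: str
    inputs: list = field(default_factory=list)   # PCollectionNode objects consumed
    outputs: list = field(default_factory=list)  # PCollectionNode objects produced


def plan(transforms):
    """Visit every transform once and collect what the runner needs to decide."""
    consumers = Counter()
    streaming = False
    for t in transforms:
        for pc in t.inputs:
            consumers[pc.name] += 1
        for pc in t.inputs + t.outputs:
            if not pc.bounded:
                streaming = True  # a single unbounded PCollection is enough
    # PCollections read by more than one transform are candidates for caching
    to_cache = sorted(name for name, n in consumers.items() if n > 1)
    return {"engine": "streaming" if streaming else "batch", "cache": to_cache}


lines = PCollectionNode("lines", bounded=True)
words = PCollectionNode("words", bounded=True)
counts = PCollectionNode("counts", bounded=True)
lengths = PCollectionNode("lengths", bounded=True)

pipeline = [
    TransformNode("Read", outputs=[lines]),
    TransformNode("Split", inputs=[lines], outputs=[words]),
    TransformNode("Count", inputs=[words], outputs=[counts]),
    TransformNode("Length", inputs=[words], outputs=[lengths]),
]

print(plan(pipeline))  # {'engine': 'batch', 'cache': ['words']}
```

Here `words` feeds both `Count` and `Length`, so it is the only PCollection flagged for caching.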
Implementing PCollections
■ Unordered bags of elements
■ Might be bounded or unbounded
■ All the elements are of the same type and the PCollection has a coder to serialize/deserialize them
■ Every element will always have:
■ A Timestamp (might be negative infinity if not important)
■ A Window, which is initially the global window, but can be changed via transforms
■ Every PCollection has a watermark estimating how complete it is
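As a rough sketch of what a runner has to carry around for each element, the snippet below pairs a value with its timestamp and window, and round-trips the value through a coder. `WindowedValue` here is a simplified stand-in written for this example, and `Utf8Coder`, `MIN_TIMESTAMP` and `GLOBAL_WINDOW` are hypothetical names, not Beam classes.

```python
import struct
from dataclasses import dataclass

MIN_TIMESTAMP = float("-inf")  # stands in for "negative infinity"
GLOBAL_WINDOW = "global"       # placeholder label for the global window


@dataclass(frozen=True)
class WindowedValue:
    """Every element carries a timestamp and a window alongside its value."""
    value: object
    timestamp: float = MIN_TIMESTAMP
    window: object = GLOBAL_WINDOW


class Utf8Coder:
    """Hypothetical coder: a 4-byte length prefix followed by UTF-8 bytes."""

    def encode(self, value: str) -> bytes:
        data = value.encode("utf-8")
        return struct.pack(">I", len(data)) + data

    def decode(self, blob: bytes) -> str:
        (size,) = struct.unpack(">I", blob[:4])
        return blob[4:4 + size].decode("utf-8")


coder = Utf8Coder()
element = WindowedValue("hello")

# What crosses the wire between workers is bytes, not Python objects
wire = coder.encode(element.value)
restored = WindowedValue(coder.decode(wire), element.timestamp, element.window)
print(restored == element)  # True
```

A real runner would also encode the timestamp and the window; they are left out of the wire format here to keep the sketch short.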
Watermark Propagation
Watermark propagation taken from the Flink documentation
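The core rule in that picture is that a step cannot be further ahead than its slowest input, so its watermark is the minimum of its upstream watermarks. The sketch below applies only that rule; `propagate` and the stage names are hypothetical, and real runners also hold the watermark back for buffered elements and pending timers.

```python
def propagate(stages, source_watermarks):
    """Compute a watermark per stage.

    stages: list of (name, [upstream names]) in topological order.
    source_watermarks: dict of source name -> watermark.
    """
    watermarks = dict(source_watermarks)
    for name, upstream in stages:
        # A stage is only as complete as its least complete input
        watermarks[name] = min(watermarks[u] for u in upstream)
    return watermarks


stages = [
    ("map_a", ["source_a"]),
    ("flatten", ["map_a", "source_b"]),
    ("group", ["flatten"]),
]

result = propagate(stages, {"source_a": 10, "source_b": 7})
print(result["flatten"], result["group"])  # 7 7
```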

Implementing Window
■ Window is just a grouping key with a maximum timestamp
■ One element can be conceptually in one window only. If you need to assign an element to
multiple windows, it counts as multiple elements from Beam’s point of view.
■ The runner may choose to use a physical representation where one element appears to be
assigned to multiple windows for storage efficiency, but it maps conceptually to multiple elements
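A small sketch of the "one element per window" idea, using sliding windows as the case where a single input lands in several windows. `IntervalWindow`, `sliding_windows` and `explode` are written for this example, and timestamps are assumed to be plain integers in arbitrary time units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalWindow:
    start: int
    end: int  # exclusive

    def max_timestamp(self):
        # Last instant that still belongs to the window (integer time units)
        return self.end - 1


def sliding_windows(timestamp, size, period):
    """All windows of length `size`, starting every `period`, that contain timestamp."""
    last_start = timestamp - (timestamp % period)
    starts = range(last_start, timestamp - size, -period)
    return [IntervalWindow(s, s + size) for s in starts]


def explode(value, timestamp, size, period):
    """Conceptual view: one (value, timestamp, window) element per window."""
    return [(value, timestamp, w) for w in sliding_windows(timestamp, size, period)]


for element in explode("click", timestamp=7, size=10, period=5):
    print(element)
# ('click', 7, IntervalWindow(start=5, end=15))
# ('click', 7, IntervalWindow(start=0, end=10))
```

A runner is free to store `("click", 7, [both windows])` once, as long as downstream transforms behave as if they had received the two elements printed above.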
Implementing GroupByKey
■ GroupByKey groups a PCollection of key-value pairs by Key and Window
■ GroupByKey will emit results only when window triggers allow it, and should automatically drop
expired elements
■ Since GroupByKey is closely related to Windows, it needs to be able to merge elements by
window when requested, for example to maintain session windows
■ GroupByKey needs to choose the timestamp to emit with the results
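The behaviour above can be sketched under some loud assumptions: fixed windows only (no merging, so no session windows), results fire once when the watermark passes the end of the window, zero allowed lateness, and the output timestamp is the window's maximum timestamp, which is only one of the possible choices. `GroupByKeyAndWindow` is a hypothetical class, not a runner's implementation.

```python
from collections import defaultdict


class GroupByKeyAndWindow:
    """Buffers values per (key, window) and emits when the watermark allows it."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.buffers = defaultdict(list)  # (key, window_start) -> values
        self.watermark = float("-inf")

    def add(self, key, value, timestamp):
        start = timestamp - (timestamp % self.window_size)
        if start + self.window_size <= self.watermark:
            return False  # the window has expired: drop the element
        self.buffers[(key, start)].append(value)
        return True

    def advance_watermark(self, watermark):
        self.watermark = watermark
        ready = sorted(k for k in self.buffers
                       if k[1] + self.window_size <= watermark)
        results = []
        for key, start in ready:
            values = self.buffers.pop((key, start))
            # Assumed choice: emit at the window's maximum timestamp
            results.append((key, values, start + self.window_size - 1))
        return results


gbk = GroupByKeyAndWindow(window_size=10)
gbk.add("a", 1, timestamp=3)
gbk.add("b", 2, timestamp=4)
gbk.add("a", 3, timestamp=8)
gbk.add("a", 4, timestamp=12)

first = gbk.advance_watermark(10)
print(first)    # [('a', [1, 3], 9), ('b', [2], 9)]

late = gbk.add("a", 5, timestamp=6)
print(late)     # False: window [0, 10) has already expired

second = gbk.advance_watermark(20)
print(second)   # [('a', [4], 19)]
```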
Implementing ParDo
■ Conceptually simple:
■ Setup is called once per instance of the ParDo
■ The runner decides on the bundle size (some runners allow user control)
■ It calls startBundle once per bundle
■ It calls processElement once per element
■ If we are using timely processing, it calls onTimer for each timer activation
■ It finishes by calling finishBundle
■ If an element fails, the whole bundle is retried
■ Teardown is called to release ParDo resources
■ Under the hood, the runner needs to take into account that ParDos can be stateful and can have side
inputs. In those cases the runner is responsible for keeping and propagating state, and for
materialising the side inputs
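The lifecycle above, seen from the runner's side, could look roughly like this. `LoggingDoFn` and `run_pardo` are hypothetical; the method names follow the Python SDK spelling (`start_bundle`, `process`, `finish_bundle`) of the Java names on the slide, and timers, state and side inputs are left out. As a simplification, the same instance is reused when a bundle is retried.

```python
class LoggingDoFn:
    """Hypothetical DoFn-like class that records every lifecycle call."""

    def __init__(self):
        self.calls = []

    def setup(self):
        self.calls.append("setup")

    def start_bundle(self):
        self.calls.append("start_bundle")

    def process(self, element):
        self.calls.append(f"process({element})")
        yield element * 2

    def finish_bundle(self):
        self.calls.append("finish_bundle")

    def teardown(self):
        self.calls.append("teardown")


def run_pardo(fn, elements, bundle_size, max_attempts=3):
    """Drive a DoFn over `elements`, retrying a whole bundle on failure."""
    fn.setup()
    try:
        output = []
        for i in range(0, len(elements), bundle_size):
            bundle = elements[i:i + bundle_size]
            for attempt in range(max_attempts):
                pending = []  # outputs of a failed attempt are discarded
                try:
                    fn.start_bundle()
                    for element in bundle:
                        pending.extend(fn.process(element))
                    fn.finish_bundle()
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    continue
                output.extend(pending)  # commit only once the bundle succeeded
                break
        return output
    finally:
        fn.teardown()


fn = LoggingDoFn()
doubled = run_pardo(fn, [1, 2, 3], bundle_size=2)
print(doubled)   # [2, 4, 6]
print(fn.calls)
```

Committing `pending` only after `finish_bundle` returns is what keeps a retried bundle from producing duplicates in this sketch.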
Optimising the DAG execution
■ Two levels of optimisation
■ Execution plan (Supported by BEAM)
■ Intermediate data materialisation (Depends on the Runner)
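One way to picture the execution-plan level is fusing consecutive element-wise steps so the intermediate collections never have to be serialized. The slide does not name a specific optimisation, so fusion is an assumption made for illustration, and `run_unfused`, `fuse` and `run_fused` are hypothetical helpers; `pickle` stands in for whatever coder a runner would use.

```python
import pickle


def run_unfused(elements, fns):
    """Materialise (serialize and deserialize) the collection after every step."""
    roundtrips = 0
    for fn in fns:
        elements = [fn(e) for e in elements]
        elements = pickle.loads(pickle.dumps(elements))
        roundtrips += 1
    return elements, roundtrips


def fuse(fns):
    """Collapse a chain of element-wise functions into a single function."""
    def fused(element):
        for fn in fns:
            element = fn(element)
        return element
    return fused


def run_fused(elements, fns):
    # The fused chain is one step, so there is only one materialisation
    return run_unfused(elements, [fuse(fns)])


steps = [str.strip, str.upper, len]
data = [" a ", "bb "]

print(run_unfused(data, steps))  # ([1, 2], 3)
print(run_fused(data, steps))    # ([1, 2], 1)
```

Both plans give the same result; the fused one pays for a single serialization round trip instead of three.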

SDK independent runners: RunnerAPI & FnAPI
■ The harness is a Docker container able to run the language-specific parts of the pipeline. The
Runner is responsible for launching and managing the container. Communication between the
Runner and the harness goes through the FnApi, implemented over gRPC
SDK independent runners: FnAPI

