SlideShare a Scribd company logo
PARALLEL & ASYNC
PROCESSING USING TPL
DATAFLOW
Petru Rebeja
Parallel & async processing using tpl dataflow
AGENDA
• What is Dataflow?
• When to use it?
• How to use it?
• Q&A
THE BIG PICTURE
CLR Thread Pool
Tasks
PLINQ Parallel Loops
Concurrent Collections
Dataflow
DATAFLOW BENEFITS
• Effortless use of multi-threading
• Performance boost via painless optimization
• Development focus is on the ‘what’ rather than ‘how’
DATAFLOW USAGES
High throughput, low-latency scenarios
Robotics
Manufacturing
Imaging Biology
Oil & Gas
Finance
PROGRAMMING MODEL
• Actor-based programming
• In-process message passing
• Components (blocks) for creating data processing pipelines
ARCHITECTURE
IDataflowBlock
ISourceBlock<TOutput> ITargetBlock<TInput>
IPropagatorBlock<Tinput,Toutput>
COMPOSITION
Source
Target
Propagator
Optional
Transform
BUFFERING BLOCKS
BufferBlock<T>
BroadcastBlock<T>
WriteOnceBlock<T>
EXECUTION BLOCKS
ActionBlock<T>
TransformBlock<T,V>
TransformManyBlock<T,V>
GROUPING BLOCKS
BatchBlock<T>
JoinBlock<T1,T2,…>
BatchedJoinBlock<T1,T2>
BEHAVIOR CONFIGURATION OPTIONS
• BufferBlock<T>
• BroadcastBlock<T>
• WriteOnceBlock<T>
DataflowBlockOptions
• ActionBlock<T>
• TransformBlock<TIn, TOut>
• TransformManyBlock<TIn, TOut>
ExecutionDataflowBlockOptions
• BatchBlock<T>
• JoinBlock<T1, T2[, T3]>
• BatchedJoinBlock<T1, T2>
GroupingDataflowBlockOptions
COMPLETION & CANCELLATION
• To know when a block completes await block.Completion
or add a continuation task to it
• To propagate completion from source to target, set
DataflowLinkOptions.PropagateCompletion when
linking
• Set DataflowBlockOptions.CancellationToken to
enable cancellation
ERROR HANDLING
• If the exception does not affect the integrity of the
pipeline – use a try/catch inside the block
• Otherwise, handle errors outside of the pipeline by
• Adding a continuation to block.Completion
• Propagating errors through the pipeline
DEALING WITH CONCURRENCY
• Rule of thumb: avoid shared state whenever possible.
• Use ConcurrentExclusiveSchedulerPair to perform
updates on shared state
• Be aware of the caveats with
ConcurrentExclusiveSchedulerPair
CREATING CUSTOM BLOCKS
The easy way:
DataflowBlock.Encapsulate<TInput, TOutput>(
target, source)
CREATING CUSTOM BLOCKS
The hard(core) way:
class CustomBlock:
IPropagatorBlock<TInput, TOutput>
{
}
CREATING CUSTOM BLOCKS
Either way you choose, don’t forget to:
• Propagate completion
• Pool for cancellation
REFERENCES & FURTHER READING
Dataflow (Task Parallel Library) http://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx
Stephen Toub
TPL Dataflow Tour
http://channel9.msdn.com/posts/TPL-Dataflow-Tour
Joseph Albahari
The Future of .NET Parallel
Programming
http://channel9.msdn.com/events/TechEd/Australia/Tech-Ed-Australia-
2011/DEV308
Stephen Toub
Inside TPL Dataflow
http://channel9.msdn.com/Shows/Going+Deep/Stephen-Toub-Inside-TPL-
Dataflow
Alexey Kursov
Pipeline TPL Dataflow Usage examples
https://www.youtube.com/watch?v=AI9KxgDF43k
https://www.youtube.com/watch?v=AI9KxgDF43k
Richard Blewett, Andrew Clymer
Pro Asynchronous Programming with
.NET
APRESS 2013
ISBN: 978-1430259206
AKKA.NET http://getakka.net/
QUESTIONS?
THANK YOU!
Petru.Rebeja@gmail.com
Parallel & Async Processing using TPL
Dataflow

More Related Content

Parallel & async processing using tpl dataflow

Editor's Notes

  1. When discussing about how to use Dataflow we’ll touch the following points of interest: - programming model (what are the entities exposed by Dataflow?) - configuring the behavior of the entities (parallelism, completion, error handling) - although Dataflow removes the need for dealing with concurrent scenarios there are cases when concurrency is inevitable and developers must properly deal with concurrency pitfalls - whenever the functionality of built-in blocks isn’t enough, Dataflow offers the possibility to create custom blocks
  2. .NET Framework 4.0 comes with three APIs for Parallel Programming: Tasks (lower level), PLINQ and Parallel (upper level). The Dataflow library is a natural extension of the TPL library that allows developers to create data-processing pipelines in their applications. The Dataflow library provides a framework for creating blocks that perform a specific function asynchronously. These blocks can be composed together to form a pipeline where data flows into one end of the pipeline and some result or results come out from the other end. This is great when data can be processed at different rates or when parallel processing can efficiently spread work out across multiple CPU cores.
  3. Dataflow is a paradigm shift but when the developers overcome the discomfort of the paradigm shift they will benefit from the high expressivity of the code.