Parallel & Async Processing Using TPL Dataflow
- 5. DATAFLOW BENEFITS
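• Effortless use of multi-threading (no explicit thread or lock management)
• Performance boost via painless optimization (tuning block options such as the degree of parallelism)
• Development focus is on the ‘what’ rather than the ‘how’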
- 13. BEHAVIOR CONFIGURATION OPTIONS
DataflowBlockOptions:
• BufferBlock<T>
• BroadcastBlock<T>
• WriteOnceBlock<T>
ExecutionDataflowBlockOptions:
• ActionBlock<T>
• TransformBlock<TIn, TOut>
• TransformManyBlock<TIn, TOut>
GroupingDataflowBlockOptions:
• BatchBlock<T>
• JoinBlock<T1, T2[, T3]>
• BatchedJoinBlock<T1, T2[, T3]>
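A minimal C# sketch of the three option types, assuming a C# 9+ project (top-level statements) that references the System.Threading.Tasks.Dataflow package. The block names and numbers (capacities, batch size) are illustrative only. ExecutionDataflowBlockOptions and GroupingDataflowBlockOptions both derive from DataflowBlockOptions, so a setting such as BoundedCapacity is available on every block.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// DataflowBlockOptions: a buffer that holds at most 10 items, so a producer
// using SendAsync is slowed down (back-pressure) while it is full.
var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 10 });

// ExecutionDataflowBlockOptions adds execution settings such as the degree
// of parallelism; output order is still preserved by default.
var square = new TransformBlock<int, int>(
    x => x * x,
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 10 });

// GroupingDataflowBlockOptions: BoundedCapacity must not be smaller than the batch size.
var batch = new BatchBlock<int>(5, new GroupingDataflowBlockOptions { BoundedCapacity = 10 });

// ActionBlock handles one item at a time by default, so the List is never accessed concurrently.
var results = new List<int[]>();
var collect = new ActionBlock<int[]>(b => results.Add(b));

var link = new DataflowLinkOptions { PropagateCompletion = true };
buffer.LinkTo(square, link);
square.LinkTo(batch, link);
batch.LinkTo(collect, link);

foreach (var i in Enumerable.Range(1, 10))
    await buffer.SendAsync(i); // waits asynchronously while the buffer is full
buffer.Complete();
await collect.Completion;

foreach (var b in results)
    Console.WriteLine(string.Join(", ", b));
```

Even though up to four squares may be computed at once, TransformBlock emits results in input order by default, so the program prints `1, 4, 9, 16, 25` followed by `36, 49, 64, 81, 100`.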
- 14. COMPLETION & CANCELLATION
• To know when a block completes, await block.Completion or add a continuation task to it
• To propagate completion (and faults) from source to target, set DataflowLinkOptions.PropagateCompletion to true when linking
• Set DataflowBlockOptions.CancellationToken to enable cancellation
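The sketch below makes these three points concrete under the same assumptions (C# 9+, Dataflow package referenced, hypothetical block names). It links two blocks with PropagateCompletion, observes completion both by awaiting Completion and through a ContinueWith continuation, and then cancels a deliberately slow ActionBlock via the CancellationToken property, which ExecutionDataflowBlockOptions inherits from DataflowBlockOptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// --- Completion and its propagation ---
var doubler = new TransformBlock<int, int>(x => x * 2);
var sum = 0;
// ActionBlock processes one item at a time by default, so 'sum' is not updated concurrently.
var accumulate = new ActionBlock<int>(x => sum += x);

// Completion (and faults) of 'doubler' flow to 'accumulate'.
doubler.LinkTo(accumulate, new DataflowLinkOptions { PropagateCompletion = true });

// Option 1: a continuation task that runs once the block has finished.
var logged = accumulate.Completion.ContinueWith(
    t => Console.WriteLine($"accumulate finished: {t.Status}"));

for (var i = 1; i <= 5; i++) doubler.Post(i);
doubler.Complete();          // no more input for the head of the pipeline
await accumulate.Completion; // Option 2: await the tail's Completion
await logged;
Console.WriteLine($"sum = {sum}"); // sum = 30

// --- Cancellation ---
using var cts = new CancellationTokenSource();
var worker = new ActionBlock<int>(
    _ => Thread.Sleep(100), // simulate slow work
    new ExecutionDataflowBlockOptions { CancellationToken = cts.Token });

for (var i = 0; i < 100; i++) worker.Post(i);
cts.CancelAfter(250); // request cancellation while items are still queued

try
{
    await worker.Completion;
}
catch (OperationCanceledException)
{
    Console.WriteLine($"worker: {worker.Completion.Status}"); // worker: Canceled
}
```

Completing only the head block is enough, because the link carries the completion to the tail. A canceled block's Completion ends in the Canceled state, so awaiting it throws an OperationCanceledException (in practice a TaskCanceledException).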
- 15. ERROR HANDLING
• If the exception does not affect the integrity of the pipeline, use a try/catch inside the block
• Otherwise, handle errors outside of the pipeline by:
  • Adding a continuation to block.Completion
  • Propagating errors through the pipeline
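A minimal sketch of both strategies, under the same C# 9+ and package assumptions. The parsing rules are hypothetical, chosen only for illustration: a non-numeric string is treated as recoverable and caught inside the block, while a negative number is treated as fatal and left to fault the block, so the fault travels through the link and is handled by a continuation on the last block's Completion.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

var skipped = new List<string>();

var parse = new TransformManyBlock<string, int>(s =>
{
    try
    {
        var n = int.Parse(s);
        // Fatal for this hypothetical pipeline: not caught, so it faults the block.
        if (n < 0) throw new InvalidOperationException($"negative value: {n}");
        return new[] { n };
    }
    catch (FormatException)
    {
        // Recoverable: remember the bad input and emit nothing for it.
        skipped.Add(s);
        return Array.Empty<int>();
    }
});

var sink = new ActionBlock<int>(n => Console.WriteLine($"got {n}"));

// PropagateCompletion also carries faults: if 'parse' faults, 'sink' faults too.
parse.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

// Errors are handled outside the pipeline, in a continuation on the tail's Completion.
var report = sink.Completion.ContinueWith(t =>
    t.IsFaulted ? t.Exception!.Flatten().InnerException!.Message : "ok");

foreach (var input in new[] { "1", "x", "2", "-5", "3" })
    parse.Post(input);
parse.Complete();

var message = await report;
Console.WriteLine($"skipped: {string.Join(", ", skipped)}; pipeline: {message}");
```

The propagated fault reaches the downstream block wrapped in nested AggregateExceptions, which is why Flatten() is used to get back to the original exception. Items queued after the fatal one ("3" here) are never processed.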
- 16. DEALING WITH CONCURRENCY
• Rule of thumb: avoid shared state whenever possible
• Use ConcurrentExclusiveSchedulerPair to perform updates on shared state
• Be aware of the caveats with ConcurrentExclusiveSchedulerPair (e.g. with async delegates, exclusivity only holds for the code between awaits)
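The sketch below (same C# 9+ and package assumptions; the "balance" scenario is hypothetical) has two different blocks updating one shared variable. Giving both blocks pair.ExclusiveScheduler as their TaskScheduler ensures their delegates never run at the same time, which a per-block MaxDegreeOfParallelism of 1 could not guarantee across blocks.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var pair = new ConcurrentExclusiveSchedulerPair();

// Options are copied when a block is created, so one instance can be reused.
var exclusive = new ExecutionDataflowBlockOptions { TaskScheduler = pair.ExclusiveScheduler };

long balance = 0; // shared state touched by both blocks, no lock used

// Both delegates run on the exclusive scheduler, so they never overlap.
var deposits = new ActionBlock<int>(amount => balance += amount, exclusive);
var withdrawals = new ActionBlock<int>(amount => balance -= amount, exclusive);

for (var i = 0; i < 10_000; i++)
{
    deposits.Post(3);
    withdrawals.Post(1);
}
deposits.Complete();
withdrawals.Complete();
await Task.WhenAll(deposits.Completion, withdrawals.Completion);

Console.WriteLine(balance); // 20000
```

Two caveats apply. First, exclusivity is per scheduled task: with an async delegate, only the code between awaits runs exclusively, so another block's delegate may run while the first one is awaiting. Second, everything on the exclusive scheduler is serialized; blocks that only read the shared state can use pair.ConcurrentScheduler instead, which lets them run in parallel with each other while still never overlapping with the exclusive writers.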
- 20. REFERENCES & FURTHER READING
• Dataflow (Task Parallel Library), MSDN: http://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx
• Stephen Toub, TPL Dataflow Tour: http://channel9.msdn.com/posts/TPL-Dataflow-Tour
• Joseph Albahari, The Future of .NET Parallel Programming: http://channel9.msdn.com/events/TechEd/Australia/Tech-Ed-Australia-2011/DEV308
• Stephen Toub, Inside TPL Dataflow: http://channel9.msdn.com/Shows/Going+Deep/Stephen-Toub-Inside-TPL-Dataflow
• Alexey Kursov, Pipeline TPL Dataflow Usage examples: https://www.youtube.com/watch?v=AI9KxgDF43k
• Richard Blewett, Andrew Clymer, Pro Asynchronous Programming with .NET, Apress, 2013, ISBN: 978-1430259206
• AKKA.NET: http://getakka.net/
Editor's Notes
- When discussing how to use Dataflow, we’ll touch on the following points of interest:
- programming model (what are the entities exposed by Dataflow?)
- configuring the behavior of the entities (parallelism, completion, error handling)
- although Dataflow largely removes the need to deal with concurrency directly, there are cases when concurrency is inevitable and developers must handle its pitfalls properly
- whenever the functionality of built-in blocks isn’t enough, Dataflow offers the possibility to create custom blocks
- .NET Framework 4.0 comes with three APIs for parallel programming: Tasks (lower level), PLINQ and the Parallel class (higher level).
The Dataflow library is a natural extension of the TPL that lets developers build data-processing pipelines in their applications. It provides a framework of blocks, each performing a specific function asynchronously. These blocks can be composed into a pipeline where data flows in at one end and one or more results come out at the other. This works well when different stages process data at different rates, or when parallel processing can spread the work efficiently across multiple CPU cores.
- Dataflow is a paradigm shift, but once developers overcome the initial discomfort, they benefit from highly expressive code.
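As a sketch of two ideas from these notes (a pipeline where data enters at one end and results leave at the other, and a custom block built when the built-in ones are not enough), the built-in DataflowBlock.Encapsulate method can wrap a small pipeline into a single IPropagatorBlock. The helper CreateWordLengthBlock and its stages are hypothetical; the same C# 9+ and package assumptions apply.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical custom block: splits lines into words and emits each word's length.
IPropagatorBlock<string, int> CreateWordLengthBlock()
{
    var split = new TransformManyBlock<string, string>(
        line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    var measure = new TransformBlock<string, int>(
        word => word.Length,
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

    split.LinkTo(measure, new DataflowLinkOptions { PropagateCompletion = true });

    // Data enters through 'split' and leaves through 'measure'.
    return DataflowBlock.Encapsulate(split, measure);
}

var wordLengths = CreateWordLengthBlock();
var lengths = new List<int>();
var sink = new ActionBlock<int>(n => lengths.Add(n));
wordLengths.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

wordLengths.Post("data flows in");
wordLengths.Post("results come out");
wordLengths.Complete();
await sink.Completion;

Console.WriteLine(string.Join(", ", lengths)); // 4, 5, 2, 7, 4, 3
```

Encapsulate forwards Complete() to the first block and exposes the last block's Completion, so callers can treat the whole pipeline like any built-in block. When a custom block cannot be expressed as a composition of existing ones, implementing ITargetBlock<TInput> and ISourceBlock<TOutput> directly is possible but considerably more work.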