You're reading from  Parallel Programming and Concurrency with C# 10 and .NET 6

Product type: Book
Published in: Aug 2022
Publisher: Packt
ISBN-13: 9781803243672
Edition: 1st Edition
Author (1)
Alvin Ashcraft

Alvin Ashcraft is a software engineer and developer community champion with over 25 years of experience in software development. Working primarily with Microsoft Windows, web, and cloud technologies, his career has focused primarily on the healthcare industry. He has been awarded as a Microsoft MVP 11 times, most recently as a Windows Dev MVP. Alvin works in the Philadelphia area for Allscripts, a global healthcare software company, as a principal software engineer. He is also a board member of the TechBash Foundation, where he helps organize the annual TechBash developer conference. He has previously worked for companies such as Oracle, Genzeon, CSC, and ITG Pathfinders. Originally from the Allentown, PA area, Alvin currently resides in West Grove, PA with his wife and three daughters.

Chapter 7: Task Parallel Library (TPL) and Dataflow

The Task Parallel Library (TPL) dataflow library contains building blocks to orchestrate asynchronous workflows in .NET. This chapter will introduce the TPL Dataflow library, describe the types of dataflow blocks in the library, and illustrate some common patterns for using dataflow blocks through hands-on examples.

The dataflow library can be useful when processing large amounts of data in multiple stages or when your application receives data in a continuous stream. The dataflow blocks provide a fantastic way of implementing the producer/consumer design pattern.

To understand this, we will create a sample project that implements this pattern and examine other real-world uses of the dataflow library.

Note

It’s important to know that the TPL Dataflow library isn’t distributed as part of the .NET runtime or SDK. It’s available as a NuGet package from Microsoft. We will add it to our sample projects...

Technical requirements

To follow along with the examples in this chapter, the following software is recommended for Windows developers:

  • Visual Studio 2022 version 17.0 or later
  • .NET 6
  • To complete the WPF sample, you will need to install the .NET desktop development workload for Visual Studio

While these are recommended, if you have .NET 6 installed, you can use your preferred editor. For example, Visual Studio 2022 for Mac on macOS 10.13 or later, JetBrains Rider, or Visual Studio Code will work just as well.

The code examples for this chapter can be found on GitHub at https://github.com/PacktPublishing/Parallel-Programming-and-Concurrency-with-C-sharp-10-and-.NET-6/tree/main/chapter07.

Let’s get started by discussing the TPL Dataflow library and why it can be a great way to implement parallel programming in .NET.

Introducing the TPL Dataflow library

The TPL Dataflow library has been available for almost as long as TPL itself. It was first released in 2010, after .NET Framework 4.0 reached its RTM milestone. The members of the dataflow library are part of the System.Threading.Tasks.Dataflow namespace. The library builds on the parallel programming basics provided by TPL and extends them to address dataflow scenarios (hence the name of the library). It is made up of foundational classes called blocks. Each dataflow block is responsible for a particular action or step in the overall flow.

The dataflow library consists of three basic types of blocks:

  • Source blocks: These blocks implement the ISourceBlock<TOutput> interface. A source block offers data that the workflow you define can read from it.
  • Target blocks: This type of block implements the ITargetBlock<TInput> interface and is a data receiver.
  • Propagator blocks: These blocks act...
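
To make these roles concrete, here is a minimal sketch rather than code from the book's samples. It assumes the System.Threading.Tasks.Dataflow NuGet package is installed. The class name BlockRolesSketch and the method DoubleAllAsync are hypothetical names chosen for illustration. A TransformBlock is declared through IPropagatorBlock<TInput, TOutput>, so it can be used both as a target and as a source, while an ActionBlock is declared through ITargetBlock<TInput> and only receives data.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // requires the System.Threading.Tasks.Dataflow NuGet package

// Hypothetical class name, used only for this sketch.
public static class BlockRolesSketch
{
    // Doubles each input in a propagator block and collects the results in a target block.
    public static async Task<List<int>> DoubleAllAsync(IEnumerable<int> inputs)
    {
        var results = new List<int>();

        // Propagator: a target for int input and a source of int output.
        IPropagatorBlock<int, int> doubler = new TransformBlock<int, int>(n => n * 2);

        // Target only: receives data and produces no output.
        // By default it handles one message at a time, so the list is not accessed concurrently.
        ITargetBlock<int> collector = new ActionBlock<int>(n => results.Add(n));

        // LinkTo is declared on ISourceBlock<TOutput>, so the propagator acts as a source here.
        doubler.LinkTo(collector, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var n in inputs)
        {
            doubler.Post(n); // Post is an extension method on ITargetBlock<TInput>
        }

        doubler.Complete();         // no more input will be posted
        await collector.Completion; // wait until every item has reached the target

        return results;
    }

    public static async Task Main()
    {
        var doubled = await DoubleAllAsync(new[] { 1, 2, 3 });
        Console.WriteLine(string.Join(", ", doubled)); // prints "2, 4, 6"
    }
}
```

Declaring the variables by interface is only for emphasis in this sketch. In application code, var with the concrete block types is more common.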

Implementing the producer/consumer pattern

The blocks in the TPL Dataflow library provide a fantastic platform for implementing the producer/consumer pattern. If you are not familiar with this design pattern, it involves two operations and a queue of work. The producer is the first operation. It is responsible for filling the queue with data or units of work. The consumer is responsible for taking items from the queue and acting on them in some way. There can be one or more producers and one or more consumers in the system. You can change the number of producers or consumers, depending on which part of the process is the bottleneck.
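
The pattern can be sketched with a single BufferBlock<T> acting as the queue. This is a minimal illustration under stated assumptions, not the chapter's sample project. The class ProducerConsumerSketch, its methods, and the gift names are hypothetical, and the uppercase conversion stands in for whatever work a real consumer would do.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // requires the System.Threading.Tasks.Dataflow NuGet package

// Hypothetical class name, used only for this sketch.
public static class ProducerConsumerSketch
{
    // Producer: fills the queue with units of work, then signals that no more are coming.
    public static void Produce(ITargetBlock<string> queue, IEnumerable<string> items)
    {
        foreach (var item in items)
        {
            queue.Post(item);
        }

        queue.Complete();
    }

    // Consumer: takes items from the queue until it is both empty and completed.
    public static async Task<List<string>> ConsumeAsync(ISourceBlock<string> queue)
    {
        var processed = new List<string>();

        while (await queue.OutputAvailableAsync())
        {
            var item = await queue.ReceiveAsync();
            processed.Add(item.ToUpperInvariant()); // stand-in for real work
        }

        return processed;
    }

    public static async Task Main()
    {
        var queue = new BufferBlock<string>();

        // Start the consumer first so it is already waiting when items arrive.
        var consumer = ConsumeAsync(queue);
        Produce(queue, new[] { "scarf", "book", "mug" });

        var wrapped = await consumer;
        Console.WriteLine(string.Join(", ", wrapped)); // prints "SCARF, BOOK, MUG"
    }
}
```

The sketch uses one producer and one consumer. With several consumers reading from the same BufferBlock, the OutputAvailableAsync and ReceiveAsync pair is no longer safe as written, because another consumer may take the item in between the two calls.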

Real-World Scenario Example

To relate the producer/consumer pattern to a real-world scenario, think about preparing gifts for a holiday gathering. You and a partner are working together to prepare the gifts. You are fetching and staging the gifts to be wrapped. You are the producer. Your partner is taking items from your queue and wrapping each gift...

Creating a data pipeline with multiple blocks

One of the biggest advantages of using dataflow blocks is the ability to link them and create a complete workflow or data pipeline. In the previous section, we saw how this linking worked between producer and consumer blocks. In this section, we will create a console application with a pipeline of five dataflow blocks all linked together to complete a series of tasks. We will leverage TransformBlock, TransformManyBlock, and ActionBlock to take an RSS feed and output a list of categories that are unique across all blog posts in the feed. Follow these steps:

  1. Start by creating a new .NET 6 console application in Visual Studio named OutputBlogCategories.
  2. Add the System.ServiceModel.Syndication NuGet package that we used in the previous example.
  3. Add the same RssFeedService class from the previous example. You can right-click on the project in Solution Explorer and select Add | Existing Item or you can create a new class named...
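
Before working through the full project, the linking idea can be sketched in a shorter form. The following is not the five-block pipeline built in this section. It is a three-block illustration that works on in-memory data instead of an RSS feed. The BlogPost record, the CategoryPipelineSketch class, and the sample posts are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // requires the System.Threading.Tasks.Dataflow NuGet package

// Hypothetical stand-in for a blog post read from an RSS feed.
public record BlogPost(string Title, string[] Categories);

// Hypothetical class name, used only for this sketch.
public static class CategoryPipelineSketch
{
    public static async Task<List<string>> GetUniqueCategoriesAsync(IEnumerable<BlogPost> posts)
    {
        // Sorted, case-insensitive set, so "Azure" and "azure" count as one category.
        var unique = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        // Stage 1: one post in, many category names out.
        var extract = new TransformManyBlock<BlogPost, string>(post => post.Categories);

        // Stage 2: one category in, one cleaned-up category out.
        var normalize = new TransformBlock<string, string>(category => category.Trim());

        // Stage 3: end of the pipeline; collects each category into the set.
        var collect = new ActionBlock<string>(category => { unique.Add(category); });

        // Link the stages; PropagateCompletion passes completion down the chain.
        extract.LinkTo(normalize, linkOptions);
        normalize.LinkTo(collect, linkOptions);

        foreach (var post in posts)
        {
            extract.Post(post);
        }

        extract.Complete();
        await collect.Completion;

        return unique.ToList();
    }

    public static async Task Main()
    {
        var posts = new[]
        {
            new BlogPost("First", new[] { ".NET", "C#" }),
            new BlogPost("Second", new[] { "C#", " Azure " }),
            new BlogPost("Third", new[] { "azure", "WPF" })
        };

        var categories = await GetUniqueCategoriesAsync(posts);
        Console.WriteLine(string.Join(", ", categories));
    }
}
```

Only the first block is completed by hand. Because each link sets PropagateCompletion, awaiting the Completion task of the last block is enough to know that the whole pipeline has drained.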

Manipulating data from multiple data sources

A JoinBlock can be configured to receive different data types from two or three data sources. As each set of inputs is completed, the block outputs a Tuple containing one object of each type to be acted upon. In this example, we will create a JoinBlock that accepts a string and int pair and passes a Tuple<string, int> along to an ActionBlock, which outputs their values to the console. Follow these steps:

  1. Start by creating a new console application in Visual Studio.
  2. Add a new class named DataJoiner to the project and add a static method to the class named JoinData:
    public static void JoinData()
    {
    }
  3. Add the following code to create two BufferBlock objects, a JoinBlock<string, int>, and an ActionBlock<Tuple<string, int>>:
    var stringQueue = new BufferBlock<string>();
    var integerQueue = new BufferBlock<int>();
    var joinStringsAndIntegers = new JoinBlock<string, 
        int...
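
As a separate, self-contained illustration of the joining behavior, here is a minimal sketch. It does not complete the JoinData method above. To stay short, it posts directly to the JoinBlock's Target1 and Target2 properties instead of linking two BufferBlock objects, and it collects formatted strings in a list rather than writing each pair to the console. The JoinSketch class and the JoinAsync method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // requires the System.Threading.Tasks.Dataflow NuGet package

// Hypothetical class name, used only for this sketch.
public static class JoinSketch
{
    // Pairs each string with an int and returns one formatted line per pair.
    public static async Task<List<string>> JoinAsync(string[] words, int[] numbers)
    {
        var output = new List<string>();

        // Emits a Tuple<string, int> each time one string and one int are both available.
        var join = new JoinBlock<string, int>();

        // Receives each completed pair from the JoinBlock.
        var format = new ActionBlock<Tuple<string, int>>(
            pair => output.Add($"{pair.Item1}: {pair.Item2}"));

        join.LinkTo(format, new DataflowLinkOptions { PropagateCompletion = true });

        // Each input type has its own target on the JoinBlock.
        foreach (var word in words)
        {
            join.Target1.Post(word);
        }

        foreach (var number in numbers)
        {
            join.Target2.Post(number);
        }

        join.Complete();
        await format.Completion;

        return output;
    }

    public static async Task Main()
    {
        var lines = await JoinAsync(new[] { "one", "two", "three" }, new[] { 1, 2, 3 });

        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

Inputs are paired in the order they arrive at each target. If one side supplies more items than the other, the extra items never form a pair and are not passed on once the block completes.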

Summary

In this chapter, we learned about the various blocks in the TPL Dataflow library. We started by learning a little about each block type and looking at a brief code snippet for each. Next, we created a practical example that implemented the producer/consumer pattern to fetch blog data from three different Microsoft blogs. We also examined TransformBlock, TransformManyBlock, and JoinBlock more closely in .NET console applications. You should now feel confident in your ability to use some of the dataflow blocks in your applications to automate complex data workflows.

If you would like some additional reading about the TPL Dataflow library, you can download Introduction to TPL Dataflow from the Microsoft Download Center: https://www.microsoft.com/en-us/download/details.aspx?id=14782.

In the next chapter, Chapter 8, we will take a closer look at the collections in the System.Collections.Concurrent namespace. We will also discover some practical uses of PLINQ in modern...

Questions

Answer the following questions to test your knowledge of this chapter:

  1. What type of dataflow block aggregates data from two or three data sources?
  2. What type of block is a BufferBlock?
  3. What type of block is populated by a producer in the producer/consumer pattern?
  4. What method links the completion of two blocks?
  5. What method is called to signal that our code is done adding data to a source block?
  6. What is the async equivalent of calling Post()?
  7. What is the async equivalent of calling Receive()?