C++ High Performance


Product type: Book
Published in: Jan 2018
Publisher: Packt
ISBN-13: 9781787120952
Pages: 374
Edition: 1st
Authors (2): Björn Andrist, Viktor Sehr

Table of Contents (13 Chapters)

  1. Preface
  2. A Brief Introduction to C++
  3. Modern C++ Concepts
  4. Measuring Performance
  5. Data Structures
  6. A Deeper Look at Iterators
  7. STL Algorithms and Beyond
  8. Memory Management
  9. Metaprogramming and Compile-Time Evaluation
  10. Proxy Objects and Lazy Evaluation
  11. Concurrency
  12. Parallel STL
  13. Other Books You May Enjoy

Concurrency

In this chapter, we are going to explore how to write concurrent programs in C++ using threads with shared memory. We will look at ways to make concurrent programs correct by writing programs that are free from data races and deadlocks. This chapter also offers some advice on how to make concurrent programs run with low latency and high throughput.

Before we go any further, it should be said that this chapter is not a complete introduction to concurrent programming, nor will it cover all the details of concurrency in C++. Instead, it introduces the core building blocks for writing concurrent programs in C++, mixed with some performance-related guidelines. If you haven't written concurrent programs before, it is probably wise to work through some introductory texts covering the theoretical aspects of concurrent programming...

Understanding the basics of concurrency

A concurrent program can execute multiple tasks at the same time. Concurrent programming is, in general, a lot harder than sequential programming, but there are several reasons why a program may benefit from being concurrent:

  • Efficiency: Today's smartphones and desktop computers have multiple CPU cores that can execute multiple tasks in parallel if your program allows them to. If you manage to split a big task into subtasks that can run in parallel, it is theoretically possible to divide the running time of the big task by the number of CPU cores. Even for programs that run on machines with a single core, there can be a gain in performance if a task is I/O bound: while one subtask is waiting for I/O, other subtasks can still perform useful work on the CPU.
  • Responsiveness and low latency contexts: For applications with a graphical user...

What makes concurrent programming hard?

There are several reasons why concurrent programming is hard, and if you have written concurrent programs before, you have most likely already encountered the ones listed here:

  1. Sharing state between multiple threads in a safe manner is hard. Whenever we have data that can be both read and written at the same time, we need some way of protecting that data from data races. We will see many examples of this later on.
  2. Concurrent programs are usually harder to reason about because of their multiple parallel execution flows.
  3. Concurrency complicates debugging. Bugs that occur because of data races can be very hard to debug since they depend on how the threads are scheduled. These kinds of bugs can be hard to reproduce and, in the worst case, they cease to exist when running the program in a debugger. Sometimes...

Concurrency and parallelism

Concurrency and parallelism are two terms that are sometimes used interchangeably. However, they are not the same and it is important to understand the difference. A program is said to run concurrently if it has multiple individual control flows running during overlapping time periods. In C++, each individual control flow is represented by a thread. The threads may or may not execute at the exact same time, though. If they do, they are said to execute in parallel. For a concurrent program to run in parallel, it needs to be executed on a machine that has support for parallel execution of instructions: that is, machines with multiple CPU cores.

At first glance, it might seem obvious that we always want concurrent programs to run in parallel if possible, for efficiency reasons. However, that is not necessarily always true. A lot of synchronization primitives...

Concurrent programming in C++

The concurrency support in C++ makes it possible for a program to execute multiple tasks concurrently. As mentioned earlier, writing a correct concurrent C++ program is, in general, a lot harder than writing a program that executes all tasks sequentially in one thread. This section also demonstrates some common pitfalls, to make you aware of the difficulties involved in writing concurrent programs.

Concurrency support was first introduced in C++11 and has since been extended in both C++14 and C++17. Before concurrency was part of the language, it was implemented with native concurrency support from the operating system, with POSIX Threads (pthreads), or with some other library. With concurrency support directly in the C++ language, we can now write cross-platform concurrent programs, which is great! However, since the concurrency support...

Lock-free programming

Lock-free programming is hard. We will not spend a lot of time discussing lock-free programming in this book, but will instead provide you with an example of how a very simple lock-free data structure could be implemented. There is a wealth of resources, on the web and in books, dedicated to lock-free programming that will explain the concepts you need to understand before writing your own lock-free data structures. Some related concepts that you might have heard of, such as compare-and-swap (CAS) and the ABA problem, will not be discussed further in this book.

Lock-free queue example

Here, we are going to show an example of a lock-free queue, which is a relatively simple...

Performance guidelines

This chapter will end with some guidelines related to performance. We cannot stress enough the importance of having a concurrent program running correctly before trying to improve the performance. Also, before applying any of these guidelines related to performance, you first need to set up a reliable way of measuring what you are trying to improve.

Avoid contention

Whenever multiple threads use shared data, there will be contention. Contention hurts performance, and sometimes the overhead it causes can make a parallel algorithm slower than a single-threaded alternative.

Using a lock that causes a wait and a context switch is an obvious performance penalty, but what is not equally...

Summary

In this chapter, we have seen how to create programs that execute multiple threads concurrently, and how to avoid data races by protecting critical sections with locks or by using atomics. We have looked into execution order and the C++ memory model, which become important to understand when writing lock-free programs, and we have seen that immutable data structures are thread-safe. The chapter ended with some guidelines for improving performance in concurrent applications.
