
Building Low Latency Applications with C++

By Sourav Ghosh
About this book
C++ is meticulously designed with efficiency, performance, and flexibility as its core objectives. However, real-time low latency applications demand a distinct set of requirements, particularly in terms of performance latencies. With this book, you’ll gain insights into the performance requirements for low latency applications and the C++ features critical to achieving the required performance latencies. You’ll also solidify your understanding of the C++ principles and techniques as you build a low latency system in C++ from scratch. You’ll understand the similarities between such applications, recognize the impact of performance latencies on business, and grasp the reasons behind the extensive efforts invested in minimizing latencies. Using a step-by-step approach, you’ll embark on a low latency app development journey by building an entire electronic trading system, encompassing a matching engine, market data handlers, order gateways, and trading algorithms, all in C++. Additionally, you’ll get to grips with measuring and optimizing the performance of your trading system. By the end of this book, you’ll have a comprehensive understanding of how to design and build low latency applications in C++ from the ground up, while effectively minimizing performance latencies.
Publication date: July 2023
Publisher: Packt
Pages: 506
ISBN: 9781837639359

 

Introducing Low Latency Application Development in C++

Let us kick off our journey with low latency applications by introducing them in this first chapter. We will first understand the behavior and requirements of latency-sensitive and latency-critical applications, as well as the huge impact that application latencies have on businesses that rely on quick and strict response times.

We will also discuss why C++ is one of the most preferred programming languages when it comes to low latency application development. We will spend a large part of this book building an entire low latency electronic trading system from scratch in C++. So, this will serve as a good chapter for you to understand the motivation for using C++ as well as what makes it the most popular language for low latency applications.

We will also present some of the important low latency applications in different business areas. Part of the motivation is to make you understand that latencies are indeed very critical in different business areas for use cases that are sensitive to response times. The other part of the motivation is to identify the similarities in the behavior, expectations, design, and implementation of these applications. Even though they solve different business problems, the low latency requirements of these applications are often built on similar technical design and implementation principles.

In this chapter, we will cover the following topics:

  • Understanding the requirements for latency-sensitive applications
  • Understanding why C++ is the preferred programming language
  • Introducing some important low latency applications

In order to build ultra-low latency applications effectively, we should first understand the terms and concepts we will refer to throughout the rest of this book. We should also understand why C++ has emerged as the clear choice for most low latency application development. It is also important to always keep the business impact of low latencies in mind because the aim is to build low latency applications to benefit the business’s bottom line. This chapter discusses these ideas so that you can build a good foundation before we dive into the technical details in the rest of this book.

 

Understanding requirements for latency-sensitive applications

In this section, we will discuss some concepts that are required to build an understanding of what metrics matter for latency-sensitive applications. First, let’s define clearly what latency means and what latency-sensitive applications are.

Latency is defined as the time delay between when a task is started and when it is finished. By definition, any processing or work will incur some overhead or latency – that is, no system has zero latency unless the system does absolutely no work. The important detail here is that some systems operate at latencies that are an infinitesimal fraction of a millisecond, and in such systems the tolerance for even an additional microsecond can be very low.
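
To make this concrete, here is a minimal sketch (not code from this book's trading system) of how the latency of a single task can be measured in C++ using std::chrono, where processTask() is just a hypothetical placeholder for the work being timed:

    #include <chrono>
    #include <cstdio>

    // Hypothetical placeholder for the task whose latency we want to measure.
    void processTask() { /* ... */ }

    int main() {
        const auto start = std::chrono::steady_clock::now();  // task starts
        processTask();
        const auto end = std::chrono::steady_clock::now();    // task finishes

        // Latency is the elapsed time between the start and the finish of the task.
        const auto latency_ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        std::printf("latency: %lld ns\n", static_cast<long long>(latency_ns));
    }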

Low latency applications are applications that execute tasks and respond or return results as quickly as possible. The point here is that reaction latency is an important criterion for such applications where higher latencies can degrade performance or even render an application completely useless. On the other hand, when such applications perform with the low latencies that are expected of them, they can beat the competition, run at maximum speed, achieve maximum throughput, or increase productivity and improve the user experience – depending on the application and business.

Low latency can be thought of as both a quantitative and a qualitative term. The quantitative aspect is pretty obvious, but the qualitative aspect might not necessarily be. Depending on the context, architects and developers might be willing to accept higher latencies in some cases but unwilling to accept even an extra microsecond in others. For instance, if a user refreshes a web page or waits for a video to load, a few seconds of latency is quite acceptable. However, once the video loads and starts playing, it can no longer incur a few seconds of latency to render or display without negatively impacting the user experience. An extreme example is high-speed financial trading systems, where a few extra microseconds can make the difference between a profitable firm and a firm that cannot compete at all.

In the following subsections, we will present some nomenclature that applies to low latency applications. It is important to understand these terms well so that we can continue our discussion on low latency applications, as we will refer to these concepts frequently. The concepts and terms we will discuss next are used to differentiate between different latency-sensitive applications, the measurement of latencies, and the requirements of these applications.

Understanding latency-sensitive versus latency-critical applications

There is a subtle but important difference between the terms latency-sensitive applications and latency-critical applications. A latency-sensitive application is one whose business impact or profitability improves as performance latencies are reduced. So, the system might still be functional and possibly profitable at higher performance latencies but can be significantly more profitable if latencies are reduced. Examples of such applications would be operating systems (OSes), web browsers, databases, and so on.

A latency-critical application, on the other hand, is one that fails completely if performance latency is higher than a certain threshold. The point here is that while latency-sensitive applications might only lose part of their profitability at higher latencies, latency-critical applications fail entirely at high enough latencies. Examples of such applications are traffic control systems, financial trading systems, autonomous vehicles, and some medical appliances.

Measuring latency

In this section, we will discuss different methods of measuring latency. The real difference between these methods comes down to what is considered the beginning of the processing task and what is considered the end. Another consideration is the unit of measurement – time is the most common one, but in some cases, CPU clock cycles can also be used if it comes down to instruction-level measurements. Let’s look at the different measurements next, but first, we present a diagram of a generic server-client system without diving into the specifics of the use case or transport protocols. This is because measuring latency is generic and applies to many different applications with this kind of server-client setup.

Figure 1.1 – A general server-client system with timestamps between different hops


We present this diagram here because, in the next few subsections, we will define and understand latencies between the different hops on the round-trip path from the server to the client and back to the server.

Time to first byte

Time to first byte is measured as the time elapsed from when the sender sends the first byte of a request (or response) to the moment when the receiver receives the first byte. This typically (but not necessarily) applies to network links or systems where there are latency-sensitive data transfer operations. In Figure 1.1, time to first byte would be the difference between the timestamp at which the sender puts the first byte on the wire and the timestamp at which the receiver reads it.

Round-trip time

Round-trip time (RTT) is the sum of the time it takes for a packet to travel from one process to another and then the time it takes for the response packet to reach the original process. Again, this is typically (but not necessarily) used for network traffic going back and forth between server and client processes, but can also be used for two processes communicating in general.

RTT, by default, includes the time taken by the server process to read, process, and respond to the request sent by the sender – that is, RTT generally includes server processing times. In the context of electronic trading, the true RTT latency is based on three components:

  • First, the time it takes for information from the exchange to reach the participant
  • Second, the time it takes for the execution of the algorithms to analyze the information and make a decision
  • Finally, the time it takes for the decision to reach the exchange and get processed by the matching engine

We will discuss this more in the last section of this book, Analyzing and improving performance.

Tick-to-trade

Tick-to-trade (TTT) is similar to RTT and is a term most commonly used in electronic trading systems. TTT is defined as the time from when a packet (usually a market data packet) first hits a participant’s infrastructure (trading server) to the time when the participant is done processing the packet and sends a packet out (order request) to the trading exchange. So, TTT includes the time spent by the trading infrastructure to read the packet, process it, calculate trading signals, generate an order request in reaction to that, and put it on the wire. Putting it on the wire typically means writing something to a network socket. We will revisit this topic and explore it in greater detail in the last section of this book, Analyzing and improving performance. In Figure 1.1, TTT would be the difference between the timestamp when the market data packet first reaches the participant’s infrastructure and the timestamp when the order request leaves it.

CPU clock cycles

A CPU clock cycle is basically the smallest increment of work that can be done by the processor – in reality, it is the amount of time between two pulses of the oscillator that drives the processor. Counting CPU clock cycles is typically used to measure latency at the instruction level – that is, at an extremely low level, close to the processor. C++ is both a low-level and a high-level language; it lets you get as close to the hardware as needed and also provides higher-level abstractions such as classes, templates, and so on. But generally, C++ developers do not spend a lot of time dealing with extremely low-level or possibly assembly code. This means that the compiled machine code might not be exactly what a C++ developer expects. Additionally, depending on the compiler version, the processor architecture, and so on, there may be even more sources of differences. So, for extremely performance-sensitive low latency code, it is not uncommon for engineers to measure how many instructions are executed and how many CPU clock cycles are required to do so. This level of optimization is typically the deepest level possible, alongside kernel-level optimizations.
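
On x86 processors, one common way to count cycles is to read the CPU's time stamp counter. The following is a minimal sketch assuming an x86-64 CPU and GCC or Clang (which expose the __rdtsc() intrinsic via <x86intrin.h>); criticalPath() is a hypothetical placeholder for the code path being measured:

    #include <x86intrin.h>  // __rdtsc() on GCC/Clang for x86-64
    #include <cstdio>

    // Hypothetical placeholder for the code path being measured.
    void criticalPath() { /* ... */ }

    int main() {
        // Read the CPU's time stamp counter before and after the code path.
        // Note: on out-of-order CPUs, serializing instructions (for example,
        // cpuid or rdtscp) are often used to get more precise boundaries.
        const unsigned long long start_cycles = __rdtsc();
        criticalPath();
        const unsigned long long end_cycles = __rdtsc();

        std::printf("elapsed cycles: %llu\n", end_cycles - start_cycles);
    }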

Now that we have seen some different methods of measuring latencies in different applications, in the next section, we will look at some latency summary metrics and how each one of them can be important under different scenarios.

Differentiating between latency metrics

The relative importance of a specific latency metric over the other depends on the application and the business itself. As an example, a latency-critical application such as an autonomous vehicle software system cares about peak latency much more than the mean latency. Low latency electronic trading systems typically care more about mean latency and smaller latency variance than they do about peak latency. Video streaming and playback applications might generally prioritize high throughput over lower latency variance due to the nature of the application and the consumers.

Throughput versus latency

Before we look at the metrics themselves, first, we need to clearly understand the difference between two terms – throughput and latency – which are very similar to each other and often used interchangeably but should not be. Throughput is defined as how much work gets done in a certain period of time, and latency is how quickly a single task is completed. To improve throughput, the usual approach is to introduce parallelism and add additional computing, memory, and networking resources. Note that each individual task might not be processed as quickly as possible, but overall, more tasks will be completed after a certain amount of time. This is because, while being processed individually, each task might take longer than in a low latency setup, but the parallelism boosts throughput over a set of tasks. Latency, on the other hand, is measured for each individual task from beginning to finish, even if fewer tasks are executed overall.

Mean latency

Mean latency is basically the expected average response time of a system. It is simply the average of all the latency measurement observations. This metric includes large outliers, so it can be a noisy metric for systems that experience a large range of performance latencies.

Median latency

Median latency is typically a better metric for the expected response time of a system. Since it is the median of the latency measurement observations, it excludes the impact of large outliers. Due to this, it is sometimes preferred over the mean latency metric.

Peak latency

Peak latency is an important metric for systems where a single large outlier in performance can have a devastating impact on the system. Large values of peak latency can also significantly influence the mean latency metric of the system.

Latency variance

For systems that require a latency profile that is as deterministic as possible, the actual variance of the performance latency is an important metric. This is typically important where the expected latencies are quite predictable. For systems with low latency variance, the mean, median, and peak latencies are all expected to be quite close to each other.
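
As a small illustration of the metrics described in the preceding subsections, the following sketch computes the mean, median, peak, and variance of a set of latency observations; the sample values are made up purely for illustration:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        // Hypothetical latency observations in microseconds.
        std::vector<double> samples{12.0, 11.5, 13.2, 11.9, 250.0, 12.4, 11.8};

        std::sort(samples.begin(), samples.end());

        const double mean =
            std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
        const double median = samples[samples.size() / 2];  // middle observation
        const double peak = samples.back();                  // worst (maximum) latency

        double variance = 0.0;
        for (double s : samples)
            variance += (s - mean) * (s - mean);
        variance /= samples.size();

        std::printf("mean=%.2f median=%.2f peak=%.2f stddev=%.2f (us)\n",
                    mean, median, peak, std::sqrt(variance));
    }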

Requirements of latency-sensitive applications

In this section, we will formally describe the behavior of latency-sensitive applications and the performance profile that these applications are expected to adhere to. Obviously, latency-sensitive applications need low latency performance, but here we will try to explore minor subtleties in the term low latency and discuss some different ways of looking at it.

Correctness and robustness

When we think of latency-sensitive applications, it is often the case that we think low latency is the single most important aspect of such applications. But in reality, a huge requirement of such applications is correctness – by which we mean very high levels of robustness and fault tolerance. Intuitively, this idea should make complete sense; these applications require very low latency to be successful, which should also tell you that these applications have very high throughput and need to process huge amounts of inputs and produce a large number of outputs. Hence, the system needs to achieve very close to 100% correctness and be very robust as well for the application to be successful in its business area. Additionally, the correctness and robustness requirements need to be maintained as the application grows and changes during its lifetime.

Low latencies on average

This is the most obvious requirement when we think about latency-sensitive applications. The expected reaction or processing latency needs to be as low as possible for the application or business overall to succeed. Here, we care about the mean and median performance latency and need it to be as low as possible. By design, this means the system cannot have too many outliers or very high peaks in performance latency.

Capped peak latency

We use the term capped peak latency to refer to the requirement that there needs to be a well-defined upper threshold for the maximum possible latency the application can ever encounter. This behavior is important for all latency-sensitive applications, but most important for latency-critical applications. But even in the general case, an application that incurs extremely high latencies in a handful of cases will typically see its overall performance destroyed. What this really means is that the application needs to handle any input, scenario, or sequence of events and do so within a low latency period. Of course, the latency to handle a very rare and specific scenario can possibly be much higher than for the most likely case, but the point here is that it cannot be unbounded or unacceptable.

Predictable latency – low latency variance

Some applications prefer that the expected performance latency is predictable, even if that means sacrificing a little latency – that is, accepting an average latency metric that is higher than it could otherwise be. What this really means is that such applications will make sure that the expected performance latency for all kinds of different inputs or events has as little variance as possible. It is impossible to achieve zero latency variance, but choices can be made in data structures, algorithms, code implementation, and setup to minimize it as much as possible.

High throughput

As mentioned before, low latency and throughput are related but not identical. For that reason, applications that need the highest possible throughput sometimes differ in design and implementation in order to maximize throughput. The point is that maximizing throughput might come at the cost of sacrificing average performance latencies or increasing peak latencies.

In this section, we introduced the concepts that apply to low latency application performance and the business impact of those metrics. We will need these concepts in the rest of the book when we refer to the performance of the applications we build. Next, we will move the conversation forward and explore the programming languages available for low latency application development. We will discuss the characteristics of the languages that support low latency applications and understand why C++ has risen to the top of the list when it comes to developing and improving latency-sensitive applications.

 

Understanding why C++ is the preferred programming language

There are several high-level language choices when it comes to low latency applications – Java, Scala, Go, and C++. In this section, we will discuss why C++ is one of the most popular languages when it comes to low latency applications. We will discuss several characteristics of the C++ language, including the high-level constructs that make large code bases manageable. The power of C++ is that it also provides very low-level access, similar to the C programming language, enabling a very high degree of control and optimization.

Compiled language

C++ is a compiled language and not an interpreted language. A compiled language is a programming language where the source code is translated into a machine code binary that is ready to run on a specific architecture. Examples of compiled languages are C, C++, Erlang, Haskell, Rust, and Go. The alternative to compiled languages is interpreted languages. Interpreted languages are different in the sense that the program is run by an interpreter, which runs through the source line by line and executes each command. Some examples of interpreted languages are Ruby, Python, and JavaScript.

Interpreted languages are inherently slower than compiled languages because, unlike compiled languages where the translation into machine instructions is done at compile time, here the interpretation to machine instructions is done at runtime. However, with the development of just-in-time compilation, interpreted languages are not tremendously slower. For compiled languages, the code is already pre-built for the target hardware so there is no extra interpretation step at runtime. Since C++ is a compiled language, it gives the developers a lot of control over the hardware. This means competent developers can optimize things such as memory management, CPU usage, cache performance, and so on. Additionally, since compiled languages are converted into machine code for specific hardware at compile time, it can be optimized to a large degree. Hence, compiled languages in general, and especially C++, are faster and more efficient to execute.

Closer to hardware – low-level language

Compared to other popular programming languages such as Python, Java, and so on, C++ is low level so it’s extremely close to the hardware. This is especially useful when the software is tightly coupled with the target hardware it runs on and possibly even in cases where low-level support is required. Being extremely close to the hardware also means that there is a significant speed advantage when building systems in C++. Especially in low latency applications such as high-frequency trading (HFT) where a few microseconds can make a huge difference, C++ is generally the established gold standard in the industry.

We will discuss an example of how being closer to the hardware helps boost C++ performance over another language such as Java. A C/C++ pointer is the actual address of an object in memory. So, the software can access memory and objects in memory directly without needing extra abstractions that would slow it down. This, however, does mean that the application developer will often have to explicitly manage the creation, ownership, destruction, and lifetime of objects instead of relying on the programming language to manage them, as in Python or Java. An extreme case of C++ being close to the hardware is that it is possible to call assembly instructions straight from C++ statements – we will see an example of this in later chapters.
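
As a minimal illustration of this point, the following sketch shows a raw pointer holding the actual address of an object and the developer explicitly creating and destroying that object; the Order type here is just a hypothetical example, not a class from this book's trading system:

    #include <cstdio>

    struct Order {
        int id;
        double price;
    };

    int main() {
        // The developer explicitly creates the object and owns its lifetime.
        Order* order = new Order{1, 101.25};

        // A pointer is simply the address of the object in memory; we can
        // dereference it to access the object's fields directly.
        std::printf("order %d at address %p, price %.2f\n",
                    order->id, static_cast<void*>(order), order->price);

        // ...and the developer is responsible for releasing it.
        delete order;
    }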

Deterministic usage of resources

It is critical for low latency applications to use resources very efficiently. Embedded applications (which are often also real-time applications) are especially limited in time and memory resources. In languages such as Java and Python that rely on automatic garbage collection, there is an element of non-determinism – that is, the garbage collector can introduce large latencies in performance unpredictably. Additionally, for systems that are very limited in memory, low-level languages such as C and C++ can do special things such as placing data at custom sections or addresses in memory through pointers. In languages such as C and C++, the programmer is in charge of the explicit creation, management, and deallocation of memory resources, allowing for deterministic and efficient use of resources.
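
For example, placement new is one C++ mechanism for constructing an object at a memory location the developer has chosen, such as a pre-allocated buffer or a memory pool. The following is a minimal sketch; the Tick type and the local buffer standing in for a pre-allocated memory region are hypothetical:

    #include <cstdio>
    #include <new>  // placement new

    struct Tick {
        long long timestamp;
        double price;
    };

    int main() {
        // A pre-allocated, suitably aligned buffer - in a real system this
        // might be a region of shared memory or a pool reserved at startup.
        alignas(Tick) unsigned char buffer[sizeof(Tick)];

        // Construct the object directly inside the buffer - no heap allocation,
        // and the developer decides exactly where the data lives.
        Tick* tick = new (buffer) Tick{1'690'000'000'000LL, 99.75};

        std::printf("tick @ %p price=%.2f\n", static_cast<void*>(tick), tick->price);

        // Explicitly destroy it when done; the memory itself is not freed here.
        tick->~Tick();
    }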

Speed and high performance

C++ is faster than most other programming languages for the reasons we have already discussed. It also provides excellent concurrency and multithreading support. Obviously, this is another good feature when it comes to developing low latency applications that are latency-sensitive or even latency-critical. Such requirements are also often found in applications around servers that are under heavy load such as web servers, application servers, database servers, trading servers, and so on.

Another advantage of C++ is its compile-time optimization ability. C and C++ support features such as macros or pre-processor directives, the constexpr specifier, and template metaprogramming. These allow us to move a large part of the processing from runtime to compile time. Basically, this means we minimize the work done at runtime on the critical code path by moving a lot of the processing to the compilation step when building the machine code binary. We will discuss these features heavily in later chapters when we build a complete electronic trading system, and their benefits will become very clear at that point.
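
As a small taste of this idea, the following sketch uses the constexpr specifier to force a computation to happen entirely at compile time, so that only the final constant ends up on the runtime critical path; factorial is just a stand-in for any compile-time computable work:

    #include <cstdio>

    // The compiler can evaluate this function entirely at compile time
    // when the argument is a compile-time constant.
    constexpr unsigned long long factorial(unsigned n) {
        return n <= 1 ? 1ULL : n * factorial(n - 1);
    }

    int main() {
        // Forced compile-time evaluation: the value is baked into the binary,
        // so no work happens on the runtime critical path.
        constexpr auto value = factorial(10);
        static_assert(value == 3'628'800ULL, "computed at compile time");

        std::printf("10! = %llu\n", value);
    }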

Language constructs and features

The C++ language itself is a perfect combination of flexibility and feature richness. It allows a lot of freedom for the developers, who can leverage it to tune applications down to a very low level. However, it also provides a lot of higher-level abstractions, which can be used to build very large, feature-rich, versatile, and scalable applications, while still being extremely low latency when required. In this section, we will explore some of those C++-specific language features that put it in a unique position of low-level control and high-level abstraction features.

Portability

First off, C++ is highly portable and can build applications that can be compiled for a lot of different operating systems, platforms, CPU architectures, and so on. Since it does not require a runtime interpreter that differs between platforms, all that is required is to build the correct binaries at compile time, which is relatively straightforward, and the final deployed binary can just run on the target platform. Additionally, some of the other features we have already discussed (such as the ability to run on low-memory and weaker CPU architectures combined with the lack of garbage collection requirements) make it even more portable than some of the other high-level languages.

Compiler optimizations

We have discussed that C++ is a compiled language, which makes it inherently faster than interpreted languages since it does not incur additional runtime costs. Since the developer’s complete source code is compiled into the final executable binary, compilers have an opportunity to holistically analyze all the objects and code paths. This leads to the possibility of very high levels of optimization at compile time. Modern compilers work closely with modern hardware to produce some surprisingly optimized machine code. The point here is that developers can focus on solving business problems and, assuming the C++ developers are competent, the compiled program is still extremely optimized without requiring a lot of the developer’s time and effort. Since C++ also allows you to inline assembly code directly, it gives the developers an even greater chance to work with the compiler and produce highly optimized executables.
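
As a brief illustration of inlining assembly, the following sketch embeds the x86 pause instruction in a spin-wait loop using GCC/Clang extended inline assembly; both the syntax and the instruction are compiler- and architecture-specific, and spinUntilReady() is a hypothetical helper, not a component we build later:

    #include <atomic>

    // Busy-wait until the flag is set, hinting to the CPU that we are spinning.
    // The asm statement embeds the x86 "pause" instruction directly in the
    // generated code (GCC/Clang extended-asm syntax; not portable).
    void spinUntilReady(const std::atomic<bool>& ready) {
        while (!ready.load(std::memory_order_acquire)) {
            asm volatile("pause" ::: "memory");
        }
    }

    int main() {
        std::atomic<bool> ready{true};  // already set, so the loop exits immediately
        spinUntilReady(ready);
    }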

Statically typed

When it comes to type systems in programming languages, there are two options – statically typed language and dynamically typed language. A statically typed language performs checks around data types (integers, floats, doubles, structures, and classes) and interactions between these types during the compilation process. A dynamically typed language performs these type checks at runtime. Examples of statically typed languages are C++ and Java, and examples of dynamically typed languages are Python, Perl, and JavaScript.

One big benefit of statically typed languages is that since all the type-checking is done at compile time, it gives us the opportunity to find and eliminate many bugs before the program is even run. Obviously, type checking alone cannot find all possible bugs, but the point we’re trying to make here is that statically typed languages do a significantly better job at finding errors and bugs related to types at compile time. This is especially true for low latency applications that are highly numerical in nature.

Another huge benefit of statically typed languages, especially when it comes to low latency applications, is that since the type-checking is done at compile time, there is an additional opportunity for the compiler to optimize the types and type interactions at compile time. In fact, a large part of the reason that compiled languages are much faster is due to the static versus dynamic type-checking system itself. This is also a big reason why, for a dynamically typed language such as Python, high-performance libraries such as NumPy require types when creating arrays and matrices.

Multiple paradigms

Unlike some other languages, C++ does not force the developer to follow a specific programming paradigm. It supports a lot of different programming paradigms such as monolithic, procedural, object-oriented programming (OOP), generic programming, and so on. This makes it a good fit for a wide range of applications because it gives the developer the flexibility to design their program in a way that facilitates maximum optimization and lowest latencies instead of forcing a programming paradigm onto that application.

Libraries

Out of the box, C++ already comes with large C and C++ standard libraries, which provide a lot of data structures, algorithms, and abstractions for tasks such as the following:

  • Network programming
  • Dynamic memory management
  • Numeric operations
  • Error and exception handling
  • String operations
  • Commonly needed algorithms
  • Input/output (I/O) operations including file operations
  • Multithreading support

Additionally, the huge community of C++ developers has built and open-sourced a lot of the libraries; we will discuss some of the most popular ones in the following subsections.

Standard Template Library

Standard Template Library (STL) is a very popular and widely used templatized and header-only library containing data structures and containers, iterators and allocators for those containers, and algorithms for tasks such as sorting, searching, and so on.
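
For instance, a few lines of STL are enough to combine a container with sorting and searching algorithms; the values below are arbitrary:

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> prices{105, 99, 103, 101, 98};

        std::sort(prices.begin(), prices.end());               // sorting
        const bool found =
            std::binary_search(prices.begin(), prices.end(), 101);  // searching

        std::printf("lowest=%d found 101: %s\n",
                    prices.front(), found ? "yes" : "no");
    }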

Boost

Boost is a large C++ library that provides support for multithreading, network operations, image processing, regular expressions (regex), linear algebra, unit testing, and so on.

Asio

Asio (asynchronous input/output) is another well-known and widely used library that comes in two versions: a standalone (non-Boost) version and one that is part of the Boost library. It provides support for multithreaded concurrency and for implementing and using the asynchronous I/O model, and is portable to all major platforms.

GNU Scientific Library

GNU Scientific Library (GSL) provides support for a wide range of mathematical concepts and operations, such as complex numbers, matrices, calculus, and many other functions.

Active Template Library

Active Template Library (ATL) is a template-heavy C++ library to help program the Component Object Model (COM). It replaces the previous Microsoft Foundation Classes (MFC) library and improves upon it. It is developed by Microsoft, is open source, and heavily uses an important low latency C++ feature, the Curiously Recurring Template Pattern (CRTP), which we will also explore and use heavily in this book. It supports COM features such as dual interfaces, ActiveX controls, connection points, tear-off interfaces, COM enumerator interfaces, and a lot more.
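
To preview the idea, here is a minimal CRTP sketch showing static (compile-time) polymorphism: the base class template dispatches to the derived class without any virtual function call, so the compiler can resolve and inline the call. The class names are hypothetical and are not the components we will build later in this book:

    #include <cstdio>

    // Base class template parameterized on the derived class (CRTP).
    template <typename Derived>
    class OrderHandlerBase {
    public:
        // No virtual call: the static_cast resolves to the derived class at
        // compile time, so the compiler can inline onOrderImpl().
        void onOrder(int order_id) {
            static_cast<Derived*>(this)->onOrderImpl(order_id);
        }
    };

    // Hypothetical concrete handler providing the implementation.
    class LoggingOrderHandler : public OrderHandlerBase<LoggingOrderHandler> {
    public:
        void onOrderImpl(int order_id) {
            std::printf("handling order %d\n", order_id);
        }
    };

    int main() {
        LoggingOrderHandler handler;
        handler.onOrder(42);  // dispatched statically, no vtable lookup
    }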

Eigen

Eigen is a powerful C++ library for mathematical and scientific applications. It has functions for linear algebra, numerical methods and solvers, numeric types such as complex numbers, features and operations for geometry, and much more.

LAPACK

Linear Algebra Package (LAPACK) is another large and extremely powerful C++ library specifically for linear algebra and linear equations and to support routines for large matrices. It implements a lot of functionality such as solving simultaneous linear equations, least squares methods, eigenvalues, singular value decomposition (SVD), and many more applications.

OpenCV

Open Source Computer Vision (OpenCV) is one of the most well-known C++ libraries when it comes to computer graphics and vision-related applications. It is also available for Java and Python and provides many algorithms for face and object recognition, 3D models, machine learning, deep learning, and more.

mlpack

mlpack is a super-fast, header-only C++ library for a wide variety of machine learning models and the mathematical operations related to them. It also has support for other languages such as Go, Julia, R, and Python.

Qt

Qt is by far the most popular library when it comes to building cross-platform graphical programs in C++. It works on Windows, Linux, macOS, and even platforms such as Android and embedded systems. It is open source and is used to build GUI widgets and applications.

Crypto++

Crypto++ is a free open source C++ library to support algorithms, operations, and utilities for cryptography. It has many cryptographic algorithms, random number generators, block ciphers, functions, public-key operations, secret sharing, and more across many platforms such as Linux, Windows, macOS, iOS, and Android.

Suitable for big projects

In the previous section, we discussed the design and a lot of features of C++ that make it a great fit for low latency applications. Another aspect of C++ is that because of the flexibility it provides to the developer and all the high-level abstractions it allows you to build, it is actually very well suited to very large real-world projects. Huge projects such as compilers, cloud processing and storage systems, and OSes are built in C++ for these reasons. We will dive into these and many other applications that try to strike a balance between low latency performance, feature richness, and different business cases, and quite often, C++ is the perfect fit for developing such systems.

Mature and large community support

The C programming language was originally created in 1972, and then C++ (originally referred to as C with Classes) was created in 1983. C++ is a very mature language and is embedded extensively in many applications across many different business areas. Some examples are the Unix operating system, Oracle MySQL, the Linux kernel, Microsoft Office, and Microsoft Visual Studio – these were all written in C and C++. The fact that C++ has been around for 40 years means that most software problems have been encountered and solutions have been designed and implemented. C++ is also very popular and taught as part of most computer science degrees and, additionally, has a huge library of developer tools, third-party components, open source projects, libraries, manuals, tutorials, books, and so on dedicated to it. The bottom line is that there is a large amount of documentation, examples, and community support backing up new C++ developers and new C++ projects.

Language under active development

Even though C++ is 40 years old, it is still very much under active development. Ever since the first C++ version was commercially released in 1985, there have been multiple improvements and enhancements to the C++ standard and the language. In chronological order, C++98, C++03, C++11 (initially known as C++0x), C++14, C++17, and C++20 have been released, and C++23 is being developed. Each version comes with improvements and new features. So, C++ is a powerful language that is constantly evolving and adding modern features. Here is a diagram showing the evolution of C++ over the years:

Figure 1.2 – Evolution of C++


Given the already mature state of the C++ programming language, its super-fast speed, its perfect combination of high-level abstractions with low-level hardware access and control, and its huge knowledge base and developer community, along with best practices, libraries, and tools, C++ is a clear pick for low latency application development.

In this section, we looked at the choice of the C++ programming language for low latency application development. We discussed the various characteristics, features, libraries, and community support that make it a great fit for these applications. It is no surprise that C++ is deeply embedded into most applications that have strict performance requirements. In the next section, we will look at a lot of different low latency applications in different business areas with the goal of understanding the similarities that such applications share.

 

Introducing some important low latency applications

In this section, we will explore some common low latency applications in different business areas in order to familiarize ourselves with different kinds of latency-sensitive applications and how latency plays an important part in their performance. Additionally, discussing these applications will reveal some similarities in the nature and design of these applications.

Lower-level low latency applications

First, we will start with applications that would be considered extremely low-level, meaning very close to the hardware. Note that all low latency applications have at least some portion of the application that is low-level since, by definition, that is how low latency performance is achieved. These applications, however, have large portions of the entire application dealing with mostly low-level details; let us discuss those next.

Telecommunications

We already discussed that C++ is one of the fastest programming languages out there. It is used a lot in building telephone switches, routers, internet infrastructure, space probes, and various other parts of telecommunications systems. These applications are required to handle a large number of simultaneous connections and facilitate communication between them. They need to perform these tasks with speed and efficiency, making them a good example of low latency applications.

Embedded systems

Since C++ is closer to the hardware compared to other high-level programming languages, it is used in latency-sensitive embedded systems. Some examples of these would be machines used in the field of medicine, surgical tools, smart watches, and so on. C++ is usually the language of choice for medical applications such as MRI machines, lab testing systems, and systems to manage patient information. Additionally, there are use cases to model medical data, run simulations for research, and so on.

Compilers

Interestingly, the compilers for various programming languages are themselves built using C and C++. The reason for this is, again, that C and C++ are low-level languages close to the hardware and can implement these compilers efficiently. The compilers themselves are able to optimize the code for the programming language to a very large degree and produce low latency machine code.

Operating systems

From Microsoft Windows to macOS to Linux itself, all the major OSes are built largely in C and C++ – yet again, another example of a low latency application where the fact that C++ is a low-level language makes it an ideal fit. OSes are extremely large and extremely complex. In addition to that, they have to have low latency and be highly performant to be competitive as a modern OS.

For instance, Linux is typically the OS of choice for many high-load servers as well as servers designed for low latency applications, so the OS itself needs to have very high performance. In addition to traditional OSes, C and C++ are also heavily used to build mobile OSes such as the iOS, Android, and Windows Phone kernels. In summary, OSes need to be extremely fast and efficient at managing all the system and hardware resources. C++ developers building OSes can leverage the language’s abilities to build super-low-latency OSes.

Cloud/distributed systems

Organizations that develop and use cloud and distributed storage and processing systems have very low latency requirements. For this reason, they rely heavily on a programming language such as C++. Distributed storage systems have to support very fast and very efficient filesystem operations, so they need to be close to the hardware. Additionally, distributed processing generally means high levels of concurrency, reliance on low latency multithreading libraries, as well as high load tolerance and scalability optimization requirements.

Databases

Databases are another good example of applications that need low latencies and high levels of concurrency and parallelism. Databases are also critical components in many different applications in many different business areas. Postgres, MySQL, and MongoDB (which are among the most popular database systems right now) are written in C and C++ – yet another example of why C++ is the preferred language for low latency applications. C++ is also ideal for designing and structuring databases to optimize storage efficiency.

Flight software and traffic control

Flight software for commercial airplanes and military aircraft is a class of latency-critical applications. Here, not only is it important that the code follow very strict guidelines, be extremely robust, and be very well tested, but the applications also need to respond and react to events predictably and within strict latency thresholds.

Traffic control software depends on many sensors, which need to monitor the speed, location, and volume of vehicles and transmit that information to the central software. The software then uses the information to control traffic signs, maps, and traffic lights. Obviously, for such real-time applications, there is a requirement to be low latency and able to handle the large volume of data quickly and efficiently.

Higher-level low latency applications

In this subsection, we will discuss what many might consider slightly higher-level low latency applications. These are the applications people typically think of when trying to solve business problems; however, one thing to keep in mind is that these applications still have to implement and use lower-level optimization techniques to provide the performance that is required of them.

Graphics and video game applications

Graphics applications require super-fast rendering performance and serve as another example of a low latency application. Graphics software employs techniques from computer vision, image processing, and so on, which typically involve a lot of very fast and very efficient matrix operations on numerous large matrices. When it comes to graphics rendering in video games, there are even more stringent requirements for low latency performance since these are interactive applications, and speed and responsiveness are critical to the user experience. Nowadays, video games are typically made available on multiple platforms to reach a larger target audience. What this means is that these applications, or slightly stripped-down versions of these applications, need to run on low-end devices, which might not have a lot of computation and memory resources available. Video games overall have a lot of resource-intensive operations – rendering graphics, handling multiple players simultaneously, fast responsiveness to user inputs, and so on. C++ is a very good fit for all these applications and has been used to create a lot of well-known games such as Counter-Strike, StarCraft, and Warcraft, and game engines such as Unreal Engine. C++ is also a good fit for different gaming platforms – Windows PCs, Nintendo Switch, Xbox, and PlayStation.

Augmented reality and virtual reality applications

Augmented reality (AR) and virtual reality (VR) are both technologies that augment and enhance a real-life environment or create a whole new virtual environment. While AR just augments the environment by adding digital elements to our live view, VR creates a completely new simulated environment. So, these applications take graphics rendering and video game applications to a whole new level.

AR and VR technology has found a lot of different business use cases, such as design and construction, maintenance and repairs, training and education, healthcare, retail and marketing, and even the field of technology itself. AR and VR applications have similar requirements to video game applications and need to handle large amounts of data from various sources in real time, as well as handle user interactions seamlessly and smoothly. The technical challenges for these applications are handling limited processing capability and available memory, possibly limited mobile bandwidth, and maintaining low latency and real-time performance so as not to hurt the user experience.

Browsers

Web browsers are often more complicated than they might appear. There are rendering engines in a web browser that require low latencies and efficient processing. Additionally, there are often interactions with databases and interactive rendering code so that users do not have to wait a long time for the content to update or for interactive content to respond. Due to the low latency requirements of web browsers, it is no surprise that C++ is often the preferred language for this application as well. In fact, some of the most popular web browsers (Google Chrome, Mozilla Firefox, Safari, Opera, etc.) heavily employ C++.

Search engines

Search engines are another use case that requires low latency and highly efficient data structures, algorithms, and code bases. Modern search engines such as Google use techniques such as internet crawling technology, indexing infrastructures, page rank algorithms, and other complex algorithms including machine learning. Google’s search engine relies on C++ to implement all these requirements in a highly low latency and efficient fashion.

Libraries

Many high-level libraries often have stringent performance requirements and can be regarded as low latency applications themselves, but usually, they are key components in larger low latency applications and businesses. These libraries cover different areas – network programming, data structures, faster algorithms, databases, multithreading, mathematical libraries (for example, machine learning), and many more. Such libraries require very low latency and high-performance processing, such as computations involving many operations on a large number of matrices, many of which can also be very large.

It should be clear here that performance is critical in such applications – another area where C++ is often used quite heavily. Even though a lot of these libraries, such as TensorFlow, are available in Python, under the hood the core machine learning mathematical operations are actually implemented in C++ to power these machine learning methods on huge datasets.

Banking and financial applications

Banking applications are another class of low latency applications that need to process millions of transactions every day and require low latency, high concurrency, and robustness. Large banks have millions of clients and hundreds of millions of transactions that all need to be executed correctly and quickly, and their systems need to scale up to handle the client load and, in turn, the database and server loads. C++ is automatically the choice here for a lot of these banking applications for the reasons we have discussed before.

When it comes to applications such as financial modeling, electronic trading systems, and trading strategies, low latency is more critical than in any other field. The speed and deterministic performance of C++ make it ideal for processing billions of market updates, sending millions of orders, and transacting at the exchange, especially when it comes to HFT. Since markets update very quickly, trading applications need the right data very quickly to execute trades extremely quickly. Large latencies in this system can cause losses that destroy a significant amount of trading profits, or worse. On the research and development side of things, simulations over many trading instruments across multiple exchanges also need large-scale low latency distributed processing to be done quickly and efficiently. The quantitative development and research and risk analysis libraries are also written in C++ because they need to process massive amounts of data as quickly as possible. One of the best examples of this would be the pricing and risk libraries that calculate fair trading prices for options products and run many simulations to assess options risk, as the search space is enormous.

Mobile phone applications

Modern mobile phone applications are quite feature-rich. Additionally, they have to run on platforms with very limited hardware resources. This makes it even more important that the implementation of these applications be very low latency and highly efficient in how they use the limited resources they have. However, these applications still need to be extremely quick to respond to user interactions, possibly handle backend connectivity, and render high-quality graphics on mobile devices. Mobile platforms such as Android and the Windows OS, browsers such as Google Chrome and Firefox, and apps such as YouTube have a lot of C++ involvement.

Internet of Things and machine-to-machine applications

Internet of Things (IoT) and machine-to-machine (M2M) applications are based on connecting devices to collect, store, and exchange data with each other automatically. Overall, while IoT and M2M are similar in nature, there are some differences around aspects such as networks, scalability, interoperability, and human interactions.

IoT is a broad term that refers to connecting different physical devices together. IoT devices are generally actuators and sensors that are embedded inside other larger devices such as smart thermostats, refrigerators, doorbells, cars, smart watches, TVs, and medical devices. These devices operate on platforms with limited computing resources, limited power, and minimal available memory.

M2M is a communication method where multiple machines interact with each other using wired or wireless connections without any human oversight or interaction. The point here is that internet connectivity is not necessary for M2M; IoT can be thought of as a subset of the broader universe of M2M communication-based systems. M2M technology is used in different applications such as security, tracking and tracing, automation, manufacturing, and facility management.

We have already discussed these applications before, but to summarize again here, IoT and M2M technology are used in applications such as telecommunications, medical and healthcare, pharmaceuticals, automotive and aerospace industries, retail and logistics and supply chain management, manufacturing, and military satellite data analysis systems.

This section was all about different business areas and use cases where low latency applications thrive and, in some cases, are a necessity for the business. Our hope is that you understand that low latency applications are used in many different areas, even though it might not be immediately obvious. The other objective here was to establish similarities that these applications share, even though they are designed to solve different business problems.

 

Summary

In this chapter, we provided an introduction to low latency applications. First, we defined latency-sensitive and latency-critical applications and different measures of latency. We then discussed different metrics that are important in low latency applications and other considerations that define the requirements of low latency applications.

We spent a section of this chapter understanding why C++ is most frequently chosen for low latency applications across different businesses. Specifically, we discussed the features of the language itself and also the flexibility and low-level nature of the language, which makes C++ a perfect fit when it comes to low latency applications.

Finally, we looked at many different examples of low latency applications across different businesses and the similarities they share. The point of that discussion is that even though the business cases are different, these applications share a lot of common requirements and features. Again, here, C++ is a good fit for most (if not all) of these low latency applications in different business areas.

In the next chapter, we will discuss some of the most popular low latency applications in much greater detail. In this book, we will be using low latency electronic trading as a case study to understand and apply C++ low latency techniques. However, before we do that, we will explore other low latency applications such as real-time video streaming, real-time offline and online video gaming applications, and IoT applications as well.

About the Author
  • Sourav Ghosh

    Sourav Ghosh has worked in several proprietary, high-frequency algorithmic trading firms over the last decade. He has built and deployed extremely low latency, high-throughput automated trading systems for trading exchanges around the world, across multiple asset classes. He specializes in statistical arbitrage market-making and pairs trading strategies with the most liquid global futures contracts. He is currently the vice president at an investment bank based in São Paulo, Brazil. He holds a master's in computer science from the University of Southern California. His areas of interest include computer architecture, FinTech, probability theory and stochastic processes, statistical learning and inference methods, and natural language processing.
