Understanding requirements for latency-sensitive applications

In this section, we will discuss some concepts that are required to build an understanding of what metrics matter for latency-sensitive applications. First, let’s define clearly what latency means and what latency-sensitive applications are.

Latency is defined as the time delay from when a task is started to when the task is finished. By definition, any processing or work will incur some overhead or latency – that is, no system has zero latency unless the system does absolutely no work. The important detail here is that some systems already operate at latencies that are a tiny fraction of a millisecond, and in those systems, the tolerance for even an additional microsecond can be very low.

Low latency applications are applications that execute tasks and respond or return results as quickly as possible. The point here is that reaction latency is an important criterion for such applications where higher latencies can degrade performance or even render an application completely useless. On the other hand, when such applications perform with the low latencies that are expected of them, they can beat the competition, run at maximum speed, achieve maximum throughput, or increase productivity and improve the user experience – depending on the application and business.

Low latency can be thought of as both a quantitative and a qualitative term. The quantitative aspect is pretty obvious, but the qualitative aspect might not necessarily be. Depending on the context, architects and developers might be willing to accept higher latencies in some cases but be unwilling to accept an extra microsecond in others. For instance, if a user refreshes a web page or waits for a video to load, a few seconds of latency is quite acceptable. However, once the video loads and starts playing, it can no longer incur a few seconds of latency to render or display without negatively impacting the user experience. An extreme example is high-speed financial trading systems, where a few extra microseconds can make the difference between a profitable firm and a firm that cannot compete at all.

In the following subsections, we will present some nomenclature that applies to low latency applications. It is important to understand these terms well so that we can continue our discussion on low latency applications, as we will refer to these concepts frequently. The concepts and terms we will discuss next are used to differentiate between different latency-sensitive applications, the measurement of latencies, and the requirements of these applications.

Understanding latency-sensitive versus latency-critical applications

There is a subtle but important difference between the terms latency-sensitive applications and latency-critical applications. A latency-sensitive application is one whose business impact or profitability improves as its performance latencies are reduced. So, the system might still be functional and possibly profitable at higher performance latencies but can be significantly more profitable if latencies are reduced. Examples of such applications would be operating systems (OSes), web browsers, databases, and so on.

A latency-critical application, on the other hand, is one that fails completely if performance latency is higher than a certain threshold. The point here is that while latency-sensitive applications might only lose part of their profitability at higher latencies, latency-critical applications fail entirely at high enough latencies. Examples of such applications are traffic control systems, financial trading systems, autonomous vehicles, and some medical appliances.

Measuring latency

In this section, we will discuss different methods of measuring latency. The real difference between these methods comes down to what is considered the beginning of the processing task and what is considered its end. Another difference lies in the units of measurement – time is the most common, but in some cases, CPU clock cycles can also be used when it comes down to instruction-level measurements. Let’s look at the different measurements next, but first, we present a diagram of a generic server-client system without diving into the specifics of the use case or transport protocols. This is because measuring latency is generic and applies to many different applications with this kind of server-client setup.

Figure 1.1 – A general server-client system with timestamps between different hops

We present this diagram here because, in the next few subsections, we will define and understand the latencies between the different hops on the round-trip path from the server to the client and back to the server.

Time to first byte

Time to first byte is measured as the time elapsed from when the sender sends the first byte of a request (or response) to the moment when the receiver receives the first byte. This typically (but not necessarily) applies to network links or systems where there are data transfer operations that are latency-sensitive. In Figure 1.1, time to first byte would be the difference between the timestamp at which the first byte leaves the sender and the timestamp at which it arrives at the receiver.

Round-trip time

Round-trip time (RTT) is the sum of the time it takes for a packet to travel from one process to another and the time it takes for the response packet to reach the original process. Again, this typically (but not necessarily) refers to network traffic going back and forth between server and client processes, but it applies to any two communicating processes in general.

RTT, by default, includes the time taken by the server process to read, process, and respond to the request sent by the sender – that is, RTT generally includes server processing times. In the context of electronic trading, the true RTT latency is based on three components:

  • First, the time it takes for information from the exchange to reach the participant
  • Second, the time it takes for the execution of the algorithms to analyze the information and make a decision
  • Finally, the time it takes for the decision to reach the exchange and get processed by the matching engine

We will discuss this more in the last section of this book, Analyzing and improving performance.
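As a minimal sketch of how such a round-trip measurement might be taken in C++, the snippet below timestamps a request/response exchange with std::chrono::steady_clock. The send_request() and wait_for_response() functions here are hypothetical stand-ins for whatever transport the application actually uses.

#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-ins for the application's real transport calls.
void send_request() { /* write the request to a socket in a real system */ }
void wait_for_response() {
  // Simulate network transit plus server processing time.
  std::this_thread::sleep_for(std::chrono::microseconds(250));
}

int main() {
  const auto start = std::chrono::steady_clock::now();  // request about to leave
  send_request();
  wait_for_response();
  const auto end = std::chrono::steady_clock::now();    // response fully received
  // RTT = outbound transit + server processing + return transit.
  const auto rtt_us =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::printf("RTT: %lld us\n", static_cast<long long>(rtt_us));
}

Because both timestamps are taken on the requesting side, the measured interval automatically includes network transit in both directions as well as the server’s processing time.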

Tick-to-trade

Tick-to-trade (TTT) is similar to RTT and is a term most commonly used in electronic trading systems. TTT is defined as the time from when a packet (usually a market data packet) first hits a participant’s infrastructure (trading server) to the time when the participant is done processing the packet and sends a packet out (order request) to the trading exchange. So, TTT includes the time spent by the trading infrastructure to read the packet, process it, calculate trading signals, generate an order request in reaction, and put it on the wire. Putting it on the wire typically means writing something to a network socket. We will revisit this topic and explore it in greater detail in the last section of this book, Analyzing and improving performance. In Figure 1.1, TTT would be the difference between the timestamp at which the market data packet arrives at the trading server and the timestamp at which the order request leaves it.

CPU clock cycles

A CPU clock cycle is basically the smallest increment of time at the processor level – in reality, it is the amount of time between two pulses of the oscillator that drives the CPU. Measuring CPU clock cycles is typically used to measure latency at the instruction level – that is, extremely close to the processor. C++ is both a low-level and a high-level language; it lets you get as close to the hardware as needed and also provides higher-level abstractions such as classes, templates, and so on. But generally, C++ developers do not spend a lot of time dealing with extremely low-level or assembly code. This means that the compiled machine code might not be exactly what a C++ developer expects. Additionally, depending on the compiler version, the processor architecture, and so on, there may be even more sources of differences. So, for extremely performance-sensitive low latency code, it is not uncommon for engineers to measure how many instructions are executed and how many CPU clock cycles are required to do so. This is typically the deepest level of optimization possible, alongside kernel-level optimizations.
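As a minimal sketch of instruction-level measurement, the snippet below reads the x86 time-stamp counter via the __rdtsc() intrinsic before and after a small piece of work. It assumes an x86-64 target and a compiler that provides <x86intrin.h>, and it deliberately ignores details such as serializing fences, frequency scaling, and core migration that a production-grade measurement would need to account for.

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>  // __rdtsc(); assumes an x86-64 target

// The piece of work whose cost we want to measure in clock cycles.
uint64_t work() {
  uint64_t sum = 0;
  for (int i = 0; i < 1000; ++i) sum += static_cast<uint64_t>(i) * i;
  return sum;
}

int main() {
  const uint64_t start = __rdtsc();   // read the time-stamp counter before the work
  volatile uint64_t result = work();  // volatile keeps the call from being optimized away
  const uint64_t end = __rdtsc();     // read it again after the work
  std::printf("result=%llu, elapsed cycles=%llu\n",
              static_cast<unsigned long long>(result),
              static_cast<unsigned long long>(end - start));
}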

Now that we have seen some different methods of measuring latencies in different applications, in the next section, we will look at some latency summary metrics and how each one of them can be important under different scenarios.

Differentiating between latency metrics

The relative importance of a specific latency metric over the other depends on the application and the business itself. As an example, a latency-critical application such as an autonomous vehicle software system cares about peak latency much more than the mean latency. Low latency electronic trading systems typically care more about mean latency and smaller latency variance than they do about peak latency. Video streaming and playback applications might generally prioritize high throughput over lower latency variance due to the nature of the application and the consumers.

Throughput versus latency

Before we look at the metrics themselves, we first need to clearly understand the difference between two terms – throughput and latency – which are closely related, often used interchangeably, but should not be. Throughput is defined as how much work gets done in a certain period of time, while latency is how quickly a single task is completed. To improve throughput, the usual approach is to introduce parallelism and add additional computing, memory, and networking resources. Note that with this approach, each individual task might take longer than it would in a latency-optimized setup, but the parallelism means more tasks are completed overall in the same amount of time. Latency, on the other hand, is measured for each individual task from beginning to finish, even if fewer tasks are executed overall.
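To make the distinction concrete, here is a small back-of-the-envelope sketch; the task duration and worker count are made-up numbers purely for illustration.

#include <cstdio>

int main() {
  // Illustrative numbers only: each task takes 10 ms of processing.
  const double task_latency_ms = 10.0;
  const int parallel_workers = 4;
  // Per-task latency is unchanged by parallelism, but throughput scales with the workers.
  const double throughput_per_sec = parallel_workers * (1000.0 / task_latency_ms);
  std::printf("latency per task: %.1f ms, throughput: %.0f tasks/s\n",
              task_latency_ms, throughput_per_sec);
}

Here, adding workers raises the throughput to 400 tasks per second, but every individual task still takes 10 milliseconds from beginning to finish.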

Mean latency

Mean latency is basically the expected average response time of a system. It is simply the average of all the latency measurement observations. This metric includes large outliers, so it can be a noisy metric for systems that experience a large range of performance latencies.

Median latency

Median latency is typically a better metric for the expected response time of a system. Since it is the median of the latency measurement observations, it excludes the impact of large outliers. Due to this, it is sometimes preferred over the mean latency metric.

Peak latency

Peak latency is an important metric for systems where a single large outlier in performance can have a devastating impact on the system. Large values of peak latency can also significantly influence the mean latency metric of the system.

Latency variance

For systems that require a latency profile that is as deterministic as possible, the actual variance of the performance latency is an important metric. This is typically important for systems whose expected latencies need to be highly predictable. For systems with low latency variance, the mean, median, and peak latencies are all expected to be quite close to each other.
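To make these summary metrics concrete, the following sketch computes the mean, median, peak, and variance of a set of latency observations; the sample values are arbitrary placeholders, with one deliberate outlier.

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  // Arbitrary example latency observations, in microseconds.
  std::vector<double> samples = {12.0, 11.5, 13.2, 11.8, 95.0, 12.1, 11.9};

  const double mean =
      std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();

  std::sort(samples.begin(), samples.end());
  const double median = samples[samples.size() / 2];  // middle observation (odd count)
  const double peak = samples.back();                 // worst observed latency

  double variance = 0.0;
  for (const double s : samples) variance += (s - mean) * (s - mean);
  variance /= samples.size();

  std::printf("mean=%.2f median=%.2f peak=%.2f variance=%.2f (us)\n",
              mean, median, peak, variance);
}

Running this, the single 95-microsecond outlier pulls the mean well above the median and dominates both the peak and the variance, which illustrates why the median is often preferred as an estimate of typical response time.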

Requirements of latency-sensitive applications

In this section, we will formally describe the behavior of latency-sensitive applications and the performance profile that these applications are expected to adhere to. Obviously, latency-sensitive applications need low latency performance, but here we will try to explore minor subtleties in the term low latency and discuss some different ways of looking at it.

Correctness and robustness

When we think of latency-sensitive applications, it is often the case that we think low latency is the single most important aspect of such applications. But in reality, a huge requirement of such applications is correctness, along with very high levels of robustness and fault tolerance. Intuitively, this should make complete sense; these applications require very low latency to be successful, which in turn means they also have very high throughput and need to process huge amounts of input and produce a large number of outputs. Hence, the system needs to achieve very close to 100% correctness and be very robust for the application to be successful in its business area. Additionally, the correctness and robustness requirements need to be maintained as the application grows and changes during its lifetime.

Low latencies on average

This is the most obvious requirement when we think about latency-sensitive applications. The expected reaction or processing latency needs to be as low as possible for the application or business overall to succeed. Here, we care about the mean and median performance latency and need them to be as low as possible. By design, this means the system cannot have too many outliers or very high peaks in performance latency.

Capped peak latency

We use the term capped peak latency to refer to the requirement that there needs to be a well-defined upper threshold for the maximum possible latency the application can ever encounter. This behavior is important for all latency-sensitive applications, but most important for latency-critical applications. Even in the general case, an application that exhibits extremely high latency in a handful of cases will typically see its overall performance destroyed. What this really means is that the application needs to handle any input, scenario, or sequence of events and do so with acceptably low latency. Of course, the latency to handle a very rare and specific scenario can be much higher than that of the most likely case, but the point here is that it cannot be unbounded or unacceptably high.

Predictable latency – low latency variance

Some applications prefer the expected performance latency to be predictable, even if that means sacrificing a little latency – that is, accepting an average latency slightly higher than it could otherwise be. What this really means is that such applications will make sure that the expected performance latency for all kinds of different inputs or events has as little variance as possible. It is impossible to achieve zero latency variance, but choices can be made in data structures, algorithms, code implementation, and setup to minimize it as much as possible.
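As one concrete (and deliberately simple) example of such a choice, pre-allocating memory up front removes unpredictable allocation pauses from the hot path. The sketch below just reserves a vector’s capacity at startup; the Order type and the capacity figure are illustrative assumptions, not details of any particular trading system.

#include <vector>

struct Order { long id; double price; int quantity; };

int main() {
  std::vector<Order> pending_orders;
  // Reserve worst-case capacity once, at startup, so that appending orders on the
  // hot path never triggers a reallocation – a common source of latency spikes.
  pending_orders.reserve(100000);

  // Hot path: push_back now has a predictable, allocation-free cost.
  pending_orders.push_back({1, 101.25, 50});
  return 0;
}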

High throughput

As mentioned before, low latency and high throughput are related but not identical goals. For that reason, applications that need the highest possible throughput might make different design and implementation choices to maximize it. The point is that maximizing throughput might come at the cost of sacrificing average performance latencies or increasing peak latencies.

In this section, we introduced the concepts that apply to low latency application performance and the business impact of those metrics. We will need these concepts in the rest of the book when we refer to the performance of the applications we build. Next, we will move the conversation forward and explore the programming languages available for low latency application development. We will discuss the characteristics of the languages that support low latency applications and understand why C++ has risen to the top of the list when it comes to developing and improving latency-sensitive applications.
