Working with Numbers and Strings
Numbers and strings are the fundamental types of any programming language; all other types are based on or composed of these. Developers are constantly confronted with tasks such as converting between numbers and strings, parsing and formatting strings, and generating random numbers. This chapter focuses on providing useful recipes for these common tasks using modern C++ language and library features.
The recipes included in this chapter are as follows:
- Converting between numeric and string types
- Limits and other properties of numeric types
- Generating pseudo-random numbers
- Initializing all the bits of the internal state of a pseudo-random number generator
- Creating cooked user-defined literals
- Creating raw user-defined literals
- Using raw string literals to avoid escaping characters
- Creating a library of string helpers
- Verifying the format of a string using regular...
Converting between numeric and string types
Converting between numbers and strings is a ubiquitous operation. Prior to C++11, there was little support for converting numbers to strings and back, so developers had to resort mostly to type-unsafe functions and usually wrote their own utility functions to avoid writing the same code over and over again. With C++11, the standard library provides utility functions for converting between numbers and strings. In this recipe, you will learn how to convert numbers to strings and strings to numbers using modern C++ standard functions.
All the utility functions mentioned in this recipe are available in the <string> header.
How to do it...
Use the following standard conversion functions when you need to convert between numbers and strings:
- To convert from an integer or floating-point type to a string type, use std::to_string() or std::to_wstring(), as shown in the...
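The following sketch shows both directions of conversion with the standard functions. The helper names (number_to_text, text_to_int, and so on) are hypothetical wrappers added for illustration; only the std:: calls are the standard API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Number to string: std::to_string returns std::string,
// std::to_wstring returns std::wstring.
inline std::string number_to_text(long long value) { return std::to_string(value); }
inline std::wstring number_to_wtext(long long value) { return std::to_wstring(value); }

// String to number: std::stoi, std::stol, std::stoll, std::stoul, std::stoull,
// std::stof, std::stod, std::stold. Leading whitespace is skipped, parsing
// stops at the first character that does not fit, and the functions throw
// std::invalid_argument (nothing could be parsed) or std::out_of_range
// (the value does not fit the target type).
inline int text_to_int(const std::string& text, int base = 10) {
    return std::stoi(text, nullptr, base);
}

inline double text_to_double(const std::string& text) { return std::stod(text); }

// The optional pos argument reports how many characters were consumed.
inline std::size_t consumed_chars(const std::string& text) {
    std::size_t pos = 0;
    (void)std::stoi(text, &pos);
    return pos;
}
```

For example, text_to_int("ff", 16) yields 255, and consumed_chars("42abc") yields 2 because parsing stops at the first non-digit.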
Limits and other properties of numeric types
Sometimes, it is necessary to know and use the minimum and maximum values that can be represented with a numeric type, such as double. Many developers use standard C macros for this, such as DBL_MAX. C++ provides a class template called numeric_limits with specializations for every numeric type that enables you to query the minimum and maximum values of a type. However, numeric_limits is not limited to that functionality; it offers additional constants for querying type properties, such as whether a type is signed, how many bits it needs to represent its values, whether it can represent infinity (for floating-point types), and many others. Prior to C++11, the use of numeric_limits<T> was limited because it could not be used in places where constants were needed (examples include the size of arrays and switch cases). Due to that, developers preferred to...
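Because the members of std::numeric_limits are constexpr since C++11, they can now appear exactly in the places mentioned above. The sketch below uses them in static_assert, as an array size, and as switch case labels; digit_buffer and describe are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>

// numeric_limits members are constant expressions, so they work in static_assert.
static_assert(std::numeric_limits<std::int16_t>::max() == 32767);
static_assert(std::numeric_limits<std::uint8_t>::min() == 0);
static_assert(std::numeric_limits<double>::has_infinity);
static_assert(!std::numeric_limits<unsigned int>::is_signed);

// An array whose size comes from a type property (digits10: the number of
// decimal digits an int can represent without change), plus one slot.
int digit_buffer[std::numeric_limits<int>::digits10 + 1]{};

// switch case labels built from numeric_limits constants.
constexpr std::string_view describe(std::uint8_t value) {
    switch (value) {
    case std::numeric_limits<std::uint8_t>::min(): return "min";
    case std::numeric_limits<std::uint8_t>::max(): return "max";
    default: return "other";
    }
}
```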
Generating pseudo-random numbers
Generating random numbers is necessary for a large variety of applications, from games to cryptography, from sampling to forecasting. However, the term random numbers is not actually correct: generating numbers through mathematical formulas is deterministic and does not produce truly random numbers, but numbers that look random and are called pseudo-random. True randomness can only be achieved through hardware devices based on physical processes, and even that can be challenged if we consider the universe itself to be deterministic. Modern C++ provides support for generating pseudo-random numbers through a library containing number generators and distributions. In theory, the library can also produce truly random numbers (through std::random_device), but in practice, those may actually be only pseudo-random.
In this recipe, we'll discuss the standard support for generating pseudo-random numbers. Understanding...
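The three building blocks of the library (a seed source, an engine, and a distribution) can be combined as in the following sketch, which simulates rolls of a six-sided die. The helper names roll_dice_with, roll_dice, and roll_dice_seeded are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws n values in the closed range [1, 6] from any engine.
template <typename Engine>
std::vector<int> roll_dice_with(Engine& engine, std::size_t n) {
    std::uniform_int_distribution<int> die{1, 6};
    std::vector<int> rolls;
    rolls.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        rolls.push_back(die(engine));
    return rolls;
}

// std::random_device supplies a (possibly non-deterministic) seed and
// std::mt19937 is the Mersenne twister engine producing pseudo-random bits.
inline std::vector<int> roll_dice(std::size_t n) {
    std::random_device rd;
    std::mt19937 engine{rd()};
    return roll_dice_with(engine, n);
}

// A fixed seed makes the sequence reproducible within one standard library
// implementation (distribution algorithms may differ between implementations).
inline std::vector<int> roll_dice_seeded(std::size_t n, std::mt19937::result_type seed) {
    std::mt19937 engine{seed};
    return roll_dice_with(engine, n);
}
```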
Initializing all the bits of the internal state of a pseudo-random number generator
In the previous recipe, we looked at the pseudo-random number library, along with its components, and how it can be used to produce numbers in different statistical distributions. One important factor that was overlooked in that recipe is the proper initialization of the pseudo-random number generators.
With careful analysis (which is beyond the scope of this recipe and this book), it can be shown that the Mersenne twister engine, when seeded with only a single 32-bit value, has a bias toward producing some values repeatedly and omitting others, thus generating numbers that are not uniformly distributed but rather follow a binomial or Poisson distribution. In this recipe, you will learn how to initialize a generator in order to produce pseudo-random numbers with a true uniform distribution.
You should read the previous recipe, Generating pseudo-random numbers, to get an overview of what the pseudo-random number library offers...
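A common way to initialize all the bits is to generate as many random words as the engine has in its state (std::mt19937::state_size, which is 624) and pass them through std::seed_seq. The sketch below assumes std::random_device is an acceptable source for those words; make_seeded_mt19937 is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <random>

// Seeds the whole internal state of std::mt19937 instead of a single 32-bit value.
inline std::mt19937 make_seeded_mt19937() {
    std::random_device rd;
    std::array<std::uint32_t, std::mt19937::state_size> seed_data{};
    // std::ref is needed because std::random_device is not copyable.
    std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937{seq};
}
```

The resulting engine is used exactly like one seeded with rd(), for example with std::uniform_int_distribution.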
Creating cooked user-defined literals
Literals are constants of built-in types (numerical, Boolean, character, character string, and pointer) that cannot be altered in a program. The language defines a series of prefixes and suffixes to specify literals (and the prefix/suffix is actually part of the literal). C++11 allows us to create user-defined literals by defining functions called literal operators, which introduce suffixes for specifying literals. These work only with numerical, character, and character string types.
This opens the possibility of defining standard literals in future versions of the standard and allows developers to create their own literals. In this recipe, we will learn how to create our own cooked literals.
User-defined literals can have two forms: raw and cooked. Raw literals are not processed by the compiler, whereas cooked literals are values processed by the compiler (examples can include handling escape sequences in a character string or...
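As a sketch of the cooked form, the hypothetical suffixes below receive values the compiler has already evaluated: an unsigned long long for integer literals and a long double for floating-point literals. User-defined suffixes should start with an underscore, because suffixes without one are reserved for the standard.

```cpp
#include <cstddef>

// 4_KiB: the compiler evaluates "4" to 4ULL and passes that value in.
constexpr std::size_t operator""_KiB(unsigned long long value) {
    return static_cast<std::size_t>(value * 1024);
}

constexpr std::size_t operator""_MiB(unsigned long long value) {
    return static_cast<std::size_t>(value * 1024 * 1024);
}

// 50.0_percent: floating-point literals are cooked into a long double.
constexpr double operator""_percent(long double value) {
    return static_cast<double>(value / 100.0L);
}
```

With these operators, a buffer size can be written as 4_KiB instead of 4096, and the value is still a compile-time constant.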
Creating raw user-defined literals
In the previous recipe, we looked at the way C++11 allows library implementers and developers to create user-defined literals and the user-defined literals available in the C++14 standard. However, user-defined literals have two forms: a cooked form, where the literal value is processed by the compiler before being supplied to the literal operator, and a raw form, in which the literal is not processed by the compiler before being supplied to the literal operator. The latter is only available for integral and floating-point types. Raw literals are useful for altering the compiler's normal behavior. For instance, a sequence such as 3.1415926 is interpreted by the compiler as a floating-point value, but with the use of a raw user-defined literal, it could be interpreted as a user-defined decimal value. In this recipe, we will look at creating raw user-defined literals.
Before continuing with this recipe, it is strongly recommended...
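In the raw form, the literal operator takes a const char* and receives the characters of the literal exactly as written. The hypothetical _bin suffix below uses that to read the digits as binary rather than decimal; note that if a cooked operator taking unsigned long long with the same suffix were also declared, the compiler would prefer it for integer literals.

```cpp
#include <stdexcept>

// For 1011_bin the operator receives the null-terminated string "1011".
constexpr unsigned long long operator""_bin(const char* digits) {
    unsigned long long value = 0;
    for (const char* p = digits; *p != '\0'; ++p) {
        if (*p != '0' && *p != '1')
            throw std::invalid_argument("_bin: only 0 and 1 are allowed");
        value = value * 2 + static_cast<unsigned long long>(*p - '0');
    }
    return value;
}
```

Used in a constant expression, an invalid literal such as 102_bin fails to compile because the throw cannot be evaluated at compile time.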
Using raw string literals to avoid escaping characters
Strings may contain special characters, such as non-printable characters (newline, horizontal and vertical tab, and so on), string and character delimiters (double and single quotes), or arbitrary octal, hexadecimal, or Unicode values. These special characters are introduced with an escape sequence that starts with a backslash, followed by either the character (examples include \"), its designated letter (examples include \n for a new line, \t for a horizontal tab), or its value (examples include octal \050, hexadecimal \xF7, or Unicode \u16F0). As a result, the backslash character itself has to be escaped with another backslash character. This leads to more complicated literal strings that can be hard to read.
To avoid escaping characters, C++11 introduced raw string literals that do not process escape sequences. In this recipe, you will learn how to use the various forms of raw string literals.
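The following sketch contrasts an escaped literal with the raw forms: the default R"( ... )" delimiters, a custom delimiter for text that itself contains )", and a multi-line raw literal. The variable names are illustrative only.

```cpp
#include <string>

// Ordinary literal: every backslash must be escaped.
const std::string escaped_path = "C:\\Program Files\\App\\";
// Raw literal: characters between R"( and )" are taken as written.
const std::string raw_path = R"(C:\Program Files\App\)";

// A custom delimiter (here "sep") allows the text to contain )".
const std::string with_quotes = R"sep(He said "(hi)" twice)sep";

// Raw literals can span lines; the line break becomes part of the string.
const std::string multi_line = R"(line1
line2)";
```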
Creating a library of string helpers
The string types from the standard library are a general-purpose implementation that lacks many helpful methods, such as changing the case, trimming, splitting, and others that may address different developer needs. Third-party libraries that provide rich sets of string functionality exist. However, in this recipe, we will look at implementing several simple, yet helpful, methods you may often need in practice. The purpose is both to see how string methods and general-purpose standard algorithms can be used for manipulating strings and to have a reference of reusable code that you can use in your applications.
In this recipe, we will implement a small library of string utilities that will provide functions for the following:
- Changing a string into lowercase or uppercase
- Reversing a string
- Trimming white spaces from the beginning and/or the end of the string
- Trimming a specific set of characters from the beginning...
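A few of these helpers can be sketched with std::transform, std::reverse, and the find_first_not_of/find_last_not_of members of std::string. The function names are hypothetical and may differ from the recipe's library; case conversion relies on <cctype>, so it only handles single-byte characters of the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

inline std::string to_upper(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;
}

inline std::string to_lower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

inline std::string reversed(std::string text) {
    std::reverse(text.begin(), text.end());
    return text;
}

// Each trim function removes any of `chars` (whitespace by default).
inline std::string trim_left(const std::string& text,
                             const std::string& chars = " \t\n\v\f\r") {
    const auto first = text.find_first_not_of(chars);
    return first == std::string::npos ? std::string{} : text.substr(first);
}

inline std::string trim_right(const std::string& text,
                              const std::string& chars = " \t\n\v\f\r") {
    const auto last = text.find_last_not_of(chars);
    return last == std::string::npos ? std::string{} : text.substr(0, last + 1);
}

inline std::string trim(const std::string& text,
                        const std::string& chars = " \t\n\v\f\r") {
    return trim_left(trim_right(text, chars), chars);
}
```

Taking the string by value in to_upper, to_lower, and reversed lets the caller decide whether to copy or move the input.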
Verifying the format of a string using regular expressions
Regular expressions are a language intended for performing pattern matching and replacements in texts. C++11 provides support for regular expressions within the standard library through a set of classes, algorithms, and iterators available in the header <regex>. In this recipe, we will learn how regular expressions can be used to verify that a string matches a pattern (for example, verifying the format of an email or IP address).
Throughout this recipe, we will explain, whenever necessary, the details of the regular expressions that we use. However, you should have at least some basic knowledge of regular expressions in order to use the C++ standard library for regular expressions. A description of regular expression syntax and standards is beyond the scope of this book; if you are not familiar with regular expressions, it is recommended that you read more about them before continuing...
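As a minimal sketch of format verification, std::regex_match succeeds only when the whole string matches the pattern. The pattern below is a deliberately simplified email check (far from a complete RFC 5322 validator), and looks_like_email is a hypothetical name.

```cpp
#include <regex>
#include <string>

// local part, '@', domain, '.', and a top-level domain of 2+ letters
inline bool looks_like_email(const std::string& text) {
    static const std::regex pattern(
        R"(^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$)");
    return std::regex_match(text, pattern);
}
```

The regex object is declared static so that the pattern is compiled only once, since constructing a std::regex is relatively expensive.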
Parsing the content of a string using regular expressions
In the previous recipe, we looked at how to use std::regex_match() to verify that the content of a string matches a particular format. The library provides another algorithm called std::regex_search() that matches a regular expression against any part of a string, and not only the entire string, as regex_match() does. This function, however, does not allow us to search through all the occurrences of a regular expression in an input string. For this purpose, we need to use one of the iterator classes available in the library.
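The difference between the two algorithms can be seen in a small sketch with a digit pattern; the helper names are hypothetical. regex_search reports only the first occurrence, which is why iterating over all matches needs an iterator class.

```cpp
#include <regex>
#include <string>

inline const std::regex& number_pattern() {
    static const std::regex number(R"(\d+)");
    return number;
}

// true only if the whole text is a number
inline bool is_number(const std::string& text) {
    return std::regex_match(text, number_pattern());
}

// true if a number appears anywhere in the text
inline bool contains_number(const std::string& text) {
    return std::regex_search(text, number_pattern());
}

// the first number found, or an empty string
inline std::string first_number(const std::string& text) {
    std::smatch match;
    return std::regex_search(text, match, number_pattern()) ? match.str() : std::string{};
}
```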
In this recipe, you will learn how to parse the content of a string using regular expressions. For this purpose, we will consider the problem of parsing a text file containing name-value pairs. Each such pair is defined on a different line and has the format name = value, but lines starting with a # represent comments and must be ignored. The following is an example:
#remove # to uncomment...
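One possible way to collect all the pairs is to walk the matches with std::sregex_iterator. The pattern below is an assumption made for this sketch, not necessarily the one the recipe develops: each match must begin at the start of the text or right after a newline, so a # in the first column prevents the line from matching. parse_settings is a hypothetical name, and the test input is made up.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns every (name, value) pair; the value runs to the end of the line
// or to the next '#'.
inline std::vector<std::pair<std::string, std::string>>
parse_settings(const std::string& text) {
    static const std::regex pattern(
        R"((?:^|\n)([A-Za-z_]\w*)[ \t]*=[ \t]*([^\n#]+))");
    std::vector<std::pair<std::string, std::string>> settings;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
         it != std::sregex_iterator(); ++it) {
        // Submatch 1 is the name, submatch 2 is the value.
        settings.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return settings;
}
```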
Replacing the content of a string using regular expressions
In the previous two recipes, we looked at how to match a regular expression on a string or a part of a string and iterate through matches and submatches. The regular expression library also supports text replacement based on regular expressions. In this recipe, we will learn how to use std::regex_replace() to perform such text transformations.
For general information about regular expressions support in C++11, refer to the Verifying the format of a string using regular expressions recipe, earlier in this chapter.
How to do it...
In order to perform text transformations using regular expressions, you should perform the following:
- Include the <regex> and <string> headers and use the namespace std::string_literals for C++14 standard user-defined literals for strings:
#include <regex>
#include <string>
using namespace std::string_literals;
- Use the
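A short sketch of std::regex_replace: by default every match is replaced, $1 and $2 in the format string refer to capture groups, and the format_first_only flag limits the replacement to the first match. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// "John Doe" -> "Doe, John": swap two words using group references.
inline std::string last_name_first(const std::string& full_name) {
    static const std::regex pattern(R"((\w+)\s+(\w+))");
    return std::regex_replace(full_name, pattern, "$2, $1");
}

// Replaces every digit with '*'.
inline std::string mask_digits(const std::string& text) {
    static const std::regex digit(R"(\d)");
    return std::regex_replace(text, digit, "*");
}

// Replaces only the first digit.
inline std::string mask_first_digit(const std::string& text) {
    static const std::regex digit(R"(\d)");
    return std::regex_replace(text, digit, "*",
                              std::regex_constants::format_first_only);
}
```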
Using string_view instead of constant string references
When working with strings, temporary objects are created all the time, even if you might not be aware of it. Many times, these temporary objects are irrelevant and only serve the purpose of copying data from one place to another (for example, from a function to its caller). This represents a performance issue because they require memory allocation and data copying, which should be avoided. For this purpose, the C++17 standard provides a new string class template called std::basic_string_view that represents a non-owning constant reference to a string (that is, a sequence of characters). In this recipe, you will learn when and how you should use this class.
The string_view class is available in the std namespace in the <string_view> header.
How to do it...
You should use std::string_view to pass a parameter to a function (or return a value from a function), instead of std::string const &...
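As an illustration, the hypothetical file_name function below accepts a std::string, a string literal, or another view without allocating, and returns a view into the caller's characters. Because the result does not own its data, it must not outlive the string it refers to.

```cpp
#include <string>
#include <string_view>

// Returns the part after the last '/' or '\' without copying any characters.
constexpr std::string_view file_name(std::string_view path) {
    const auto pos = path.find_last_of("/\\");
    return pos == std::string_view::npos ? path : path.substr(pos + 1);
}
```

Where an owning string is needed, the view can be converted explicitly, for example std::string(file_name(path)).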
Formatting text with std::format
The C++ language has two ways of formatting text: the printf family of functions and the I/O streams library. The printf functions are inherited from C and provide a separation of the formatting text and the arguments. The streams library provides safety and extensibility and is usually recommended over printf functions, but is, in general, slower. The C++20 standard introduces a new formatting library as an alternative for output formatting, which is similar in form to printf but safe and extensible, and is intended to complement the existing streams library. In this recipe, we will learn how to use the new functionality instead of the printf functions or the streams library.
The new formatting library is available in the header <format>. You must include this header for the following samples to work.
How to do it...
The std::format() function formats its arguments according to the provided formatting string. You can use...
Using std::format with user-defined types
The C++20 formatting library is a modern alternative to using printf-like functions or the I/O streams library, which it actually complements. Although the standard provides default formatting for basic types, such as integral and floating-point types, bool, character types, strings, and chrono types, users can create custom specializations of std::formatter for user-defined types. In this recipe, we will learn how to do that.
You should read the previous recipe, Formatting text with std::format, to familiarize yourself with the formatting library.
In the examples shown here, we will use the following class:
In the next section, we'll introduce the steps necessary to enable text formatting with std::format() for user-defined types.