Few things are as important for application security as cryptography. Done properly, it can make data unreadable to attackers even if you suffer a breach. But do it wrong, and it can actually amplify the impact of other vulnerabilities.
While cryptography is incredibly accessible to developers nowadays, many still have questions around how to use it, when to use it, and which of the many options or algorithms to pick.
This book will try to answer those questions without going into formal, academic explanations, and without complex mathematical formulas or diagrams, in a way that is friendly to developers. Unlike many other books on the topic, our goal won't be to train you to become a cryptographer. Instead, by the end of this book, you'll hopefully have enough knowledge of the most important algorithms to be able to use them in your code confidently.
In this first chapter, we're going to introduce the book and cover the following topics:
- An overview of what cryptography is and why it matters to developers
- A definition of "safe" in the context of cryptography
- Types and "layers" of encryption
What is cryptography and why should a developer care?
Cryptography is used to protect the information we share on the internet. You'll also find it when you're authenticating yourself to use your phone or laptop, or swiping the badge to get into your office. Cryptography is used to secure every digital payment transaction and to guarantee contracts that are signed digitally. It is also used to protect data we store on our devices, both from unauthorized access and from accidental corruption.
In fact, this use of cryptography, to shield messages from adversaries, is thousands of years old. A common example is the Caesar cipher (a cipher is an algorithm used for encryption or decryption), which is named after the Roman emperor Julius Caesar; this is a classic example of a "substitution cipher," in which letters of the alphabet are substituted with one another. In the case of the Caesar cipher, each letter was shifted by three, so the letter A became D, B became E, Z became C, and so on.
Other famous historical ciphers include the Vigenère cipher, which in the 16th century first introduced the concept of an "encryption key," and then mechanical ones such as the Enigma machine, used by Nazi Germany during World War II and eventually broken by Polish and English researchers (this was also the topic of the 2014 movie The Imitation Game, which depicted the work of Alan Turing and the other researchers at Bletchley Park).
While the history of cryptography can be an interesting topic per se, and it provides invaluable lessons on common types of attacks against cryptography, all those historical ciphers are considered broken and so are not useful to modern application developers. As such, we won't be spending more time talking about them in this book, but nevertheless, I encourage curious readers to learn more about them!
…and the other uses of modern cryptography
In fact, throughout this book, we'll be using cryptographic functions for a variety of purposes, some of which are meant to be shared with the public rather than kept as secrets:
- In Chapter 3, File and Password Hashing with Node.js, we'll learn about hashing, which is a one-way cryptographic operation that allows you to derive a fixed-length digest for any message. This is used for purposes including checking the integrity of files or messages, generating identifiers and encryption keys, and protecting passwords.
- In Chapter 4, Symmetric Encryption in Node.js, and Chapter 5, Using Asymmetric and Hybrid Encryption in Node.js, we'll look at data encryption and decryption using various kinds of ciphers.
- In Chapter 6, Digital Signatures with Node.js and Trust, we'll learn how cryptography can be used to create and validate digital signatures, which can be used (among other things) to authenticate the author of a message, certifying its provenance, or preventing repudiation.
Why this matters to developers
As a software developer, understanding the practices presented above can help you build applications that implement cryptographic operations safely, using the most appropriate solutions. There are many possible uses of cryptography, and the aforementioned list contains only some examples.
Not all the operations we're going to learn about in this book will be useful or even relevant for every application you might be working on. However, the topics presented in the next chapters are possibly the most common ones, which developers can leverage to solve frequent real-world problems.
In fact, given the current state of application development, nowadays you could argue that knowing how to safely implement the solutions described in this book should be a skill most developers need to be at least somehow familiar with, without the need to consult with expert cryptographers.
Globally, cyberattacks have been getting more and more frequent every year and are carried out by increasingly more sophisticated adversaries, sometimes including state-sponsored actors. Just as attacks have increased in both quantity and quality, thanks to the pervasiveness of digital systems in our businesses and lives, the impact of breaches has grown costlier, with damage that can be in the range of millions of dollars and, in certain cases, catastrophic or fatal.
Whereas security used to be an afterthought in the software development life cycle, at times relegated to an issue for operations specialists only, teams are now tasked with considering it during every stage of the development process. Many organizations are also adopting practices such as DevSecOps and "shift left security," in which the security needs of an entire solution are included in the entire software development life cycle, from planning and development to operations.
DevOps and DevSecOps
DevOps is a set of principles and practices, aided by specialized tools, to enable the continuous delivery of value to end users. DevOps is built around practices such as Continuous Integration (CI) and Continuous Delivery (CD), agile software development, and continuous monitoring that are meant to bring the development and operations teams closer together and allow faster application release cycles, and continuous learning and improvement.
To learn more about DevOps, visit https://bit.ly/crypto-devops
Building on DevOps, DevSecOps requires the integration of security practices and thinking into every stage of the DevOps process – planning, building, delivering, and operating. Organizations that adopt DevSecOps believe that security is a shared responsibility among every person and team that has any dealing with an application.
To learn more about DevSecOps, visit https://bit.ly/crypto-devsecops
For example, whereas corporate or Local Networks (LANs) were traditionally considered safe, it's now common practice to assume your systems have been breached and practice defense in depth or add multiple layers of protection. So, an application that encrypts data in transit prevents an eavesdropper (called a Man-in-the-Middle or MitM) from potentially stealing secrets, even inside a LAN.
It should go without saying that while adding cryptography to your applications is often necessary to make them safe, it's almost always not sufficient. For example, protecting your users' data with strong encryption will help limit the impact of breaches but will be useless against account takeovers through social engineering.
Nevertheless, cryptography does play an important role in the process of protecting data; for example, it secures the data your application uses (both at rest and in transit), ensuring the integrity of the information stored or exchanged, authenticating end users or other services, preventing tampering, and so on.
In addition to learning about the different classes of cryptographic operations that you can take advantage of, with this book I also hope you will be equipped to understand when to use each one in the safest way possible.
In fact, just as not using cryptography can lead to serious consequences, implementing cryptography in the wrong way can be equally as bad. This can happen when developers use broken algorithms (for this reason, we'll be sticking to a few tried-and-tested ones in this book) or buggy implementations of good algorithms, such as libraries containing security vulnerabilities (which is why we're going to recommend built-in standard libraries whenever possible). In other cases, issues could be caused by developers picking the wrong cryptographic operations; one too-common example is storing passwords in databases that have been encrypted with a symmetric cipher rather than hashed, as we'll see later in the book.
Lastly, I would like to point out that the five classes of cryptographic operations we're covering in this book do not represent all the different algorithms and protocols that can be used by modern applications, nor all the possible different uses of cryptography.
For example, end-to-end encrypted messaging apps or VPNs often need to implement special key exchange algorithms, or full-disk encryption apps may leverage different algorithms or modes of operation than the ones we'll be looking at. Such problems tend to be more specific to a certain domain or type of application, and if you are building a solution that requires them, you might need to perform some additional research or work with a cryptographer.
What this book is about – and what it's not
This book is about the practical uses of common cryptographic operations that software developers without a background in cryptography can leverage.
In each chapter, we'll present a specific class of cryptographic operations, and for each, we'll then look at the following:
- What it's used for and what it should not be used for.
- What the most common algorithms are that developers should use.
- Considerations on what parameters to set for using the algorithms safely, when appropriate.
Just as we have covered what this book is about, it's important to also mention what this book is not about, listing four things that you will not be learning in the following chapters:
- First, this book is not about how cryptographic algorithms work, or a description of the algorithms themselves.
As we've mentioned many times already, our book's goal is not to train cryptographers but rather to help developers build better applications leveraging common cryptographic algorithms. In this book, you will not encounter formal descriptions of the algorithms and their flows, the mathematical models behind them, and so on.
- Second, following on from the previous point, we will be looking at how to use cryptographic algorithms and not at how to implement them.
The latter not only requires a deep understanding of how algorithms work but is also a dangerous minefield because of the risks associated with poor implementations. At the risk of sounding hyperbolic, an entire class of attacks could be made possible by doing things as common as using
ifstatements in code implementing a cryptographic algorithm.
In short, we'll leave the implementation to the experts, and we'll stick to using pre-built libraries. Whenever possible, that will be the built-in standard library of Node.js.
- Third, as mentioned previously, this book is about strong, modern cryptography only. We'll focus on a limited set of widely adopted algorithms and functions that have been "battle-tested" and are generally deemed safe by cryptographers (more on what "safe" means in the next section). Simple or historical ciphers, codes, and so on might be interesting to those wanting to learn the science of cryptography or looking for brain-teasers, but are of little practical relevance for our goal.
- Fourth, we will not be talking about the other kind of "crypto", that is, cryptocurrencies. While Bitcoin, Ethereum, and the like are indeed based on the blockchain technology, which makes extensive use of cryptography (especially hashing and digital signatures), that's where the commonality ends. Nevertheless, should you be interested in learning about how blockchains work in the future, the concepts you'll learn from this book will likely be of use to you.
Rules of engagement
Lastly, before we begin, it's important to point out that cryptography is hard; making mistakes is surprisingly easy, and finding them can be really challenging. This is true even for expert cryptographers!
You'll often hear two very strong recommendations that are meant to keep you and the applications you're building safe. If you allow me, I propose that we turn them into a kind of "Hippocratic oath," which I invite you to repeat after me:
- I will not invent my own cryptographic algorithms.
- I will not roll my own implementation of cryptographic algorithms.
Thankfully for us, a lot of very brilliant cryptographers have worked for decades to design strong algorithms and implement them correctly and safely. As developers, the best thing we can do is to stick to proven, tested algorithms, and leverage existing, audited libraries to adopt them. Normally, that means using whatever is available on the standard library of the language you're using, the APIs exposed by the operating system or runtime environment, or leveraging well-known, audited libraries.
As we've now introduced the topic and the goals of this book, and hopefully convinced you of the importance of adopting cryptography in your applications, we're now ready to get started with our learning, starting with setting some shared understanding.
As a rule of thumb, possibly every cryptographic algorithm can be broken, given a sufficient amount of computing power and time.
For simplicity, we'll focus the next examples on data encryption, although the same would apply to all other classes of cryptographic operations.
The goal of using cryptography is to make sure that the effort (computing power multiplied by time) is too big for an attacker to attempt to break your encryption. Assuming that the algorithms you use do not have flaws in their design or backdoors that can be exploited, the only way attackers can crack your encryption is by doing a brute-force attack.
A brute-force attack works by trying every single combination possible until you find the right one. For example, if you were trying to break a padlock with 3 digits, each from 0 to 9, you'd have 1,000 different combinations (000, 001, 002… until 999), and the correct one would be one of those. On average, it would take you 500 attempts before you could find the right one (the number of possible permutations divided by 2). If every attempt takes you 3 seconds to try, then you can expect to be done in 1,500 seconds on average, or 25 minutes.
For example, AES-128 (a symmetric cipher) uses a 128-bit key, which means that there are 2128 possible combinations, or 340,282,366,920,938,463,463,374,607,431,768,211,456;that is, approximately 3.4 x 1038. That is a very large number that ought to be put into perspective.
One of the largest computing grids in the world today, if perhaps not the largest, is the network of all the Bitcoin mining grids. In 2021, the Bitcoin network reached an estimated peak hash rate of 180 million terahashes per second, which would translate to being able to compute about 292 hashes per year. This is a metric that is commonly used to indicate how much compute power is being used by the network.
Imagine a hypothetical scenario in which all Bitcoin miners agreed to get together and convert the entire grid to brute-force a 128-bit key. If they managed to attempt 292 keys per year, it would still take them an average of 235 years, or over 1010 years, to complete the attack (2128 possible keys, divided by 292 keys attempted per year, divided by 2 to get the average time). That's roughly the same as the age of the universe, which is estimated to be around 14 billion years old, or about 1010.
Of course, computing power increases constantly, so the time it will take to break cryptography in the future will be less.
Quantum computing will also make it easier to break certain kinds of cryptography, although those systems are still experimental today and not yet powerful enough for most practical applications (nevertheless, cryptographers are already preparing for that future today, by designing stronger "post-quantum" algorithms).
That said, our goal should always be to choose algorithms that are guaranteed to protect our data for at least as long as it's necessary. For example, let's say you encrypt a document containing the password of your online banking account today; if technological innovation allowed an attacker to crack it in "only" 100 years from now, it might not matter to you at that point, as you'll most likely be dead by then.
Given the aforementioned context, an algorithm should be considered "broken" when it's possible to crack it in a way that is significantly faster than using a brute-force attack.
How Cryptography Is Broken – the Example of Frequency Analysis
At the beginning of this chapter, we looked at some historical ciphers and explained that they were broken. Substitution ciphers are among the oldest and simplest ones, and they rely on replacing every letter of the alphabet with another one. For example, in the Caesar cipher, letters are shifted by three, so A is replaced with D, B with E, and so on. While this may have been effective to protect messages from the illiterate enemy soldiers of the era, it looks trivial today.
Substitution ciphers that rely on the shifting of letters can be broken in a fairly simple way with brute-force attacks, even manually (assuming, of course, that the attacker knows that the cipher works by shifting letters). With only 26 letters in the English alphabet, it takes at most 25 attempts to find the number by which characters are shifted.
A more complex substitution cipher could instead create a new alphabet in which letters are randomly shuffled. So, the letter A may be replaced with X, B with C, and D with P. With the English alphabet, there are 26! (26 factorial) possible combinations, or about 1026, offering a decent amount of protection against brute-force attacks, even against an adversary that can use modern computers.
However, through cryptanalysis (the science of studying cryptographic systems looking for weaknesses), even those more "advanced" substitution ciphers have been broken, through a technique called frequency analysis. This works by looking at the frequency of letters (and pairs of letters) in the encrypted text and comparing that with the average frequency of letters in English writing. For example, "E" is the most frequently used letter in English text, so whatever letter is the most used in the encrypted text will likely correspond to "E."
With an understanding of how encryption helps protect data and a contextualization of what it means for data to be "safe," we now need to look at the multiple ways that data can be encrypted.
Types and "layers" of encryption
Before we begin talking about data encryption throughout this book, we should clarify the various types of data encryption and the layers at which data can be encrypted.
- Encryption at rest refers to the practice of encrypting data when it's stored on a persistent medium. Some canonical examples include storing files in an encrypted hard drive or turning on data encryption for your database systems. When data is encrypted at rest, it's protected against certain kinds of attacks, such as physical ones on the computers/servers that store the data (for example, stolen hard drives). However, data is usually decrypted in-memory while it's being processed, so attackers that manage to infiltrate a live system may have the ability to steal your data as plaintext.
For example, it's common nowadays to encrypt the hard drives of computers, which is especially important to prevent people from reading data from the storage of a laptop that is lost or stolen. However, while the laptop is powered on, the encryption keys are present in memory (RAM), so malicious applications running on the system may get full access to everything stored on the hard drive.
- Encryption in transit refers to the practice of encrypting data while it's being transmitted over an untrusted channel. The most common example is Transport Layer Security (TLS), used by the HTTPS protocol for securing access to websites; this protects the information exchanged between a client and web server over the internet (for example, passwords or other sensitive data). In this case, a MitM that managed to tap the wire would not be able to see the actual data being exchanged. However, both the client that sends the data and the server that receives it are able to see the message in plaintext.
- End-to-end encryption (also called E2E Encryption or E2EE) is the practice of encrypting a client's data before sending it to a remote server so that only the client has the keys to decrypt it. This is commonly used with cloud storage; your documents are encrypted on your laptop before being sent to the cloud provider, and the keys never leave your laptop. The cloud provider sees only encrypted blobs of data and cannot read what you're storing on their service or do any processing on that data (although they might still gather insights based on metadata, such as the size of the encrypted blobs; for example, encrypted videos are much larger than encrypted photos!).
The techniques we're going to learn in this book will primarily apply to developers building solutions for encrypting data at rest or that leverage end-to-end encryption. For most applications that require encryption in transit, developers will find it much more efficient (and effective) to leverage algorithms such as TLS (HTTPS) as proven and universal standards.
- Data can be encrypted with multiple layers of the same encryption type. For example, you might encrypt a file you're working on (for example, using PGP (Pretty Good Privacy)/GPG (GNU Privacy Guard), or creating an encrypted ZIP file) and store that on an encrypted hard drive. Both operations provide encryption at rest for the data, yet they serve different purposes; encrypting the file makes it so an attacker with access to the running system could not see its contents, while the full-disk encryption ensures that someone stealing your laptop could not even see that the encrypted file exists in there (not even its filename in many cases).
- Data can also be encrypted with multiple types of encryptions. For example, you could be storing a file protected with end-to-end encryption on a remote server and use TLS (encryption in transit) while transmitting it. While this wouldn't give you more protection against eavesdroppers trying to read what data you're sending, it can offer additional privacy because they would not be able to see that information is being sent to the cloud storage provider in the first place.
Layered encryption is especially common in the context of in-transit data encryption. For example, when you connect to the internet through a Wi-Fi network that is secured with a password (for example, using the WPA2 protocol) and then visit a website over a secure HTTPS connection, your data in transit is encrypted twice: TLS (HTTPS) protects your communication between the laptop and the web server, and WPA2 offers additional protection between your laptop and the Wi-Fi access point.
Understanding the different types and layers of encryption is useful when you're designing your solution, as it allows you to identify where you should leverage cryptography to protect your and your users' data. This decision needs to be influenced by your solution's needs and your threat model (for more information, visit the OWASP page on threat modeling: https://bit.ly/crypto-threat-modeling).
In this first chapter, we have introduced the goal of this book – to provide practical foundations of cryptography for software developers. Then, we listed the concepts we'll be learning in the next chapters. We've also learned why developers should care about cryptography and discussed some best practices that people dealing with cryptography should follow as safeguards.
In the next chapter, we'll begin writing code, first by learning about how to deal with binary data in Node.js and how to encode that as text strings. We'll also look at how to generate random data with Node.js, something that we'll do frequently throughout this book.