Coding with Streams
We briefly touched on streams in Chapter 3, Callbacks and Events and Chapter 5, Asynchronous Control Flow Patterns with Promises and Async/Await, as an option to make some of our code a bit more robust. Now, it’s finally time to dive in! We are here to talk about streams: one of the most important components and patterns of Node.js. There is a motto in the community that goes, “stream all the things!”, and this alone should be enough to describe the role of streams in Node.js. Dominic Tarr, an early contributor to the Node.js community, defined streams as “Node’s best and most misunderstood idea.” What makes Node.js streams so attractive is not just their technical properties, such as performance and efficiency, but also their elegance and the way they fit perfectly into the Node.js philosophy. Yet, despite their potential, streams remain underutilized in the broader developer community. Many find them intimidating and choose to avoid them altogether. This chapter is here to change that. We’ll explore streams in depth, highlight their advantages, and present them in a clear and approachable way, making their power accessible to all developers.
But before we dive in, let’s take a short break for an author’s note (Luciano here). Streams are one of my favourite topics in Node.js, and I can’t help but share a story from my career where streams truly saved the day.
I was working for a network security company on a team developing a cloud application. The application’s purpose was to collect network metadata from physical devices monitoring traffic in corporate environments. Imagine recording all the connections between hosts in the network, which protocols they’re using, and how much data they’re transferring. This data could help spot the movement of an attacker in the network or uncover attempts at data exfiltration. The idea was simple yet powerful: in the event of a security incident, our customers could log into our platform, browse through the recorded metadata, and figure out exactly what happened, enabling them to take action quickly.
As you might imagine, this required continuously streaming a significant amount of data from devices at customer sites to our cloud-based web server. In the spirit of keeping things simple and shipping fast, our initial implementation of the data collector (the HTTP server receiving and storing metadata) used a buffered approach.
Devices would send network metadata in frames every minute, each containing all the observations from the previous 60 seconds.
Here’s how it worked: we’d load the entire frame into memory as it arrived, and only after receiving the complete frame would we write it to persistent storage. This worked well in the beginning because we were only serving small customers who generated relatively modest amounts of metadata, even during peak traffic.
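To make the difference concrete, here is a minimal sketch of what such a buffered collector endpoint could look like. The port, file names, and frame format are invented for illustration; this is not the original application code:

```js
import { createServer } from 'node:http'
import { writeFile } from 'node:fs/promises'

// Hypothetical collector endpoint: every request carries one metadata frame
const server = createServer((req, res) => {
  const chunks = []
  // Accumulate every incoming chunk in memory...
  req.on('data', (chunk) => chunks.push(chunk))
  req.on('end', async () => {
    // ...and only once the whole frame has arrived, write it to disk.
    // Memory usage grows with the size of the frame.
    const frame = Buffer.concat(chunks)
    await writeFile(`frame-${Date.now()}.bin`, frame)
    res.writeHead(201).end()
  })
})

server.listen(8080)
```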
But when we rolled out the solution to a larger customer, things started to break down. We noticed occasional failures in the collector and, worse, gaps in the stored data. After digging into the issue, we discovered that the collector was crashing due to excessive memory usage. If a customer generated a particularly large frame, the system couldn’t handle it, leading to data loss.
This was a serious problem. Our entire value proposition depended on being able to reliably store and retrieve network metadata for forensic analysis. If customers couldn’t trust us to preserve their data, the platform was effectively useless.
We needed a fix, and fast. The root of the problem was clear: buffering entire frames in memory was a rookie mistake. The solution? Keep the memory footprint low by processing data in smaller chunks and writing them to storage incrementally.
Enter Node.js streams. With streams, we could process data piece by piece as it arrived, rather than waiting for the entire frame. After refactoring our code to use streams, we were able to handle terabytes of data daily without breaking a sweat. The system’s latency improved dramatically: customers could see their data in the cloud in under two minutes. We also cut costs by using smaller machines with less memory, and the new implementation was far more elegant and maintainable, thanks to the composable nature of the Node.js streams API.
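For comparison, here is a minimal sketch of how the same hypothetical endpoint could be rewritten with streams, connecting the incoming request directly to a file on disk (again, all names and details are illustrative):

```js
import { createServer } from 'node:http'
import { createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream'

// Streaming version of the same hypothetical endpoint: each chunk is flushed
// to disk as soon as it arrives, so memory usage stays roughly constant
// no matter how large the incoming frame is
const server = createServer((req, res) => {
  const destination = createWriteStream(`frame-${Date.now()}.bin`)
  pipeline(req, destination, (err) => {
    if (err) {
      res.writeHead(500).end()
      return
    }
    res.writeHead(201).end()
  })
})

server.listen(8080)
```

Note how pipeline() wires the two streams together and propagates errors from either side, while backpressure keeps the amount of in-flight data small. We will look at all of these mechanisms in detail throughout this chapter.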
While this might sound like a specific use case, the lessons here apply broadly. Any time you’re moving data from A to B, especially when dealing with unpredictable volumes or when early results are valuable, Node.js streams are an invaluable tool.
I promise you that once you learn the fundamentals of streams, you’ll appreciate their power and see many opportunities to leverage them in your applications!
This chapter aims to provide a complete understanding of Node.js streams. The first half serves as an introduction to the main ideas, the terminology, and the libraries behind Node.js streams. In the second half, we will cover more advanced topics and, most importantly, explore useful streaming patterns that can make your code more elegant and effective in many circumstances.
In this chapter, you will learn about the following topics:
- Why streams are so important in Node.js
- Understanding, using, and creating streams
- Streams as a programming paradigm: leveraging their power in many different contexts and not just for I/O
- Streaming patterns and connecting streams together in different configurations
Without further ado, let’s discover together why streams are one of the cornerstones of Node.js.