Deploying Node.js

By Sandro Pasquali
    What do you get with a Packt Subscription?

  • Instant access to this title and 7,500+ eBooks & Videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
About this book

Node.js is a popular choice for teams that need to design, build, test, deploy, maintain, and monitor large-scale distributed systems. Starting with a detailed overview of the Node.js architecture, this book covers topics that will help in application development, testing, deployment, and maintenance.

You will learn about concurrency, event loops, callbacks and streams. Furthermore, step-by-step instructions on deploying applications to providers such as DigitalOcean and Heroku will be provided, including information on setting up load balancers and proxies. Message queues and other techniques for managing state and session data at scale will also be covered.

A series of examples on deploying your Node.js applications in production environments are provided, including a discussion on setting up continuous deployment and integration for your team. Popular tools for testing, deploying, building, and monitoring Node.js applications are covered, helping you get up and running quickly.

Publication date:
July 2015
Publisher
Packt
Pages
274
ISBN
9781783981403

 

Chapter 1. Appreciating Node

At the time of writing this book, Node is approaching its fifth year of existence, and its usage has grown in each of those five years. The opportunity for Node to fail has come, and passed. Node is a serious technology built by a highly skilled core team and very active community focused on constantly improving its speed, security, and usefulness.

Every day, developers face some of the problems that NodeJS aims to solve. Some of them are as follows:

  • Scaling networked applications beyond a single server

  • Preventing I/O bottlenecks (database, file, and network access)

  • Monitoring system usage and performance

  • Testing the integrity of system components

  • Managing concurrency safely and reliably

  • Pushing code changes and bug fixes into live environments

In this book, we will look at techniques of deploying, scaling, monitoring, testing, and maintaining your Node applications. The focus will be on how Node's event-driven, nonblocking model can be applied in practice to these aspects of software design and deployment.

On February 28, 2014, Eran Hammer delivered the keynote address to attendees of NodeDay, a large developer conference organized and sponsored by PayPal. He began his address by reciting some numbers relevant to his employer, Walmart:

  • 11,000 stores

  • Half a trillion dollars of net sales per year

  • 2.2 million employees

  • The largest private employer in the world

He continued:

 

"55 percent of our Black Friday traffic, which is our Superbowl of the year…we do about 40 percent of annual revenues on Black Friday. 55 percent came on mobile…that 55 percent of traffic went 100 percent through Node. […] We were able to deliver…this massive traffic with the equivalent of two CPUs and 30 Gigs of RAM. That's it. That's what Node needed to handle 100 percent of mobile Node traffic on Black Friday. […] Walmart global e-commerce is a 10-billion-dollar business, and by the end of this year, all 10 billion will go through Node."

 
 --Eran Hammer, Senior Architect, Walmart Labs

Modern network software, for various reasons, is growing in complexity and, in many ways, changing how we think about application development. Most new platforms and languages are attempting to address these changes. Node is no exception—and JavaScript is no exception.

Learning about Node means learning about event-driven programming, composing software out of modules, creating and linking data streams, and producing and consuming events and their related data. Node-based architectures are often composed of many small processes and/or services communicating with events—internally, by extending the EventEmitter interface and using callbacks and externally, over one of several common transport layers (for example, HTTP, TCP) or through a thin messaging layer covering one of these transport layers (for example, 0MQ, Redis PUBSUB, and Kafka). It is likely that these processes are composed of several free, open source, and high-quality npm modules, each distributed with unit tests and/or examples and/or documentation.

In this chapter, we will take a quick tour of Node, highlighting the problems it aims to solve, the solutions implied by its design, and what this means to you. We will also briefly discuss some of the core topics we will explore more comprehensively in later chapters, such as how to structure efficient and stable Node servers, how to make the best use of JavaScript for your application and your team, and how to think about and use Node for best results.

Let's start with understanding the how and why of Node's design.

 

Understanding Node's unique design


I/O operations (disk and network) are clearly more expensive. The following table shows clock cycles consumed by typical system tasks (from Ryan Dahl's original presentation of Node—https://www.youtube.com/watch?v=ztspvPYybIY):

L1-cache

3 cycles

L2-cache

14 cycles

RAM

250 cycles

Disk

41,000,000 cycles

Network

240,000,000 cycles

The reasons are clear enough: a disk is a physical device, a spinning metal platter—storing and retrieving that data is much slower than moving data between solid-state devices (such as microprocessors and memory chips) or indeed optimized on-chip L1/L2 caches. Similarly, data does not move from point to point on a network instantaneously. Light itself needs 0.1344 seconds to circle the globe! In a network used by many billions of people regularly interacting across great distances at speeds much slower than the speed of light, with many detours and few straight lines, this sort of latency builds up.

When our software ran on personal computers on our desks, little or no communication was happening over the network. Delays or hiccups in our interactions with a word processor or spreadsheet had to do with disk access time. Much work was done to improve disk access speeds. Data storage and retrieval became faster, software became more responsive, and users now expect this responsiveness in their tools.

With the advent of cloud computing and browser-based software, your data has left the local disk and exists on a remote disk, and you access this data via a network—the Internet. Data access times have slowed down again, dramatically. Network I/O is slow. Nevertheless, more companies are migrating sections of their applications into the cloud, with some software being entirely network-based.

Node is designed to make I/O fast. It is designed for this new world of networked software, where data is in many places and must be assembled quickly. Many of the traditional frameworks to build web applications were designed at a time when a single user working on a desktop computer used a browser to periodically make HTTP requests to a single server running a relational database. Modern software must anticipate tens of thousands of simultaneously connected clients concurrently altering enormous, shared data pools via a variety of network protocols on any number of unique devices. Node is designed specifically to help those building that kind of network software.

What do concurrency, parallelism, asynchronous execution, callbacks, and events mean to the Node developer?

Concurrency

Running code procedurally, or in order, is a reasonable idea. We tend to do that when we execute tasks and, for a long time, programming languages were naturally procedural. Clearly, at some point, the instructions you send to a processor must be executed in a predictable order. If I want to multiply 8 by 6, divide that result by 144 divided by 12, and then add the total result to 10, the order of those operations must proceed sequentially:

( (8x6) / (144/12) ) + 10

The order of operations must not be as follows:

(8x6) / ( (144/12) + 10 )

This is logical and easy to understand. Early computers typically had one processor, and processing one instruction blocked the processing of subsequent instructions. But things did not stay that way, and we have moved far beyond single-core computers.

If you think about the previous example, it should be obvious that calculating 144/12 and 8x6 can be done independently—one need not wait for the other. A problem can be divided into smaller problems and distributed across a pool of available people or workers to work on in parallel, and the results can be combined into a correctly ordered final calculation.

Multiple processes, each solving one part of a single mathematical problem simultaneously, are an example of parallelism.

Rob Pike, co-inventor of Google's Go programming language, defines concurrency in this way:

"Concurrency is a way to structure a thing so that you can, maybe, use parallelism to do a better job. But parallelism is not the goal of concurrency; concurrency's goal is a good structure."

Concurrency is not parallelism. A system demonstrating concurrency allows developers to compose applications as if multiple independent processes are simultaneously executing many possibly related things. Successful high-concurrency application development frameworks provide an easy-to-reason-about vocabulary to describe and build such a system.

Node's design suggests that achieving its primary goal—to provide an easy way to build scalable network programs—includes simplifying how the execution order of coexisting processes is structured and composed. Node helps a developer reasoning about a program, within which many things are happening at once (such as serving many concurrent clients), to better organize his or her code.

Let's take a look at the differences between parallelism and concurrency, threads and processes, and the special way that Node absorbs the best parts of each into its own unique design.

Parallelism and threads

The following diagram describes how a traditional microprocessor might execute the simple program discussed previously:

The program is broken up into individual instructions that are executed in order. This works but does require that instructions be processed in a serial fashion, and, while any one instruction is being processed, subsequent instructions must wait. This is a blocking process—executing any one segment of this chain blocks the execution of subsequent segments. There is a single thread of execution in play.

However, there is some good news. The processor has (literally) total control of the board, and there is no danger of another processor nulling memory or overriding any other state that this primary processor might manipulate. Speed is sacrificed for stability and safety.

We do like speed; however, the model discussed earlier rapidly became obsolete as chip designers and systems programmers worked to introduce parallel computing. Rather than having one blocking thread, the goal was to have multiple cooperating threads.

This improvement definitely increased the speed of calculation but introduced some problems, as described in the following schematic:

This diagram illustrates cooperating threads executing in parallel within a single process, which reduces the time necessary to perform the given calculation. Distinct threads are employed to break apart, solve, and compose a solution. As many subtasks can be completed independently, the overall completion time can be reduced dramatically.

Threads provide parallelism within a single process. A single thread represents a single sequence of (serially executed) instructions. A process can contain any number of threads.

Difficulties arise out of the complexity of thread synchronization. It is very difficult to model highly concurrent scenarios using threads, especially models in which the state is shared. It is difficult to anticipate all the ways in which an action taken in one thread will affect all the others if it is never clear when an asynchronously executing thread will complete:

  • The shared memory and the locking behavior this requires lead to systems that are very difficult to reason about as they grow in complexity.

  • Communication between tasks requires the implementation of a wide range of synchronization primitives, such as mutexes and semaphores, condition variables, and so on. An already challenging environment requires highly complex tools, expanding the level of expertise necessary to complete even relatively simple systems.

  • Race conditions and deadlocks are a common pitfall in these sorts of systems. Contemporaneous read/write operations within a shared program space lead to problems of sequencing, where two threads may be in an unpredictable race for the right to influence a state, event, or other key system characteristic.

  • Because maintaining dependable boundaries between threads and their states is so difficult, ensuring that a library (for Node, it would be a package or module) is thread safe occupies a great deal of the developer's time. Can I know that this library will not destroy some part of my application? Guaranteeing thread safety requires great diligence on the part of a library's developer, and these guarantees may be conditional: for example, a library may be thread safe when reading—but not when writing.

We want the power of parallelization provided by threads but could do without the mind-bending world of semaphores and mutexes. In the Unix world, there is a concept that is sometimes referred to as the Rule of Simplicity: Developers should design for simplicity by looking for ways to break up program systems into small, straightforward cooperating pieces. This rule aims to discourage developers' affection for writing 'intricate and beautiful complexities' that are, in reality, bug-prone programs.

Concurrency and processes

Parallelism within a single process is a complicated illusion that is achieved deep within mind-bendingly complex chipsets and other hardware. The question is really about appearances—about how the activity of the system appears to, and can be programmed by, a developer. Threads offer hyper-efficient parallelism, but make concurrency difficult to reason about.

Rather than have the developer struggle with this complexity, Node itself manages I/O threads, simplifying this complexity by demanding only that control flow be managed between events. There is a need to micromanage I/O threading; one simply designs an application to establish data availability points (callbacks) and the instructions to be executed once the said data is available. A single stream of instructions that explicitly takes and relinquishes control in a clear, collision-free, and predictable way aids development:

  • Instead of concerning themselves with arbitrary locking and other collisions, developers can focus on constructing execution chains, the ordering of which is predictable.

  • Parallelization is accomplished through the use of multiple processes, each with an individual and distinct memory space, due to which communication between processes remains uncomplicated—via the Rule of Simplicity, we achieve not only simple and bug-free components, but also easier interoperability.

  • The state is not (arbitrarily) shared between individual Node processes. A single process is automatically protected from surprise visits from other processes bent on memory reallocation or resource monopolization. Communication is through clear channels using basic protocols, all of which make it very hard to write programs that make unpredictable changes across processes.

  • Thread safety is one less concern for developers to waste time worrying about. Because single-threaded concurrency obviates the collisions present in multithreaded concurrency, development can proceed more quickly and on surer ground.

A single thread describing asynchronous control flow efficiently managed by an event loop brings stability, maintainability, readability, and resilience to Node programs. The big news is that Node continues to deliver the speed and power of multithreading to its developers—the brilliance of Node's design makes such power transparent, reflecting one part of Node's stated aim of bringing the most power to the most people with the least difficulty.

Events

Many JavaScript extensions in Node emit events. These are instances of events.EventEmitter. Any object can extend EventEmitter, which gives the developer an elegant toolkit to build tight, asynchronous interfaces to their object methods.

Work through this example demonstrating how to set an EventEmitter object as the prototype of a function constructor. As each constructed instance now has the EventEmitter object exposed to its prototype chain, this provides a natural reference to the event's Application Programming Interface (API). The counter instance methods can, therefore, emit events, and these can be listened for. Here, we emit the latest count whenever the counter.increment method is called and bind a callback to the "incremented" event, which simply prints the current counter value to the command line:

var EventEmitter = require('events').EventEmitter;
var util = require('util');

var Counter = function(init) {
  this.increment = function() {
    init++;
    this.emit('incremented', init);
  }
}

util.inherits(Counter, EventEmitter);

var counter = new Counter(10);

var callback = function(count) {
  console.log(count);
}
counter.addListener('incremented', callback);

counter.increment(); // 11
counter.increment(); // 12

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

To remove the event listeners bound to counter, use counter.removeListener('incremented', callback).

EventEmitter, as an extensible object, adds to the expressiveness of JavaScript. For example, it allows I/O data streams to be handled in an event-oriented manner in keeping with Node's principle of asynchronous, nonblocking programming:

var stream = require('stream');
var Readable = stream.Readable;
var util = require('util');

var Reader = function() {
  Readable.call(this);
  this.counter = 0;
}

util.inherits(Reader, Readable);

Reader.prototype._read = function() {
  if(++this.counter > 10) {
    return this.push(null);
  }
  this.push(this.counter.toString());
};

// When a #data event occurs, display the chunk.
//
var reader = new Reader();
reader.setEncoding('utf8');
reader.on('data', function(chunk) {
  console.log(chunk);
});
reader.on('end', function() {
  console.log('--finished--');
});

In this program, we have a Readable stream pushing out a set of numbers—with listeners on that stream's data event catching numbers as they are emitted and logging them—and finishing with a message when the stream has ended. It is plain that the listener is called once per number, which means that running this set did not block the event loop. Because Node's event loop need only commit resources to handling callbacks, many other instructions can be processed in the downtime of each event.

The event loop

The code seen in non-networked software is often synchronous or blocking. I/O operations in the following pseudo-code are also blocking:

variable = produceAValue()
print variable
// some value is output when #produceAValue is finished.

The following iterator will read one file at a time, dump its contents, and then read the next until it is done:

fileNames = ['a','b','c']
while(filename = fileNames.shift()) {
  fileContents = File.read(filename)
  print fileContents
}
//	> a
//	> b
//	> c

This is a fine model for many cases. However, what if these files are very large? If each takes 1 second to fetch, all will take 3 seconds to fetch. The retrieval on one file is always waiting on another retrieval to finish, which is inefficient and slow. Using Node, we can initiate file reads on all files simultaneously:

var fs = require('fs');
var fileNames = ['a','b','c'];
fileNames.forEach(function(filename) {
  fs.readFile(filename, {encoding:'utf8'}, function(err, content) {
    console.log(content);
  });
});
//	> b
//	> a
//	> c

The Node version will read all three files at once, each call to fs.readFile returning its result at some unknowable point in the future. This is why we can't always expect the files to be returned in the order they were arrayed. We can expect that all three will be returned in roughly the time it took for one to be retrieved—something less than 3 seconds. We have traded a predictable execution order for speed, and, as with threads, achieving synchronization in concurrent environments requires extra work. How do we manage and describe unpredictable data events so that our code is both easy to understand and efficient?

The key design choice made by Node's designers was the implementation of an event loop as a concurrency manager. The following description of event-driven programming (taken from http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Event-driven_programming.html) clearly not only describes the event-driven paradigm, but also introduces us to how events are handled in Node and how JavaScript is an ideal language for such a paradigm:

"In computer programming, event-driven programming or event-based programming is a programming paradigm in which the flow of the program is determined by events—that is, sensor outputs or user actions (mouse clicks, key presses) or messages from other programs or threads.

Event-driven programming can also be defined as an application architecture technique in which the application has a main loop that is clearly divided down to two sections: the first is event selection (or event detection), and the second is event handling […]

Event-driven programs can be written in any language although the task is easier in languages that provide high-level abstractions, such as closures."

As we've seen in the preceding quote, single-threaded execution environments block and can, therefore, run slowly. V8 provides a single thread of execution for JavaScript programs.

How can this single thread be made more efficient?

Node makes a single thread more efficient by delegating many blocking operations to OS subsystems to process, bothering the main V8 thread only when there is data available for use. The main thread (your executing Node program) expresses interest in some data (such as via fs.readFile) by passing a callback and is notified when that data is available. Until that data arrives, no further burden is placed on V8's main JavaScript thread. How? Node delegates I/O work to libuv, as quoted at http://nikhilm.github.io/uvbook/basics.html#event-loops:

"In event-driven programming, an application expresses interest in certain events and responds to them when they occur. The responsibility of gathering events from the operating system or monitoring other sources of events is handled by libuv, and the user can register callbacks to be invoked when an event occurs."

The user in the preceding quote is the Node process executing a JavaScript program. Callbacks are JavaScript functions, and managing callback invocation for the user is accomplished by Node's event loop. Node manages a queue of I/O requests populated by libuv, which is responsible for polling the OS for I/O data events and handing off the results to JavaScript callbacks.

Consider the following code:

var fs = require('fs');
fs.readFile('foo.js', {encoding:'utf8'}, function(err, fileContents) {
  console.log('Then the contents are available', fileContents);
});
console.log('This happens first');

This program will result in the following output:

> This happens first
> Then the contents are available, [file contents shown]

Here's what Node does when executing this program:

  • Node loads the fs module. This provides access to fs.binding, which is a static type map defined in src/node.cc that provides glue between C++ and JS code. (https://groups.google.com/forum/#!msg/nodejs/R5fDzBr0eEk/lrCKaJX_6vIJ).

  • The fs.readFile method is passed instructions and a JavaScript callback. Through fs.binding, libuv is notified of the file read request and is passed a specially prepared version of the callback sent by the original program.

  • libuv invokes the OS-level functions necessary to read a file within its own thread pool.

  • The JavaScript program continues, printing This happens first. Because there is a callback outstanding, the event loop continues to spin, waiting for that callback to resolve.

  • When the file descriptor has been fully read by the OS, libuv (via internal mechanisms) is informed and the callback passed to libuv is invoked, which essentially prepares the original JavaScript callback for re-entrance into the main (V8) thread.

  • The original JavaScript callback is pushed onto the event loop queue and is invoked on the next tick of the loop.

  • The file contents are printed to the console.

  • As there are no further callbacks in flight, the process exits.

Here, we see the key ideas that Node implements to achieve fast, manageable, and scalable I/O. If, for example, there were 10 read calls made for 'foo.js' in the preceding program, the execution time would, nevertheless, remain roughly the same. Each call would have been made in parallel in its own thread within the libuv thread pool. Even though we wrote our code "in JavaScript", we are actually deploying a very efficient multithreaded execution engine while avoiding the difficulties of thread management.

Let's close with more details on how exactly libuv results are returned into the main thread's event loop.

When data becomes available on a socket or other stream interface, we cannot simply execute our callback immediately. JavaScript is single threaded, so results must be synchronized. We can't suddenly change the state in the middle of an event loop tick—this would create some of the classic multithreaded application problems of race conditions, memory access conflicts, and so on.

Upon entering an event loop, Node (in effect) makes a copy of the current instruction queue (also known as stack), empties the original queue, and executes its copy. The processing of this instruction queue is referred to as a tick. If libuv, asynchronously, receives results while the chain of instructions copied at the start of this tick are being processed on the single main thread (V8), these results (wrapped as callbacks) are queued. Once the current queue is emptied and its last instruction has completed, the queue is again checked for instructions to execute on the next tick. This pattern of checking and executing the queue will repeat (loop) until the queue is emptied, and no further data events are expected, at which point the Node process exits.

Note

This discussion at https://github.com/joyent/node/issues/5798 among some core Node developers about the process.nextTick and setImmediate implementations offers very precise information on how the event loop operates.

The following are the sorts of I/O events fed into the queue:

  • Execution blocks: These are blocks of JavaScript code comprising the Node program; they could be expressions, loops, functions, and so on. This includes EventEmitter events emitted within the current execution context.

  • Timers: These are callbacks deferred to a time in the future specified in milliseconds, such as setTimeout and setInterval.

  • I/O: These are prepared callbacks returned to the main thread after being delegated to Node's managed thread pool, such as filesystem calls and network listeners.

  • Deferred execution blocks: These are mainly the functions slotted on the stack according to the rules of setImmediate and process.nextTick.

There are two important things to remember:

  • You don't start and/or stop the event loop. The event loop starts as soon as a process starts and ends when no further callbacks remain to be performed. The event loop may, therefore, run forever.

  • The event loop executes on a single thread but delegates I/O operations to libuv, which manages a thread pool that parallelizes these operations, notifying the event loop when results are available. An easy-to-reason-about single-threaded programming model is reinforced with the efficiency of multithreading.

To learn more about how Node is bound to libuv and other core libraries, parse through the fs module code at https://github.com/joyent/node/blob/master/lib/fs.js. Compare the fs.read and the fs.readSync methods to observe the difference between how synchronous and asynchronous actions are implemented—note the wrapper callback that is passed to the native binding.read method in fs.read.

To take an even deeper dive into the very heart of Node's design, including the queue implementation, read through the Node source at https://github.com/joyent/node/tree/master/src. Follow MakeCallback within fs_event_wrap.cc and node.cc. Investigate the req_wrap class, a wrapper for the V8 engine, deployed in node_file.cc and elsewhere and defined in req_wrap.h.

 

The implications of Node's design on system architects


Node is a new technology. At the time of writing this, it has yet to reach its 1.0 version. Security flaws have been found and fixed. Memory leaks have been found and fixed. Eran Hammer, mentioned at the beginning of this chapter, and his entire team at Walmart Labs actively contribute to the Node codebase—in particular when they find flaws! This is true of many other large companies committed to Node, such as PayPal.

If you have chosen Node, and your application has grown to such a size that you feel you need to read a book on how to deploy Node, you have the opportunity to not only benefit from the community, but have a part, perhaps, in literally designing aspects of the environment based on your particular needs. Node is open source, and you can submit pull requests.

In addition to events, there are two key design aspects that are important to understand if you are going to do advanced Node work: build your systems out of small parts and use evented streams when piping data between them.

Building large systems out of small systems

In his book, The Art of Unix Programming, Eric Raymond proposed the Rule of Modularity:

"Developers should build a program out of simple parts connected by well-defined interfaces, so problems are local, and parts of the program can be replaced in future versions to support new features. This rule aims to save time on debugging complex code that is complex, long, and unreadable."

This idea of building complex systems out of "small pieces, loosely joined" is seen in management theory, theories of government, manufacturing, and many other contexts. In terms of software development, it advises developers to contribute only the simplest, most useful component necessary within a larger system. Large systems are hard to reason about, especially if the boundaries of their components are fuzzy.

One of the primary difficulties when constructing scalable JavaScript programs is the lack of a standard interface to assemble a coherent program out of many smaller ones. For example, a typical web application might load dependencies using a sequence of <script> tags in the <head> section of a HyperText Markup Language (HTML) document:

<head>
  <script src="fileA.js"></script>
  <script src="fileB.js"></script>
</head>

There are many problems with this sort of system:

  • All potential dependencies must be declared prior to their being needed—dynamic inclusion requires complicated hacks.

  • The introduced scripts are not forcibly encapsulated—nothing stops both files from writing to the same global object. Namespaces can easily collide, which makes arbitrary injection dangerous.

  • fileA cannot address fileB as a collection—an addressable context, such as fileB.method, isn't available.

  • The <script> method itself isn't systematic, precluding the design of useful module services, such as dependency awareness and version control.

  • Scripts cannot be easily removed or overridden.

  • Because of these dangers and difficulties, sharing is not effortless, thus diminishing opportunities for collaboration in an open ecosystem.

Ambivalently inserting unpredictable code fragments into an application frustrates attempts to predictably shape functionality. What is needed is a standard way to load and share discreet program modules.

Accordingly, Node introduced the concept of the package, following the CommonJS specification. A package is a collection of program files bundled with a manifest file describing the collection. Dependencies, authorship, purpose, structure, and other important metadata is exposed in a standard way. This encourages the construction of large systems from many small, interdependent systems. Perhaps, even more importantly, it encourages sharing:

 

"What I'm describing here is not a technical problem. It's a matter of people getting together and making a decision to step forward and start building up something bigger and cooler together."

 
 --Kevin Dangoor, creator of CommonJS

In many ways, the success of Node is due to the growth in the number and quality of packages available to the developer community that are distributed via Node's package management system, npm. This system has done much to help make JavaScript a viable, professional option for systems programming.

Note

A good introduction to npm for anyone new to Node can be found at: https://www.npmjs.org/doc/developers.html.

Streams

In his book, The C++ Programming Language, Third Edition, Bjarne Stoustrup states:

"Designing and implementing a general input/output facility for a programming language is notoriously difficult. […] An I/O facility should be easy, convenient, and safe to use; efficient and flexible; and, above all, complete."

It shouldn't surprise anyone that a design team focused on providing efficient and easy I/O has delivered such a facility through Node. Through a symmetrical and simple interface, which handles data buffers and stream events so that the implementer does not have to, Node's Stream module is the preferred way to manage asynchronous data streams for both internal modules and, hopefully, for the modules that developers will create.

Note

An excellent tutorial on the Stream module can be found at https://github.com/substack/stream-handbook. Also, the Node documentation is comprehensive at http://nodejs.org/api/stream.html.

A stream in Node is simply a sequence of bytes or, if you like, a sequence of characters. At any time, a stream contains a buffer of bytes, and this buffer has a length of zero or more.

Because each character in a stream is well defined, and because every type of digital data can be expressed in bytes, any part of a stream can be redirected, or piped, to any other stream, different chunks of the stream can be sent to different handlers. In this way, stream input and output interfaces are both flexible and predictable and can be easily coupled.

In addition to events, Node is distinctive for its comprehensive use of streams. Continuing the idea of composing applications out of many small processes emitting events or reacting to events, several Node I/O modules and features are implemented as streams. Network sockets, file readers and writers, stdin and stdout, Zlib, and so on, are all data producers and/or consumers that are easily connected through the abstract Stream interface. Those familiar with Unix pipes will see some similarities.

Five distinct base classes are exposed via the abstract Stream interface: Readable, Writable, Duplex, Transform, and PassThrough. Each base class inherits from EventEmitter, which we know to be an interface to which event listeners and emitters can be bound. Streams in Node are evented streams, and sending data between processes is commonly done using streams. Because streams can be easily chained and otherwise combined, they are fundamental tools for the Node developer.

It is recommended that you develop a clear understanding of what streams are and how they are implemented in Node before going further as we will use streams extensively throughout this book.

 

Using full-stack JavaScript to maximum effect


JavaScript has become a full-stack language. A native JavaScript runtime exists in all browsers. V8, the JavaScript interpreter used by Node, is the same engine powering Google's Chrome browser. And the language has gone even further than covering both the client and server layers of the software stack. JavaScript is used to query the CouchDB database, do map/reduce with MongoDB, and find data in ElasticSearch collections. The wildly popular JavaScript Object Notation (JSON) data format simply represents data as a JavaScript object.

When different languages are used within the same application, the cost of context switching goes up. If a system is composed of parts described in different languages, the system architecture becomes more difficult to describe, understand, and extend. If different parts of a system speak differently, every cross-dialect conversation will require expensive translation.

Inefficiencies in comprehension lead to larger costs and more brittle systems. The members of the engineering team for this system must each be fluent in these many languages or be grouped by different skill sets; engineers are expensive to find and/or train. When the inner workings of significant parts of a system become opaque to all but a few engineers, it is likely that cross-team collaboration will decrease, making product upgrades and additions more difficult and likely leading to more errors.

What new opportunities open up when these difficulties are reduced or eliminated?

Hot code

Because your clients and servers will speak the same language, each can pass code to be natively executed on the other. If you are building a web application, this opens up very interesting (and unique) opportunities.

For example, consider an application that allows one client to make changes to another's environment. This tool allows a software developer to make changes to the JavaScript powering a website and allows their clients to see those changes in real time in their browsers. What this application must do is transform live code in many browsers so that it reflects changes. One way to do this would be to capture a change set into a transform function, pass that function across to all connected clients, and have that function executed in their local environment, updating it to reflect the canonical view. One application evolves, it emits a genetic update in the code of JavaScript, and the rest of its species similarly evolves. We will use one such technology in Chapter 7, Deploying and Maintaining.

Since Node shares the same JavaScript code base, a Node server, on its own initiative, can take this action. The network itself can broadcast code for its clients to execute. Similarly, clients can send code to the server for execution. It is easy to see how this allows hot code pushes, where a Node process sends a unique packet of raw JavaScript to specific clients for execution.

When Remote Procedure Calls (RPC) no longer require a broker layer to translate between communicating contexts, code can exist anywhere in the network for as long or as brief a period as necessary and can execute in multiple contexts, which are chosen for reasons of load balancing, data awareness, computational power, geographic precision, and so on.

Browserify

JavaScript is the language common to Node and the browser. However, Node significantly extends the JavaScript language, adding many commands and other constructs that are not available to the client-side developer. For example, there is no equivalent of the core Node Stream module in JavaScript.

Additionally, the npm repository is rapidly growing, and, at the time of writing, contains more than 80,000 Node packages. Many of these packages are equally useful on the client as well as within Node. The spread of JavaScript to the server has, in effect, created two cooperating threads producing enterprise-grade JavaScript libraries and modules.

Browserify was developed to make it easy to share npm modules and core Node modules seamlessly with the client. Once a package has been browserified, it is easily imported into a browser environment using the standard <script> tag. Installing Browserify is simple:

npm install -g browserify

Let's build an example. Create a file, math.js, written as you would write an npm module:

module.exports = function() {
  this.add = function(a, b) {
    return a + b;
  }
  this.subtract = function(a, b) {
    return a - b;
  }
};

Next, create a program file, add.js, that uses this module:

var Math = require('./math.js');
var math = new Math;

console.log(math.add(1,3)); // 4

Executing this program using Node on the command line (> node add.js ) will result in 4 being printed to your terminal. What if we wanted to use our math module in the browser? Client-side JavaScript doesn't have a require statement, so we browserify it:

browserify math.js -o bundle.js

Browserify walks through your code, finding require statements and automatically bundling those dependencies (and the dependencies of those dependencies) into one file that you load into your client application:

<script src="bundle.js"></script>

As an added bonus, this bundle automatically introduces some useful Node globals to your browser environment: __filename, __dirname, process, Buffer, and global. This means you have, for example, process.nextTick available in the browser.

Note

The creator of Browserify, James Halliday, is a prolific contributor to the Node community. Visit him at https://github.com/substack. Also, there exists an online service for testing out browserified npm modules at http://requirebin.com. The full documentation can be found at https://github.com/substack/node-browserify#usage.

Another exciting project that, like Browserify, leverages Node to enhance the JavaScript available to browser-based JavaScript is Component. The authors describe it this way: Component is currently a stopgap for ES6 modules and Web Components. When all modern browsers start supporting these features, Component will begin focusing more on semantic versioning and server-side bundling as browsers would be able to handle the rest. The project is still in flux but worth a look. Here's the link: https://github.com/componentjs/guide.

 

Summary


In this chapter, we went on a whirlwind tour of Node. You learned something about why it is designed the way it is and why this event-driven environment is a good solution to modern problems in networked software. Having explained the event loop and the related ideas around concurrency and parallelism, we talked a bit about the Node philosophy of composing software from small pieces loosely joined. You learned about the special advantages that full-stack JavaScript provides and explored new possibilities of applications made possible because of them.

You now have a good understanding of the kind of applications we will be deploying, and this understanding will help you see the unique concerns and considerations faced when building and maintaining Node applications. In the next chapter, we'll dive right in with building servers with Node, options for hosting these applications, and ideas around building, packaging, and distributing them.

About the Author
  • Sandro Pasquali

    Sandro Pasquali formed a technology company named Simple in 1997, that sold the world's first JavaScript-based application development framework and was awarded several patents for deployment and advertising technologies that anticipated the future of Internet-based software. Node represents, for him, the natural next step in the inexorable march towards the day when JavaScript powers nearly every level of software development.

    Sandro has led the design of enterprise-grade applications for some of the largest companies in the world, including Nintendo, Major League Baseball, Bang and Olufsen, LimeWire, AppNexus, Conde Nast, and others. He has displayed interactive media exhibits during the Venice Biennial, won design awards, built knowledge management tools for research institutes and schools, and started and run several start-ups. Always seeking new ways to blend design excellence and technical innovation, he has made significant contributions across all levels of software architecture, from data management and storage tools to innovative user interfaces and frameworks.

    He is the author of Deploying Node.js, also by Packt Publishing, which aims to help developers get their work in front of others.

    Sandro runs a software development company in New York and trains corporate development teams interested in using Node and JavaScript to improve their products. He spends the rest of his time entertaining his beautiful daughter, and his wife.

    Browse publications by this author
Deploying Node.js
Unlock this book and the full library FREE for 7 days
Start now