Node Web Development - Second Edition

Chapter 1. About Node

Node is an exciting new platform for developing web applications, application servers, any sort of network server or client, and general-purpose programming. It is designed for extreme scalability in networked applications through an ingenious combination of server-side JavaScript and asynchronous I/O and programming. It is built around JavaScript anonymous functions and a single-threaded, event-driven architecture.

The Node model is very different from common application server platforms that scale using threads. The claim is that, because of the event-driven architecture, the memory footprint is low, the throughput is high, the latency profile under load is better, and the programming model is simpler. The Node platform is in a phase of rapid growth, and many see it as a compelling alternative to the traditional approach to building web applications (Apache, Java, PHP, Python, Ruby on Rails, and so on).

At heart, it is a standalone JavaScript virtual machine, with extensions that make it suitable for general-purpose programming and a clear focus on application server development. The Node platform isn't directly comparable to the programming languages frequently used for developing web applications, nor is it directly comparable to the containers that deliver the HTTP protocol to web clients (Apache httpd, Tomcat, GlassFish, and so on). At the same time, many regard it as potentially supplanting the traditional web application development stacks.

It is implemented around a non-blocking I/O event loop and a layer of file and network I/O libraries, all built on top of the V8 JavaScript engine (from the Chrome web browser). The I/O library is enough to implement a server for any TCP- or UDP-based protocol, such as DNS, HTTP, IRC, or FTP. While it supports developing servers or clients for any network protocol, the biggest use case is regular websites, either in place of a technology such as an Apache/PHP or Rails stack or as a complement to existing websites. For example, real-time chat or monitoring can easily be added to existing websites with the Socket.IO library for Node.

This book will give you an introduction to Node. We presume that you already know how to write software, are familiar with JavaScript, and know something about developing web applications in other languages. We will dive right into developing working applications and recognize that often the best way to learn is by rummaging around in working code.

The capabilities of Node

Node is a platform for writing JavaScript applications outside web browsers. This is not the JavaScript we are familiar with in web browsers! There is no DOM built into Node, nor any other browser capability, but a DOM can be added using jsdom, and Node-based wrappers are available for some browser engines. Between the JavaScript language and its asynchronous I/O framework, Node is a powerful application development platform.

Beyond its native ability to execute JavaScript, the bundled modules provide the following capabilities:

Command-line tools (in shell script style)
Interactive-TTY style of program (REPL or Read-Eval-Print Loop)
Excellent process control functions to oversee child processes
A buffer object to deal with binary data
TCP or UDP sockets with comprehensive event-driven callbacks
DNS lookup
Layered on top of the TCP library is an HTTP and HTTPS client/server
File system access
Built-in rudimentary unit testing support through assertions

The network layer of Node is low level while being simple to use. For example, the HTTP modules allow you to write an HTTP server (or client) in a few lines of code. This is powerful, but it puts you, the programmer, very close to the protocol requests and makes you decide precisely which HTTP headers to return in responses. Where a PHP programmer generally doesn't care about the headers, a Node programmer does.
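To make that concrete, here is a minimal sketch of a request handler that spells out its status code and headers itself. The names handleRequest and startServer are illustrative; only http.createServer, res.writeHead, res.end, and Buffer.byteLength are real Node APIs.

```javascript
var http = require('http');

// Node does not decide the response for us: the handler picks the status
// code and every header, including the body length in bytes.
function handleRequest(req, res) {
  var body = 'You asked for ' + req.url + '\n';
  res.writeHead(200, {
    'Content-Type': 'text/plain; charset=utf-8',
    'Content-Length': Buffer.byteLength(body), // bytes, not characters
    'Cache-Control': 'no-cache'
  });
  res.end(body);
}

// Hypothetical helper; it is not called here, so loading this file
// does not start a server.
function startServer(port) {
  return http.createServer(handleRequest).listen(port, '127.0.0.1');
}
```

To try it, call startServer(8124) and visit http://127.0.0.1:8124/ in a browser. Note that Content-Length counts bytes, so a body containing non-ASCII characters is longer in bytes than in characters.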

In other words, it's very easy to write an HTTP server in Node, but the typical web application developer doesn't need to work at that level of detail. For example, PHP coders assume Apache is already there and that they don't have to implement the HTTP server portion of the stack. The Node community has developed a wide range of web application frameworks, such as Connect, that let developers quickly configure an HTTP server providing all of the basics we've come to expect (sessions, cookies, serving static files, logging, and so on), thus letting them focus on their business logic.
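Frameworks of this kind typically structure those basics as a chain of small handler functions, each taking (req, res, next). The sketch below is a toy version of that idea written from scratch for illustration: createApp and the two sample layers are hypothetical, and this is not Connect's actual implementation or full API.

```javascript
// A toy middleware stack in the (req, res, next) style that Connect
// popularized. createApp is hypothetical and far simpler than Connect.
function createApp() {
  var stack = [];

  function app(req, res) {
    var index = 0;
    // Each call to next() hands the request to the following layer.
    function next() {
      var layer = stack[index];
      index += 1;
      if (layer) {
        layer(req, res, next);
      } else {
        // No layer answered the request: fall back to a 404 response.
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not Found\n');
      }
    }
    next();
  }

  app.use = function (layer) {
    stack.push(layer);
    return app; // allow chaining
  };
  return app;
}

var app = createApp()
  .use(function logger(req, res, next) {
    console.log(req.method + ' ' + req.url);
    next();
  })
  .use(function hello(req, res, next) {
    if (req.url !== '/hello') return next();
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello World\n');
  });

// Because app has the (req, res) signature, it can be passed straight to
// http.createServer(app).listen(8124);
```

The design choice is that each layer either answers the request or calls next(), so logging, sessions, or static files can be added as independent layers without touching the business logic.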

Server-side JavaScript

Quit scratching your head already. Of course you're doing it, scratching your head and mumbling to yourself, "What's a browser language doing on the server?" In truth, JavaScript has a long and largely unknown history outside the browser. JavaScript is a programming language, just like any other language, and the better question to ask is "Why should JavaScript remain trapped inside browsers?"

Back in the dawn of the web age, the tools for writing web applications were at a fledgling stage. Some were experimenting with Perl or Tcl to write CGI scripts, the PHP and Java languages had just been developed, and even JavaScript was being used on the server side. One early web application server was Netscape's LiveWire server, which used JavaScript. Some versions of Microsoft's ASP used JScript, their version of JavaScript. A more recent server-side JavaScript project is the RingoJS application framework in the Java universe. It's built on top of Rhino, a JavaScript implementation written in Java. All this means that JavaScript outside the browser is not a new thing, even if it is uncommon.

Node brings to the table a combination never seen before, namely the coupling of fast event-driven I/O with a fast, modern JavaScript engine: Google's V8, the ultrafast engine at the heart of Google's Chrome web browser.

Why should you use Node?

The JavaScript language is very popular due to its ubiquity in web browsers. It compares favorably with other languages and offers many modern, advanced language features. Thanks to its popularity, there is a deep talent pool of experienced JavaScript programmers out there.

JavaScript is a dynamic programming language with loosely typed, dynamically extendable objects that can be informally declared as needed. Functions are first-class objects, routinely used as anonymous closures (nameless functions that can be passed around with ease). This makes JavaScript more powerful than some other languages commonly used for web applications. In theory, these features make developers more productive.
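A few lines show what these features look like in practice. The names below (server, makeCounter, nextId, and so on) are arbitrary examples.

```javascript
// Objects are loosely typed and can be extended whenever needed.
var server = { host: 'localhost' };
server.port = 8124;                 // add a property after creation
server.describe = function () {     // or even a method
  return this.host + ':' + this.port;
};

// Functions are first-class values. makeCounter returns a new anonymous
// function that closes over its own private `count` variable.
function makeCounter() {
  var count = 0;
  return function () {
    count += 1;
    return count;
  };
}

var nextId = makeCounter();
var otherId = makeCounter();
nextId();  // 1
nextId();  // 2
otherId(); // 1, because each closure has its own `count`

// Anonymous functions are routinely passed to other functions.
var doubled = [1, 2, 3].map(function (x) { return x * 2; }); // [2, 4, 6]
```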

There is a raging debate between dynamic and non-dynamic languages, or rather between statically typed and loosely typed ones. Loosely typed dynamic languages such as JavaScript are thought to give programmers more freedom to quickly write code. Proponents of strongly typed languages, such as Java, argue that the compiler helps to catch programming mistakes that are not caught in loosely typed languages. The debate is not settled, and may never be settled. The Node platform, by using JavaScript, of course sits in the loosely typed languages camp.

One of the main disadvantages of JavaScript is the global object. In a web page, all the top-level variables are tossed together in the global object, which can create chaos when mixing modules together. Since web applications tend to have lots of objects, probably coded by multiple organizations, one may think programming in Node would be a minefield of conflicting global objects. Instead, Node uses the CommonJS module system, meaning that variables local to a module are truly local to the module, even if they look like global variables. This clean separation between modules negates the global object problem.
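Part of how Node achieves this is that it runs each module's code inside a function (the module wrapper, whose parameters include exports, require, module, __filename, and __dirname), so top-level var declarations become local variables of that function. The sketch below imitates only that idea using new Function. loadModule is hypothetical and omits everything else the real loader does (path resolution, caching, require).

```javascript
// Imitate the module wrapper: compile the module source as the body of a
// function, so its top-level `var`s are locals of that function.
function loadModule(source) {
  var module = { exports: {} };
  var wrapper = new Function('exports', 'module', source);
  wrapper(module.exports, module);
  return module.exports;
}

// Two "modules" that each declare a top-level variable named counter.
var a = loadModule('var counter = 1;' +
  'exports.increment = function () { return ++counter; };');
var b = loadModule('var counter = 100;' +
  'exports.increment = function () { return ++counter; };');

a.increment(); // 2
b.increment(); // 101, unaffected by module a
```

Neither counter leaks into the global object, which is the property that keeps independently written Node modules from colliding.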

Having the same programming language on the server and client has been a long-time dream on the web. This dream dates back to the early days of Java, when applets were to be the front end to server applications written in Java, and JavaScript was originally envisioned as a lightweight scripting language for applets. Something fell down along the way, and we ended up with JavaScript as the principal in-browser client-side language, rather than Java. With Node we may finally be able to implement applications with the same programming language on the client and server, by having JavaScript at both ends of the Web.

A common language for the frontend and backend offers several potential wins:

The same programming staff can work on both ends of the wire
Code can be migrated between server and client more easily
Common data formats (JSON) between server and client
Common software tools for server and client
Common testing or quality reporting tools for server and client
When writing web applications, view templates can be used on both sides
A similar language between server and client teams could make for better communication among team members

Node facilitates implementing all these positive benefits (and more) with a compelling platform and development community.

Threaded versus asynchronous event-driven architecture

The asynchronous, event-driven architecture of Node is said to be the cause of its blistering performance. Well, that and Chrome's V8 JavaScript engine. The normal application server model uses blocking I/O to retrieve data and threads for concurrency. Blocking I/O forces threads to wait, so the server keeps switching among threads while they sit waiting on I/O. Threads add complexity to the application server, as well as server overhead.

Node has a single execution thread with no waiting on I/O or context switching. Instead there is an event loop, looking for events and dispatching them to handler functions. The paradigm is to pass an anonymous function into any operation that will take time to complete. The handler function is invoked when the operation is complete, and in the meantime the event loop continues dispatching events.

This model is typical in GUI applications, as well as for JavaScript execution in a web browser. As with those systems, event handler functions must quickly return to the event loop so it can dispatch the next immediately runnable task.
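The dispatching idea can be modeled in a few lines. This is a toy, not Node's real event loop (which is built on libuv and also polls the operating system for I/O); createLoop and the event names are hypothetical.

```javascript
// A toy, single-threaded event loop: one queue of pending events and one
// loop that hands each event to its handler.
function createLoop() {
  var queue = [];
  var handlers = {};
  return {
    on: function (type, handler) { handlers[type] = handler; },
    emit: function (type, data) { queue.push({ type: type, data: data }); },
    run: function () {
      while (queue.length > 0) {
        var event = queue.shift();
        var handler = handlers[event.type];
        // The loop cannot dispatch anything else until this handler returns.
        if (handler) handler(event.data);
      }
    }
  };
}

var loop = createLoop();
var log = [];
loop.on('request', function (url) {
  log.push('start ' + url);
  // Rather than waiting for "I/O", queue a follow-up event and return.
  loop.emit('data-ready', url);
});
loop.on('data-ready', function (url) {
  log.push('respond ' + url);
});

loop.emit('request', '/a');
loop.emit('request', '/b');
loop.run();
// log: ['start /a', 'start /b', 'respond /a', 'respond /b']
```

Because the request handler returns immediately, the second request starts before the first one is answered; a handler that looped for a long time would hold up every event behind it.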

To help us wrap our heads around this, Ryan Dahl, the creator of Node, asked us (in his "Cinco de Node" presentation) what happens while executing code like this:

var result = query('SELECT * from db'); // the thread waits here
// operate on the result

Of course, the program pauses at that point while the database layer sends the query to the database, which determines the result and returns the data. Depending on the query, that pause can be quite long. Well, a few milliseconds, but that is an eon in computer time. This pause is bad because the entire thread sits idle while another request might come in, and for thread-based server architectures that means a thread context switch. The more outstanding connections to the server, the greater the number of thread context switches. Context switching is not free, because more threads require more memory for per-thread state and more CPU time spent on thread-management overhead.

Simply by using asynchronous, event-driven I/O, Node removes most of this overhead while introducing very little of its own.

Implementing concurrency with threads frequently comes with admonitions such as these: "expensive and error-prone", "the error-prone synchronization primitives of Java", or "designing concurrent software can be complex and error prone". The complexity comes from access to shared variables and the various strategies needed to avoid deadlock and contention between threads. The "synchronization primitives of Java" are one such strategy, and many programmers clearly find them hard to use. There is a tendency to create frameworks such as java.util.concurrent to tame the complexity of threaded concurrency, but some might argue that papering over complexity does not make things simpler.

Node asks us to think differently about concurrency. Callbacks fired asynchronously from an event loop are a much simpler concurrency model; simpler to understand, and simpler to implement.

Ryan Dahl points to the relative access times of objects to explain the need for asynchronous I/O. Objects in memory are accessed much more quickly (on the order of nanoseconds) than objects on disk or objects retrieved over the network (milliseconds or seconds). The longer access time for external objects is measured in millions or even billions of clock cycles, which can be an eternity when your customer is sitting at their web browser, ready to get bored and move on if the page takes longer than two seconds to load.
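A back-of-the-envelope calculation shows the scale. The 3 GHz clock rate and the three latencies below are assumed round numbers chosen for illustration, not measurements of any particular system.

```javascript
// Cycles elapsed while waiting = wait time (s) * clock rate (cycles/s).
var CLOCK_HZ = 3e9; // assume a 3 GHz processor

function cyclesWaited(seconds) {
  return Math.round(seconds * CLOCK_HZ);
}

// Assumed, illustrative latencies.
var examples = [
  ['memory access, ~100 ns', 100e-9],
  ['disk read, ~10 ms', 10e-3],
  ['network round trip, ~100 ms', 100e-3]
];

examples.forEach(function (example) {
  console.log(example[0] + ': about ' + cyclesWaited(example[1]) + ' cycles');
});
// memory access, ~100 ns: about 300 cycles
// disk read, ~10 ms: about 30000000 cycles
// network round trip, ~100 ms: about 300000000 cycles
```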

In Node, the query discussed previously would read as follows:

query('SELECT * from db', function (err, result) {
  if (err) throw err; // handle errors
  // operate on result
});

This code performs the same query written earlier. The difference is that the query result is not the result of the function call, but is provided to a callback function that will be called later. The order of execution is not one line after another, but instead determined by the order of callback function execution.
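To see that ordering without a database, the sketch below substitutes a hypothetical fakeQuery function that answers after a short timer. It is not a real database API; only setTimeout is standard Node.

```javascript
// fakeQuery is a hypothetical stand-in for a database driver. It answers
// asynchronously, after a 10 ms timer, using the (err, result) convention.
function fakeQuery(sql, callback) {
  setTimeout(function () {
    if (typeof sql !== 'string') {
      callback(new Error('SQL must be a string'));
      return;
    }
    callback(null, [{ id: 1 }, { id: 2 }]);
  }, 10);
}

var log = [];
log.push('1: sending query');
fakeQuery('SELECT * from db', function (err, result) {
  if (err) throw err; // handle errors
  log.push('3: callback received ' + result.length + ' rows');
  console.log(log.join('\n'));
});
log.push('2: query call returned, event loop is free');
```

The line pushed after the call is recorded before the callback runs, even though it appears later in the source: the order of execution follows the callbacks, not the order of the lines.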

In this example, the query function returns almost immediately to the event loop, which goes on to service other requests. One of those events will be the response to the query, which invokes the callback function. Quickly returning to the event loop improves server utilization. That's great for the owner of the server, but there's an even bigger gain: the user may see the page content assembled more quickly.

Commonly, web pages bring together data from dozens of sources, each with a query and response as discussed earlier. With asynchronous queries, these can all happen in parallel: the page construction function fires off dozens of queries (no waiting, each with its own callback), then goes back to the event loop, and the callbacks are invoked as each query completes. Because the queries run in parallel, the data can be collected much more quickly than if they were done synchronously, one at a time. Now the reader at their web browser is happier because the page loads more quickly.
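A minimal sketch of that pattern, assuming a hypothetical fetchSource function that simulates each data source with a timer: every fetch starts at once, and a countdown of outstanding requests signals when the page has everything it needs.

```javascript
// fetchSource is hypothetical: it simulates a data source (database, web
// service, cache) that answers after delayMs milliseconds.
function fetchSource(name, delayMs, callback) {
  setTimeout(function () {
    callback(null, name + ' data');
  }, delayMs);
}

// Start every fetch at once, then collect results as the callbacks arrive.
function buildPage(sources, done) {
  var results = new Array(sources.length);
  var remaining = sources.length;
  var failed = false;
  if (remaining === 0) return done(null, results);

  sources.forEach(function (source, i) {
    fetchSource(source.name, source.delayMs, function (err, data) {
      if (failed) return;            // an earlier error was already reported
      if (err) {
        failed = true;
        return done(err);
      }
      results[i] = data;             // keep results in request order
      remaining -= 1;
      if (remaining === 0) done(null, results);
    });
  });
}

var started = Date.now();
buildPage([
  { name: 'profile', delayMs: 30 },
  { name: 'news', delayMs: 20 },
  { name: 'ads', delayMs: 10 }
], function (err, parts) {
  if (err) throw err;
  // Expect roughly the slowest delay (about 30 ms), not the 60 ms sum.
  console.log(parts, 'after', Date.now() - started, 'ms');
});
```

The total wait is roughly that of the slowest source rather than the sum of all of them; the exact elapsed time printed will vary from run to run.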

Performance and utilization

Some of the excitement over Node is due to its throughput (the number of requests per second it can serve). Comparative benchmarks of similar applications, for example Apache versus Node, have been reported to show large performance gains for Node.

One benchmark going around is this simple HTTP server (borrowed from nodejs.org), which simply returns a "Hello World" message, directly from memory:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
console.log('Server running at http://127.0.0.1:8124/');

This is one of the simpler web servers one can build with Node. The http object encapsulates the HTTP protocol, and its http.createServer method creates a whole web server, listening on the port specified in the listen method. Every request (whether a GET or POST on any URL) on that web server calls the provided function. It is very simple and lightweight. In this case, regardless of the URL, it returns a simple text/plain Hello World response.

Because of its minimal nature, this simple application should demonstrate the maximum request throughput of Node. Indeed many have published benchmark studies starting from this simplest of HTTP servers.

Ryan Dahl (Node's original author) showed a simple benchmark (http://nodejs.org/cinco_de_node.pdf) that returned a 1-megabyte binary buffer; Node gave 822 req/sec while Nginx gave 708 req/sec, roughly a 16 percent improvement over Nginx. He also noted that Nginx peaked at 4 megabytes of memory, while Node peaked at 64 megabytes.

Yahoo! search engineer Fabian Frank published a performance case study of a real-world search query suggestion widget implemented with Apache/PHP and two variants of Node stacks (http://www.slideshare.net/FabianFrankDe/nodejs-performance-case-study). The application is a pop-up panel showing search suggestions as the user types in phrases, using a JSON-based HTTP query. The Node version could handle eight times the number of requests/second with the same request latency. Fabian Frank said both Node stacks scaled linearly until CPU usage hit 100 percent. In another presentation (http://www.slideshare.net/FabianFrankDe/yahoo-scale-nodejs), he discussed how Yahoo! Axis is running on Manhattan + Mojito and the value of being able to use the same language (JavaScript) and framework (YUI/YQL) on both the frontend and backend.

LinkedIn did a massive overhaul of its mobile app, using Node on the server side to replace an old Ruby on Rails app. The switch let them move from 30 servers down to three, and to merge the frontend and backend teams because everything was written in JavaScript. Before choosing Node, they evaluated Rails with EventMachine, Python with Twisted, and Node, choosing Node for the reasons just given (http://arstechnica.com/information-technology/2012/10/a-behind-the-scenes-look-at-linkedins-mobile-engineering/).

Mikito Takada blogged about benchmarking and performance improvements in a "48 hour hackathon" application he built (http://blog.mixu.net/2011/01/17/performance-benchmarking-the-node-js-backend-of-our-48h-product-wehearvoices-net/), comparing Node with what he claims is a similar application written with Django (a web application framework for Python). The unoptimized Node version had noticeably slower response times than the Django version, but a few optimizations (MySQL connection pooling, caching, and so on) made drastic performance improvements, handily beating Django.

A key realization about Node performance is the need to quickly return to the event loop. We go over this in more detail in Chapter 4, HTTP Servers and Clients – A Web Application's First Steps, but if a callback handler takes too long to execute, it will prevent Node from being the blisteringly fast server it was designed to be. In one of Ryan Dahl's earliest blog posts about the Node project, he discussed a requirement that event handlers execute within 5 ms. Most of the ideas in that post were never implemented, but Alex Payne wrote an intriguing blog post on this (http://al3x.net/2010/07/27/node.html), drawing a distinction between "scaling in the small" and "scaling in the large".

Small-scale web applications (scaling in the small) should have performance and implementation advantages when written for Node, instead of the "P" languages (Perl, PHP, Python, and so on) normally used. JavaScript is a powerful language, and the Node environment with its modern fast virtual machine design offers performance and concurrency advantages over interpreted languages such as PHP.

He goes on to argue that "scaling in the large" (enterprise-level applications) will always be hard and complex. One typically throws in load balancers, caching servers, and multiple redundant machines in geographically dispersed locations to serve millions of users around the world with a fast web browsing experience. Perhaps the application development platform isn't as important as the whole system.

Is Node a cancerous scalability disaster?

In October 2011, software developer and blogger Ted Dziuba wrote an infamous blog post (since pulled from his blog) claiming that Node is a cancer and calling it a "scalability disaster." The example he offered as proof was a CPU-bound implementation of the Fibonacci sequence algorithm. While his argument was flawed, he raised a valid point that Node application developers have to consider: where do you put the heavy computational tasks?

The Fibonacci sequence, serving as a stand-in for heavy computational tasks, quickly becomes expensive to calculate, especially with a naïve implementation. The previous edition of this book used an identical Fibonacci implementation to demonstrate why event handlers have to return quickly to the event loop.

var fibonacci = exports.fibonacci = function(n) {
    // Naive recursion: the number of calls grows exponentially with n
    if (n === 1 || n === 2)
        return 1;
    else
        return fibonacci(n-1) + fibonacci(n-2);
};

Yes, there are many ways to calculate Fibonacci numbers more quickly. We are showing this as a general example of what happens to Node when event handlers are slow, and not to debate the best ways to calculate mathematics functions.

If you call this from the request handler in a Node HTTP server, for sufficiently large values of n (say, 40), the server becomes completely unresponsive: the event loop cannot run while this function is grinding through the calculation.
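A small experiment makes the effect visible without an HTTP server. demonstrateBlocking is a hypothetical helper: it queues a 0 ms timer and then runs the naive function synchronously. The delay it reports depends on your machine, so no particular number should be expected, but it grows quickly with n.

```javascript
// The naive, synchronous Fibonacci function from above.
function fibonacci(n) {
  if (n === 1 || n === 2) return 1;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

// Queue a timer that asks to run "immediately" (0 ms), then hog the thread.
// The timer callback cannot run until the synchronous calculation returns
// control to the event loop, so it always sees the finished result.
function demonstrateBlocking(n, report) {
  var started = Date.now();
  var result;
  setTimeout(function () {
    report({ result: result, timerDelayMs: Date.now() - started });
  }, 0);
  result = fibonacci(n); // nothing else runs while this computes
}

demonstrateBlocking(30, function (info) {
  console.log('fibonacci(30) = ' + info.result +
    '; the 0 ms timer actually waited ' + info.timerDelayMs + ' ms');
});
```

In a server, every pending request would be stuck behind the calculation in exactly the way this timer is.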

Does this mean Node is a flawed platform? No. It just means that the programmer must take care to identify code with long-running computations and develop a solution. Possible solutions include rewriting the algorithm to work with the event loop, or offloading computationally expensive calculations to a backend server.

Additionally, there is nothing CPU-intensive about retrieving data items from a database, plugging them into a template, and sending the result through the HTTP response. That's what typical web applications do, after all.

With the Fibonacci algorithm, a simple rewrite dispatches the computations through the event loop, letting the server continue handling requests. By using callbacks and closures (anonymous functions), we're able to keep the concurrency that asynchronous I/O promises.

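var fibonacciAsync = exports.fibonacciAsync = function(n, done) {
    if (n === 1 || n === 2) done(1);
    else {
        // setImmediate queues each step behind pending I/O and timers,
        // so other requests get a turn between steps. (A chain of
        // process.nextTick calls would all run before the event loop
        // continues, starving other requests.)
        setImmediate(function() {
            fibonacciAsync(n-1, function(val1) {
                setImmediate(function() {
                    fibonacciAsync(n-2, function(val2) {
                        done(val1+val2);
                    });
                });
            });
        });
    }
};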
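Whether such a rewrite really lets other work run depends on the scheduling primitive. In current Node releases, callbacks queued with process.nextTick all run before the event loop continues, so a recursion built purely on nextTick can starve pending I/O, while setImmediate queues each step behind the rest of the event loop. The sketch below compares the two; runAlongside is a hypothetical helper written only for this comparison.

```javascript
// Version 1: every step is queued with process.nextTick.
function fibonacciNextTick(n, done) {
  if (n === 1 || n === 2) return done(1);
  process.nextTick(function () {
    fibonacciNextTick(n - 1, function (val1) {
      process.nextTick(function () {
        fibonacciNextTick(n - 2, function (val2) {
          done(val1 + val2);
        });
      });
    });
  });
}

// Version 2: every step is queued with setImmediate.
function fibonacciImmediate(n, done) {
  if (n === 1 || n === 2) return done(1);
  setImmediate(function () {
    fibonacciImmediate(n - 1, function (val1) {
      setImmediate(function () {
        fibonacciImmediate(n - 2, function (val2) {
          done(val1 + val2);
        });
      });
    });
  });
}

// Start a calculation, queue an unrelated task on the event loop, and
// report the order in which the two finished.
function runAlongside(fibFn, n, callback) {
  var order = [];
  var pending = 2;
  function finish(label) {
    order.push(label);
    pending -= 1;
    if (pending === 0) callback(order);
  }
  fibFn(n, function (value) { finish('fibonacci = ' + value); });
  setImmediate(function () { finish('other work'); });
}

runAlongside(fibonacciNextTick, 15, console.log);
// order: 'fibonacci = 610', then 'other work' (the nextTick chain ran first)
runAlongside(fibonacciImmediate, 15, console.log);
// order: 'other work', then 'fibonacci = 610' (other work got a turn)
```

The setImmediate version is far slower in raw terms, since every step pays for a trip through the event loop; the benefit is that the server stays responsive while it runs.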

Dziuba's valid point wasn't expressed well in his blog post, and was somewhat lost in the flames that followed. His point was that while Node is a great platform for I/O-bound applications, it isn't a good platform for computationally intensive ones.

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Server utilization, the bottom line, and green web hosting

Striving for optimal efficiency (handling more requests per second) is not just about the geeky satisfaction that comes from optimization. There are real business and environmental benefits. Handling more requests per second, as Node servers can do, means the difference between buying lots of servers and buying only a few. Node can let your organization do more with less.

Roughly speaking, the more servers one buys, the greater the cost and the greater the environmental impact. There's a whole field of expertise around reducing the cost and environmental impact of running web server facilities, to which that rough guideline doesn't do justice. The goal is fairly obvious: fewer servers, lower costs, and lower environmental impact.

Intel's paper, Increasing Data Center Efficiency with Server Power Measurements (http://download.intel.com/it/pdf/Server_Power_Measurement_final.pdf), gives an objective framework for understanding efficiency and data center costs. There are many factors, such as building, cooling system, and computer system design. Efficient building design, efficient cooling systems, and efficient computer systems (datacenter efficiency, datacenter density, and storage density) can decrease costs and environmental impact. But you can destroy those gains by deploying an inefficient software stack that compels you to buy more servers than an efficient stack would require. Alternatively, you can amplify gains from datacenter efficiency with an efficient software stack.

This talk about efficient software stacks isn't just for altruistic environmental purposes. This is one of those cases where being green can help your business bottom line.
