Node Web Development

By David Herron

About this book

Node is an exciting new technology stack that brings JavaScript to the server side of web applications. Node means that JavaScript is no longer just for browsers. It's for web application development, it's for developing any internet protocol, it's for the real-time web, it's for command line scripts, and much more.

Node Web Development gives you an excellent starting point straight into the heart of developing server-side web applications with Node. You will learn, through practical examples, how to use the HTTP Server and Client objects, the Connect and Express application frameworks, the algorithms for asynchronous execution, and both SQL and MongoDB databases.

This book is the ideal companion for getting started with Node. Starting with practical advice on installing Node for both development and application deployment, you will learn how to develop both HTTP Server and Client applications. Many different ways of working with Node are shown, including using database storage engines in applications and developing websites both with and without the Connect/Express web application framework. You will also get an introduction to Node’s CommonJS module system allowing you to implement an important subset of object-oriented design.

Publication date:
August 2011
Publisher
Packt
Pages
172
ISBN
9781849515146

 

Chapter 1. What is Node?

Node is an exciting new platform for developing web applications, application servers, any sort of network server or client, and general purpose programming. It is designed for extreme scalability in networked applications through an ingenious combination of asynchronous I/O, server-side JavaScript, smart use of JavaScript anonymous functions, and a single execution thread event-driven architecture.

The Node model is very different from common application server platforms that scale using threads. The claim is that, because of the event-driven architecture, memory footprint is low, throughput is high, and the programming model is simpler. The Node platform is in a phase of rapid growth, and many are seeing it as a compelling alternative to the traditional—Apache, PHP, Python, and so on—approach to building web applications.

At heart it is a standalone JavaScript virtual machine, with extensions making it suitable for general purpose programming, and with a clear focus on application server development. The Node platform isn't directly comparable to programming languages frequently used for developing web applications (PHP, Python, Ruby, Java, and so on), neither is it directly comparable to the containers which deliver the HTTP protocol to web clients (Apache, Tomcat, Glassfish, and so on). At the same time, many regard it as potentially supplanting the traditional web applications development stacks.

It is implemented around a non-blocking I/O event loop and a layer of file and network I/O libraries, all built on top of the V8 JavaScript engine (from the Chrome web browser). The I/O library is general enough to implement any sort of server implementing any TCP or UDP protocol, whether it's DNS, HTTP, IRC, FTP, and so on. While it supports developing servers or clients for any network protocol, the biggest use case is regular websites where you're replacing things like an Apache/PHP or Rails stack.

This book will give you an introduction to Node. We presume that you already know how to write software, are familiar with JavaScript, and know something about developing web applications in other languages. We will dive right into developing working applications and recognize that often the best way to learn is by rummaging around in working code.

 

What can you do with Node?


Node is a platform for writing JavaScript applications outside web browsers. This is not the JavaScript we are familiar with in web browsers. There is no DOM built into Node, nor any other browser capability. With the JavaScript language and the asynchronous I/O framework, it is a powerful application development platform.

One thing Node cannot do is desktop GUI applications. Today, there is no equivalent for Swing (or SWT if you prefer) built into Node, nor is there a Node add-on GUI toolkit, nor can it be embedded in a web browser. If a GUI toolkit were available Node could be used to build desktop applications. Some projects have begun to create GTK bindings for Node, which would provide a cross-platform GUI toolkit. The V8 engine used by Node brings along with it an extension API, allowing one to incorporate C/C++ code, to extend JavaScript or to integrate with native code libraries.

Beyond its native ability to execute JavaScript, the bundled modules provide capabilities of this sort:

  • Command-line tools (in shell script style)

  • Interactive-TTY style of program (REPL or Read-Eval-Print Loop)

  • Excellent process control functions to oversee child processes

  • A Buffer object to deal with binary data

  • TCP or UDP sockets with comprehensive event driven callbacks

  • DNS lookup

  • An HTTP and HTTPS client/server, layered on top of the TCP library

  • File system access

  • Built-in rudimentary unit testing support through assertions

The network layer of Node is low level while being simple to use. For example, the HTTP modules allow you to write an HTTP server (or client) in a few lines of code, but that layer puts you, the programmer, very close to the protocol requests and makes you implement precisely which HTTP headers will be returned in responding to requests. Where a PHP programmer generally doesn't care about the headers, a Node programmer does.
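
To make the point about headers concrete, here is a minimal sketch of a request handler in which the programmer decides the status code and every header explicitly. The handler name, the JSON body, and the particular headers are assumptions chosen for illustration, not something prescribed by Node.

```javascript
var http = require('http');

// Hypothetical handler: the programmer chooses the status code and headers.
function handleRequest(req, res) {
  if (req.method !== 'GET') {
    // Tell the client which methods are accepted.
    res.writeHead(405, { 'Content-Type': 'text/plain', 'Allow': 'GET' });
    res.end('Method Not Allowed\n');
    return;
  }
  var body = JSON.stringify({ url: req.url });
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body),
    'Cache-Control': 'no-cache'
  });
  res.end(body);
}

// Creating the server does not bind a port; call
// server.listen(8124, '127.0.0.1') to start serving requests.
var server = http.createServer(handleRequest);
```

Keeping the handler as a named function means it can be exercised with stand-in request and response objects before it is ever attached to a listening server.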

In other words, it's very easy to write an HTTP server in Node, but the typical web application developer doesn't need to work at that level of detail. For example, PHP coders assume Apache is already there, and that they don't have to implement the HTTP server portion of the stack. The Node community has developed a wide range of web application frameworks like Connect, allowing developers to quickly configure an HTTP server that provides all of the basics we've come to expect—sessions, cookies, serving static files, logging, and so on—thus letting developers focus on their business logic.

Server-side JavaScript

Quit scratching your head already. Of course you're doing it, scratching your head and mumbling to yourself, "What's a browser language doing on the server?" In truth, JavaScript has a long and largely unknown history outside the browser. JavaScript is a programming language, just like any other language, and the better question to ask is "Why should JavaScript remain trapped inside browsers?"

Back in the dawn of the Web age, the tools for writing web applications were at a fledgling stage. Some were experimenting with Perl or TCL to write CGI scripts, the PHP and Java languages had just been developed, and even JavaScript was being used in the server side. One early web application server was Netscape's LiveWire server, which used JavaScript. Some versions of Microsoft's ASP used JScript, their version of JavaScript. A more recent server-side JavaScript project is the RingoJS application framework in the Java universe. It is built on top of Rhino, a JavaScript implementation written in Java.

Node brings to the table a combination never seen before: the coupling of fast event-driven I/O and V8, the ultra-fast JavaScript engine at the heart of Google's Chrome web browser.

 

Why should you use Node?


The JavaScript language is very popular due to its ubiquity in web browsers. It compares favorably against other languages while having many modern advanced language concepts. Thanks to its popularity there is a deep talent pool of experienced JavaScript programmers out there.

It is a dynamic programming language with loosely typed and dynamically extendable objects that can be informally declared as needed. Functions are first-class objects, routinely used as anonymous closures. This makes JavaScript more powerful than some other languages commonly used for web applications. In theory these features make developers more productive. To be fair, the debate between dynamic and non-dynamic languages, or between statically typed and loosely typed, is not settled and may never be settled.

One of the main disadvantages of JavaScript is the Global Object. All of the top-level variables are tossed together in the Global Object, which can create an unruly chaos when mixing modules together. Since web applications tend to have a lot of objects, probably coded by multiple organizations, one may think programming in Node will be a minefield of conflicting global objects. Instead, Node uses the CommonJS module system, meaning that variables local to a module are truly local to the module, even if they look like global variables. This clean separation between modules prevents the Global Object problem from being a problem.
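
The following sketch shows that separation in action. To keep it in a single file, it writes the same small module source to two temporary files (the names a.js and b.js are hypothetical) and loads both; each copy gets its own count variable, and neither leaks into the global scope. It assumes a Node version that provides fs.mkdtempSync.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write the same tiny module to two temporary files (names are hypothetical).
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'modules-demo-'));
var source = [
  'var count = 0; // top-level, yet local to this module',
  'exports.next = function () { return ++count; };'
].join('\n');
var fileA = path.join(dir, 'a.js');
var fileB = path.join(dir, 'b.js');
fs.writeFileSync(fileA, source);
fs.writeFileSync(fileB, source);

var a = require(fileA);
var b = require(fileB);

// Loaded modules stay cached, so the temporary files can be removed now.
fs.unlinkSync(fileA);
fs.unlinkSync(fileB);
fs.rmdirSync(dir);

a.next();
a.next();
var fromA = a.next();   // third call on a's private counter
var fromB = b.next();   // first call on b's private counter

console.log(fromA, fromB, typeof count);
```

Run as an ordinary CommonJS script, this should print 3 1 undefined: the two counters advance independently, and no variable named count exists outside the modules.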

Having the same programming language on server and client has been a long-time dream on the Web. This dream dates back to the early days of Java, where Applets were to be the frontend to server applications written in Java, and JavaScript was originally envisioned as a lightweight scripting language for Applets. Something fell down along the way, and we ended up with JavaScript as the principal in-browser client-side language, rather than Java. With Node we may finally be able to implement that dream of the same programming language on client and server, with JavaScript at both ends of the Web, in the browser and server.

A common language for frontend and backend offers several potential wins:

  • The same programming staff can work on both ends of the wire

  • Code can be migrated between server and client more easily

  • Common data formats (JSON) between server and client

  • Common software tools for server and client

  • Common testing or quality reporting tools for server and client

  • When writing web applications, view templates can be used on both sides

  • A shared vocabulary between server and client teams

Node facilitates implementing all these positive benefits (and more) with a compelling platform and development community.

Architecture: Threads versus asynchronous event-driven

The asynchronous event-driven architecture of Node is said to be the cause of its blistering performance. Well, that and the V8 JavaScript engine. The normal application server model uses blocking I/O and threads for concurrency. Blocking I/O causes threads to wait, causing churn between threads as they are forced to wait on I/O while the application server handles requests.

Node has a single execution thread with no waiting on I/O or context switching. Instead, I/O calls set up request handling functions that work with the event loop to dispatch events when something becomes available. The event loop and event handler model is a common one; JavaScript execution in a web browser works the same way. Program execution is expected to quickly return to the event loop for dispatching the next immediately runnable task.
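
A small sketch can make the dispatch order visible. Here setTimeout stands in for an I/O call (an assumption made so the example needs no network or disk): the handler is registered, the call returns at once, and the handler only runs after the script has finished and control is back in the event loop.

```javascript
var order = [];

// Register a handler; the call returns immediately without running it.
setTimeout(function () {
  order.push('callback');
  console.log(order.join(' -> '));
}, 0);

order.push('after scheduling');

// Synchronous code always finishes before the event loop dispatches callbacks.
order.push('end of script');
```

Even with a delay of zero, the callback is recorded last, because nothing is dispatched until the currently running code returns.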

To help us wrap our heads around this, Ryan Dahl (in his "Cinco de Node" presentation) asked us what happens while executing code like this:

result = query('SELECT * from db');

Of course, the program pauses at that point while the database layer sends the query to the database, which determines the result and returns the data. Depending on the query, that pause can be quite long. This is bad because while the entire thread is idling another request might come in, and if all the threads are busy (remember, computers have finite resources) it will be dropped. That looks like quite a waste. Context switching is not free either: the more threads we use, the more time the CPU spends storing and restoring state. Furthermore, the execution stack for each thread takes up memory. Simply by using asynchronous, event-driven I/O, Node removes most of this overhead while introducing very little of its own.

Frequently the implementation of concurrency with threads comes with admonitions like these: "expensive and error-prone", "the error-prone synchronization primitives of Java", or "designing concurrent software can be complex and error-prone" (actual quotes from actual search engine results). The complexity comes from the access to shared variables and various strategies to avoid deadlock and competition between threads. The "synchronization primitives of Java" are an example of such a strategy, and obviously many programmers find them hard to use. Frameworks like java.util.concurrent have been created to tame the complexity of threaded concurrency, but some might argue that papering over complexity does not make things simpler.

Node asks us to think differently about concurrency. Callbacks fired asynchronously from an event loop are a much simpler concurrency model, simpler to understand, and simpler to implement.

Ryan Dahl points to the relative access time of objects to understand the need for asynchronous I/O. Objects in memory are more quickly accessed (on the order of nanoseconds) than objects on disk or objects retrieved over the network (milliseconds or seconds). The longer access time for external objects is measured in the zillions of clock cycles, which can be an eternity when your customer is sitting at their web browser ready to be bored and move on if it takes longer than two seconds to load the page.

In Node, the query discussed previously would read like the following:

query('SELECT * from db', function (result) {
  // operate on result
});

This code makes the same query written earlier. The difference is that the query result is not the return value of the function call, but is provided to a callback function that will be called later. The call returns almost immediately to the event loop, and the server can go on to servicing other requests. One of the events it later dispatches will be the response to the query, which invokes the callback function. This model of quickly returning to the event loop ensures higher server utilization. That's great for the owner of the server, but there's an even bigger gain: page content can be constructed more quickly, which the user experiences as a faster page.

Commonly web pages bring together data from dozens of sources. Each one has a query and response as discussed earlier. By using asynchronous queries each one can happen in parallel, where the page construction function can fire off dozens of queries—no waiting, each with their own callback—then go back to the event loop, invoking the callbacks as each is done. Because it's in parallel the data can be collected much more quickly than if these queries were done synchronously one at a time. Now the reader on their web browser is happier because the page loads more quickly.
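
The pattern described above can be sketched as follows. The query function here is a hypothetical stand-in that answers after a fixed delay, and the SQL strings and delays are invented for the example; a real database driver would have its own API. The helper starts every query at once and calls back when the last answer arrives.

```javascript
// Hypothetical stand-in for a database call: answers after `delayMs`.
function query(sql, delayMs, callback) {
  setTimeout(function () {
    callback('rows for ' + sql);
  }, delayMs);
}

// Start every query at once, then call `done` when the last one answers.
function queryAll(requests, done) {
  var results = [];
  var pending = requests.length;
  if (pending === 0) { return done(results); }
  requests.forEach(function (request, i) {
    query(request.sql, request.delayMs, function (result) {
      results[i] = result;          // keep results in request order
      pending -= 1;
      if (pending === 0) { done(results); }
    });
  });
}

var started = Date.now();
queryAll([
  { sql: 'SELECT * FROM users', delayMs: 30 },
  { sql: 'SELECT * FROM posts', delayMs: 20 },
  { sql: 'SELECT * FROM tags', delayMs: 10 }
], function (results) {
  // Elapsed time is close to the slowest query (about 30 ms),
  // not the 60 ms sum that one-at-a-time execution would need.
  console.log(results.length + ' results in ' + (Date.now() - started) + ' ms');
});
```

Counting outstanding callbacks is the simplest way to detect completion; it works here because all callbacks run on the single execution thread, so the counter needs no locking.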

Performance and utilization

Some of the excitement over Node is due to its throughput (the requests per second it can serve). Comparative benchmarks of similar applications, for example on Apache and on Node, show Node with tremendous performance gains.

One benchmark going around is this simple HTTP server, which simply returns a "Hello World" message, directly from memory:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
console.log('Server running at http://127.0.0.1:8124/');

This is one of the simpler web servers one can build with Node. The http object encapsulates the HTTP protocol and its http.createServer method creates a whole web server, listening on the port specified in the .listen method. Every request (whether a GET or PUT on any URL) on that web server calls the provided function. It is very simple and lightweight. In this case, regardless of the URL, it returns a simple text/plain "Hello World" response.

Because of its minimal nature, this simple application should demonstrate the maximum request throughput of Node. Indeed many have published benchmark studies starting from this simplest of HTTP servers.

Ryan Dahl (Node's original author) showed a simple benchmark (http://nodejs.org/cinco_de_node.pdf) which returned a 1 megabyte binary buffer; Node gave 822 req/sec, while nginx gave 708 req/sec. He also noted that nginx peaked at 4 megabytes of memory, while Node peaked at 64 megabytes.

Dustin McQuay (http://www.synchrosinteractive.com/blog/9-nodejs/22-nodejs-has-a-bright-future) showed what he claimed were similar Node and PHP/Apache programs:

  • PHP/Apache 3187 requests/second

  • Node.js 5569 requests/second

Hannes Wallnöfer, the author of RingoJS, wrote a blog post in which he cautioned against making important decisions based on benchmarks (http://hns.github.com/2010/09/21/benchmark.html), and then went on to use benchmarks to compare RingoJS with Node. RingoJS is an app server built around the Rhino JavaScript engine for Java. Depending on the scenario, the performance of RingoJS and Node is not so far apart. The findings show that on applications with rapid buffer or string allocation, Node performs worse than RingoJS. In a later blog post (http://hns.github.com/2010/09/29/benchmark2.html) he used a JSON string parsing workload to simulate a common task, and found RingoJS to be much better.

Mikito Takada blogged about benchmarking and performance improvements in a "48 hour hackathon" application he built (http://blog.mixu.net/2011/01/17/performance-benchmarking-the-node-js-backend-of-our-48h-product-wehearvoices-net/) comparing Node with what he claims is a similar application written with Django. The unoptimized Node version is quite a bit slower (in response time) than the Django version, but a few optimizations (MySQL connection pooling, caching, and so on) made drastic performance improvements, handily beating out Django. The final performance graph shows the application achieving nearly the requests/second rate of the simple "Hello World" benchmark discussed earlier.

A key realization about Node performance is the need to quickly return to the event loop. We go over this in more detail in Chapter 4, Variations on a Simple Application, but if a callback handler takes "too long" to execute, it will prevent Node from being the blisteringly fast server it was designed to be. In one of Ryan Dahl's earliest blog posts about the Node project (four.livejournal.com/963421.html) he discussed a requirement that event handlers execute within 5ms. Most of the ideas in that post were never implemented, but Alex Payne wrote an intriguing blog post on this (http://al3x.net/2010/07/27/node.html), drawing a distinction between "scaling in the small" and "scaling in the large".
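
One common way to keep a callback from taking "too long" is to split a long computation into slices and hand control back to the event loop between them. The sketch below is one possible approach, not the method used later in the book; the function name and chunk size are illustrative, and setImmediate is assumed to be available (it is in current Node releases, but was not in the earliest ones).

```javascript
// Sum the integers 0..n-1 in slices so no single callback runs for long.
function sumInChunks(n, chunkSize, done) {
  var total = 0;
  var i = 0;
  function runChunk() {
    var end = Math.min(i + chunkSize, n);
    for (; i < end; i++) {
      total += i;
    }
    if (i < n) {
      setImmediate(runChunk);   // yield to the event loop, then continue
    } else {
      done(total);
    }
  }
  runChunk();
}

// Other callbacks, such as this timer, get a chance to run between chunks.
setTimeout(function () { console.log('timer fired'); }, 0);
sumInChunks(1000000, 10000, function (total) {
  console.log('sum =', total);
});
```

The trade-off is a little scheduling overhead per chunk in exchange for a server that keeps answering other requests while the computation proceeds.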

Small-scale web applications ("scaling in the small") should have performance and implementation advantages when written for Node instead of the 'P' languages (Perl, PHP, Python, and so on) normally used. JavaScript is a powerful language, and the Node environment with its modern fast virtual machine design offers performance and concurrency advantages over interpreted languages like PHP.

He goes on to argue that "scaling in the large", enterprise-level applications, will always be hard and complex. One typically throws in load balancers, caching servers, multiple redundant machines, in geographically dispersed locations, to serve zillions of users from around the world with a fast web browsing experience. Perhaps the application development platform isn't as important as the whole system.

We won't know how well Node really fits in until it sees real long-term deployment in significant production environments.

Server utilization, the bottom line, and green web hosting

The striving for optimal efficiency (handling more requests/second) is not just about the geeky satisfaction that comes from optimization. There are real business and environmental benefits. Handling more requests per second, as Node servers can do, means the difference between buying lots of servers and buying only a few servers. Essentially the advantage is in doing more with less.

Roughly speaking, the more servers one buys, the greater the cost, and the greater the environmental impact, and likewise buying fewer servers means lower cost and lower environmental impact. There's a whole field of expertise around reducing cost and environmental impact of running web server facilities, which that rough guideline doesn't do justice to. The goal is fairly obvious, fewer servers, lower costs, and lower environmental impact.
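
As a back-of-the-envelope illustration of that guideline, the sketch below turns per-server throughput into a server count. The two throughput figures are the ones from the Dustin McQuay comparison quoted earlier in this chapter; the peak load of 100,000 requests per second is an invented number, and real capacity planning would involve far more than one simple benchmark.

```javascript
// Requests/second per server, from the Dustin McQuay comparison quoted earlier.
var throughput = { 'PHP/Apache': 3187, 'Node.js': 5569 };

// Assumed peak load; this figure is made up for the illustration.
var peakRequestsPerSecond = 100000;

// Round up, since a fraction of a server still means buying a whole one.
function serversNeeded(peak, perServer) {
  return Math.ceil(peak / perServer);
}

Object.keys(throughput).forEach(function (stack) {
  var count = serversNeeded(peakRequestsPerSecond, throughput[stack]);
  console.log(stack + ': ' + count + ' servers');
});
```

Under these assumptions the count drops from 32 servers to 18, which is the "doing more with less" argument in numeric form.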

Intel's paper "Increasing Data Center Efficiency with Server Power Measurements" (http://download.intel.com/it/pdf/Server_Power_Measurement_final.pdf) gives an objective framework for understanding efficiency and data center costs. There are many factors such as building, cooling system, and computer system design. Efficient building design, efficient cooling systems, and efficient computer systems (Datacenter Efficiency, Datacenter Density, and Storage Density) can decrease costs and environmental impact. But you can destroy those gains by deploying an inefficient software stack which compels you to buy more servers than if you had an efficient software stack, or you can amplify gains from datacenter efficiency with an efficient software stack.

 

Spelling: Node, Node.js, or Node.JS?


The name of the platform is Node.js, but throughout this book we spell it as Node. We are following a cue from the nodejs.org website, which says the trademark is Node.js (lower case .js) but spells it as Node throughout the site.

 

Summary


We've learned a lot in this chapter, specifically:

  • That JavaScript has a life outside web browsers

  • The difference between asynchronous and blocking I/O

  • A look at Node

  • Node performance

Now that we've had this introduction to Node, we're ready to dive in and start using it. In Chapter 2, Setting up Node, we'll go over setting up a Node environment, so let's get started.

About the Author

  • David Herron

    David Herron is a software engineer living in Silicon Valley who has worked on projects ranging from an X.400 email server to being part of the team that launched the OpenJDK project, to Yahoo's Node.js application-hosting platform, and a solar array performance monitoring service. That took David through several companies until he grew tired of communicating primarily with machines, and developed a longing for human communication. Today, David is an independent writer of books and blog posts covering topics related to technology, programming, electric vehicles, and clean energy technologies.
