Exploring Performance Issues in Node.js/Express Applications

Node.js is an exciting new platform for developing web applications, application servers, any sort of network server or client, and general purpose programs. It is designed for extreme scalability in networked applications through an ingenious combination of server-side JavaScript, asynchronous I/O, asynchronous programming built around JavaScript anonymous functions, and a single-threaded, event-driven architecture.

Companies large and small are adopting Node.js; PayPal, for example, is converting its application stack over to Node.js. An up-and-coming application model, the MEAN stack, combines MongoDB (or MySQL) with Express, AngularJS and, of course, Node.js. A look through current job listings demonstrates how important the MEAN stack and Node.js in general have become.

It's claimed that Node.js is a lean, low-overhead, software platform. The excellent performance is supposedly because Node.js eschews the accepted wisdom of more traditional platforms, such as JavaEE and its complexity. Instead of relying on a thread-oriented architecture to fill the CPU cores of the server, Node.js has a simple single-threaded architecture, avoiding the overhead and complexity of threads.

Using threads to implement concurrency often comes with admonitions such as "expensive and error-prone," "the error-prone synchronization primitives of Java," or "designing concurrent software can be complex and error prone." The complexity comes from access to shared variables and the various strategies needed to avoid deadlock and contention between threads. The synchronization primitives of Java are an example of such a strategy, and obviously many programmers find them hard to use. There's a tendency to create frameworks, such as java.util.concurrent, to tame the complexity of threaded concurrency, but some might argue that papering over complexity does not make things simpler.

Adopting Node.js is not a magic wand that will instantly make performance problems disappear forever. The development team must approach this intelligently, or else you'll end up with one core on the server running flat out and the other cores twiddling their thumbs. Your manager will want to know how you're going to fully utilize the server hardware. And, because Node.js is single-threaded, your code must return from event handlers quickly, or else your application will be frequently blocked and will perform poorly. Your manager will want to know how you'll deliver the promised high transaction rate.

In this article by David Herron, author of the book Node.JS Web Development - Third Edition, we will explore this issue. We'll write a program with an artificially heavy computational load. The naive Fibonacci function we'll use is elegantly simple, but is extremely recursive and can take a long time to compute.


Node.js installation

Before launching into writing code, we need to install Node.js on our laptop. Proceed to the Node.js downloads page by going to http://nodejs.org/ and clicking on the downloads link.

It's preferable if you can install Node.js from the package management system for your computer. While the Downloads page offers precompiled binary Node.js packages for popular computer systems (Windows, Mac OS X, Linux, and so on), installing from the package management system makes it easier to update the install later. The Downloads page has a link to instructions for using package management systems to install Node.js.

Once you've installed Node.js, you can quickly test it by running a couple of commands. The first prints out helpful information about using the Node.js command-line tool:

$ node --help

The second prints out help for npm:

$ npm help

npm is the default package management system for Node.js, and is automatically installed along with Node.js. It lets us download Node.js packages over the Internet and use them as the building blocks for our applications.

Next, let's create a directory in which to develop an Express application that calculates Fibonacci numbers:

$ mkdir fibonacci
$ cd fibonacci
$ npm install express-generator@4.x
$ ./node_modules/.bin/express . --ejs
$ npm install

The application will be written against the current Express version, version 4.x. Specifying the version number this way ensures compatibility.

The express command generated a starting application for us. You can inspect the package.json file to see what will be installed; the last command, npm install, installs those packages. What we'll have in front of us is a minimal Express application.

Our first stop is not to create an Express application, but to gather some basic data about computation-dominant code in Node.js.

Heavy-weight computation

Let's start the exploration by creating a Node.js module named math.js, containing:

var fibonacci = exports.fibonacci = function(n) {
  if (n === 1) return 1;
  else if (n === 2) return 1;
  else return fibonacci(n-1) + fibonacci(n-2);
}

Then, create another file named fibotimes.js containing this:

var math = require('./math');
var util = require('util');
for (var num = 1; num < 80; num++) {
  util.log('Fibonacci for '+ num +' = '+ math.fibonacci(num));
}

Running this script produces the following output:

$ node fibotimes.js
31 Jan 14:41:28 - Fibonacci for 1 = 1
31 Jan 14:41:28 - Fibonacci for 2 = 1
31 Jan 14:41:28 - Fibonacci for 3 = 2
31 Jan 14:41:28 - Fibonacci for 4 = 3
31 Jan 14:41:28 - Fibonacci for 5 = 5
31 Jan 14:41:28 - Fibonacci for 6 = 8
31 Jan 14:41:28 - Fibonacci for 7 = 13
…
31 Jan 14:42:27 - Fibonacci for 38 = 39088169
31 Jan 14:42:28 - Fibonacci for 39 = 63245986
31 Jan 14:42:31 - Fibonacci for 40 = 102334155
31 Jan 14:42:34 - Fibonacci for 41 = 165580141
31 Jan 14:42:40 - Fibonacci for 42 = 267914296
31 Jan 14:42:50 - Fibonacci for 43 = 433494437
31 Jan 14:43:06 - Fibonacci for 44 = 701408733

This quickly calculates the first 40 or so members of the Fibonacci sequence. After the 40th member, it starts taking a couple of seconds per result and quickly degrades from there. It is untenable to execute code of this sort on a single-threaded system that relies on a quick return to the event loop.
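To see why the running time grows so quickly, it helps to count how many function calls the naive recursion makes. The following is a minimal sketch and not part of the application; countCalls is a hypothetical helper that mirrors the structure of the fibonacci function and tallies invocations instead of adding values:

```javascript
// Hypothetical helper: counts how many calls the naive recursive
// fibonacci(n) from math.js would make, using the same base cases.
function countCalls(n) {
  if (n === 1 || n === 2) return 1;   // a base case costs a single call
  return 1 + countCalls(n - 1) + countCalls(n - 2);
}

[10, 20, 30].forEach(function(n) {
  console.log('fibonacci(' + n + ') makes ' + countCalls(n) + ' calls');
});
```

The count works out to 2 * fibonacci(n) - 1, so the work grows by a factor of roughly 1.6 with each step of n. That is consistent with the steadily lengthening gaps between the timestamps shown above.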

That's an important point because the Node.js design requires that event handlers quickly return to the event loop. The single-thread event-loop does everything in Node.js and event handlers that return quickly to the event loop keep it humming. A correctly written application can sustain a tremendous request throughput, but a badly written application can prevent Node.js from fulfilling that promise.

This Fibonacci function demonstrates algorithms that churn away at their calculation without ever letting Node.js process the event loop. Calculating fibonacci(44) requires 16 seconds of calculation, which is an eternity for a modern web service. With any server that's bogged down like this, not processing events, the perceived performance is zilch. Your manager will be rightfully angry.
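The effect on pending events can be observed without any web server. The following sketch, which uses only Node.js built-ins, schedules a timer and then keeps the thread busy; measureTimerDelay is a hypothetical helper, and the busy-wait loop is an assumed stand-in for any long synchronous computation such as fibonacci(44):

```javascript
// Hypothetical helper: schedules a timer for "as soon as possible", then
// keeps the single thread busy for blockMs milliseconds. The callback
// receives how long the timer actually had to wait.
function measureTimerDelay(blockMs, done) {
  var scheduled = Date.now();
  setTimeout(function() {
    done(Date.now() - scheduled);
  }, 0);
  // Busy-wait: a stand-in for any long synchronous computation.
  while (Date.now() - scheduled < blockMs) { /* spin */ }
}

measureTimerDelay(200, function(delay) {
  console.log('timer requested for 0 ms, fired after ' + delay + ' ms');
});
```

The timer cannot fire until the synchronous code returns, so the reported delay is at least as long as the blocking period. An incoming HTTP request waits in the same way.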

This is a completely artificial example, because it's trivial to refactor the Fibonacci calculation for excellent performance. This is a stand-in for any algorithm that might monopolize the event loop.

There are two general ways in Node.js to solve this problem:

  • Algorithmic refactoring: Perhaps, like the Fibonacci function we chose, one of your algorithms is suboptimal and can be rewritten to be faster. Or, if not faster, it can be split into callbacks dispatched through the event loop. We'll look at one such method in a moment.
  • Creating a backend service: Can you imagine a backend server dedicated to calculating Fibonacci numbers? Okay, maybe not, but it's quite common to implement backend servers to offload work from frontend servers, and we will implement a backend Fibonacci server at the end of this article.

But first, we need to set up an Express application that demonstrates the impact on the event loop.

An Express app to calculate Fibonacci numbers

To see the impact of a computation-heavy application on Node.js performance, let's write a simple Express application to do Fibonacci calculations. Express is a key Node.js technology, so this will also give you a little exposure to writing an Express application.

We've already created the blank application, so let's make a couple of small changes, so it uses our Fibonacci algorithm.

Edit views/index.ejs to have this code:

<!DOCTYPE html>
<html>
<head>
<title><%= title %></title>
<link rel='stylesheet' href='/stylesheets/style.css' />
</head>
<body>
<h1><%= title %></h1>
<% if (typeof fiboval !== "undefined") { %>
<p>Fibonacci for <%= fibonum %> is <%= fiboval %></p>
<hr/>
<% } %>
<p>Enter a number to see its Fibonacci number</p>
<form name='fibonacci' action='/' method='get'>
<input type='text' name='fibonum' />
<input type='submit' value='Submit' />
</form>
</body>
</html>

This simple template sets up an HTML form where we can enter a number. This number designates the desired member of the Fibonacci sequence to calculate.

This is written for the EJS template engine. You can see that <%= variable %> substitutes the named variable into the output, and JavaScript code is written in the template by enclosing it within <% %> delimiters. We use that to optionally print out the requested Fibonacci value if one is available.

Next, change routes/index.js to contain the following:

var express = require('express');
var router = express.Router();
var math = require('../math');
router.get('/', function(req, res, next) {
  if (req.query.fibonum) {
    res.render('index', {
      title: "Fibonacci Calculator",
      fibonum: req.query.fibonum,
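      // Query values arrive as strings; convert to a number first so the
      // n === 1 and n === 2 base cases in math.fibonacci can match.
      fiboval: math.fibonacci(Math.floor(req.query.fibonum))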
    });
  } else {
    res.render('index', {
      title: "Fibonacci Calculator", fiboval: undefined
    });
  }
});
module.exports = router;

This router definition handles the home page for the Fibonacci calculator. The router.get function means this route handles HTTP GET operations on the / URL.

If the req.query.fibonum value is set, that means the URL had a ?fibonum=# value which would be produced by the form in index.ejs. If that's the case, the fiboval value is calculated by calling math.fibonacci, the function we showed earlier.
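One detail worth keeping in mind is that values taken from a query string are strings, not numbers, so a strict comparison such as n === 1 never matches until the value is converted. The sketch below illustrates this with the built-in URL class rather than Express itself, which has its own query parser; parseFibonum is a hypothetical helper, and rejecting negative or non-numeric input is an assumption about what the application should accept:

```javascript
// Hypothetical helper: extracts ?fibonum=... from a URL and converts it to
// a non-negative integer, or returns undefined when it is missing or invalid.
function parseFibonum(requestUrl) {
  var raw = new URL(requestUrl).searchParams.get('fibonum'); // string or null
  if (raw === null || raw.trim() === '') return undefined;
  var n = Math.floor(Number(raw));
  return (Number.isFinite(n) && n >= 0) ? n : undefined;
}

var raw = new URL('http://localhost:3000/?fibonum=10').searchParams.get('fibonum');
console.log(typeof raw);       // the raw query value is a string
console.log(raw === 10);       // so a strict comparison with a number fails
console.log(parseFibonum('http://localhost:3000/?fibonum=10') === 10);
```

Converting once at the edge of the application, as the Math.floor call in the REST examples later in this article does, keeps the math functions free of string handling.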

Because that function is so slow, we can safely predict performance problems when requesting larger Fibonacci values.

On the res.render calls, the second argument is an object defining variables that will be made available to the index.ejs template. Notice how the two res.render calls differ in the values passed to the template, and how the template will differ as a result.

There are no changes required in app.js. You can study that file, and bin/www, if you're curious how Express applications work. In the meantime, running the application is simple:

$ npm start

> fibonacci@0.0.0 start /Users/david/fibonacci
> node ./bin/www

Visit http://localhost:3000 in the browser to see the Fibonacci calculator form.

For small Fibonacci values, the result will return quickly. As implied by the timing results we looked at earlier, at around the 40th Fibonacci number, it'll take a few seconds to calculate the result. The 50th Fibonacci number will take 20 minutes or so. That's enough time to run a little experiment.

Open two browser windows onto http://localhost:3000.

You'll see the Fibonacci calculator in each window. In one, request the value for 45 or more. In the other, enter 10, which we know would return almost immediately in normal circumstances. Instead, the second window won't respond until the first one finishes. Unless, that is, your browser times out and throws an error.

What's happening is the Node.js event loop is blocked from processing events because the Fibonacci algorithm is running and does not ever yield to the event loop. As soon as the Fibonacci calculation finishes, the event loop starts being processed again. It then receives and processes the request made from the second window.

Algorithmic refactoring

The problem here is that the application stops processing events. We might solve the problem by ensuring events are handled while still performing calculations. In other words, let's look at algorithmic refactoring.

To prove that we have an artificial problem on our hands, add this function to math.js:

var fibonacciLoop = exports.fibonacciLoop = function(n) {
    var fibos = [];
    fibos[0] = 0;
    fibos[1] = 1;
    fibos[2] = 1;
    for (var i = 3; i <= n; i++) {
        fibos[i] = fibos[i-2] + fibos[i-1];
    }
    return fibos[n];
}

Change fibotimes.js to call this function, and the Fibonacci values will fly by so fast your head will spin.
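One caveat, offered as a side note rather than as part of the original code: JavaScript numbers are double-precision floats, so Fibonacci values beyond the 78th exceed Number.MAX_SAFE_INTEGER and are no longer exact, and fibotimes.js loops up to 79. A minimal sketch of an exact variant, assuming a Node.js release with BigInt support; fibonacciBig is a hypothetical name:

```javascript
// Hypothetical variant of fibonacciLoop that stays exact for large n by
// using BigInt, and keeps only the last two values instead of an array.
function fibonacciBig(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError('n must be a non-negative integer');
  }
  var prev = 0n;   // F(0)
  var curr = 1n;   // F(1)
  for (var i = 0; i < n; i++) {
    var next = prev + curr;
    prev = curr;
    curr = next;
  }
  return prev;     // after n steps, prev holds F(n)
}

console.log('Fibonacci for 79 = ' + fibonacciBig(79));
console.log('Fibonacci for 100 = ' + fibonacciBig(100));
```

BigInt values cannot be mixed with ordinary numbers in arithmetic, and JSON.stringify does not serialize them, so a service returning such values would need to convert them to strings first.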

Some algorithms aren't as simple to optimize as this. For such a case, it is possible to divide the calculation into chunks and then dispatch the computation of those chunks through the event loop. Consider the following code:

var fibonacciAsync = exports.fibonacciAsync = function(n, done) {
    if (n === 0) done(undefined, 0);
    else if (n === 1 || n === 2) done(undefined, 1);
    else {
        setImmediate(function() {
            fibonacciAsync(n-1, function(err, val1) {
                if (err) done(err);
                else setImmediate(function() {
                    fibonacciAsync(n-2, function(err, val2) {
                        if (err) done(err);
                        else done(undefined, val1+val2);
                    });
                });
            });
        });
    }
};

This converts the fibonacci function from a synchronous function to an asynchronous one with a callback. By using setImmediate, each stage of the calculation is managed through Node.js's event loop, and the server can easily handle other requests while churning away on a calculation. It does nothing to reduce the computation required; this is still the silly inefficient Fibonacci algorithm. All we've done is spread the computation through the event loop.
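The same idea applies to loops as well as recursion. The following sketch shows the general chunking pattern on a deliberately trivial task; sumInChunks is a hypothetical helper, and the chunk size and the 1 ms interval timer standing in for "other events" are arbitrary choices for illustration:

```javascript
// Hypothetical helper: sums the integers 0..count-1 in slices of chunkSize,
// returning to the event loop between slices via setImmediate.
function sumInChunks(count, chunkSize, done) {
  var total = 0;
  var i = 0;
  function runChunk() {
    var end = Math.min(i + chunkSize, count);
    for (; i < end; i++) total += i;
    if (i < count) setImmediate(runChunk);   // yield, then continue
    else done(undefined, total);
  }
  setImmediate(runChunk);
}

// A timer that stands in for other events the server must handle.
var otherEvents = 0;
var ticker = setInterval(function() { otherEvents++; }, 1);

sumInChunks(5000000, 50000, function(err, total) {
  clearInterval(ticker);
  console.log('total = ' + total);
  console.log('other events handled during the calculation: ' + otherEvents);
});
```

Smaller chunks give other events more chances to run but add scheduling overhead, so the chunk size is a trade-off to tune for the workload.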

To use this new Fibonacci function, we need to change the router function in routes/index.js to the following:

router.get('/', function(req, res, next) {
  if (req.query.fibonum) {
    // Query values are strings; convert to a number before calculating.
    math.fibonacciAsync(Math.floor(req.query.fibonum), function(err, fiboval) {
      if (err) return next(err);
      res.render('index', {
        title: "Fibonacci Calculator",
        fibonum: req.query.fibonum, fiboval: fiboval
      });
    });
  } else {
    res.render('index', {
      title: "Fibonacci Calculator", fiboval: undefined
    });
  }
});

This makes an asynchronous call to fibonacciAsync, and when the calculation finishes, the result is sent to the browser.

With this change, the server no longer freezes when calculating a large Fibonacci number. The calculation, of course, still takes a long time, because fibonacciAsync is still an inefficient algorithm. At least, other users of the application aren't blocked, because it regularly yields to the event loop.

Repeat the same test used earlier. Open two or more browser windows to the Fibonacci calculator, make a large request in one window, and the requests in the other window will be promptly answered.

Creating a backend REST service

The next way to mitigate computationally intensive code is to push the calculation to a backend process. To do that, we'll request computations from a backend Fibonacci server.

While Express has a powerful templating system, making it suitable for delivering HTML web pages to browsers, it can also be used to implement a simple REST service. Express supports parameterized URLs in route definitions, so it can easily receive REST API arguments, and Express makes it easy to return data encoded in JSON.

Create a file named fiboserver.js containing this code:

var math = require('./math');
var express = require('express');
var logger = require('morgan');
var util   = require('util');
var app = express();
app.use(logger('dev'));
app.get('/fibonacci/:n', function(req, res, next) {
    math.fibonacciAsync(Math.floor(req.params.n),
    function(err, val) {
        if (err) next('FIBO SERVER ERROR ' + err);
        else {
            util.log(req.params.n +': '+ val);
            res.send({ n: req.params.n, result: val });
        }
    });
});
app.listen(3333);

This is a stripped down Express application that gets right to the point of providing a Fibonacci calculation service. The one route it supports does the Fibonacci computation using the same fibonacciAsync function used earlier.

The res.send function is a flexible way to send data responses. As used here, it automatically detects the object, formats it as JSON text, and sends it with the correct content-type.

Then, in package.json, add this to the scripts section:

"server": "node ./fiboserver"

Now, let's run it:

$ npm run server

> fibonacci@0.0.0 server /Users/david/fibonacci
> node ./fiboserver

Then, in a separate command window, use curl to request values from this service.

$ curl -f http://localhost:3333/fibonacci/10
{"n":"10","result":55}

Over in the window where the service is running, we'll see a log of GET requests and how long each took to process.

It's easy to create a small Node.js script to directly call this REST service. But let's instead move directly to changing our Fibonacci calculator application to do so. Make this change to routes/index.js:
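For reference, such a standalone script might look like the following sketch. It uses only the built-in http module; fetchFibonacci is a hypothetical helper, and the host, port, and URL shape are assumed to match fiboserver.js as shown above:

```javascript
var http = require('http');

// Hypothetical helper: asks the Fibonacci service for the nth value and
// passes the "result" field of the JSON response to the callback.
function fetchFibonacci(n, port, done) {
  var req = http.get({
    host: 'localhost', port: port, path: '/fibonacci/' + Math.floor(n)
  }, function(res) {
    var body = '';
    res.setEncoding('utf8');
    // The response may arrive in several chunks, so collect them all.
    res.on('data', function(chunk) { body += chunk; });
    res.on('end', function() {
      if (res.statusCode !== 200) {
        return done(new Error('service answered with status ' + res.statusCode));
      }
      var result;
      try { result = JSON.parse(body).result; }
      catch (err) { return done(err); }
      done(undefined, result);
    });
  });
  req.on('error', done);   // for example, the service is not running
}

fetchFibonacci(10, 3333, function(err, result) {
  if (err) console.error('request failed: ' + err.message);
  else console.log('Fibonacci for 10 = ' + result);
});
```

If the service is not running, the script reports the connection error instead of crashing.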

router.get('/', function(req, res, next) {
  if (req.query.fibonum) {
    var httpreq = require('http').request({
      method: 'GET', host: "localhost", port: 3333,
      path: "/fibonacci/"+Math.floor(req.query.fibonum)
    }, function(httpresp) {
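      // Collect the whole body first; it may arrive in several chunks.
      var body = '';
      httpresp.setEncoding('utf8');
      httpresp.on('data', function(chunk) { body += chunk; });
      httpresp.on('end', function() {
        var data;
        try { data = JSON.parse(body); } catch (err) { return next(err); }
        res.render('index', {
          title: "Fibonacci Calculator",
          fibonum: req.query.fibonum, fiboval: data.result
        });
      });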
      httpresp.on('error', function(err) { next(err); });
    });
    httpreq.on('error', function(err) { next(err); });
    httpreq.end();
  } else {
    res.render('index', {
      title: "Fibonacci Calculator", fiboval: undefined
    });
  }
});

Running the Fibonacci Calculator service now requires starting both processes. In one command window, we run:

$ npm run server

And in the other command window:

$ npm start

In the browser, we visit http://localhost:3000 and see what looks like the same application, because no changes were made to views/index.ejs. As you make requests in the browser window, the Fibonacci service window prints a log of requests it receives and values it sent.

You can, of course, repeat the same experiment as before. Open two browser windows, in one window request a large Fibonacci number, and in the other make smaller requests. You'll see, because the server uses fibonacciAsync, that it's able to respond to every request.

Why did we go through this trouble when we could just directly call fibonacciAsync?

We can now push the CPU load for this heavy-weight calculation to a separate server. Doing so would preserve CPU capacity on the frontend server, so it can attend to web browsers. The heavy computation can be kept separate, and you could even deploy a cluster of backend servers sitting behind a load balancer evenly distributing requests. Decisions like this are made all the time to create multitier systems.

What we've demonstrated is that it's possible to implement simple multitier REST services in a few lines of Node.js and Express.

Summary

While the Fibonacci algorithm we chose is artificially inefficient, it gave us an opportunity to explore common strategies to mitigate performance problems in Node.js.

Optimizing the performance of our systems is as important as correctness, fixing bugs, mobile friendliness, and usability. Inefficient algorithms mean having to deploy more hardware to satisfy load requirements, costing more money, and creating a bigger environmental impact.

For real-world applications, optimizing away performance problems won't be as easy as it would be for the Fibonacci calculator. We could have just used the fibonacciLoop function, since it provides all the performance we'd need. But we needed to explore more typical approaches to performing heavy-weight calculations in Node.js while still keeping the event loop rolling.

The bottom line is that in Node.js the event loop must run.
