Node Cookbook

By David Mark Clements
  1. Making a Web Server

About this book

The principles of asynchronous event-driven programming are perfect for today's web, where efficient real-time applications and scalability are at the forefront. Server-side JavaScript has been here since the 90's but Node got it right. With a thriving community and interest from Internet giants, it could be the PHP of tomorrow.

"Node Cookbook" shows you how to transfer your JavaScript skills to server side programming. With simple examples and supporting code, "Node Cookbook" talks you through various server side scenarios often saving you time, effort, and trouble by demonstrating best practices and showing you how to avoid security faux pas.

Beginning with making your own web server, the practical recipes in this cookbook are designed to smoothly progress you to making full web applications, command line applications, and Node modules. Node Cookbook takes you through interfacing with various database backends such as MySQL, MongoDB and Redis, working with web sockets, and interfacing with network protocols, such as SMTP. Additionally, there are recipes on correctly performing heavy computations, security implementations, writing your own Node modules, and different ways to take your apps live.

Publication date:
July 2012
Publisher
Packt
Pages
342
ISBN
9781849517188

 

Chapter 1. Making a Web Server

In this chapter we will cover:

  • Setting up a router

  • Serving static files

  • Caching content in memory for immediate delivery

  • Optimizing performance with streaming

  • Securing against filesystem hacking exploits

 

Introduction


One of the great qualities of Node is its simplicity. Unlike PHP or ASP there is no separation between the web server and code, nor do we have to customize large configuration files to get the behavior we want. With Node we can create the server, customize it, and deliver content all at the code level. This chapter demonstrates how to create a web server with Node and feed content through it, all while implementing security and performance enhancements to cater for various situations.

 

Setting up a router


In order to deliver web content we need to make a URI available. This recipe walks us through the creation of an HTTP server that exposes routes to the user.

Getting ready

First, let's create our server file. If our main purpose is to expose server functionality, it's general practice to call the file server.js, which we could put in a new folder. It's also a good idea to install and use hotnode:

sudo npm -g install hotnode
hotnode server.js

Hotnode will conveniently auto-restart the server when we save changes.

How to do it...

In order to create the server we need the http module, so let's load it and use the http.createServer method:

var http = require('http');
http.createServer(function (request, response) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end('Woohoo!');
}).listen(8080);

Now, if we save our file and access localhost:8080 in a web browser or using curl, our browser (or curl) will exclaim: 'Woohoo!'. However, the same will occur at localhost:8080/foo. Indeed, any path will produce the same behavior, so let's build in some routing. We can use the path module to extract the basename of the path (the final part of the path), and reverse any URI encoding from the client with decodeURI:

var http = require('http');
var path = require('path');

http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url));

We now need a way to define our routes. One option is to use an array of objects:

var pages = [
{route: '', output: 'Woohoo!'},
{route: 'about', output: 'A simple routing with Node example'},
{route: 'another page', output: function() {return 'Here\'s '+this.route;}},
];

Our pages array should be placed above the http.createServer call.

Within our server, we need to loop through our array and see if the lookup variable matches any of our routes. If it does we can supply the output. We'll also implement some 404 handling:

http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url));
pages.forEach(function(page) {
if (page.route === lookup) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end(typeof page.output === 'function'
? page.output() : page.output);
}
});
if (!response.finished) {
response.writeHead(404);
response.end('Page Not Found!');
}
}).listen(8080);

How it works...

The callback function we provide to http.createServer gives us all the functionality we need to interact with our server through the request and response objects. We use request to obtain the requested URL and then we acquire its basename with path. We also use decodeURI; without it, our another page route would fail, because our code would try to match another%20page against our pages array and find no match.
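To see decodeURI and path.basename working in isolation, here is a minimal sketch. The lookupFor helper is a hypothetical name introduced only for this illustration; the recipe itself performs the same two calls inline.

```javascript
var path = require('path');

// Hypothetical helper: turns a raw request URL into the lookup key
// that the recipe compares against each page.route.
function lookupFor(rawUrl) {
  return path.basename(decodeURI(rawUrl));
}

console.log(lookupFor('/another%20page')); // decoded before matching
console.log(lookupFor('/about'));
console.log(lookupFor('/') === '');        // the root path yields an empty basename
```

The empty basename for the root path is why the first entry in our pages array uses an empty string as its route.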

Once we have our basename, we can match it in any way we want. We could send it in a database query to retrieve content, use regular expressions to effectuate partial matches, or we could match it to a file name and load its contents.

We could have used a switch statement to handle routing but our pages array has several advantages. It's easier to read and extend, and it can be seamlessly converted to JSON. We loop through our pages array using forEach.

Node is built on Google's V8 engine, which provides us with a number of ECMAScript 5 features. These features can't be used in all browsers as they're not yet universally implemented, but using them in Node is no problem! forEach is an ES5 feature; the ES3 equivalent is the less convenient for loop.

While looping through each object, we check its route property. If we get a match, we write the 200 OK status and content-type headers. We then end the response with the object's output property.

response.end allows us to pass a parameter to it, which it writes just before finishing the response. In response.end, we used a ternary operator (?:) to conditionally call page.output as a function or simply pass it as a string. Notice that the another page route contains a function instead of a string. The function has access to its parent object through the this variable, and allows for greater flexibility in assembling the output we want to provide.

In the event that there is no match in our forEach loop, response.end would never be called. Therefore, the client would continue to wait for a response until it times out. To avoid this, we check the response.finished property and if it's false, we write a 404 header and end the response.

response.finished depends on the forEach callback, yet it's not nested within the callback. Callback functions are mostly used for asynchronous operations, so on the surface this looks like a potential race condition. However, forEach does not operate asynchronously. It blocks until every iteration is complete.
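The synchronous behavior of forEach can be checked with a small experiment. This is only an illustrative sketch; the order array is a made-up name used to record what runs when.

```javascript
var order = [];

[1, 2, 3].forEach(function (n) {
  order.push('callback ' + n);
});
// By the time we reach this line, every callback has already run.
order.push('after forEach');

console.log(order.join(', '));
```

Because the final entry is always recorded last, code placed after a forEach call (such as our response.finished check) can safely rely on whatever the callbacks did.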

There's more...

There are many ways to extend and alter this example. There are also some great non-core modules available that do the leg work for us.

Simple multilevel routing

So far, our routing only deals with a single-level path. A multilevel path (for example, /about/node) will simply return a 404. We can alter our object to reflect a subdirectory-like structure, remove path, and use request.url for our routes instead of path.basename:

var http=require('http');
var pages = [
{route: '/', output: 'Woohoo!'},
{route: '/about/this', output: 'Multilevel routing with Node'},
{route: '/about/node', output: 'Evented I/O for V8 JavaScript.'},

{route: '/another page', output: function () {return 'Here\'s ' + this.route; }}
];
http.createServer(function (request, response) {
var lookup = decodeURI(request.url);

Note

When serving static files, request.url must be cleaned prior to fetching a given file. Check out the Securing against filesystem hacking exploits section discussed in this chapter.

Multilevel routing could be taken further, allowing us to build and then traverse a more complex object.

{route: 'about', childRoutes: [
{route: 'node', output: 'Evented I/O for V8 Javascript'},
{route: 'this', output: 'Complex Multilevel Example'}
]}

After the third or fourth level, this object would become a leviathan to look at. We could instead create a helper function to define our routes that essentially pieces our object together for us. Alternatively, we could use one of the excellent non-core routing modules provided by the open source Node community. Excellent solutions already exist which provide helper methods to handle the increasing complexity of scalable multilevel routing (see Routing modules discussed in this chapter and Chapter 6, Accelerating Development with Express).
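As a rough idea of what traversing such a nested object might look like, here is a sketch. The findRoute function and the output given to the about route are assumptions made for illustration; they are not part of the recipe.

```javascript
// Hypothetical nested route table, following the childRoutes shape shown above.
var routes = [
  {route: 'about', output: 'About', childRoutes: [
    {route: 'node', output: 'Evented I/O for V8 JavaScript'},
    {route: 'this', output: 'Complex Multilevel Example'}
  ]}
];

// Walks the table one path segment at a time.
// Returns the matching route object, or null if any segment fails to match.
function findRoute(table, url) {
  var segments = url.split('/').filter(function (s) { return s !== ''; });
  var level = table, match = null;
  for (var i = 0; i < segments.length; i++) {
    match = null;
    for (var j = 0; level && j < level.length; j++) {
      if (level[j].route === segments[i]) { match = level[j]; break; }
    }
    if (!match) { return null; }
    level = match.childRoutes; // undefined when there is no deeper level
  }
  return match;
}

console.log(findRoute(routes, '/about/node').output);
```

A null result would map naturally onto the 404 handling we already have.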

Parsing the querystring

Two other useful core modules are url and querystring. The url.parse method accepts two parameters: first the URL string (in our case, this will be request.url), and second a Boolean parameter named parseQueryString. If set to true, it lazy loads the querystring module, saving us the need to require it, and parses the query into an object. This makes it easy for us to interact with the query portion of a URL.

var http = require('http');
var url = require('url');
var pages = [
{id: '1', route: '', output: 'Woohoo!'},
{id: '2', route: 'about', output: 'A simple routing with Node example'},
{id: '3', route: 'another page', output: function () {return 'Here\'s ' + this.route; }},
];
http.createServer(function (request, response) {
var id = url.parse(decodeURI(request.url), true).query.id;
if (id) {
pages.forEach(function (page) {
if (page.id === id) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end(typeof page.output === 'function'
? page.output() : page.output);
}
});
}
if (!response.finished) {
response.writeHead(404);
response.end('Page Not Found');
}
}).listen(8080);

With the added id properties we can access our object data by, for instance, localhost:8080?id=2.
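The following sketch shows what url.parse hands back when the second argument is true. The example URL and its lang parameter are made up. Note that current Node.js documentation describes url.parse as a legacy API, so newer code may prefer the WHATWG URL class.

```javascript
var url = require('url');

// Passing true as the second argument parses the query string into an object.
var parsed = url.parse('/about?id=2&lang=en', true);

console.log(parsed.pathname); // the path without the query
console.log(parsed.query.id); // query values arrive as strings
```

Since query values are strings, the id properties in our pages array are also stored as strings, which keeps the strict equality check (===) working.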

Routing modules

There's an up-to-date list of various routing modules for Node at https://www.github.com/joyent/node/wiki/modules#wiki-web-frameworks-routers. These community-made routers cater to various scenarios. It's important to research the activity and maturity of a module before taking it into a production environment. In Chapter 6, Accelerating Development with Express, we will go into greater detail on using the built-in Express/Connect router for more comprehensive routing solutions.

See also

  • Serving static files and Securing against filesystem hacking exploits discussed in this chapter

  • Dynamic Routing discussed in Chapter 6, Accelerating Development with Express.

 

Serving static files


If we have information stored on disk that we want to serve as web content, we can use the fs (filesystem) module to load our content and pass it through the createServer callback. This is a basic conceptual starting point for serving static files. As we will learn in the following recipes, there are much more efficient solutions.

Getting ready

We'll need some files to serve. Let's create a directory named content, containing the following three files:

index.html:

<html>
<head>
<title>Yay Node!</title>
<link rel=stylesheet href=styles.css type=text/css>
<script src=script.js type=text/javascript></script>
</head>
<body>
<span id=yay>Yay!</span>
</body>
</html>

script.js:

window.onload=function() {alert('Yay Node!');};

styles.css:

#yay {font-size:5em;background:blue;color:yellow;padding:0.5em}

How to do it...

As in the previous recipe, we'll be using the core modules http and path. We'll also need to access the filesystem, so we'll require the fs module too. Let's create our server:

var http = require('http');
var path = require('path');
var fs = require('fs');
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) || 'index.html',
f = 'content/' + lookup;
fs.exists(f, function (exists) {
console.log(exists ? lookup + " is there" : lookup + " doesn't exist");
});
}).listen(8080);

If we haven't already, we can start our server.js file:

hotnode server.js

Try loading localhost:8080/foo and the console will say foo doesn't exist, because it doesn't. localhost:8080/script.js will tell us script.js is there, because it is. Before we can serve a file, we are supposed to let the client know the content-type, which we can determine from the file extension. So let's make a quick map using an object:

var mimeTypes = {
'.js' : 'text/javascript',
'.html': 'text/html',
'.css' : 'text/css'
};

We could extend our mimeTypes map later to support more types.

Note

Modern browsers may be able to interpret certain mime types (such as text/javascript) without the server sending a content-type header. However, older browsers or less common mime types will rely upon the correct content-type header being sent from the server.
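One way to put such a map to work is sketched below, using path.extname to pull the extension off a filename. The contentTypeFor helper and its application/octet-stream fallback are assumptions added for illustration; the recipe simply indexes mimeTypes directly.

```javascript
var path = require('path');

var mimeTypes = {
  '.js': 'text/javascript',
  '.html': 'text/html',
  '.css': 'text/css'
};

// Hypothetical helper: falls back to a generic binary type for unknown
// extensions (the fallback is an assumption, not part of the recipe).
function contentTypeFor(file) {
  return mimeTypes[path.extname(file).toLowerCase()] || 'application/octet-stream';
}

console.log(contentTypeFor('content/index.html'));
console.log(contentTypeFor('photo.png'));
```

Without a fallback, an unmapped extension would leave the Content-type header value undefined.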

Remember to place mimeTypes outside the server callback since we don't want to initialize the same object on every client request. If the requested file exists, we can convert our file extension into content-type by feeding path.extname into mimeTypes and then passing our retrieved content-type to response.writeHead. If the requested file doesn't exist, we'll write out a 404 and end the response.

//requires variables, mimeType object...
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) || 'index.html',
f = 'content/' + lookup;
fs.exists(f, function (exists) {
if (exists) {
fs.readFile(f, function (err, data) {
if (err) { response.writeHead(500);
response.end('Server Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
response.writeHead(404); //no such file found!
response.end();
});
}).listen(8080);

With only the headers written, there would still be no content sent to the client. We have to get this content from our file, so we wrap the response handling in an fs.readFile method callback.

//http.createServer, inside fs.exists:
if (exists) {
fs.readFile(f, function(err, data) {

var headers={'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}

Before we finish, let's apply some error handling to our fs.readFile callback as follows:

//requires variables, mimeType object...
//http.createServer, fs.exists, inside if(exists):
fs.readFile(f, function(err, data) {
if (err) {response.writeHead(500); response.end('Server Error!'); return; }

var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}

Notice that return stays outside the fs.readFile callback. We are returning from the fs.exists callback to prevent further code execution (for example, sending 404). Placing a return in an if statement is similar to using an else branch. However, the if return pattern is generally preferable to using if else in Node, as it eliminates yet another set of curly braces.

So now we can navigate to localhost:8080 which will serve our index.html file. The index.html file makes calls to our script.js and styles.css files, which our server also delivers with appropriate mime types. The result can be seen in the following screenshot:

This recipe serves to illustrate the fundamentals of serving static files. Remember, this is not an efficient solution! In a real-world situation, we don't want to make an I/O call every time a request hits the server. This is very costly, especially with larger files. In the following recipes, we'll learn better ways to serve static files.

How it works...

Our script creates a server and declares a variable called lookup. We assign a value to lookup using the double pipe (||) OR operator. This defines a default route if path.basename is empty. Then we pass lookup to a new variable that we named f in order to prepend our content directory to the intended filename. Next we run f through the fs.exists method and check the exists parameter in our callback to see if the file is there. If the file exists we read it asynchronously using fs.readFile. If there is a problem accessing the file, we write a 500 server error, end the response, and return from the fs.readFile callback. We can test the error-handling functionality by removing read permissions from index.html.

chmod -r index.html

Doing so will cause the server to respond with the 500 server error status code. To set things right again, run the following command:

chmod +r index.html

As long as we can access the file, we grab content-type using our handy mimeTypes mapping object, write the headers, end the response with data loaded from the file, and finally return from the function. If the requested file does not exist, we bypass all this logic, write a 404, and end the response.
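Current Node.js documentation marks fs.exists as deprecated, so a variation worth knowing is to skip the existence check and let fs.readFile report a missing file itself. The sketch below assumes a hypothetical readForResponse helper that hands back a status code alongside the data; it is one possible arrangement, not the recipe's own code.

```javascript
var fs = require('fs');

// Hypothetical helper: reads a file and reports an HTTP status alongside it.
// A missing file surfaces as err.code === 'ENOENT', so no separate
// existence check is needed.
function readForResponse(file, cb) {
  fs.readFile(file, function (err, data) {
    if (err) {
      cb(err.code === 'ENOENT' ? 404 : 500, null);
      return;
    }
    cb(200, data);
  });
}

readForResponse('content/index.html', function (status, data) {
  console.log(status, data ? data.length + ' bytes' : 'no data');
});
```

Inside the server callback, the status would go to response.writeHead and the data to response.end, much as before.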

There's more...

Here's something to watch out for...

The favicon gotcha

When using a browser to test our server, sometimes an unexpected server hit can be observed. This is the browser requesting the default favicon.ico icon file that servers can provide. Apart from the initial confusion of seeing additional hits, this is usually not a problem. If the favicon request begins to interfere, we can handle it like this:

if (request.url === '/favicon.ico') {
response.end();
return;
}

If we wanted to be more polite to the client, we could also inform it of a 404 by using response.writeHead(404) before issuing response.end.

See also

  • Caching content in memory for immediate delivery discussed in this chapter

  • Optimizing performance with streaming discussed in this chapter

  • Securing against filesystem hacking exploits discussed in this chapter

 

Caching content in memory for immediate delivery


Directly accessing storage on each client request is not ideal. For this example, we will explore how to enhance server efficiency by accessing the disk on only the first request, caching the data from file for that first request, and serving all further requests out of the process memory.

Getting ready

We are going to improve upon the code from the previous recipe, so we'll be working with server.js, and in the content directory with index.html, styles.css, and script.js.

How to do it...

Let's begin by looking at our script from the previous recipe Serving static files:

var http = require('http');
var path = require('path');
var fs = require('fs');
var mimeTypes = {
'.js' : 'text/javascript',
'.html': 'text/html',
'.css' : 'text/css'
} ;
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) || 'index.html';
var f = 'content/'+lookup;
fs.exists(f, function (exists) {
if (exists) {
fs.readFile(f, function(err,data) {
if (err) { response.writeHead(500);
response.end('Server Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
response.writeHead(404); //no such file found!
response.end('Page Not Found!');
});
}).listen(8080);

We need to modify this code to only read the file once, load its contents into memory, and afterwards respond to all requests for that file from memory. To keep things simple and preserve maintainability, we'll extract our cache handling and content delivery into a separate function. So above http.createServer, and below mimeTypes, we'll add the following:

var cache = {};
function cacheAndDeliver(f, cb) {
if (!cache[f]) {
fs.readFile(f, function(err, data) {
if (!err) {
cache[f] = {content: data} ;
}
cb(err, data);
});
return;
}
console.log('loading ' + f + ' from cache');
cb(null, cache[f].content);
}
//http.createServer …..

A new cache object has been added to store our files in memory as well as a new function called cacheAndDeliver. Our function takes the same parameters as fs.readFile, so we can replace fs.readFile in the http.createServer callback while leaving the rest of the code intact:

//...inside http.createServer:
fs.exists(f, function (exists) {
if (exists) {
cacheAndDeliver(f, function(err, data) {
if (err) { response.writeHead(500);
response.end('Server Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname(f)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
//rest of fs.exists code (404 handling)...

When we execute our server.js file and access localhost:8080 twice consecutively, the second request causes the console to output the following:

loading content/index.html from cache
loading content/styles.css from cache
loading content/script.js from cache

How it works...

We defined a function called cacheAndDeliver which, like fs.readFile, takes a filename and a callback as parameters. This is great because we can pass the exact same callback we gave fs.readFile to cacheAndDeliver, padding the server out with caching logic without adding any visual complexity to the inside of the http.createServer callback. As it stands, the worth of abstracting our caching logic into an external function is arguable, but the more we build on the server's caching abilities, the more feasible and useful this abstraction becomes.

Our cacheAndDeliver function checks whether the requested content is already cached. If not, we call fs.readFile and load the data from disk. Once we have this data we may as well hold onto it, so it's placed into the cache object, keyed by its file path (the f variable). The next time anyone requests the file, cacheAndDeliver will see that we have the file stored in the cache object and will invoke the callback with the cached data instead.

Notice that we fill the cache[f] property with a new object containing a content property. This makes it easier to extend the caching functionality in the future, since we would just need to place extra properties into our cache[f] object and supply logic that interfaces with these properties accordingly.

There's more...

If we were to modify the files we are serving, any changes wouldn't be reflected until we restart the server. We can do something about that.

Reflecting content changes

To detect whether a requested file has changed since we last cached it, we must know when the file was cached and when it was last modified. To record when the file was last cached, let's extend the cache[f] object:

cache[f] = {content: data,
timestamp: Date.now() //store a Unix time stamp
};

That was easy. Now we need to find out when the file was updated last. The fs.stat method returns an object as the second parameter of its callback. This object contains the same useful information as the command-line GNU coreutils stat. fs.stat supplies three time-related properties: last accessed (atime), last modified (mtime), and last changed (ctime). The difference between mtime and ctime is that ctime will reflect any alterations to the file, whereas mtime will only reflect alterations to the content of the file. Consequently, if we changed the permissions of a file, ctime would update but mtime would stay the same. We want to pay attention to permission changes as they happen, so let's use the ctime property:

//requires and mimeType object....
var cache = {};
function cacheAndDeliver(f, cb) {
fs.stat(f, function (err, stats) {
var lastChanged = Date.parse(stats.ctime),
isUpdated = (cache[f]) && lastChanged > cache[f].timestamp;
if (!cache[f] || isUpdated) {
fs.readFile(f, function (err, data) {
console.log('loading ' + f + ' from file');
//rest of cacheAndDeliver
}); //end of fs.stat
} // end of cacheAndDeliver

The contents of cacheAndDeliver have been wrapped in an fs.stat callback. Two variables have been added and the if(!cache[f]) statement has been modified. We parse the ctime property of the second parameter, dubbed stats, using Date.parse to convert it to milliseconds since midnight, January 1, 1970 (the Unix epoch), and assign it to our lastChanged variable. Then we check whether the requested file's last changed time is greater than when we cached the file (provided the file is indeed cached) and assign the result to our isUpdated variable. After that, it's merely a case of adding the isUpdated Boolean to the conditional if(!cache[f]) statement via the || (or) operator. If the file is newer than our cached version (or if it isn't yet cached), we load the file from the disk into the cache object.
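Putting the pieces together, a complete cacheAndDeliver might look something like the sketch below. Several details are assumptions rather than the recipe's own code: the fs.stat error check, the use of stats.ctime.getTime() in place of Date.parse, and the third callback argument (fromCache), which exists here only so the caller can see where the data came from.

```javascript
var fs = require('fs');
var cache = {};

// Sketch of a complete cacheAndDeliver. The third callback argument
// (fromCache) is a hypothetical addition for illustration only.
function cacheAndDeliver(f, cb) {
  fs.stat(f, function (err, stats) {
    if (err) { cb(err); return; } // e.g. the file vanished after it was cached
    var lastChanged = stats.ctime.getTime(); // milliseconds since the Unix epoch
    var isUpdated = Boolean(cache[f]) && lastChanged > cache[f].timestamp;
    if (!cache[f] || isUpdated) {
      fs.readFile(f, function (err, data) {
        if (!err) {
          cache[f] = {content: data, timestamp: Date.now()};
        }
        cb(err, data, false);
      });
      return;
    }
    cb(null, cache[f].content, true);
  });
}
```

The server callback can keep passing the same two-argument callback as before, since the extra argument is simply ignored when it isn't declared.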

See also

  • Optimizing performance with streaming discussed in this chapter

  • Browser-server transmission via AJAX discussed in Chapter 3, Working with Data Serialization

 

Optimizing performance with streaming


Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile, we are reading the whole file into memory before sending it out in response. For better performance, we can stream a file from disk and pipe it directly to the response object, sending data straight to the network socket one piece at a time.

Getting ready

We are building on our code from the last example, so let's get server.js, index.html, styles.css, and script.js ready.

How to do it...

We will be using fs.createReadStream to initialize a stream, which can be piped to the response object. In this case, implementing fs.createReadStream within our cacheAndDeliver function isn't ideal because the event listeners of fs.createReadStream will need to interface with the request and response objects. For the sake of simplicity, these would preferably be dealt with within the http.createServer callback. For brevity's sake, we will discard our cacheAndDeliver function and implement basic caching within the server callback:

//requires, mime types, createServer, lookup and f vars...
fs.exists(f, function (exists) {
if (exists) {
var headers = {'Content-type': mimeTypes[path.extname(f)]};
if (cache[f]) {
response.writeHead(200, headers);
response.end(cache[f].content);
return;
} //...rest of server code...

Later on, we will fill cache[f].content while we're interfacing with the readStream object. Here's how we use fs.createReadStream:

var s = fs.createReadStream(f);

This will return a readStream object which streams the file that is pointed at by the f variable. readStream emits events that we need to listen to. We can listen with addListener or use the shorthand on:

var s = fs.createReadStream(f).on('open', function () {
//do stuff when the readStream opens
});

Since createReadStream returns the readStream object, we can latch our event listener straight onto it using method chaining with the dot notation. Each stream is only going to open once, so we don't need to keep on listening to it. Therefore, we can use the once method instead of on to automatically stop listening after the first event occurrence:

var s = fs.createReadStream(f).once('open', function () {
//do stuff when the readStream opens
});

Before we fill out the open event callback, let's implement error handling as follows:

var s = fs.createReadStream(f).once('open', function () {
//do stuff when the readStream opens
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});

The key to this entire endeavor is the stream.pipe method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response object.

var s = fs.createReadStream(f).once('open', function () {
response.writeHead(200, headers);
this.pipe(response);
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});

What about ending the response? Conveniently, stream.pipe detects when the stream has ended and calls response.end for us. For caching purposes, there's one other event we need to listen to. Still within our fs.exists callback, underneath the createReadStream code block, we write the following code:

fs.stat(f, function(err, stats) {
var bufferOffset = 0;
cache[f] = {content: new Buffer(stats.size)};
s.on('data', function (chunk) {
chunk.copy(cache[f].content, bufferOffset);
bufferOffset += chunk.length;
});
});

We've used the data event to capture the buffer as it's being streamed, and copied it into a buffer that we supplied to cache[f].content, using fs.stat to obtain the file size for the file's cache buffer.

How it works...

Instead of the client waiting for the server to load the entire file from the disk prior to sending it to the client, we use a stream to load the file in small, ordered pieces and promptly send them to the client. With larger files this is especially useful, as there is minimal delay between the file being requested and the client starting to receive the file.

We did this by using fs.createReadStream to start streaming our file from the disk. fs.createReadStream creates readStream, which inherits from the EventEmitter class.

The EventEmitter class accomplishes the evented part of Node's tag line: Evented I/O for V8 JavaScript. Due to this, we'll use listeners instead of callbacks to control the flow of stream logic.
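The difference between on and once can be seen with a bare EventEmitter. This is a standalone sketch; the ping event name and the two counters are made up for the demonstration.

```javascript
var EventEmitter = require('events').EventEmitter;

var emitter = new EventEmitter();
var onCount = 0, onceCount = 0;

emitter.on('ping', function () { onCount += 1; });     // fires on every emit
emitter.once('ping', function () { onceCount += 1; }); // removed after the first emit

emitter.emit('ping');
emitter.emit('ping');

console.log('on: ' + onCount + ', once: ' + onceCount);
```

readStream inherits exactly these methods, which is why we can attach open and error listeners to it in the same way.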

Then we added an open event listener using the once method since we want to stop listening for open once it has been triggered. We respond to the open event by writing the headers and using the stream.pipe method to shuffle the incoming data straight to the client.

stream.pipe handles the data flow. If the client becomes overwhelmed with processing, it sends a signal to the server which should be honored by pausing the stream. Under the hood, stream.pipe uses stream.pause and stream.resume to manage this interplay.
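To make that interplay more concrete, the following is a heavily simplified sketch of the kind of flow control stream.pipe performs. manualPipe is a hypothetical name, and the real implementation handles many more cases (errors, unpiping, multiple destinations), so in practice we should simply call pipe.

```javascript
// Simplified illustration of the flow control that stream.pipe performs.
// manualPipe is a hypothetical name; real code should just call pipe.
function manualPipe(source, dest) {
  source.on('data', function (chunk) {
    // write returns false when dest's internal buffer is full
    if (dest.write(chunk) === false) {
      source.pause();
      // resume reading once dest has flushed its buffer
      dest.once('drain', function () { source.resume(); });
    }
  });
  source.on('end', function () { dest.end(); });
}
```

In our server, source would be the readStream and dest the response object.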

While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer class for our cache[f].content property. A Buffer must be supplied with a size (or an array or string) which in our case is the size of the file. To get the size, we used the asynchronous fs.stat and captured the size property in the callback. The data event passes a Buffer to its listener as the only parameter.

The default bufferSize for a stream is 64 KB. Any file whose size is less than the bufferSize will only trigger one data event because the entire file will fit into the first chunk of data. However, for files greater than bufferSize, we have to fill our cache[f].content property one piece at a time.

Note

Changing the default readStream buffer size:

We can change the buffer size of readStream by passing an options object with a bufferSize property as the second parameter of fs.createReadStream.

For instance, to double the buffer you could use fs.createReadStream(f,{bufferSize: 128 * 1024});

We cannot simply concatenate each chunk with cache[f].content since this will coerce binary data into string format which, though no longer in binary format, will later be interpreted as binary. Instead, we have to copy all the little binary buffer chunks into our binary cache[f].content buffer.

We created a bufferOffset variable to assist us with this. Each time we add another chunk to our cache[f].content buffer, we update our new bufferOffset by adding the length of the chunk buffer to it. When we call the Buffer.copy method on the chunk buffer, we pass bufferOffset as the second parameter so our cache[f].content buffer is filled correctly.
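The offset bookkeeping can be tried out without a stream at all. In this sketch the chunks array stands in for the data events a readStream would emit (its contents are made up), and Buffer.alloc and Buffer.from are used because current Node.js versions deprecate the new Buffer(size) constructor.

```javascript
// Stand-ins for the chunks a readStream would emit (contents are made up).
var chunks = [Buffer.from('Yay '), Buffer.from('No'), Buffer.from('de!')];
var total = chunks.reduce(function (sum, c) { return sum + c.length; }, 0);

// Buffer.alloc is the current replacement for the deprecated new Buffer(size).
var content = Buffer.alloc(total);
var bufferOffset = 0;

chunks.forEach(function (chunk) {
  chunk.copy(content, bufferOffset); // second argument: where to start writing in content
  bufferOffset += chunk.length;
});

console.log(content.toString());
```

Once the last chunk has been copied, bufferOffset equals the total size, which is a handy sanity check that the whole file made it into the cache.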

Moreover, operating with the Buffer class renders performance enhancements with larger files because it bypasses the V8 garbage collection methods. These tend to fragment large amounts of data thus slowing down Node's ability to process them.

There's more...

While streaming has solved the problem of waiting for files to load into memory before delivering them, we are nevertheless still loading files into memory via our cache object. With larger files, or large numbers of files, this could exhaust the memory available to our process.

Protecting against process memory overruns

There is a limited amount of process memory. By default, V8's memory is set to 1400 MB on 64-bit systems and 700 MB on 32-bit systems. This can be altered by running Node with --max-old-space-size=N where N is the number of megabytes (the actual maximum that it can be set to depends upon the OS and, of course, the amount of physical RAM available). If we absolutely needed to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of node using the child_process module.
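To see the limit that applies to a running process, we can ask V8 directly. This sketch assumes a Node release that ships the built-in v8 module. The figure reported depends on the Node version, the platform, and any --max-old-space-size flag in effect.

```javascript
var v8 = require('v8');

// heap_size_limit is reported in bytes
var limitMb = v8.getHeapStatistics().heap_size_limit / (1024 * 1024);

console.log('V8 heap limit: ' + Math.round(limitMb) + ' MB');
```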

In this case, high memory usage isn't necessarily required and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files. The slight speed improvement relative to the total download time is negligible while the cost of caching them is quite significant in ratio to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects which can then be used to clean the cache, consequently removing files in low demand and prioritizing high-demand files for faster delivery. Let's rearrange our cache object slightly:

var cache = {
  store: {},
  maxSize: 26214400 //(bytes) 25mb
};

For a clearer mental model, we're making a distinction between the cache as a functioning entity and the cache as a store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size. We've defined cache.maxSize for this purpose. All we have to do now is insert an if condition within the fs.stat callback:

fs.stat(f, function (err, stats) {
  if (stats.size < cache.maxSize) {
    var bufferOffset = 0;
    cache.store[f] = {content: new Buffer(stats.size),
                      timestamp: Date.now()};
    s.on('data', function (data) {
      data.copy(cache.store[f].content, bufferOffset);
      bufferOffset += data.length;
    });
  }
});

Notice we also slipped a new timestamp property into our cache.store[f]. This is for cleaning the cache, which is our second goal. Let's extend cache:

var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  clean: function (now) {
    var that = this;
    Object.keys(this.store).forEach(function (file) {
      if (now > that.store[file].timestamp + that.maxAge) {
        delete that.store[file];
      }
    });
  }
};

So in addition to maxSize, we've created a maxAge property and added a clean method. We call cache.clean at the bottom of the server like so:

  //all of our code prior
  cache.clean(Date.now());
}).listen(8080); //end of the http.createServer

cache.clean loops through cache.store and checks whether each cached file has exceeded its specified lifetime. If it has, we remove it from store. We'll add one further improvement and then we're done. At the moment, cache.clean is called on each request. This means cache.store is going to be looped through on every server hit, which is neither necessary nor efficient. It would be better if we cleaned the cache, say, every two hours or so. We'll add two more properties to cache. The first is cleanAfter, which specifies how long to wait between cache cleans. The second is cleanedAt, which records when the cache was last cleaned.

var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  cleanAfter: 7200 * 1000, //(ms) two hours
  cleanedAt: 0, //to be set dynamically
  clean: function (now) {
    if (now - this.cleanAfter > this.cleanedAt) {
      var that = this; //declared with var to avoid leaking a global
      this.cleanedAt = now;
      Object.keys(this.store).forEach(function (file) {
        if (now > that.store[file].timestamp + that.maxAge) {
          delete that.store[file];
        }
      });
    }
  }
};

We wrap our cache.clean method in an if statement which will allow a loop through cache.store only if it has been longer than two hours (or whatever cleanAfter is set to), since the last clean.
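To watch the throttled clean at work without waiting two hours, we can feed it made-up timestamps. In this sketch, createCache is a hypothetical factory around the same clean logic, and the tiny durations (in milliseconds) are chosen only to keep the arithmetic easy to follow.

```javascript
// Hypothetical factory wrapping the recipe's clean logic.
function createCache(maxAge, cleanAfter) {
  return {
    store: {},
    maxAge: maxAge,
    cleanAfter: cleanAfter,
    cleanedAt: 0,
    clean: function (now) {
      if (now - this.cleanAfter > this.cleanedAt) {
        var that = this;
        this.cleanedAt = now;
        Object.keys(this.store).forEach(function (file) {
          if (now > that.store[file].timestamp + that.maxAge) {
            delete that.store[file];
          }
        });
      }
    }
  };
}

var cache = createCache(1000, 5000);
cache.store['old.html'] = {timestamp: 0};
cache.store['new.html'] = {timestamp: 9500};

cache.clean(10000); // the clean runs: old.html has expired, new.html has not
var afterFirst = Object.keys(cache.store); // ['new.html']

cache.store['stale.html'] = {timestamp: 0};
cache.clean(12000); // only 2000 ms since the last clean, so nothing happens
var afterSecond = Object.keys(cache.store); // ['new.html', 'stale.html']

cache.clean(15001); // over 5000 ms since the last clean: both entries expire
var afterThird = Object.keys(cache.store); // []
```

Note that an expired file can linger in the store until the next clean is due, which is the trade-off for not looping on every request.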

See also

  • Handling file uploads discussed in Chapter 2, Exploring the HTTP Object

  • Securing against filesystem hacking exploits discussed in this chapter

 

Securing against filesystem hacking exploits


For a Node app to be insecure, there must be something an attacker can interact with for exploitation purposes. Due to Node's minimalist approach, the onus is mostly on programmers to ensure their implementation doesn't expose security flaws. This recipe will help identify some security risk anti-patterns that could occur when working with the filesystem.

Getting ready

We'll be working with the same content directory as in the previous recipes, but we'll start a new insecure_server.js file (there's a clue in the name!) from scratch to demonstrate mistaken techniques.

How to do it...

Our previous static file recipes tend to use path.basename to acquire a route, but this flattens every request to a single level. If we accessed localhost:8080/foo/bar/styles.css, our code would take styles.css as the basename and deliver content/styles.css to us. Let's make a subdirectory in our content folder, call it subcontent, and move our script.js and styles.css files into it. We'd need to alter our script and link tags in index.html:

<link rel=stylesheet type=text/css href=subcontent/styles.css>
<script src=subcontent/script.js type=text/javascript></script>

We can use the url module to grab the entire pathname. So let's include the url module in our new insecure_server.js file, create our HTTP server, and use pathname to get the whole requested path:

var http = require('http');
var path = require('path');
var url = require('url');
var fs = require('fs');

http.createServer(function (request, response) {
  var lookup = url.parse(decodeURI(request.url)).pathname;
  lookup = (lookup === "/") ? '/index.html' : lookup;
  var f = 'content' + lookup;
  console.log(f);
  fs.readFile(f, function (err, data) {
    response.end(data);
  });
}).listen(8080);

If we navigate to localhost:8080, everything works great. We've gone multilevel, hooray. For demonstration purposes, a few things have been stripped out from previous recipes (such as fs.exists), but even with them, the code would be open to the same security hazard, as the following request shows:

curl localhost:8080/../insecure_server.js

Now we have our server's code. An attacker could also access /etc/passwd with a few attempts at guessing its relative path:

curl localhost:8080/../../../../../../../etc/passwd
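To see why these requests escape the content folder, we can resolve the paths our server builds. This is only an illustration: the operating system performs the equivalent resolution when the file is opened, /home/user/app is a hypothetical location for our server, and path.posix is used so the output is the same on every platform.

```javascript
var path = require('path');

// The paths insecure_server.js hands to fs.readFile
var f1 = 'content' + '/../insecure_server.js';
var f2 = 'content' + '/../../../../../../../etc/passwd';

// ".." steps back out of the content folder
console.log(path.posix.normalize(f1)); // insecure_server.js

// Surplus ".." segments stop at the filesystem root
console.log(path.posix.resolve('/home/user/app', f2)); // /etc/passwd
```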

In order to test these attacks, we have to use curl or an equivalent tool because modern browsers will filter these sorts of requests. As a solution, what if we added a unique suffix to each file we wanted to serve and made it mandatory for the suffix to exist before the server coughs it up? That way, an attacker couldn't request /etc/passwd or our insecure_server.js because they wouldn't have the unique suffix. To try this, let's copy the content folder and call it content-pseudosafe, and rename our files to index.html-serve, script.js-serve, and styles.css-serve. Let's create a new server file and name it pseudosafe_server.js. Now all we have to do is make the -serve suffix mandatory:

//requires section...
http.createServer(function (request, response) {
  var lookup = url.parse(decodeURI(request.url)).pathname;
  lookup = (lookup === "/") ? '/index.html-serve' : lookup + '-serve';
  var f = 'content-pseudosafe' + lookup;

For feedback purposes, we'll also include some 404 handling with the help of fs.exists.

  //requires, create server etc
  fs.exists(f, function (exists) {
    if (!exists) {
      response.writeHead(404);
      response.end('Page Not Found!');
      return;
    }
    //read file etc

So let's start our pseudosafe_server.js file and try out the same exploit:

curl -i localhost:8080/../insecure_server.js

We've used the -i argument so that curl will output the headers. What's the result? A 404, because the file it is actually looking for is ../insecure_server.js-serve, which doesn't exist. So what's wrong with this method? Well, it's inconvenient and prone to error. More importantly, an attacker can still work around it:

curl localhost:8080/../insecure_server.js%00/index.html

And voila! There's our server code again. The solution to our problem is path.normalize, which cleans up our pathname before it gets to fs.readFile.

http.createServer(function (request, response) {
  var lookup = url.parse(decodeURI(request.url)).pathname;
  lookup = path.normalize(lookup);
  lookup = (lookup === "/") ? '/index.html' : lookup;
  var f = 'content' + lookup;

Prior recipes haven't used path.normalize, yet they're still relatively safe. path.basename gives us the last part of the path, so any leading relative directory pointers (../) are discarded, thus preventing the directory traversal exploit.
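The effect of both functions on a traversal attempt can be checked in a few lines. The attack string below is a made-up example, and path.posix is used so the results are the same on every platform.

```javascript
var path = require('path');

var attack = '/../../etc/passwd';

// For an absolute path, ".." segments cannot climb above the root "/"
var normalized = path.posix.normalize(attack);
console.log(normalized);             // /etc/passwd
console.log('content' + normalized); // content/etc/passwd

// basename keeps only the final segment, discarding any directories
console.log(path.posix.basename(attack));                // passwd
console.log(path.posix.basename('/foo/bar/styles.css')); // styles.css
```

Either way, the file that ends up being looked for sits inside our content folder.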

How it works...

Here we have two filesystem exploitation techniques: the relative directory traversal and poison null byte attacks. These attacks can take different forms, such as in a POST request or from an external file, and they can have different effects. For example, if we were writing to files instead of reading them, an attacker could potentially start making changes to our server. The key to security in all cases is to validate and clean any data that comes from the user.

In insecure_server.js, we pass whatever the user requests to our fs.readFile method. This is foolish because it allows an attacker to take advantage of the relative path functionality in our operating system by using ../, thus gaining access to areas that should be off limits. By adding the -serve suffix, we didn't solve the problem. We merely put a plaster on it, one that can be circumvented by the poison null byte.

The key to this attack is %00, which is the URL-encoded form of the null byte. In this case, the null byte blinds Node to the ../insecure_server.js portion, but when the same null byte is sent through to our fs.readFile method, it has to interface with the kernel, and the kernel is blinded to the index.html part. So our code sees index.html but the read operation sees ../insecure_server.js. This is known as null byte poisoning.

To protect ourselves, we could use a regular expression to remove the ../ parts of the path. We could also check for the null byte and respond with a 400 Bad Request. However, we don't need to, because path.normalize filters out the null byte and relative parts for us.
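As an extra safeguard in the spirit of validating user input, we could also verify that the final path still sits inside our content folder. The following is a sketch of one possible check, not part of the recipe itself: resolveInside is a hypothetical helper, it rejects null bytes explicitly rather than relying on anything to strip them, and it assumes the request path begins with a slash, as a URL pathname does.

```javascript
var path = require('path');

// Hypothetical helper: returns an absolute file path inside root,
// or null when the request should be refused.
function resolveInside(root, requestPath) {
  // Reject the poison null byte outright
  if (requestPath.indexOf('\0') !== -1) {
    return null;
  }
  var absRoot = path.resolve(root);
  // path.join normalizes, so any ".." segments are resolved here
  var target = path.join(absRoot, requestPath);
  // Allow only paths that remain below the root directory
  if (target.indexOf(absRoot + path.sep) !== 0) {
    return null;
  }
  return target;
}

console.log(resolveInside('content', '/../insecure_server.js'));     // null
console.log(resolveInside('content', '/index.html\0.txt'));          // null
console.log(resolveInside('content', '/../content-pseudosafe/x.js')); // null
```

Comparing against absRoot plus a path separator matters: without the separator, a sibling folder such as content-pseudosafe would pass the check. A caller could answer a null result with a 400 or 404 response.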

There's more...

Let's further delve into how we can protect our servers when it comes to serving static files.

Whitelisting

If security were an extreme priority, we could adopt a strict whitelisting approach. In this approach, we would create a manual route for each file we are willing to deliver. Anything not on our whitelist would return 404. We can place a whitelist array above http.createServer as shown in the following code:

var whitelist = [
  '/index.html',
  '/subcontent/styles.css',
  '/subcontent/script.js'
];

Inside of our http.createServer callback, we'll put an if statement to check if the requested path is in the whitelist array:

  if (whitelist.indexOf(lookup) === -1) {
    response.writeHead(404);
    response.end('Page Not Found!');
    return;
  }

That's it. We can test this by placing a file non-whitelisted.html in our content directory.

curl -i localhost:8080/non-whitelisted.html

The preceding command will return 404 because non-whitelisted.html isn't on whitelist.
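The whitelist test can also be exercised without starting a server. In this sketch, isWhitelisted is a hypothetical wrapper around the same indexOf check.

```javascript
var whitelist = [
  '/index.html',
  '/subcontent/styles.css',
  '/subcontent/script.js'
];

// Hypothetical wrapper around the recipe's indexOf check
function isWhitelisted(lookup) {
  return whitelist.indexOf(lookup) !== -1;
}

console.log(isWhitelisted('/index.html'));            // true
console.log(isWhitelisted('/non-whitelisted.html'));  // false
console.log(isWhitelisted('/../insecure_server.js')); // false
```

Because only exact matches pass, traversal attempts are refused along with everything else that isn't listed.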

Node-static

A list of static file server modules available for different purposes can be found at https://github.com/joyent/node/wiki/modules#wiki-web-frameworks-static. It's a good idea to ensure that a project is mature and active before relying on it to serve your content. Node-static is a well-developed module with built-in caching. It's also compliant with the RFC2616 HTTP standards specification, which defines how files should be delivered over HTTP. Node-static implements all the essentials discussed in this chapter and more besides. This piece of code is slightly adapted from the node-static GitHub page at https://github.com/cloudhead/node-static:

var static = require('node-static');
var fileServer = new static.Server('./content');
require('http').createServer(function (request, response) {
  request.addListener('end', function () {
    fileServer.serve(request, response);
  }).resume(); //newer Node versions need resume() for 'end' to fire
}).listen(8080);

The preceding code will interface with the node-static module to handle server-side and client-side caching, use streams to deliver content, and filter out relative requests and null bytes, among other things.

See also

  • Preventing cross-site request forgery discussed in Chapter 7, Implementing Security, Encryption, and Authentication

  • Setting up an HTTPS web server discussed in Chapter 7, Implementing Security, Encryption, and Authentication

  • Deploying to a server environment discussed in Chapter 10, Taking It Live

  • Cryptographic password hashing discussed in Chapter 7, Implementing Security, Encryption, and Authentication

About the Author

  • David Mark Clements

    David Mark Clements is a principal architect with nearForm, specializing in Node, frontend web, and JavaScript performance.

    He assists multinationals and start-ups alike with architecture planning, creating and leading development teams, innovation projects, internal evangelism, training, and deep dive consultancy on all aspects of live systems (architecture, performance, infrastructure, and deployment).

    David is also an avid open source enthusiast, and regularly speaks at various JavaScript and web conferences.

    Node.js became a core component of his toolset (since version 0.4) due to its versatility, vast ecosystem, and the cognitive ease that comes with full-stack JavaScript. Being primarily self-taught, David Mark Clements has a potent curiosity that typically drives him to approach problems with a unique perspective.
