One of the great qualities of Node is its simplicity. Unlike with PHP or ASP, there is no separation between the web server and the code, nor do we have to customize large configuration files to get the behavior we want. With Node we can create the server, customize it, and deliver content all at the code level. This chapter demonstrates how to create a web server with Node and feed content through it, all while implementing security and performance enhancements to cater for various situations.
In order to deliver web content we need to make a URI available. This recipe walks us through the creation of an HTTP server that exposes routes to the user.
First, let's create our server file. If our main purpose is to expose server functionality, it's general practice to call the file server.js
, which we could put in a new folder. It's also a good idea to install and use hotnode:
sudo npm -g install hotnode
hotnode server.js
Hotnode
will conveniently auto-restart the server when we save changes.
In order to create the server we need the http
module, so let's load it and use the http.createServer
method:
var http = require('http');
http.createServer(function (request, response) {
  response.writeHead(200, {'Content-Type': 'text/html'});
  response.end('Woohoo!');
}).listen(8080);
Now, if we save our file and access localhost:8080
in a web browser or with curl, our browser (or curl) will exclaim: 'Woohoo!'
. However, the same will occur at localhost:8080/foo
. Indeed, any path will render the same behavior, so let's build in some routing. We can use the path
module to extract the basename
of the path (the final part of the path), and reverse any URI encoding from the client with decodeURI:
var http = require('http');
var path = require('path');
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url));
We now need a way to define our routes. One option is to use an array of objects:
var pages = [
  {route: '', output: 'Woohoo!'},
  {route: 'about', output: 'A simple routing with Node example'},
  {route: 'another page', output: function() {return 'Here\'s '+this.route;}}
];
Our pages
array should be placed above the http.createServer
call.
Within our server, we need to loop through our array and see if the lookup variable matches any of our routes. If it does we can supply the output. We'll also implement some 404
handling:
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url));
pages.forEach(function(page) {
if (page.route === lookup) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end(typeof page.output === 'function'
? page.output() : page.output);
}
});
if (!response.finished) {
response.writeHead(404);
response.end('Page Not Found!');
}
}).listen(8080);
The callback function we provide to http.createServer
gives us all the functionality we need to interact with our server through the request
and response
objects. We use request
to obtain the requested URL and then we acquire its basename
with path
. We also use decodeURI
which our another page
route would fail without as our code would try to match another%20page
against our pages
array and return false
.
Once we have our basename
, we can match it in any way we want. We could send it in a database query to retrieve content, use regular expressions to perform partial matches, or match it to a file name and load its contents.
We could have used a switch
statement to handle routing but our pages
array has several advantages. It's easier to read and extend, and it can be seamlessly converted to JSON. We loop through our pages
array using forEach
.
Node is built on Google's V8 engine, which provides us with a number of ECMAScript 5 features. These features can't be used in all browsers as they're not yet universally implemented, but using them in Node is no problem! forEach
is an ES5 addition; the ES3 way is to use the less convenient for
loop.
While looping through each object, we check its route
property. If we get a match, we write the 200 OK
status and content-type
headers. We then end the response with the object's output property.
response.end
allows us to pass a parameter to it, which it writes just before finishing the response. In response.end
, we used a ternary operator (?:) to conditionally call page.output
as a function or simply pass it as a string. Notice that the another page
route contains a function instead of a string. The function has access to its parent object through the this
variable, and allows for greater flexibility in assembling the output we want to provide. In the event that there is no match in our forEach
loop, response.end
would never be called. Therefore, the client would continue to wait for a response until it times out. To avoid this, we check the response.finished
property and if it's false, we write a 404
header and end the response.
response.finished
depends on the forEach
callback, yet it's not nested within the callback. Callback functions are mostly used for asynchronous operations, so on the surface this looks like a potential race condition. However, forEach
does not operate asynchronously. It blocks until every iteration is complete.
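The matching logic described above can also be tried outside of a running server. The following is a minimal sketch rather than the recipe's own code: resolveRoute is a hypothetical helper name, and it returns a plain object instead of writing to response, so the 404 fallback takes the place of the response.finished check.

```javascript
var path = require('path');

var pages = [
  {route: '', output: 'Woohoo!'},
  {route: 'about', output: 'A simple routing with Node example'},
  {route: 'another page', output: function () { return 'Here\'s ' + this.route; }}
];

// resolveRoute is a hypothetical helper (an assumption for illustration).
// It mirrors the lookup done inside the http.createServer callback.
function resolveRoute(requestUrl) {
  // decodeURI turns "another%20page" back into "another page"
  var lookup = path.basename(decodeURI(requestUrl));
  // Start from the 404 case; a matching route overwrites it
  var result = {status: 404, body: 'Page Not Found!'};
  pages.forEach(function (page) {
    if (page.route === lookup) {
      result = {
        status: 200,
        // Calling page.output() as a method makes `this` refer to the page object
        body: typeof page.output === 'function' ? page.output() : page.output
      };
    }
  });
  // forEach is synchronous, so result is final by the time we get here
  return result;
}

console.log(resolveRoute('/about').body);
console.log(resolveRoute('/another%20page').body); // Here's another page
console.log(resolveRoute('/missing').status); // 404
```

Because forEach has finished by the time the function returns, there is no window in which the 404 result could be returned for a route that actually matches.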
There are many ways to extend and alter this example. There are also some great non-core modules available that do the leg work for us.
So far, our routing only deals with a single-level path. A multilevel path (for example, /about/node)
will simply return a 404
. We can alter our object to reflect a subdirectory-like structure, remove path
, and use request.url
for our routes instead of path.basename:
var http = require('http');
var pages = [
  {route: '/', output: 'Woohoo!'},
  {route: '/about/this', output: 'Multilevel routing with Node'},
  {route: '/about/node', output: 'Evented I/O for V8 JavaScript.'},
  {route: '/another page', output: function () {return 'Here\'s ' + this.route; }}
];
http.createServer(function (request, response) {
  var lookup = decodeURI(request.url);
Note
When serving static files, request.url
must be cleaned prior to fetching a given file. Check out the Securing against filesystem hacking exploits section discussed in this chapter.
Multilevel routing could be taken further, allowing us to build and then traverse a more complex object.
{route: 'about', childRoutes: [
  {route: 'node', output: 'Evented I/O for V8 JavaScript'},
  {route: 'this', output: 'Complex Multilevel Example'}
]}
After the third or fourth level, this object would become a leviathan to look at. We could instead create a helper function to define our routes that essentially pieces our object together for us. Alternatively, we could use one of the excellent non-core routing modules provided by the open source Node community. Excellent solutions already exist which provide helper methods to handle the increasing complexity of scalable multilevel routing (see Routing modules discussed in this chapter and Chapter 6, Accelerating Development with Express).
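To make the idea of traversing a nested route object concrete, here is one possible sketch. The helper names findChild and findRoute are assumptions made for illustration; they are not part of the recipe or of any routing module.

```javascript
var routes = [
  {route: 'about', childRoutes: [
    {route: 'node', output: 'Evented I/O for V8 JavaScript'},
    {route: 'this', output: 'Complex Multilevel Example'}
  ]}
];

// Hypothetical helper: find the entry named `name` at one level of the tree
function findChild(level, name) {
  var found = null;
  (level || []).forEach(function (entry) {
    if (entry.route === name) { found = entry; }
  });
  return found;
}

// Hypothetical helper: walk the tree one path segment at a time.
// Returns the matching route object, or null when any segment is missing.
function findRoute(routes, requestUrl) {
  // "/about/node" becomes ['about', 'node']; empty segments are dropped
  var segments = decodeURI(requestUrl).split('/').filter(Boolean);
  var level = routes;
  var match = null;
  for (var i = 0; i < segments.length; i += 1) {
    match = findChild(level, segments[i]);
    if (!match) { return null; }
    level = match.childRoutes;
  }
  return match;
}

console.log(findRoute(routes, '/about/node').output);
console.log(findRoute(routes, '/about/missing')); // null
```

In a server callback, a null result (or a match without an output property) would be the cue to send a 404.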
Two other useful core modules are url
and querystring
. The url.parse
method accepts two parameters: first the URL string (in our case, this will be request.url)
and second a Boolean parameter named parseQueryString
. If set to true
, url.parse lazy loads the querystring
module (saving us the need to require it ourselves) and uses it to parse the query into an object. This makes it easy for us to interact with the query portion of a URL.
var http = require('http');
var url = require('url');
var pages = [
  {id: '1', route: '', output: 'Woohoo!'},
  {id: '2', route: 'about', output: 'A simple routing with Node example'},
  {id: '3', route: 'another page', output: function () {return 'Here\'s ' + this.route; }}
];
http.createServer(function (request, response) {
  var id = url.parse(decodeURI(request.url), true).query.id;
  if (id) {
    pages.forEach(function (page) {
      if (page.id === id) {
        response.writeHead(200, {'Content-Type': 'text/html'});
        response.end(typeof page.output === 'function'
          ? page.output() : page.output);
      }
    });
  }
  if (!response.finished) {
    response.writeHead(404);
    response.end('Page Not Found');
  }
}).listen(8080);
With the added id
properties, we can access our object data with, for instance, localhost:8080?id=2.
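The effect of the second url.parse parameter is easiest to see side by side. This is a small sketch under the assumption that the legacy url.parse API is available, as it is in the Node versions this chapter targets; getId is a hypothetical helper name.

```javascript
var url = require('url');

// Hypothetical helper: pull the id parameter out of a request URL
function getId(requestUrl) {
  // Passing true as the second argument makes `query` an object
  return url.parse(decodeURI(requestUrl), true).query.id;
}

// Without the second argument, query is left as a raw string
console.log(url.parse('/about?id=2').query); // id=2

// With it, each query parameter becomes a property
console.log(getId('/about?id=2')); // 2
console.log(typeof getId('/about?id=2')); // string
```

Query values always arrive as strings, which is why the id properties in our pages array are written as '1', '2', and '3': the strict comparison page.id === id would never match a numeric id.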
There's an up-to-date list of various routing modules for Node at https://www.github.com/joyent/node/wiki/modules#wiki-web-frameworks-routers. These community-made routers cater to various scenarios. It's important to research the activity and maturity of a module before taking it into a production environment. In Chapter 6, Accelerating Development with Express, we will go into greater detail on using the built-in Express/Connect router for more comprehensive routing solutions.
Serving static files and Securing against filesystem hacking exploits discussed in this chapter
Dynamic Routing discussed in Chapter 6, Accelerating Development with Express.
If we have information stored on disk that we want to serve as web content, we can use the fs
(filesystem) module to load our content and pass it through the createServer
callback. This is a basic conceptual starting point for serving static files. As we will learn in the following recipes, there are much more efficient solutions.
We'll need some files to serve. Let's create a directory named content
, containing the following three files:
index.html:
<html>
<head>
<title>Yay Node!</title>
<link rel=stylesheet href=styles.css type=text/css>
<script src=script.js type=text/javascript></script>
</head>
<body>
<span id=yay>Yay!</span>
</body>
</html>
script.js:
window.onload=function() {alert('Yay Node!');};
styles.css:
#yay {font-size:5em;background:blue;color:yellow;padding:0.5em}
As in the previous recipe, we'll be using the core modules http
and path
. We'll also need to access the filesystem, so we'll require the fs
module too. Let's create our server:
var http = require('http');
var path = require('path');
var fs = require('fs');
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html',
    f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    console.log(exists ? lookup + " is there" : lookup + " doesn't exist");
  });
}).listen(8080);
If we haven't already, we can initialize our server.js
file:
hotnode server.js
Try loading localhost:8080/foo
and the console will say foo doesn't exist
, because it doesn't. localhost:8080/script.js
will tell us script.js is there
, because it is. Before we can serve a file, we are supposed to let the client know the content-type
, which we can determine from the file extensions. So let's make a quick map using an object:
var mimeTypes = {
  '.js' : 'text/javascript',
  '.html': 'text/html',
  '.css' : 'text/css'
};
We could extend our mimeTypes
map later to support more types.
Note
Modern browsers may be able to interpret certain mime types (such as text/javascript)
without the server sending a content-type
header. However, older browsers or less common mime types will rely upon the correct content-type
header being sent from the server.
Remember to place mimeTypes
outside the server callback since we don't want to initialize the same object on every client request. If the requested file exists, we can convert our file extension into content-type
by feeding path.extname
into mimeTypes
and then passing our retrieved content-type
to response.writeHead
. If the requested file doesn't exist, we'll write out a 404
and end the response.
//requires variables, mimeType object...
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html',
    f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    if (exists) {
      fs.readFile(f, function (err, data) {
        if (err) { response.writeHead(500); response.end('Server Error!'); return; }
        var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
        response.writeHead(200, headers);
        response.end(data);
      });
      return;
    }
    response.writeHead(404); //no such file found!
    response.end();
  });
}).listen(8080);
At the moment, there is still no content sent to the client. We have to get this content from our file, so we wrap the response handling in an fs.readFile
method callback.
//http.createServer, inside fs.exists:
if (exists) {
  fs.readFile(f, function(err, data) {
    var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
    response.writeHead(200, headers);
    response.end(data);
  });
  return;
}
Before we finish, let's apply some error handling to our fs.readFile
callback as follows:
//requires variables, mimeType object...
//http.createServer, fs.exists, inside if(exists):
fs.readFile(f, function(err, data) {
if (err) {response.writeHead(500); response.end('Server Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
Notice that return
stays outside the fs.readFile
callback. We are returning from the fs.exists
callback to prevent further code execution (for example, sending 404)
. Placing a return
in an if
statement is similar to using an else
branch. However, the if return
pattern is generally preferable to using if else
in Node, as it eliminates yet another set of curly braces.
So now we can navigate to localhost:8080
which will serve our index.html
file. The index.html
file makes calls to our script.js
and styles.css
files, which our server also delivers with appropriate mime types. The result can be seen in the following screenshot:
[Screenshot: index.html rendered in the browser]
This recipe serves to illustrate the fundamentals of serving static files. Remember, this is not an efficient solution! In a real-world situation, we don't want to make an I/O call every time a request hits the server; this is very costly, especially with larger files. In the following recipes, we'll learn better ways to serve static files.
Our script creates a server and declares a variable called lookup
. We assign a value to lookup
using the double pipe (||), the logical OR operator. This defines a default route if path.basename
is empty. Then we pass lookup
to a new variable that we named f
in order to prepend our content
directory to the intended filename. Next we run f
through the fs.exists
method and check the exists
parameter in our callback to see if the file is there. If the file exists we read it asynchronously using fs.readFile
. If there is a problem accessing the file, we write a 500
server error, end the response, and return from the fs.readFile
callback. We can test the error-handling functionality by removing read permissions from index.html
.
chmod -r index.html
Doing so will cause the server to respond with the 500
server error status code. To set things right again run the following command:
chmod +r index.html
As long as we can access the file, we grab content-type
using our handy mimeTypes
mapping object, write the headers, end the response with data loaded from the file, and finally return from the function. If the requested file does not exist, we bypass all this logic, write a 404
, and end the response.
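The path handling described above (default file, content directory prefix, and content-type lookup) can be sketched as a single pure function. describeRequest is a hypothetical helper name, and the 'application/octet-stream' fallback for unknown extensions is an assumption added here; the recipe itself does not define a fallback.

```javascript
var path = require('path');

var mimeTypes = {
  '.js': 'text/javascript',
  '.html': 'text/html',
  '.css': 'text/css'
};

// Hypothetical helper: work out which file a request maps to
// and which content-type header should accompany it.
function describeRequest(requestUrl) {
  // path.basename('/') is an empty string, so || supplies the default file
  var lookup = path.basename(decodeURI(requestUrl)) || 'index.html';
  return {
    file: 'content/' + lookup,
    // Assumption: fall back to a generic binary type for unmapped extensions
    type: mimeTypes[path.extname(lookup)] || 'application/octet-stream'
  };
}

console.log(describeRequest('/'));
console.log(describeRequest('/styles.css'));
```

Keeping this logic separate from the fs calls makes it easy to check the mapping without creating any files.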
Here's something to watch out for...
When using a browser to test our server, sometimes an unexpected server hit can be observed. This is the browser requesting the default favicon.ico
icon file that servers can provide. Apart from the initial confusion of seeing additional hits, this is usually not a problem. If the favicon request begins to interfere, we can handle it like this:
if (request.url === '/favicon.ico') { response.end(); return; }
If we wanted to be more polite to the client, we could also inform it of a 404
by using response.writeHead(404)
before issuing response.end
.
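As a sketch of the more polite variant, the favicon check could be wrapped in a small function that reports whether it handled the request. handleFavicon is a hypothetical name, and the fake response object below exists only so the sketch can be run without starting a server.

```javascript
// Hypothetical helper: answers favicon requests with a 404.
// Returns true when the request was dealt with, false otherwise.
function handleFavicon(request, response) {
  if (request.url !== '/favicon.ico') {
    return false;
  }
  response.writeHead(404);
  response.end();
  return true;
}

// A stand-in for the real response object, recording what was called
var fakeResponse = {
  status: null,
  ended: false,
  writeHead: function (code) { this.status = code; },
  end: function () { this.ended = true; }
};

console.log(handleFavicon({url: '/favicon.ico'}, fakeResponse)); // true
console.log(fakeResponse.status); // 404
```

Inside the http.createServer callback this would be used as `if (handleFavicon(request, response)) { return; }` before any routing or file lookup takes place.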
Directly accessing storage on each client request is not ideal. For this example, we will explore how to enhance server efficiency by accessing the disk on only the first request, caching the data from file for that first request, and serving all further requests out of the process memory.
We are going to improve upon the code from the previous task, so we'll be working with server.js
, and in the content
directory with index.html, styles.css
, and script.js
.
Let's begin by looking at our script from the previous recipe Serving static files:
var http = require('http');
var path = require('path');
var fs = require('fs');
var mimeTypes = {
  '.js' : 'text/javascript',
  '.html': 'text/html',
  '.css' : 'text/css'
};
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html';
  var f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    if (exists) {
      fs.readFile(f, function(err, data) {
        if (err) { response.writeHead(500); response.end('Server Error!'); return; }
        var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
        response.writeHead(200, headers);
        response.end(data);
      });
      return;
    }
    response.writeHead(404); //no such file found!
    response.end('Page Not Found!');
  });
}).listen(8080);
We need to modify this code to only read the file once, load its contents into memory, and afterwards respond to all requests for that file from memory. To keep things simple and preserve maintainability, we'll extract our cache handling and content delivery into a separate function. So above http.createServer
, and below mimeTypes
, we'll add the following:
var cache = {};
function cacheAndDeliver(f, cb) {
  if (!cache[f]) {
    fs.readFile(f, function(err, data) {
      if (!err) {
        cache[f] = {content: data};
      }
      cb(err, data);
    });
    return;
  }
  console.log('loading ' + f + ' from cache');
  cb(null, cache[f].content);
}
//http.createServer …..
A new cache
object has been added to store our files in memory as well as a new function called cacheAndDeliver
. Our function takes the same parameters as fs.readFile
, so we can replace fs.readFile
in the http.createServer
callback while leaving the rest of the code intact:
//...inside http.createServer:
fs.exists(f, function (exists) {
  if (exists) {
    cacheAndDeliver(f, function(err, data) {
      if (err) { response.writeHead(500); response.end('Server Error!'); return; }
      var headers = {'Content-type': mimeTypes[path.extname(f)]};
      response.writeHead(200, headers);
      response.end(data);
    });
    return;
  }
  //rest of fs.exists code (404 handling)...
When we execute our server.js
file and access localhost:8080
twice consecutively, the second request causes the console to output the following:
loading content/index.html from cache
loading content/styles.css from cache
loading content/script.js from cache
We defined a function called cacheAndDeliver
, which like fs.readFile
, takes a filename and callback as parameters. This is great because we can pass the exact same callback of fs.readFile
to cacheAndDeliver
, padding the server out with caching logic without adding any extra complexity visually to the inside of the http.createServer
callback. As it stands, the worth of abstracting our caching logic into an external function is arguable, but the more we build on the server's caching abilities the more feasible and useful this abstraction becomes. Our cacheAndDeliver
function checks whether the requested content is already cached; if not, we call fs.readFile
and load the data from disk. Once we have this data we may as well hold onto it, so it's placed into the cache
object referenced by its file path (the f
variable). The next time anyone requests the file, cacheAndDeliver
will see that we have the file stored in the cache
object and will invoke the callback with the cached data instead of reading from disk. Notice that we fill the cache[f]
property with another new object containing a content
property. This makes it easier to extend the caching functionality in the future since we would just need to place extra properties into our cache[f]
object and supply logic that interfaces with these properties accordingly.
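To see the read-once behavior without touching the disk, the caching function can be given an injectable reader. This is a sketch under stated assumptions: createCache and fakeReadFile are hypothetical names, and the reader parameter is an addition for demonstration purposes; the recipe calls fs.readFile directly.

```javascript
var fs = require('fs');

// Hypothetical factory: builds a cacheAndDeliver function around a reader.
// The reader defaults to fs.readFile, matching the recipe.
function createCache(readFile) {
  var cache = {};
  var read = readFile || fs.readFile;
  return function cacheAndDeliver(f, cb) {
    if (!cache[f]) {
      read(f, function (err, data) {
        if (!err) {
          // Wrapping the data in an object leaves room for extra properties later
          cache[f] = {content: data};
        }
        cb(err, data);
      });
      return;
    }
    cb(null, cache[f].content);
  };
}

// Stand-in reader that counts how often the "disk" is accessed
var diskReads = 0;
function fakeReadFile(f, cb) {
  diskReads += 1;
  cb(null, 'contents of ' + f);
}

var cacheAndDeliver = createCache(fakeReadFile);
cacheAndDeliver('content/index.html', function () {});
cacheAndDeliver('content/index.html', function (err, data) {
  console.log(data + ' (disk reads: ' + diskReads + ')');
});
```

The second call is answered from memory, so the counter stays at 1. With the real, asynchronous fs.readFile, two requests arriving before the first read completes would both reach the disk, which is harmless here but worth knowing.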
If we were to modify the files we are serving, any changes wouldn't be reflected until we restart the server. We can do something about that.
To detect whether a requested file has changed since we last cached it, we must know when the file was cached and when it was last modified. To record when the file was last cached, let's extend the cache[f]
object:
cache[f] = {
  content: data,
  timestamp: Date.now() //store the current time in milliseconds since the Unix epoch
};
That was easy. Now we need to find out when the file was updated last. The fs.stat
method passes an object as the second parameter of its callback. This object contains the same useful information as the command-line GNU coreutils stat. fs.stat
supplies three time-related properties: last accessed (atime
), last modified (mtime
), and last changed (ctime
). The difference between mtime
and ctime
is that ctime
will reflect any alterations to the file, whereas mtime
will only reflect alterations to the content of the file. Consequently, if we changed the permissions of a file, ctime
would update but mtime
would stay the same. We want to pay attention to permission changes as they happen, so let's use the ctime
property:
//requires and mimeType object....
var cache = {};
function cacheAndDeliver(f, cb) {
  fs.stat(f, function (err, stats) {
    var lastChanged = Date.parse(stats.ctime),
      isUpdated = (cache[f]) && lastChanged > cache[f].timestamp;
    if (!cache[f] || isUpdated) {
      fs.readFile(f, function (err, data) {
        console.log('loading ' + f + ' from file');
        //rest of cacheAndDeliver
  }); //end of fs.stat
} // end of cacheAndDeliver
The contents of cacheAndDeliver
have been wrapped in an fs.stat
callback. Two variables have been added and the if(!cache[f])
statement has been modified. We parse the ctime
property of the second parameter, dubbed stats,
using Date.parse
to convert it to milliseconds since midnight, January 1, 1970 (the Unix epoch), and assign it to our lastChanged
variable. Then we check whether the requested file's last changed time is greater than when we cached the file (provided the file is indeed cached) and assign the result to our isUpdated
variable. After that, it's merely a case of adding the isUpdated
Boolean to the conditional if(!cache[f])
statement via the ||
(or) operator. If the file is newer than our cached version (or if it isn't yet cached), we load the file from the disk into the cache object.
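The freshness condition can be isolated into a small function for experimentation. This is a sketch, not the recipe's code: needsReload is a hypothetical name, and it uses stats.ctime.getTime() rather than Date.parse(stats.ctime). That swap is a deliberate variation, since Date.parse converts the date to a string first and so loses the milliseconds.

```javascript
// Hypothetical helper mirroring the condition inside cacheAndDeliver.
// entry is cache[f] (undefined when nothing is cached yet);
// stats is the object that fs.stat hands to its callback.
function needsReload(entry, stats) {
  if (!entry) {
    return true; // never cached, so we have to read the file
  }
  // Variation on the recipe: getTime() keeps millisecond precision
  return stats.ctime.getTime() > entry.timestamp;
}

// Plain objects stand in for real fs.stat results here
var cachedAt = 1000;
console.log(needsReload(undefined, {ctime: new Date(500)})); // true
console.log(needsReload({timestamp: cachedAt}, {ctime: new Date(2000)})); // true
console.log(needsReload({timestamp: cachedAt}, {ctime: new Date(500)})); // false
```

In the recipe, a true result corresponds to entering the if (!cache[f] || isUpdated) branch and loading the file from disk again.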
Optimizing performance with streaming discussed in this chapter
Browser-server transmission via AJAX discussed in Chapter 3, Working with Data Serialization
Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile
, we are reading the whole file into memory before sending it out in response
. For better performance, we can stream a file from disk and pipe
it directly to the response
object, sending data straight to the network socket one piece at a time.
We are building on our code from the last example, so let's get server.js, index.html, styles.css
, and script.js
ready.
We will be using fs.createReadStream
to initialize a stream, which can be piped to the response
object. In this case, implementing fs.createReadStream
within our cacheAndDeliver
function isn't ideal because the event listeners of fs.createReadStream
will need to interface with the request
and response
objects. For the sake of simplicity, these would preferably be dealt with inside the http.createServer
callback. For brevity's sake, we will discard our cacheAndDeliver
function and implement basic caching within the server callback:
//requires, mime types, createServer, lookup and f vars...
fs.exists(f, function (exists) {
  if (exists) {
    var headers = {'Content-type': mimeTypes[path.extname(f)]};
    if (cache[f]) {
      response.writeHead(200, headers);
      response.end(cache[f].content);
      return;
    }
    //...rest of server code...
Later on, we will fill cache[f].content
while we're interfacing with the readStream
object. Here's how we use fs.createReadStream:
var s = fs.createReadStream(f);
This will return a readStream
object which streams the file that is pointed at by the f
variable. readStream
emits events that we need to listen to. We can listen with addListener
or use the shorthand on:
var s = fs.createReadStream(f).on('open', function () {
  //do stuff when the readStream opens
});
Since createReadStream
returns the readStream
object, we can latch our event listener straight onto it using method chaining with the dot notation. Each stream is only going to open once, so we don't need to keep listening to it. Therefore, we can use the once
method instead of on
to automatically stop listening after the first event occurrence:
var s = fs.createReadStream(f).once('open', function () {
  //do stuff when the readStream opens
});
Before we fill out the open
event callback, let's implement error handling as follows:
var s = fs.createReadStream(f).once('open', function () {
//do stuff when the readStream opens
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});
The key to this entire endeavor is the stream.pipe
method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response
object.
var s = fs.createReadStream(f).once('open', function () {
response.writeHead(200, headers);
this.pipe(response);
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});
What about ending the response? Conveniently, stream.pipe
detects when the stream has ended and calls response.end
for us. For caching purposes, there's one other event we need to listen to. Still within our fs.exists
callback, underneath the createReadStream
code block, we write the following code:
fs.stat(f, function(err, stats) {
  var bufferOffset = 0;
  cache[f] = {content: new Buffer(stats.size)};
  s.on('data', function (chunk) {
    chunk.copy(cache[f].content, bufferOffset);
    bufferOffset += chunk.length;
  });
});
We've used the data
event to capture the buffer as it's being streamed, and copied it into a buffer that we supplied to cache[f].content
, using fs.stat
to obtain the file size for the file's cache buffer.
Instead of the client waiting for the server to load the entire file from the disk prior to sending it to the client, we use a stream to load the file in small, ordered pieces and promptly send them to the client. With larger files this is especially useful, as there is minimal delay between the file being requested and the client starting to receive the file.
We did this by using fs.createReadStream
to start streaming our file from the disk. fs.createReadStream
creates readStream
, which inherits from the EventEmitter
class.
The EventEmitter
class accomplishes the evented part of Node's tag line: Evented I/O for V8 JavaScript. Due to this, we'll use listeners instead of callbacks to control the flow of stream logic.
Then we added an open
event listener using the once
method since we want to stop listening for open
once it has been triggered. We respond to the open
event by writing the headers and using the stream.pipe
method to shuffle the incoming data straight to the client.
stream.pipe
handles the data flow. If the client becomes overwhelmed with processing, it sends a signal to the server which should be honored by pausing the stream. Under the hood, stream.pipe
uses stream.pause
and stream.resume
to manage this interplay.
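The pause and resume interplay can be illustrated with a heavily simplified stand-in for stream.pipe. This is not Node's actual implementation: manualPipe is a hypothetical function, and the fake source and destination objects below are plain event emitters created only to make the sequence of calls visible.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical, simplified stand-in for what stream.pipe takes care of
function manualPipe(source, dest) {
  source.on('data', function (chunk) {
    // write() returning false signals that dest cannot keep up
    if (dest.write(chunk) === false) {
      source.pause();
    }
  });
  // 'drain' fires once dest is ready for more data
  dest.on('drain', function () {
    source.resume();
  });
  // When the source is exhausted, finish the destination
  source.on('end', function () {
    dest.end();
  });
}

// Fake streams that record which methods get called
var calls = [];
var fakeSource = new EventEmitter();
fakeSource.pause = function () { calls.push('pause'); };
fakeSource.resume = function () { calls.push('resume'); };

var fakeDest = new EventEmitter();
fakeDest.write = function () { return false; }; // always "overwhelmed"
fakeDest.end = function () { calls.push('end'); };

manualPipe(fakeSource, fakeDest);
fakeSource.emit('data', 'first chunk');
fakeDest.emit('drain');
fakeSource.emit('end');

console.log(calls.join(', ')); // pause, resume, end
```

In our server we never write this logic ourselves; calling this.pipe(response) inside the open listener is enough.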
While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer
class for our cache[f].content
property. A Buffer
must be supplied with a size (or an array or string) which in our case is the size of the file. To get the size, we used the asynchronous fs.stat
and captured the size
property in the callback. The data
event passes a Buffer
as its only callback parameter.
The default bufferSize
for a stream is 64 KB. Any file whose size is less than the bufferSize
will only trigger one data
event because the entire file will fit into the first chunk of data. However, for files greater than bufferSize
, we have to fill our cache[f].content
property one piece at a time.
Note
Changing the default readStream
buffer size:
We can change the buffer size of readStream
by passing an options
object with a bufferSize
property as the second parameter of fs.createReadStream
.
For instance, to double the buffer you could use fs.createReadStream(f,{bufferSize: 128 * 1024})
;
We cannot simply concatenate each chunk
with cache[f].content
since this will coerce binary data into string format which, though no longer in binary format, will later be interpreted as binary. Instead, we have to copy all the little binary buffer chunks
into our binary cache[f].content
buffer.
We created a bufferOffset
variable to assist us with this. Each time we add another chunk
to our cache[f].content
buffer, we update our new bufferOffset
by adding the length of the chunk
buffer to it. When we call the Buffer.copy
method on the chunk
buffer, we pass bufferOffset
as the second parameter so our cache[f].content
buffer is filled correctly.
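The offset bookkeeping can be tried in isolation with a few hand-made chunks. This sketch assumes a Node version that provides Buffer.alloc and Buffer.from, which are the current counterparts of the new Buffer(...) constructor used in the recipe; the summed chunk lengths stand in for the stats.size value that fs.stat would supply.

```javascript
// Stand-ins for the chunks a readStream would emit through 'data' events
var chunks = [Buffer.from('Yay '), Buffer.from('Node'), Buffer.from('!')];

// In the recipe this number comes from stats.size
var total = chunks.reduce(function (sum, chunk) {
  return sum + chunk.length;
}, 0);

// Assumption: Buffer.alloc replaces `new Buffer(size)` on modern Node
var content = Buffer.alloc(total);
var bufferOffset = 0;

chunks.forEach(function (chunk) {
  // Copy this chunk into content, starting where the previous one ended
  chunk.copy(content, bufferOffset);
  bufferOffset += chunk.length;
});

console.log(content.toString()); // Yay Node!
console.log(bufferOffset); // 9
```

Without the offset, every chunk would be copied to position 0 and overwrite the one before it.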
Moreover, operating with the Buffer
class renders performance enhancements with larger files because it bypasses the V8 garbage collection methods. These tend to fragment large amounts of data thus slowing down Node's ability to process them.
While streaming has solved a problem of waiting for files to load into memory before delivering them, we are nevertheless still loading files into memory via our cache
object. With larger files, or large numbers of files, this could have ramifications for memory usage.
There is a limited amount of process memory. By default, V8's memory is set to 1400 MB on 64-bit systems and 700 MB on 32-bit systems. This can be altered by running Node with --max-old-space-size=N
where N
is the number of megabytes (the actual maximum that it can be set to depends upon the OS and, of course, the amount of physical RAM available). If we absolutely needed to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of node using the child_process
module.
In this case, high memory usage isn't necessarily required and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files. The slight speed improvement relative to the total download time is negligible while the cost of caching them is quite significant in ratio to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects which can then be used to clean the cache, consequently removing files in low demand and prioritizing high-demand files for faster delivery. Let's rearrange our cache
object slightly:
var cache = {
  store: {},
  maxSize: 26214400 //(bytes) 25mb
};
For a clearer mental model, we're making a distinction between the cache as a functioning entity and the cache as a store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size. We've defined cache.maxSize
for this purpose. All we have to do now is insert an if
condition within the fs.stat
callback:
fs.stat(f, function (err, stats) {
  if (stats.size < cache.maxSize) {
    var bufferOffset = 0;
    cache.store[f] = {content: new Buffer(stats.size),
      timestamp: Date.now() };
    s.on('data', function (data) {
      data.copy(cache.store[f].content, bufferOffset);
      bufferOffset += data.length;
    });
  }
});
Notice we also slipped in a new timestamp
property into our cache.store[f]
. This is for cleaning the cache, which is our second goal. Let's extend cache:
var cache = {
store: {},
maxSize: 26214400, //(bytes) 25mb
maxAge: 5400 * 1000, //(ms) 1 and a half hours
clean: function(now) {
var that = this;
Object.keys(this.store).forEach(function (file) {
if (now > that.store[file].timestamp + that.maxAge) {
delete that.store[file];
}
});
}
};
So in addition to maxSize
, we've created a maxAge
property and added a clean
method. We call cache.clean
at the bottom of the server like so:
//all of our code prior
cache.clean(Date.now());
}).listen(8080); //end of the http.createServer
cache.clean loops through cache.store and checks whether each cached file has exceeded its specified lifetime (maxAge). If it has, we remove that file from store
. We'll add one further improvement and then we're done. cache.clean
is called on each request. This means cache.store
is going to be looped through on every server hit, which is neither necessary nor efficient. It would be better if we cleaned the cache, say, every two hours or so. We'll add two more properties to cache
. The first is cleanAfter
to specify how long between cache cleans. The second is cleanedAt
to determine how long it has been since the cache was last cleaned.
var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  cleanAfter: 7200 * 1000, //(ms) two hours
  cleanedAt: 0, //to be set dynamically
  clean: function (now) {
    if (now - this.cleanAfter > this.cleanedAt) {
      this.cleanedAt = now;
      var that = this; //declared with var to avoid leaking a global
      Object.keys(this.store).forEach(function (file) {
        if (now > that.store[file].timestamp + that.maxAge) {
          delete that.store[file];
        }
      });
    }
  }
};
We wrap the body of our cache.clean method in an if
statement which will allow a loop through cache.store
only if it has been longer than two hours (or whatever cleanAfter
is set to), since the last clean.
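The throttling can be checked without waiting two hours by passing made-up timestamps to clean. The following is a minimal sketch under a few assumptions: the file names, the start time, and the true/false return value of clean are hypothetical additions for illustration, and maxSize is left out because it plays no part in cleaning.

```javascript
// Sketch of the throttled cache, driven by a simulated clock
var cache = {
  store: {},
  maxAge: 5400 * 1000,     // (ms) one and a half hours
  cleanAfter: 7200 * 1000, // (ms) two hours
  cleanedAt: 0,            // time of the last clean
  clean: function (now) {
    // Skip the loop unless cleanAfter ms have passed since the last clean
    if (now - this.cleanAfter <= this.cleanedAt) { return false; }
    this.cleanedAt = now;
    var that = this;
    Object.keys(this.store).forEach(function (file) {
      if (now > that.store[file].timestamp + that.maxAge) {
        delete that.store[file];
      }
    });
    return true; // hypothetical addition: report that a clean happened
  }
};

var HOUR = 3600 * 1000;
var start = 100 * HOUR; // hypothetical "current time" in ms

cache.cleanedAt = start; // pretend a clean has just run
cache.store['old.css'] = { timestamp: start };
cache.store['new.js'] = { timestamp: start + 2 * HOUR };

// 1.75 hours later: old.css has expired, but the clean is skipped
var skipped = cache.clean(start + 1.75 * HOUR);
var beforeClean = Object.keys(cache.store);

// 2.5 hours later: the clean runs and only the expired entry is removed
var ran = cache.clean(start + 2.5 * HOUR);
var afterClean = Object.keys(cache.store);

console.log(skipped, beforeClean, ran, afterClean);
```

One consequence of this design is that an expired file can stay in memory for up to cleanAfter milliseconds beyond its maxAge, which is the price of not looping through the store on every request.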
Handling file uploads discussed in Chapter 2, Exploring the HTTP Object
Securing Against Filesystem Hacking Exploits discussed in this chapter.
For a Node app to be insecure, there must be something an attacker can interact with for exploitation purposes. Due to Node's minimalist approach, the onus is mostly on programmers to ensure their implementation doesn't expose security flaws. This recipe will help identify some security risk anti-patterns that could occur when working with the filesystem.
We'll be working with the same content
directory as in the previous recipes, but we'll start a new insecure_server.js
file (there's a clue in the name!) from scratch to demonstrate mistaken techniques.
Our previous static file recipes tend to use path.basename to acquire a route, but this flattens every request to a single level. If we accessed localhost:8080/foo/bar/styles.css
, our code would take styles.css
as the basename
and deliver content/styles.css
to us. Let's make a subdirectory in our content
folder, call it subcontent
, and move our script.js
and styles.css
files into it. We'd need to alter our script and link tags in index.html:
<link rel=stylesheet type=text/css href=subcontent/styles.css> <script src=subcontent/script.js type=text/javascript></script>
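The flattening effect of path.basename is easy to confirm in isolation. This minimal sketch applies the lookup from the earlier recipes to three request URLs (made up for illustration) and shows that all of them map to the same file, so the new subcontent links could never be resolved this way.

```javascript
var path = require('path');

// Hypothetical request URLs; each one names a different location
var requests = [
  '/styles.css',
  '/foo/bar/styles.css',
  '/subcontent/styles.css'
];

// Same derivation as the earlier recipes: keep only the final path segment
var served = requests.map(function (requestUrl) {
  return 'content/' + path.basename(decodeURI(requestUrl));
});

console.log(served); // every entry is 'content/styles.css'
```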
We can use the url
module to grab the entire pathname
. So let's include the url
module in our new insecure_server.js
file, create our HTTP server, and use pathname
to get the whole requested path:
var http = require('http');
var path = require('path');
var url = require('url');
var fs = require('fs');
http.createServer(function (request, response) {
  var lookup = url.parse(decodeURI(request.url)).pathname;
  lookup = (lookup === "/") ? '/index.html' : lookup;
  var f = 'content' + lookup;
  console.log(f);
  fs.readFile(f, function (err, data) {
    response.end(data);
  });
}).listen(8080);
If we navigate to localhost:8080
, everything works great. We've gone multilevel, hooray. For demonstration purposes, a few things have been stripped out from previous recipes (such as fs.exists), but even with them in place, the following request would expose the same security hazard:
curl localhost:8080/../insecure_server.js
Now we have our server's code. An attacker could also access /etc/passwd
with a few attempts at guessing its relative path:
curl localhost:8080/../../../../../../../etc/passwd
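To see why these requests escape the content folder, we can resolve the strings that insecure_server.js builds. This is only a sketch: the /srv/app directory is a hypothetical install location, and path.posix is used so the output is the same on any platform.

```javascript
var path = require('path');

// Hypothetical location of the app, used only to make the output concrete
var appDir = '/srv/app';

// What insecure_server.js builds: 'content' + the raw request pathname
function insecurePath(pathname) {
  return 'content' + pathname;
}

// path.posix.resolve shows where a POSIX filesystem would end up
var serverCode = path.posix.resolve(
  appDir, insecurePath('/../insecure_server.js'));

// new Array(8).join('/..') repeats '/..' seven times
var climb = new Array(8).join('/..');
var passwd = path.posix.resolve(appDir, insecurePath(climb + '/etc/passwd'));

console.log(serverCode); // '/srv/app/insecure_server.js'
console.log(passwd);     // '/etc/passwd'
```

Extra ../ segments are harmless to the attacker because the path cannot climb above the filesystem root, which is why guessing the depth only takes a few attempts.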
In order to test these attacks, we have to use curl or another equivalent because modern browsers will filter these sorts of requests. As a solution, what if we added a unique suffix to each file we wanted to serve and made it mandatory for the suffix to exist before the server coughs it up? That way, an attacker couldn't request /etc/passwd or our insecure_server.js because they wouldn't have the unique suffix. To try this, let's copy the content
folder and call it content-pseudosafe
, and rename our files to index.html-serve, script.js-serve
, and styles.css-serve
. Let's create a new server file and name it pseudosafe_server.js
. Now all we have to do is make the -serve
suffix mandatory:
//requires section...
http.createServer(function (request, response) {
var lookup = url.parse(decodeURI(request.url)).pathname;
lookup = (lookup === "/") ? '/index.html-serve' : lookup + '-serve';
var f = 'content-pseudosafe' + lookup;
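Written as a pure function, the suffix logic is easy to experiment with. In this sketch, pseudosafePath is a hypothetical name for illustration and is not part of pseudosafe_server.js itself.

```javascript
// Sketch of the lookup logic from pseudosafe_server.js as a pure function
function pseudosafePath(pathname) {
  var lookup = (pathname === '/') ? '/index.html-serve' : pathname + '-serve';
  return 'content-pseudosafe' + lookup;
}

var home = pseudosafePath('/');
var style = pseudosafePath('/styles.css');
var attack = pseudosafePath('/../insecure_server.js');

console.log(home);   // 'content-pseudosafe/index.html-serve'
console.log(style);  // 'content-pseudosafe/styles.css-serve'
console.log(attack); // 'content-pseudosafe/../insecure_server.js-serve'
```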
For feedback purposes, we'll also include some 404
handling with the help of fs.exists
.
//requires, create server etc
fs.exists(f, function (exists) {
  if (!exists) {
    response.writeHead(404);
    response.end('Page Not Found!');
    return;
  }
  //read file etc
So let's start our pseudosafe_server.js
file and try out the same exploit:
curl -i localhost:8080/../insecure_server.js
We've used the -i
argument so that curl will output the headers. What's the result? A 404
, because the file it is actually looking for is ../insecure_server.js-serve
, which doesn't exist. So what's wrong with this method? Well, it's inconvenient and prone to error. More importantly, an attacker can still work around it!
curl localhost:8080/../insecure_server.js%00/index.html
And voila! There's our server code again. The solution to our problem is path.normalize
, which cleans up our pathname
before it gets to fs.readFile
.
http.createServer(function (request, response) {
var lookup = url.parse(decodeURI(request.url)).pathname;
lookup = path.normalize(lookup);
lookup = (lookup === "/") ? '/index.html' : lookup;
var f = 'content' + lookup;
Prior recipes haven't used path.normalize
, yet they're still relatively safe. path.basename
gives us the last part of the path, so any leading relative directory pointers (../) are discarded, thus preventing the directory traversal exploit.
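The effect of both calls on a traversal attempt can be checked directly. This is a minimal sketch using a made-up attack path; path.posix is used so the output does not depend on the platform.

```javascript
var path = require('path');

var attack = '/../../../etc/passwd'; // hypothetical request pathname

// normalize collapses '..' segments; an absolute path cannot climb above '/'
var normalized = path.posix.normalize(attack);
var viaNormalize = 'content' + normalized;

// basename keeps only the final segment, discarding every directory part
var viaBasename = 'content/' + path.posix.basename(attack);

// Caveat: a path that does not start with '/' keeps its leading '..'
var unclamped = path.posix.normalize('../etc/passwd');

console.log(viaNormalize); // 'content/etc/passwd'
console.log(viaBasename);  // 'content/passwd'
console.log(unclamped);    // '../etc/passwd'
```

Either way the resulting file path stays inside content. The last line is a reminder that the clamping relies on the pathname beginning with a slash, as request paths normally do.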
Here we have two filesystem exploitation techniques: the relative directory traversal and poison null byte attacks. These attacks can take different forms, such as in a POST request or from an external file. They can have different effects. For example, if we were writing to files instead of reading them, an attacker could potentially start making changes to our server. The key to security in all cases is to validate and clean any data that comes from the user. In insecure_server.js
, we pass whatever the user requests to our fs.readFile
method. This is foolish because it allows an attacker to take advantage of the relative path functionality in our operating system by using ../
, thus gaining access to areas that should be off limits. By adding the -serve
suffix, we didn't solve the problem. We put a plaster on it which can be circumvented by the poison null byte. The key to this attack is %00
, which is the URL-encoded form of the null byte. In this case, the null byte blinds Node to the ../insecure_server.js portion, but when the same null byte is sent through to our fs.readFile method, the path has to be handed to the kernel. The kernel treats the null byte as the end of the string, so it is blinded to the index.html part. So our code sees index.html but the read operation sees ../insecure_server.js. This is known as null byte poisoning. To protect ourselves, we could use a regular expression to remove the ../ parts of the path. We could also check for the null byte and respond with 400 Bad Request. However, we don't need to, because path.normalize
filters out the null byte and relative parts for us.
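For readers who prefer to make both checks explicit rather than rely on any one library call, the validation could be sketched as follows. This is an illustration under stated assumptions, not the recipe's method: resolveSafe is a hypothetical helper name, and it rejects suspicious requests (by returning null) instead of cleaning them.

```javascript
var path = require('path');

// Hypothetical helper: maps a request pathname to a file under `root`,
// or returns null if the request should be refused.
function resolveSafe(root, pathname) {
  // Refuse the poison null byte outright
  if (pathname.indexOf('\0') !== -1) { return null; }

  var absoluteRoot = path.resolve(root);
  // Prefixing '.' makes the pathname relative to the root directory
  var target = path.resolve(absoluteRoot, '.' + pathname);

  // Refuse anything that resolves outside of the root directory
  var insideRoot = target === absoluteRoot ||
    target.indexOf(absoluteRoot + path.sep) === 0;
  return insideRoot ? target : null;
}

var ok = resolveSafe('content', '/subcontent/styles.css');
var traversal = resolveSafe('content', '/../insecure_server.js');
var poisoned = resolveSafe('content', '/../insecure_server.js\0/index.html');

console.log(ok);        // absolute path ending in content/subcontent/styles.css
console.log(traversal); // null
console.log(poisoned);  // null
```

A null result could be answered with 400 Bad Request or 404, depending on how much we want to reveal. Current Node releases also refuse fs calls whose path contains a null byte, but an explicit check keeps the intent visible in our own code.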
Let's further delve into how we can protect our servers when it comes to serving static files.
If security was an extreme priority, we could adopt a strict whitelisting approach. In this approach, we would create a manual route for each file we are willing to deliver. Anything not on our whitelist would return 404
. We can place a whitelist
array above http.createServer
as shown in the following code:
var whitelist = [ '/index.html', '/subcontent/styles.css', '/subcontent/script.js' ];
Inside of our http.createServer
callback, we'll put an if
statement to check if the requested path is in the whitelist
array:
if (whitelist.indexOf(lookup) === -1) {
  response.writeHead(404);
  response.end('Page Not Found!');
  return;
}
That's it. We can test this by placing a file non-whitelisted.html
in our content
directory.
curl -i localhost:8080/non-whitelisted.html
The preceding command will return 404
because non-whitelisted.html
isn't on the whitelist.
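The whitelist check can also be tried without starting a server. In this small sketch, statusFor is a hypothetical helper that returns the status code the server would send for a given lookup.

```javascript
var whitelist = [
  '/index.html',
  '/subcontent/styles.css',
  '/subcontent/script.js'
];

// Hypothetical helper: 200 for whitelisted paths, 404 for everything else
function statusFor(lookup) {
  return (whitelist.indexOf(lookup) === -1) ? 404 : 200;
}

console.log(statusFor('/index.html'));            // 200
console.log(statusFor('/non-whitelisted.html'));  // 404
console.log(statusFor('/../insecure_server.js')); // 404
```

Because indexOf compares whole strings, a traversal attempt fails simply by not being an exact match for any entry.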
https://github.com/joyent/node/wiki/modules#wiki-web-frameworks-static has a list of static file server modules available for different purposes. It's a good idea to ensure that a project is mature and active before relying on it to serve your content. Node-static is a well-developed module with built-in caching. It's also compliant with RFC 2616, the HTTP specification that defines how files should be delivered over HTTP. Node-static implements all the essentials discussed in this chapter and more besides. This piece of code is slightly adapted from the node-static GitHub page at https://github.com/cloudhead/node-static:
var static = require('node-static');
var fileServer = new static.Server('./content');
require('http').createServer(function (request, response) {
  request.addListener('end', function () {
    fileServer.serve(request, response);
  }).resume(); //consume the request so that 'end' fires
}).listen(8080);
The preceding code will interface with the node-static
module to handle server-side and client-side caching, use streams to deliver content, and filter out relative requests and null bytes, among other things.
Preventing cross-site request forgery discussed in Chapter 7, Implementing Security, Encryption, and Authentication
Setting up an HTTPS web server discussed in Chapter 7, Implementing Security, Encryption, and Authentication
Deploying to a server environment discussed in Chapter 10, Taking It Live
Cryptographic password hashing discussed in Chapter 7, Implementing Security, Encryption, and Authentication