Chapter 5. Node.js and Regex
So far, we've had fun learning how to create regular expressions for different situations. However, you may be wondering what it would be like to apply a regular expression in a real-world situation, such as reading a log file and presenting its information in a more user-friendly format.
In this chapter, we will learn how to implement a simple Node.js application that reads a log file and parses it using a regular expression. This way, we can retrieve specific information from it and output it in a different format. We are going to test all the knowledge we obtained from the previous chapters of this book.
In this chapter we will cover the following topics:
Installing the required software to develop our example
Reading a file with Node.js
Analyzing the anatomy of an Apache log file
Creating a parser with regular expressions to read an Apache log file
Since we will be developing a Node.js application, the first step is to have Node.js installed. We can get it from http://nodejs.org/download/. Just follow the download instructions and we will have it set up on our computer.
Note
If this is your first time working with Node.js, please go through the tutorials at https://nodejs.org/.
To make sure we have Node.js installed, open the terminal application (Command Prompt, if you're using Windows) and type node -v. The installed Node.js version should be displayed.
We are now good to go!
Getting started with our application
Let's start developing our sample application with Node.js, which will read a log file and parse its information using a regular expression. We are going to create all the required code inside a JavaScript file, which we will name regex.js. Before we start coding, we will perform a simple test. Add a line to regex.js that prints Hello, World! to the console.
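A minimal sketch of such a test, assuming all it needs to do is print the greeting; the greeting variable is only illustrative:

```javascript
// regex.js: a first test to confirm that Node.js can run our file.
const greeting = 'Hello, World!';

// Print the greeting to the terminal.
console.log(greeting);
```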
Next, in the terminal application, execute the node regex.js command from the directory in which the file was created. The Hello, World! message should be displayed.
Our Hello, World! application with Node.js is created, and it works! We can now start coding our application.
Reading a file with Node.js
As the main goal of our application is to read a file, we need the file that the application is going to read! We will be using a sample Apache log file. There are many files on the Internet, but we will be using the log file that can be downloaded from http://fossies.org/linux/source...
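As a rough sketch of the reading step, the following uses the built-in fs module to load a file and split it into lines. The readLines helper is a hypothetical name, and the sketch assumes the downloaded log is saved as access.log next to regex.js. It uses the synchronous fs.readFileSync call for brevity; the asynchronous fs.readFile would work as well.

```javascript
const fs = require('fs');

// Hypothetical helper: read a text file and return its non-empty lines.
function readLines(filePath) {
  const content = fs.readFileSync(filePath, 'utf8');
  // Split on both Unix (\n) and Windows (\r\n) line endings.
  return content.split(/\r?\n/).filter((line) => line.length > 0);
}

// Assumption: the sample log is saved as access.log in the current directory.
const logPath = 'access.log';

if (fs.existsSync(logPath)) {
  const lines = readLines(logPath);
  console.log('Read ' + lines.length + ' lines from ' + logPath);
} else {
  console.log(logPath + ' not found; download the sample log file first.');
}
```

Reading the whole file at once is fine for a small sample; for very large logs, a streaming approach such as the readline module may be a better fit.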
The anatomy of an Apache log file
Before we create the regular expression that will match a line of the Apache file, we need to understand what kind of information it holds.
Let's take a look at a line from access.log:
The Apache access log that we are reading follows the %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" format. Let's take a look at each part:
%h: The first part of the log is the IP address (127.0.0.1)
%l: In the second part, the hyphen in the output indicates that the requested piece of information is not available
%u: The third part is the user ID of the person requesting the document (jan)
%t: The fourth part is the time at which the request was received, such as [30/Jun/2004:22:20:17 +0200]. It is in the [day/month/year:hour:minute:second...
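The four fields described so far can be matched by a pattern built one piece at a time. The following is only a sketch: the variable names are illustrative, and the sample string is assembled from the example values quoted above rather than copied from the log file. It covers only %h, %l, %u, and %t, not the remaining fields of the format.

```javascript
// Each piece matches one field described above; the names are illustrative.
const host = '(\\S+)';           // %h: client IP address
const logname = '(\\S+)';        // %l: "-" when the information is not available
const user = '(\\S+)';           // %u: user ID
const time = '\\[([^\\]]+)\\]';  // %t: timestamp between square brackets

// Join the pieces with single spaces and anchor the pattern at the line start.
// There is no end anchor because a real log line continues after the timestamp.
const prefixPattern = new RegExp('^' + [host, logname, user, time].join(' '));

// Assembled from the example values in the text, not a full log line.
const sample = '127.0.0.1 - jan [30/Jun/2004:22:20:17 +0200]';

const match = prefixPattern.exec(sample);
const entry = match && {
  host: match[1],
  logname: match[2],
  user: match[3],
  time: match[4]
};

console.log(entry);
```

Keeping each field in its own variable makes it easy to test or tighten one piece without touching the others.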
In this chapter, we learned how to create a simple Node.js application that reads an Apache log file and extracts the log information using a regular expression. We were able to put into practice the knowledge we acquired in the previous chapters of the book.
We also learned that when creating a very complex regular expression, it is best to build it in parts. We learned that we can be very specific while creating a regular expression, or we can be more generic and achieve the same results.
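To illustrate the trade-off between specific and generic patterns, the sketch below matches the IP address at the start of a line in two ways. Both pattern names are hypothetical, and the sample string is assembled from the example values used earlier in the chapter.

```javascript
// Generic: any run of non-space characters at the start of the line.
const genericHost = /^(\S+)/;

// Specific: four groups of one to three digits separated by dots,
// followed by whitespace.
const specificHost = /^(\d{1,3}(?:\.\d{1,3}){3})\s/;

const line = '127.0.0.1 - jan [30/Jun/2004:22:20:17 +0200]';

// Both patterns capture the same text for this line.
const genericResult = genericHost.exec(line)[1];
const specificResult = specificHost.exec(line)[1];

console.log(genericResult === specificResult);
```

The generic pattern would also accept a host name such as localhost, while the specific one would reject it, so the choice depends on how strictly the input needs to be validated.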
As a new version of ECMAScript is being created (ECMAScript 6, which will add lots of new features to JavaScript), it is good to familiarize yourself with the improvements related to regular expressions as well. For more information, please visit http://www.ecmascript.org/dev.php.
We hope you enjoy the book! Have fun creating regular expressions!