PhantomJS Cookbook

By Rob Friesel

About this book

Publication date: June 2014
Publisher: Packt
Pages: 304
ISBN: 9781783981922

Chapter 1. Getting Started with PhantomJS

In this chapter, we will cover the following recipes:

Installing PhantomJS
Launching the PhantomJS REPL
Running a PhantomJS script
Running a PhantomJS script with arguments
Running PhantomJS with cookies
Running PhantomJS with a disk cache
Running PhantomJS with a JSON configuration file
Debugging a PhantomJS script

Introduction

PhantomJS is the headless WebKit – a fully-fledged WebKit-based browser with absolutely no graphical user interface. Instead of a GUI, PhantomJS features a scripting API that allows us to do just about anything that we would do with a normal browser. Since its introduction in 2010, PhantomJS has grown to be an essential tool in the web development stack. It is ideal for fast unit test watches, end-to-end tests in continuous integration, screen captures, screen scraping, performance data collection, and more.

The recipes in this chapter focus on PhantomJS fundamentals. We will discuss how to install PhantomJS, how to work with its Read-Evaluate-Print Loop (REPL), how to employ its command-line options, and how to launch PhantomJS in a debug harness.

Installing PhantomJS

Let's begin the PhantomJS Cookbook with the recipe that is the prerequisite for all of the other recipes—downloading and installing PhantomJS so that it is available on our computers.

Prebuilt binaries of PhantomJS are available for most major platforms, and in the interest of expedience and simplicity, that is how we proceed. PhantomJS is designed to be a stand-alone application, and in most situations, no external dependencies are required.

Getting ready

To install PhantomJS, we will need access to the Internet and permission to install applications.

How to do it…

Perform the following steps to download and install PhantomJS:

Navigate to the PhantomJS download page at http://phantomjs.org/download.
Locate and download the prebuilt binary that is appropriate for our system. Prebuilt binaries exist for the following operating systems:
- Windows (XP or later).
- Mac OS X (10.6 or later).
- Linux (for 32-bit or 64-bit systems). Current binaries are built on CentOS 5.8, and should run successfully on Ubuntu 10.04.4 (Lucid Lynx) or more modern systems.
Extract the prebuilt binary. For Windows and OS X systems, this will be a .zip archive; for Linux systems, this will be a .tar.bz2 archive. For Windows machines, the binary should be phantomjs.exe; for OS X and Linux machines, the binary should be bin/phantomjs.
Note
We should place the binary somewhere on our system that makes sense to us.
Once extracted, make sure to add PhantomJS to the system's PATH.
Tip
The PATH or search path is a variable on the command line that contains a list of directories searched by the shell to find an executable file when it is called. On POSIX-compatible systems (Linux and OS X), this list is delimited by colons (:), and on Windows, it is delimited by semicolons (;). For more information about the PATH variable, visit http://en.wikipedia.org/wiki/PATH_(variable).
For a tutorial that focuses on POSIX-compatible systems, visit http://quickleft.com/blog/command-line-tutorials-path.
For documentation on the Windows PATH, visit http://msdn.microsoft.com/en-us/library/w0yaz275(v=vs.80).aspx.
After placing the PhantomJS binary on our PATH, we can verify that it was installed by typing the following in the command line:
```
phantomjs -v
```

The version of PhantomJS that we just installed should print out to the console.
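
As a rough illustration of the delimiter difference described in the PATH tip above, the following sketch splits a search path into its directories. The helper name `splitSearchPath` is hypothetical, and the guarded portion assumes that the `system` module exposes `env` and `os.name` as it does in PhantomJS 1.9.

```javascript
// Split a PATH-style string into its directories.
// osName is expected to look like system.os.name ('windows', 'mac', 'linux').
function splitSearchPath(pathValue, osName) {
  var delimiter = (osName === 'windows') ? ';' : ':';
  return pathValue.split(delimiter).filter(function (dir) {
    return dir.length > 0; // ignore empty entries such as a trailing delimiter
  });
}

// Only runs inside PhantomJS; other JavaScript runtimes skip this part.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  // Assumption: the variable is named PATH (or Path on some Windows setups).
  var searchPath = system.env.PATH || system.env.Path || '';
  splitSearchPath(searchPath, system.os.name).forEach(function (dir) {
    console.log(dir);
  });
  phantom.exit();
}
```

If the directory that holds the PhantomJS binary does not appear in the printed list, the shell will not be able to find it by name.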

Tip

If we have trouble here, we should check out the troubleshooting guide on the PhantomJS project site at http://phantomjs.org/troubleshooting.html.

How it works…

In an effort to lower the barrier to entry and help drive adoption, the prebuilt binaries of PhantomJS are made available by community volunteers. This is, in part, an acknowledgment that building PhantomJS from the source code can be a complex and time-consuming task. To quote the build page on the PhantomJS site: "With 4 parallel compile jobs on a modern machine, the entire process takes roughly 30 minutes." It is easy to imagine that this might scare off many developers who just want to try it out.

These prebuilt binaries should therefore make it easy to drop PhantomJS onto any system and have it running in minutes. These binaries are intended to be fully independent applications, with no external library dependencies such as Qt or WebKit. On some Linux systems, however, a little extra work may be required to ensure that the libraries necessary for proper font rendering (for example, FreeType and Fontconfig) are in place, along with the basic font files.

Note

Throughout this book, our code will assume that we are using Version 1.9 or higher of PhantomJS.
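
If we want a script to confirm that assumption for itself, one possible sketch is shown here. It relies on the `phantom.version` object (with `major`, `minor`, and `patch` properties); the helper names `meetsMinimum` and `formatVersion` are hypothetical and exist only for this illustration.

```javascript
// Return true when version (shaped like phantom.version) is at least major.minor.
function meetsMinimum(version, major, minor) {
  if (version.major !== major) {
    return version.major > major;
  }
  return version.minor >= minor;
}

// Render a version object as a dotted string.
function formatVersion(version) {
  return version.major + '.' + version.minor + '.' + version.patch;
}

// Only runs inside PhantomJS; other JavaScript runtimes skip this part.
if (typeof phantom !== 'undefined') {
  var ok = meetsMinimum(phantom.version, 1, 9);
  console.log('PhantomJS ' + formatVersion(phantom.version) +
    (ok ? ' meets' : ' does not meet') + ' the 1.9 minimum.');
  phantom.exit(ok ? 0 : 1);
}
```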

There's more…

In addition to the prebuilt binaries, Mac OS X users may also install PhantomJS using Homebrew. To do this, enter the following at the command line:

```
brew update && brew install phantomjs
```

Note that installing PhantomJS with Homebrew also means that we will be compiling it from source.

Tip

Homebrew is an open source, community-run package manager for OS X built on top of Git and Ruby. To find out more information about Homebrew, check out its website at http://brew.sh.

As a bonus, Homebrew also automatically adds PhantomJS to your PATH.

Installing from Source

In the event that one of the prebuilt binaries is not suitable for your specific situation, you may need to consider building PhantomJS from the source code. If this is the case, you will want to check out the build instructions that are listed at http://phantomjs.org/build.html; note that you will need the developer tools specific to your system (for example, Xcode on OS X and Microsoft Visual C++ on Windows) to be installed before you begin.

Launching the PhantomJS REPL

In this recipe, we will learn how to use the PhantomJS REPL. The PhantomJS REPL is an excellent tool for getting familiar with the runtime environment and for quickly hacking out an idea without needing to write a fully qualified script.

Getting ready

To run this recipe, we will need to have PhantomJS installed on our PATH. We will also need an open terminal window.

How to do it…

Perform the following steps to invoke and work in the PhantomJS REPL:

At the command-line prompt, type the following:
```
phantomjs
```
When the PhantomJS REPL starts up, we should see its default command-line prompt:
```
phantomjs>
```
At the PhantomJS prompt, we can enter any command from the PhantomJS API or any other valid JavaScript expressions and statements. The REPL will print the return value from the expression we entered, although we may need to wrap the expression in a console.log statement for a readable response, for example:
```
phantomjs> 1 + 1
{}
phantomjs> console.log(1 + 1)
2
undefined
phantomjs> for (var prop in window) console.log(prop)
document
window
// 475 more...
undefined
```
When we are finished in the REPL, type the following command to exit the REPL:
```
phantom.exit()
```

How it works…

The PhantomJS REPL, also called interactive mode, was introduced to PhantomJS starting with Version 1.5. The REPL is the default mode for PhantomJS when the application is invoked without any arguments.

REPL stands for Read-Evaluate-Print Loop. The commands we enter at the prompt are read by the interpreter, which evaluates them and prints the results, before finally looping back to the prompt for us to continue. Many programming environments feature REPLs (for example, Node.js provides another popular JavaScript REPL), and the debugger consoles in tools such as the Chrome Developer Tools and Firebug would also qualify as REPLs. REPLs are useful for quickly trying out ideas in the runtime environment.

In our example, we enter the PhantomJS REPL by invoking phantomjs from the command line without any arguments. Once we are in the REPL, we can type in whatever commands we need to explore our ideas, hitting Enter after each command. Note that we must enter a full and syntactically valid expression or statement before hitting Enter; if we do not, PhantomJS will report an error (for example, Can't find variable: foo or Parse error).

The PhantomJS REPL also features auto-completion. Hitting the Tab key in the REPL will autoexpand our options. We can even hit Tab multiple times to cycle through our available options; for example, try typing p and then hit Tab to see what options the REPL presents.

Finally, when we are finished, we use phantom.exit() to leave the REPL; we can also use the Ctrl + C or Ctrl + D key commands to exit the REPL.

Running a PhantomJS script

This recipe demonstrates how to run a script using the PhantomJS runtime.

Getting ready

To run this recipe, we will need PhantomJS installed on our PATH. We will also need a script to run with PhantomJS; the script in this recipe is available in the downloadable code repository as recipe03.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. Alternatively, you can use the Git version control system to clone the repository. The repository is hosted on GitHub at https://github.com/founddrama/phantomjs-cookbook.

How to do it…

Given the following script:

```
console.log('A console statement from PhantomJS on ' +
  new Date().toDateString() + '!');

phantom.exit();
```

Type the following at the command line:

```
phantomjs chapter01/recipe03.js
```

Tip

Throughout this book, we will be using POSIX-compatible filesystem paths for command-line examples. Windows users may find it helpful to change the forward slashes (/) to backslashes (\) in filesystem paths.

How it works…

Our preceding example script performs the following actions:

We print a message to the console (including a date string) using console.log.
The script exits the PhantomJS runtime using phantom.exit.
Since we did not provide an integer argument to phantom.exit, it returns an exit code of 0 (its default) to the shell.
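
To illustrate the role of the integer argument, here is a minimal sketch in which a script reports failure to the shell with a non-zero exit code. The helper `runTask` and the task it runs are hypothetical; only `phantom.exit` comes from the PhantomJS API.

```javascript
// Run a task and translate its outcome into a shell exit code:
// 0 when the task completes, 1 when it throws.
function runTask(task) {
  try {
    task();
    return 0;
  } catch (e) {
    console.error('Task failed: ' + e.message);
    return 1;
  }
}

var exitCode = runTask(function () {
  console.log('A console statement from PhantomJS on ' +
    new Date().toDateString() + '!');
});

// Only runs inside PhantomJS; other JavaScript runtimes skip this part.
if (typeof phantom !== 'undefined') {
  phantom.exit(exitCode);
}
```

On a POSIX shell, we could then inspect the result with `echo $?` immediately after the script finishes.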

As we learned in the Launching the PhantomJS REPL recipe, PhantomJS will enter the REPL when invoked without any arguments. However, the runtime environment will attempt to evaluate and execute the first unrecognized argument as though it were a JavaScript file, regardless of whether or not it ends in .js. Most of the time that we work with PhantomJS, we will interact with it using scripts such as these.

As long as PhantomJS can resolve the first unrecognized argument as a file and correctly parse its contents as syntactically valid JavaScript, it will attempt to execute the contents. However, what happens if those preconditions are not met?

If the argument cannot be resolved as a file on disk, or if the file has no contents, PhantomJS will print an error message to the console, for example:

```
phantomjs does-not-exist-or-empty
Can't open 'does-not-exist-or-empty'
```

If the argument exists but the file's contents cannot be parsed as valid JavaScript, then PhantomJS will print an error message to the console and hang, for example:

```
phantomjs invalid.js
SyntaxError: Parse error
```

Note

In the event of such a SyntaxError, the PhantomJS process will not automatically terminate, and we must forcefully quit it (Ctrl + C).

Recall that PhantomJS is a headless web browser, and it helps to think of it as a version of Chrome or Safari that has no window. Just as we interact with our normal web browser by entering URLs into the location bar, clicking the back button, or clicking links on the page, so we will need to interact with PhantomJS. However, as it has no window and no UI components, we must interact with it through its programmable API. The PhantomJS API is written in JavaScript, and scripts targeting the PhantomJS runtime are also written in JavaScript; the API is documented online at http://phantomjs.org/api/.

There's more…

If you have been exposed to both PhantomJS and Node.js, you may be wondering about the differences between them, especially after witnessing demonstrations of their respective REPLs and script running abilities. When comparing the two, it is helpful to consider them using the phrase "based on" as your frame of reference. Node.js is based on Google Chrome's V8 JavaScript engine; PhantomJS is based on the WebKit layout engine. Node.js is a JavaScript runtime; PhantomJS has a JavaScript runtime. Where Node.js is an excellent platform for building JavaScript-based server applications, it does not have any native HTML rendering. This is the key differentiator when comparing it to PhantomJS. The mission of PhantomJS is not to provide a platform for building JavaScript applications, but instead to provide a fast and standards-compliant headless browser.

Running a PhantomJS script with arguments

In this recipe, we will learn how to run a PhantomJS script with additional arguments that are passed into the script for evaluation. Note that these are arguments passed into the execution context and are not command-line arguments for the PhantomJS runtime itself.

Getting ready

To run this recipe, we will need a script to run with PhantomJS; the script in this recipe is available in the downloadable code repository as recipe04.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code. Lastly, we will need the arguments we wish to pass into the script.

How to do it…

Given the following script:

```
if (phantom.args.length === 0) {
  console.log('No arguments were passed in.');
} else {
  phantom.args.forEach(function(arg, index) {
    console.log('[' + index + '] ' + arg);
  });
}

phantom.exit();
```

Enter the following command at the command line:

```
phantomjs chapter01/recipe04.js foo bar "boo baa"
```

We will see the following results printed in the terminal:

```
[0] foo
[1] bar
[2] boo baa
```

How it works…

Our preceding example script performs the following actions:

It checks the length of the phantom.args array and prints a message if that array is empty.
If the phantom.args array is not empty, we iterate over the items in the array, printing their index followed by the value of the argument itself.
Lastly, we exit from the PhantomJS runtime using phantom.exit.

As we discussed in the Running a PhantomJS script recipe, PhantomJS will attempt to evaluate and execute the first unrecognized argument as though it were a valid JavaScript file. But what does PhantomJS do with all of the arguments after that?

The answer is that they are collected into the phantom.args array as string values. Each argument after the script name goes into this array. Note that phantom.args does not include the script name itself. Instead, PhantomJS records that in the read-only phantom.scriptName property.

There's more…

It is worth noting that phantom.args and phantom.scriptName are both marked as deprecated in the API documentation. As such, usage of these properties is discouraged. Although using them for quick one-off or exploratory scripts is fine, neither of these properties should go into any library that we intend to maintain or distribute.

Wherever possible, we should use the system.args array (from the system module) instead of phantom.args and phantom.scriptName.
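
As a minimal sketch, the script from this recipe could be rewritten against system.args along the following lines. Unlike phantom.args, the system.args array holds the script name as its first element, so we skip it before printing. The helper name `describeArgs` is hypothetical.

```javascript
// Build the output lines for an argument list shaped like system.args,
// where element 0 is the script name and the rest are user arguments.
function describeArgs(args) {
  var userArgs = Array.prototype.slice.call(args, 1);
  if (userArgs.length === 0) {
    return ['No arguments were passed in.'];
  }
  return userArgs.map(function (arg, index) {
    return '[' + index + '] ' + arg;
  });
}

// Only runs inside PhantomJS; other JavaScript runtimes skip this part.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  describeArgs(system.args).forEach(function (line) {
    console.log(line);
  });
  phantom.exit();
}
```

Run with the same arguments as before (foo bar "boo baa"), this sketch should print the same three indexed lines as the phantom.args version.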

Tip

When in doubt, check the PhantomJS project website and its documentation at http://phantomjs.org/api/. It is actively maintained, and as such will contain up-to-date information about the preferred APIs.

Running PhantomJS with cookies

In this recipe, we will learn how to use the cookies-file command-line switch to specify the location of the file for persistent cookies in PhantomJS.

Getting ready

To run this recipe, we will need a script to run with PhantomJS that accesses a site where cookies are read or written. We will need a filesystem path to specify it as the command-line argument, making sure that we have write permissions to that path.

The script in this recipe is available in the downloadable code repository as recipe05.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.

Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change to the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:

```
node app.js
```

Note

Node.js is a JavaScript runtime environment based on Chrome's V8 engine. It has an event-driven programming model and non-blocking I/O and can be used for building fast networking applications, shell scripts, and everything in between. We can learn more about Node.js including how to install it at http://nodejs.org/.

We will use this demo for many recipes throughout this cookbook. When we run the demo app for the first time, we need to download and install the Node.js modules that it depends on. To do this, we can change to the phantomjs-sandbox directory and run the following command:

```
npm install
```

How to do it…

Given the following script:

```
var webpage = require('webpage').create();

webpage.open('http://localhost:3000/cookie-demo', function(status) {
  if (status === 'success') {
    phantom.cookies.forEach(function(cookie, i) {
      for (var key in cookie) {
        console.log('[cookie:' + i + '] ' + key + ' = ' +
          cookie[key]);
      }
    });

    phantom.exit();
  } else {
    console.error('Could not open the page! (Is it running?)');
    phantom.exit(1);
  }
});
```

Enter the following command at the command line:

```
phantomjs --cookies-file=cookie-jar.txt chapter01/recipe05.js
```

Note

PhantomJS will create the cookie-jar.txt file for us; there is no need to create it manually.

The script will print out the properties for each cookie in the response, as follows:

```
[cookie:0] domain = localhost
[cookie:0] expires = Sat, 07 Dec 2013 02:05:06 GMT
[cookie:0] expiry = 1386381906
[cookie:0] httponly = false
[cookie:0] name = dave
[cookie:0] path = /cookie-demo
[cookie:0] secure = false
[cookie:0] value = oatmeal-raisin
[cookie:1] domain = localhost
[cookie:1] expires = Sat, 07 Dec 2013 02:04:22 GMT
[cookie:1] expiry = 1386381862
[cookie:1] httponly = false
[cookie:1] name = rob
[cookie:1] path = /cookie-demo
[cookie:1] secure = false
[cookie:1] value = chocolate-chip
```

We can then open cookie-jar.txt in a text editor and examine its contents. The cookie jar file should look something like the following:

```
[General]
cookies="@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\x2\0\0\0_dave=oatmeal-raisin; expires= Sat, 07 Dec 2013 02:05:06 GMT; domain=localhost; path=/cookie-demo\0\0\0^rob=chocolate-chip; expires= Sat, 07 Dec 2013 02:04:22 GMT; domain=localhost; path=/cookie-demo)"
```

How it works…

Our preceding example script performs the following actions:

It creates a webpage object and opens the target URL (http://localhost:3000/cookie-demo).
In the callback function, we check for a status of 'success'; if that condition fails, we print an error message and exit PhantomJS.
Tip
Throughout this cookbook, we will use exit codes of 0 and 1 for success and failure respectively, because those are the exit codes traditionally used for those outcomes on POSIX and Windows systems.
If we successfully open the URL, then we loop through each cookie in the phantom.cookies collection and print out information about each one.
Lastly, we exit from the PhantomJS runtime using phantom.exit.

When we start PhantomJS with the cookies-file argument, we are telling the runtime to read and write cookies from a specific location on the filesystem. What this allows us to do is to use cookies in PhantomJS like we would with any other browser. In other words, an HTTP response or client-side script can set cookies, and when we run our PhantomJS script against that URL again, we can trust that the cookies are still there in the file.

Notice that the cookie jar file itself is essentially a plain text file. The actual file extension does not matter; we used .txt in our example, but it could just as easily be .cookies or even no extension at all. When persisting the cookies, PhantomJS writes them to this file. If we examine the file, then we see that it is a serialized, text-based version of the QNetworkCookie class that PhantomJS uses behind the scenes. Although the on-disk version is not necessarily easy to read, we can easily make a copy and parse it or transform it into its constituent cookies. This can be useful for examining their contents after a script has completed (for example, to ensure that the expected values are being written to disk).

Additionally, with the cookies written to disk, they are available for future PhantomJS script runs against URLs that expect the same cookies. For example, this can be useful when running scripts against sites that require authentication where those authentication tokens are passed around as cookies.
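
For instance, a later script run could check that an expected cookie is present before doing any further work. The following is only a sketch under that assumption: the helper `findCookie` is hypothetical, and the cookie name rob is taken from the sample output shown earlier in this recipe.

```javascript
// Look up a cookie by name in a list shaped like phantom.cookies.
// Returns the matching cookie object, or null when there is none.
function findCookie(cookies, name) {
  for (var i = 0; i < cookies.length; i += 1) {
    if (cookies[i].name === name) {
      return cookies[i];
    }
  }
  return null;
}

// Only runs inside PhantomJS; other JavaScript runtimes skip this part.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://localhost:3000/cookie-demo', function (status) {
    if (status !== 'success') {
      console.error('Could not open the page! (Is it running?)');
      phantom.exit(1);
      return;
    }
    var cookie = findCookie(phantom.cookies, 'rob');
    console.log(cookie ? 'rob = ' + cookie.value : 'No cookie named rob.');
    phantom.exit(cookie ? 0 : 1);
  });
}
```

As with the recipe's script, we would launch this with the same cookies-file argument so that PhantomJS reads from the existing cookie jar.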

Running PhantomJS with a disk cache

In this recipe, we will learn about running PhantomJS with an on-disk cache that is enabled using the disk-cache and max-disk-cache-size command-line arguments. We can use this to test how browsers cache our static assets.

Getting ready

To run this recipe, we will need a script to run with PhantomJS that accesses a website with cacheable assets. Optionally, we will also need a sense of how large we wish to set the on-disk cache (in kilobytes).

The script in this recipe is available in the downloadable code repository as recipe06.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.

Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change into the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:

```
node app.js
```

How to do it…

Given the following script:

```
var page  = require('webpage').create(),
    count = 0,
    until = 2;

page.onResourceReceived = function(res) {
  if (res.stage === 'end') {
    console.log(JSON.stringify(res, undefined, 2));
  }
};

page.onLoadStarted = function() {
  count += 1;
  console.log('Run ' + count + ' of ' + until + '.');
};

page.onLoadFinished = function(status) {
  if (status === 'success') {
    if (count < until) {
      console.log('Go again.\n');
      page.reload();
    } else {
      console.log('All done.');
      phantom.exit();
    }
  } else {
    console.error('Could not open page! (Is it running?)');
    phantom.exit(1);
  }
};

page.open('http://localhost:3000/cache-demo');
```

Enter the following command at the command line:

```
phantomjs --disk-cache=true --max-disk-cache-size=4000 chapter01/recipe06.js
```

The script will print out details about each resource in the response as JSON.

How it works…

Our preceding example script performs the following actions:

It creates a webpage object and sets two variables, count and until.
We assign an event handler function to the webpage object's onResourceReceived callback. This callback will print out every property of each resource received.
We assign an event handler function to the webpage object's onLoadStarted callback. This callback will increment count when the page load starts and print a message indicating which run it is.
We assign an event handler function to the webpage object's onLoadFinished callback. This callback checks the status of the response and takes action accordingly as follows:
- If status is not 'success', then we print an error message and exit from PhantomJS
- If the callback's status is 'success', then we check to see if count is less than until, and if it is, then we call reload on the webpage object; otherwise, we exit PhantomJS
Finally, we open the target URL (http://localhost:3000/cache-demo) using webpage.open.
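The onResourceReceived handler described above prints every property of every resource, which can be a lot to read. As a minimal sketch, assuming we only care about the status code and URL, a hypothetical helper named summarizeResource (not part of the recipe's sample code) could condense each resource into one line. The bodySize property may not be present on every callback, so the sketch treats it as optional.

```javascript
// Hypothetical helper (not from the book's sample code): condense one
// resource object, as passed to onResourceReceived, into a single line.
function summarizeResource(res) {
  var size = (typeof res.bodySize === 'number')
    ? res.bodySize + ' bytes'
    : 'size unknown';
  return res.status + ' ' + res.url + ' (' + size + ')';
}

// Possible use inside the recipe's handler (PhantomJS only):
// page.onResourceReceived = function(res) {
//   if (res.stage === 'end') {
//     console.log(summarizeResource(res));
//   }
// };

// Stand-alone demonstration with a hand-written resource object.
console.log(summarizeResource({
  stage: 'end',
  status: 200,
  url: 'http://localhost:3000/cache-demo',
  bodySize: 573
}));
```

The helper uses only ES5 syntax so that it can be pasted into a PhantomJS script unchanged.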

There's more…

Even though the disk cache is off by default, PhantomJS still performs some in-memory caching. This detail becomes important in later explorations, as it produces some otherwise difficult to explain results. For example, in our preceding sample script, we used webpage.reload for our second request of the URL, and in that second request, we saw all of the images re-requested. However, if we had used a second call to webpage.open (instead of webpage.reload), then the onResourceReceived callback would have shown a second request to the URL but none of the images would have been re-requested. (As an interesting aside, we would also see that behavior if we set the disk-cache argument to false; the in-memory cache cannot be disabled.)

Another interesting observation is that PhantomJS always reports an HTTP response status of 200 OK for every successfully retrieved asset. When our sample script runs, PhantomJS reports a status code of 200 for each of the images during both the first and second request/response cycles. If we look at the Node.js console output for the demo app while the script runs, however, we can see the discrepancy; the output looks something like this:

GET /cache-demo 200 1ms - 573b
GET /images/583519989_1116956980_b.jpg 200 4ms - 264.64kb
GET /images/152824439_ffcc1b2aa4_b.jpg 200 8ms - 615.21kb
GET /images/357292530_f225d7e306_b.jpg 200 6ms - 497.98kb
GET /images/391560246_f2ac936f6d_b.jpg 200 5ms - 446.68kb
GET /images/872027465_2519a358b9_b.jpg 200 5ms - 766.94kb
GET /cache-demo 200 1ms - 573b
GET /images/152824439_ffcc1b2aa4_b.jpg 304 3ms
GET /images/357292530_f225d7e306_b.jpg 304 3ms
GET /images/391560246_f2ac936f6d_b.jpg 304 2ms
GET /images/583519989_1116956980_b.jpg 304 3ms
GET /images/872027465_2519a358b9_b.jpg 304 3ms

We can see that the server responds with 304 Not Modified for each of the image assets. This is exactly what we would expect for a second request to the same URL when the assets are served with Cache-Control headers that specify a max-age, and for assets that are also cached to disk.
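To make the discrepancy easier to spot, we could tally the status codes in the server's log. The following is a rough sketch, assuming log lines shaped like the demo app's output shown above; tallyStatuses is a hypothetical helper, and the sample lines are copied from the second request/response cycle.

```javascript
// Hypothetical helper: count how often each HTTP status code appears in
// log lines shaped like "GET /path 200 4ms - 264.64kb".
function tallyStatuses(lines) {
  var counts = {};
  lines.forEach(function(line) {
    // method, path, then a three-digit status code
    var match = /^[A-Z]+\s+\S+\s+(\d{3})\b/.exec(line);
    if (match) {
      counts[match[1]] = (counts[match[1]] || 0) + 1;
    }
  });
  return counts;
}

// Lines from the second cycle in the demo app's output.
var secondCycle = [
  'GET /cache-demo 200 1ms - 573b',
  'GET /images/152824439_ffcc1b2aa4_b.jpg 304 3ms',
  'GET /images/357292530_f225d7e306_b.jpg 304 3ms',
  'GET /images/391560246_f2ac936f6d_b.jpg 304 3ms',
  'GET /images/583519989_1116956980_b.jpg 304 3ms',
  'GET /images/872027465_2519a358b9_b.jpg 304 3ms'
];

console.log(JSON.stringify(tallyStatuses(secondCycle)));
// prints {"200":1,"304":5}
```

Comparing this tally with the statuses that PhantomJS reports for the same cycle shows the five image requests where the two disagree.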

disk-cache

We can enable the disk cache by setting the disk-cache argument to true or yes. By default, the disk cache is disabled, but we can also explicitly disable it by providing false or no to the command-line argument. When the disk cache is enabled, PhantomJS will cache assets to the on-disk cache, which it stores at the desktop services cache storage location. Caching these assets has the potential to speed up future script runs against URLs that share those assets.

max-disk-cache-size

Optionally, we may also wish to limit the size of the disk cache (for example, to simulate the small caches on some mobile devices). To limit the size of the disk cache, we use the max-disk-cache-size command-line argument and provide an integer that determines the size of the cache in kilobytes. By default (if we do not use the max-disk-cache-size argument), the cache size is unbounded. Most of the time, we will not need to use the max-disk-cache-size argument.

Cache locations

If we need to inspect the cached data that is persisted to disk, PhantomJS writes to the desktop services cache storage location for the platform it's running on. These locations are listed as follows:

Platform	Location
Windows	`%AppData%/Local/Ofi Labs/PhantomJS/cache/http`
Mac OS X	`~/Library/Caches/Ofi Labs/PhantomJS/data7`
Linux	`~/.qws/cache/Ofi Labs/PhantomJS`

Note

These locations may not exist until after we have run PhantomJS with the disk-cache argument enabled.
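If we want a script to report where its cache should live, one option is a small lookup keyed by platform name. The following is only a sketch: cacheLocationFor is a hypothetical helper, the paths are copied from the preceding table, and the keys assume the platform names that PhantomJS's system module reports in system.os.name ('windows', 'mac', and 'linux').

```javascript
// Paths copied from the table of cache locations; the keys are assumed
// to match the values of require('system').os.name in PhantomJS.
var CACHE_LOCATIONS = {
  windows: '%AppData%/Local/Ofi Labs/PhantomJS/cache/http',
  mac: '~/Library/Caches/Ofi Labs/PhantomJS/data7',
  linux: '~/.qws/cache/Ofi Labs/PhantomJS'
};

// Hypothetical helper: return the documented cache location for a
// platform name, or null if the platform is not in the table.
function cacheLocationFor(osName) {
  return Object.prototype.hasOwnProperty.call(CACHE_LOCATIONS, osName)
    ? CACHE_LOCATIONS[osName]
    : null;
}

// In a PhantomJS script we might call:
//   cacheLocationFor(require('system').os.name)
console.log(cacheLocationFor('linux'));
```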

Running PhantomJS with a JSON configuration file

In this recipe, we will learn how to store PhantomJS configuration options in a JSON document and load those options using the config command-line argument.

Getting ready

To run this recipe, we will need a JSON-formatted configuration file with our PhantomJS command-line options.

The script in this recipe is available in the downloadable code repository as recipe07.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code. An example configuration file is also in this directory as recipe07-config.json.

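The script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change to the phantomjs-sandbox directory and start the app with the following command:

node app.js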

How to do it…

Select our command-line configuration options (changing hyphenated property names into their camel-cased equivalents) and apply our values. Save these configuration settings to a JSON-formatted document. For example, the contents of recipe07-config.json under chapter01 are as follows:

{
  "cookiesFile"     : "cookie-jar.txt",
  "ignoreSslErrors" : true
}

Tip

For more information about JSON, including its formatting rules, visit http://www.json.org.

Given the script from the Running PhantomJS with cookies recipe earlier in this chapter, enter the following at the command line:

phantomjs --config=chapter01/recipe07-config.json chapter01/recipe07.js

How it works…

The configuration file is a JSON document where we can take our preferred command-line arguments and store them on disk. The keys in the JSON object have a one-to-one correspondence with the command-line arguments themselves – the hyphenated command-line argument names are converted to their camel-cased versions (for example, cookies-file becomes cookiesFile). The values in the JSON object follow easy conversion rules based on the most applicable JavaScript primitives: strings are strings, numbers are numbers, and true/false or yes/no become the corresponding true or false Boolean literals. Creating our own JSON-formatted configuration file requires only two things: a text editor and the knowledge of which command-line arguments we wish to capture in it.
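The conversion rules described above are mechanical enough to sketch in a few lines. The helpers below (toConfigKey, toConfigValue, and argsToConfig) are hypothetical names used for illustration; they mirror the rules as stated in this recipe and are not part of PhantomJS itself.

```javascript
// Convert a command-line argument name to its JSON key,
// for example "--cookies-file" to "cookiesFile".
function toConfigKey(argName) {
  return argName.replace(/^--/, '').replace(/-([a-z])/g, function(match, letter) {
    return letter.toUpperCase();
  });
}

// Convert a raw command-line value to the closest JavaScript primitive.
function toConfigValue(raw) {
  if (raw === 'true' || raw === 'yes') { return true; }
  if (raw === 'false' || raw === 'no') { return false; }
  if (/^-?\d+$/.test(raw)) { return parseInt(raw, 10); }
  return raw;
}

// Build a configuration object from "--name=value" arguments;
// anything else (such as the script path) is skipped.
function argsToConfig(args) {
  var config = {};
  args.forEach(function(arg) {
    var eq = arg.indexOf('=');
    if (arg.indexOf('--') === 0 && eq !== -1) {
      config[toConfigKey(arg.slice(0, eq))] = toConfigValue(arg.slice(eq + 1));
    }
  });
  return config;
}

console.log(JSON.stringify(argsToConfig([
  '--cookies-file=cookie-jar.txt',
  '--ignore-ssl-errors=yes'
]), undefined, 2));
```

The printed object has the same keys and values as recipe07-config.json, which makes a helper like this a convenient way to draft a configuration file from a command line we already use.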

Tip

See http://phantomjs.org/api/command-line.html for the complete list of documented command-line options in the PhantomJS API.

Note

The help and version command-line arguments do not have corresponding versions in the JSON configuration file. Also, at the time of writing this book, there is a documented defect wherein the JSON key for the load-images argument is not recognized.

The example script in this recipe (recipe07.js under chapter01) is identical to the one that we used for our demonstration in the Running PhantomJS with cookies recipe; we are reusing it here for convenience. For a more thorough explanation of what it is doing, see the How it works… section under that recipe.

When launching PhantomJS with the config command-line argument, the PhantomJS runtime interprets the argument's value as a path on the filesystem and attempts to load and evaluate that file as a JSON document. If the file cannot be parsed as a JSON document, then PhantomJS prints a warning and ignores it. If the file is correctly parsed, then PhantomJS configures itself as if the arguments in the JSON document had been passed as normal command-line arguments.

This raises an interesting question: given equivalent arguments, which one takes precedence? The one specified in the JSON configuration file? Or the one specified on the command line? The answer is that it depends on which one comes last. In other words, given recipe07-config.json, we can run:

phantomjs --cookies-file=jar-of-cookies.txt --config=chapter01/recipe07-config.json chapter01/recipe07.js

The preceding command creates cookie-jar.txt, as specified in recipe07-config.json, whereas the following command creates jar-of-cookies.txt, as specified on the command line:

phantomjs --config=chapter01/recipe07-config.json --cookies-file=jar-of-cookies.txt chapter01/recipe07.js
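One way to picture this "last one wins" behavior is as a left-to-right merge. The sketch below models only the behavior observed in the two commands above; it is not PhantomJS's actual implementation. The resolveOptions function and the configFiles map (a stand-in for reading and parsing files from disk) are assumptions made for illustration.

```javascript
// Convert a hyphenated argument name to its camel-cased JSON key.
function toConfigKey(name) {
  return name.replace(/-([a-z])/g, function(match, letter) {
    return letter.toUpperCase();
  });
}

// Apply arguments left to right. A --config argument contributes every
// key from its file at the position where it appears, so whichever
// source comes later overwrites the earlier one.
function resolveOptions(args, configFiles) {
  var options = {};
  args.forEach(function(arg) {
    var eq = arg.indexOf('=');
    if (arg.indexOf('--') !== 0 || eq === -1) {
      return; // not a --name=value argument (for example, the script path)
    }
    var name = arg.slice(2, eq);
    var value = arg.slice(eq + 1);
    if (name === 'config') {
      var fromFile = configFiles[value] || {};
      Object.keys(fromFile).forEach(function(key) {
        options[key] = fromFile[key];
      });
    } else {
      options[toConfigKey(name)] = value;
    }
  });
  return options;
}

// Hypothetical stand-in for the parsed contents of the config file.
var files = {
  'chapter01/recipe07-config.json': {
    cookiesFile: 'cookie-jar.txt',
    ignoreSslErrors: true
  }
};

var configLast = resolveOptions([
  '--cookies-file=jar-of-cookies.txt',
  '--config=chapter01/recipe07-config.json',
  'chapter01/recipe07.js'
], files);

var argumentLast = resolveOptions([
  '--config=chapter01/recipe07-config.json',
  '--cookies-file=jar-of-cookies.txt',
  'chapter01/recipe07.js'
], files);

console.log(configLast.cookiesFile);   // prints cookie-jar.txt
console.log(argumentLast.cookiesFile); // prints jar-of-cookies.txt
```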

There's more…

Saving a PhantomJS configuration to a JSON document can help us in a couple of ways. First, by putting it into a file, we can put it under version control and track the changes to that configuration over time. Also, by putting the configuration into a file, it can more easily be shared across teams or jobs in continuous integration.

Debugging a PhantomJS script

In this recipe, we will learn about remote debugging PhantomJS scripts using the remote-debugger-port and remote-debugger-autorun command-line arguments.

Getting ready

To run this recipe, we will need the following:

PhantomJS installed on our PATH
A script to run with PhantomJS, which we are interested in debugging
Our computer's IP address
An open port over which the debugger will communicate
Another browser such as Google Chrome or Safari

The script in this recipe is available in the downloadable code repository as recipe08.js under chapter01. If we run the provided example script, we must change to the root directory of the book's sample code.

The script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change to the phantomjs-sandbox directory and start the app with the following command:

node app.js

How to do it…

Given the following script:

var page = require('webpage').create();

page.onResourceReceived = function(res) {
  if (res.stage === 'end') {
    console.log(JSON.stringify(res, undefined, 2));
  }
};

page.open('http://localhost:3000/cache-demo', function(status) {
  if (status === 'success') {
    console.log('All done.');
    phantom.exit();
  } else {
    console.error('Could not open page! (Is it running?)');
    phantom.exit(1);
  }
});

Enter the following at the command line:

phantomjs --remote-debugger-port=9000 --remote-debugger-autorun=true chapter01/recipe08.js

Note that with the remote-debugger-autorun argument set to true, the script will run immediately as it normally would, but it will also ignore calls to phantom.exit and suspend execution, printing out the following message:

Phantom::exit() called but not quitting in debug mode.

Tip

If we want more control over when the script begins (for example, we want to set breakpoints first), then we can simply omit the remote-debugger-autorun argument. Without that argument, PhantomJS will start and load the script, but will not execute it until we issue the __run() command in the debugger.

Now we can open our other browser (for example, Chrome) and enter our IP address and the port that we specified with remote-debugger-port. For example, if our computer's IP address is 10.0.1.8, we would enter http://10.0.1.8:9000/ into the location bar. Then, we should see something like the following screenshot:

The viewport will contain the PhantomJS browsing session's history as a list. As we are interested in accessing the debugger tools, we will click on the link that reads about:blank. This will take us to /webkit/inspector/inspector.html, and it should look something like the following screenshot:

If we have worked in the Chrome or Safari developer tools before, the toolbar should be familiar. While debugging PhantomJS scripts, we will be particularly interested in the Scripts and Console tabs.

Tip

For those unfamiliar with the WebKit Web Inspector, check out Majd Taby's thorough introduction, "The WebKit Inspector", at http://jtaby.com/blog/2012/04/23/modern-web-development-part-1.

Once we have the debugger open, click on the Scripts tab. In the Scripts tab, click on the drop-down menu (in the top toolbar, just below the tabs) and select about:blank. This will show us our script as seen in the following screenshot. Click on any line number in the left-side gutter to set a breakpoint.

With our breakpoint set, click on the Console tab to toggle into the console. Since we used the remote-debugger-autorun argument, we will see our console.log and other such statements printed to the console from our first (automatic) run. Note the blue prompt at the bottom of the console as seen in the following screenshot; we can enter new expressions to be evaluated here at this prompt. To run our PhantomJS script again, we enter __run().

Entering __run() in the console will execute the script again. The script execution will pause on any breakpoints that we set and we will automatically be brought into the Scripts tab. In the Scripts tab, we can inspect our call stack, inspect local variables and objects at runtime, manipulate the runtime environment through the console, and more.

When we are done debugging our script, we can simply close the browser and then use Ctrl + C to quit the PhantomJS process in the terminal.

How it works…

Our preceding example script is a simple one. We proceed in the following manner:

We create a webpage object.
We assign an event handler function to the webpage object's onResourceReceived callback. This callback will print out each resource received using JSON.stringify.
Lastly, we open the target URL (http://localhost:3000/cache-demo) using webpage.open, calling phantom.exit in the callback.

There's more…

Effective debugging is an essential skill for every developer, and it is fantastic that PhantomJS has WebKit remote debugging built in as a first-class tool. While the debugger itself may be overkill for simple situations, sometimes console.log just isn't a powerful enough (or fast enough) tool. In those cases, it is comforting to know that we have these debug tools at our disposal.

One important thing to note about using the remote debugger with PhantomJS is that we will need to be aware of what context we are attempting to debug. Are we debugging the PhantomJS script itself? Or a script on the page that the PhantomJS script is accessing? Or some interaction between them? In the simple case (as previously demonstrated), the remote debug mode makes it almost trivial to inspect our PhantomJS script's execution at runtime. However, it does take some extra work if we need to also debug a script on the page that PhantomJS is accessing. In those cases, we may find it useful to use the remote-debugger-autorun argument; this will pre-populate the debugger's landing page with links to the inspector for the PhantomJS script's context and also the accessed web page's context. We can open these links each in a new tab, giving a separate debugger session for each context we need to work in.

remote-debugger-port

Of the two debugger-related command-line arguments, remote-debugger-port is the essential one. The remote-debugger-port argument serves two functions. The first, implicit function is to put PhantomJS into the debug harness. Its second, explicit function is to set the port that PhantomJS will use for the WebKit remote debugging protocol.

Having these remote debugging capabilities in PhantomJS is extremely handy if we need to inspect or otherwise troubleshoot some misbehaving or unpredictable code. Another nice aspect of how the debugging toolkit is implemented is that the only other thing we need is a browser with a GUI. We do not need to install any special extensions in Chrome or Safari for the debugger to work. All we need to do is specify the port on the command line and point the browser at our computer's IP address, and voilà: the full power of a GUI debugger for our otherwise headless web browser.

Tip

Although we can use any browser as the target viewport for the remote debugger, our best results will be in Safari or Chrome. Safari is currently the dominant WebKit-based browser; Chrome uses the Blink rendering engine, but retains many of the features from its WebKit heritage. The remote debugger will function in other browsers (for example, Firefox or Opera) but certain things may not render properly, making it much more difficult to use.

remote-debugger-autorun

The remote-debugger-autorun command-line argument is optional and if specified as true, the script passed to PhantomJS will be run immediately in the debug harness. While this may be a convenient feature, it is seldom what we want.

Under normal debugging, we would already have some idea of where our code is defective (for example, from the errors or stack traces that we already have). With that knowledge, we would want to start our PhantomJS script in the debug harness, then navigate to the Scripts tab and set our breakpoints, and then execute the script.

If we have not set the script to run automatically, then how do we execute it? If we look again at our script as it appears in the about:blank selection under the Scripts tab, we will notice that it has been wrapped in a function and assigned to the variable named __run. To execute our script, we enter __run() into the debugger console and hit enter to call the function.
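Conceptually, the wrapping looks something like the following. This is a simplified sketch rather than PhantomJS's actual source: the function body is a stand-in for our script, and runCount is a hypothetical counter added only to show that each call executes the body again.

```javascript
// Hypothetical counter, used only to make repeated runs visible.
var runCount = 0;

// Stand-in for a script wrapped in a function and assigned to __run,
// as we would see it in the Scripts tab of the debugger.
var __run = function() {
  runCount += 1;
  console.log('Script body executed (run ' + runCount + ').');
};

// Entering __run() at the debugger console calls the function;
// entering it a second time runs the script body again.
__run();
__run();
```

Because the script body lives inside a function, nothing executes until that function is called, which is what gives us time to set breakpoints first.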

About the Author

Rob Friesel

Rob Friesel is a senior user interface developer and 10-year veteran at Dealer.com, where he develops UI frameworks and toolkits for their enterprise platform. He blogs about and presents on a variety of technologies, but his first love is the front-end. He has contributed as a credited reviewer to several books on JavaScript and one on Clojure. He tweets at @founddrama and blogs at http://blog.founddrama.net/.