In this chapter, we will cover the following recipes:
Installing PhantomJS
Launching the PhantomJS REPL
Running a PhantomJS script
Running a PhantomJS script with arguments
Running PhantomJS with cookies
Running PhantomJS with a disk cache
Running PhantomJS with a JSON configuration file
Debugging a PhantomJS script
PhantomJS is the headless WebKit – a fully-fledged WebKit-based browser with absolutely no graphical user interface. Instead of a GUI, PhantomJS features a scripting API that allows us to do just about anything that we would do with a normal browser. Since its introduction in 2010, PhantomJS has grown to be an essential tool in the web development stack. It is ideal for fast unit test watches, end-to-end tests in continuous integration, screen captures, screen scraping, performance data collection, and more.
The recipes in this chapter focus on PhantomJS fundamentals. We will discuss how to install PhantomJS, how to work with its Read-Evaluate-Print Loop (REPL), how to employ its command-line options, and how to launch PhantomJS in a debug harness.
Let's begin the PhantomJS Cookbook with the recipe that is the prerequisite for all of the other recipes—downloading and installing PhantomJS so that it is available on our computers.
Prebuilt binaries of PhantomJS are available for most major platforms, and in the interest of expedience and simplicity, that is how we proceed. PhantomJS is designed to be a stand-alone application, and in most situations, no external dependencies are required.
To install PhantomJS, we will need access to the Internet and permission to install applications.
Perform the following steps to download and install PhantomJS:
Navigate to the PhantomJS download page at http://phantomjs.org/download.
Locate and download the prebuilt binary that is appropriate for our system. Prebuilt binaries exist for the following operating systems:
Windows (XP or later).
Mac OS X (10.6 or later).
Linux (for 32-bit or 64-bit systems). Current binaries are built on CentOS 5.8, and should run successfully on Ubuntu 10.04.4 (Lucid Lynx) or more modern systems.
Extract the prebuilt binary. For Windows and OS X systems, this will be a .zip archive; for Linux systems, this will be a .tar.bz2 archive. For Windows machines, the binary should be phantomjs.exe; for OS X and Linux machines, the binary should be bin/phantomjs.
Once extracted, make sure to add PhantomJS to the system's PATH.
Tip
The PATH or search path is a variable on the command line that contains a list of directories searched by the shell to find an executable file when it is called. On POSIX-compatible systems (Linux and OS X), this list is delimited by colons (:), and on Windows, it is delimited by semicolons (;). For more information about the PATH variable, visit http://en.wikipedia.org/wiki/PATH_(variable).
For a tutorial that focuses on POSIX-compatible systems, visit http://quickleft.com/blog/command-line-tutorials-path.
For documentation on the Windows PATH, visit http://msdn.microsoft.com/en-us/library/w0yaz275(v=vs.80).aspx.
After placing the PhantomJS binary on our PATH, we can verify that it was installed by typing the following in the command line:
phantomjs -v
The version of PhantomJS that we just installed should print out to the console.
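As a complementary check from inside the runtime, the phantom object exposes a version property with major, minor, and patch fields. The following is a minimal sketch rather than part of the book's sample code: the file name check-version.js and the helper formatVersion are hypothetical, and the PhantomJS-specific part is guarded so that the helper can also be loaded in other JavaScript runtimes.

```javascript
// check-version.js (hypothetical file name, not in the sample repository)

// Formats a version object such as { major: 1, minor: 9, patch: 2 }
// into a dotted string.
function formatVersion(version) {
  return [version.major, version.minor, version.patch].join('.');
}

// Only touch the phantom object when we are actually inside PhantomJS.
if (typeof phantom !== 'undefined') {
  console.log('Running PhantomJS ' + formatVersion(phantom.version));
  phantom.exit();
}
```

Running this with phantomjs check-version.js should print the same version number that the command-line check reports.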
Tip
If we have trouble here, we should check out the troubleshooting guide on the PhantomJS project site at http://phantomjs.org/troubleshooting.html.
In an effort to lower the barrier to entry and help drive adoption, the prebuilt binaries of PhantomJS are made available by community volunteers. This is, in part, an acknowledgment that building PhantomJS from the source code can be a complex and time-consuming task. To quote the build page on the PhantomJS site: "With 4 parallel compile jobs on a modern machine, the entire process takes roughly 30 minutes." It is easy to imagine that this might scare off many developers who just want to try it out.
These prebuilt binaries should therefore make it easy to drop PhantomJS onto any system and have it running in minutes. These binaries are intended to be fully independent applications, with no external library dependencies such as Qt or WebKit. On some Linux systems, however, a little extra work may be required to ensure that the libraries necessary for proper font rendering (for example, FreeType and Fontconfig) are in place, along with the basic font files.
In addition to the prebuilt binaries, Mac OS X users may also install PhantomJS using Homebrew. To do this, enter the following at the command line:
brew update && brew install phantomjs
Note that installing PhantomJS with Homebrew also means that we will be compiling it from source.
Tip
Homebrew is an open source, community-run package manager for OS X built on top of Git and Ruby. To find out more information about Homebrew, check out its website at http://brew.sh.
As a bonus, Homebrew also automatically adds PhantomJS to your PATH.
In the event that one of the prebuilt binaries is not suitable for your specific situation, you may need to consider building PhantomJS from the source code. If this is the case, you will want to check out the build instructions that are listed at http://phantomjs.org/build.html; note that you will need the developer tools specific to your system (for example, Xcode on OS X and Microsoft Visual C++ on Windows) to be installed before you begin.
In this recipe, we will learn how to use the PhantomJS REPL. The PhantomJS REPL is an excellent tool for getting familiar with the runtime environment and for quickly hacking out an idea without needing to write a fully qualified script.
To run this recipe, we will need to have PhantomJS installed on our PATH. We will also need an open terminal window.
Perform the following steps to invoke and work in the PhantomJS REPL:
At the command-line prompt, type the following:
phantomjs
When the PhantomJS REPL starts up, we should see its default command-line prompt:
phantomjs>
At the PhantomJS prompt, we can enter any command from the PhantomJS API or any other valid JavaScript expressions and statements. The REPL will print the return value from the expression we entered, although we may need to wrap the expression in a console.log statement for a readable response, for example:
phantomjs> 1 + 1
{}
phantomjs> console.log(1 + 1)
2
undefined
phantomjs> for (var prop in window) console.log(prop)
document
window
// 475 more...
undefined
When we are finished in the REPL, type the following command to exit the REPL:
phantom.exit()
The PhantomJS REPL, also called interactive mode, was introduced to PhantomJS starting with Version 1.5. The REPL is the default mode for PhantomJS when the application is invoked without any arguments.
REPL stands for Read-Evaluate-Print Loop. The commands we enter at the prompt are read by the interpreter, which evaluates them and prints the results, before finally looping back to the prompt for us to continue. Many programming environments feature REPLs (for example, Node.js provides another popular JavaScript REPL), and the debugger consoles in tools such as the Chrome Developer Tools and Firebug would also qualify as REPLs. REPLs are useful for quickly trying out ideas in the runtime environment.
In our example, we enter the PhantomJS REPL by invoking phantomjs from the command line without any arguments. Once we are in the REPL, we can type in whatever commands we need to explore our ideas, hitting Enter after each command. Note that we must enter a full and syntactically valid expression or statement before hitting Enter; if we do not, PhantomJS will report an error (for example, Can't find variable: foo or Parse error).
The PhantomJS REPL also features auto-completion. Hitting the Tab key in the REPL will autoexpand our options. We can even hit Tab multiple times to cycle through our available options; for example, try typing p and then hitting Tab to see what options the REPL presents.
Finally, when we are finished, we use phantom.exit() to leave the REPL; we can also use the Ctrl + C or Ctrl + D key commands to exit the REPL.
This recipe demonstrates how to run a script using the PhantomJS runtime.
To run this recipe, we will need PhantomJS installed on our PATH. We will also need a script to run with PhantomJS; the script in this recipe is available in the downloadable code repository as recipe03.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. Alternatively, you can use the Git version control system to clone the repository. The repository is hosted on GitHub at https://github.com/founddrama/phantomjs-cookbook.
Given the following script:
console.log('A console statement from PhantomJS on ' +
  new Date().toDateString() + '!');
phantom.exit();
Type the following at the command line:
phantomjs chapter01/recipe03.js
Our preceding example script performs the following actions:
We print a message to the console (including a date string) using console.log.
The script exits the PhantomJS runtime using phantom.exit.
Since we did not provide an integer argument to phantom.exit, it returns an exit code of 0 (its default) to the shell.
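To signal a failure to the shell, we can pass a non-zero integer to phantom.exit instead of relying on the default. The following sketch is illustrative only: exitCodeFor and checkPassed are hypothetical names, and the choice of 1 as the failure code is simply a common shell convention.

```javascript
// Maps a boolean outcome to a conventional shell exit code:
// 0 for success, 1 for failure.
function exitCodeFor(succeeded) {
  return succeeded ? 0 : 1;
}

// Only call phantom.exit when we are actually inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var checkPassed = false; // hypothetical outcome of some check
  console.log('Exiting with code ' + exitCodeFor(checkPassed));
  phantom.exit(exitCodeFor(checkPassed));
}
```

On POSIX-compatible shells, running echo $? immediately afterward should show the exit code that the script returned.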
As we learned in the Launching the PhantomJS REPL recipe, PhantomJS will enter the REPL when invoked without any arguments. However, the runtime environment will attempt to evaluate and execute the first unrecognized argument as though it were a JavaScript file, regardless of whether or not it ends in .js. Most of the time that we work with PhantomJS, we will interact with it using scripts such as these.
As long as PhantomJS can resolve the first unrecognized argument as a file and correctly parse its contents as syntactically valid JavaScript, it will attempt to execute the contents. However, what happens if those preconditions are not met?
If the argument cannot be resolved as a file on disk, or if the file has no contents, PhantomJS will print an error message to the console, for example:
phantomjs does-not-exist-or-empty
Can't open 'does-not-exist-or-empty'
If the argument exists but the file's contents cannot be parsed as valid JavaScript, then PhantomJS will print an error message to the console and hang, for example:
phantomjs invalid.js
SyntaxError: Parse error
Note
In the event of such a SyntaxError, the PhantomJS process will not automatically terminate, and we must forcefully quit it (Ctrl + C).
Recall that PhantomJS is a headless web browser, and it helps to think of it as a version of Chrome or Safari that has no window. Just as we interact with our normal web browser by entering URLs into the location bar, clicking the back button, or clicking links on the page, so we will need to interact with PhantomJS. However, as it has no window and no UI components, we must interact with it through its programmable API. The PhantomJS API is written in JavaScript, and scripts targeting the PhantomJS runtime are also written in JavaScript; the API is documented online at http://phantomjs.org/api/.
If you have been exposed to both PhantomJS and Node.js, you may be wondering about the differences between them, especially after witnessing demonstrations of their respective REPLs and script running abilities. When comparing the two, it is helpful to consider them using the phrase "based on" as your frame of reference. Node.js is based on Google Chrome's V8 JavaScript engine; PhantomJS is based on the WebKit layout engine. Node.js is a JavaScript runtime; PhantomJS has a JavaScript runtime. Where Node.js is an excellent platform for building JavaScript-based server applications, it does not have any native HTML rendering. This is the key differentiator when comparing it to PhantomJS. The mission of PhantomJS is not to provide a platform for building JavaScript applications, but instead to provide a fast and standards-compliant headless browser.
In this recipe, we will learn how to run a PhantomJS script with additional arguments that are passed into the script for evaluation. Note that these are arguments passed into the execution context and are not command-line arguments for the PhantomJS runtime itself.
To run this recipe, we will need a script to run with PhantomJS; the script in this recipe is available in the downloadable code repository as recipe04.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code. Lastly, we will need the arguments we wish to pass into the script.
Given the following script:
if (phantom.args.length === 0) {
  console.log('No arguments were passed in.');
} else {
  phantom.args.forEach(function(arg, index) {
    console.log('[' + index + '] ' + arg);
  });
}
phantom.exit();
Enter the following command at the command line:
phantomjs chapter01/recipe04.js foo bar "boo baa"
We will see the following results printed in the terminal:
[0] foo
[1] bar
[2] boo baa
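Our preceding example script performs the following actions:
It checks the length of the phantom.args array; if the array is empty, it prints a message stating that no arguments were passed in.
Otherwise, it loops through phantom.args and prints the index and value of each argument.
Finally, it exits the PhantomJS runtime using phantom.exit.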
As we discussed in the Running a PhantomJS script recipe, PhantomJS will attempt to evaluate and execute the first unrecognized argument as though it were a valid JavaScript file. But what does PhantomJS do with all of the arguments after that?
The answer is that they are collected into the phantom.args array as string values. Each argument after the script name goes into this array. Note that phantom.args does not include the script name itself. Instead, PhantomJS records that in the read-only phantom.scriptName property.
It is worth noting that phantom.args and phantom.scriptName are both marked as deprecated in the API documentation. As such, usage of these properties is discouraged. Although using them for quick one-off or exploratory scripts is fine, neither of these properties should go into any library that we intend to maintain or distribute.
Wherever possible, we should use the system.args array (from the system module) instead of phantom.args and phantom.scriptName.
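As a rough illustration of the preferred approach, the sketch below reads the arguments from the system module. In system.args, the first entry is the script name and the remaining entries are the arguments passed to the script. The helper name describeArgs is hypothetical, and the PhantomJS-specific part is guarded so that the helper can be exercised elsewhere.

```javascript
// Describes an argument list shaped like system.args, where the first
// entry is the script name and the rest are the script's arguments.
function describeArgs(args) {
  var lines = ['script: ' + args[0]];
  Array.prototype.slice.call(args, 1).forEach(function (arg, index) {
    lines.push('[' + index + '] ' + arg);
  });
  return lines;
}

// Only load the system module when we are actually inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  describeArgs(system.args).forEach(function (line) {
    console.log(line);
  });
  phantom.exit();
}
```

Invoked with foo bar "boo baa" as in this recipe, the script should list the same three arguments, preceded by a line naming the script.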
Tip
When in doubt, check the PhantomJS project website and its documentation at http://phantomjs.org/api/. It is actively maintained, and as such will contain up-to-date information about the preferred APIs.
The Inspecting command-line arguments recipe in Chapter 2, PhantomJS Core Modules
In this recipe, we will learn how to use the cookies-file command-line switch to specify the location of the file for persistent cookies in PhantomJS.
To run this recipe, we will need a script to run with PhantomJS that accesses a site where cookies are read or written. We will need a filesystem path to specify it as the command-line argument, making sure that we have write permissions to that path.
The script in this recipe is available in the downloadable code repository as recipe05.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.
Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change to the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:
node app.js
Note
Node.js is a JavaScript runtime environment based on Chrome's V8 engine. It has an event-driven programming model and non-blocking I/O and can be used for building fast networking applications, shell scripts, and everything in between. We can learn more about Node.js including how to install it at http://nodejs.org/.
We will use this demo for many recipes throughout this cookbook. When we run the demo app for the first time, we need to download and install the Node.js modules that it depends on. To do this, we can change to the phantomjs-sandbox directory and run the following command:
npm install
Given the following script:
var webpage = require('webpage').create();

webpage.open('http://localhost:3000/cookie-demo', function(status) {
  if (status === 'success') {
    phantom.cookies.forEach(function(cookie, i) {
      for (var key in cookie) {
        console.log('[cookie:' + i + '] ' + key + ' = ' + cookie[key]);
      }
    });
    phantom.exit();
  } else {
    console.error('Could not open the page! (Is it running?)');
    phantom.exit(1);
  }
});
Enter the following command at the command line:
phantomjs --cookies-file=cookie-jar.txt chapter01/recipe05.js
The script will print out the properties for each cookie in the response, as follows:
[cookie:0] domain = localhost
[cookie:0] expires = Sat, 07 Dec 2013 02:05:06 GMT
[cookie:0] expiry = 1386381906
[cookie:0] httponly = false
[cookie:0] name = dave
[cookie:0] path = /cookie-demo
[cookie:0] secure = false
[cookie:0] value = oatmeal-raisin
[cookie:1] domain = localhost
[cookie:1] expires = Sat, 07 Dec 2013 02:04:22 GMT
[cookie:1] expiry = 1386381862
[cookie:1] httponly = false
[cookie:1] name = rob
[cookie:1] path = /cookie-demo
[cookie:1] secure = false
[cookie:1] value = chocolate-chip
We can then open cookie-jar.txt in a text editor and examine its contents. The cookie jar file should look something like the following:
[General]
cookies="@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\x2\0\0\0_dave=oatmeal-raisin; expires= Sat, 07 Dec 2013 02:05:06 GMT; domain=localhost; path=/cookie-demo\0\0\0^rob=chocolate-chip; expires= Sat, 07 Dec 2013 02:04:22 GMT; domain=localhost; path=/cookie-demo)"
Our preceding example script performs the following actions:
It creates a webpage object and opens the target URL (http://localhost:3000/cookie-demo).
In the callback function, we check for status of 'success', printing an error message and exiting PhantomJS if that condition fails.
If we successfully open the URL, then we loop through each cookie in the phantom.cookies collection and print out information about each one.
Lastly, we exit from the PhantomJS runtime using phantom.exit.
When we start PhantomJS with the cookies-file argument, we are telling the runtime to read and write cookies from a specific location on the filesystem. What this allows us to do is to use cookies in PhantomJS like we would with any other browser. In other words, an HTTP response or client-side script can set cookies, and when we run our PhantomJS script against that URL again, we can trust that the cookies are still there in the file.
Notice that the cookie jar file itself is essentially a plain text file. The actual file extension does not matter; we used .txt in our example, but it could just as easily be .cookies or even no extension at all. When persisting the cookies, PhantomJS writes them to this file. If we examine the file, then we see that it is a serialized, text-based version of the QNetworkCookie class that PhantomJS uses behind the scenes. Although the on-disk version is not necessarily easy to read, we can easily make a copy and parse it or transform it into its constituent cookies. This can be useful for examining their contents after a script has completed (for example, to ensure that the expected values are being written to disk).
Additionally, with the cookies written to disk, they are available for future PhantomJS script runs against URLs that expect the same cookies. For example, this can be useful when running scripts against sites that require authentication where those authentication tokens are passed around as cookies.
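When a later script only needs to confirm that a particular cookie is present, a small lookup over phantom.cookies can be easier than reading the jar file by hand. The following is a minimal sketch under a few assumptions: findCookie is a hypothetical helper, the cookie name dave comes from this recipe's sample output, and the script is assumed to be launched with the same cookies-file argument so that the runtime has the earlier cookies available (cookies that have since expired may no longer be reported).

```javascript
// Returns the first cookie whose name matches, or null if none does.
// Cookies are expected in the shape printed by this recipe's script
// (name, value, domain, path, and so on).
function findCookie(cookies, name) {
  for (var i = 0; i < cookies.length; i += 1) {
    if (cookies[i].name === name) {
      return cookies[i];
    }
  }
  return null;
}

// Only touch phantom.cookies when we are actually inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var cookie = findCookie(phantom.cookies, 'dave');
  console.log(cookie ? 'dave = ' + cookie.value : 'No cookie named dave.');
  phantom.exit(cookie ? 0 : 1);
}
```

Returning a non-zero exit code when the cookie is missing makes the check usable from a shell script or a continuous integration job.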
The Managing cookies with the phantom object recipe in Chapter 2, PhantomJS Core Modules
In this recipe, we will learn about running PhantomJS with an on-disk cache that is enabled using the disk-cache and max-disk-cache-size command-line arguments. We can use this to test how browsers cache our static assets.
To run this recipe, we will need a script to run with PhantomJS that accesses a website with cacheable assets. Optionally, we will also need a sense of how large we wish to set the on-disk cache (in kilobytes).
The script in this recipe is available in the downloadable code repository as recipe06.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.
Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change into the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:
node app.js
Given the following script:
var page = require('webpage').create(),
    count = 0,
    until = 2;

page.onResourceReceived = function(res) {
  if (res.stage === 'end') {
    console.log(JSON.stringify(res, undefined, 2));
  }
};

page.onLoadStarted = function() {
  count += 1;
  console.log('Run ' + count + ' of ' + until + '.');
};

page.onLoadFinished = function(status) {
  if (status === 'success') {
    if (count < until) {
      console.log('Go again.\n');
      page.reload();
    } else {
      console.log('All done.');
      phantom.exit();
    }
  } else {
    console.error('Could not open page! (Is it running?)');
    phantom.exit(1);
  }
};

page.open('http://localhost:3000/cache-demo');
Enter the following command at the command line:
phantomjs --disk-cache=true --max-disk-cache-size=4000 chapter01/recipe06.js
The script will print out details about each resource in the response as JSON.
Our preceding example script performs the following actions:
It creates a webpage object and sets two variables, count and until.
We assign an event handler function to the webpage object's onResourceReceived callback. This callback will print out every property of each resource received.
We assign an event handler function to the webpage object's onLoadStarted callback. This callback will increment count when the page load starts and print a message indicating which run it is.
We assign an event handler function to the webpage object's onLoadFinished callback. This callback checks the status of the response and takes action accordingly as follows:
If status is not 'success', then we print an error message and exit from PhantomJS.
If the callback's status is 'success', then we check to see if count is less than until, and if it is, then we call reload on the webpage object; otherwise, we exit PhantomJS.
Finally, we open the target URL (http://localhost:3000/cache-demo) using webpage.open.
Even though the disk cache is off by default, PhantomJS still performs some in-memory caching. This detail becomes important in later explorations, as it produces some otherwise difficult to explain results. For example, in our preceding sample script, we used webpage.reload for our second request of the URL, and in that second request, we saw all of the images re-requested. However, if we had used a second call to webpage.open (instead of webpage.reload), then the onResourceReceived callback would have shown a second request to the URL but none of the images would have been re-requested. (As an interesting aside, we would also see that behavior if we set the disk-cache argument to false; the in-memory cache cannot be disabled.)
Another interesting observation is that PhantomJS always reports an HTTP response status of 200 OK for every successfully retrieved asset. If we look at the Node.js console output for the demo app while our sample script runs, we can see the discrepancy. Again, when our sample script runs, we can see that an HTTP status code of 200 is reported by PhantomJS for each of the images during both the first and second request/response cycles. However, the output from the Node.js app looks something like this:
GET /cache-demo 200 1ms - 573b
GET /images/583519989_1116956980_b.jpg 200 4ms - 264.64kb
GET /images/152824439_ffcc1b2aa4_b.jpg 200 8ms - 615.21kb
GET /images/357292530_f225d7e306_b.jpg 200 6ms - 497.98kb
GET /images/391560246_f2ac936f6d_b.jpg 200 5ms - 446.68kb
GET /images/872027465_2519a358b9_b.jpg 200 5ms - 766.94kb
GET /cache-demo 200 1ms - 573b
GET /images/152824439_ffcc1b2aa4_b.jpg 304 3ms
GET /images/357292530_f225d7e306_b.jpg 304 3ms
GET /images/391560246_f2ac936f6d_b.jpg 304 2ms
GET /images/583519989_1116956980_b.jpg 304 3ms
GET /images/872027465_2519a358b9_b.jpg 304 3ms
We can see that the server responds with 304 Not Modified for each of the image assets. This is exactly what we would expect for a second request to the same URL when the assets are served with Cache-Control headers that specify a max-age, and for assets that are also cached to disk.
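To compare what PhantomJS reports against the server's log, it can help to tally the reported status codes rather than read each JSON dump. The following sketch assumes the demo app from this recipe is running; countByStatus is a hypothetical helper that counts each resource once by looking only at responses whose stage is 'end'.

```javascript
// Tallies HTTP status codes from response objects shaped like the ones
// passed to onResourceReceived, counting each resource once (stage 'end').
function countByStatus(responses) {
  var counts = {};
  responses.forEach(function (res) {
    if (res.stage === 'end') {
      counts[res.status] = (counts[res.status] || 0) + 1;
    }
  });
  return counts;
}

// Only create a webpage object when we are actually inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create(),
      received = [];

  page.onResourceReceived = function (res) {
    received.push(res);
  };

  page.open('http://localhost:3000/cache-demo', function (status) {
    console.log(JSON.stringify(countByStatus(received)));
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```

Setting this tally beside the Node.js console output makes the discrepancy between the reported and served status codes easier to spot.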
We can enable the disk cache by setting the disk-cache argument to true or yes. By default, the disk cache is disabled, but we can also explicitly disable it by providing false or no to the command-line argument. When the disk cache is enabled, PhantomJS will cache assets to the on-disk cache, which it stores at the desktop services cache storage location. Caching these assets has the potential to speed up future script runs against URLs that share those assets.
Optionally, we may also wish to limit the size of the disk cache (for example, to simulate the small caches on some mobile devices). To limit the size of the disk cache, we use the max-disk-cache-size command-line argument and provide an integer that determines the size of the cache in kilobytes. By default (if you do not use the max-disk-cache-size argument), the cache size is unbounded. Most of the time, we will not need to use the max-disk-cache-size argument.
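Because the size limit is expressed in kilobytes, it is easy to slip by a factor of 1,024 when thinking in megabytes. The following sketch shows one way to assemble the two switches; diskCacheArgs is a hypothetical helper intended for whatever script or build task launches PhantomJS, not something the runtime itself provides.

```javascript
// Builds the disk cache switches described above. maxSizeKb is optional;
// when it is omitted, no size limit switch is emitted.
function diskCacheArgs(enabled, maxSizeKb) {
  var args = ['--disk-cache=' + (enabled ? 'true' : 'false')];
  if (typeof maxSizeKb === 'number') {
    args.push('--max-disk-cache-size=' + Math.floor(maxSizeKb));
  }
  return args;
}

// Example: a cache limited to 4 MB, expressed in kilobytes (4 * 1024).
// Prints: --disk-cache=true --max-disk-cache-size=4096
console.log(diskCacheArgs(true, 4 * 1024).join(' '));
```

The resulting string can be placed between phantomjs and the script path on the command line.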
If we need to inspect the cached data that is persisted to disk, PhantomJS writes to the desktop services cache storage location for the platform it's running on. These locations are listed as follows:
Platform | Location
---|---
Windows |
Mac OS X |
Linux |
The Opening a URL within PhantomJS recipe in Chapter 3, Working with webpage Objects
In this recipe, we will learn how to store PhantomJS configuration options in a JSON document and load those options using the config command-line argument.
To run this recipe, we will need a JSON-formatted configuration file with our PhantomJS command-line options.
The script in this recipe is available in the downloadable code repository as recipe07.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code. An example configuration file is also in this directory as recipe07-config.json.
Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change to the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:
node app.js
Select our command-line configuration options (changing hyphenated property names into their camel-cased equivalents) and apply our values. Save these configuration settings to a JSON-formatted document. For example, the contents of recipe07-config.json under chapter01:
{
  "cookiesFile" : "cookie-jar.txt",
  "ignoreSslErrors" : true
}
Tip
For more information about JSON, including its formatting rules, visit http://www.json.org.
Given the script from the Running PhantomJS with cookies recipe earlier in this chapter, enter the following at the command line:
phantomjs --config=chapter01/recipe07-config.json chapter01/recipe07.js
The configuration file is a JSON document where we can take our preferred command-line arguments and store them on disk. The keys in the JSON object have a one-to-one correspondence with the command-line arguments themselves – the hyphenated command-line argument names are converted to their camel-cased versions (for example, cookies-file becomes cookiesFile). The values in the JSON object follow easy conversion rules based on the most applicable JavaScript primitives: strings are strings, numbers are numbers, and true/false or yes/no become the corresponding true or false Boolean literals. Creating our own JSON-formatted configuration file requires only two things: a text editor and the knowledge of which command-line arguments we wish to capture in it.
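The naming and value rules described here are mechanical enough to express in a few lines. The following is a minimal sketch, not a PhantomJS API: toConfigKey and toConfig are hypothetical helpers that convert hyphenated switch names to camel-cased keys and map true/yes and false/no strings to Boolean literals.

```javascript
// Converts a hyphenated switch name (for example, 'cookies-file')
// into its camel-cased configuration key ('cookiesFile').
function toConfigKey(switchName) {
  return switchName.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

// Builds a configuration object from switch names and values, mapping
// 'true'/'yes' and 'false'/'no' strings to Boolean literals.
function toConfig(switches) {
  var config = {};
  Object.keys(switches).forEach(function (name) {
    var value = switches[name];
    if (value === 'true' || value === 'yes') { value = true; }
    if (value === 'false' || value === 'no') { value = false; }
    config[toConfigKey(name)] = value;
  });
  return config;
}

// Reproduces the shape of this recipe's example configuration file.
console.log(JSON.stringify(toConfig({
  'cookies-file': 'cookie-jar.txt',
  'ignore-ssl-errors': 'yes'
}), null, 2));
```

The printed JSON can be saved to a file and passed to PhantomJS with the config argument.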
Tip
See http://phantomjs.org/api/command-line.html for the complete list of documented command-line options in the PhantomJS API.
Note
The help and version command-line arguments do not have corresponding versions in the JSON configuration file. Also, at the time of writing this book, there is a documented defect wherein the JSON key for the load-images argument is not recognized.
The example script in this recipe (recipe07.js under chapter01) is identical to the one that we used for our demonstration in the Running PhantomJS with cookies recipe; we are reusing it here for convenience. For a more thorough explanation of what it is doing, see the How it works… section under that recipe.
When launching PhantomJS with the config command-line argument, the PhantomJS runtime interprets the argument's value as a path on the filesystem and attempts to load and evaluate that file as a JSON document. If the file cannot be parsed as a JSON document, then PhantomJS prints a warning and ignores it. If the file is correctly parsed, then PhantomJS configures itself as if the arguments in the JSON document had been passed as normal command-line arguments.
This raises an interesting question: given equivalent arguments, which one takes precedence, the one specified in the JSON configuration file or the one specified on the command line? The answer is that it depends on which one comes last. In other words, given recipe07-config.json, we can run:
phantomjs --cookies-file=jar-of-cookies.txt --config=chapter01/recipe07-config.json chapter01/recipe07.js
That creates cookie-jar.txt, as specified in recipe07-config.json, whereas the following command creates jar-of-cookies.txt, as specified on the command line:
phantomjs --config=chapter01/recipe07-config.json --cookies-file=jar-of-cookies.txt chapter01/recipe07.js
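The "last one wins" behavior can be pictured with a small model. The sketch below is a toy illustration of the ordering rule described in this recipe, not PhantomJS's actual implementation; resolveCookiesFile is a hypothetical function, and the config object stands in for the parsed contents of recipe07-config.json.

```javascript
// A toy model (not PhantomJS's actual implementation) of "last one wins":
// walks the switches left to right; a --config switch contributes the
// value from the configuration object, and later values overwrite
// earlier ones.
function resolveCookiesFile(switches, configContents) {
  var cookiesFile;
  switches.forEach(function (arg) {
    if (arg.indexOf('--config=') === 0 && configContents.cookiesFile) {
      cookiesFile = configContents.cookiesFile;
    } else if (arg.indexOf('--cookies-file=') === 0) {
      cookiesFile = arg.slice('--cookies-file='.length);
    }
  });
  return cookiesFile;
}

// Stand-in for the parsed contents of recipe07-config.json.
var config = { cookiesFile: 'cookie-jar.txt', ignoreSslErrors: true };

// The configuration file comes last, so its value is used.
console.log(resolveCookiesFile(
  ['--cookies-file=jar-of-cookies.txt', '--config=chapter01/recipe07-config.json'],
  config
));

// The command-line switch comes last, so its value is used.
console.log(resolveCookiesFile(
  ['--config=chapter01/recipe07-config.json', '--cookies-file=jar-of-cookies.txt'],
  config
));
```

Under this model, the first call yields cookie-jar.txt and the second yields jar-of-cookies.txt, matching the two commands in this recipe.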
Saving a PhantomJS configuration to a JSON document can help us in a couple of ways. First, by putting it into a file, we can put it under version control and track the changes to that configuration over time. Also, by putting the configuration into a file, it can more easily be shared across teams or jobs in continuous integration.
In this recipe, we will learn about remote debugging PhantomJS scripts using the remote-debugger-port and remote-debugger-autorun command-line arguments.
To run this recipe, we will need the following:
PhantomJS installed on our PATH
A script to run with PhantomJS, which we are interested in debugging
Our computer's IP address
An open port over which the debugger will communicate
Another browser such as Google Chrome or Safari
The script in this recipe is available in the downloadable code repository as recipe08.js under chapter01. If we run the provided example script, we must change to the root directory of the book's sample code.
The script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change to the phantomjs-sandbox directory and start the app with the following command:
node app.js
Given the following script:
var page = require('webpage').create();

page.onResourceReceived = function(res) {
  if (res.stage === 'end') {
    console.log(JSON.stringify(res, undefined, 2));
  }
};

page.open('http://localhost:3000/cache-demo', function(status) {
  if (status === 'success') {
    console.log('All done.');
    phantom.exit();
  } else {
    console.error('Could not open page! (Is it running?)');
    phantom.exit(1);
  }
});
Enter the following at the command line:
phantomjs --remote-debugger-port=9000 --remote-debugger-autorun=true chapter01/recipe08.js
Note that with the remote-debugger-autorun argument set to true, the script will run immediately as it normally would, but it will also ignore calls to phantom.exit and suspend execution, printing out the following message:
Phantom::exit() called but not quitting in debug mode.
Tip
If we want more control over when the script begins (for example, we want to set breakpoints first), then we can simply omit the remote-debugger-autorun argument. With that argument omitted, PhantomJS will start and load the script, but will not execute it until we issue the __run() command in the debugger.
Now we can open our other browser (for example, Chrome) and enter our IP address and the port that we specified with remote-debugger-port. For example, if our computer's IP address is 10.0.1.8, we would enter http://10.0.1.8:9000/ into the location bar. Then, we should see something like the following screenshot:

The viewport will contain the PhantomJS browsing session's history as a list. As we are interested in accessing the debugger tools, we will click on the link that reads about:blank. This will take us to /webkit/inspector/inspector.html, and it should look something like the following screenshot:

If we have worked in the Chrome or Safari developer tools before, the toolbar should be familiar. While debugging PhantomJS scripts, we will be particularly interested in the Scripts and Console tabs.
Tip
For those unfamiliar with the WebKit Web Inspector, check out Majd Taby's thorough introduction, "The WebKit Inspector", at http://jtaby.com/blog/2012/04/23/modern-web-development-part-1.
Once we have the debugger open, click on the Scripts tab. In the Scripts tab, click on the drop-down menu (in the top toolbar, just below the tabs) and select about:blank. This will show us our script as seen in the following screenshot. Click on any line number in the left-side gutter to set a breakpoint.

With our breakpoint set, click on the Console tab to toggle into the console. Since we used the remote-debugger-autorun argument, we will see our console.log and other such statements printed to the console from our first (automatic) run. Note the blue prompt at the bottom of the console as seen in the following screenshot; we can enter new expressions to be evaluated here at this prompt. To run our PhantomJS script again, we enter __run().

Entering __run() in the console will execute the script again. The script execution will pause on any breakpoints that we set and we will automatically be brought into the Scripts tab. In the Scripts tab, we can inspect our call stack, inspect local variables and objects at runtime, manipulate the runtime environment through the console, and more.

When we are done debugging our script, we can simply close the browser and then use Ctrl + C to quit the PhantomJS process in the terminal.
Our preceding example script is a simple one. We proceed in the following manner:
We create a webpage object.
We assign an event handler function to the webpage object's onResourceReceived callback. This callback will print out each resource received using JSON.stringify.
Lastly, we open the target URL (http://localhost:3000/cache-demo) using webpage.open, calling phantom.exit in the callback.
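The filtering done inside the onResourceReceived handler can be pulled out into a plain function, which makes it easy to check outside PhantomJS. The following is a minimal sketch: describeResource is a hypothetical helper name rather than part of the PhantomJS API, and the two sample objects are made up to mimic the stage field that the recipe's script relies on.

```javascript
// Hypothetical helper: returns the pretty-printed JSON for a finished
// resource, or null for any other stage (for example, 'start').
function describeResource(res) {
  if (res.stage === 'end') {
    return JSON.stringify(res, undefined, 2);
  }
  return null;
}

// Inside a PhantomJS script, the helper would be wired up like this:
//   page.onResourceReceived = function(res) {
//     var text = describeResource(res);
//     if (text !== null) { console.log(text); }
//   };

// Sample objects (made up for illustration) that mimic two stages.
var started = { id: 1, stage: 'start', url: 'http://localhost:3000/cache-demo' };
var finished = { id: 1, stage: 'end', url: 'http://localhost:3000/cache-demo' };

console.log(describeResource(started));
console.log(describeResource(finished));
```

Only the object whose stage is 'end' produces JSON, so each resource is reported once, when it has finished arriving.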
Effective debugging is an essential skill for every developer, and it is fantastic that PhantomJS has WebKit remote debugging built in as a first-class tool. While the debugger itself may be overkill for simple situations, sometimes console.log just isn't a powerful enough (or fast enough) tool. In those cases, it is comforting to know that we have these debug tools at our disposal.
One important thing to note about using the remote debugger with PhantomJS is that we will need to be aware of what context we are attempting to debug. Are we debugging the PhantomJS script itself? Or a script on the page that the PhantomJS script is accessing? Or some interaction between them? In the simple case (as previously demonstrated), the remote debug mode makes it almost trivial to inspect our PhantomJS script's execution at runtime. However, it does take some extra work if we need to also debug a script on the page that PhantomJS is accessing. In those cases, we may find it useful to use the remote-debugger-autorun argument; this will pre-populate the debugger's landing page with links to the inspector for the PhantomJS script's context and also the accessed web page's context. We can open these links each in a new tab, giving a separate debugger session for each context we need to work in.
Of the two debugger-related command-line arguments, remote-debugger-port is the essential one. The remote-debugger-port argument serves two functions. The first, implicit function is to put PhantomJS into the debug harness. Its second, explicit function is to set the port that PhantomJS will use for the WebKit remote debugging protocol.
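When debug runs are launched from a Node.js helper rather than typed by hand, the two arguments can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this recipe; debugArgs is a hypothetical helper name, and the commented-out launch line assumes that phantomjs is on our PATH.

```javascript
// Hypothetical helper: builds the argument list for a debug run.
// The port flag is always present because it is what enables the
// debug harness; the autorun flag is added only on request.
function debugArgs(script, port, autorun) {
  var args = ['--remote-debugger-port=' + port];
  if (autorun) {
    args.push('--remote-debugger-autorun=true');
  }
  args.push(script);
  return args;
}

// Build the arguments for a run that waits for __run() in the debugger.
var args = debugArgs('chapter01/recipe08.js', 9000, false);
console.log('phantomjs ' + args.join(' '));

// To actually launch PhantomJS (assuming it is on our PATH):
//   require('child_process').spawn('phantomjs', args, { stdio: 'inherit' });
```

Keeping the port mandatory and autorun optional mirrors the roles of the two arguments: the first is essential, the second is a convenience.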
Having these remote debugging capabilities in PhantomJS is extremely handy if we need to inspect or otherwise troubleshoot some misbehaving or unpredictable code. But something else that is nice about how the debugging toolkit is implemented is that we don't need anything else except another browser with a GUI. We do not need to install any special extensions in Chrome or Safari for the debugger to work. All we need to do is specify the port on the command line and point the browser at our computer's IP and voila—the full power of a GUI debugger for our otherwise headless web browser.
Tip
Although we can use any browser as the target viewport for the remote debugger, our best results will be in Safari or Chrome. Safari is currently the dominant WebKit-based browser; Chrome uses the Blink rendering engine, but retains many of the features from its WebKit heritage. The remote debugger will function in other browsers (for example, Firefox or Opera) but certain things may not render properly, making it much more difficult to use.
The remote-debugger-autorun command-line argument is optional and if specified as true, the script passed to PhantomJS will be run immediately in the debug harness. While this may be a convenient feature, it is seldom what we want.
Under normal debugging, we would already have some idea of where our code is defective (for example, from the errors or stack traces that we already have). With that knowledge, we would want to start our PhantomJS script in the debug harness, then navigate to the Scripts tab and set our breakpoints, and then execute the script.
If we have not set the script to run automatically, then how do we execute it? If we look again at our script as it appears in the about:blank selection under the Scripts tab, we will notice that it has been wrapped in a function and assigned to the variable named __run. To execute our script, we enter __run() into the debugger console and press Enter to call the function.
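The effect of that wrapping can be imitated in plain JavaScript. The sketch below only illustrates the idea and is not PhantomJS's actual implementation; the function body and the runCount counter are made up for the example.

```javascript
var runCount = 0;

// Stand-in for the wrapper: the script body lives inside a function
// assigned to __run, so nothing executes until we call it.
var __run = function() {
  runCount += 1;
  console.log('Script body executing (run ' + runCount + ')');
};

// Nothing has run yet; this is the moment to set breakpoints.
console.log('Runs so far: ' + runCount);

// Calling the function executes the script; calling it again re-runs it.
__run();
__run();
```

Because the script is just a function value, it can be called as many times as we like, which is why entering __run() again in the console repeats the whole script.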