Instant Heat Maps in R How-to

Instant Heat Maps in R How-to: Learn how to design heat maps in R to enhance your data analysis

By Sebastian Raschka
Book Jun 2013 72 pages 1st Edition

Chapter 1. Instant Heat Maps in R How-to

Welcome to Instant Heat Maps in R How-to.

Throughout this book, we will learn how to create simple and advanced heat maps, customize them, and produce polished output for presentations. We will read input data from several file formats and add some interactivity to our heat maps.

We will also take a look at simple heat maps by creating a choropleth map of the United States and make a contour plot from topographical volcano data.

Creating your first heat map in R (Simple)


In this recipe, we will learn how to construct our first heat map in R from the AirPassengers data set, a standard data set included in the datasets package that ships with R distributions. For this task, we will use the levelplot() function from the lattice package and explore the enhanced features of the heatmap.2() function from the gplots package.

The following image shows one of the heat maps that we are going to create in this recipe from the total count of air passengers:

Getting ready

Download the script 5644OS_01_01.r from your account at http://www.packtpub.com and save it to your hard disk. The first section of the script, below the comment line starting with ### loading packages, will automatically check for the availability of the R packages gplots and lattice, which are required for this recipe.

If those packages are not already installed, you will be prompted to select an official server from the Comprehensive R Archive Network (CRAN) to allow the automatic download and installation of the required packages.

If you installed those two packages before executing the script, I recommend updating them to the most recent version by calling the following function in the R command line:

update.packages(oldPkgs = c("gplots", "lattice"))

Tip

Use the source() function in the R command-line to execute an external script from any location on your hard drive.

If you start a new R session from the same directory as the location of the script, simply provide the name of the script as an argument in the function call as follows:

source("myScript.r")

You have to provide the absolute or relative path to the script on your hard drive if you started your R session from a different directory to the location of the script. Refer to the following example:

source("/home/username/Downloads/myScript.r")

You can view the current working directory of your current R session by executing the following command in the R command-line:

getwd()
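
The two ways of calling source() can be tried without downloading anything. The following is a minimal sketch: the script name myScript.r and the variable answer are made up for illustration, and a temporary directory stands in for a folder such as /home/username/Downloads.

```r
# Hypothetical script; a temporary directory stands in for a download folder.
script_path <- file.path(tempdir(), "myScript.r")
writeLines("answer <- 6 * 7", script_path)

# Run the script by its full path, regardless of the working directory.
source(script_path)

# source() does not change the working directory of the session.
current_dir <- getwd()
print(answer)
```

Passing the full path works from any working directory, which is why it is the safer choice when a session was not started from the folder that contains the script.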

How to do it...

Run the 5644OS_01_01.r script in R to execute the following code, and take a look at the output printed on the screen as well as the PDF file firstHeatmaps.pdf that this script creates:

### loading packages
if (!require("gplots")) {
  install.packages("gplots", dependencies = TRUE)
  library(gplots)
}
if (!require("lattice")) {
  install.packages("lattice", dependencies = TRUE)
  library(lattice)
}

### loading data
data(AirPassengers)

### converting data
rowcolNames <- list(as.character(1949:1960), month.abb)
air_data <- matrix(AirPassengers, 
  ncol = 12, 
  byrow = TRUE, 
  dimnames = rowcolNames)

### drawing heat maps
pdf("firstHeatmaps.pdf")

# 1) Air Passengers #1
print(levelplot(air_data,
  col.regions=heat.colors, 
  xlab = "year",
  ylab = "month", 
  main = "Air Passengers #1"))

# 2) Air Passengers #2
heatmap.2(air_data, 
  trace = "none", 
  density.info = "none", 
  xlab = "month", 
  ylab = "year", 
  main = "Air Passengers #2")

# 3) Air Passengers #3
heatmap.2(air_data, 
  trace = "none", 
  xlab = "month", 
  ylab = "year", 
  main = "Air Passengers #3",
  density.info = "histogram",
  dendrogram = "column",
  keysize = 1.8)

dev.off()

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How it works...

There are different functions for drawing heat maps in R, and each has its own advantages and disadvantages. In this recipe, we will take a look at the levelplot() function from the lattice package to draw our first heat map. Furthermore, we will use the advanced heatmap.2() function from gplots to apply a clustering algorithm to our data and add the resulting dendrograms to our heat maps.

The following image shows an overview of the different plotting functions that we are using throughout this book:

Now let us go through the script step by step:

  1. Loading packages: The first eight lines preceding the ### loading data section make sure that R loads the lattice and gplots packages, which we need for the two heat map functions in this recipe: levelplot() and heatmap.2().

    Note

    Each time we start a new session in R, we have to load the required packages in order to use the levelplot() and heatmap.2() functions. To do so, enter the following function calls directly into the R command-line or include them at the beginning of your script:

    • library(lattice)

    • library(gplots)

  2. Loading the data set: R includes a package called datasets, which contains a variety of data sets for testing and exploration purposes. More information on the data sets contained in the datasets package can be found at http://stat.ethz.ch/R-manual/R-patched/library/datasets/.

    For this recipe, we are loading the AirPassengers data set, which is a collection of the total count of air passengers (in thousands) for international airlines from 1949 to 1960 in a time-series format.

    data(AirPassengers)
  3. Converting the data set into a numeric matrix: Before we can use the heat map functions, we need to convert the AirPassengers time-series data into a numeric matrix. Numeric matrices in R can have character strings as row and column labels, but the content itself must be of one single mode: numeric.

    We use the matrix() function to create a numeric matrix of 12 columns, which we fill with the AirPassengers time-series data row by row. Using the argument dimnames = rowcolNames, we provide the row and column names that we assigned previously to the variable rowcolNames, a list of two vectors: 12 strings representing the years 1949 to 1960, and the 12 three-letter abbreviations of the months from January to December, respectively.

    rowcolNames <- list(as.character(1949:1960), month.abb)
    air_data <- matrix(AirPassengers, 
      ncol = 12, 
      byrow = TRUE, 
      dimnames = rowcolNames)
  4. A simple heat map using levelplot(): Now that we have converted the AirPassengers data into a numeric matrix format and assigned it to the variable air_data, we can go ahead and construct our first heat map using the levelplot() function from the lattice package:

    print(levelplot(air_data,
      col.regions=heat.colors, 
      xlab = "year",
      ylab = "month", 
      main = "Air Passengers #1"))

    The levelplot() function creates a simple heat map with a color key to the right-hand side of the map. We can use the argument col.regions = heat.colors to change the default color transition to yellow and red. X and y axis labels are specified by the xlab and ylab parameters, respectively, and the main parameter gives our heat map its caption.

    Note

    In contrast to most other plotting functions in R, the functions of the lattice package return plot objects, so we have to wrap the call in print() in our script if we want to save the plot to a file. In an interactive R session, the print() call can be omitted, because typing the name of a variable automatically displays the object it refers to on the screen.

  5. Creating enhanced heat maps with heatmap.2(): Next, we will use the heatmap.2() function to apply a clustering algorithm to the AirPassengers data and to add row and column dendrograms to our heat map:

    heatmap.2(air_data, 
      trace = "none", 
      density.info = "none", 
      xlab = "month", 
      ylab = "year", 
      main = "Air Passengers #2")

    Note

    Hierarchical clustering is especially popular in gene expression analyses. It is a very powerful method for grouping data to reveal interesting trends and patterns in the data matrix.

    Another neat feature of heatmap.2() is that you can display a histogram of the count of the individual values inside the color key by including the argument density.info = "histogram" in the function call. Alternatively, you can set density.info = "density" to display a density plot inside the color key.

    By adding the argument keysize = 1.8, we slightly increase the size of the color key; the default value of keysize is 1.5:

    heatmap.2(air_data, 
      trace = "none", 
      xlab = "month", 
      ylab = "year", 
      main = "Air Passengers #3",
      density.info = "histogram",
      dendrogram = "column",
      keysize = 1.8)

    Did you notice the missing row dendrogram in the resulting heat map? This is due to the argument dendrogram = "column" that we passed to the heat map function. Similarly, you can pass "row" instead of "column" to suppress the column dendrogram, or "none" to draw no dendrogram at all.
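
The matrix conversion from step 3 and the print() remark from step 4 can be checked in a few lines. This is only a sketch: the variable names first_value, last_value, and plot_object are ours, a color vector is passed instead of the heat.colors function, and pdf(file = NULL) is used as a throwaway device so that no file is written.

```r
library(lattice)  # ships with standard R distributions

data(AirPassengers)

# Same conversion as in the recipe: one row per year, one column per month.
rowcolNames <- list(as.character(1949:1960), month.abb)
air_data <- matrix(AirPassengers,
  ncol = 12,
  byrow = TRUE,
  dimnames = rowcolNames)

# Spot checks against the original time series.
first_value <- air_data["1949", "Jan"]   # first observation of the series
last_value  <- air_data["1960", "Dec"]   # last observation of the series

# levelplot() only builds a plot object; nothing is drawn until it is printed.
plot_object <- levelplot(air_data, col.regions = heat.colors(100))
class(plot_object)

pdf(file = NULL)      # throwaway device, so no file is written
print(plot_object)    # this is the step that actually draws the plot
dev.off()
```

If the corner values match the first and last entries of AirPassengers, the byrow = TRUE fill has placed the years in rows and the months in columns as intended.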

There's more...

By default, levelplot() places the color key on the right-hand side of the heat map, but it can be easily moved to the top, bottom, or left-hand side of the map by modifying the space parameter of colorkey:

levelplot(air_data, 
  col.regions = heat.colors, 
  colorkey = list(space = "top"))

Replacing top by left or bottom will place the color key on the left-hand side or on the bottom of the heat map, respectively.

Moving around the color key for heatmap.2() can be a little bit more of a hassle. In this case we have to modify the parameters of the layout() function. By default, heatmap.2() passes a matrix, lmat, to layout(), which has the following content:

     [,1] [,2]
[1,]    4    3
[2,]    2    1

The numbers in the preceding matrix specify the locations of the different visual elements on the plot (1 implies heat map, 2 implies row dendrogram, 3 implies column dendrogram, and 4 implies key). If we want to change the position of the key, we have to modify and rearrange those values of lmat that heatmap.2() passes to layout().

For example, if we want to place the color key at the bottom left-hand corner of the heat map, we need to create a new matrix for lmat as follows:

lmat
     [,1] [,2]
[1,]    0    3
[2,]    2    1
[3,]    4    0

We can construct such a matrix by using the rbind() function and assigning it to lmat:

lmat = rbind(c(0,3),c(2,1),c(4,0))

Furthermore, we have to pass an argument for the row height parameter lhei to heatmap.2(), with one height for each of the three rows of our modified lmat matrix, so that the rearranged layout can be used:

heatmap.2(air_data, 
  dendrogram = "none",
  trace = "none",
  density.info = "none", 
  keysize = 1.3,
  xlab = "month", 
  ylab = "year", 
  main = "Air Passengers", 
  lmat = rbind(c(0,3),c(2,1),c(4,0)), 
  lhei = c(1.5,4,1.5))
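
To see what such a layout matrix does before handing it to heatmap.2(), we can pass it to the base layout() function ourselves. This is a sketch under assumptions: the widths in lwid are illustrative values chosen by us, and pdf(file = NULL) is a throwaway device so that no file is written.

```r
# Layout matrix from the text: 1 = heat map, 2 = row dendrogram,
# 3 = column dendrogram, 4 = color key, 0 = empty cell.
lmat <- rbind(c(0, 3), c(2, 1), c(4, 0))
lhei <- c(1.5, 4, 1.5)   # one height per row of lmat
lwid <- c(1.5, 4)        # one width per column of lmat (illustrative values)

pdf(file = NULL)         # throwaway device, so no file is written
n_panels <- layout(lmat, widths = lwid, heights = lhei)
layout.show(n_panels)    # outlines and numbers each panel
dev.off()
```

Replacing pdf(file = NULL) with a real file name, or running layout.show() in an interactive session, shows the numbered panels, which makes it easier to check where the key (panel 4) will end up.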

If you don't need a color key for your heat map, you could turn it off by using the argument key = FALSE for heatmap.2() and colorkey = FALSE for levelplot(), respectively.

Tip

R also has a base function for creating heat maps, heatmap(), that does not require you to install external packages and is most advantageous if you can go without a color key. The syntax is very similar to the heatmap.2() function, and arguments such as Rowv, Colv, xlab, ylab, and main work the same way, although heatmap.2()-specific options such as trace, density.info, and keysize have no counterpart in heatmap():

heatmap(air_data, 
  xlab = "month", 
  ylab = "year", 
  main = "Air Passengers")

More information on dendrograms and clustering

By default, the dendrograms of heatmap.2() are created by an agglomerative hierarchical clustering method, also known as bottom-up clustering.

In this approach, all individual objects start as individual clusters and are successively merged until only one single cluster remains. The distance between a pair of clusters is calculated by the farthest neighbor method, also called the complete linkage method, which is based by default on the Euclidean distance of the two points from both clusters that are farthest apart from each other. The computed dendrograms are then reordered based on the row and column means.

By modifying the default parameters of the dist() function, we can use another distance measure rather than the Euclidean distance. For example, if we want to use the Manhattan distance measure (based on a grid-like path rather than a direct connection between two objects), we would modify the method parameter of the dist() function and assign it to a variable distance first:

distance = dist(myData, method = "manhattan")

Other options for the method parameter are: euclidean (default), maximum, canberra, binary, or minkowski.

To use an agglomeration method other than complete linkage, we modify the method parameter of the hclust() function and assign the result to another variable, cluster. Note the first argument, distance, which comes from our previous assignment:

cluster = hclust(distance, method = "ward")

By setting the method parameter to ward, R will use Joe H. Ward's minimum variance method for hierarchical clustering. Since R 3.1.0, this option is called ward.D (the old name is still accepted with a message), and a ward.D2 variant is also available. Other options for the method parameter that we can pass to hclust() are: complete (default), single, average, mcquitty, median, or centroid.

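To use our modified clustering parameters, we simply call the as.dendrogram() function within heatmap.2() using the variable cluster that we assigned previously. Keep in mind that dist() compares the rows of myData, so passing the same cluster object to both Colv and Rowv only works when rows and columns describe the same items; otherwise, compute a second clustering from t(myData) for the columns: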

heatmap.2(myData, 
Colv = as.dendrogram(cluster), 
Rowv = as.dendrogram(cluster))

We can also draw the cluster dendrogram without the heat map by using the plot() function:

plot(cluster)

Tip

To turn off row and column reordering, we need to turn off the dendrograms and set the parameters Colv and Rowv to NA:

heatmap.2(my_data, dendrogram = "none", Colv = NA, Rowv = NA)
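
The clustering steps described in this section can be put together as follows. This is a sketch, not the book's script: it reuses the AirPassengers matrix from this recipe, clusters rows and columns separately, picks the Manhattan distance and average linkage purely as examples, and draws the result with the base heatmap() function in place of heatmap.2() so that no extra package is needed.

```r
data(AirPassengers)
air_data <- matrix(AirPassengers, ncol = 12, byrow = TRUE,
  dimnames = list(as.character(1949:1960), month.abb))

# dist() compares the rows of its input, so the columns are clustered
# by transposing the matrix first.
row_cluster <- hclust(dist(air_data, method = "manhattan"), method = "average")
col_cluster <- hclust(dist(t(air_data), method = "manhattan"), method = "average")

pdf(file = NULL)   # throwaway device, so no file is written
# Base heatmap() stands in for heatmap.2() here; both accept dendrograms
# through Rowv and Colv.
heatmap(air_data,
  Rowv = as.dendrogram(row_cluster),
  Colv = as.dendrogram(col_cluster))
dev.off()
```

With gplots loaded, the same Rowv and Colv arguments can be passed to heatmap.2() instead.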

Reading data from different file formats (Intermediate)


In the previous recipe, we familiarized ourselves with heat map functions in R using built-in data. Now, we will learn how to use external data to create our heat maps.

In real life, we often have no control over the format of the data files that we download, or the data that is the output of a particular program. Because we cannot always count on the data arriving in the right format, we will learn in this recipe how to read in data from different file formats and get it into shape.

The following image shows a heat map that we are going to create from the gene_expression.txt data set in this recipe:

Getting ready

Download the script 5644OS_02_01.r and the data sets gene_expression.txt, runners.csv, and apple_stocks.xlsx from your account at http://www.packtpub.com and save them to your hard drive.

I recommend that you download and save the script and data files to the same folder on your hard drive. If you execute the script from a different location to the location of the data files, you have to change the current R working directory accordingly.

Tip

You can change the current working directory of your R session by using the setwd() function. If you saved the data files for this recipe under /home/user/Downloads, for example, you would type setwd("/home/user/Downloads") into the R command-line. Alternatively, you can uncomment the fourth line of the script and provide the location of the data file directory to the setwd() function in a similar way.

For more information on how to view the current working directory of your current R session and an explanation on how to run scripts in R, please read the Getting ready section of the Creating your first heat map in R recipe.

How to do it...

After you have executed the following code from the 5644OS_02_01.r script, take a look at the PDF file readingData.pdf, which was created in the same location where you executed the script:

# if you are running the script from a different location
# than the location of the data sets, uncomment the
# next line and point setwd() to the data set location
# setwd("/home/username/Datasets")

### loading packages
if (!require("gplots")) {
  install.packages("gplots", dependencies = TRUE)
  library(gplots)
}
if (!require("lattice")) {
  install.packages("lattice", dependencies = TRUE)
  library(lattice)
}
if (!require("xlsx")) {
  install.packages("xlsx", dependencies = TRUE)
  library(xlsx)
}

pdf("readingData.pdf")

### loading data and drawing heat maps

# 1) gene_expression.txt
gene_data <- read.table("gene_expression.txt",
  comment.char = "/", 
  blank.lines.skip = TRUE,
  header = TRUE,  
  sep = "\t", 
  nrows = 20)
gene_data <- data.matrix(gene_data)
gene_ratio <- outer(gene_data[,"Treatment"],
  gene_data[,"Control"], 
  FUN = "/")
heatmap.2(gene_ratio, 
  xlab = "Control", 
  ylab = "Treatment", 
  trace = "none",
  main = "gene_expression.txt")

# 2) runners.csv
runner_data <- read.csv("runners.csv")
rownames(runner_data) <- runner_data[,1]
runner_data <- data.matrix(runner_data[,2:ncol(runner_data)])
colnames(runner_data) <- c(2003:2012)
runner_data[runner_data == 0.00] <- NA
heatmap.2(runner_data, 
  dendrogram = "none", 
  Colv = NA,
  Rowv = NA,
  trace = "none", 
  na.color = "gray",
  main = "runners.csv",
  margin = c(8,10))

# 3) apple_stocks.xlsx
stocks_table <- read.xlsx("apple_stocks.xlsx", 
  sheetIndex = 1, 
  rowIndex = c(1:28), 
  colIndex = c(1:5,7))
row_names <- (stocks_table[,1])
stocks_matrix <- data.matrix(
  stocks_table[2:ncol(stocks_table)])
rownames(stocks_matrix) <- as.character(row_names)
stocks_data = t(stocks_matrix)
print(levelplot(stocks_data, 
  col.regions = heat.colors, 
  margin = c(10,10),  
  scales = list(x = list(rot = 90)), 
  main = "apple_stocks.xlsx",
  ylab = NULL,
  xlab = NULL))

dev.off()

How it works...

Generally, R is able to read data from any file that contains data in a proper text format. First, we will see how to use the universal read.table() function to read data from a .txt file. Next, the recipe shows us how to read data from a Comma Separated Values (CSV) file using the read.csv() function. And finally, we take a look at the read.xlsx() function from the xlsx package, in order to process Microsoft Excel spreadsheet files.

  1. Inspecting gene_expression.txt: The first file that we read into R, gene_expression.txt, contains example gene expression data obtained under two different conditions: control and treatment. The individual values in this data file represent fold-differences of gene expression relative to a housekeeping gene that was used as a reference to normalize the data.

    The data was saved as a .txt file and consists of 105 lines. The first two lines are comments about the data and begin with a slash (/) as the first character. After two blank lines, a header labels the two data columns of the expression data under the two conditions.

    Notice that the data columns are separated by tabs, as shown in the following screenshot of the gene_expression.txt data set:

  2. Reading data from gene_expression.txt: Let's take a look at the arguments we need to provide in the read.table() function in order to read this data file as a data table into R:

    gene_data <- read.table("gene_expression.txt",
      comment.char = "/", 
      blank.lines.skip = TRUE,
      header = TRUE,  
      sep = "\t", 
      nrows = 20)

    When we use the read.table() function, R ignores every line in the data file that starts with a hash mark (#), which is the default comment character. However, the first character of the comment lines in our data file is a forward slash (/). Therefore, we have to pass it to the function as an additional argument for the comment.char parameter so R can interpret the first two lines in gene_expression.txt as comments and skip them. The second argument, blank.lines.skip = TRUE, ensures that R also skips the two blank lines after the comment section.

    Tip

    Alternatively, we could also use the argument skip = 4 to force R to ignore the first four lines in this text file.

    From here on, R will read every line that follows and will interpret it as data—skipping the comment and blank lines at the beginning.

    There are two data columns in this text file, which are labeled as Control and Treatment. By setting the header parameter to TRUE, R will store these labels as a header for our data table. The header is followed by 100 rows of tab-delimited data (Gene1 to Gene100). We need to use the argument sep = "\t" to set the field separator character to a tab, since the default field separator, sep = "", is any white space.

    The data file is quite large with its 100 entries, and for this example, we just use the first 20 genes to create our heat map. We can tell R to stop reading data after the 20th entry by providing the argument nrows = 20 in the function call.

  3. Converting the data table into a numeric matrix: As we remember from the Creating your first heat map in R recipe, we need to convert our data into a numeric matrix format before we can use it to create a heat map. Instead of using the relative expression measures provided in the data file, we are interested in showing the fold-changes of the gene expression levels. To calculate those fold-changes of the Treatment column in relation to the Control column, we use the outer() function. This function allows us to create a 20 x 20 matrix by dividing each value from the Treatment column (third column in the data file) by each value from the Control column (second column in the data file):

    gene_data <- data.matrix(gene_data)
    gene_ratio <- outer(gene_data[,"Treatment"],
      gene_data[,"Control"], 
      FUN = "/")
  4. Reading data from runners.csv: The runners.csv file contains the fastest personal times of seven popular 100-meter sprinters for the years between 2003 and 2012.

    If we want to read a data file with comma-separated values, it is most convenient to use the read.csv() function:

    runner_data <- read.csv("runners.csv")

    Note

    Basically, the read.csv() function is a derivative of read.table() with the following default parameters:

    runner_data <- read.table("runners.csv",
      header = TRUE,
      sep = ",",…)
  5. Dealing with missing values in runners.csv: The default constant for missing values in R is NA, which stands for Not Available. Therefore, if we read data from a file that contains empty fields, missing values will be replaced by NA when R creates the data table.

    When we read in runners.csv, we see that our data table contains many 0.00 values, which means that no time was recorded for the runner in those years. However, it would make more sense to have those values denoted as missing data (NA) in our heat map.

    So, if we want R to interpret values other than NA or empty fields as missing values, we provide an argument for the na.strings parameter in our read.table() or read.csv() function. The value is given as a string that matches the field exactly as it is written in the file:

    runner_data <- read.csv("runners.csv", na.strings = "0.00")

    Note

    We can also convert particular data in the table to missing values after we have read it from a file. In our case, we could use the following command to convert all 0.00 data points to missing values:

    runner_data[runner_data == 0.00] <- NA
    

    By default, cells with NA values are left blank and appear as white cells in our heat map. This can be very misleading, since the color palette that we are using converges into a very bright yellow, so it would be very hard to distinguish those empty cells from values at the high end of our color key.

    To avoid this confusion, we can simply assign a different color to those cells that contain missing values. Here, we are coloring them in gray:

    heatmap.2(runner_data, 
      dendrogram = "none", 
      Colv = NA,
      Rowv = NA,
      trace = "none", 
      na.color = "gray",
      main = "runners.csv",
      margin = c(8,10))

    Note

    You can also remove all incomplete data rows, that is, rows that contain at least one missing value, by calling the na.omit() function on a data table as follows:

    runner_data <- na.omit(runner_data)
  6. Reading Apple's stock data from an Excel spreadsheet file: Now, let us take a look at Apple's daily stock market data from 1984 to 2013.

    The following screenshot shows the apple_stocks.xlsx data set opened in Microsoft Excel 2011:

    R has no built-in function to read data from this proprietary format. Fortunately, the xlsx package, which is freely available on CRAN, was developed for exactly this case. Therefore, we can use the read.xlsx() function to conveniently get the data from apple_stocks.xlsx into R.

    stocks_table <- read.xlsx("apple_stocks.xlsx", 
      sheetIndex = 1, 
      rowIndex = c(1:28), 
      colIndex = c(1:5,7))

    With sheetIndex, we select which sheet we want to read from the Excel spreadsheet file; apple_stocks.xlsx only has data on sheet 1. The spreadsheet consists of 7168 rows of data, but we are interested only in the recent stock data from 2013. So, to read in only those first 28 rows, we included the rowIndex = c(1:28) argument in the read.xlsx() function call above. Furthermore, we are not interested in the Volume column (column 6), so we skip it via the argument colIndex = c(1:5,7).

    Note

    You can also use the read.xlsx() function to read data from the old Excel spreadsheet format .xls.

    In most cases, we will use the read.table() function to read our data into R. The following table summarizes the most important options:
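
Independently of the summary table, the read.table() arguments from step 2 and the outer() call from step 3 can be tried without the downloaded files. The following is a minimal sketch: the file content is made up by us to mimic the layout described for gene_expression.txt (comment lines starting with a slash, blank lines, a two-label header, tab-separated rows), and it is written to a temporary file.

```r
# Made-up stand-in for gene_expression.txt: two comment lines starting
# with "/", two blank lines, a two-label header, then tab-separated rows.
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "/ hypothetical expression data",
  "/ values are for illustration only",
  "",
  "",
  "Control\tTreatment",
  "Gene1\t1.0\t2.0",
  "Gene2\t2.0\t1.0",
  "Gene3\t4.0\t8.0",
  "Gene4\t0.5\t0.5"),
  demo_file)

gene_data <- read.table(demo_file,
  comment.char = "/",
  blank.lines.skip = TRUE,
  header = TRUE,
  sep = "\t",
  nrows = 3)          # stop after the third data row

# The header has one label fewer than the data rows have fields,
# so read.table() uses the first field as row names.
gene_data <- data.matrix(gene_data)

# Every Treatment value divided by every Control value: a 3 x 3 matrix.
gene_ratio <- outer(gene_data[, "Treatment"],
  gene_data[, "Control"],
  FUN = "/")
print(gene_ratio)
```

Because outer() pairs every Treatment value with every Control value, the result always has as many rows and columns as there are genes read, which is why nrows = 20 in the recipe leads to a 20 x 20 matrix.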

There's more...

It might happen that we obtain our data in the so-called long format, which contains multiple rows for each individual category, individual or item.

To give you a better understanding, an excerpt from the runners.csv data set in long format is shown as follows:

  Year       Runner  Time
1 2007   Usain_Bolt 10.03
2 2008   Usain_Bolt  9.72
3 2009   Usain_Bolt  9.58
4 2010   Usain_Bolt  9.82
5 2011   Usain_Bolt  9.76
6 2012   Usain_Bolt  9.63
7 2004 Asafa_Powell 10.02
8 2005 Asafa_Powell  9.87
...

Do you see the problem here? The Runner column in the middle of the data table consists of character strings, which are incompatible with the numeric matrix format that is required by our heat map functions.

Fortunately, it is quite easy to convert data from long to wide format—we can simply use the cross tabulation function xtabs():

runners_wide <- xtabs(formula = Time ~ Runner + Year, 
  data = runners_long)

              Year
Runner          2004  2005  2007  2008  2009  2010  2011  2012
  Asafa_Powell 10.02  9.87  0.00  0.00  0.00  0.00  0.00  0.00
  Usain_Bolt    0.00  0.00 10.03  9.72  9.58  9.82  9.76  9.63
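
The conversion can be reproduced from the excerpt shown above. This is a sketch under assumptions: the data frame runners_long is rebuilt by hand from those eight rows, the name runner_matrix is ours, and the last step treats the zeros that xtabs() fills in for absent runner and year combinations as missing values, in line with the handling of 0.00 earlier in this recipe.

```r
# Rebuild the long-format excerpt shown above as a data frame.
runners_long <- data.frame(
  Year   = c(2007, 2008, 2009, 2010, 2011, 2012, 2004, 2005),
  Runner = c(rep("Usain_Bolt", 6), rep("Asafa_Powell", 2)),
  Time   = c(10.03, 9.72, 9.58, 9.82, 9.76, 9.63, 10.02, 9.87))

# Cross-tabulate: one row per runner, one column per year.
runners_wide <- xtabs(Time ~ Runner + Year, data = runners_long)

# Turn the table into a plain numeric matrix for the heat map functions.
runner_matrix <- matrix(as.numeric(runners_wide),
  nrow = nrow(runners_wide),
  dimnames = dimnames(runners_wide))

# Runner/year pairs without a record were filled with 0; mark them as missing.
runner_matrix[runner_matrix == 0] <- NA
print(runner_matrix)
```

Note that xtabs() sums the Time values per cell, so this approach assumes at most one time per runner and year.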

Tip

If you just want to transpose your data, that is, switch columns and rows, you can use the t() function:

transposed_data <- t(my_data)

More information on decimal marks

Several countries use a decimal comma instead of a decimal point. Chances are high that you want to analyze a data set that comes from one of those countries. In this case you just have to provide an additional argument for the dec parameter:

my_data <- read.table("data.txt", dec=",")
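
A quick way to check the effect of dec is to read a small piece of text instead of a file. In this sketch, the column names and values are made up, and the text argument of read.table() replaces the file name.

```r
# Hypothetical tab-separated input that uses decimal commas.
raw_text <- "Sample\tValue\nA\t1,5\nB\t2,25"

my_data <- read.table(text = raw_text,
  header = TRUE,
  sep = "\t",
  dec = ",")

str(my_data)   # Value is parsed as numeric, not as character
```

For semicolon-separated files with decimal commas, base R also offers read.csv2(), which uses these settings by default.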

Customizing heat maps (Intermediate)


There is always room for improvement. Now that we have seen how to create impressive heat maps from various data file types, it is time to add some extraordinary style.

To ensure that our heat maps look good in any situation, we will make use of different color palettes in this recipe, and we will even learn how to create our own.

Further, we will add some more extras to our heat maps including visual aids such as cell note labels, which will make them even more useful and accessible as a tool for visual data analysis.

The following image shows a heat map with cell notes and an alternative color palette created from the arabidopsis_genes.csv data set:

Getting ready

Download the 5644OS_03_01.r script and the arabidopsis_genes.csv data set from your account at http://www.packtpub.com and save them to your hard drive.

I recommend that you save the script and data file to the same folder on your hard drive. If you execute the script from a different location to the data file, you will have to change the current R working directory accordingly.

Please read the Getting ready section of the Reading data from different file formats recipe for more information on how to change the working directory of your current R session.

For more information on how to view the current working directory of your current R session and an explanation on how to run scripts in R, please read the Getting ready section of the Creating your first heat map in R recipe.

The script will check automatically if any additional packages need to be installed in R. You can find more information about the installation of packages in the Getting ready section of the Creating your first heat map in R recipe.

How to do it...

Execute the following code in R via the 5644OS_03_01.r script and take a look at the PDF file custom_heatmaps.pdf that will be created in the current working directory:

### loading packages
if (!require("gplots")) {
  install.packages("gplots", dependencies = TRUE)
  library(gplots)
}
if (!require("RColorBrewer")) {
  install.packages("RColorBrewer", dependencies = TRUE)
  library(RColorBrewer)
}

### reading in data
gene_data <- read.csv("arabidopsis_genes.csv")
row_names <- gene_data[,1]
gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
rownames(gene_data) <- row_names
### setting heatmap.2() default parameters
heat2 <- function(...) heatmap.2(gene_data, 
  tracecol = "black",
  dendrogram = "column", 
  Rowv = NA, 
  trace = "none", 
  margins = c(8,10), 
  density.info = "density", ...)

pdf("custom_heatmaps.pdf")

### 1) customizing colors
# 1.1) in-built color palettes
heat2(col = terrain.colors(n = 1000), 
  main = "1.1) Terrain Colors") 

# 1.2) RColorBrewer palettes
heat2(col = brewer.pal(n = 9, "YlOrRd"),
  main = "1.2) Brewer Palette")

# 1.3) creating own color palettes
my_colors <- c(y1 = "#F7F7D0", 
  y2 = "#FCFC3A", 
  y3 = "#D4D40D", 
  b1 = "#40EDEA",
  b2 = "#18B3F0", 
  b3 = "#186BF0", 
  r1 = "#FA8E8E",
  r2 = "#F26666", 
  r3 = "#C70404")
heat2(col = my_colors, 
  main = "1.3) Own Color Palette")
my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
heat2(col = my_palette, main = "1.3) ColorRampPalette")

# 1.4) gray scale
heat2(col = gray(level = (0:100)/100),
  main ="1.4) Gray Scale")

### 2) adding cell notes
fold_change <- 2^gene_data
rounded_fold_changes <- round(fold_change, 2)
heat2(cellnote = rounded_fold_changes, 
  notecex = 0.5, 
  notecol = "black", 
  col = my_palette,
  main = "2) Cell Notes")

### 3) adding column side colors
heat2(ColSideColors = c("red", "gray", "red",
rep("green",13)),
  main = "3) ColSideColors")

dev.off()

How it works...

Primarily, we will use the functions already familiar from the previous recipes, read.csv() and heatmap.2(), to read data into R and construct our heat maps. In this recipe, however, we will focus on advanced features to enhance our heat maps, such as customizing colors and other visual elements:

  1. Inspecting the arabidopsis_genes.csv data set: The arabidopsis_genes.csv file contains a compilation of gene expression data from the model plant Arabidopsis thaliana. I obtained the freely available data of 16 different genes as log 2 ratios of target and reference gene from the Arabidopsis eFP Browser (http://bar.utoronto.ca/efp_arabidopsis/). For each gene, expression data of 47 different areas of the plant is available in this data file.

  2. Reading the data and converting it into a numeric matrix: As we already know from the previous recipe, we have to convert the data table into a numeric matrix first before we can construct our heat maps:

    gene_data <- read.csv("arabidopsis_genes.csv")
    row_names <- gene_data[,1]
    gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
    rownames(gene_data) <- row_names 
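
The conversion step can be tried without the CSV file. The following is a minimal sketch that builds a small table in memory and applies the same four lines to it. The area names, gene names, and values are made up for illustration and are not taken from arabidopsis_genes.csv:

```r
# Hypothetical miniature stand-in for arabidopsis_genes.csv.
csv_text <- "area,geneA,geneB
root,0.5,-1.2
leaf,1.0,0.3
seed,-0.7,2.1"

toy_data <- read.csv(text = csv_text)

# Keep the first column as labels, convert the rest to a numeric matrix.
toy_names <- toy_data[, 1]
toy_matrix <- data.matrix(toy_data[, 2:ncol(toy_data)])
rownames(toy_matrix) <- toy_names

# Inspect the result: a numeric matrix with row and column names.
str(toy_matrix)
```

The label column has to be removed before the conversion; otherwise data.matrix() would turn the text labels into meaningless numbers or missing values.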
  3. Creating a customized heatmap.2() function: To reduce typing efforts, we are defining our own version of the heatmap.2() function now, where we will include some arguments that we are planning to keep using throughout this recipe:

    heat2 <- function(...) heatmap.2(gene_data, 
      tracecol = "black",
      dendrogram = "column", 
      Rowv = NA, 
      trace = "none", 
      margins = c(8,10), 
      density.info = "density", ...)

    So, each time we call our newly defined heat2() function, it will behave similarly to the heatmap.2() function, with our preset arguments applied in addition to any arguments that we pass along. We also set the tracecol parameter to black to better distinguish the density plot in the color key from the background.
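
The forwarding mechanism behind heat2() is the ... placeholder. As a small sketch of the same pattern, the hypothetical helper paste_dash() below (not part of the recipe script) fixes one argument of the base function paste() and forwards everything else:

```r
# Hypothetical helper: fixes 'sep' and forwards everything else to paste().
paste_dash <- function(...) paste(..., sep = "-")

paste_dash("heat", "map")        # same as paste("heat", "map", sep = "-")
paste_dash(c("a", "b"), 1:2)

# Passing an argument that the wrapper already fixes is an error,
# because paste() would then receive 'sep' twice.
result <- tryCatch(paste_dash("heat", "map", sep = "+"),
                   error = function(e) "error: sep supplied twice")
result
```

The same restriction applies to heat2(): arguments that are already preset inside the wrapper, such as trace or dendrogram, cannot be passed again in the call.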

  4. The built-in color palettes: In the previous recipes, we used the default color palette of heatmap.2(), heat.colors, which represents a color transition from red to yellow. There are four more color palettes available in base R that we could use instead of the heat.colors palette: rainbow, terrain.colors, topo.colors, and cm.colors.

    So let us make use of the terrain.colors color palette now, which will give us a nice color transition from green through yellow to rose:

    heat2(col = terrain.colors(n = 1000), 
      main = "1.1) Terrain Colors")

    If you recall the heat maps that we created in the previous two recipes, you might have noticed that the transition between the colors in the color key was not really smooth, but rather abrupt. The five color palettes mentioned previously allow us to define the number of different color shades that we can use. Therefore every number for the parameter n that is larger than the default value 12 will add additional colors, which will make the transition smoother. A value of 1000 for the n parameter should be more than sufficient to make the transition between the individual colors indistinguishable to the human eye.

    Note

    You can find more information about the positioning of the color key in the There's more... section of the Creating your first heat map in R recipe.

    The following image shows a side-by-side comparison of the heat.colors and terrain.colors color palettes using a different number of color shades:

    Tip

    Further, it is also possible to reverse the direction of the color transition. For example, if we want the heat.colors transition to run from yellow to red instead of from red to yellow in our heat map, we can simply define a reverse function:

    rev_heat.colors <- function(x) rev(heat.colors(x))
    heat2(col = rev_heat.colors(500))
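
To see what the n parameter and the reversal do without drawing a full heat map, we can look at the palettes directly. This is only a sketch using base R; the output file is a temporary PDF chosen for illustration:

```r
# Each built-in palette function returns n colors as hex strings.
few_shades  <- heat.colors(12)
many_shades <- heat.colors(1000)
length(few_shades)
length(many_shades)

# Reversing a palette only flips the order of the same colors.
rev_heat.colors <- function(x) rev(heat.colors(x))
identical(rev_heat.colors(12), rev(few_shades))

# Quick visual check of both directions, written to a temporary PDF.
palette_file <- tempfile(fileext = ".pdf")
pdf(palette_file, width = 6, height = 4)
par(mfrow = c(2, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, 12), col = few_shades, border = NA, space = 0,
        axes = FALSE, main = "heat.colors(12)")
barplot(rep(1, 12), col = rev_heat.colors(12), border = NA, space = 0,
        axes = FALSE, main = "rev_heat.colors(12)")
dev.off()
```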
  5. RColorBrewer palettes: A lot of color palettes are available from the RColorBrewer package. To see what they look like, you can type display.brewer.all() into the R command line after loading the RColorBrewer package. However, in contrast to the dynamic range color palettes that we have seen previously, the RColorBrewer palettes have a fixed number of different colors. So to select all nine colors from the YlOrRd palette, a gradient from yellow to red, we use the following command:

    heat2(col = brewer.pal(n = 9, "YlOrRd"),
      main = "1.2) Brewer Palette")
    

    The following image gives you a good overview of all the different color palettes that are available from the RColorBrewer package:

  6. Creating our own color palettes: Next, we will see how we can create our own color palettes. A whole bunch of different colors are already defined in R. An overview of those colors can be seen by typing colors() into the command line of R.

    The most convenient way to assign new colors to a color palette is using hex colors (hexadecimal colors). Many different online tools are freely available that allow us to obtain the necessary hex codes. A great example is color picker (http://www.colorpicker.com), which allows us to choose from a rich color table and provides us with the corresponding hex codes.

    Once we gather all the hexadecimal codes for the colors that we want to use for our color palette, we can assign them to a variable as we have done before with the explicit color names:

    my_colors <- c(y1 = "#F7F7D0", 
      y2 = "#FCFC3A", 
      y3 = "#D4D40D", 
      b1 = "#40EDEA",
      b2 = "#18B3F0", 
      b3 = "#186BF0", 
      r1 = "#FA8E8E",
      r2 = "#F26666", 
      r3 = "#C70404")
    heat2(col = my_colors, 
      main = "1.3) Own Color Palette")

    This is a very handy approach for creating a color key with very distinct colors. However, the downside of this method is that we have to provide a lot of different colors if we want to create a smooth color gradient; recall that we used 1000 different colors for the terrain.colors() palette to get a smooth transition in the color key!

  7. Using colorRampPalette for smoother color gradients: A convenient approach to create a smoother color gradient is to use the colorRampPalette() function, so we don't have to insert all the different colors manually. The function takes a vector of different colors as an argument. Here, we provide three colors: blue for the lower end of the color key, yellow for the middle range, and red for the higher end. As we did for the built-in color palettes, such as heat.colors, we assign the value 1000 to the n parameter:

    my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
    heat2(col = my_palette, main = "1.3) ColorRampPalette")

    Note

    In this case, it is more convenient to use discrete color names over hex colors, since we are using the colorRampPalette() function to create a gradient and do not need all the different shades of a particular color.
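
It may help to see that colorRampPalette() does not return colors directly but a function that generates them. The following sketch uses only base R; the names blue_yellow_red, five_shades, and own_ramp are made up for this illustration, and the hex codes are borrowed from my_colors:

```r
# colorRampPalette() returns a function; calling it with n yields n colors.
blue_yellow_red <- colorRampPalette(c("blue", "yellow", "red"))
five_shades <- blue_yellow_red(5)
five_shades

# The anchor colors sit at both ends and, for an odd n, exactly in the middle.
five_shades[c(1, 3, 5)]

# Hex codes work as anchors in the same way.
own_ramp <- colorRampPalette(c("#F7F7D0", "#186BF0", "#C70404"))(1000)
length(own_ramp)
```

The shades between the anchors are interpolated, which is why three anchor colors are enough to obtain a smooth color key.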

  8. Grayscales: It might happen that the medium or device that we use to display our heat maps does not support colors. Under these circumstances, we can use the gray palette to create a heat map that is optimized for those conditions.

    The level parameter of the gray() function takes a vector with values between 0 and 1 as an argument, where 0 represents black and 1 represents white. For a smooth gradient, we use a vector with 101 equally spaced shades of gray ranging from 0 to 1:

    heat2(col = gray(level = (0:100)/100),
      main ="1.4) Gray Scale")

    Tip

    We can make use of the same color palettes for the levelplot() function too. It works in a similar way as it did for the heatmap.2() function that we are using in this recipe. However, inside the levelplot() function call, we must use col.regions instead of the simple col, so that we can include a color palette argument.
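
As a brief sketch of both points, the code below inspects the gray palette and then passes it to levelplot() through col.regions. It assumes that the lattice package is installed (it normally ships with R) and uses the built-in volcano matrix merely as convenient example data:

```r
shades <- gray(level = (0:100)/100)
length(shades)            # 0:100 has 101 elements
shades[c(1, 101)]         # black first, white last

# levelplot() takes the palette through col.regions rather than col.
if (requireNamespace("lattice", quietly = TRUE)) {
  gray_plot <- lattice::levelplot(volcano, col.regions = shades)
  class(gray_plot)
}
```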

  9. Adding cell notes to our heat map: Sometimes, we want to show a data set along with our heat map. A neat way is to use so-called cell notes to display data values inside the individual heat map cells. The underlying data matrix for the cell notes does not necessarily have to be the same numeric matrix we used to construct our heat map, as long as it has the same number of rows and columns.

    As we recall, the data we read from arabidopsis_genes.csv represents log 2 ratios of sample and reference gene expression levels. Let us calculate the fold changes of the gene expression levels now and display them, rounded to two digits after the decimal point, as cell notes on our heat map:

    fold_change <- 2^gene_data
    rounded_fold_changes <- round(fold_change, 2)
    heat2(cellnote = rounded_fold_changes, 
      notecex = 0.5, 
      notecol = "black", 
      col = rev_heat.colors,
      main = "Cell Notes")

    The notecex parameter controls the size of the cell notes. Its default size is 1, and every argument between 0 and 1 will make the font smaller, whereas values larger than 1 will make the font larger. Here, we decreased the font size of the cell notes by 50 percent to fit it into the cell boundaries. Also, we want to display the cell notes in black to have a nice contrast to the colored background; this is controlled by the notecol parameter.
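
The arithmetic behind the cell notes is easy to check on a small matrix. The values below are hypothetical log 2 ratios, not data from arabidopsis_genes.csv:

```r
# Hypothetical log2 ratios for two genes in three plant areas.
log2_ratios <- matrix(c(1, -1, 0, 2.5, 0.3, -2),
                      nrow = 3,
                      dimnames = list(c("root", "leaf", "seed"),
                                      c("geneA", "geneB")))

# A log2 ratio of 1 is a 2-fold increase, -1 a halving, 0 no change.
fold_change <- 2^log2_ratios
rounded_fold_changes <- round(fold_change, 2)
rounded_fold_changes

# cellnote needs the same shape as the matrix that is being plotted.
identical(dim(rounded_fold_changes), dim(log2_ratios))
```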

  10. Row and column side colors: Another approach to emphasize certain regions, that is, rows or columns of the heat map, is to make use of row and column side colors. The ColSideColors argument will place a colored box between the dendrogram and heat map that can be used to annotate certain columns. We pass a vector of colors to ColSideColors, and its length must be equal to the number of columns of the heat map. Here, we want to color the first and third column red, the second one gray, and all the remaining 13 columns green:

    heat2(ColSideColors = c("red", "gray", "red", rep("green", 13)),
      main = "ColSideColors")

    You can see in the following image what the column side colors look like when we include the ColSideColors argument as shown previously:

    Attentive readers may have noticed that the order of colors in the column color box slightly differs from the order of colors we passed as a vector to ColSideColors. We see red two times next to each other, followed by a green and a gray box. This is due to the fact that the columns of our heat map have been reordered by the hierarchical clustering algorithm.
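
The reordering can be reproduced on a small scale with base R alone. The sketch below uses a random, hypothetical matrix and approximates what heatmap.2() does with its default settings (Euclidean distance, complete linkage, dendrogram reordered by column means); treat it as an illustration of the principle rather than an exact replica of the function's internals:

```r
set.seed(1)
# Hypothetical matrix: 5 samples (rows) by 4 genes (columns).
toy <- matrix(rnorm(20), nrow = 5,
              dimnames = list(NULL, c("g1", "g2", "g3", "g4")))

# One color per column, in the column order of the input matrix.
side_colors <- c("red", "gray", "red", "green")
stopifnot(length(side_colors) == ncol(toy))

# Cluster the columns and read off the left-to-right leaf order.
col_dendrogram <- reorder(as.dendrogram(hclust(dist(t(toy)))), colMeans(toy))
col_order <- order.dendrogram(col_dendrogram)

# Each color travels with its column to the new position.
data.frame(position = seq_along(col_order),
           column = colnames(toy)[col_order],
           color = side_colors[col_order])
```

The colors are therefore always specified in the original column order, no matter where the clustering places the columns in the final plot.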

Drawing choropleth maps and contour plots (Intermediate)


We have constructed many heat maps so far, including popular applications like gene expression and stock analyses. There are many more fields and disciplines where heat maps are an invaluable tool for intuitive data analyses and representations. In this recipe, we will learn how to construct the closely related choropleth maps and contour plots, which are very useful to visualize geographic data.

Although choropleth maps certainly get a huge boost in popularity during election years, there are many more widely used applications, such as population census, disease factors by regions, or income comparisons. Basically, choropleth maps are the representation of choice whenever you want to show and compare statistics between geographic regions on a cartographic map. Those geographic regions are usually separated by county, state, or even country borders.

The following choropleth map shows the annual average temperatures of the US from 1971-2001:

While contour plots are represented less in the media, they are extensively used by engineers and scientists. Common examples are the 2D projections of 3D surfaces, such as geographic terrains. However, contour plots have many more applications and can be used to plot gradients, depths, thicknesses, and other distributions as 2D projections.

Imagine a mountain from a bird's-eye perspective, where the contour lines delineate the different levels of altitude. The regions that are enclosed by the contour lines can be filled with colors or color gradients to better distinguish the different heights.

In the second part of this recipe, we will apply this concept by creating a contour plot of a volcano.

Getting ready

Download the 5644OS_04_01.r script and the data files usa_temp.csv and south_am_pop.csv from your account at http://www.packtpub.com.

I recommend that you download and save the script and data file to the same folder on your hard drive. If you execute the script from a different location to the data file, you have to change the current R working directory accordingly.

Please read the Getting ready section of the Reading data from different file formats recipe for more information on how to change the working directory of your current R session.

For more information on how to view the current working directory of your current R session and an explanation on how to run scripts in R, please read the Getting ready section of the Creating your first heat map in R recipe.

The script will check automatically if any additional packages need to be installed in R. You can find more information about the installation of packages in the Getting ready section of the Creating your first heat map in R recipe.

How to do it...

Execute the following code in R by running the 5644OS_04_01.r script and take a look at the PDF files chloropleth_maps.pdf and contour_plot.pdf that will be created in the current working directory:

### loading packages

if (!require("ggplot2")) {
install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
}
if (!require("maps")) {
install.packages("maps", dependencies = TRUE)
library(maps)
}
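if (!require("mapdata")) {
install.packages("mapdata", dependencies = TRUE)
library(mapdata)
}
# fields provides image.plot(), which is used for the contour plot in part 3
if (!require("fields")) {
install.packages("fields", dependencies = TRUE)
library(fields)
}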

pdf("chloropleth_maps.pdf", height = 7,width = 10)

### 1) average temperature USA

# 1.1) reading in and processing data
usa_map <- map_data(map = "state")

usa_temp <- read.csv("usa_temp.csv", comment.char = "#")

usa_data <- merge(usa_temp, usa_map, 
by.x ="state", by.y =  "region") # case sensitive

usa_sorted <- usa_data[order(usa_data$order),]

# 1.2) plotting USA choropleth maps
usa_map1 <- ggplot(data = usa_sorted) + 
  geom_polygon(aes(x = long, y = lat, 
group = group, fill = fahrenheit)) + 
  ggtitle("USA Map 1")
print(usa_map1)

usa_map2 <- usa_map1 + coord_map("polyconic") + 
  ggtitle("USA Map 2 - polyconic")
print(usa_map2)

usa_map3 <- usa_map2 + 
  geom_path(aes(x = long, y = lat, group = group), 
color = "black") + 
  ggtitle("USA Map 3 - black contours")
print(usa_map3)

usa_map4 <- usa_map3 + 
  scale_fill_gradient(low = "yellow", high = "red") + 
  ggtitle("USA Map 4 - gradient 1")
print(usa_map4)

usa_map5 <- usa_map3 + 
scale_fill_gradient2(low = "steelblue", mid = "yellow",
  high = "red",  midpoint = colMeans(usa_sorted["fahrenheit"])) + 
  ggtitle("USA Map 5 - gradient 2")
print(usa_map5)


### 2) South American population count

# 2.1) reading in and processing data
south_am_map <- map_data("worldHires", 
region = c("Argentina", "Bolivia", "Brazil",
  "Chile", "Colombia", "Ecuador", "Falkland Islands",
"French Guiana", "Guyana", "Paraguay", "Peru",
"Suriname", "Uruguay", "Venezuela"))

south_am_pop <- read.csv("south_am_pop.csv", 
comment.char = "#")

south_am_data <- merge(south_am_pop, south_am_map,  by.x = "country", by.y = "region")

south_am_sorted <- south_am_data[order(
south_am_data$order),]

# 2.2) creating choropleth maps
south_am_map1 <- ggplot(data = south_am_sorted) + 
  geom_polygon(aes(x = long, y = lat, 
group = group, fill = population)) +
  geom_path(aes(x = long, y = lat, group = group), 
color = "black") + 
  coord_map("polyconic") + 
scale_fill_gradient(low = "lightyellow",
  high = "red", guide = "legend")
print(south_am_map1)

south_am_map2 = south_am_map1 + 
  theme(panel.background = element_blank(),
  axis.text = element_blank(), 
  axis.title = element_blank(), 
  axis.ticks = element_blank())
print(south_am_map2)

dev.off()


### 3) Volcano contour plot

pdf("contour_plot.pdf", height = 7,width = 10)

data(volcano)
image.plot(volcano)
contour(volcano, add = TRUE)

dev.off()

How it works...

In this recipe, we are using several packages that we have not encountered before. One of them is Hadley Wickham's ggplot2 package, which is a relatively new and powerful plotting system. It has become a very popular alternative to R's basic plotting functions, because it provides both great versatility and a uniform syntax. The complete documentation can be found at http://docs.ggplot2.org/current/.

Two further packages, maps and mapdata, contain the data for the maps that we are going to use as templates for our choropleth maps. You can find a table with all the maps that are available in both maps and mapdata at the end of this section. For the contour plot at the end of this recipe, we additionally use the fields package.

The difference between those two map packages is that they contain different maps with different levels of detail.

The following image compares the levels of detail between the map of Japan extracted from the world map of maps, the high resolution world map of mapdata, and the single high resolution mapdata map of Japan.

  1. Annual average temperatures of the United States: The first data file, usa_temp.csv, contains the annual average temperatures of the USA from 1971 to 2001. The data was made available by the National Climatic Data Center of the United States, National Oceanic and Atmospheric Administration, and has been compiled by current results.

  2. Reading in map and temperature data: First, we use the read.csv() function, known from the Reading data from different file formats recipe, to read our data and assign it to the variable usa_temp. Next, we use the map_data() function from the ggplot2 package to read in the United States map from maps as a data frame.

    Note that in contrast to the heat map functions that we have been using in the previous recipes, the ggplot() function takes a data frame as input instead of data in a numerical matrix format. And as we remember from the Reading data from different file formats recipe, data that we read via read.csv() is converted into a data frame automatically.

    Note

    Matrices in R can only contain data of the same type, for example, numeric. However, data frames can contain different types of data as variables (columns), such as numeric, factor, logical, or character.
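
The difference is easy to demonstrate with base R. The temperatures and state names below are hypothetical values used only for illustration:

```r
# A matrix holds one type: mixing numbers and text turns everything into text.
mixed_matrix <- cbind(temperature = c(51.7, 62.8), state = c("ohio", "texas"))
typeof(mixed_matrix)

# A data frame keeps a separate type for each column.
mixed_frame <- data.frame(temperature = c(51.7, 62.8),
                          state = c("ohio", "texas"))
sapply(mixed_frame, class)
```

This is why map data, which combines numeric coordinates with region names, is handled as a data frame.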

  3. Merging and sorting the data frames: Now we will merge the map and temperature data:

    usa_data <- merge(usa_temp, usa_map, 
    by.x = "state", by.y = "region") 
    usa_sorted <- usa_data[order(usa_data$order),]

    The following flowchart illustrates this process and shows what the data looks like before and after the merging and conversion:

    As we can see in the preceding chart, the map data in the usa_map data frame contains multiple longitudinal (long) and latitudinal (lat) coordinates for each state in the column labeled with region. The numbers in the order column specify the order in which the single dots are connected when we draw the map as a polygon.

    After we have merged the map data and the temperature data, we can see that the values in the order column have been shuffled. This would result in quite a mess if we tried to construct the choropleth map from this data frame. Therefore, we use the order() function to restore the order that the rows had before we merged the data sets.

    Note

    The merge() function is case sensitive. We have to make sure that the character strings in the two columns that we want to merge match.
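
Both effects, the case sensitivity and the shuffled drawing order, can be reproduced with two tiny data frames. The coordinates and temperatures below are hypothetical and serve only as a sketch of the mechanism:

```r
# Hypothetical miniature map data: two regions, outline points in drawing order.
toy_map <- data.frame(region = c("ohio", "ohio", "texas", "texas", "ohio"),
                      long = c(-84, -83, -100, -99, -82),
                      lat = c(39, 40, 31, 32, 41),
                      order = 1:5)
toy_temp <- data.frame(state = c("Texas", "Ohio"), fahrenheit = c(64.8, 50.7))

# merge() matches strings exactly, so "Ohio" and "ohio" do not match.
unmatched <- nrow(merge(toy_temp, toy_map, by.x = "state", by.y = "region"))
unmatched

# Lower-casing the key first makes the two columns comparable.
toy_temp$state <- tolower(toy_temp$state)
toy_data <- merge(toy_temp, toy_map, by.x = "state", by.y = "region")
toy_data$order          # grouped by state, no longer in drawing order

# Restore the drawing order before plotting.
toy_sorted <- toy_data[order(toy_data$order), ]
toy_sorted$order
```

A merge that returns far fewer rows than the map data frame is usually the first sign of a mismatch between the two key columns.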

  4. Constructing our first choropleth map: Now that we have our data ready in a nice format, we can proceed to the main plotting functions to create our first choropleth map:

    usa_map1 <- ggplot(data = usa_sorted) + 
      geom_polygon(aes(x = long, y = lat, group = group, 
    fill = fahrenheit)) + 
      ggtitle("USA Map 1")

    Basically, how the ggplot2 package works is that we have to create our plots incrementally. First, we use the ggplot() function to assign our data to a new plot object, and then we can add further layers to it. The basic shape of ggplot2 plots is determined by so-called geoms, or geometric objects. Here, we are using geom_polygon(), which will connect our coordinate points to a map. The aes() function nested inside is there for aesthetics and defines the visual properties of the geoms. We provide the variable fahrenheit as an argument for the fill parameter so that we can shade the areas of our choropleth maps according to the Fahrenheit temperatures in our data set.

    Note

    Similar to the plotting functions of the lattice package, which we have used in the previous recipes, ggplot() creates plot graphics as objects. Therefore, we have to use the print() function on the objects to save them as image files. Note that in an interactive R session, you can omit the print() function and just need to type the name of the graphics object, for example, usa_map1, to create the map on the screen.

  5. Customizing choropleth maps: One of the advantages of ggplot2 is that we do not have to retype the whole plotting function with its arguments each time we want to modify the map, since we store our plots as objects. So if we want to convert our map into a polyconic projection, we can simply reuse our old graphic object and make a small modification:

     usa_map2 <- usa_map1 + coord_map("polyconic") + 
      ggtitle("USA Map 2 - polyconic")

    Now, let us add another geometric object so that the state borders will be drawn as black lines:

    usa_map3 <- usa_map2 + 
      geom_path(aes(x = long, y = lat, group = group), 
    color = "black") + 
      ggtitle("USA Map 3 - black contours")

    Since we are showing temperatures on our United States map, it would be more appropriate to replace the blue default color gradient by a color gradient from yellow to red. We can modify the color gradient by adding the scale_fill_gradient() function with two color arguments for our previous graphic object:

    usa_map4 <- usa_map3 + 
      scale_fill_gradient(low = "yellow", high = "red") + 
      ggtitle("USA Map 4 - gradient 1")

    Note

    Instead of discrete color names, we can also use the hex color codes that we have seen in the Customizing heat maps recipe.

    If we want to add a third color to our gradient, we can use the scale_fill_gradient2() function instead. For this function, we need an additional argument that specifies the position (midpoint) of the second color. We can simply use the mean temperature as the midpoint:

    usa_map5 <- usa_map3 + 
    scale_fill_gradient2(low = "steelblue", mid = "yellow",
    high = "red",  midpoint =   
    colMeans(usa_sorted["fahrenheit"])) + 
      ggtitle("USA Map 5 - gradient 2")
  6. Extracting regions from the world map: Now that we have seen how we can use ggplot2 to create simple choropleth maps, let us explore some advanced features.

    As you can see in the table at the end of this section, the number of maps in the maps and mapdata packages is quite limited. However, we are able to extract individual countries from the world map.

    Our second data set, south_am_pop.csv, contains recent population counts of all the countries of South America. Unfortunately, neither maps nor mapdata contains a map of this continent, so we extract all the South American countries from the high resolution world map of mapdata.

    south_am_map <- map_data("worldHires", region = c("Argentina",
    "Bolivia", "Brazil","Chile", "Colombia", "Ecuador", 
    "Falkland Islands", "French Guiana", "Guyana", 
    "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela"))

    After we read the data from south_am_pop.csv and merge and sort the data frames, we can create the choropleth map:

    south_am_map1 <- ggplot(data = south_am_sorted) + 
      geom_polygon(aes(x = long, y = lat, 
    group = group, fill = population)) +
      geom_path(aes(x = long, y = lat, group = group), 
    color = "black") + 
      coord_map("polyconic") + 
    scale_fill_gradient(low = "lightyellow",
      high = "red", guide = "legend")

    As we have seen for the USA map before, we use the ggplot() function to create a new plot object and geom_polygon() to construct the map graphic. With the color parameter in geom_path(), we add a black border around the individual countries, and with the coord_map() function, we convert our map into a polyconic projection. We also change the color gradient to run from light yellow to red and make the legend categorical by providing the argument legend for the guide parameter.

    By default, ggplot2 creates plots on a gray background with a white grid. To get a nicer layout, let us create a map on a clean, white background and remove axes tick marks, labels, and titles altogether:

    south_am_map2 = south_am_map1 + 
      theme(panel.background = element_blank(),
      axis.text = element_blank(), 
      axis.title = element_blank(), 
      axis.ticks = element_blank())

    In the following image, you can see the resulting choropleth map of South America's population count on a white background:

  7. Volcano contour plot: As mentioned in the introduction of this recipe, we want to take a look at another type of map representation, contour plots:

    data(volcano)
    image.plot(volcano)

    Using the image.plot() function from the fields package, which has to be installed and loaded via library(fields) first, we construct a color grid with the topographical data of a volcano from R's data package. We could also use the image() base function in R, but the image.plot() function has some nice additional features, such as placing a legend for the color grid to the right of the plot.

    After we created the underlying color grid, we use the contour() function to place the contour lines on top of the color grid.

    contour(volcano, add = TRUE)

    The following image shows the final contour plot that we have created from the volcano data set:
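
If the fields package is not available, a similar plot can be sketched with base graphics only. This is an alternative to the image.plot() call in the recipe script, not a reproduction of it; the temporary output file and the plot titles are chosen for illustration:

```r
contour_file <- tempfile(fileext = ".pdf")
pdf(contour_file, height = 7, width = 10)

# Color grid of the elevation matrix, then contour lines on top of it.
image(volcano, col = terrain.colors(100), axes = FALSE,
      main = "Volcano (base graphics)")
contour(volcano, add = TRUE)

# filled.contour() draws color bands together with a color key.
filled.contour(volcano, color.palette = terrain.colors,
               plot.title = title(main = "Volcano (filled.contour)"))

dev.off()
```

The first page corresponds to the two-step approach of the recipe (color grid plus contour lines), while filled.contour() on the second page combines both steps and adds a legend.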

The following table summarizes the different maps that are included in the maps and mapdata packages:

Package   Database      Description
maps      usa           Map of the United States
maps      county        Map of the US counties
maps      state         Map of the states in the US
maps      italy         Map of Italy
maps      france        Map of France
maps      nz            Map of New Zealand
maps      world         World map
maps      world2        Pacific-centric world map
mapdata   china         High resolution map of China
mapdata   japan         High resolution map of Japan
mapdata   nzHires       High resolution map of New Zealand
mapdata   worldHires    High resolution world map
mapdata   world2Hires   High resolution Pacific-centric world map

Exporting for presentation (Simple)


After the exploratory stage of our data analyses, it is likely that we want to present our graphical heat map representations to an audience. This can be via presentation software over a projector, embedded in a website, a poster printout, or an image in a journal article. In this recipe, we will learn about the different file formats for image export so that we can choose whatever is most suited for the purpose.

Getting ready

Download the 5644OS_05_01.r script from your account at http://www.packtpub.com and save it to your hard drive.

For an explanation of how to run scripts in R, please read the Getting ready section of the Creating your first heat map in R recipe.

The script will check automatically if any additional packages need to be installed in R. You can find more information about the installation of packages in the Getting ready section of the Creating your first heat map in R recipe.

How to do it...

Execute the 5644OS_05_01.r script in R and compare the different image files that are written to the current working directory with each other:

if (!require("gplots")) {
install.packages("gplots", dependencies = TRUE)
library(gplots)
}
if (!require("RColorBrewer")) {
install.packages("RColorBrewer", dependencies = TRUE)
library(RColorBrewer)
}

### converting data
data(co2)
rowcolNames <- list(as.character(1959:1997), month.abb) 
co2_data <- matrix(co2,
  ncol = 12,
  byrow = TRUE,
  dimnames = rowcolNames)

heat2 <- function(...) 
  heatmap.2(co2_data,
  trace = "none", 
  density.info = "none", 
  dendrogram = "none",
  Colv = FALSE,
  Rowv = FALSE,
  col = colorRampPalette(c("blue", "yellow", "red"))(n = 100),
  margins = c(5,8),
  lhei = c(0.25,1.25),
  ...)


png("1_PNG_default.png")
heat2(main = "PNG default")
dev.off()

png("2_PNG_highres.png", 
  width = 5*300, 
  height = 5*300, 
  res = 300, 
  pointsize = 8)
heat2(main = "PNG High Resolution")
dev.off()

jpeg("3_JPEG_highres.jpg",
  width = 5*300, 
  height = 5*300, 
  res = 300, 
  pointsize = 8)
heat2(main = "JPEG default")
dev.off()

bmp("4_BMP_default.bmp",
width = 5*300, 
  height = 5*300, 
  res = 300, 
  pointsize = 8)
heat2(main = "BMP default")
dev.off()

pdf("5_PDF_default.pdf",
  width = 5,
  height = 5,
  pointsize = 8)
heat2(main = "PDF default")
dev.off()

svg("6_SVG_default.svg",
  width = 5,
  height = 5,
  pointsize = 8)
heat2(main = "SVG default")
dev.off()

postscript("7_PostScript_default.ps",
  width = 5,
  height = 5,
  pointsize = 8)
heat2(main = "PostScript default")
dev.off()

png("8_PNG_transp.png", 
  width = 5*300, 
  height = 5*300, 
  res = 300, 
  pointsize = 8,
  bg = "transparent")
heat2(main = "PNG Transparent Background")
dev.off()

pdf("9_PDF_mono.pdf",
  family = "Courier",
  paper = "USr")
heat2(main = "PDF Monospace Font")
dev.off()

How it works...

There are two major classes of image formats that we can choose from when we want to save our plots as a file: vector graphics and raster graphics. Raster graphics, also known as bitmaps, comprise popular image formats such as PNG, BMP, and JPEG. These file formats store information about each individual pixel, thus the quality of the image heavily depends on the resolution, that is, pixels per inch (ppi). However, high-resolution images come with the additional cost of a large file size. Nowadays, we usually do not have to worry about the limited storage space of a hard drive anymore, but large images are not suitable for the Web, and in the worst case, they can also have a negative impact on the responsiveness of your presentation software.
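
The relation between physical size, resolution, and pixel count explains the width = 5*300 and res = 300 arguments in the script. The two helper functions below are hypothetical and only sketch the arithmetic; real PNG and JPEG files are compressed, so their actual sizes on disk will be smaller than this raw estimate:

```r
# Hypothetical helper: pixel dimension for a given physical size and resolution.
inches_to_pixels <- function(inches, ppi) inches * ppi

inches_to_pixels(5, 300)    # 5 inches at 300 ppi
inches_to_pixels(5, 72)     # the same 5 inches at 72 ppi

# Uncompressed size estimate for a square image with 3 bytes per pixel (RGB).
raw_bytes <- function(inches, ppi) inches_to_pixels(inches, ppi)^2 * 3

# How much more raw data the 300 ppi version holds than the 72 ppi version.
raw_bytes(5, 300) / raw_bytes(5, 72)
```

Because the pixel count grows with the square of the resolution, a moderate increase in ppi already leads to a considerably larger image.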

One of the nice features of R is the ability to save plots as vector graphics, such as SVG, PDF, and PostScript. While all three vector graphic formats offer high quality graphics, each has its own area of application. SVG can be easily embedded into HTML code and is the format of choice for displaying your plots on the Web. PostScript is the desired format when you want to send articles to a journal, whereas PDF is well known for its great compatibility with a wide range of software including PDF readers.

Note

If you are interested in seeing what the SVG and PostScript code looks like, I encourage you to open the SVG and PostScript files that were created by this script in your favorite plain text editor (for example, TextEdit on Mac OS X or Notepad on Windows).

Rather than storing information about each pixel in a grid, vector graphic files contain instructions for geometrical shapes that are used by the visualization software to render the image. This allows us to zoom in to the image without any loss of quality. Another advantage of vector images is that they generally have a much smaller file size than raster graphics. The only exception is when your plot is heavily over-plotted, so that a lot of instructions have to be saved in the vector graphic file. Generally, vector graphics are a great way to store image files on your computer; however, they can only be rendered by certain software and are converted to raster graphics for printing.

Images from SVG files, for example, can be rendered in every modern web browser, and since they are saved as Extensible Markup Language (XML) code, they can be easily embedded into HTML, which makes SVG the format of choice if you want to display your graphics on a website.

If you do not plan to embed your graphic in a website or to send your graphics off to a journal for professional publication, I recommend using the PDF format, since it has great compatibility with other software and offers the best quality output at reasonably small file sizes.

You will find an overview table of the different graphic devices that are available in R at the end of this section.

Now let us dive in and create different image files from our heat maps:

  1. Reading data: For this recipe, we will use the co2 data set from the datasets package in R. This co2 data set is a time series that consists of 468 monthly measurements of carbon dioxide (CO2) concentrations in parts per million (ppm) from 1959 to 1997.

    We know from the Creating your first heat map in R recipe how to convert data from a time-series format into a numeric matrix, which is the only format that is compatible with the heatmap.2() function.
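As a reminder, the conversion could look like the following minimal sketch. The layout with one row per year and one column per month is an assumption carried over from the AirPassengers example; the object name co2_data matches the name used in the heat2() function of this recipe:

```r
# co2 ships with R (datasets package): 468 monthly values from 1959 to 1997.
# Assumption: one row per year, one column per month.
co2_data <- matrix(co2,
  ncol = 12,
  byrow = TRUE,
  dimnames = list(1959:1997, month.abb))

dim(co2_data)          # 39 years x 12 months
is.numeric(co2_data)   # a plain numeric matrix, as heatmap.2() expects
```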

  2. Setting up our own heatmap.2() function: Like we did in the Customizing heat maps recipe, we create our own derivative of the heatmap.2() function with some default arguments that we want to use for all our heat maps in this recipe to avoid repetitive typing efforts:

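    # Wrapper around heatmap.2() with the defaults used in this recipe;
    # extra arguments such as main are passed on through "..."
    heat2 <- function(...) 
      heatmap.2(co2_data,
        trace = "none", 
        density.info = "none", 
        dendrogram = "none",
        Colv = FALSE,
        Rowv = FALSE,
        # colorRampPalette() returns a function; calling it with n = 100
        # yields the vector of 100 colors
        col = colorRampPalette(c("blue", "yellow", "red"))(n = 100),
        margin = c(5,8),
        lhei = c(0.25,1.25),
        ...)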

    We use a new parameter lhei that we have not encountered so far. With this parameter, we can control the height of the different plot elements. The arguments we provide here will make our legend thinner.

    Before we proceed with the graphic devices, let us briefly take a look at the general paradigm of creating image files in R, which consists of three basic steps as follows:

    1. Opening a graphics device.

    2. Calling a plotting function.

    3. Closing the graphics device.

  3. Creating image files: First, let us create a PNG file using the png() function with its default parameters:

    png("1_PNG_default.png")
    heat2(main = "PNG default")
    dev.off()

    The first argument that the graphics device takes is the name of the output file. After we open the graphics device, we create the heat map using our heat2() function, and lastly, we close the graphic device with the dev.off() function.

    Note

    Very often, the main reason why we cannot open a PDF file that we created in R is that we forgot to close the PDF graphics device after we finished plotting!
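One way to guard against a forgotten dev.off() is to wrap the three steps in a small helper. This is only a sketch; the function name save_plot() and the output file name are hypothetical and not part of the book's script:

```r
# Hypothetical wrapper around the three steps: open, plot, close.
save_plot <- function(device, file, plot_fun, ...) {
  device(file, ...)            # 1. open the graphics device
  dev_id <- dev.cur()
  on.exit(dev.off(dev_id))     # 3. always close it, even if plotting fails
  plot_fun()                   # 2. call the plotting function
  invisible(file)
}

out_file <- file.path(tempdir(), "wrapper_example.pdf")
save_plot(pdf, out_file, function() plot(1:10, main = "Wrapper example"))
file.exists(out_file)
```

With the heat2() function from this recipe, a call such as save_plot(png, "1_PNG_default.png", function() heat2(main = "PNG default")) would be the equivalent of the three lines shown above.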

    Because the resolution of the PNG image is very low (72 ppi) when we use png() with its default parameters, we create another PNG file with a resolution of 300 ppi, which should provide reasonable quality for a printout on a letter-sized piece of paper. The default size of png() is measured in pixels, and here we are creating a 5 x 5 inch output by taking the number of pixels per inch and multiplying it by 5 inches. Further, we decrease the text size slightly from 12 to 8 bp (1 bp equals 1/72 of an inch) in the pointsize parameter:

    png("2_PNG_highres.png", 
      width = 5*300, 
      height = 5*300, 
      res = 300, 
      pointsize = 8)
    heat2(main = "PNG High Resolution")
    dev.off()

    Next, we create a JPEG, BMP, SVG, and PDF file with the same parameters so we can compare them to each other. Note that the height and width of the SVG and PDF files are measured in inches.
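The corresponding device calls are not reproduced in this section, so the following is only a sketch of what they might look like. The file names, the temporary output folder, and the stand-in plotting function draw_example() are assumptions; in the recipe, heat2() would take the place of the stand-in. The svg() device also assumes an R build with cairo support:

```r
# Stand-in for heat2(); in the recipe, call heat2(main = ...) instead.
draw_example <- function() image(volcano, main = "Format comparison")

out_dir <- tempdir()   # assumption: write the examples to a temporary folder
ppi     <- 300         # resolution in pixels per inch
inches  <- 5           # edge length of the square plot

# Raster devices: width and height are given in pixels.
jpeg(file.path(out_dir, "example.jpg"),
  width = inches * ppi, height = inches * ppi, res = ppi, pointsize = 8)
draw_example()
dev.off()

bmp(file.path(out_dir, "example.bmp"),
  width = inches * ppi, height = inches * ppi, res = ppi, pointsize = 8)
draw_example()
dev.off()

# Vector devices: width and height are given in inches.
svg(file.path(out_dir, "example.svg"),
  width = inches, height = inches, pointsize = 8)
draw_example()
dev.off()

pdf(file.path(out_dir, "example.pdf"),
  width = inches, height = inches, pointsize = 8)
draw_example()
dev.off()

example_files <- file.path(out_dir,
  paste0("example.", c("jpg", "bmp", "svg", "pdf")))
file.exists(example_files)
```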

    Tip

    You do not have to download special software to view the SVG file. Simply open it in your favorite web browser.

    When we compare those different image files to each other, we see that they all show the same heat map, but if we zoom in, we notice that the quality differs tremendously.

    Note

    The difference between JPEG, PNG, and BMP is that BMP stores the image file without compression, PNG uses lossless compression, and JPEG uses lossy compression.
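The effect of the different compression schemes on file size can be inspected directly. This is a minimal sketch that uses a base R image() plot of the volcano data set as a stand-in for the heat map and writes to temporary files:

```r
# Write the same plot with three raster devices and collect the file sizes.
raster_sizes <- sapply(
  list(bmp = bmp, png = png, jpeg = jpeg),
  function(device) {
    out_file <- tempfile()
    device(out_file)          # default size of 480 x 480 pixels
    image(volcano)
    dev.off()
    size <- file.size(out_file)
    unlink(out_file)          # remove the temporary file again
    size
  })

raster_sizes   # sizes in bytes; the exact numbers depend on the system
```

The uncompressed BMP file should be by far the largest of the three. How PNG and JPEG compare depends on the plot, so it is worth checking with your own heat map.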

    In the following image, you can see the tremendous difference in the image quality between the different file formats. This is something you should consider, especially if you are preparing your graphics for an on-screen presentation.

    When we view the BMP, PNG, and JPEG images, the quality seems to be quite reasonable. But as soon as we zoom in on the image, we can barely read the text of the PNG file with the 72 ppi default. In contrast to the 300 ppi BMP, PNG, and JPEG files, the PDF and SVG files do not have any jagged edges, no matter how far we zoom in.

    Tip

    Of course, we can increase the ppi value of the raster graphics even further; whether that is worthwhile depends on the purpose of the image and where it will be presented.

  4. More options: If we have the choice between the different raster graphics formats, I recommend using PNG, since the files are much smaller than BMP files due to the lossless compression. Using JPEG over PNG should only be considered if file size really matters to you.

    In contrast to JPEG files, BMP and PNG files support transparent backgrounds. This is particularly useful if we want to place the heat map on a patterned background, on a poster, or a presentation slide for example. This can be specified by providing the argument transparent for the bg (background) parameter.

    Tip

    Similarly, we could also use a color name instead of transparent to create a colored background.

    png("8_PNG_transp.png", 
      width = 5*300, 
      height = 5*300, 
      res = 300, 
      pointsize = 8,
      bg = "transparent")
    heat2(main = "PNG Transparent Background")
    dev.off()

    Note

    The background of the image files in PDF format is transparent by default.

    The PDF format has the further advantage that we can choose a different font family if we like. We can choose from AvantGarde, Bookman, Courier, Helvetica, Helvetica-Narrow, NewCenturySchoolbook, Palatino, and Times. Also, we can specify the paper format, such as a4 for DIN A4, letter for the American Standard Letter format, or a4r and USr for the respective rotated landscape formats:

    pdf("9_PDF_mono.pdf",
      family = "Courier",
      paper = "USr")
    heat2(main = "PDF Monospace Font")
    dev.off()
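To see which font families are registered for the pdf() device on your system, you can query pdfFonts() from the grDevices package. A short sketch:

```r
# pdfFonts() returns the font families registered for the pdf() device.
pdf_families <- names(pdfFonts())
pdf_families

# Check that the family used in the example above is registered.
"Courier" %in% pdf_families
```

The list usually contains more entries than the families named above, for example generic aliases such as serif, sans, and mono.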

The following table shows the different graphic devices that are available in R:

On-screen devices
    x11()           X Window System (X11), default in Unix/Linux
    quartz()        Quartz, default on Mac OS X
    windows()       Default in Microsoft Windows

Raster graphics devices
    jpeg()          Joint Photographic Experts Group (JPEG) image file with lossy compression
    png()           Portable Network Graphics (PNG) image file with lossless compression
    bmp()           Bitmap (BMP) image file with no compression
    tiff()          Tagged Image File Format (TIFF) with optional compression

Vector image devices
    pdf()           Adobe's popular Portable Document Format (PDF)
    svg()           XML-based Scalable Vector Graphics (SVG) format
    postscript()    PostScript (PS) format

Other
    xfig()          File format for Xfig vector graphics editor
    pictex()        Graphics format for LaTeX import
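Not every device is available in every R installation. As a quick check, the capabilities() function reports which optional features the current R build supports, and dev.list() shows the devices that are currently open:

```r
# capabilities() reports which optional features this R build supports.
available <- capabilities(c("jpeg", "png", "tiff", "cairo"))
available

# Devices that are currently open (NULL if none is open).
dev.list()
```

The svg() device relies on cairo, so a FALSE entry for cairo is a hint that SVG output may not work on that system.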

Bringing your data to life (Advanced)


We learned how to create heat maps, customize them, and save them as image files. Now, it is time to go a step further and add some interactivity for displaying them on the Web. In this recipe, we will learn how to manipulate heat map-containing SVG files to add a nice hover effect and fade-in tool tips using CSS. Further, we will see how to embed our heat map in HTML files, and make use of JavaScript to add further interactivity.

The following screenshot was captured from a Safari web browser after applying the hover effect to our SVG image. Notice the highlighted cell under the mouse pointer:

Getting ready

Download the 5644OS_06_01.r script and a version of the HeatModSVG program for your operating system from your account at http://www.packtpub.com. It is recommended, but not mandatory, to download the latest version of the svgpan JavaScript library by Andrea Leofreddi from https://code.google.com/p/svgpan/downloads/list.

Further, I recommend that you save all files to the same folder on your computer.

For an explanation on how to run scripts in R, please read the Getting ready section of the Creating your first heat map in R recipe.

The script will check automatically if any additional packages need to be installed in R. You can find more information about the installation of packages in the Getting ready section of the Creating your first heat map in R recipe.

How to do it...

Run the 5644OS_06_01.r script and then launch HeatModSVG (see the instructions below the script contents):

if (!require("gplots")) {
install.packages("gplots", dependencies = TRUE)
library(gplots)
}
if (!require("MASS")) {
install.packages("MASS", dependencies = TRUE)
library(MASS)
}

# Writing out matrix file
data(mtcars)
car_data <- mtcars[,1:7]
write.matrix(car_data, "car_data.csv", sep = ",")
norm_cars <- scale(car_data) # automatically matrix

# Creating heat map
svg("car_heatmap.svg")
heatmap.2(norm_cars, 
  density.info = "none",
  trace = "none",
  dendrogram = "none",
  Rowv = FALSE, 
  Colv = FALSE,
  margin = c(5,10))

dev.off()

After you have run the R script, make sure that two new files were created in the current working directory: car_heatmap.svg and car_data.csv. Now, double-click on the HeatModSVG program and a new window should appear on your screen. The lines that require your input are highlighted in the following sample session. You can simply reuse the inputs of this sample session, but make sure that you type in the correct location of the car_heatmap.svg heat map file and the car_data.csv data file.

#####################################
##                                 ##
## HeatModSVG v 1.06 (04/04/2013)  ##
##                                 ##
## Written by Sebastian Raschka    ##
##                                 ##
#####################################


===============
=== Options ===
===============

-- Add Tool Tips: t
-- Add Zoom and Panning: z
-- Add Both: tz
-- Quit: q

Enter your choice: tz
Current working directory: /Users/sebastian
SVG file: /Users/sebastian/Desktop/car_heatmap.svg
Matrix file: /Users/sebastian/Desktop/car_data.csv


MATRIX SPECIFICATION
--------------------

Comment character: #

Separator 
-- "w" for whitespace
-- "t" for tab
-- "c" for comma
: c
Column names? (y/n): y
Row names? (y/n): n

Read in data from car_data.csv:


21.000  6.000   160.000 110.000 3.900   2.620   16.460 
21.000  6.000   160.000 110.000 3.900   2.875   17.020 
22.800  4.000   108.000 93.000  3.850   2.320   18.610 
... 
21.400  4.000   121.000 109.000 4.110   2.780   18.600  

Add a label to tool tips? (y/n): n

... inserted CSS <style> tag after line 2

... inserted link to svgPan.js after CSS <style> tag

... added viewport ID in line 286

... IDs and tool tip <title> tags were inserted in lines 289 to 512

Saving . . . . . . . . 

==> Created /Users/sebastian/Desktop/NEW_car_heatmap.svg

How it works...

To add interactivity to our heat maps, we will make use of R's capability to store the created images in the Scalable Vector Graphics format. The content of SVG files is saved as plain text and can be viewed with any text editor. If we open an SVG file in a text editor, we will see XML code that contains the information for our web browser to render the image.

The advantage of this XML code is that we can manipulate it using HTML, CSS, and JavaScript.

  1. Creating a heat map SVG file in R: First, we create our heat map by running the R script 5644OS_06_01.r. By now, the contents of the script should look very familiar to us, but let us go over it briefly.

    We create our heat map from the mtcars data set from the R datasets package. The data set contains information about 32 car models from 1973-1974. The data columns from 1 to 7 contain information on miles per gallon, number of cylinders, displacement in cubic inches, horsepower, rear axle ratio, weight (lb/1000), and quarter-mile time.

    data(mtcars)
    car_data <- mtcars[,1:7]
    write.matrix(car_data, "car_data.csv", sep = ",")

    Using the write.matrix() function, we write the mtcars data to a CSV file. We will need this data file later to add tool tips to our heat map.

    We use the scale() function to normalize the data, so we can compare the different variables of mtcars to each other in the heat map. Note that mtcars is in a data frame format, but scale() will automatically convert it into a numerical matrix. Finally, we open a new graphic device to save our heat map to an SVG file.

    norm_cars <- scale(car_data) 
    
    # Creating heat map
    svg("car_heatmap.svg")
    heatmap.2(norm_cars, 
      density.info = "none",
      trace = "none",
      dendrogram = "none",
      Rowv = FALSE, 
      Colv = FALSE,
      margin = c(5,10))
    
    dev.off()
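To convince ourselves of what scale() does, we can inspect its result. The following sketch repeats the normalization step and checks the class, column means, and column standard deviations of the result:

```r
car_data  <- mtcars[, 1:7]
norm_cars <- scale(car_data)

is.matrix(norm_cars)              # scale() returns a numeric matrix
round(colMeans(norm_cars), 10)    # column means are (numerically) zero
apply(norm_cars, 2, sd)           # column standard deviations are one
```

After this step, every variable is expressed in standard deviations from its own mean, which is why columns with very different units can share one color scale.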
  2. HeatModSVG options overview: Now that we have created an SVG file of our heat map, we can use the HeatModSVG program to add some interactivity to it. Let us take a look at the execution of the program before we discuss how it modifies the SVG file in more detail.

    First, the program asks us whether we want to add Tool Tips or Zoom and Panning or both. When we choose both, the program will insert tool tip labels from an external data file that will be displayed in the individual heat map cells when we hover over them with the mouse pointer. Further, the program will embed a reference to Andrea Leofreddi's SVGPan JavaScript library, which will add features like zooming, dragging, and panning to our heat map.

  3. Reading a data file into HeatModSVG: To add tool tips to our heat map, the HeatModSVG program will prompt us for a data file in addition to the SVG file that we want to modify.

    This tool tip label data can stem from the same data file that we used to create our heat map, or it can be another text file that contains data with the same dimensions (the same number of rows and columns as the heat map data file).

    In our case, we want to show the original values of the mtcars columns from 1 to 7, which we saved to car_data.csv before we normalized the data to create the heat map.
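Before handing a data file to HeatModSVG, it can be worth verifying in R that its dimensions match the heat map matrix. This is a sketch under the assumption that the file is written with write.matrix() as in the recipe's script; here it is written to a temporary folder so that no existing file is overwritten:

```r
library(MASS)   # provides write.matrix(), as used in the recipe's script

car_data <- mtcars[, 1:7]
csv_file <- file.path(tempdir(), "car_data.csv")   # temporary copy
write.matrix(car_data, csv_file, sep = ",")

norm_cars    <- scale(car_data)      # matrix behind the heat map
tooltip_data <- read.csv(csv_file)   # values intended for the tool tips

dim(tooltip_data)
identical(dim(tooltip_data), dim(norm_cars))
```

If the last line does not return TRUE, the tool tip values cannot be mapped one-to-one onto the heat map cells.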

    The following screenshot highlights the important formatting features of the car_data.csv file that we have to feed into the HeatModSVG program:

    The HeatModSVG program will ask us for the format of the comment characters in the data file, so it can ignore those lines when it reads the data. We entered #, but in the case of car_data.csv, it does not really matter, since car_data.csv does not contain any comments.

    Next, we chose c (comma) as the column separator, since we created car_data.csv as a Comma Separated Value file in R. When HeatModSVG asks us for row labels, we choose no, and for column names, we choose yes, since car_data.csv only has a header and no row labels.

    After we have specified all the options that HeatModSVG prompted us for, the contents of the data file will be printed to the screen as they were read in by HeatModSVG. This is the point where we should double-check whether we provided the correct options for reading in the data file: we should only see the matrix values here and no row or column labels.

    Finally, the program will ask us if we want to add a label that will be placed in front of our tool tips. If we had chosen yes, we would have been prompted to enter a character or character string that appears as a global label in front of our tool tip values, and the tool tips would then be displayed in the following format: <Label:> <value>.

    Just before the program finishes, it will notify us about the changes that it made to the original SVG file:

    ... inserted CSS <style> tag after line 2
    
    ... inserted link to svgPan.js after CSS <style> tag
    
    ... added viewport ID in line 286
    
    ... IDs and tool tip <title> tags were inserted in lines 289 
        to 512
    
    Saving . . . . . . . . 
    
    ==> Created /Users/sebastian/Desktop/NEW_car_heatmap.svg
  4. Modification to the SVG file: Let us take a look at the changes made by HeatModSVG step-by-step to understand how the interactivity effect works:

    ... inserted CSS <style> tag after line 2

    When we open the new SVG file, NEW_car_heatmap.svg, we see that the program inserted a CSS style tag just after line 2:

    <style>
    #hoverItem:hover{opacity: 0.3;}
    </style>

    This CSS style tag adds a mouse-over or hover effect to each element in the XML file that we label with the CSS ID hoverItem.
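The same kind of insertion can be sketched in a few lines of R. Note that this is not how HeatModSVG is implemented; the helper name add_hover_style() and the toy SVG lines are made up for illustration:

```r
# Hypothetical helper: insert the CSS <style> block after a given line.
add_hover_style <- function(svg_lines, after = 2) {
  style_block <- c(
    "<style>",
    "#hoverItem:hover{opacity: 0.3;}",
    "</style>")
  append(svg_lines, style_block, after = after)
}

# Toy stand-in for the lines of an SVG file. A real file would be read
# with readLines() and the result saved with writeLines().
toy_svg <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<svg xmlns="http://www.w3.org/2000/svg" width="504pt" height="504pt">',
  '<g id="surface1">',
  '</g>',
  '</svg>')

new_svg <- add_hover_style(toy_svg)
new_svg[3:5]   # the three inserted lines
```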

    Next, a link to the SVGPan.js JavaScript file was inserted at the top of the SVG file, right after the previously inserted CSS style tag:

    ... inserted link to svgPan.js after CSS <style> tag

    The link to the JavaScript file looks like this:

    <script xlink:href="SVGPan.js"/>

    In order for svgpan to work, it has to be in the same folder as the SVG file, or else we have to add the path in front of it.

    Tip

    Alternatively, you can also copy the complete contents of SVGPan.js between script tags in your SVG file:

    <script> SVGPan.js contents </script>

    In order for the SVGPan effects to work, we have to add the ID viewport to the heat map elements:

    ... added viewport ID in line 286

    Note that the number 286 refers to the location in the original SVG file; since we inserted four lines already, we will find the viewport ID opening tag in line 290 of the new SVG file:

    <g id="viewport">. 

    In fact, viewport replaced another ID, surface61; this is the ID that determines the start of the visual elements of our heat map.

    Tip

    The contents of the SVG file might seem very confusing. To get an idea of how it is structured, I recommend deleting individual elements from the XML code and watching how the SVG image changes in the browser.

    At this point, you may ask why we need an extra program to make those tiny changes to the XML code. In fact, we could have done everything manually in no time, but stay tuned, because now comes the laborious part:

    ... IDs and tool tip <title> tags were inserted in lines 289 to 512

    At the beginning, we saw the CSS style tag that contained the hover action added in line number 3. Now, we have to assign it to each individual cell of the heat map by adding the hoverItem ID. Further, we want to add the tool tip label from the car_data.csv file to these cells too, so that a value appears if we hover over a particular cell in the heat map.

    Without a program to automate this process, we would have to repeat this process 224 times to add the hoverItem ID and tool tip label to each cell of the 32 x 7 heat map.
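To get a feeling for the amount of repetition, here is a sketch that generates one tool tip tag per cell in R. It is not HeatModSVG's actual code; the helper name make_tooltips() is hypothetical, and the row-by-row order of the cells is an assumption about how they appear in the SVG file:

```r
# Hypothetical helper: one <title> tag per matrix cell, row by row.
make_tooltips <- function(values, label = NULL) {
  cell_values <- as.vector(t(as.matrix(values)))   # row-wise order
  text <- if (is.null(label)) {
    as.character(cell_values)
  } else {
    paste(label, cell_values)   # optional global label in front of the value
  }
  paste0("<title>", text, "</title>")
}

tips <- make_tooltips(mtcars[, 1:7])
length(tips)    # 32 rows x 7 columns = 224 tool tips
head(tips, 3)
```

Each of these tags would still have to be placed inside the matching cell element of the SVG file, together with the hoverItem ID, which is the part that HeatModSVG automates.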

    The following screenshot shows the XML code of an exemplary heat map cell before and after the conversion:


Key benefits

  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results
  • Create heat maps in R using different file formats
  • Learn how to make choropleth maps and contour plots
  • Generate your own customized heat maps and add interactivity for displaying on the web

Description

R has grown rapidly over the years to become one of the most versatile and valuable tools for data analysis and graphing. One of its many useful features is the heat map representation of numerical data, which is an invaluable tool to discover patterns in data quickly and efficiently. Instant Heat Maps in R How-to provides you with practical recipes to create heat maps of all difficulty levels by yourself right from the start. At the end of each recipe, you will find an in-depth analysis that will equip you with everything you need to know to frame the code to your own needs. Instant Heat Maps in R will present you with all the different heat map plotting functions that exist in R. You will start by creating simple heat maps before moving on to learn how to add more features to them. While you advance step-by-step through the well-connected recipes, you will find out which tool suits the given situation best. You will learn how to read data from popular file formats and how to format the data to create heat maps as well as the ways to export them for presentation.

What you will learn

  • Use different heat map functions in R
  • Customize the layout of your heat maps
  • Read the data and apply it to the heat map
  • Export your heat maps into high-quality picture formats
  • Display geographical data on choropleth maps and contour plots
  • Add interactive hover effects to your heat maps

Product Details

Publication date : Jun 24, 2013
Length 72 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781782165644

Table of Contents

7 Chapters
Instant Heat Maps in R How-to
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Instant Heat Maps in R How-to
