Creating your first heat map in R

Exclusive offer: get 50% off this eBook here
Instant Heat Maps in R How-to [Instant]

Instant Heat Maps in R How-to [Instant] — Save 50%

Learn how to design heat maps in R to enhance your data analysis with this book and ebook

$14.99    $7.50
by Sebastian Raschka | June 2013 | Open Source

In this article by Sebastian Raschka, author of Instant Heat Maps in R How-to [Instant], we will learn how to construct our first heat map in R from the AirPassenger data set, which is a standard data set included in the data package that is available with R distributions. For this task, we will use the levelplot() function from the lattice package and explore the enhanced features of the gplots package, the heatmap.2() function.

(For more resources related to this topic, see here.)

The following image shows one of the heat maps that we are going to create in this recipe from the total count of air passengers:

Getting ready

Download the script 5644_01_01.r from your account at http://www.packtpub.com and save it to your hard disk. The first section of the script, below the comment line starting with ### loading packages, will automatically check for the availability of the R packages gplots and lattice, which are required for this recipe.

If those packages are not already installed, you will be prompted to select an official server from the Comprehensive R Archive Network (CRAN) to allow the automatic download and installation of the required packages.

If you have already installed those two packages prior to executing the script, I recommend you to update them to the most recent version by calling the following function in the R command line:

update.packages("gplots")
update.packages("lattice")

Use the source() function in the R command-line to execute an external script from any location on your hard drive.

If you start a new R session from the same directory as the location of the script, simply provide the name of the script as an argument in the function call as follows:

source("myScript.r")

You have to provide the absolute or relative path to the script on your hard drive if you started your R session from a different directory to the location of the script. Refer to the following example:

source("/home/username/Downloads/myScript.r")

You can view the current working directory of your current R session by executing the following command in the R command-line:

getwd()

How to do it...

Run the 5644OS_01_01.r script in R to execute the following code, and take a look at the output printed on the screen as well as the PDF file, first_heatmaps.pdf that will be created by this script:

### loading packages
if (!require("gplots")) {
install.packages("gplots", dependencies = TRUE)
library(gplots)
}
if (!require("lattice")) {
install.packages("lattice", dependencies = TRUE)
library(lattice)
}
### loading data
data(AirPassengers)
### converting data
rowcolNames <- list(as.character(1949:1960), month.abb)
air_data <- matrix(AirPassengers,
ncol = 12,
byrow = TRUE,
dimnames = rowcolNames)
### drawing heat maps
pdf("firstHeatmaps.pdf")
# 1) Air Passengers #1
print(levelplot(air_data,
col.regions=heat.colors,
xlab = "year",
ylab = "month",
main = "Air Passengers #1"))
# 2) Air Passengers #2
heatmap.2(air_data,
trace = "none",
density.info = "none",
xlab = "month",
ylab = "year",
main = "Air Passengers #2")
# 3) Air Passengers #3
heatmap.2(air_data,
trace = "none",
xlab = "month",
ylab = "year",
main = "Air Passengers #3",
density.info = "histogram",
dendrogram = "column",
keysize = 1.8)
dev.off()

How it works...

There are different functions for drawing heat maps in R, and each has its own advantages and disadvantages. In this recipe, we will take a look at the levelplot() function from the lattice package to draw our first heat map. Furthermore, we will use the advanced heatmap.2() function from gplots to apply a clustering algorithm to our data and add the resulting dendrograms to our heat maps.

The following image shows an overview of the different plotting functions that we are using throughout this book:

Now let us take a look at how we read in and process data from different data files and formats step-by-step:

  1. Loading packages: The first eight lines preceding the ### loading data section will make sure that R loads the lattice and gplots package, which we need for the two heat map functions in this recipe: levelplot() and heatmap.2().

    Each time we start a new session in R, we have to load the required packages in order to use the levelplot() and heatmap.2() functions. To do so, enter the following function calls directly into the R command-line or include them at the beginning of your script:

    • library(lattice)

    • library(gplots)

  2. Loading the data set: R includes a package called data, which contains a variety of different data sets for testing and exploration purposes. More information on the different data sets that are contained in the data package can be found at http:// stat.ethz.ch/ROmanual/ROpatched/library/datasets/.

    For this recipe, we are loading the AirPassenger data set, which is a collection of the total count of air passengers (in thousands) for international airlines from 1949- 1960 in a time-series format.

    data(AirPassengers)

  3. Converting the data set into a numeric matrix: Before we can use the heat map functions, we need to convert the AirPassenger time-series data into a numeric matrix first. Numeric matrices in R can have characters as row and column labels, but the content itself must consist of one single mode: numerical.

    We use the matrix() function to create a numeric matrix consisting of 12 columns to which we pass the AirPassenger time-series data row-by-row. Using the argument dimnames = rowcolNames, we provide row and column names that we assigned previously to the variable rowColNames, which is a list of two vectors: a series of 12 strings representing the years 1949 to 1960, and a series of strings for the 12 three-letter abbreviations of the months from January to December, respectively.

    rowcolNames <- list(as.character(1949:1960), month.abb)
    air_data <- matrix(AirPassengers,
    ncol = 12,
    byrow = TRUE,
    dimnames = rowcolNames)

  4. A simple heat map using levelplot(): Now that we have converted the AirPassenger data into a numeric matrix format and assigned it to the variable air_data, we can go ahead and construct our first heat map using the levelplot() function from the lattice package:

    print(levelplot(air_data,
    col.regions=heat.colors,
    xlab = "year",
    ylab = "month",
    main = "Air Passengers #1"))

    The levelplot() function creates a simple heat map with a color key to the righthand side of the map. We can use the argument col.regions = heat.colors to change the default color transition to yellow and red. X and y axis labels are specified by the xlab and ylab parameters, respectively, and the main parameter gives our heat map its caption.

    In contrast to most of the other plotting functions in R, the lattice package returns objects, so we have to use the print() function in our script if we want to save the plot to a data file. In an interactive R session, the print() call can be omitted. Typing the name of the variable will automatically display the referring object on the screen.

  5. Creating enhanced heat maps with heatmap.2(): Next, we will use the heatmap.2() function to apply a clustering algorithm to the AirPassenger data and to add row and column dendrograms to our heat map:

    heatmap.2(air_data,
    trace = "none",
    density.info = "none",
    xlab = "month",
    ylab = "year",
    main = "Air Passengers #2")

    Hierarchical clustering is especially popular in gene expression analyses. It is a very powerful method for grouping data to reveal interesting trends and patterns in the data matrix.

    Another neat feature of heatmap.2() is that you can display a histogram of the count of the individual values inside the color key by including the argument density.info = NULL in the function call. Alternatively, you can set density. info = "density" for displaying a density plot inside the color key.

    By adding the argument keysize = 1.8, we are slightly increasing the size of the color key—the default value of keysize is 1.5:

    heatmap.2(air_data,
    trace = "none",
    xlab = "month",
    ylab = "year",
    main = "Air Passengers #3",
    density.info = "histogram",
    dendrogram = "column",
    keysize = 1.8)

Did you notice the missing row dendrogram in the resulting heat map? This is due to the argument dendrogram = "column" that we passed to the heat map function. Similarly, you can type row instead of column to suppress the column dendrogram, or use neither to draw no dendrogram at all.

There's more...

By default, levelplot() places the color key on the right-hand side of the heat map, but it can be easily moved to the top, bottom, or left-hand side of the map by modifying the space parameter of colorkey:

levelplot(air_data,
col.regions = heat.colors,
colorkey = list(space = "top"))

Replacing top by left or bottom will place the color key on the left-hand side or on the bottom of the heat map, respectively.

Moving around the color key for heatmap.2() can be a little bit more of a hassle. In this case we have to modify the parameters of the layout() function. By default, heatmap.2() passes a matrix, lmat, to layout(), which has the following content:

       [,1] [,2]

[1,]   4    3

[2,]   2    1

The numbers in the preceding matrix specify the locations of the different visual elements on the plot (1 implies heat map, 2 implies row dendrogram, 3 implies column dendrogram, and 4 implies key). If we want to change the position of the key, we have to modify and rearrange those values of lmat that heatmap.2() passes to layout().

For example, if we want to place the color key at the bottom left-hand corner of the heat map, we need to create a new matrix for lmat as follows:

A DataStage project stores jobs and define their environment, such as their security and execution resources. Your project, as well as your user account, is typically created by your DataStage administrator.

lmat

        [,1]  [,2]

[1,]   0     3

[2,]   2     1

[3,]   4     0

We can construct such a matrix by using the rbind() function and assigning it to lmat:

lmat = rbind(c(0,3),c(2,1),c(4,0))

Furthermore, we have to pass an argument for the column height parameter lhei to heatmap.2(), which will allow us to use our modified lmat matrix for rearranging the color key:

heatmap.2(air_data,
dendrogram = "none",
trace = "none",
density.info = "none",
keysize = "1.3",
xlab = "month",
ylab = "year",
main = "Air Passengers",
lmat = rbind(c(0,3),c(2,1),c(4,0)),
lhei = c(1.5,4,1.5))

If you don't need a color key for your heat map, you could turn it off by using the argument key = FALSE for heatmap.2() and colorkey = FALSE for levelplot(), respectively.

R also has a base function for creating heat maps that does not require you to install external packages and is most advantageous if you can go without a color key. The syntax is very similar to the heatmap.2() function, and all options for heatmap.2() that we have seen in this recipe also apply to heatmap():

heatmap(air_data,
xlab = "month",
ylab = "year",
main = "Air Passengers")

More information on dendrograms and clustering

By default, the dendrograms of heatmap.2() are created by a hierarchical agglomerate clustering method, also known as bottom-up clustering.

In this approach, all individual objects start as individual clusters and are successively merged until only one single cluster remains. The distance between a pair of clusters is calculated by the farthest neighbor method, also called the complete linkage method, which is based by default on the Euclidean distance of the two points from both clusters that are farthest apart from each other. The computed dendrograms are then reordered based on the row and column means.

By modifying the default parameters of the dist() function, we can use another distance measure rather than the Euclidean distance. For example, if we want to use the Manhattan distance measure (based on a grid-like path rather than a direct connection between two objects), we would modify the method parameter of the dist() function and assign it to a variable distance first:

distance = dist(myData, method = "manhattan")

Other options for the method parameter are: euclidean (default), maximum, canberra, binary, or minkowski.

To use other agglomeration methods than the complete linkage method, we modify the method parameter in the hclust() function and assign it to another variable cluster. Note the first argument distance that we pass to the hclust() function, which comes from our previous assignment:

cluster = hclust(distance, method = "ward")

By setting the method parameter to ward, R will use Joe H. Ward's minimum variance method for hierarchical clustering. Other options for the method parameter that we can pass as arguments to hclust() are: complete (default), single, average, mcquitty, median, or centroid.

To use our modified clustering parameters, we simply call the as.dendrogram() function within heatmap.2() using the variable cluster that we assigned previously:

heatmap.2(myData,
Colv = as.dendrogram(cluster),
Rowv = as.dendrogram(cluster))

We can also draw the cluster dendrogram without the heat map by using the plot() function:

plot(cluster)

To turn off row and column reordering, we need to turn off the dendrograms and set the parameters Colv and Rowv to NA:

heatmap.2(my_data, dendrograms = "none", Colv = "NA",
Rowv = "NA")

Summary

This article has helped us create our first heat maps from a small data set provided in R. We have used different heat map functions in R to get a first impression of their functionalities.

Resources for Article :


Further resources on this subject:


Instant Heat Maps in R How-to [Instant] Learn how to design heat maps in R to enhance your data analysis with this book and ebook
Published: June 2013
eBook Price: $14.99
See more
Select your format and quantity:

About the Author :


Sebastian Raschka

Sebastian Raschka is a PhD student at Michigan State University and is pursuing a doctorate in Biochemistry and Computer Science. He works in the field of protein structure modeling and is focused on the specificity of protein-ligand interactions. His research involves the development of a protein-ligand docking software based on a novel approach, where he combines the fields of machine learning, pattern recognition, and data mining.

In his free time, Sebastian works on web development and uses JavaScript among other technologies to develop web applications that are used by Bioinformaticians and Computational Biologists.

Books From Packt


Tableau Data Visualization Cookbook
Tableau Data Visualization Cookbook

Data Visualization: a successful design process
Data Visualization: a successful design process

MATLAB Graphics and Data Visualization Cookbook
MATLAB Graphics and Data Visualization Cookbook

Circos Data Visualization How-to [Instant]
Circos Data Visualization How-to [Instant]

 Instant Google Map Maker Starter [Instant]
Instant Google Map Maker Starter [Instant]

Instant Interactive Map designs with Leaflet JavaScript Library How-to [Instant]
Instant Interactive Map designs with Leaflet JavaScript Library How-to [Instant]

PHP Web 2.0 Mashup Projects: Practical PHP Mashups with Google Maps, Flickr, Amazon, YouTube, MSN Search, Yahoo!
PHP Web 2.0 Mashup Projects: Practical PHP Mashups with Google Maps, Flickr, Amazon, YouTube, MSN Search, Yahoo!

Moodle 1.9 for Design and Technology
Moodle 1.9 for Design and Technology


No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
A
m
g
V
A
b
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software