Customizing heat maps (Intermediate)

This article explores more advanced options for customizing the layout of heat maps. The main focus is on using different color palettes, but we will also cover other useful features used in this recipe, such as cell notes.


To ensure that our heat maps look good in any situation, we will make use of different color palettes in this recipe, and we will even learn how to create our own.

In addition, we will add visual aids such as cell note labels to our heat maps, which make them even more useful and accessible as a tool for visual data analysis.

The following image shows a heat map with cell notes and an alternative color palette created from the arabidopsis_genes.csv data set:

Getting ready

Download the 5644OS_03_01.r script and the arabidopsis_genes.csv data set from your account at http://www.packtpub.com and save them to your hard drive.

I recommend that you save the script and data file to the same folder on your hard drive. If you execute the script from a different location than the data file, you will have to change the current R working directory accordingly.
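
If the script and the data file do end up in different folders, a quick check along the following lines can confirm that R will find the file before the script runs. This is a minimal sketch; the variable names data_file and data_found are illustrative and not part of the original script.

```r
# Name of the data file as it is used later in the script
data_file <- "arabidopsis_genes.csv"

# Show the folder in which R currently looks for files
print(getwd())

# TRUE if the data file sits in the current working directory
data_found <- file.exists(data_file)
if (!data_found) {
  message("Cannot find ", data_file,
          "; call setwd() with the folder that contains it.")
}
```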

The script will check automatically if any additional packages need to be installed in R.

How to do it...

Execute the following code in R via the 5644OS_03_01.r script and take a look at the PDF file custom_heatmaps.pdf that will be created in the current working directory:

### loading packages
# install each package only if it cannot be loaded, then load it
if (!require("gplots")) {
  install.packages("gplots", dependencies = TRUE)
  library(gplots)
}
if (!require("RColorBrewer")) {
  install.packages("RColorBrewer", dependencies = TRUE)
  library(RColorBrewer)
}

### reading in data
gene_data <- read.csv("arabidopsis_genes.csv")
row_names <- gene_data[,1]
gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
rownames(gene_data) <- row_names

### setting heatmap.2() default parameters
heat2 <- function(...) heatmap.2(gene_data,
  tracecol = "black",
  dendrogram = "column",
  Rowv = NA,
  trace = "none",
  margins = c(8,10),
  density.info = "density", ...)

pdf("custom_heatmaps.pdf")

### 1) customizing colors
# 1.1) in-built color palettes
heat2(col = terrain.colors(n = 1000),
main = "1.1) Terrain Colors")

# 1.2) RColorBrewer palettes
heat2(col = brewer.pal(n = 9, "YlOrRd"),
main = "1.2) Brewer Palette")

# 1.3) creating own color palettes
my_colors <- c(y1 = "#F7F7D0",
y2 = "#FCFC3A",
y3 = "#D4D40D",
b1 = "#40EDEA",
b2 = "#18B3F0",
b3 = "#186BF0",
r1 = "#FA8E8E",
r2 = "#F26666",
r3 = "#C70404")
heat2(col = my_colors,
main = "1.3) Own Color Palette")
my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
heat2(col = my_palette, main = "1.3) ColorRampPalette")

# 1.4) gray scale
heat2(col = gray(level = (0:100)/100),
main ="1.4) Gray Scale")

### 2) adding cell notes
fold_change <- 2^gene_data
rounded_fold_changes <- round(fold_change, 2)
heat2(cellnote = rounded_fold_changes,
notecex = 0.5,
notecol = "black",
col = my_palette,
main = "2) Cell Notes")

### 3) adding column side colors
heat2(ColSideColors = c("red", "gray", "red",
rep("green",13)),
main = "3) ColSideColors")

dev.off()

How it works...

Primarily, we will be using read.csv() and heatmap.2() to read data into R and construct our heat maps. In this recipe, however, we will focus on advanced features to enhance our heat maps, such as customizing colors and other visual elements:

  1. Inspecting the arabidopsis_genes.csv data set: The arabidopsis_genes.csv file contains a compilation of gene expression data from the model plant Arabidopsis thaliana. I obtained the freely available data for 16 different genes as log2 ratios of target to reference gene expression from the Arabidopsis eFP Browser (http://bar.utoronto.ca/efp_arabidopsis/). For each gene, the data file contains expression data from 47 different areas of the plant.
  2. Reading the data and converting it into a numeric matrix: We have to convert the data table into a numeric matrix first before we can construct our heat maps:
    gene_data <- read.csv("arabidopsis_genes.csv")
    row_names <- gene_data[,1]
    gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
    rownames(gene_data) <- row_names
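
    The conversion can be tried on a tiny table first. The following sketch uses made-up values and hypothetical names (toy_data, toy_matrix) in place of the real data file; it only illustrates how the first column becomes the row names and the remaining columns become a numeric matrix.

    ```r
    # Made-up example values standing in for the real CSV file
    csv_text <- paste("sample,gene_a,gene_b",
                      "root,1.5,-0.3",
                      "leaf,0.2,2.1",
                      sep = "\n")

    # read.csv() returns a data frame, not a matrix
    toy_data <- read.csv(text = csv_text)
    print(class(toy_data))

    # Keep the first column as labels and convert the rest to a matrix
    toy_names <- toy_data[, 1]
    toy_matrix <- data.matrix(toy_data[, 2:ncol(toy_data)])
    rownames(toy_matrix) <- toy_names

    print(is.matrix(toy_matrix))
    print(toy_matrix)
    ```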
  3. Creating a customized heatmap.2() function: To reduce typing efforts, we are defining our own version of the heatmap.2() function now, where we will include some arguments that we are planning to keep using throughout this recipe:
    heat2 <- function(...) heatmap.2(gene_data,
    tracecol = "black",
    dendrogram = "column",
    Rowv = NA,
    trace = "none",
    margins = c(8,10),
    density.info = "density", ...)

    So, each time we call our newly defined heat2() function, it will behave like heatmap.2() with these arguments preset, plus any additional arguments that we pass along. We also set the tracecol parameter to black to better distinguish the density plot in the color key from the background.
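
    The forwarding of arguments through ... can be seen in isolation with a stand-in function. In the sketch below, describe_plot() and describe2() are hypothetical names invented for illustration; describe_plot() merely reports the arguments it receives, so the example runs without the gplots package.

    ```r
    # Hypothetical stand-in for heatmap.2(); it only reports its arguments
    describe_plot <- function(x, trace = "column", main = "",
                              col = "heat.colors") {
      paste0("trace=", trace, "; main=", main, "; col=", col)
    }

    # Wrapper that fixes trace and forwards everything else through ...
    describe2 <- function(...) describe_plot(x = NULL, trace = "none", ...)

    print(describe2(main = "Demo"))
    print(describe2(main = "Demo", col = "terrain.colors"))
    ```

    Passing an argument that the wrapper already fixes, such as trace here, raises an error because the argument would be matched twice.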

  4. The built-in color palettes: There are four more color palettes available in base R that we could use instead of the heat.colors palette: rainbow, terrain.colors, topo.colors, and cm.colors.

    So let us make use of the terrain.colors color palette now, which will give us a nice color transition from green through yellow to rose:

    heat2(col = terrain.colors(n = 1000),
    main = "1.1) Terrain Colors")

    Any value of the n parameter larger than the default of 12 adds colors, which makes the transition smoother. A value of 1000 should be more than sufficient to make the steps between the individual colors indistinguishable to the human eye.

    The following image shows a side-by-side comparison of the heat.colors and terrain.colors color palettes using a different number of color shades:

    Further, it is also possible to reverse the direction of the color transition. For example, if we want to have a heat.colors transition from yellow to red instead of red to yellow in our heat map, we could simply define a reverse function:

    rev_heat.colors <- function(x) rev(heat.colors(x))
    heat2(col = rev_heat.colors(500))
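
    Both ideas from this step, the number of shades and the reversed direction, can be checked directly on the color vectors without drawing a heat map. This is a small sketch using only base R; the names coarse and smooth are illustrative.

    ```r
    # A palette function returns exactly n colors as hex strings
    coarse <- terrain.colors(n = 5)
    smooth <- terrain.colors(n = 1000)
    print(length(coarse))
    print(length(smooth))

    # Reversing a palette simply reverses the order of its colors
    rev_heat.colors <- function(x) rev(heat.colors(x))
    print(heat.colors(3))
    print(rev_heat.colors(3))
    ```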
  5. RColorBrewer palettes: A lot of color palettes are available from the RColorBrewer package. To see what they look like, you can type display.brewer.all() into the R command line after loading the RColorBrewer package. However, in contrast to the dynamic range color palettes that we have seen previously, the RColorBrewer palettes have a fixed number of distinct colors. So to select all nine colors from the YlOrRd palette, a gradient from yellow to red, we use the following command:
    heat2(col = brewer.pal(n = 9, "YlOrRd"),
    main = "1.2) Brewer Palette")

    The following image gives you a good overview of all the different color palettes that are available from the RColorBrewer package:

  6. Creating our own color palettes: Next, we will see how we can create our own color palettes. A whole bunch of different colors are already defined in R. An overview of those colors can be seen by typing colors() into the command line of R.

    The most convenient way to assign new colors to a color palette is using hex colors (hexadecimal colors). Many different online tools are freely available that allow us to obtain the necessary hex codes. A great example is color picker (http://www.colorpicker.com), which allows us to choose from a rich color table and provides us with the corresponding hex codes.

    Once we gather all the hexadecimal codes for the colors that we want to use for our color palette, we can assign them to a variable as we have done before with the explicit color names:

    my_colors <- c(y1 = "#F7F7D0",
    y2 = "#FCFC3A",
    y3 = "#D4D40D",
    b1 = "#40EDEA",
    b2 = "#18B3F0",
    b3 = "#186BF0",
    r1 = "#FA8E8E",
    r2 = "#F26666",
    r3 = "#C70404")
    heat2(col = my_colors,
    main = "1.3) Own Color Palette")

    This is a very handy approach for creating a color key with very distinct colors. However, the downside of this method is that we have to provide a lot of different colors if we want to create a smooth color gradient; we have used 1000 different colors for the terrain.colors() palette to get a smooth transition in the color key!

  7. Using colorRampPalette for smoother color gradients: A convenient approach to create a smoother color gradient is to use the colorRampPalette() function, so we don't have to insert all the different colors manually. The function takes a vector of different colors as an argument. Here, we provide three colors: blue for the lower end of the color key, yellow for the middle range, and red for the higher end. As we did for the built-in color palettes, such as heat.colors, we assign the value 1000 to the n parameter:
    my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
    heat2(col = my_palette, main = "1.3) ColorRampPalette")

    In this case, it is more convenient to use discrete color names over hex colors, since we are using the colorRampPalette() function to create a gradient and do not need all the different shades of a particular color.
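
    It may help to see that colorRampPalette() itself returns a function, which is then called with the number of colors to generate. The sketch below uses the name palette_fun for that intermediate function (an illustrative name, not part of the recipe script) and asks for only five colors so that the interpolation is easy to inspect.

    ```r
    # colorRampPalette() returns a function that interpolates between colors
    palette_fun <- colorRampPalette(c("blue", "yellow", "red"))

    # Five colors: the three anchors plus one blend between each pair
    five_colors <- palette_fun(5)
    print(five_colors)

    # The same function yields the 1000-step palette used in the recipe
    my_palette <- palette_fun(1000)
    print(length(my_palette))
    ```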

  8. Grayscales: It might happen that the medium or device that we use to display our heat maps does not support colors. Under these circumstances, we can use the gray palette to create a heat map that is optimized for those conditions.

    The level parameter of the gray() function takes a vector of values between 0 and 1 as an argument, where 0 represents black and 1 represents white. For a smooth gradient, we use a vector of 101 equally spaced shades of gray ranging from 0 to 1:

    heat2(col = gray(level = (0:100)/100),
    main ="1.4) Gray Scale")

    We can use the same color palettes with the levelplot() function too. They work in a similar way as with the heatmap.2() function that we are using in this recipe. However, inside the levelplot() function call, we must pass the color palette to col.regions instead of col.
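
    The gray palette from this step can be inspected on its own. The following base R sketch uses the illustrative names gray_levels and gray_palette and prints the two ends of the scale.

    ```r
    # 101 equally spaced levels from 0 (black) to 1 (white)
    gray_levels <- (0:100) / 100
    gray_palette <- gray(level = gray_levels)

    print(length(gray_palette))
    # First and last entries are the darkest and lightest shades
    print(gray_palette[c(1, 101)])
    ```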

  9. Adding cell notes to our heat map: Sometimes, we want to show a data set along with our heat map. A neat way is to use so-called cell notes to display data values inside the individual heat map cells. The underlying data matrix for the cell notes does not necessarily have to be the same numeric matrix we used to construct our heat map, as long as it has the same number of rows and columns.

    As we recall, the data we read from arabidopsis_genes.csv represents log2 ratios of sample and reference gene expression levels. Let us calculate the fold changes of the gene expression levels now and display them, rounded to two digits after the decimal point, as cell notes on our heat map:

    fold_change <- 2^gene_data
    rounded_fold_changes <- round(fold_change, 2)
    heat2(cellnote = rounded_fold_changes,
    notecex = 0.5,
    notecol = "black",
    col = my_palette,
    main = "2) Cell Notes")

    The notecex parameter controls the size of the cell notes. Its default size is 1, and every argument between 0 and 1 will make the font smaller, whereas values larger than 1 will make the font larger. Here, we decreased the font size of the cell notes by 50 percent to fit it into the cell boundaries. Also, we want to display the cell notes in black to have a nice contrast to the colored background; this is controlled by the notecol parameter.
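
    The fold-change calculation behind the cell notes is a plain element-wise operation on the matrix. The sketch below applies it to a small made-up matrix (log2_ratios and rounded are illustrative names): a log2 ratio of 1 corresponds to a two-fold increase, 0 to no change, and -1 to a halving.

    ```r
    # Made-up log2 ratios; the matrix is filled column by column
    log2_ratios <- matrix(c(1, 0, -1, 2.5), nrow = 2,
                          dimnames = list(c("s1", "s2"), c("g1", "g2")))

    # Undo the log2 transformation and round for display
    fold_change <- 2^log2_ratios
    rounded <- round(fold_change, 2)
    print(rounded)

    # The cell note matrix must have the same shape as the heat map data
    print(identical(dim(rounded), dim(log2_ratios)))
    ```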

  10. Row and column side colors: Another approach to highlight certain regions, that is, rows or columns of the heat map, is to make use of row and column side colors. The ColSideColors argument will place a colored box between the dendrogram and heat map that can be used to annotate certain columns. We pass a vector of colors to ColSideColors, and its length must be equal to the number of columns of the heat map. Here, we want to color the first and third column red, the second one gray, and all the remaining 13 columns green:
    heat2(ColSideColors = c("red", "gray", "red", rep("green", 13)),
    main = "3) ColSideColors")

    You can see in the following image what the column side colors look like when we include the ColSideColors argument as shown previously:

    Attentive readers may have noticed that the order of colors in the column color box slightly differs from the order of colors we passed as a vector to ColSideColors. We see red two times next to each other, followed by a green and a gray box. This is due to the fact that the columns of our heat map have been reordered by the hierarchical clustering algorithm.
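
    The reordering effect can be reproduced on a small scale with base R alone. The sketch below is an approximation under stated assumptions: it uses made-up data and the plain hclust() leaf order, whereas heatmap.2() additionally reorders the dendrogram by column means, so the exact order in a real heat map may differ. The names toy, side_colors, and displayed are illustrative.

    ```r
    # Made-up data: columns a and c are very similar to each other
    toy <- matrix(c(1, 2, 3,
                    9, 8, 7,
                    1, 2, 4,
                    5, 5, 6), nrow = 3,
                  dimnames = list(NULL, c("a", "b", "c", "d")))

    # One color per column, given in the original column order
    side_colors <- c("red", "gray", "red", "green")
    stopifnot(length(side_colors) == ncol(toy))

    # Cluster the columns and look up the resulting leaf order
    col_clust <- hclust(dist(t(toy)))
    col_order <- col_clust$order

    # Colors as they would appear once the columns are reordered
    displayed <- side_colors[col_order]
    print(colnames(toy)[col_order])
    print(displayed)
    ```

    Because columns a and c cluster together, their two red boxes end up next to each other even though a gray entry sits between them in the input vector.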

You've been reading an excerpt of:

Instant Heat Maps in R How-to
