There is always room for improvement. Now that we have seen how to create impressive heat maps from various data file types, it is time to add some extraordinary style.
To ensure that our heat maps look good in any situation, we will make use of different color palettes in this recipe, and we will even learn how to create our own.
Further, we will add some more extras to our heat maps including visual aids such as cell note labels, which will make them even more useful and accessible as a tool for visual data analysis.
The following image shows a heat map with cell notes and an alternative color palette created from the arabidopsis_genes.csv data set:
Download the 5644OS_03_01.r script and the arabidopsis_genes.csv data set from your account at http://www.packtpub.com and save them to your hard drive.
I recommend that you save the script and data file to the same folder on your hard drive. If you execute the script from a different location to the data file, you will have to change the current R working directory accordingly.
Please read the Getting ready section of the Reading data from different data formats recipe for more information on how to change the working directory of your current R session.
For more information on how to view the current working directory of your current R session and an explanation on how to run scripts in R, please read the Getting ready section of the Creating your first heat map recipe.
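As a quick sanity check before running the script, something along the following lines can confirm that R will find the data file. This is only a sketch; the folder path in the message is a hypothetical placeholder, not a location from this recipe.

```r
# Name of the data file used in this recipe
data_file <- "arabidopsis_genes.csv"

# Show where R is currently looking for files
cat("Current working directory:", getwd(), "\n")

# Warn early if the data file is not in that folder
found <- file.exists(data_file)
if (!found) {
  # "path/to/your/folder" is a hypothetical placeholder
  message("Cannot find ", data_file, "; change the working directory first, ",
          "for example with setwd(\"path/to/your/folder\")")
}
```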
The script will check automatically if any additional packages need to be installed in R. You can find more information about the installation of packages in the Getting ready section of the Creating your first heat map recipe.
Execute the following code in R via the 5644OS_03_01.r script and take a look at the PDF file custom_heatmaps.pdf that will be created in the current working directory:
    ### loading packages
    if (!require("gplots")) {
      install.packages("gplots", dependencies = TRUE)
      library(gplots)
    }
    if (!require("RColorBrewer")) {
      install.packages("RColorBrewer", dependencies = TRUE)
      library(RColorBrewer)
    }

    ### reading in data
    gene_data <- read.csv("arabidopsis_genes.csv")
    row_names <- gene_data[,1]
    gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
    rownames(gene_data) <- row_names

    ### setting heatmap.2() default parameters
    heat2 <- function(...) heatmap.2(gene_data,
      tracecol = "black", dendrogram = "column",
      Rowv = NA, trace = "none", margins = c(8,10),
      density.info = "density", ...)

    pdf("custom_heatmaps.pdf")

    ### 1) customizing colors
    # 1.1) in-built color palettes
    heat2(col = terrain.colors(n = 1000), main = "1.1) Terrain Colors")

    # 1.2) RColorBrewer palettes
    heat2(col = brewer.pal(n = 9, "YlOrRd"), main = "1.2) Brewer Palette")

    # 1.3) creating own color palettes
    my_colors <- c(y1 = "#F7F7D0", y2 = "#FCFC3A", y3 = "#D4D40D",
                   b1 = "#40EDEA", b2 = "#18B3F0", b3 = "#186BF0",
                   r1 = "#FA8E8E", r2 = "#F26666", r3 = "#C70404")
    heat2(col = my_colors, main = "1.3) Own Color Palette")
    my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
    heat2(col = my_palette, main = "1.3) ColorRampPalette")

    # 1.4) gray scale
    heat2(col = gray(level = (0:100)/100), main = "1.4) Gray Scale")

    ### 2) adding cell notes
    fold_change <- 2^gene_data
    rounded_fold_changes <- round(fold_change, 2)
    heat2(cellnote = rounded_fold_changes, notecex = 0.5,
          notecol = "black", col = my_palette, main = "2) Cell Notes")

    ### 3) adding column side colors
    heat2(ColSideColors = c("red", "gray", "red", rep("green",13)),
          main = "3) ColSideColors")

    dev.off()
Primarily, we will be using the already familiar functions from the previous recipes, read.csv() and heatmap.2(), to read data into R and construct our heat maps. In this recipe, however, we will focus on advanced features to enhance our heat maps, such as customizing color and other visual elements:
The arabidopsis_genes.csv file contains a compilation of gene expression data from the model plant Arabidopsis thaliana. I obtained the freely available data of 16 different genes as log2 ratios of target and reference gene from the Arabidopsis eFP Browser (http://bar.utoronto.ca/efp_arabidopsis/). For each gene, expression data of 47 different areas of the plant is available in this data file. We read the file into R, convert everything except the first column into a numeric matrix, and use the first column as row names:

    gene_data <- read.csv("arabidopsis_genes.csv")
    row_names <- gene_data[,1]
    gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
    rownames(gene_data) <- row_names
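To see what this conversion does without needing the data file, here is a minimal sketch on a tiny data frame. The values and the names toy, toy_names, and toy_matrix are made up for illustration; they are not taken from arabidopsis_genes.csv.

```r
# Made-up example data: the first column holds labels, the rest hold numbers
toy <- data.frame(id   = c("gene_a", "gene_b"),
                  root = c(0.5, -1.2),
                  leaf = c(1.8, 0.3),
                  stringsAsFactors = FALSE)

# Same three steps as for gene_data
toy_names  <- toy[, 1]
toy_matrix <- data.matrix(toy[, 2:ncol(toy)])
rownames(toy_matrix) <- toy_names

# The result is a numeric matrix with labelled rows and columns
print(toy_matrix)
print(dim(toy_matrix))
```

A numeric matrix is what heatmap.2() expects, which is why the label column is moved into the row names rather than kept as data.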
We define a wrapper around the heatmap.2() function now, where we will include some arguments that we are planning to keep using throughout this recipe:

    heat2 <- function(...) heatmap.2(gene_data,
      tracecol = "black", dendrogram = "column",
      Rowv = NA, trace = "none", margins = c(8,10),
      density.info = "density", ...)
So, each time we call our newly defined heat2() function, it will behave similarly to the heatmap.2() function, except for the additional arguments that we will pass along. We also pass a new value, black, to the tracecol parameter, to better distinguish the density plot in the color key from the background.
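The wrapper pattern relies on R's `...` argument, which forwards anything the caller supplies to the wrapped function. The sketch below shows the same pattern on a toy function; describe() and describe2() are invented stand-ins for heatmap.2() and heat2(), so that the example runs without any packages.

```r
# Toy function standing in for heatmap.2(); invented for this illustration
describe <- function(x, digits = 1, label = "value") {
  paste0(label, ": ", round(x, digits))
}

# Wrapper that fixes digits = 2 and forwards everything else through `...`
describe2 <- function(...) describe(digits = 2, ...)

res_default <- describe2(3.14159)                # uses the fixed digits = 2
res_label   <- describe2(3.14159, label = "pi")  # extra argument is passed along

# Supplying an argument that the wrapper already fixes is an error in R
res_clash <- tryCatch(describe2(3.14159, digits = 3),
                      error = function(e) "error: digits supplied twice")

print(c(res_default, res_label, res_clash))
```

The last call illustrates a limitation of this design: arguments fixed inside the wrapper, such as trace or margins in heat2(), cannot be overridden by the caller.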
So far, we have used the default color palette of heatmap.2(), heat.colors, which represents a color transition from red to yellow. There are four more color palettes available in base R that we could use instead of the heat.colors palette: rainbow, terrain.colors, topo.colors, and cm.colors. So let us make use of the terrain.colors color palette now, which will give us a nice color transition from green through yellow to rose:
    heat2(col = terrain.colors(n = 1000), main = "1.1) Terrain Colors")
If you recall the heat maps that we created in the previous two recipes, you might have noticed that the transition between the colors in the color key was not really smooth, but rather abrupt. The five color palettes mentioned previously allow us to define the number of color shades to use. Therefore, any value of the parameter n larger than the default value of 12 adds colors, which makes the transition smoother. A value of 1000 for the n parameter should be more than sufficient to make the transition between the individual colors indistinguishable to the human eye.
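The effect of n can be previewed without drawing a full heat map. The following sketch uses only base R; show_palette() is a hypothetical helper written for this illustration, and it draws each palette as a horizontal strip.

```r
# The same palette with few and with many shades
coarse <- terrain.colors(n = 12)
fine   <- terrain.colors(n = 1000)

print(length(coarse))
print(length(fine))

# Hypothetical helper: draw a palette as one horizontal strip of colors
show_palette <- function(cols, title) {
  image(seq_along(cols), 1, as.matrix(seq_along(cols)),
        col = cols, axes = FALSE, xlab = "", ylab = "", main = title)
}

# Stack both strips to compare the smoothness of the transition
op <- par(mfrow = c(2, 1), mar = c(1, 1, 3, 1))
show_palette(coarse, "terrain.colors, n = 12")
show_palette(fine, "terrain.colors, n = 1000")
par(op)
```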
The following image shows a side-by-side comparison of the heat.colors and terrain.colors color palettes using a different number of color shades:
Further, it is also possible to reverse the direction of the color transition. For example, if we want to have a heat.colors transition from yellow to red instead of red to yellow in our heat map, we could simply define a reverse function:

    rev_heat.colors <- function(x) rev(heat.colors(x))
    heat2(col = rev_heat.colors(500))
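A short check in base R can confirm which end of the palette is which. This is a sketch that only inspects the color vectors; the names forward and backward are introduced here for illustration.

```r
# Reverse function from the text above
rev_heat.colors <- function(x) rev(heat.colors(x))

forward  <- heat.colors(5)
backward <- rev_heat.colors(5)

# The first color of the original palette is pure red
print(col2rgb(forward[1]))

# The reversed palette ends where the original one starts
print(identical(backward, rev(forward)))
print(backward[5] == forward[1])
```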
More color palettes are available from the RColorBrewer package. To see what they look like, you can type display.brewer.all() into the R command line after loading the RColorBrewer package. However, in contrast to the dynamic range color palettes that we have seen previously, the RColorBrewer palettes have a fixed number of distinct colors. So to select all nine colors from the YlOrRd palette, a gradient from yellow to red, we use the following command:

    heat2(col = brewer.pal(n = 9, "YlOrRd"), main = "1.2) Brewer Palette")
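If nine colors are too coarse, one possible workaround is to interpolate between them with the colorRampPalette() function from base R, which this recipe covers further on. The sketch below assumes that the RColorBrewer package is installed and simply does nothing otherwise; smooth_ylorrd is a name made up for this example.

```r
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # The nine fixed colors of the YlOrRd palette
  base_colors <- RColorBrewer::brewer.pal(n = 9, name = "YlOrRd")

  # Interpolate between the nine colors to obtain 1000 shades
  smooth_ylorrd <- colorRampPalette(base_colors)(1000)

  print(length(base_colors))
  print(length(smooth_ylorrd))

  # Possible use with the wrapper from this recipe (not run here):
  # heat2(col = smooth_ylorrd, main = "Smoothed Brewer Palette")
}
```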
The following image gives you a good overview of all the different color palettes that are available from the RColorBrewer package:
To list the names of all colors that R knows, type colors() into the command line of R. The most convenient way to assign new colors to a color palette is using hex colors (hexadecimal colors). Many different online tools are freely available that allow us to obtain the necessary hex codes. A great example is color picker (http://www.colorpicker.com), which allows us to choose from a rich color table and provides us with the corresponding hex codes.
Once we gather all the hexadecimal codes for the colors that we want to use for our color palette, we can assign them to a variable as we have done before with the explicit color names:
    my_colors <- c(y1 = "#F7F7D0", y2 = "#FCFC3A", y3 = "#D4D40D",
                   b1 = "#40EDEA", b2 = "#18B3F0", b3 = "#186BF0",
                   r1 = "#FA8E8E", r2 = "#F26666", r3 = "#C70404")
    heat2(col = my_colors, main = "1.3) Own Color Palette")
This is a very handy approach for creating a color key with very distinct colors. However, the downside of this method is that we have to provide a lot of different colors if we want to create a smooth color gradient; we have used 1000 different colors for the terrain.colors() palette to get a smooth transition in the color key!
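Before handing a hand-written palette to a plotting function, it can help to check that R understands every hex code. The sketch below uses col2rgb() from base R for this; rgb_values is a name introduced for the example, and the last entry is named r3 here so that all names are unique.

```r
# Nine hex colors: three yellows, three blues, three reds
my_colors <- c(y1 = "#F7F7D0", y2 = "#FCFC3A", y3 = "#D4D40D",
               b1 = "#40EDEA", b2 = "#18B3F0", b3 = "#186BF0",
               r1 = "#FA8E8E", r2 = "#F26666", r3 = "#C70404")

# col2rgb() fails on malformed codes and otherwise returns
# one column of red, green, and blue values per color
rgb_values <- col2rgb(my_colors)
print(rgb_values)

# A result of 0 means that every name in the vector is unique
print(anyDuplicated(names(my_colors)))
```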
For a smooth color gradient, we can use the colorRampPalette() function, so we don't have to insert all the different colors manually. The function takes a vector of different colors as an argument. Here, we provide three colors: blue for the lower end of the color key, yellow for the middle range, and red for the higher end. As we did for the in-built color palettes, such as heat.colors, we assign the value 1000 to the n parameter:

    my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
    heat2(col = my_palette, main = "1.3) ColorRampPalette")
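It may be worth noting that colorRampPalette() does not return colors directly; it returns a function that generates as many interpolated colors as requested. The following sketch splits the one-line call into two steps to make this visible; ramp and three are names chosen for the illustration.

```r
# Step 1: build the palette function from three anchor colors
ramp <- colorRampPalette(c("blue", "yellow", "red"))
print(is.function(ramp))

# Step 2: ask the function for a number of colors
three      <- ramp(3)     # with n = 3 we get the anchor colors back
my_palette <- ramp(1000)  # same result as the one-line call with n = 1000

print(three)
print(length(my_palette))
```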
We can also use the gray palette to create a heat map that is optimized for gray-scale output. The level parameter of the gray() function takes a vector with values between 0 and 1 as an argument, where 0 represents black and 1 represents white. For a smooth gradient, we use a vector with 101 equally spaced shades of gray ranging from 0 to 1:

    heat2(col = gray(level = (0:100)/100), main = "1.4) Gray Scale")
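The end points of the gray scale can be verified directly in base R. In this small sketch, shades is a name introduced for the example.

```r
# Equally spaced gray levels from 0 (black) to 1 (white)
shades <- gray(level = (0:100) / 100)

print(length(shades))           # number of shades in the vector
print(shades[1])                # darkest shade
print(shades[length(shades)])   # lightest shade
```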
We can make use of the same color palettes for the levelplot() function too. It works in a similar way to the heatmap.2() function that we are using in this recipe. However, inside the levelplot() function call, we must pass the color palette to col.regions instead of the simple col.
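As a minimal sketch of this difference, the following example passes a palette to levelplot() from the lattice package via col.regions. The matrix toy and its values are made up for illustration and are not part of the recipe's data.

```r
library(lattice)

# Made-up 3 x 4 matrix standing in for real expression data
toy <- matrix(c(-2, -1, 0,
                 1,  2, 3,
                 0,  1, -1,
                 2, -2, 1),
              nrow = 3,
              dimnames = list(paste0("gene_", 1:3),
                              paste0("area_", 1:4)))

my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)

# Note col.regions here, where heatmap.2() would take col
p <- levelplot(toy, col.regions = my_palette,
               xlab = "gene", ylab = "area",
               main = "levelplot with col.regions")
print(p)
```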
As we recall, the data we read from arabidopsis_genes.csv represents log2 ratios of sample and reference gene expression levels. Let us calculate the fold changes of the gene expression levels now and display them, rounded to two digits after the decimal point, as cell notes on our heat map:

    fold_change <- 2^gene_data
    rounded_fold_changes <- round(fold_change, 2)
    heat2(cellnote = rounded_fold_changes, notecex = 0.5,
          notecol = "black", col = my_palette, main = "2) Cell Notes")
The notecex parameter controls the size of the cell notes. Its default value is 1; values between 0 and 1 make the font smaller, whereas values larger than 1 make the font larger. Here, we decreased the font size of the cell notes by 50 percent to fit them into the cell boundaries. Also, we want to display the cell notes in black to have a nice contrast to the colored background; this is controlled by the notecol parameter.
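The conversion from log2 ratios to fold changes is easy to verify on a few hand-picked numbers. The matrix below is invented for this illustration; a log2 ratio of 0 means no change, 1 means a doubling, and -1 means a halving.

```r
# Made-up log2 ratios for two genes in two plant areas
log2_ratios <- matrix(c(-1, 0, 1, 1.5),
                      nrow = 2,
                      dimnames = list(c("gene_a", "gene_b"),
                                      c("root", "leaf")))

# Undo the logarithm, then round for display as cell notes
fold_change <- 2^log2_ratios
rounded     <- round(fold_change, 2)

print(rounded)
```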
The ColSideColors argument will place a colored box between the dendrogram and heat map that can be used to annotate certain columns. We pass a vector of colors to ColSideColors; its length must be equal to the number of columns of the heat map. Here, we want to color the first and third columns red, the second one gray, and all the remaining 13 columns green:

    heat2(ColSideColors = c("red", "gray", "red", rep("green", 13)),
          main = "3) ColSideColors")
You can see in the following image what the column side colors look like when we include the ColSideColors argument as shown previously:
Attentive readers may have noticed that the order of colors in the column color box differs slightly from the order of colors we passed as a vector to ColSideColors. We see red twice next to each other, followed by a green and a gray box. This is because the columns of our heat map have been reordered by the hierarchical clustering algorithm.
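The reordering can be reproduced in base R on a small example. The sketch below clusters the columns of a made-up matrix roughly the way heatmap.2() is understood to do by default (hierarchical clustering of the columns, then reordering the dendrogram by column means); treat that description as an assumption rather than a specification. The names toy, side_colors, and col_order are invented for the illustration.

```r
# Made-up matrix: columns A and C are similar, as are B and D
toy <- matrix(c(1.0, 2.0, 3.0,
                9.0, 8.0, 7.0,
                1.1, 2.1, 3.2,
                8.9, 8.1, 7.2),
              nrow = 3,
              dimnames = list(paste0("gene_", 1:3),
                              c("A", "B", "C", "D")))

# Colors in the original column order
side_colors <- c(A = "red", B = "gray", C = "red", D = "green")

# Cluster the columns and extract the resulting left-to-right order
# (assumed to mirror the default behavior of heatmap.2())
col_dend  <- as.dendrogram(hclust(dist(t(toy))))
col_dend  <- reorder(col_dend, colMeans(toy))
col_order <- order.dendrogram(col_dend)

# The side colors as they would appear after clustering
print(side_colors[col_order])
```

Because columns A and C end up in the same cluster, their two red boxes sit next to each other even though a gray column separated them in the input vector.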