### R Graphs Cookbook

Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications

With more than two million users worldwide, R is one of the most popular open source projects. It is a free and robust statistical programming environment with very powerful graphical capabilities. Analyzing and visualizing data with R is a necessary skill for anyone doing any kind of statistical analysis.

In this article by **Hrishi V. Mittal**, author of the book R Graph Cookbook, we will learn some intermediate to advanced recipes for customizing line graphs even further. We will look at ways to improve and speed up line graphs with multiple lines representing more than one variable.

# Adding customized legends for multiple line graphs

Line graphs with more than one line, representing more than one variable, are quite common in any kind of data analysis. In this recipe we will learn how to create and customize legends for such graphs.

## Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

First we need to load the cityrain.csv example data file, which contains monthly rainfall data for four major cities across the world. You can download this file from here.

rain<-read.csv("cityrain.csv")

plot(rain$Tokyo,type="b",lwd=2,

xaxt="n",ylim=c(0,300),col="black",

xlab="Month",ylab="Rainfall (mm)",

main="Monthly Rainfall in major cities")

axis(1,at=1:length(rain$Month),labels=rain$Month)

lines(rain$Berlin,col="red",type="b",lwd=2)

lines(rain$NewYork,col="orange",type="b",lwd=2)

lines(rain$London,col="purple",type="b",lwd=2)

legend("topright",legend=c("Tokyo","Berlin","New York","London"),

lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),

ncol=2,bty="n",cex=0.8,

text.col=c("black","red","orange","purple"),

inset=0.01)

## How it works...

We used the *legend()* function. It is quite a flexible function and allows us to adjust the placement and styling of the legend in many ways.

The first argument we passed to *legend()* specifies the position of the legend within the plot region. We used "*topright*"; other possible values are "*bottomright*", "*bottom*", "*bottomleft*", "*left*", "*topleft*", "*top*", "*right*", and "*center*". We can also specify the location of the legend with x and y co-ordinates, as we will soon see.

The other important arguments specific to lines are *lwd* and *lty*, which specify the line width and type drawn in the legend box respectively. It is important to keep these the same as the corresponding values in the *plot()* and *lines()* commands. We also set *pch* to *21* to replicate the *type="b"* argument in the *plot()* command. *cex* and *text.col* set the size and colors of the legend text. Note that we set the text colors to the same colors as the lines they represent. Setting *bty* (box type) to "*n*" ensures no box is drawn around the legend. This is good practice as it keeps the look of the graph clean. *ncol* sets the number of columns over which the legend labels are spread and *inset* sets the inset distance from the margins as a fraction of the plot region.
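
One way to keep the legend and the lines in step is to define the styling once and reuse it in both places. The following is a minimal sketch rather than part of the recipe: the data frame *rain_demo* and the variables *cities*, *line_col*, *line_lwd*, *line_lty* and *line_pch* are hypothetical names, and the rainfall values are made up to stand in for *cityrain.csv*.

```r
# Stand-in data with made-up values; in practice use read.csv("cityrain.csv")
rain_demo <- data.frame(
  Month   = month.abb,
  Tokyo   = c(50, 60, 110, 125, 140, 165, 155, 170, 210, 200, 95, 50),
  Berlin  = c(42, 33, 40, 37, 54, 69, 56, 58, 45, 37, 44, 55),
  NewYork = c(93, 78, 110, 114, 107, 112, 117, 107, 109, 112, 102, 102),
  London  = c(55, 41, 42, 44, 49, 45, 45, 50, 49, 69, 59, 55)
)

# Define the styling once so the lines and the legend cannot drift apart
cities   <- c("Tokyo", "Berlin", "NewYork", "London")
line_col <- c("black", "red", "orange", "purple")
line_lwd <- 2
line_lty <- 1
line_pch <- 21

# Set up an empty plot, then add one line per city
plot(rain_demo$Tokyo, type = "n", xaxt = "n", ylim = c(0, 300),
     xlab = "Month", ylab = "Rainfall (mm)",
     main = "Monthly Rainfall in major cities")
axis(1, at = seq_along(rain_demo$Month), labels = rain_demo$Month)

for (i in seq_along(cities)) {
  lines(rain_demo[[cities[i]]], type = "b", pch = line_pch,
        col = line_col[i], lwd = line_lwd, lty = line_lty)
}

# The legend reuses exactly the same vectors
legend("topright", legend = cities, col = line_col, text.col = line_col,
       lty = line_lty, lwd = line_lwd, pch = line_pch,
       ncol = 2, bty = "n", cex = 0.8, inset = 0.01)
```

With this layout, changing a color or line width in one place updates both the plot and its legend.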

## There's more...

Let's experiment by changing some of the arguments discussed:

legend(1,300,legend=c("Tokyo","Berlin","New York","London"),

lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),

horiz=TRUE,bty="n",bg="yellow",cex=1,

text.col=c("black","red","orange","purple"))

This time we used x and y co-ordinates instead of a keyword to position the legend. We also set the *horiz* argument to *TRUE*. As the name suggests, *horiz* lays the legend labels out horizontally instead of the default vertical arrangement. Specifying *horiz* overrides the *ncol* argument. Finally, we made the legend text bigger by setting *cex* to *1* and did not use the *inset* argument. Note that the *bg* setting has no visible effect here, because the legend background is only drawn when *bty* is not "*n*".

An alternative way of creating the previous plot without having to call *plot()* and *lines()* multiple times is to use the *matplot()* function. To see details on how to use this function, please see the help file by running *?matplot* or *help(matplot)* at the R prompt.
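
As a rough illustration of the *matplot()* approach, the sketch below draws all four lines in one call. It is an assumption-laden example, not the book's code: *rain_demo*, *rain_mat* and *line_col* are hypothetical names, and the values are made up in place of *cityrain.csv*.

```r
# Stand-in data with made-up values; in practice use read.csv("cityrain.csv")
rain_demo <- data.frame(
  Month   = month.abb,
  Tokyo   = c(50, 60, 110, 125, 140, 165, 155, 170, 210, 200, 95, 50),
  Berlin  = c(42, 33, 40, 37, 54, 69, 56, 58, 45, 37, 44, 55),
  NewYork = c(93, 78, 110, 114, 107, 112, 117, 107, 109, 112, 102, 102),
  London  = c(55, 41, 42, 44, 49, 45, 45, 50, 49, 69, 59, 55)
)

# matplot() plots each column of a matrix as a separate line
rain_mat <- as.matrix(rain_demo[, -1])
line_col <- c("black", "red", "orange", "purple")

matplot(rain_mat, type = "b", pch = 21, lty = 1, lwd = 2,
        col = line_col, xaxt = "n", ylim = c(0, 300),
        xlab = "Month", ylab = "Rainfall (mm)",
        main = "Monthly Rainfall in major cities")
axis(1, at = seq_len(nrow(rain_demo)), labels = rain_demo$Month)

# The legend labels come straight from the column names
legend("topright", legend = colnames(rain_mat), col = line_col,
       text.col = line_col, lty = 1, lwd = 2, pch = 21,
       ncol = 2, bty = "n", cex = 0.8, inset = 0.01)
```

Because the column order of the matrix determines the order of the lines, the same *line_col* vector can be passed to both *matplot()* and *legend()*.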

# Using margin labels instead of legends for multiple line graphs

While legends are the most commonly used method of providing a key to read multiple variable graphs, they are often not the easiest to read. Labelling lines directly is one way of getting around that problem.

## Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's use the *gdp_long.txt* example dataset to look at the trends in the annual GDP of five countries:

gdp<-read.table("gdp_long.txt",header=TRUE)

library(RColorBrewer)

pal<-brewer.pal(5,"Set1")

par(mar=par()$mar+c(0,0,0,2),bty="l")

plot(Canada~Year,data=gdp,type="l",lwd=2,lty=1,ylim=c(30,60),

col=pal[1],main="Percentage change in GDP",ylab="")

mtext(side=4,at=gdp$Canada[length(gdp$Canada)],text="Canada",

col=pal[1],line=0.3,las=2)

lines(gdp$France~gdp$Year,col=pal[2],lwd=2)

mtext(side=4,at=gdp$France[length(gdp$France)],text="France",

col=pal[2],line=0.3,las=2)

lines(gdp$Germany~gdp$Year,col=pal[3],lwd=2)

mtext(side=4,at=gdp$Germany[length(gdp$Germany)],text="Germany",

col=pal[3],line=0.3,las=2)

lines(gdp$Britain~gdp$Year,col=pal[4],lwd=2)

mtext(side=4,at=gdp$Britain[length(gdp$Britain)],text="Britain",

col=pal[4],line=0.3,las=2)

lines(gdp$USA~gdp$Year,col=pal[5],lwd=2)

mtext(side=4,at=gdp$USA[length(gdp$USA)]-2,

text="USA",col=pal[5],line=0.3,las=2)

## How it works...

We first read the *gdp_long.txt* data file using the *read.table()* function. Next we loaded the *RColorBrewer* color palette library and set our color palette *pal* to "*Set1*" (with five colors).

Before drawing the graph, we used the *par()* command to add extra space to the right margin, so that we have enough space for the labels. Depending on the size of the text labels you may have to experiment with this margin until you get it right. Finally, we set the box type (*bty*) to an L-shape ("*l*") so that there is no line on the right margin. We can also set it to "*c*" if we want to keep the top line.

We used the *mtext()* function to label each of the lines individually in the right margin. The first argument we passed to the function is the side where we want the label to be placed. Sides (margins) are numbered starting from 1 for the bottom side and going round in a clockwise direction so that *2* is left, *3* is top, and *4* is right.

The *at* argument was used to specify the Y co-ordinate of the label. This is a bit tricky because we have to make sure we place the label as close to the corresponding line as possible. So, here we have used the last value of each line. For example, *gdp$France[length(gdp$France)]* picks the last value in the France vector by using its length as the index. Note that we had to adjust the value for USA by subtracting 2 from its last value so that it doesn't overlap the label for Canada.

We used the *text* argument to set the text of the labels as country names. We set the *col* argument to the appropriate element of the *pal* vector by using a number index. The *line* argument sets an offset in terms of margin lines, starting at *0* and counting outwards. Finally, setting *las* to *2* rotates the labels to be perpendicular to the axis, instead of the default value of *0*, which makes them parallel to the axis.

Sometimes, simply using the last value of a set of values may not work because the value may be missing. In that case we can use the second last value or visually choose a value that places the label closest to the line. Also, the size of the plot window and the proximity of the final values may cause overlapping of labels. So, we may need to iterate a few times before we get the placement right. We can write functions to automate this process but it is still good to visually inspect the outcome.
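
A small helper can take care of the missing-value case. The sketch below is one possible way to do it, not the author's code: *last_value()* is a hypothetical helper, and *gdp_demo* holds made-up numbers for two placeholder series, *A* and *B*.

```r
# Hypothetical helper: return the last non-missing value of a vector
last_value <- function(x) {
  ok <- which(!is.na(x))
  if (length(ok) == 0) return(NA_real_)
  x[max(ok)]
}

# Made-up data; series A has a missing final value
gdp_demo <- data.frame(
  Year = 2000:2005,
  A    = c(40, 42, 43, 45, 46, NA),
  B    = c(35, 36, 38, 37, 39, 41)
)
series <- c("A", "B")
cols   <- c("red", "blue")

# Widen the right margin for the labels and remember the old settings
old_par <- par(mar = par("mar") + c(0, 0, 0, 2), bty = "l")

plot(gdp_demo$Year, gdp_demo$A, type = "n",
     ylim = range(unlist(gdp_demo[series]), na.rm = TRUE),
     xlab = "Year", ylab = "")

# Draw each line and label it at its last available value
for (i in seq_along(series)) {
  y <- gdp_demo[[series[i]]]
  lines(gdp_demo$Year, y, col = cols[i], lwd = 2)
  mtext(series[i], side = 4, at = last_value(y),
        col = cols[i], line = 0.3, las = 2)
}

par(old_par)  # restore the previous margins and box type
```

This does not resolve overlapping labels, so the result still needs a visual check.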

# Adding horizontal and vertical grid lines

In this recipe we will learn how to add and customize grid lines to graphs.

## Getting ready

We will use the base graphics for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's use the city rainfall example again to see how we can add grid lines to that graph:

rain<-read.csv("cityrain.csv")

plot(rain$Tokyo,type="b",lwd=2,

xaxt="n",ylim=c(0,300),col="black",

xlab="Month",ylab="Rainfall (mm)",

main="Monthly Rainfall in Tokyo")

axis(1,at=1:length(rain$Month),labels=rain$Month)

grid()

## How it works...

It's as simple as that! Adding a default grid only requires calling the *grid()* function without any arguments. *grid()* automatically computes the number of cells in the grid and aligns the grid lines with the tick marks on the default axes. It uses the *abline()* function (which we will see again in the next recipe) to draw the grid lines.

## There's more...

We can specify the number of grid cells using the *nx* and *ny* arguments, which control the vertical and horizontal grid lines respectively. By default, these two arguments are set to *NULL*, which aligns the grid lines with the default tick marks in both the *X* and *Y* directions. If we do not wish to draw grid lines in a particular direction, we can set *nx* or *ny* to *NA*. If *nx* is set to *NA*, no vertical grid lines are drawn, and if *ny* is set to *NA*, no horizontal grid lines are drawn.

The default grid lines are very thin and light colored, so they can barely be seen. We can customize the styling of the grid lines using the *lwd*, *lty*, and *col* arguments.

grid(nx=NA, ny=8,

lwd=1,lty=2,col="blue")
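
When *ny* is given as a number, the axis range is split into that many equal cells, so the grid lines need not coincide with the tick marks. If that matters, one alternative is to draw the lines with *abline()* at the tick positions returned by *axTicks()*. The following is a minimal sketch under that assumption; *tokyo_demo* and *y_ticks* are hypothetical names and the values are made up.

```r
# Made-up monthly rainfall values standing in for rain$Tokyo
tokyo_demo <- c(50, 60, 110, 125, 140, 165, 155, 170, 210, 200, 95, 50)

plot(tokyo_demo, type = "b", lwd = 2, xaxt = "n", ylim = c(0, 300),
     xlab = "Month", ylab = "Rainfall (mm)",
     main = "Monthly Rainfall in Tokyo")
axis(1, at = seq_along(tokyo_demo), labels = month.abb)

# axTicks(2) returns the tick positions R chose for the Y axis
y_ticks <- axTicks(2)

# Draw horizontal grid lines exactly at those tick positions
abline(h = y_ticks, lty = 2, col = "blue")
```

The same idea works for vertical lines by passing tick positions to the *v* argument.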

# Adding marker lines at specific X and Y values

Sometimes we may only want to draw one or a few lines to indicate specific cut-off or threshold values. In this recipe, we will learn how to do that using the *abline()* function.

## Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's draw a vertical line at the month of September in the rainfall graph for Tokyo:

rain <- read.csv("cityrain.csv")

plot(rain$Tokyo,type="b",lwd=2,

xaxt="n",ylim=c(0,300),col="black",

xlab="Month",ylab="Rainfall (mm)",

main="Monthly Rainfall in Tokyo")

axis(1,at=1:length(rain$Month),labels=rain$Month)

abline(v=9)

## How it works...

To draw marker lines with *abline()* at specific *X* or *Y* locations, we have to set the *v* (as in vertical) or *h* (as in horizontal) arguments respectively. In the example, we set *v=9* (the index of the month September in the *Month* vector).

## There's more...

Now let's add a red dashed horizontal line to the graph to denote a high rainfall cutoff of 150 mm:

abline(h=150,col="red",lty=2)
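
Both *v* and *h* accept vectors, so several marker lines can be drawn in one call. The sketch below marks every month above the cutoff and labels the cutoff line with *text()*. It is an illustration only: *tokyo_demo*, *cutoff* and *wet_months* are hypothetical names and the values are made up.

```r
# Made-up monthly rainfall values standing in for rain$Tokyo
tokyo_demo <- c(50, 60, 110, 125, 140, 165, 155, 170, 210, 200, 95, 50)
cutoff <- 150

plot(tokyo_demo, type = "b", lwd = 2, xaxt = "n", ylim = c(0, 300),
     xlab = "Month", ylab = "Rainfall (mm)",
     main = "Monthly Rainfall in Tokyo")
axis(1, at = seq_along(tokyo_demo), labels = month.abb)

# Indices of the months whose rainfall exceeds the cutoff
wet_months <- which(tokyo_demo > cutoff)

# One call draws a dotted vertical line for each of those months
abline(v = wet_months, col = "blue", lty = 3)

# Dashed horizontal line at the cutoff, with a label just above it
abline(h = cutoff, col = "red", lty = 2)
text(x = 2, y = cutoff, labels = "150 mm cutoff", pos = 3, col = "red")
```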

# Creating sparklines

Sparklines are small and simple line graphs, useful for summarizing trend data in a small space. The word "sparklines" was coined by Prof. Edward Tufte. In this recipe we will learn how to make sparklines using a basic *plot()* function.

## Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt.

## How to do it...

Let's represent our city rainfall data in the form of sparklines:

rain <- read.csv("cityrain.csv")

par(mfrow=c(4,1),mar=c(5,7,4,2),omi=c(0.2,2,0.2,2))

for(i in 2:5)

{

plot(rain[,i],ann=FALSE,axes=FALSE,type="l",

col="gray",lwd=2)

mtext(side=2,at=mean(rain[,i]),names(rain[i]),

las=2,col="black")

mtext(side=4,at=mean(rain[,i]),mean(rain[,i]),

las=2,col="black")

points(which.min(rain[,i]),min(rain[,i]),pch=19,col="blue")

points(which.max(rain[,i]),max(rain[,i]),pch=19,col="red")

}

## How it works...

The key feature of sparklines is to show the trend in the data with just one line without any axis annotations. In the example, we have shown the trend with a gray line. The minimum and maximum values for each line are represented by blue and red dots respectively, while the mean value is displayed on the right margin.

Since sparklines have to be very small graphics, we first set the margins such that the plot area is small and the outer margins are large. We did this by setting the outer margins in inches using the *omi* argument of the *par()* function. Depending on the dimensions of the plot, sometimes R may produce an error saying that the figure margins are too large and not draw the graph. In that case, we need to try lower values for the margins. Note we also set up a 4x1 layout with the *mfrow* argument.

Next we set up a for loop to draw a sparkline for each of the four cities. We drew the line with the *plot()* command, setting both annotations (*ann*) and *axes* to *FALSE*. Then we used the *mtext()* function to place the name of the city and the mean value of rainfall to the left and right of the line respectively. Finally, we plotted the minimum and maximum values using the *points()* command. Note that we used the *which.min()* and *which.max()* functions to get the indices of the minimum and maximum values respectively, and used them as the *x* values for the *points()* function calls.
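
Because *par()* settings such as *mfrow* persist for later plots, it can help to wrap the loop in a function that restores them afterwards. The sketch below is one way to do that, not the recipe's code: *draw_sparklines()* is a hypothetical helper, the margins are an arbitrary choice, and *spark_demo* contains made-up values.

```r
# Hypothetical helper: one sparkline per numeric column of a data frame
draw_sparklines <- function(df) {
  # Stack the panels vertically; par() returns the previous settings
  old_par <- par(mfrow = c(ncol(df), 1), mar = c(1, 7, 1, 7))
  on.exit(par(old_par))  # restore the settings when the function exits

  for (i in seq_along(df)) {
    y <- df[[i]]
    plot(y, ann = FALSE, axes = FALSE, type = "l", col = "gray", lwd = 2)
    # Column name on the left, rounded mean on the right
    mtext(names(df)[i], side = 2, at = mean(y), las = 2)
    mtext(format(round(mean(y), 1)), side = 4, at = mean(y), las = 2)
    # Mark the minimum in blue and the maximum in red
    points(which.min(y), min(y), pch = 19, col = "blue")
    points(which.max(y), max(y), pch = 19, col = "red")
  }
  invisible(sapply(df, mean))  # return the column means without printing
}

# Made-up values for two cities
spark_demo <- data.frame(
  Tokyo  = c(50, 60, 110, 125, 140, 165, 155, 170, 210, 200, 95, 50),
  London = c(55, 41, 42, 44, 49, 45, 45, 50, 49, 69, 59, 55)
)
spark_means <- draw_sparklines(spark_demo)
```

Smaller margins than in the recipe are used here to reduce the chance of the "figure margins too large" error; they may still need adjusting for a given plot window.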

# Plotting functions of a variable in a dataset

Sometimes we may wish to visualize the effect of applying a mathematical function to a set of values, instead of the original variable itself. In this recipe, we will see a simple method to plot functions of variables.

## Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt.

## How to do it...

Let's say we want to plot the difference in rainfall between Berlin and London. We can do that just by passing the correct expression to the *plot()* function:

rain <- read.csv("cityrain.csv")

plot(rain$Berlin-rain$London,type="l",lwd=2,

xaxt="n",col="blue",

xlab="Month",ylab="Difference in Rainfall (mm)",

main="Difference in Rainfall between Berlin and London (Berlin-London)")

axis(1,at=1:length(rain$Month),labels=rain$Month)

abline(h=0,col="red")

## How it works...

So, plotting a function of a variable is as simple as passing an expression to the *plot()* function. In the example, the function consisted of two variables in the dataset. We can also plot transformations applied to any one variable.
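
As an example of a transformation applied to a single variable, the sketch below plots cumulative rainfall over the year using *cumsum()*. The vector *tokyo_demo* is a hypothetical stand-in with made-up values, not the contents of *cityrain.csv*.

```r
# Made-up monthly rainfall values standing in for rain$Tokyo
tokyo_demo <- c(50, 60, 110, 125, 140, 165, 155, 170, 210, 200, 95, 50)

# Running total of rainfall from January onwards
cum_rain <- cumsum(tokyo_demo)

plot(cum_rain, type = "l", lwd = 2, xaxt = "n", col = "blue",
     xlab = "Month", ylab = "Cumulative Rainfall (mm)",
     main = "Cumulative Rainfall in Tokyo")
axis(1, at = seq_along(cum_rain), labels = month.abb)
```

The expression could equally be passed inline, as in *plot(cumsum(tokyo_demo), ...)*; storing it first just makes the result easier to inspect.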

## There's more...

As another simple example, let's see how we can plot a polynomial function of a set of numbers:

x<-1:100

y<-x^3-6*x^2+5*x+10

plot(y~x,type="l",main=expression(f(x)==x^3-6*x^2+5*x+10))

In this example we defined *y* as a polynomial function of a vector of the numbers *1* to *100* and then plotted it using the *plot()* function. Note that we used the *expression()* function to format the title of the graph. By using *expression()* we could get the power values as superscripts.
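
Base R also provides the *curve()* function, which evaluates an expression in *x* over a range and plots it directly. The following sketch draws the same polynomial that way; the name *pts* is a hypothetical variable holding the evaluated points that *curve()* returns invisibly.

```r
# curve() evaluates the expression at n points between from and to
pts <- curve(x^3 - 6*x^2 + 5*x + 10, from = 1, to = 100, n = 100,
             ylab = "f(x)",
             main = expression(f(x) == x^3 - 6*x^2 + 5*x + 10))

# pts is a list with the x values used and the corresponding y values
str(pts)
```

This avoids creating the *x* and *y* vectors by hand, at the cost of the expression having to be written in terms of *x*.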

# Summary

This article discussed some intermediate to advanced recipes for customizing line graphs and improving and speeding up line graphs with multiple lines.

In the next article we will learn some intermediate to advanced recipes for processing dates to make time series charts and stock charts.

## About the Author

## Hrishi V. Mittal

Hrishi Mittal has been working with R for a few years in different capacities. He was introduced to the exciting world of data analysis with R when he was working as Senior Air Quality Scientist at King’s College London, where he used R extensively to analyze large amounts of air pollution and traffic data for informing the London Mayor’s Air Quality Strategy. He has experience in various other programming languages, but prefers R for data analysis and visualization. He is actively involved in various R mailing lists, forums and the development of some R packages.

In early 2010, he started Pretty Graph Limited (http://www.prettygraph.com), a software company specializing in web-based data visualization products. The company’s flagship product Pretty Graph uses R as the backend engine for helping researchers and businesses visualize and analyze data. The goal is to bring the power of R to a wider audience by providing a modern graphical user interface which can be accessed by anyone and from anywhere simply using a web browser.