# Creating Line Graphs in R

January 2011

## R Graph Cookbook

- Detailed hands-on recipes for creating the most useful types of graphs in R, from the simplest versions to more advanced applications
- Learn to draw any type of graph or visual data representation in R
- Filled with practical tips and techniques for creating any type of graph you need, not just theoretical explanations
- All examples are accompanied by the corresponding graph images, so you know what the results look like
- Each recipe is independent and contains the complete explanation and code to perform the task as efficiently as possible


# Adding customized legends for multiple line graphs

Line graphs with more than one line, representing more than one variable, are quite common in any kind of data analysis. In this recipe we will learn how to create and customize legends for such graphs.

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

First we need to load the cityrain.csv example data file, which contains monthly rainfall data for four major cities across the world. We then draw one line per city and add a legend:

rain<-read.csv("cityrain.csv") # assumes the file is in the working directory

plot(rain$Tokyo,type="b",lwd=2,
xaxt="n",ylim=c(0,300),col="black",
xlab="Month",ylab="Rainfall (mm)",
main="Monthly Rainfall in major cities")
axis(1,at=1:length(rain$Month),labels=rain$Month)
lines(rain$Berlin,col="red",type="b",lwd=2)
lines(rain$NewYork,col="orange",type="b",lwd=2)
lines(rain$London,col="purple",type="b",lwd=2)

legend("topright",legend=c("Tokyo","Berlin","New York","London"),
lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),
ncol=2,bty="n",cex=0.8,
text.col=c("black","red","orange","purple"),
inset=0.01)

## How it works...

We used the legend() function. It is quite a flexible function and allows us to adjust the placement and styling of the legend in many ways.

The first argument we passed to legend() specifies the position of the legend within the plot region. We used "topright"; other possible values are "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "right", and "center". We can also specify the location of the legend with x and y co-ordinates, as we will soon see.

The other important arguments specific to lines are lwd and lty which specify the line width and type drawn in the legend box respectively. It is important to keep these the same as the corresponding values in the plot() and lines() commands. We also set pch to 21 to replicate the type="b" argument in the plot() command. cex and text.col set the size and colors of the legend text. Note that we set the text colors to the same colors as the lines they represent. Setting bty (box type) to "n" ensures no box is drawn around the legend. This is good practice as it keeps the look of the graph clean. ncol sets the number of columns over which the legend labels are spread and inset sets the inset distance from the margins as a fraction of the plot region.

## There's more...

Let's experiment by changing some of the arguments discussed:

legend(1,300,legend=c("Tokyo","Berlin","New York","London"),
lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),
horiz=TRUE,bty="n",bg="yellow",cex=1,
text.col=c("black","red","orange","purple"))

This time we used x and y co-ordinates instead of a keyword to position the legend. We also set the horiz argument to TRUE. As the name suggests, horiz lays the legend labels out horizontally instead of the default vertical arrangement. Specifying horiz overrides the ncol argument. Finally, we made the legend text bigger by setting cex to 1 and did not use the inset argument. Note that bg has no visible effect here, because the legend background is only drawn when bty is not "n".

An alternative way of creating the previous plot without having to call plot() and lines() multiple times is to use the matplot() function. To see details on how to use this function, please see the help file by running ?matplot or help(matplot) at the R prompt.
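
As a rough sketch of the matplot() alternative, the following draws all four city lines in one call and then adds the same legend as before. The data frame here is an invented stand-in for cityrain.csv (the numbers are made up for illustration), and the column names are assumed to match those used in the recipe.

```r
# Illustrative stand-in for cityrain.csv; the values are invented for this sketch
rain <- data.frame(
  Month   = month.abb,
  Tokyo   = c(50, 70, 110, 130, 140, 170, 150, 160, 210, 200, 90, 55),
  Berlin  = c(40, 35, 40, 40, 55, 70, 55, 60, 45, 35, 50, 55),
  NewYork = c(90, 75, 100, 100, 105, 100, 110, 105, 100, 95, 100, 95),
  London  = c(50, 40, 40, 45, 50, 45, 45, 50, 50, 70, 60, 55)
)

cities <- c("Tokyo", "Berlin", "NewYork", "London")
cols   <- c("black", "red", "orange", "purple")

if (!interactive()) pdf(NULL)  # avoid writing Rplots.pdf when run as a script

# matplot() draws every column of the matrix as its own line in a single call
matplot(as.matrix(rain[, cities]), type = "b", lty = 1, lwd = 2, pch = 21,
        col = cols, xaxt = "n", ylim = c(0, 300),
        xlab = "Month", ylab = "Rainfall (mm)",
        main = "Monthly Rainfall in major cities")
axis(1, at = seq_along(rain$Month), labels = rain$Month)

# Same legend settings as in the recipe, reusing the colour vector
legend("topright", legend = c("Tokyo", "Berlin", "New York", "London"),
       lty = 1, lwd = 2, pch = 21, col = cols,
       ncol = 2, bty = "n", cex = 0.8, text.col = cols, inset = 0.01)
```

Keeping the colours in one vector (cols) means the lines, legend symbols, and legend text cannot drift out of sync.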

# Using margin labels instead of legends for multiple line graphs

While legends are the most commonly used method of providing a key to read multiple variable graphs, they are often not the easiest to read. Labelling lines directly is one way of getting around that problem.

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's use the gdp.txt example dataset to look at the trends in the annual GDP of five countries:

gdp<-read.table("gdp.txt",header=TRUE) # assumes a header row: Year plus one column per country

library(RColorBrewer)
pal<-brewer.pal(5,"Set1")

par(mar=par()$mar+c(0,0,0,2),bty="l")

plot(Canada~Year,data=gdp,type="l",lwd=2,
ylim=range(gdp[,c("Canada","France","Germany","Britain","USA")],na.rm=TRUE), # assumed: span all five series
col=pal[1],main="Percentage change in GDP",ylab="")

mtext(side=4,at=gdp$Canada[length(gdp$Canada)],text="Canada",
col=pal[1],line=0.3,las=2)
lines(gdp$France~gdp$Year,col=pal[2],lwd=2)

mtext(side=4,at=gdp$France[length(gdp$France)],text="France",
col=pal[2],line=0.3,las=2)

lines(gdp$Germany~gdp$Year,col=pal[3],lwd=2)

mtext(side=4,at=gdp$Germany[length(gdp$Germany)],text="Germany",
col=pal[3],line=0.3,las=2)

lines(gdp$Britain~gdp$Year,col=pal[4],lwd=2)

mtext(side=4,at=gdp$Britain[length(gdp$Britain)],text="Britain",
col=pal[4],line=0.3,las=2)

lines(gdp$USA~gdp$Year,col=pal[5],lwd=2)

mtext(side=4,at=gdp$USA[length(gdp$USA)]-2,
text="USA",col=pal[5],line=0.3,las=2)

## How it works...

We first read the gdp.txt data file using the read.table() function. Next we loaded the RColorBrewer color palette library and set our color palette pal to "Set1" (with five colors).

Before drawing the graph, we used the par() command to add extra space to the right margin, so that we have enough space for the labels. Depending on the size of the text labels you may have to experiment with this margin until you get it right. Finally, we set the box type (bty) to an L-shape ("l") so that there is no line on the right margin. We can also set it to "c" if we want to keep the top line.

We used the mtext() function to label each of the lines individually in the right margin. The first argument we passed to the function is the side where we want the label to be placed. Sides (margins) are numbered starting from 1 for the bottom side and going round in a clockwise direction so that 2 is left, 3 is top, and 4 is right.

The at argument was used to specify the Y co-ordinate of the label. This is a bit tricky because we have to make sure we place the label as close to the corresponding line as possible. So, here we have used the last value of each line. For example, gdp$France[length(gdp$France)] picks the last value in the France vector by using its length as the index. Note that we had to adjust the value for USA by subtracting 2 from its last value so that it doesn't overlap the label for Canada.

We used the text argument to set the text of the labels as country names. We set the col argument to the appropriate element of the pal vector by using a number index. The line argument sets an offset in terms of margin lines, starting at 0 and counting outwards. Finally, setting las to 2 rotates the labels to be perpendicular to the axis, instead of the default value of 0, which makes them parallel to the axis.

Sometimes, simply using the last value of a set of values may not work because the value may be missing. In that case we can use the second last value or visually choose a value that places the label closest to the line. Also, the size of the plot window and the proximity of the final values may cause overlapping of labels. So, we may need to iterate a few times before we get the placement right. We can write functions to automate this process but it is still good to visually inspect the outcome.
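
One possible way to automate this is sketched below. It is not the method used in the recipe (which adjusts the USA label by hand). Both helper functions, last_value() and spread_labels(), are hypothetical names introduced here, and the series values are invented. The first helper skips trailing missing values; the second nudges labels upwards until neighbours are at least min_gap apart.

```r
# Last non-missing value of a numeric vector
last_value <- function(x) {
  x <- x[!is.na(x)]
  x[length(x)]
}

# Push labels apart so that neighbouring labels are at least `min_gap` apart,
# working upwards from the lowest label
spread_labels <- function(y, min_gap) {
  ord <- order(y)
  out <- y[ord]
  for (k in seq_along(out)[-1]) {
    if (out[k] - out[k - 1] < min_gap) out[k] <- out[k - 1] + min_gap
  }
  y[ord] <- out
  y
}

# Invented example series; France ends in a missing value
series <- list(Canada = c(2.1, 2.5, 3.0),
               USA    = c(1.8, 2.9, 3.1),
               France = c(1.0, 1.4, NA))

ends    <- sapply(series, last_value)        # label positions before adjustment
label_y <- spread_labels(ends, min_gap = 0.5)
```

Each element of label_y could then be passed as the at argument of mtext(), with names(label_y) supplying the label text. A suitable min_gap depends on the Y scale and the size of the plot window, so it is still worth inspecting the result visually.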


# Adding horizontal and vertical grid lines

In this recipe we will learn how to add grid lines to graphs and customize them.

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's use the city rainfall example again to see how we can add grid lines to that graph:

plot(rain$Tokyo,type="b",lwd=2,
xaxt="n",ylim=c(0,300),col="black",
xlab="Month",ylab="Rainfall (mm)",
main="Monthly Rainfall in Tokyo")
axis(1,at=1:length(rain$Month),labels=rain$Month)

grid()

## How it works...

It's as simple as that! Adding a simple default grid just needs calling the grid() function without passing any arguments. grid() automatically computes the number of cells in the grid and aligns with the tick marks on the default axes. It uses the abline() function (which we will see again in the next recipe) to draw the grid lines.

## There's more...

We can control the number of grid cells using the nx and ny arguments: nx sets the number of cells in the X direction (and so the vertical grid lines), and ny sets the number of cells in the Y direction (the horizontal grid lines). By default, these two arguments are set to NULL, which aligns the grid lines with the axis tick marks in both X and Y directions. If we do not wish to draw grid lines in a particular direction, we can set nx or ny to NA. If nx is set to NA, no vertical grid lines are drawn and if ny is set to NA, no horizontal grid lines are drawn.

The default grid lines are so thin and light colored that they can barely be seen. We can customize the styling of the grid lines using the lwd, lty, and col arguments:

grid(nx=NA, ny=8,
lwd=1,lty=2,col="blue")
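
When ny is given as a number, the plot region is divided into that many equal cells, so the grid lines need not coincide with the axis tick marks. If styled lines that sit exactly on the ticks are wanted, one option (a sketch, not part of the original recipe) is to draw them with abline() at the positions returned by axTicks(). The rainfall values below are invented for illustration.

```r
# Invented monthly values standing in for rain$Tokyo
tokyo <- c(50, 70, 110, 130, 140, 170, 150, 160, 210, 200, 90, 55)

if (!interactive()) pdf(NULL)  # avoid writing Rplots.pdf when run as a script

plot(tokyo, type = "b", lwd = 2, ylim = c(0, 300),
     xlab = "Month", ylab = "Rainfall (mm)",
     main = "Monthly Rainfall in Tokyo")

# Positions of the tick marks on the Y axis (side 2) of the current plot
y_ticks <- axTicks(2)

# Horizontal dashed blue lines exactly at the tick positions
abline(h = y_ticks, lwd = 1, lty = 2, col = "blue")
```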

# Adding marker lines at specific X and Y values

Sometimes we may only want to draw one or a few lines to indicate specific cut-off or threshold values. In this recipe, we will learn how to do that using the abline() function.

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's draw a vertical line at the month of September in the rainfall graph for Tokyo:

plot(rain$Tokyo,type="b",lwd=2,
xaxt="n",ylim=c(0,300),col="black",
xlab="Month",ylab="Rainfall (mm)",
main="Monthly Rainfall in Tokyo")
axis(1,at=1:length(rain$Month),labels=rain$Month)

abline(v=9)

## How it works...

To draw marker lines with abline() at specific X or Y locations, we have to set the v (as in vertical) or h (as in horizontal) arguments respectively. In the example, we set v=9 (the index of the month September in the Month vector).

## There's more...

Now let's add a red dashed horizontal line to the graph to denote a high rainfall cutoff of 150 mm:

abline(h=150,col="red",lty=2)
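
Rather than hard-coding the index 9, the position of a month can be looked up by name. The sketch below assumes the Month column holds abbreviated month names such as "Sep", which may not match the actual file, and uses invented rainfall values. It also shows that abline() accepts a vector, so several marker lines can be drawn in one call.

```r
# Invented stand-in data; Month is assumed to hold abbreviated month names
rain <- data.frame(
  Month = month.abb,
  Tokyo = c(50, 70, 110, 130, 140, 170, 150, 160, 210, 200, 90, 55)
)

if (!interactive()) pdf(NULL)  # avoid writing Rplots.pdf when run as a script

plot(rain$Tokyo, type = "b", lwd = 2, xaxt = "n", ylim = c(0, 300),
     xlab = "Month", ylab = "Rainfall (mm)",
     main = "Monthly Rainfall in Tokyo")
axis(1, at = seq_along(rain$Month), labels = rain$Month)

# Look up the X position of September instead of typing 9
sep_index <- match("Sep", rain$Month)
abline(v = sep_index)

# A vector draws one line per value: here two horizontal thresholds
abline(h = c(100, 150), col = "red", lty = 2)

# Annotate the upper threshold just above its line
text(3, 150, "150 mm cutoff", pos = 3, col = "red")
```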

# Creating sparklines

Sparklines are small and simple line graphs, useful for summarizing trend data in a small space. The word "sparklines" was coined by Prof. Edward Tufte. In this recipe we will learn how to make sparklines using a basic plot() function.

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's represent our city rainfall data in the form of sparklines:

par(mfrow=c(4,1),mar=c(5,7,4,2),omi=c(0.2,2,0.2,2))

for(i in 2:5)
{
plot(rain[,i],ann=FALSE,axes=FALSE,type="l",
col="gray",lwd=2)

mtext(side=2,at=mean(rain[,i]),names(rain)[i],
las=2,col="black")

mtext(side=4,at=mean(rain[,i]),mean(rain[,i]),
las=2,col="black")

points(which.min(rain[,i]),min(rain[,i]),pch=19,col="blue")
points(which.max(rain[,i]),max(rain[,i]),pch=19,col="red")
}

## How it works...

The key feature of sparklines is to show the trend in the data with just one line without any axis annotations. In the example, we have shown the trend with a gray line. The minimum and maximum values for each line are represented by blue and red dots respectively, while the mean value is displayed on the right margin.

Since sparklines have to be very small graphics, we first set the margins such that the plot area is small and the outer margins are large. We did this by setting the outer margins in inches using the omi argument of the par() function. Depending on the dimensions of the plot, sometimes R may produce an error saying that the figure margins are too large and not draw the graph. In that case, we need to try lower values for the margins. Note we also set up a 4x1 layout with the mfrow argument.

Next we set up a for loop to draw a sparkline for each of the four cities. We drew the line with the plot() command, setting both annotations (ann) and axes to FALSE. Then we used the mtext() function to place the name of the city and the mean value of rainfall to the left and right of the line respectively. Finally, we plotted the minimum and maximum values using the points() command. Note that we used the which.min() and which.max() functions to get the indices of the minimum and maximum values respectively, and passed them as the x values in the points() function calls.
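
The quantities used inside the loop can also be computed up front, which makes them easy to check before drawing. The following is a minimal sketch with invented data; spark_summary() is a hypothetical helper, not part of the recipe.

```r
# Invented stand-in data with two of the four cities
rain <- data.frame(
  Month  = month.abb,
  Tokyo  = c(50, 70, 110, 130, 140, 170, 150, 160, 210, 200, 90, 55),
  London = c(50, 40, 40, 45, 50, 45, 45, 50, 50, 70, 60, 55)
)

# Index of the minimum, index of the maximum, and the mean of one series
spark_summary <- function(x) {
  c(min_at = which.min(x), max_at = which.max(x), mean = mean(x))
}

# One column of summary values per city (the Month column is dropped)
stats <- sapply(rain[, -1], spark_summary)
```

Note that which.min() and which.max() return the first position when a value is tied, as with the two 40 mm months for London in this made-up data.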

# Plotting functions of a variable in a dataset

Sometimes we may wish to visualize the effect of applying a mathematical function to a set of values, instead of the original variable itself. In this recipe, we will see a simple method to plot functions of variables.

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

## How to do it...

Let's say we want to plot the difference in rainfall between Berlin and London. We can do that just by passing the correct expression to the plot() function:

plot(rain$Berlin-rain$London,type="l",lwd=2,
xaxt="n",col="blue",
xlab="Month",ylab="Difference in Rainfall (mm)",
main="Difference in Rainfall between Berlin and London (Berlin-London)")

axis(1,at=1:length(rain\$Month),labels=rain\$Month)

abline(h=0,col="red")

## How it works...

So, plotting a function of a variable is as simple as passing an expression to the plot() function. In the example, the function consisted of two variables in the dataset. We can also plot transformations applied to any one variable.
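
As an illustration of transforming a single variable, the sketch below plots cumulative rainfall over the year by passing cumsum() of the series to plot(). The monthly values are invented stand-ins for rain$Tokyo.

```r
# Invented monthly values standing in for rain$Tokyo
tokyo <- c(50, 70, 110, 130, 140, 170, 150, 160, 210, 200, 90, 55)

# Running total of rainfall up to and including each month
cumulative <- cumsum(tokyo)

if (!interactive()) pdf(NULL)  # avoid writing Rplots.pdf when run as a script

plot(cumulative, type = "l", lwd = 2, xaxt = "n", col = "blue",
     xlab = "Month", ylab = "Cumulative Rainfall (mm)",
     main = "Cumulative Rainfall in Tokyo")
axis(1, at = seq_along(tokyo), labels = month.abb)
```

Other transformations, such as log(tokyo) or diff(tokyo), can be passed to plot() in the same way.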

## There's more...

As another simple example, let's see how we can plot a polynomial function of a set of numbers:

x<-1:100
y<-x^3-6*x^2+5*x+10
plot(y~x,type="l",main=expression(f(x)==x^3-6*x^2+5*x+10))

In this example we defined y as a polynomial function of a vector of the numbers 1 to 100 and then plotted it using the plot() function. Note that we used the expression() function to format the title of the graph. By using expression() we could get the power values as superscripts.
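
Base R also provides curve(), which evaluates an expression in x over a range without first building the x and y vectors. This is an alternative to the approach above, not the recipe's own method; a minimal sketch:

```r
if (!interactive()) pdf(NULL)  # avoid writing Rplots.pdf when run as a script

# curve() evaluates the expression at n points (101 by default) between
# `from` and `to`, draws the line, and invisibly returns the points used
pts <- curve(x^3 - 6*x^2 + 5*x + 10, from = 1, to = 100,
             ylab = "f(x)",
             main = expression(f(x) == x^3 - 6*x^2 + 5*x + 10))
```

Because curve() samples the range at evenly spaced points rather than only at the integers 1 to 100, it is convenient for smooth functions; the returned list pts holds the x and y values that were drawn.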

# Summary

This article discussed some intermediate to advanced recipes for customizing line graphs and improving and speeding up line graphs with multiple lines.

In the next article we will learn some intermediate to advanced recipes for processing dates to make time series charts and stock charts.
