### R Graphs Cookbook

Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications

With more than two million users worldwide, R is one of the most popular open source projects. It is a free and robust statistical programming environment with very powerful graphical capabilities. Analyzing and visualizing data with R is a necessary skill for anyone doing any kind of statistical analysis.

In the previous article by **Hrishi V. Mittal**, author of the book R Graphs Cookbook, we learnt some intermediate to advanced recipes for customizing line graphs.

In this article we will learn some intermediate to advanced recipes for processing dates to make time series charts and stock charts.

# Formatting time series data for plotting

Time series or trend charts are the most common form of line graphs. There are many ways to plot such data in R, but the data must first be converted into a form that R can understand. In this recipe, we will look at some ways of formatting time series data using base R and some additional packages.

## Getting ready

In addition to the basic R functions, we will also be using the *zoo* package in this recipe. So first we need to install it:

install.packages("zoo")

## How to do it...

Let's use the *dailysales.csv* example dataset and format its *date* column:

sales<-read.csv("dailysales.csv")

d1<-as.Date(sales$date,"%d/%m/%Y")

d2<-strptime(sales$date,"%d/%m/%Y")

data.class(d1)

[1] "Date"

data.class(d2)

[1] "POSIXt"

## How it works...

We have seen two different functions to convert a character vector into dates. If we did not convert the *date* column, R would not automatically recognize the values in the column as dates. Instead, the column would be treated as a character vector or a factor.

The *as.Date()* function takes at least two arguments: the character vector to be converted to dates and the format in which those dates are written. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in a DD/MM/YYYY format (you can verify this by typing *sales$date* at the R prompt). So, we specify the format argument as "*%d/%m/%Y*", where the uppercase *%Y* matches a four-digit year (lowercase *%y* matches only a two-digit year). Please note that the order is important. If we instead use "*%m/%d/%Y*", then our days will be read as months and vice versa. The quotes around the value are also necessary.

The *strptime()* function is another way to convert character vectors into dates. However, *strptime()* returns a different kind of object of class *POSIXlt*, which is a named list of vectors representing the different components of a date and time, such as year, month, day, hour, seconds, minutes, and a few more.

*POSIXlt* is one of the two basic date/time classes in R. The other class, *POSIXct*, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. *POSIXct* is more convenient for including in data frames, and *POSIXlt* is closer to human-readable forms. Both classes inherit from a virtual class, *POSIXt*. That is why running the *data.class()* function on d2 earlier gave POSIXt as the result; newer versions of R list the more specific class first and report POSIXlt instead.

*strptime()* also takes a character vector to be converted and the format as arguments.
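To make the effect of the format string concrete, here is a minimal sketch using two made-up date strings in place of the *date* column (the values are assumptions for illustration, not taken from *dailysales.csv*). It shows what happens when the day and month codes are swapped or when *%y* is used against a four-digit year, and how the components of a *POSIXlt* object can be read:

```r
# Made-up strings standing in for sales$date (assumed DD/MM/YYYY)
dates <- c("01/02/2010", "15/02/2010")

# Correct order: day, month, four-digit year
d_ok <- as.Date(dates, "%d/%m/%Y")

# Day and month swapped: "01/02" becomes 2 January, and "15/02"
# cannot be parsed (there is no month 15), so it becomes NA
d_swapped <- as.Date(dates, "%m/%d/%Y")

# Lowercase %y reads only two digits of the year ("20"),
# so 2010 is silently parsed as 2020
d_short <- as.Date(dates, "%d/%m/%y")

print(d_ok)
print(d_swapped)
print(d_short)

# A Date is stored as the number of days since 1970-01-01
days_since_epoch <- as.numeric(as.Date("1970-01-10"))

# strptime() returns a POSIXlt object whose components can be read by name
lt <- strptime("15/02/2010 13:45", "%d/%m/%Y %H:%M", tz = "UTC")
print(lt$mday)  # day of the month
print(lt$mon)   # months since January (0-based)
print(lt$hour)  # hour of the day

# as.POSIXct() converts it to a single number of seconds since 1970-01-01 UTC
ct <- as.POSIXct(lt)
```

Neither mistake raises an error, so it is worth checking a few converted values against the original strings before plotting.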

## There's more...

The *zoo* package is handy for dealing with time series data. The *zoo()* function takes an argument x, which can be a numeric vector, matrix, or factor. It also takes an *order.by* argument which has to be an index vector with unique entries by which the observations in *x* are ordered:

library(zoo)

d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%Y"))

data.class(d3)

[1] "zoo"

See the help on *DateTimeClasses* to find out more details about the ways dates can be represented in R.
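As a small illustration of what *order.by* does, the following sketch builds a zoo series from made-up values that are deliberately listed out of date order. The data are hypothetical, and the block assumes the *zoo* package is installed (it is skipped otherwise):

```r
# Assumes the zoo package is installed; the block is skipped otherwise
if (requireNamespace("zoo", quietly = TRUE)) {
  # Made-up unit counts, deliberately listed out of date order
  units <- c(5, 3, 8)
  days  <- as.Date(c("03/01/2010", "01/01/2010", "02/01/2010"), "%d/%m/%Y")

  z <- zoo::zoo(units, order.by = days)

  # The series is sorted by its index, so the values follow the dates
  print(zoo::index(z))     # dates in ascending order
  print(zoo::coredata(z))  # values reordered to match the sorted dates
}
```

Because the observations are tied to their index, the rows of the input file do not need to be sorted by date before the series is built.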

# Plotting date and time on the X axis

In this recipe, we will learn how to plot formatted date or time values on the X axis.

## Getting ready

For the first example, we only need to use the base graphics function *plot()*.

## How to do it...

We will use the *dailysales.csv* example dataset to plot the number of units of a product sold daily in a month:

sales<-read.csv("dailysales.csv")

plot(sales$units~as.Date(sales$date,"%d/%m/%Y"),type="l",

xlab="Date",ylab="Units Sold")

## How it works...

Once we have formatted the series of dates using *as.Date()*, we can simply pass it to the *plot()* function as the x variable in either the *plot(x,y)* or *plot(y~x)* format.

We can also use *strptime()* instead of using *as.Date()*. However, we cannot pass the object returned by *strptime()* to *plot()* in the *plot(y~x)* format. We must use the *plot(x,y)* format as follows:

plot(strptime(sales$date,"%d/%m/%Y"),sales$units,type="l",

xlab="Date",ylab="Units Sold")
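The two call styles can be tried without the CSV file. The sketch below uses synthetic values as a stand-in for *dailysales.csv* (the numbers are made up) and draws to a null device so that it runs anywhere. It also shows one possible workaround for the formula style: converting the *strptime()* result to *POSIXct* first.

```r
# Synthetic stand-in for dailysales.csv (values are made up)
date_strings <- format(seq(as.Date("2010-01-01"), by = "day", length.out = 10),
                       "%d/%m/%Y")
units <- c(12, 15, 11, 18, 20, 17, 22, 19, 25, 23)

pdf(NULL)  # draw to a null device; omit this line to see the plots on screen

# Date objects work with both call styles
days <- as.Date(date_strings, "%d/%m/%Y")
plot(days, units, type = "l", xlab = "Date", ylab = "Units Sold")
plot(units ~ days, type = "l", xlab = "Date", ylab = "Units Sold")

# A POSIXlt object from strptime() is a list of components, so here it is
# converted to POSIXct (a numeric vector of seconds) before being used
# in a formula
times <- as.POSIXct(strptime(date_strings, "%d/%m/%Y", tz = "UTC"))
plot(units ~ times, type = "l", xlab = "Date", ylab = "Units Sold")

invisible(dev.off())
```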

## There's more...

We can plot the example using the *zoo()* function as follows (assuming zoo is already installed):

library(zoo)

plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%Y")))

Note that we don't need to specify x and y separately when plotting using zoo; we can just pass the object returned by *zoo()* to *plot()*. We also need not specify the type as "l".

Let's look at another example which has full date and time values on the X axis, instead of just dates. We will use the *openair.csv* example dataset for this example:

air<-read.csv("openair.csv")

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",

xlab="Time", ylab="Concentration (ppb)",

main="Time trend of Oxides of Nitrogen")

The same graph can be made using zoo as follows:

plot(zoo(air$nox,as.Date(air$date,"%d/%m/%Y %H:%M")),

xlab="Time", ylab="Concentration (ppb)",

main="Time trend of Oxides of Nitrogen")

# Annotating axis labels in different human readable time formats

In this recipe, we will learn how to choose the formatting of time axis labels, instead of just using the defaults.

## Getting ready

We will only use the basic R functions for this recipe. Make sure you are at the R prompt and load the *openair.csv* dataset:

air<-read.csv("openair.csv")

## How to do it...

Let's redraw our original example of plotting air pollution data from the last recipe, but with labels for each month and year pairing:

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",

xaxt="n",

xlab="Time", ylab="Concentration (ppb)",

main="Time trend of Oxides of Nitrogen")

xlabels<-strptime(air$date, format = "%d/%m/%Y %H:%M")

axis.Date(1, at=xlabels[xlabels$mday==1], format="%b-%Y")

## How it works...

In our original example of plotting air pollution data in the last recipe, we only formatted the date/time vector to pass as an x argument to *plot()*, but the axis labels were chosen automatically by R as the years 1998, 2000, 2002, and 2004. In this example, we drew a custom axis with labels for each month and year pairing.

We first created an object xlabels of class *POSIXlt* by using the *strptime()* function. Then we used the *axis.Date()* function to add the X axis. *axis.Date()* is similar to the *axis()* function and takes the side and at arguments. In addition, it takes the format argument, which we can use to specify the format of the labels. We specified the at argument as a subset of xlabels containing only the first day of each month by using the condition *mday==1*. The format value "*%b-%Y*" means abbreviated month name with full year.
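The same idea can be sketched on synthetic data. Here the tick positions are built with *seq()* rather than by subsetting on *mday*, which is an alternative that works when the series starts on the first day of a month. The series itself is made up and only stands in for the air pollution data:

```r
# Synthetic daily series standing in for the air pollution data
days <- seq(as.Date("2003-01-01"), as.Date("2003-06-30"), by = "day")
nox  <- 100 + 20 * sin(seq_along(days) / 15)

# Tick positions: the first day of each month within the data range
month_starts <- seq(min(days), max(days), by = "month")

pdf(NULL)  # draw to a null device; omit this line to see the plot on screen
plot(days, nox, type = "l", xaxt = "n",
     xlab = "Time", ylab = "Concentration (ppb)")
axis.Date(1, at = month_starts, format = "%b-%Y")
invisible(dev.off())

# The label format can be previewed with format() before drawing the axis
print(format(month_starts, "%b-%Y"))
```

Previewing the labels with *format()* is a quick way to try other format strings, such as "*%d %b*" or "*%m/%y*", before committing to one.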

# Adding vertical markers to indicate specific time events

We may wish to indicate specific points of importance or measurements in a time series, where there is a significant event or change in the data. In this recipe, we will learn how to add vertical markers using the *abline()* function.

## Getting ready

We will only use the basic R functions for this recipe. Make sure you are at the R prompt and load the *openair.csv* dataset:

air<-read.csv("openair.csv")

## How to do it...

Let's take our air pollution time series example again and draw a red vertical line on Christmas day – 25/12/2003:

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",

xlab="Time", ylab="Concentration (ppb)",

main="Time trend of Oxides of Nitrogen")

abline(v=as.Date("25/12/2003","%d/%m/%Y"),col="red")

## How it works...

As we have seen before in the recipe introducing *abline()*, we drew a vertical line in the example by setting the v argument to the date we want to mark. We specified 25/12/2003 as the x co-ordinate by using the *as.Date()* function. Note that the original time series plotted also contains the timestamp in addition to the dates. Since we didn't specify a time, the line was plotted at the start of the specified date 25/12/2003 00:00.

## There's more...

Let's look at another example, where we want to draw a vertical marker line on Christmas day of every year:

markers<-seq(from=as.Date("25/12/1998","%d/%m/%Y"),

to=as.Date("25/12/2004","%d/%m/%Y"),

by="year")

abline(v=markers,col="red")

We created a sequence of the Christmas dates for each year using the *seq()* function, which takes *from*, *to*, and *by* arguments. Then we passed this vector to the *abline()* function as v.

One important thing to note is that by default R does not deal with gaps in a time series. There can be missing values denoted by *NA* and as you can see in the previous examples, the graphs show gaps in those places. However, if any dates or time intervals are missing from the actual dataset, then R draws a line connecting the data points before and after the gap instead of leaving it blank. In order to remove this connecting line, we must fill in the missing time intervals in the gap and set the y values to *NA*.
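One way to fill in the missing intervals is to build the complete sequence of dates and merge the observations onto it. This is a minimal sketch on made-up daily data. The column names *date* and *nox* mirror the example dataset, but the values are hypothetical:

```r
# Made-up daily observations with two days (3 and 4 December) missing entirely
obs <- data.frame(
  date = as.Date(c("2003-12-01", "2003-12-02", "2003-12-05", "2003-12-06")),
  nox  = c(110, 95, 120, 105)
)

# Complete sequence of dates covering the observed range
full <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

# Left join: dates without an observation get NA in the nox column
filled <- merge(full, obs, by = "date", all.x = TRUE)
print(filled)

pdf(NULL)  # draw to a null device; omit this line to see the plot on screen
# The line now breaks at the NA values instead of bridging the gap
plot(filled$date, filled$nox, type = "l",
     xlab = "Time", ylab = "Concentration (ppb)")
invisible(dev.off())
```

For hourly data the same approach should work with a *POSIXct* sequence built using by = "hour".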

# Plotting data with varying time averaging periods

In this recipe, we will learn how we can plot the same time series data by averaging it over different time periods using the *aggregate()* function.

## Getting ready

We will only use the basic R functions for this recipe. Make sure you load the *openair.csv* dataset:

air<-read.csv("openair.csv")

## How to do it...

Let's plot the air pollution time series with weekly and daily averages instead of hourly values:

air$date <- as.POSIXct(strptime(air$date, format = "%d/%m/%Y %H:%M",

tz = "GMT"))

means <- aggregate(air["nox"], format(air["date"],"%Y-%U"),mean, na.rm

= TRUE)

means$date <- seq(air$date[1], air$date[nrow(air)],length =

nrow(means))

plot(means$date, means$nox, type = "l")

means <- aggregate(air["nox"], format(air["date"],"%Y-%j"),mean, na.rm

= TRUE)

means$date <- seq(air$date[1], air$date[nrow(air)],length =

nrow(means))

plot(means$date, means$nox, type = "l",

xlab="Time", ylab="Concentration (ppb)",

main="Daily Average Concentrations of Oxides of Nitrogen")

## How it works...

The key function in these examples is the *aggregate()* function. Its first argument is the R object x to be aggregated, in this case *air["nox"]*. The next argument is the list of grouping elements over which x is aggregated. This is where we specify the time period over which to average the values. In the first example we set it to *format(air["date"],"%Y-%U")*, which uses the *format()* function to label each value in the date column with its year and week number. The third argument is FUN, the function to apply to each group of values, in our case *mean*. Finally, we set na.rm to TRUE, which tells R to ignore missing values denoted by NA.

Once we have the mean values saved in a data frame, we add a date field to this new data frame using the *seq()* function and then plot the means against the date using *plot()*.

In the second example, we use *format(air["date"],"%Y-%j")* to calculate daily means.
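The grouping step can be checked on a small synthetic series where the expected averages are obvious. The sketch below uses three days of made-up hourly values. As a variation on the recipe, it derives each plotting date from the group label itself instead of from *seq()*, so that each mean is tied to its own day:

```r
# Three days of synthetic hourly data (values are made up)
times <- seq(as.POSIXct("2003-01-01 00:00", tz = "GMT"),
             by = "hour", length.out = 72)
demo <- data.frame(date = times, nox = rep(c(10, 20, 30), each = 24))
demo$nox[5] <- NA  # one missing reading, ignored because na.rm = TRUE

# Group by year and day of year, then average each group
daily <- aggregate(demo["nox"],
                   by = list(day = format(demo$date, "%Y-%j")),
                   FUN = mean, na.rm = TRUE)

# Recover a plotting date from the group label itself
daily$date <- as.Date(daily$day, "%Y-%j")
print(daily)
```

Weekly labels such as "%Y-%U" cannot be converted back to a date this directly, since a week number alone does not identify a day, which may be one reason the recipe spreads the dates evenly with *seq()*.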

# Creating stock charts

Given R's powerful analysis and graphical capabilities, it is no surprise that R is very popular in the world of finance. In this recipe, we will learn how to plot data from the stock market using some special libraries.

## Getting ready

We need the *tseries* and *quantmod* packages to run the following recipes. Let's install and load these two packages:

install.packages("quantmod")

install.packages("tseries")

library(quantmod)

library(tseries)

## How to do it...

Let's first see an example using the *get.hist.quote()* function from the *tseries* package. We will compare stock prices of three technology companies:

aapl<-get.hist.quote(instrument = "aapl", quote = c("Cl", "Vol"))

goog <- get.hist.quote(instrument = "goog", quote = c("Cl", "Vol"))

msft <- get.hist.quote(instrument = "msft", quote = c("Cl", "Vol"))

plot(msft$Close,main = "Stock Price Comparison",

ylim=c(0,800) ,col="red" ,type="l" ,lwd=0.5,

pch=19 ,cex=0.6 ,xlab="Date" ,ylab="Stock Price (USD)")

lines(goog$Close,col="blue",lwd=0.5)

lines(aapl$Close,col="gray",lwd=0.5)

legend("top",horiz=TRUE,legend=c("Microsoft","Google","Apple"),

col=c("red","blue","gray"),lty=1,bty="n")

## How it works...

The *get.hist.quote()* function retrieves historical financial data from one of two providers, "yahoo" (Yahoo) or "oanda" (OANDA), with "yahoo" being the default. We passed the instrument and quote arguments to this function, which specify the name of the stock and the measures of stock data we want. In our example, we used the function three times to pull the closing price and volume for Microsoft (msft), Google (goog), and Apple (aapl). We then plotted the three stock prices on a line graph using the *plot()* and *lines()* functions.
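The overlay pattern of *plot()* followed by *lines()* can be tried offline. The sketch below uses synthetic price series in place of downloaded quotes (all names and values are made up). It also derives the y-axis limits from the data with *range()*, an alternative to a fixed *ylim* that keeps every series in view if the price levels change:

```r
# Synthetic closing prices standing in for downloaded quotes (values are made up)
days    <- seq(as.Date("2010-01-04"), by = "day", length.out = 30)
stock_a <- 30  + cumsum(rep(c(0.5, -0.2, 0.1), 10))
stock_b <- 600 + cumsum(rep(c(-2, 3, 1), 10))
stock_c <- 200 + cumsum(rep(c(1, 1, -1), 10))

# Derive the y-axis limits from all three series so that none is clipped
y_limits <- range(stock_a, stock_b, stock_c)

pdf(NULL)  # draw to a null device; omit this line to see the plot on screen
plot(days, stock_a, type = "l", col = "red", ylim = y_limits,
     main = "Stock Price Comparison", xlab = "Date", ylab = "Stock Price (USD)")
lines(days, stock_b, col = "blue")
lines(days, stock_c, col = "gray")
legend("top", horiz = TRUE, legend = c("Stock A", "Stock B", "Stock C"),
       col = c("red", "blue", "gray"), lty = 1, bty = "n")
invisible(dev.off())
```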

## There's more...

Now let's make some charts using the *quantmod* package. This package provides inbuilt graphics functions to visualize the stock data:

getSymbols("AAPL",src="yahoo")

barChart(AAPL)

First we obtained stock data for Apple using the *getSymbols()* function by specifying the stock name and source. Again, the default source is Yahoo. The stock data is stored in an R object with the same name as the stock symbol (AAPL for Apple, GOOG for Google, and so on). Then we passed this object to the *barChart()* function to draw the chart. Of course, it is more than just a bar chart.

A similar chart in a different color scheme can be drawn as follows:

candleChart(AAPL,theme="white")

For more detailed information about the *quantmod* package, visit its website at: http://www.quantmod.com

# Summary

In this article we learnt some intermediate to advanced recipes for processing dates to make time series charts and stock charts.

**Further resources on this subject:**

- Creating Line Graphs in R [article]
- Graphical Capabilities of R [article]
- Adjusting Key Parameters in R [article]
- Organizing, Clarifying and Communicating the R Data Analyses [article]
- Customizing Graphics and Creating a Bar Chart and Scatterplot in R [article]

## About the Author

## Hrishi V. Mittal

Hrishi Mittal has been working with R for a few years in different capacities. He was introduced to the exciting world of data analysis with R when he was working as Senior Air Quality Scientist at King’s College London, where he used R extensively to analyze large amounts of air pollution and traffic data for informing the London Mayor’s Air Quality Strategy. He has experience in various other programming languages, but prefers R for data analysis and visualization. He is actively involved in various R mailing lists, forums and the development of some R packages.

In early 2010, he started Pretty Graph Limited (http://www.prettygraph.com), a software company specializing in web-based data visualization products. The company’s flagship product Pretty Graph uses R as the backend engine for helping researchers and businesses visualize and analyze data. The goal is to bring the power of R to a wider audience by providing a modern graphical user interface which can be accessed by anyone and from anywhere simply using a web browser.