Creating Time Series Charts in R

by Hrishi V. Mittal | February 2011 | Open Source

With more than two million users worldwide, R is one of the most popular open source projects. It is a free and robust statistical programming environment with very powerful graphical capabilities. Analyzing and visualizing data with R is a necessary skill for anyone doing any kind of statistical analysis.

In the previous article by Hrishi V. Mittal, author of the book R Graph Cookbook, we learnt some intermediate to advanced recipes for customizing line graphs.

In this article we will learn some intermediate to advanced recipes for processing dates to make time series charts and stock charts.

 


Formatting time series data for plotting

Time series or trend charts are the most common form of line graphs. There are many ways to plot such data in R; however, it is important to first put the data in a format that R can understand. In this recipe, we will look at some ways of formatting time series data using base R and some additional packages.

Getting ready

In addition to the basic R functions, we will also be using the zoo package in this recipe. So first we need to install it:

install.packages("zoo")

How to do it...

Let's use the dailysales.csv example dataset and format its date column:

sales<-read.csv("dailysales.csv")

d1<-as.Date(sales$date,"%d/%m/%Y")

d2<-strptime(sales$date,"%d/%m/%Y")

data.class(d1)
[1] "Date"

data.class(d2)
[1] "POSIXt"

How it works...

We have seen two different functions to convert a character vector into dates. If we did not convert the date column, R would not automatically recognize the values in the column as dates. Instead, the column would be treated as a character vector or a factor.

The as.Date() function takes the character vector to be converted and, as its second argument, the format in which the dates are written. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in a DD/MM/YYYY format (you can verify this by typing sales$date at the R prompt). So, we specify the format argument as "%d/%m/%Y", where the uppercase %Y matches a four-digit year (a lowercase %y matches a two-digit year). Please note that the order is important. If we instead use "%m/%d/%Y", then our days will be read as months and vice versa. The quotes around the value are also necessary.

The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object of class POSIXlt, which is a named list of vectors representing the different components of a date and time, such as year, month, day, hour, seconds, minutes, and a few more.

POSIXlt is one of the two basic date/time classes in R. The other class, POSIXct, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human-readable forms. Both classes inherit from a virtual class, POSIXt. That's why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result. (Recent versions of R list POSIXlt first in the class vector, so data.class(d2) may report POSIXlt instead.)

strptime() also takes a character vector to be converted and the format as arguments.
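The representations described above can be checked directly at the R prompt. The following is a minimal sketch that uses three made-up date strings (they are assumptions, not values from dailysales.csv) in the same DD/MM/YYYY layout:

```r
# Hypothetical sample values standing in for sales$date
dates_chr <- c("01/01/2010", "15/02/2010", "31/12/1969")

# Date class: stored as days since 1970-01-01
d_date <- as.Date(dates_chr, format = "%d/%m/%Y")
as.numeric(d_date)   # the last value is negative (before 1970)

# POSIXlt class: a list of date/time components
d_lt <- strptime(dates_chr, format = "%d/%m/%Y", tz = "UTC")
d_lt$mday            # day of the month
d_lt$mon + 1         # months are stored as 0-11
d_lt$year + 1900     # years are stored as an offset from 1900

# POSIXct class: stored as seconds since 1970-01-01 00:00:00 UTC
d_ct <- as.POSIXct(d_lt)
as.numeric(d_ct)

# Using %y against a four-digit year silently gives a different year;
# compare this result with d_date[1]
as.Date("01/01/2010", format = "%d/%m/%y")
```

Printing d_date next to the last result is a quick way to confirm that the format string matches the data before plotting.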

There's more...

The zoo package is handy for dealing with time series data. The zoo() function takes an argument x, which can be a numeric vector, matrix, or factor. It also takes an order.by argument which has to be an index vector with unique entries by which the observations in x are ordered:

library(zoo)

d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%Y"))

data.class(d3)
[1] "zoo"

See the help on DateTimeClasses to find out more details about the ways dates can be represented in R.

Plotting date and time on the X axis

In this recipe, we will learn how to plot formatted date or time values on the X axis.

Getting ready

For the first example, we only need to use the base graphics function plot().

How to do it...

We will use the dailysales.csv example dataset to plot the number of units of a product sold daily in a month:

sales<-read.csv("dailysales.csv")
plot(sales$units~as.Date(sales$date,"%d/%m/%Y"),type="l",
xlab="Date",ylab="Units Sold")


How it works...

Once we have formatted the series of dates using as.Date(), we can simply pass it to the plot() function as the x variable in either the plot(x,y) or plot(y~x) format.

We can also use strptime() instead of using as.Date(). However, we cannot pass the object returned by strptime() to plot() in the plot(y~x) format. We must use the plot(x,y) format as follows:

plot(strptime(sales$date,"%d/%m/%Y"),sales$units,type="l",
xlab="Date",ylab="Units Sold")

There's more...

We can plot the example using the zoo() function as follows (assuming zoo is already installed):

library(zoo)
plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%Y")))

Note that we don't need to specify x and y separately when plotting using zoo; we can just pass the object returned by zoo() to plot(). We also need not specify the type as "l".

Let's look at another example which has full date and time values on the X axis, instead of just dates. We will use the openair.csv example dataset for this example:

air<-read.csv("openair.csv")

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
xlab="Time", ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")


The same graph can be made using zoo as follows:

plot(zoo(air$nox,as.Date(air$date,"%d/%m/%Y %H:%M")),
xlab="Time", ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")
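One detail worth checking when the data has hourly timestamps: as.Date() keeps only the date part, so all readings from the same day share one x value, whereas as.POSIXct() keeps the time of day. The sketch below illustrates the difference with three made-up timestamps, assuming they follow the same "%d/%m/%Y %H:%M" layout as the date column in openair.csv:

```r
# Hypothetical hourly timestamps (not read from openair.csv)
stamps <- c("01/01/1998 00:00", "01/01/1998 01:00", "01/01/1998 02:00")

# as.Date() parses the string but discards the time of day
as_dates <- as.Date(stamps, format = "%d/%m/%Y %H:%M")

# as.POSIXct() keeps the full date and time
as_times <- as.POSIXct(stamps, format = "%d/%m/%Y %H:%M", tz = "GMT")

length(unique(as_dates))   # number of distinct x values with as.Date()
length(unique(as_times))   # number of distinct x values with as.POSIXct()
diff(as_times)             # spacing between consecutive timestamps
```

If hour-level positions matter, the POSIXct vector can be passed to plot() in the same way as the Date vector.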


Annotating axis labels in different human readable time formats

In this recipe, we will learn how to choose the formatting of time axis labels, instead of just using the defaults.

Getting ready

We will only use the basic R functions for this recipe. Make sure you are at the R prompt and load the openair.csv dataset:

air<-read.csv("openair.csv")

How to do it...

Let's redraw our original example of plotting air pollution data from the last recipe, but with labels for each month and year pairing:

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
xaxt="n",
xlab="Time", ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")

xlabels<-strptime(air$date, format = "%d/%m/%Y %H:%M")
axis.Date(1, at=xlabels[xlabels$mday==1], format="%b-%Y")

How it works...

In our original example of plotting air pollution data in the last recipe, we only formatted the date/time vector to pass as an x argument to plot(), but the axis labels were chosen automatically by R as the years 1998, 2000, 2002, and 2004. In this example, we drew a custom axis with labels for each month and year pairing.

We first created an object xlabels of class POSIXlt by using the strptime() function. Then we used the axis.Date() function to add the X axis. axis.Date() is similar to the axis() function and takes the side and at arguments. In addition, it takes the format argument, which we can use to specify the format of the labels. We specified the at argument as a subset of xlabels containing only the first day of each month, selected with the condition xlabels$mday==1. The format value "%b-%Y" means abbreviated month name with full year.
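The same idea can be sketched without the dataset. The example below is only an illustration under stated assumptions: the daily series is synthetic, and the tick positions are generated with seq() rather than by subsetting the data, which is a different route to the same kind of at vector:

```r
# Synthetic daily series standing in for the air pollution data
start  <- as.Date("01/01/1998", "%d/%m/%Y")
days   <- seq(start, by = "day", length.out = 150)
values <- sin(seq_along(days) / 10)

# One tick on the first day of each month
ticks <- seq(start, by = "month", length.out = 5)

# Draw the series without the default X axis, then add a custom one
plot(days, values, type = "l", xaxt = "n",
     xlab = "Time", ylab = "Value")
axis.Date(1, at = ticks, format = "%b-%Y")

# The labels axis.Date() will draw; month names depend on the locale
format(ticks, "%b-%Y")
```

Changing the format string, for example to "%d %b" or "%m/%y", changes the labels without touching the tick positions.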

Adding vertical markers to indicate specific time events

We may wish to indicate specific points of importance or measurements in a time series, where there is a significant event or change in the data. In this recipe, we will learn how to add vertical markers using the abline() function.

Getting ready

We will only use the basic R functions for this recipe. Make sure you are at the R prompt and load the openair.csv dataset:

air<-read.csv("openair.csv")

How to do it...

Let's take our air pollution time series example again and draw a red vertical line on Christmas day – 25/12/2003:

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
xlab="Time", ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")

abline(v=as.Date("25/12/2003","%d/%m/%Y"),col="red")

How it works...

As we have seen before in the recipe introducing abline(), we drew a vertical line in the example by setting the v argument to the date we want to mark. We specified 25/12/2003 as the x co-ordinate by using the as.Date() function. Note that the original time series plotted also contains the timestamp in addition to the dates. Since we didn't specify a time, the line was plotted at the start of the specified date 25/12/2003 00:00.

There's more...

Let's look at another example, where we want to draw a vertical marker line on Christmas day of every year:


markers<-seq(from=as.Date("25/12/1998","%d/%m/%Y"),
to=as.Date("25/12/2004","%d/%m/%Y"),
by="year")

abline(v=markers,col="red")

We created a sequence of the Christmas dates for each year using the seq() function, which takes from, to, and by arguments. Then we passed this vector to the abline() function as v.
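To see what abline() actually receives, the marker vector can be inspected before it is drawn. This short sketch rebuilds the same sequence and prints it in a few ways:

```r
# The same sequence of Christmas dates as in the recipe
markers <- seq(from = as.Date("25/12/1998", "%d/%m/%Y"),
               to   = as.Date("25/12/2004", "%d/%m/%Y"),
               by   = "year")

length(markers)            # one date per year from 1998 to 2004
format(markers, "%d/%m")   # every element falls on 25/12
as.numeric(markers)        # x coordinates in days since 1970-01-01
```

The by argument also accepts other units such as "month" or "week", so the same pattern works for monthly or weekly markers.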

One important thing to note is that by default R does not deal with gaps in a time series. There can be missing values denoted by NA and as you can see in the previous examples, the graphs show gaps in those places. However, if any dates or time intervals are missing from the actual dataset, then R draws a line connecting the data points before and after the gap instead of leaving it blank. In order to remove this connecting line, we must fill in the missing time intervals in the gap and set the y values to NA.
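One way to fill in missing time intervals is to build the complete sequence of dates and merge the observations into it. The following is a minimal sketch with a made-up daily series (the obs data frame and its values are assumptions for illustration only) in which four days are absent:

```r
# Hypothetical observations with 4 to 7 December missing entirely
obs <- data.frame(
  date = as.Date(c("2003-12-01", "2003-12-02", "2003-12-03",
                   "2003-12-08", "2003-12-09")),
  nox  = c(40, 42, 45, 60, 58)
)

# Complete sequence of days covering the whole period
full <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

# Left join: days without an observation get NA for nox
filled <- merge(full, obs, by = "date", all.x = TRUE)
sum(is.na(filled$nox))   # number of days that were filled in

# The line is now broken across the gap instead of bridging it
plot(filled$date, filled$nox, type = "l",
     xlab = "Date", ylab = "Concentration (ppb)")
```

For hourly data the same approach applies with by = "hour" and POSIXct timestamps.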

Plotting data with varying time averaging periods

In this recipe, we will learn how we can plot the same time series data by averaging it over different time periods using the aggregate() function.

Getting ready

We will only use the basic R functions for this recipe. Make sure you load the openair.csv dataset:

air<-read.csv("openair.csv")

How to do it...

Let's plot the air pollution time series with weekly and daily averages instead of hourly values:

air$date = as.POSIXct(strptime(air$date, format = "%d/%m/%Y %H:%M",
tz = "GMT"))
means <- aggregate(air["nox"], format(air["date"],"%Y-%U"),mean,
na.rm = TRUE)
means$date <- seq(air$date[1], air$date[nrow(air)],
length = nrow(means))
plot(means$date, means$nox, type = "l")

means <- aggregate(air["nox"], format(air["date"],"%Y-%j"),mean,
na.rm = TRUE)
means$date <- seq(air$date[1], air$date[nrow(air)],
length = nrow(means))
plot(means$date, means$nox, type = "l",
xlab="Time", ylab="Concentration (ppb)",
main="Daily Average Concentrations of Oxides of Nitrogen")

How it works...

The key function in these examples is the aggregate() function. Its first argument is R object x, which has to be aggregated, in this case air["nox"]. The next argument is the list of grouping elements over which x has to be aggregated. This is the part where we specify the time period over which to average the values. In the first example we set it to format(air["date"],"%Y-%U"), which extracts all the weeks out of the date column using the format() function. The third argument is FUN or the name of the function to apply to the selected values, in our case mean. Finally, we set na.rm to TRUE, thus telling R to ignore missing values denoted by NA.

Once we have the mean values saved in a data frame, we add a date column to this new data frame using the seq() function, which spaces the dates evenly between the first and last timestamps, and then plot the means against the date using plot().

In the second example, we use format(air["date"],"%Y-%j") to calculate daily means.
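The grouping step can be tried on a small synthetic series. In the sketch below, air_demo and its values are made up for illustration, and the date for each group is recovered by parsing the group label with as.Date(), which is a variation on the seq() approach used in the recipe:

```r
# Two days of hypothetical hourly readings, with one missing value
hours <- seq(as.POSIXct("2004-01-01 00:00", tz = "GMT"),
             by = "hour", length.out = 48)
air_demo <- data.frame(date = hours,
                       nox  = c(rep(10, 24), rep(30, 24)))
air_demo$nox[5] <- NA

# Group by year and day of year, then average within each group
daily <- aggregate(air_demo["nox"],
                   list(day = format(air_demo$date, "%Y-%j")),
                   mean, na.rm = TRUE)

# Turn the "%Y-%j" labels back into dates for plotting
daily$date <- as.Date(daily$day, format = "%Y-%j")
daily
```

Replacing "%Y-%j" with "%Y-%m" in the grouping step would give monthly means in the same way.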

Creating stock charts

Given R's powerful analysis and graphical capabilities, it is no surprise that R is very popular in the world of finance. In this recipe, we will learn how to plot data from the stock market using some special libraries.

Getting ready

We need the tseries and quantmod packages to run the following recipes. Let's install and load these two packages:

install.packages("quantmod")
install.packages("tseries")
library(quantmod)
library(tseries)

How to do it...

Let's first see an example using the tseries library function get.hist.quote(). We will compare stock prices of three technology companies:

aapl<-get.hist.quote(instrument = "aapl", quote = c("Cl", "Vol"))

goog <- get.hist.quote(instrument = "goog", quote = c("Cl", "Vol"))

msft <- get.hist.quote(instrument = "msft", quote = c("Cl", "Vol"))

plot(msft$Close,main = "Stock Price Comparison",
ylim=c(0,800) ,col="red" ,type="l" ,lwd=0.5,
pch=19 ,cex=0.6 ,xlab="Date" ,ylab="Stock Price (USD)")

lines(goog$Close,col="blue",lwd=0.5)
lines(aapl$Close,col="gray",lwd=0.5)

legend("top",horiz=T,legend=c("Microsoft","Google","Apple"),
col=c("red","blue","gray"),lty=1,bty="n")


How it works...

The get.hist.quote() function retrieves historical financial data from one of two providers: yahoo (for Yahoo) or oanda (for OANDA), with yahoo being the default. We passed the instrument and quote arguments to this function, which specify the stock symbol and the measures of stock data we want. In our example, we used the function three times to pull the closing price and volume for Microsoft (msft), Google (goog), and Apple (aapl). We then plotted the three stock prices on a line graph using the plot() and lines() functions.

There's more...

Now let's make some charts using the quantmod package. This package provides inbuilt graphics functions to visualize the stock data:

getSymbols("AAPL",src="yahoo")
barChart(AAPL)

First we obtained stock data for Apple using the getSymbols() function by specifying the stock symbol and source. Again, the default source is Yahoo. The stock data is stored in an R object with the same name as the stock symbol (AAPL for Apple, GOOG for Google, and so on). Then we passed this object to the barChart() function to produce the chart. Of course, it is more than just a bar chart.

A similar chart in a different color scheme can be drawn as follows:

candleChart(AAPL,theme="white")

For more detailed information about the quantmod package, visit its website at: http://www.quantmod.com

Summary

In this article we learnt some intermediate to advanced recipes for processing dates to make time series charts and stock charts.



About the Author :


Hrishi V. Mittal

Hrishi Mittal has been working with R for a few years in different capacities. He was introduced to the exciting world of data analysis with R when he was working as Senior Air Quality Scientist at King’s College London, where he used R extensively to analyze large amounts of air pollution and traffic data for informing the London Mayor’s Air Quality Strategy. He has experience in various other programming languages, but prefers R for data analysis and visualization. He is actively involved in various R mailing lists, forums and the development of some R packages.

In early 2010, he started Pretty Graph Limited (http://www.prettygraph.com), a software company specializing in web-based data visualization products. The company’s flagship product Pretty Graph uses R as the backend engine for helping researchers and businesses visualize and analyze data. The goal is to bring the power of R to a wider audience by providing a modern graphical user interface which can be accessed by anyone and from anywhere simply using a web browser.
