In this article by **Atmajitsinh Gohil**, author of the book *R Data Visualization Cookbook*, we will cover the following recipes:

- Introducing a scatter plot
- Scatter plot with texts, labels, and lines
- Connecting points in a scatter plot
- Generating an interactive scatter plot
- A simple bar plot
- An interactive bar plot
- A simple line plot
- Line plot to tell an effective story
- Generating an interactive Gantt/timeline chart in R
- Merging histograms
- Making an interactive bubble plot
- Constructing a waterfall plot

# Introduction

The main motivation behind this article is to introduce the basics of plotting in R and the element of interactivity via the *googleVis* package. The basic plots are important because many R packages reuse the arguments of the basic plot functions, so understanding them creates a good foundation for new R users. We will start by exploring scatter plots in R, which are the most basic plots for exploratory data analysis, and then dive into interactive plots. Every section will start with an introduction to a basic R plot, and we will build interactive plots thereafter, using the *googleVis* package to add the element of interactivity.

The *googleVis* package provides an R interface to the Google Chart API, which it uses to create interactive plots; it is a community-developed package rather than a Google product. A range of plots is available with the *googleVis* package, which lets us plot the same data on various charts and select the one that delivers the message most clearly. The package undergoes regular updates and releases, and new charts are implemented with every release.

Readers should note that there are other alternatives for creating interactive plots in R, but it is not possible to explore all of them here, and hence I have selected *googleVis* to display interactive elements in a chart. I have made this selection purely based on my experience with interactivity in plots. Another good interactive option is offered by GGobi.

The first part introduces the basics of plotting in R using the scatter plot as an example and also introduces the users to interactivity using the *googleVis* package. The second part introduces bar plot functionality in R and uses the *googleVis* package to create an interactive bar plot. The third part delves into line plots and how we can make them more meaningful by simply making use of the options available in the line plot functionality in the *googleVis* package. The fourth part discusses interactive histograms. We conclude the article by introducing interactive bubble plots and waterfall plots in parts five and six, respectively.

# Introducing a scatter plot

Scatter plots are used primarily to conduct a quick analysis of the relationships among different variables in our data. It is simply plotting points on the x-axis and y-axis. Scatter plots help us detect if two variables have a positive, negative, or no relationship. In this recipe, we will study the basics of plotting in R using scatter plots. The following screenshot is an example of a scatter plot:

## Getting ready

To implement the basic scatter plot in R, we will use the Carseats data available in the *ISLR* package.

## How to do it…

We will start this recipe by installing the necessary package using the *install.packages()* function and loading it in R using the *library()* function:

```
install.packages("ISLR")
library(ISLR)
```

Next, we need to load the data in R. Many R packages ship with datasets, which become available once the package is loaded with *library()*. We can attach the data in R using the *attach()* function. We can view the list of datasets in the currently loaded packages by typing *data()* in the R console window:

`attach(Carseats)`

Once we attach the data, it's a good practice to view the data using *head(Carseats)*. The *head()* function will display the first six entries of the dataset and will allow us to know the exact column headings of the data:

`head(Carseats)`

The data can be plotted in R by calling the *plot()* function. The *plot()* function in R comes with a variety of options, and the best way to learn about them is by typing *?plot* in the R console window:

```
# Color the points by the Urban factor (No / Yes)
plot(Income, Sales, col = c(Urban), pch = 20,
     main = "Sales of Child Car Seats",
     xlab = "Income (000's of Dollars)",
     ylab = "Unit Sales (in 000's)")
```

This particular plot requires us to plot the legends as the points have two different color schemes. In R, we can add a legend using the *legend()* function:

```
legend("topright",cex = 0.6, fill = c("red","black"),
legend = c("Yes","No"))
```

## How it works…

The *install.packages()* and *library()* functions are used in most of the recipes in this book.

The *attach()* function is a convenient way to reference the data, as it allows us to avoid typing the *$* notation. The *$* notation is another way to reference columns in a dataset and is discussed in the next recipe. Once we attach the data, it's a good practice to view the data using *head(Carseats)*. The *head()* function takes the data as its first argument. To view fewer lines in the R console window, we can also type *head(Carseats, 3)*. The *tail(Carseats)* function will display data entries from the bottom of the dataset.
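The following lines are a minimal sketch of these functions. It uses the built-in *mtcars* dataset as a stand-in so that it runs without the *ISLR* package, and the variable names are chosen only for illustration:

```r
# Built-in dataset used here only for illustration (not the Carseats data)
cars_df <- mtcars

# `$` extracts a single column by name without attaching the data frame
mpg_values <- cars_df$mpg

# head() and tail() accept the number of rows as their second argument
first_three <- head(cars_df, 3)
last_three  <- tail(cars_df, 3)

# with() evaluates an expression inside the data frame; it is a common
# alternative to attach() that leaves the search path untouched
mean_mpg <- with(cars_df, mean(mpg))

print(first_three)
print(mean_mpg)
```

Using *$* or *with()* avoids the name clashes that can occur when several attached datasets share column names.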

The data can be plotted in R by calling the *plot()* function. The first two arguments in the *plot()* function refer to the data to be plotted on the x-axis (Income) and y-axis (Sales). The *col* argument allows us to assign color to our data points. In this case, we would like to use a qualitative data column (Urban) to color our points. The default color in R is black, but we can change this using, for example, the *col = "blue"* argument. Please refer to the code file to learn about various other options. The *pch = 20* argument selects the plotting symbol; the value *20* will plot filled circles. To view all the available *pch* values, please type *?par* or *?points* in the R console window. We can also add a title to the plot using the *main* argument, for example, *main = "Sales"*. The *xlab* and *ylab* arguments are used to label the *x* and *y* axes in R.

A legend is necessary for this scatter plot, as we would like to differentiate between sales in urban and rural areas. The first argument in the *legend()* function corresponds to the position of the legend. The *cex* argument is used to size the text; the default value for *cex* is 1. The *fill* argument fills the boxes with the specified colors, and the *legend* argument applies the labels to each of the boxes.

# Scatter plots with texts, labels, and lines

In the previous recipe, we studied how to construct a very basic scatter plot. In order for the plot to deliver a strong message, we need to add elements such as text, labels, and lines. The main objective of a visualization is to grab the attention of its audience and make the optimal use of the data available. The audience should be able to get most of its information from the visualization itself.

The following screenshot plots the child mortality rate in selected countries. The story we would like to share with the readers is the relationship between child mortality rate and **Gross Domestic Product** (**GDP**) of a country. We can improve on our understanding of these relationships if the readers can compare extreme scenarios or compare a specific country with a benchmark (average child mortality rate).

## How to do it…

In the previous recipe, we used a dataset from the *ISLR* package. But what if we would like to import our own data in R? We can set a working directory in R using the *setwd()* function. This is a necessary step as R will always search for the datafile in the active/current directory. The *setwd()* function allows us to set our working directory:

`setwd("D:/book/scatter_Area/data")`

The *read.csv()* function is used to import the data in R:

`child = read.csv("chlmort.csv", header = TRUE, sep =",")`

The *summary()* function is used to get a general idea about the distribution of variables in our data. The *head()* function allows us to view the actual data:

```
summary(child)
head(child)
```

The following code is used to plot the skeleton of our scatter plot. A few of the arguments may look familiar to you from the previous recipe. We have used *child$gdp_bil* and *child$child* instead of *gdp_bil* and *child*. This change was necessary as we did not use the *attach()* command:

```
plot(child$gdp_bil, child$child, pch = 20, col = "#756bb1",
     xlim = c(0, max(child$gdp_bil)), ylim = c(0, 190),
     xlab = "GDP in Billions in current US$",
     ylab = "Child Mortality rate",
     main = "Child Mortality Rate in selected countries for 2012")
```

In order to plot a horizontal or a vertical line in R, we can use the *abline()* function. The *h* argument draws a horizontal line at the given value. The value *36.18* is the world average child mortality rate; adding this line makes it easier to compare the data across countries. The *lwd = 1* argument sets the width of the line and *col = "red"* adds color to the line:

`abline(h = (36.18), lwd = 1, col = "red")`

To generate an effective presentation, we add labels to extreme points on our plot. We can immediately observe that GDP and child mortality rate share a negative relationship. We can go a step further and make the plot easy to interpret if we add text, using the *text()* function, to extreme observations in our data:

```
text(8000,25,labels = c("Luxembourg"), cex = 0.75)
text(600,182,labels= c("Sierra Leone"), col = "red", cex = 0.75)
text(4000, 50,labels = c("Average Child Mortality"),
col = "red", cex = 0.75)
```

## How it works…

To import data in R, we need to direct R to the folder where the data is stored. We can either type the command *setwd()* to let R know where to find the file, or, in RStudio, we can navigate to the folder via **Session** | **Set Working Directory** | **Choose Directory**.

Under the *plot()* function, we have introduced the *$* notation. The name before the *$* sign corresponds to the data frame and the name after the *$* sign refers to the column (*child$gdp_bil*). We have used the *ylim* argument to specify the limits of the y-axis; *xlim* does the same for the x-axis. We can also assign a hex value to the color instead of simply typing in the name of the color using *col = "#756bb1"*. All the remaining arguments are discussed in detail in the previous recipe.

The *text()* function uses three main arguments: the x coordinate, the y coordinate, and the labels. The x and y arguments inform R where exactly to place the labels. The *labels* argument is the actual label to be placed. The *cex = 0.75* argument is used to state the size of the fonts. We can learn about various other text arguments available under R using the command *?text*.
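Instead of typing label coordinates by hand, the position of an extreme point can be looked up from the data. The following is a minimal sketch under stated assumptions: the data frame *demo* and its values are hypothetical (they are not the contents of the CSV file used in this recipe), and *pdf(NULL)* is used only so that the code also runs without an open graphics window:

```r
# Hypothetical data for illustration; not the chlmort.csv file
demo <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  gdp_bil = c(50, 400, 1200, 3000, 8000),
  child   = c(180, 95, 40, 12, 5)
)

pdf(NULL)  # null device: nothing is written to disk

plot(demo$gdp_bil, demo$child, pch = 20, col = "#756bb1",
     xlab = "GDP in billions", ylab = "Child mortality rate")

# Benchmark line drawn at the sample mean
avg <- mean(demo$child)
abline(h = avg, col = "red")

# Locate the observation with the highest rate and label it;
# pos = 4 places the text to the right of the point
worst <- which.max(demo$child)
text(demo$gdp_bil[worst], demo$child[worst],
     labels = demo$country[worst], pos = 4, cex = 0.75)

invisible(dev.off())
```

Looking up the coordinates with *which.max()* keeps the label attached to the right point if the data changes.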

## There's more…

At times, we would like to add a trend line to our plot. We could achieve this in a variety of ways, for example, by fitting a regression line or by using the *scatter.smooth()* function, which allows users to plot a smooth line using **Locally Weighted Scatterplot Smoothing** (**LOESS**). We will use LOESS to study the trend in our data. The idea behind LOESS is to fit a low-degree polynomial locally by weighted least squares, giving more weight to points near the point whose value is being estimated and less weight to points further away. The method was initially proposed by Cleveland. We will avoid going into the mathematical details of how LOESS is calculated, and instead we will assume that R will correctly apply this to our data.

The dashed line in the following screenshot is not the usual regression line but a trend line fitted using LOESS methodology:

Plotting a trend line is not very difficult. Assuming the data has already been imported as *child*, as in the *How to do it…* section, the following code plots the points together with a LOESS trend line using the *scatter.smooth()* function:

```
scatter.smooth(child$gdp_bil, child$child, pch = 20, lwd = 0.75,
     col = "Blue", lpars = list(lty = 3, col = "black", lwd = 2),
     xlab = "GDP in Billions in current US$",
     ylab = "Child Mortality rate",
     main = "Child Mortality Rate in selected countries for 2012")
```

We use the *lpars* argument to style the trend line. It takes a list of line parameters (*lty*, *col*, and *lwd*), the same attributes we passed in the previous two plots. Readers can learn more about the *scatter.smooth()* function by typing *?scatter.smooth* in the R console window.
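To see the fitted curve itself rather than only the finished plot, the smoothed values can be computed with *loess.smooth()* from the *stats* package and drawn with *lines()*. This is a minimal sketch, assuming the built-in *cars* dataset as stand-in data; the *span* and *degree* values shown are the documented defaults of *loess.smooth()*:

```r
# Built-in `cars` dataset, used here only as stand-in data
fit <- loess.smooth(cars$speed, cars$dist, span = 2/3, degree = 1)

# fit is a list with the x grid and the smoothed y values
str(fit)

pdf(NULL)  # null device so the sketch also runs non-interactively
plot(cars$speed, cars$dist, pch = 20, col = "blue",
     xlab = "Speed", ylab = "Stopping distance")
lines(fit$x, fit$y, lty = 3, lwd = 2)  # dashed LOESS trend line
invisible(dev.off())
```

A smaller *span* follows the data more closely, while a larger one gives a smoother line.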

## See also

- *A long road Ahead in Regaining Lost Jobs* is a New York Times visualization that uses text to provide additional information to its audience. It can be accessed at http://www.nytimes.com/interactive/2010/10/13/business/economy/economy_graphic.html?_r=0.
- *Narrative Visualization: Telling Stories with Data*, Edward Segel and Jeffrey Heer, 2010, can be accessed at http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf.
- Nathan Yau has explained the *scatter.smooth()* function and LOESS on his blog, which can be viewed at http://flowingdata.com/2010/03/29/how-to-make-a-scatterplot-with-a-smooth-fitted-line/.

# Connecting points in a scatter plot

The primary objective of this recipe is to understand how we can connect points in a scatter plot. The plot is inspired by Alberto Cairo's infographic on the Gini coefficient and GDP under the tenures of various presidents in Brazil, which connects points based on these three variables. In this recipe, we will apply the same concept to the USA economy.

## How to do it…

We will start by importing the data in R. The dataset comprises the *Gini* coefficient, the GDP of the USA, and the USA presidents. The *Gini* coefficient is used as a measure of inequality in a country:

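`income = read.csv("ginivsgdp1.csv", header = TRUE, stringsAsFactors = TRUE)`

The plot can be generated using the *plot()* function. Reading the file with *stringsAsFactors = TRUE* stores *Presidents* as a factor, which lets R map each president to a color:

```
plot(income$gdp_ann, income$Gini, pch = 20,
     col = c(income$Presidents), type = "o",
     xlab = "GDP of USA", ylab = "Gini coefficient",
     main = "Inequality in USA", xaxp = c(0,18000,8))
```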

Since we use *col* to distinguish between the terms of the various presidents of the USA, the plot requires a legend. The legend is added to our plot using the *legend()* function:

```
legend("bottomright",fill = c(6,7,4,2,9,5,3,1,8), legend =
c("Johnson","Nixon","Ford","Carter","Reagan","G.Bush","Clinton",
"Bush","Obama"), bty = "n", cex=0.7)
```

## How it works…

Most of the arguments used in the *plot()* function have been discussed in prior recipes of this article. The *type = "o"* argument overplots both points and lines, so the points are connected in the order in which they appear in the data. Readers curious to know more about the various *type* options should type *?plot* in the R console window.

The important point to note is that R uses a qualitative variable such as Presidents to plot points of different colors. To build a matching legend, we need to pass the corresponding colors to the *fill* argument. R converts the presidents' names (stored as a factor) into numeric codes and uses them to color each point. We can view these numeric values by typing the following lines:

```
cols = as.numeric(factor(income$Presidents))
cols
```

We can simply use these numeric values and pass it as a vector under the *legend()* function. The first argument in the *legend()* function is the position of the legend. The third argument corresponds to the labels. We suppress drawing a box around the label using the *bty = "n"* argument. The *cex* argument allows us to size the labels in R.
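Rather than typing the codes by hand, the legend entries can be derived from the factor levels. The following is a minimal sketch with a small hypothetical vector of names (not the contents of the CSV file); the variable names are illustrative, and *pdf(NULL)* is used only so that the code runs without an open graphics window:

```r
# Hypothetical stand-in for the Presidents column
pres <- factor(c("Johnson", "Johnson", "Nixon", "Ford", "Carter", "Carter"))
gdp  <- c(1000, 1200, 1500, 1900, 2300, 2800)
gini <- c(0.39, 0.39, 0.40, 0.40, 0.41, 0.42)

# Factor levels are sorted alphabetically; the numeric codes index into them
codes         <- as.numeric(pres)
legend_labels <- levels(pres)
legend_fill   <- seq_along(legend_labels)

pdf(NULL)
plot(gdp, gini, pch = 20, col = codes, type = "o")
# Labels and fill colors come from the same levels, so they always agree
legend("bottomright", fill = legend_fill, legend = legend_labels,
       bty = "n", cex = 0.7)
invisible(dev.off())

print(data.frame(president = pres, code = codes))
```

This lists the legend entries in alphabetical order; reorder *legend_labels* and *legend_fill* together if a chronological order is preferred.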

## There's more…

We could also add texts and lines to our plot as shown in the following screenshot:

We have implemented the same code as mentioned in the *How to do it…* section of this recipe. But instead of applying different colors, we can also add a line and text to our chart:

```
abline(v = 14958, lwd =1.5)
text(16200, 0.46,"Obama")
```

The *text()* functionality in R will allow us to add a text. The first and second arguments under *text()* represent the *x* and *y* coordinates of the plot. The third argument is the actual label to be applied. The *abline()* function is used to apply a vertical line to our plot. To plot labels on all the points, we can use the following code:

```
plot(income$gdp_ann,income$Gini,pch = 20, col = "Black",
xlab =" GDP of USA", ylab = "Gini coefficient",
xaxp = c(0,18000,10),bty = "n")
text(income$gdp_ann, income$Gini,income$years, cex = 0.7,
pos = 2, offset = 0)
```

Since we need to label all the years, we pass *income$years* as our argument for labels. The *pos* argument is used to adjust the position of the label around the point. Readers may observe overlapping labels, which they can try to fix by adjusting *pos* and *offset*. I would suggest that readers type *?text* in the R console window to learn more about the *text()* function.

## See also

- Alberto Cairo's visualization on inequality and GDP in Brazil can be accessed at http://www.thefunctionalart.com/2012/09/in-praise-of-connected-scatter-plots.html

# Generating an interactive scatter plot

In the previous recipe, we studied how multivariate data can be displayed on a scatter plot. We used color as a visual cue to display information related to different presidents in the USA and how the economy performed. In this recipe, we will build on the same idea and introduce interactive scatter plots. The limitation of static plots is that they are hard to interpret if the points overlap, and without gridlines the data may be hard to decipher as well. Interactive plots help us to overcome this limitation. In this recipe, we will plot a simple interactive scatter plot with a trend line as shown in the following screenshot:

## Getting ready

We will plot an interactive scatter plot using the *googleVis* package in R.

## How to do it…

To generate an interactive scatter plot, we will install and load the *googleVis* package in R. We can import the data in R using the *read.csv()* function:

```
install.packages("googleVis")
library(googleVis)
income = read.csv("ginivsgdp1.csv", header = TRUE)
```

By default, the *googleVis* package will use the first column as the *x* variable. Since our first column is not GDP, we will use the GDP data and *gini* data from the imported CSV file to construct a new data frame in R:

`scater = data.frame(gdp = c(income$gdp_ann),gini= c(income$Gini))`

Now, we can generate an interactive scatter plot in R. Note that the *googleVis* package will display the scatter plot in a new browser window only when the *plot()* function is executed:

```
scaterp4 = gvisScatterChart(scater, options = list(width = 650,
  height = 600, legend = "none",
  title = "Relationship between Inequality and GDP growth in USA",
  hAxis = "{title :'GDP'}",
  vAxis = "{title :'Gini'}",
  dataOpacity = 0.8,
  trendlines = "{0:{type : 'linear', visibleInLegend: true, showR2: true}}"))
plot(scaterp4)
```

## How it works…

Readers new to R can learn about the *data.frame()* function by typing *?data.frame* in the R console window.

The first argument in the *gvisScatterChart()* function is our data frame. We can have more than one column in our data frame but the x-axis will be assigned to the first column. The *googleVis* package comes with some very useful options that allow us to add labels to *x* and *y* axes, and add title and opacity to our scatter plot. We will discuss these in detail at a later point in this article. The options are passed to the chart as a list through the *options* argument.

The *trendlines* argument adds a linear trend line to our scatter plot. Note the use of *0* in the *trendlines* argument, which corresponds to the first series. The *visibleInLegend* and *showR2* settings add the trend line's equation and the coefficient of determination to the plot's legend section.
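To get a feel for the numbers a linear trend line summarizes, the same kind of fit can be computed in base R with *lm()*. This is a sketch under stated assumptions: the data is simulated purely for illustration, and the code does not show how the Chart API computes its trend line, only the standard least-squares equivalent:

```r
# Simulated data for illustration only (not the ginivsgdp1.csv file)
set.seed(1)
gdp  <- seq(1000, 16000, length.out = 30)
gini <- 0.35 + 0.000008 * gdp + rnorm(30, sd = 0.005)

# Ordinary least-squares line: gini = intercept + slope * gdp
fit <- lm(gini ~ gdp)
print(coef(fit))

# Coefficient of determination (R squared) of the fitted line
r2 <- summary(fit)$r.squared
print(r2)
```

For a straight-line fit with an intercept, this value equals the squared correlation between the two variables.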

## There's more…

In this section, we will learn to plot multiple y-axis values on the same scatter plot. This is shown in the following screenshot. Note that we have processed the data and stored it as a new CSV file.

The code used to generate this plot is exactly the same as the one discussed under the *How to do it…* section of this recipe. But we have altered the data file and stored it as a new CSV file. To learn more about various options available, please refer to the *googleVis* developer website.

## See also

- The *googleVis* developer website can be accessed at https://developers.google.com/chart/interactive/docs/gallery/scatterchart#Configuration_Options.

# A simple bar plot

A bar plot can often be confused with a histogram (studied later in this article). Histograms are used to study the distribution of data, whereas bar plots are used to study categorical data. Both plots may look similar to the naked eye, but the main difference is that the width of a bar in a bar plot is not of significance, whereas in a histogram the width of each bar represents the range of values it covers and its height (or area) the frequency of data in that range.

In this recipe, I have made use of Infant Mortality Rate in India. The data is made available by the Government of India. The main objective is to study the basics of a bar plot in R as shown in the following screenshot:

## How to do it…

We start the recipe by importing our data in R using the *read.csv()* function. R will search for the data under the current directory, and hence we use the *setwd()* function to set our working directory:

```
setwd("D:/book/scatter_Area/chapter1")
data = read.csv("infant.csv", header = TRUE)
```

Once we import the data, we would like to process it by ordering it. We will order the data using the *order()* function in R. We would like R to order the rows by the column *Total2011* in decreasing order:

`data = data[order(data$Total2011, decreasing = TRUE),]`

We use the *ifelse()* function to create a new column. We will utilize this new column to add different colors to bars in our plot. We could also write a loop in R to do this task, but we will keep this for later. The *ifelse()* function is quick and easy. We instruct R to assign *yes* if values in the column *Total2011* are more than 12.2 and *no* otherwise. The 12.2 value is not randomly chosen but is the average infant mortality rate of India:

`new = ifelse(data$Total2011>12.2,"yes","no")`

We would now like to join the vector of yes and no to our original dataset. In R, we can join columns using the *cbind()* function. Rows can be combined using *rbind()*:

`data = cbind(data,new)`

When we initially plot the bar plot, we observe that we need more space at the bottom of the plot. We can achieve this in R by passing the *mar* argument to the *par()* function. The *mar* argument takes a vector of four values: the bottom, left, top, and right margins, measured in lines of text:

`par(mar = c(10,5,5,5))`
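As a minimal sketch of how *mar* behaves, the following lines set the margins, read them back, and restore the previous settings. The use of *pdf(NULL)* as a throwaway device and the variable names are assumptions made for illustration; they are not part of the recipe:

```r
pdf(NULL)  # null device so the sketch also runs non-interactively

# par() returns the previous values, which makes them easy to restore
old <- par(mar = c(10, 5, 5, 5))

# Query the margins now in effect: bottom, left, top, right
current <- par("mar")
print(current)

# Restore the earlier margins once the plot is done
par(old)

invisible(dev.off())
```

Restoring the old settings keeps the wide bottom margin from affecting later plots in the same session.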

We can now generate a bar plot in R using the *barplot()* function. The *abline()* function is used to add a horizontal line on the bar plot:

```
barplot(data$Total2011, las = 2, names.arg = data$India,
     width = 0.80, border = NA, ylim = c(0,20), col = "#e34a33",
     main = "Infant Mortality Rate of India in 2011")
abline(h = 12.2, lwd =2, col = "white", lty =2)
```

## How it works…

The *order()* function returns a permutation of row indices that sorts the rows (in increasing or decreasing order) based on the given variable. We would like to plot the bars from highest to lowest, and hence we need to arrange the data. The *ifelse()* function is used to generate a new column. We will use this column under the *There's more…* section of this recipe. The first argument under the *ifelse()* function is the logical test to be performed. The second argument is the value to be assigned if the test is true, and the third argument is the value to be assigned if the logical test fails.
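The three data-preparation steps can be traced on a tiny example. The data frame below is hypothetical (the state names and values are made up for illustration and do not come from the CSV file), but the column names mirror the ones used in this recipe:

```r
# Hypothetical data; not the infant.csv file
states <- data.frame(
  India     = c("State A", "State B", "State C", "State D"),
  Total2011 = c(9.5, 14.1, 12.2, 17.0)
)

# order() returns row indices; indexing with them sorts the rows
states <- states[order(states$Total2011, decreasing = TRUE), ]

# ifelse(test, value_if_true, value_if_false) works element by element
above <- ifelse(states$Total2011 > 12.2, "yes", "no")

# cbind() appends the new vector as a column named after the variable
states <- cbind(states, above)

print(states)
```

Note that a value exactly equal to 12.2 is labeled *no*, because the test uses a strict greater-than comparison.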

The first argument in the *barplot()* function defines the height of the bars and *horiz = TRUE* (not used in our code) instructs R to plot bars horizontally. The default setting in R will plot bars vertically. The *names.arg* argument is used to label the bars. We also specify *border = NA* to remove the borders, and *las = 2* sets the orientation of the axis labels. Try using the *las* values 0, 1, 2, or 3 and observe how the plot changes.

The first argument in the *abline()* function assigns the position where the line is drawn, that is, vertical or horizontal. The *lwd*, *lty*, and *col* arguments are used to define the width, line type, and color of the line.

## There's more…

While plotting a bar plot, it's a good practice to order the data in ascending or descending order. An unordered bar plot does not convey the right message and the plot is hard to read when there are more bars involved. When we observe a plot, we are interested to get the most information out, and ordering the data is the first step toward achieving this objective.

We have not yet shown how the *ifelse()* and *cbind()* functions are used in the plot. If we would like to color the bars differently to let the readers know which states have an infant mortality rate above the country level, we can do this by replacing *col = "#e34a33"* with *col = as.factor(data$new)*. The conversion to a factor is needed in recent versions of R, where *data$new* is stored as text rather than as a factor.

## See also

- New York Times has a very interesting implementation of interactive bar chart and can be accessed at http://www.nytimes.com/interactive/2007/09/28/business/20070930_SAFETY_GRAPHIC.html.

# An interactive bar plot

We would like to make the bar plot interactive. The advantage of using the Google Chart API in R is the flexibility this provides in making interactive plots. The *googleVis* package allows us to skip the step to export a plot from R to an illustrator and we can make presentable plots right out of R.

The bar plot functionality in R comes with various options and it is not possible to demonstrate all of the options in this recipe. We will try to explore plot options that are specific to bar plot.

In this recipe, we will learn to plot returns data of Microsoft over a 2-year period. The data for the exercise was downloaded from Google Finance. We have calculated one-day returns for Microsoft in MS Excel and imported the resulting CSV file into R:

`Return_t = ((Price_t - Price_(t-1)) / Price_(t-1)) * 100`
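The same calculation can also be done directly in R instead of Excel. The following is a minimal sketch with made-up prices; the vector *prices* is hypothetical and is not taken from the data file used in this recipe:

```r
# Hypothetical closing prices on four consecutive days
prices <- c(100, 102, 99.96, 104.958)

# diff() gives Price_t - Price_(t-1); head(prices, -1) drops the last
# value, leaving the previous-day prices used as the denominator
returns <- diff(prices) / head(prices, -1) * 100

print(round(returns, 2))
```

The result has one element fewer than *prices*, since the first day has no previous price to compare with.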

Readers should note that the following plot is only a part of the actual chart:

## Getting ready

In order to plot a bar plot, we will install the *googleVis* package.

## How to do it…

We will start the recipe by installing the *googleVis* package and loading it in R:

```
install.packages("googleVis")
library(googleVis)
```

When R loads a library, it may print several startup messages; we can suppress these messages using *suppressPackageStartupMessages(library(googleVis))*. We can now import our data using the *read.csv()* function. The data file comprises three variables: date, daily Microsoft prices, and daily returns:

`stock = read.csv("spq.csv", header = TRUE)`

We generate an interactive bar plot using the *gvisBarChart()* function:

```
barpt = gvisBarChart(stock, xvar = "Date", yvar = c("Returns"),
options = list(orientation = "horizontal", width = 1400,
height = 500,title = "Microsoft returns over 2 year period",
legend = "none",
hAxis = "{title :'Time Period',titleTextStyle :{color:'red'}}",
vAxis = "{title : 'Returns(%)', ticks : [-12,-6,0,6,
12],titleTextStyle :{color: 'red'}}",
bar = "{groupWidth: '100%'}"))
```

The plot is generated in a new browser window when we execute the *plot()* function:

`plot(barpt)`

## How it works…

The *googleVis* package will generate a plot in a new browser window and requires Internet connectivity to do so. The *googleVis* package is a simple communication medium between the Google Chart API and R.

We have defined our *gvisBarChart()* function as *barpt*; we need this in R as all the *googleVis* functions will generate a list that contains the HTML code and reference to a JavaScript function. If you omit *barpt =*, R will display the code generated in your command window.

The first argument under the *gvisBarChart()* function is the data argument; we have stored the data in R as a data frame called *stock*. The second and third arguments are the column names of the data to be displayed on the x-axis and y-axis. The *options* argument is used to define a list of options. All the available options related to the bar plot and their descriptions are available on the *googleVis* developer website. Since we are plotting stock returns of Microsoft (only one series), we can avoid legends. The default plot will include legends; this can be overridden using *legend = "none"*.

To add a title to our plot, we use the *title* attribute. We label our axes using the *vAxis* and *hAxis* attributes. Note the use of *{ }* and *[ ]* within *vAxis* and *hAxis*. We make use of *{ }* to group all the elements related to *hAxis* and *vAxis* instead of typing *vAxis.title* or *vAxis.titleTextStyle*. If readers are familiar with CSS or HTML, this code would be very easy to interpret. We have used the *groupWidth* attribute of *bar* and set it to 100 percent in order to eliminate the spacing between bars. Finally, we call the *plot()* function to display our visualization.
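Because these option values are plain character strings, they can be assembled in R before being passed to the chart. The following sketch builds a *vAxis* string from a vector of tick values using only base R; the variable names are illustrative, and the resulting string is simply text that could be supplied as the *vAxis* element of the options list:

```r
# Tick positions we want on the axis
ticks <- c(-12, -6, 0, 6, 12)

# [ ] wraps the array of ticks, { } groups the axis settings;
# %% produces a literal percent sign inside sprintf()
vaxis <- sprintf(
  "{title: 'Returns(%%)', ticks: [%s], titleTextStyle: {color: 'red'}}",
  paste(ticks, collapse = ", ")
)

print(vaxis)
```

Building the string this way makes it easy to change the tick values without editing the text by hand.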

## There's more…

In the previous recipe, we constructed a bar plot. The *googleVis* package also allows us to create a table and merge the same with a bar chart. The following screenshot is only part of the plot:

The combined table and bar chart are generated in three different steps. We will first generate a bar chart. The code is exactly the same as the one discussed under the *How to do it…* section of this recipe. We will then generate a table using the following lines of code:

```
table <- gvisTable(stock, options=list(page='enable',
height='automatic',
width='automatic'),
formats = list(Returns =' #.##'))
```

In the final step, we will merge the two chart objects (*barpt* and *table*) using the *gvisMerge()* function:

```
comb = gvisMerge(table,barpt, horizontal = TRUE)
plot(comb)
```

We can display the merged visualization comb using the *plot()* function.

The readers can learn more about exporting the visualization in the *googleVis* package manual. It is also possible to integrate the *googleVis* package plots with *shiny*.

## See also

- Google Chart API developer website can be accessed at https://developers.google.com/chart/interactive/docs/gallery/barchart#Configuration_Options
- The *googleVis* package manual can be accessed at http://cran.r-project.org/web/packages/googleVis/googleVis.pdf
- A blog on integrating *shiny* with *googleVis* can be accessed at http://www.magesblog.com/2013/02/first-steps-of-using-googlevis-on-shiny.html

# A simple line plot

Line plots are simply lines connecting all the *x* and *y* dots. They are very easy to interpret and are widely used to display an upward or downward trend in data. In this recipe, we will use the *googleVis* package to create an interactive line plot in R. We will learn how we can emphasize certain variables in our data. The following line plot shows fertility rate:

## Getting ready

We will use the *googleVis* package to plot the line plot.

## How to do it…

In order to construct a line chart, we will install and load the *googleVis* package in R. We will also import the fertility data using the *read.csv()* function:

```
install.packages("googleVis")
library(googleVis)
frt = read.csv("fertility.csv", header = TRUE, sep =",")
```

The fertility data is downloaded from the OECD website. We can construct our line object using the *gvisLineChart()* function:

```
# Store the chart object so that it can be passed to plot()
line = gvisLineChart(frt, xvar = "Year",
  yvar = c("Australia","Austria","Belgium","Canada","Chile","OECD34"),
  options = list(width = 1100, height = 500,
    backgroundColor = "#FFFF99",
    title = "Fertility Rate in OECD countries",
    vAxis = "{title : 'Total Fertility Rate', gridlines:{color:'#DEDECE',count : 4}, ticks : [0,1,2,3,4]}",
    # Series are numbered from 0 in the order given in yvar, so
    # OECD34 (the sixth column) is assumed to be series 5
    series = "{0:{color:'black', visibleInLegend :false},
      1:{color:'#BDBD9D', visibleInLegend :false},
      2:{color:'#BDBD9D', visibleInLegend :false},
      3:{color:'#BDBD9D', visibleInLegend :false},
      4:{color:'#BDBD9D', visibleInLegend :false},
      5:{color:'#3333FF', visibleInLegend :true}}"))
```

We can construct the visualization using the *plot()* function in R:

`plot(line)`

## How it works…

The first three arguments of the *gvisLineChart()* function are the data and the name of columns to be plotted on the x-axis and y-axis. The *options* argument lists the chart API options to add and modify elements of a chart.

For the purpose of this recipe, we will use part of the dataset. Hence, while we assign the series to be plotted under *yvar = c()*, we will specify the column names that we would like to be plotted in our chart. Note that the series starts at 0, and hence Australia, which is the first column, is in fact series 0 and not 1.

For the purpose of this exercise, let us assume that we would like to demonstrate the mean fertility rate among all OECD economies to our audience. We can achieve this using *series {}* under *option = list()*. The *series* argument will allow us to specify or customize a specific series in our dataset. Under the *gvisLineChart()* function, we instruct Google API to color OECD series (series 34) and Australia (series 0) with a different color and also make legend visible only for OECD and not for all the series.
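Because the series indices depend on the order of the columns in *yvar*, it can be less error-prone to look them up than to count them by hand. The following is a minimal sketch using only base R; the names *yvar* and *series_index* are illustrative and are not part of the original recipe:

```r
# Columns passed to yvar in this recipe, in plotting order
yvar <- c("Australia", "Austria", "Belgium", "Canada", "Chile", "OECD34")

# Google Charts numbers series from 0 while R numbers vector elements from 1,
# so subtract 1 from the position of each column name
series_index <- setNames(seq_along(yvar) - 1, yvar)

series_index["Australia"]  # first column, series 0
series_index["OECD34"]     # sixth column, series 5
```

The same lookup works for any subset of columns, which helps when the list passed to *yvar* changes between experiments.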

It would be best to display all the legends but we use this to show the flexibility that comes with Google Chart API. Finally, we can use the *plot()* function to plot the chart in a browser. The following screenshot displays a part of the data. The *dim()* function gives us a general idea about the dimensions of the fertility data:
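As a rough illustration of this kind of inspection, the sketch below uses a small made-up data frame in place of the fertility data. The name *frt_demo* and all of its values are placeholders, not OECD figures:

```r
# Hypothetical stand-in for the fertility data; the values are placeholders
frt_demo <- data.frame(Year = 2000:2003,
                       Australia = c(1.8, 1.7, 1.8, 1.8),
                       OECD34 = c(1.7, 1.6, 1.6, 1.7))

dim(frt_demo)      # number of rows, then number of columns
head(frt_demo, 2)  # first two rows
names(frt_demo)    # column names that could be used in xvar and yvar
```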

New York Times visualizations often combine line plots with bar charts and pie charts. Readers should try constructing such a visualization. We can use the *gvisMerge()* function to merge plots. The function merges just two plots at a time, and hence readers would have to nest multiple *gvisMerge()* calls to create a similar visualization. The same can also be constructed with base R graphics, but we would lose the interactive element.
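Nesting *gvisMerge()* calls might look like the following sketch. It assumes the *googleVis* package is installed; the data frame *df* and the object names are made up for illustration and are not part of the recipe:

```r
library(googleVis)

# Small made-up data set, used only to have something to draw
df <- data.frame(label = c("A", "B", "C"), value = c(10, 13, 14))

line_chart <- gvisLineChart(df, xvar = "label", yvar = "value")
bar_chart  <- gvisBarChart(df, xvar = "label", yvar = "value")
pie_chart  <- gvisPieChart(df, labelvar = "label", numvar = "value")

# gvisMerge() takes two gvis objects, so the first merge is nested in the second
top_row  <- gvisMerge(line_chart, bar_chart, horizontal = TRUE)
combined <- gvisMerge(top_row, pie_chart, horizontal = FALSE)

# plot(combined) would open the three merged charts in a browser
```

Here the line and bar charts share the top row and the pie chart sits below them; swapping the *horizontal* values changes the layout.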

## See also

- The OECD website provides economic data related to OECD member countries. The data can be accessed at no charge from the website http://www.oecd.org/statistics/.
- A New York Times visualization that combines a bar chart and a line chart can be accessed at http://www.nytimes.com/imagepages/2009/10/16/business/20091017_CHARTS_GRAPHIC.html

# Line plot to tell an effective story

In the previous recipe, we learned how to plot a very basic line plot and use some of the options. In this recipe, we will go a step further and make use of specific visual cues such as color and line width for easy interpretation.

Line charts are a great tool to visualize time series data. The fertility data is discrete but connecting points over time provides our audience with a direction. The visualization shows the amazing progress countries such as Mexico and Turkey have achieved in reducing their fertility rate.

OECD defines fertility rate as "Refers to the number of children that would be born per woman, assuming no female mortality at child-bearing ages and the age-specific fertility rates of a specified country and reference period".

Line plots have been widely used by the New York Times to create very interesting infographics. This recipe is inspired by one of the New York Times visualizations. It is important to understand that many of the infographics created by professionals are built using D3.js or Processing. We will not go into the details of these tools, but it is good to know how they work and how they are used to create visualizations.

## Getting ready

We need to install and load the *googleVis* package to construct a line chart.

## How to do it…

To generate an interactive plot, we will load the fertility data in R using the *read.csv()* function. To generate a line chart that plots the entire dataset, we will use the *gvisLineChart()* function:

```
line = gvisLineChart(frt, xvar = "Year",
  yvar = c("Australia", "Austria", "Belgium", "Canada", "Chile",
    "Czech.Republic", "Denmark", "Estonia", "Finland", "France",
    "Germany", "Greece", "Hungary", "Iceland", "Ireland", "Israel",
    "Italy", "Japan", "Korea", "Luxembourg", "Mexico", "Netherlands",
    "New.Zealand", "Norway", "Poland", "Portugal", "Slovakia",
    "Slovenia", "Spain", "Sweden", "Switzerland", "Turkey",
    "United.Kingdom", "United.States", "OECD34"),
  options = list(width = 1200, backgroundColor = "#ADAD85",
    title = "Fertility Rate in OECD countries",
    vAxis = "{gridlines: {color: '#DEDECE', count: 3}, ticks: [0,1,2,3,4]}",
    # zero-based series indices: 20 = Mexico, 31 = Turkey, 34 = OECD34
    series = "{0: {color: '#BDBD9D', visibleInLegend: false},
      20: {color: '#009933', visibleInLegend: true},
      31: {color: '#996600', visibleInLegend: true},
      34: {color: '#3333FF', visibleInLegend: true}}"))
```

To display our visualization in a new browser, we use the generic R *plot()* function:

`plot(line)`

## How it works…

The arguments passed to the *gvisLineChart()* function in the previous section are the same as those discussed under the simple line plot, with some minor changes. We would like to plot the entire dataset for this exercise, and hence we have to state all the column names in *yvar = c()*.

Also, we would like to color all the series with the same color but highlight Mexico, Turkey, and the OECD average. We do this in the previous code using *series {}*, where we customize the colors and legend visibility for specific countries.

In this particular plot, we have made use of the same color for all the economies but have highlighted Mexico and Turkey to signify the development and growth that took place in the 5-year period. It would also be effective if our audience could compare the OECD average with Mexico and Turkey. This provides the audience with a benchmark they can compare with.

If we plot all the legends, it may make the plot too crowded and 34 legends may not make a very attractive plot. We could avoid this by only making specific legends visible rather than all. This makes our plot tidier.
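The *series* string in the recipe's code names only four series explicitly. A string with one entry per series can be generated rather than typed by hand. The following is a sketch in base R under that assumption; the helper names (*colors*, *in_legend*, *entries*, *series_json*) are illustrative, and the color codes are the ones used in the recipe:

```r
# Column order as passed to yvar in this recipe
yvar <- c("Australia", "Austria", "Belgium", "Canada", "Chile",
          "Czech.Republic", "Denmark", "Estonia", "Finland", "France",
          "Germany", "Greece", "Hungary", "Iceland", "Ireland", "Israel",
          "Italy", "Japan", "Korea", "Luxembourg", "Mexico", "Netherlands",
          "New.Zealand", "Norway", "Poland", "Portugal", "Slovakia",
          "Slovenia", "Spain", "Sweden", "Switzerland", "Turkey",
          "United.Kingdom", "United.States", "OECD34")
highlight <- c("Mexico", "Turkey", "OECD34")

# Grey for every series, then overwrite the three highlighted ones
colors <- setNames(rep("#BDBD9D", length(yvar)), yvar)
colors[highlight] <- c("#009933", "#996600", "#3333FF")
in_legend <- yvar %in% highlight

# One "index: {...}" entry per series, with zero-based indices
entries <- sprintf("%d: {color: '%s', visibleInLegend: %s}",
                   seq_along(yvar) - 1L, colors,
                   ifelse(in_legend, "true", "false"))
series_json <- paste0("{", paste(entries, collapse = ", "), "}")
```

The resulting string could then be supplied as `series = series_json` inside *options = list()*.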

## See also

- D3 is a great tool to develop interactive visualization and this can be accessed at http://d3js.org/.
- Processing is open source software that originated at the MIT Media Lab and can be downloaded from https://processing.org/.
- A good resource to pick colors and use them in our plots is the following link: http://www.w3schools.com/tags/ref_colorpicker.asp.
- I have used New York Times infographics as an inspiration for this plot. You can find a collection of visualizations put out by the New York Times in 2011 at http://www.smallmeans.com/new-york-times-infographics/.

# Generating an interactive Gantt/timeline chart in R

According to Wikipedia, Gantt charts "illustrate the start and finish dates of the terminal elements and summary elements of a project". These charts are used to track the progress of a project against time. The first chart of this kind was developed by Karol Adamiecki in the 1890s.

Even though the most important application of Gantt charts is in project management, they have been applied in visualization to represent the following:

- Historical eras of artists
- Periods during which baseball players are disabled
- Vanishing Wall Street firms

## Getting ready

To generate a timeline plot, we need to install and load the *googleVis* package in R.

## How to do it…

We import our data in R using the *read.csv()* function. The *gvisTimeline()* function requires the dates to be in a date format in R. Hence, we use the *as.POSIXct()* and *as.character()* functions to define a new data frame:

```
base = read.csv("disable.csv")
data = data.frame(position = as.character(base$position), player =
as.character(base$player), start = as.POSIXct(base$start), end =
as.POSIXct(base$end))
```

The data was collected by me and is available online. We use the *gvisTimeline()* function to generate an object, which is displayed using the *plot()* function:

```
baseball = gvisTimeline(data = data, rowlabel = "position",
  barlabel = "player", start = "start", end = "end",
  options = list(width = 1000, height = 900,
    timeline = "{singleColor: '#002A3E'}"))
plot(baseball)
```

Many of the arguments used in the *gvisTimeline()* function are self-explanatory. The *start* and *end* arguments refer to the start and end dates, respectively; in the case of the baseball data, they correspond to the length of time for which players are on the disabled list. We have passed the *singleColor* setting under the *timeline* option to color all the bars with the same color.
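The date conversion step can be tried on its own. The sketch below uses a made-up data frame, *base_demo*, laid out like the columns read from *disable.csv*; the players and dates are hypothetical, and the time zone is fixed to UTC only so that the result does not depend on the local settings:

```r
# Hypothetical rows in the same layout as disable.csv; names and dates are made up
base_demo <- data.frame(position = c("Pitcher", "Catcher"),
                        player = c("Player A", "Player B"),
                        start = c("2009-04-05", "2009-05-20"),
                        end = c("2009-04-25", "2009-07-01"),
                        stringsAsFactors = FALSE)

# Convert the text columns to date-times
start <- as.POSIXct(base_demo$start, tz = "UTC")
end   <- as.POSIXct(base_demo$end, tz = "UTC")

# Length of each spell on the list, in days
days_out <- as.numeric(difftime(end, start, units = "days"))
days_out
```

Checking the durations in this way is a quick test that the dates were parsed as intended before they are handed to *gvisTimeline()*.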

## See also

- History of the Gantt chart can be accessed at http://www.gantt.com/.
- New York Times visualizes Mets disability and Mets winning percentage in 2009. The timeline is combined with a line plot to add an interesting element to the visualization and is available at http://www.nytimes.com/interactive/2009/10/01/sports/baseball/mets-injuries.html.
- New York Times has a very interesting timeline plot of financial institution: how they merged or vanished from Wall Street. The text is used to provide its audience with additional information. This is available at http://www.nytimes.com/imagepages/2008/09/28/business/28lloyd.graf01.ready.html.
- Information about injured New York Yankees can be accessed at http://sports.newsday.com/long-island/baseball/yankees/injured-yankees/.

# Merging histograms

Histograms help in studying the underlying distribution of data. They are even more useful when we compare more than one histogram on the same page, as this gives us greater insight into the skewness and the overall shape of each distribution.

In this recipe, we will study how to plot a histogram using the *googleVis* package and how we merge more than one histogram on the same page. We will only merge two plots but we can merge more plots and try to adjust the width of each plot. This makes it easier to compare all the plots on the same page. The following plot shows two merged histograms:

## How to do it…

In order to generate a histogram, we will install the *googleVis* package as well as load the same in R:

```
install.packages("googleVis")
library(googleVis)
```

We have downloaded the prices of three different stocks and have calculated their daily returns over the entire period. We can load the data in R using the *read.csv()* function. Our main aim in this recipe is to create two different histograms and plot them side by side in a browser. Hence, we need to divide our data into three different data frames, one per stock. For the purpose of this recipe, we will plot the *aapl* and *msft* data frames:

```
stk = read.csv("stock_cor.csv", header = TRUE, sep = ",")
aapl = data.frame(stk$AAPL)
msft = data.frame(stk$MSFT)
googl = data.frame(stk$GOOGL)
```
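The article does not show how the daily returns were calculated. A minimal sketch, assuming simple percentage returns and using made-up prices, is as follows (the names *price* and *ret* are illustrative):

```r
# Made-up closing prices, for illustration only
price <- c(100, 102, 101, 104)

# Simple daily return in percent: (today - yesterday) / yesterday * 100
ret <- diff(price) / head(price, -1) * 100
round(ret, 2)
```

A series of *n* prices yields *n - 1* returns, because the first day has no previous price to compare with.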

To generate the histograms, we implement the *gvisHistogram()* function:

```
al = gvisHistogram(aapl, options = list(
  histogram = "{bucketSize: 1}", legend = "none",
  title = "Distribution of AAPL Returns", width = 500,
  hAxis = "{showTextEvery: 5, title: 'Returns'}",
  vAxis = "{gridlines: {count: 4}, title: 'Frequency'}"))
mft = gvisHistogram(msft, options = list(
  histogram = "{bucketSize: 1}", legend = "none",
  title = "Distribution of MSFT Returns", width = 500,
  hAxis = "{showTextEvery: 5, title: 'Returns'}",
  vAxis = "{gridlines: {count: 4}, title: 'Frequency'}"))
```

We combine the two *gvis* objects in one browser using the *gvisMerge()* function:

```
mrg = gvisMerge(al,mft, horizontal = TRUE)
plot(mrg)
```

## How it works…

The *data.frame()* function is used to construct a data frame in R. We perform this step as we do not want to plot all the three histograms on the same plot. Note the use of the *$* notation in the *data.frame()* function.

The first argument in the *gvisHistogram()* function is our data stored as a data frame. We can display individual histograms using the *plot(al)* and *plot(mft)* functions. But in this recipe, we will plot the final output.

We observe that most of the attributes of histogram function are the same as discussed in previous recipes. The histogram functionality will use an algorithm to create buckets, but we can control this using the *bucketSize* as *histogram = "{bucketSize :1}"*.

Try using different bucket sizes and observe how the buckets in the histograms change. More options related to histograms can also be found in the following link under the *Controlling Buckets* section:

https://developers.google.com/chart/interactive/docs/gallery/histogram#Buckets.
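The effect of the bucket size can also be explored without a browser. The following sketch uses the base R *hist()* function on simulated data as a stand-in for the stock returns; the helper *bucket_counts()* is hypothetical and only mimics the idea of fixed-width buckets, not the Google Charts algorithm itself:

```r
set.seed(1)
# Simulated returns standing in for the stock data
x <- rnorm(500, mean = 0, sd = 2)

# Count observations per bucket for a given bucket width
bucket_counts <- function(x, width) {
  breaks <- seq(floor(min(x) / width) * width,
                ceiling(max(x) / width) * width, by = width)
  hist(x, breaks = breaks, plot = FALSE)$counts
}

length(bucket_counts(x, 1))  # number of buckets of width 1
length(bucket_counts(x, 2))  # fewer, wider buckets of width 2
```

Wider buckets smooth the shape of the distribution, while narrower ones show more detail at the cost of more noise.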

We have also used *showTextEvery*, an *hAxis* option that controls how many horizontal axis labels are shown: a value of *5* displays every fifth label, which makes the histogram more compact. Our main objective is to observe the distribution, and the plot serves our purpose. Finally, we will make use of *plot()* to display the chart in our favorite browser.

We do the same steps to plot return distribution of Microsoft (MSFT). Now, we would like to place both the plots side by side and view the differences in the distribution. We will use the *gvisMerge()* function to generate histograms side by side.

In our recipe, we have two plots for AAPL and MSFT. The default setting stacks the charts vertically, but we can specify *horizontal = TRUE* to place the charts side by side.

# Making an interactive bubble plot

My first encounter with a bubble plot was while watching a TED video of Hans Rosling. The video led me to search for ways of creating bubble plots in R; a very good introduction to this is available on the Flowing Data website. The advantage of a bubble plot is that it allows us to visualize a third variable, which in our case would be the size of the bubble.

In this recipe, I have made use of the *googleVis* package to plot a bubble plot, but you can also achieve this with base R graphics. The advantage of the Google Chart API is the interactivity and the ease with which the charts can be attached to a web page. Note that we could also use squares instead of circles, but this is not implemented in the Google Chart API yet.

In order to implement a bubble plot, I have downloaded the crime dataset by state. The details regarding the link and definition of crime data are available in the *crime.txt* file and are shown in the following screenshot:

## How to do it…

As with all the plots in this article, we will install and load the *googleVis* package. We will also import our data file in R using the *read.csv()* function:

`crm = read.csv("crimeusa.csv", header = TRUE, sep =",")`

We can construct our bubble chart using the *gvisBubbleChart()* function in R:

```
bub1 = gvisBubbleChart(crm, idvar = "States", xvar = "Robbery",
  yvar = "Burglary", sizevar = "Population", colorvar = "Year",
  options = list(legend = "none", width = 900, height = 600,
    title = "Crime per State in 2012",
    sizeAxis = "{maxSize: 40, minSize: 0.5}",
    vAxis = "{title: 'Burglary'}", hAxis = "{title: 'Robbery'}"))
bub2 = gvisBubbleChart(crm, idvar = "States", xvar = "Robbery",
  yvar = "Burglary", sizevar = "Population",
  options = list(legend = "none", width = 900, height = 600,
    title = "Crime per State in 2012",
    sizeAxis = "{maxSize: 40, minSize: 0.5}",
    vAxis = "{title: 'Burglary'}", hAxis = "{title: 'Robbery'}"))
```

The *bub2* object does not size the bubbles, but shades them and the scale of shading is automatically displayed on the top of the chart. In order to view the visualization, the readers can type *plot(bub2)* in the R console window. To view both the bubble plots side by side, the readers can use the *gvisMerge()* function in R:

```
bub3 = gvisMerge(bub1,bub2, horizontal = TRUE)
plot(bub3)
```

## How it works…

The *gvisBubbleChart()* function uses six attributes to create a bubble chart, which are as follows:

- *data*: This is the data defined as a data frame, in our example, *crm*
- *idvar*: This is the column that is used to assign IDs to the bubbles, in our example, *States*
- *xvar*: This is the column in the data to plot on the x-axis, in our example, *Robbery*
- *yvar*: This is the column in the data to plot on the y-axis, in our example, *Burglary*
- *sizevar*: This is the column used to define the size of the bubble
- *colorvar*: This is the column used to define the color

We can define the minimum and maximum sizes of each bubble using *minSize* and *maxSize*, respectively, inside the *sizeAxis* setting under *options*. Note that we have used *gvisMerge()* to portray the differences between the bubble plots. In the plot on the right, we have not made use of *colorvar* and hence all the bubbles are of the same size.

## There's more…

The Google Chart API makes it easier for us to plot a bubble chart, but the same can be achieved using the basic R plotting functions. We can make use of the *symbols()* function to create such a plot. The symbols need not be circles; they can be squares as well. By this time, you should have watched Hans Rosling's TED lecture and may be wondering how you could create a motion chart with bubbles floating around. The Google Charts API has the ability to create motion charts, and readers can use the *googleVis* reference manual to learn about this.
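A static bubble plot in base R might be sketched as follows. The data frame *crime_demo* and its values are made up for illustration and are not taken from the crime dataset used in the recipe:

```r
# Made-up values, for illustration only
crime_demo <- data.frame(state = c("A", "B", "C", "D"),
                         robbery = c(60, 110, 150, 90),
                         burglary = c(500, 650, 800, 400),
                         population = c(1e6, 5e6, 2e7, 8e6))

# Scale the radius by the square root so that bubble area tracks population
radius <- sqrt(crime_demo$population / pi)

symbols(crime_demo$robbery, crime_demo$burglary, circles = radius,
        inches = 0.35, fg = "white", bg = "steelblue",
        xlab = "Robbery", ylab = "Burglary",
        main = "Bubble plot with base R symbols()")
text(crime_demo$robbery, crime_demo$burglary, crime_demo$state, cex = 0.7)

# Passing squares = ... instead of circles = ... would draw squares
```

Taking the square root matters because readers tend to judge bubbles by area; using the raw value as the radius would exaggerate the larger states.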

## See also

- TED video by Hans Rosling can be accessed at http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
- Flowing data website generates bubble charts using the basic R plot function and can be accessed at http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/
- Animated Bubble Chart by New York Times can be accessed at http://2010games.nytimes.com/medals/map.html

# Constructing a waterfall plot in R

The waterfall plots or staircase plots are observed mostly in financial reports. I have not come across nonfinancial application of these plots. The first and last columns of the plot are usually a total column and the floating columns indicate the incremental change. In this recipe, we generate some fake data of sales figures for every month.

## Getting ready

In order to generate a waterfall plot, we need to install and load the *plotrix* package in R.

## How to do it…

The data for the same is imported in R using the *read.csv()* function. If we view the data, this begins with a total from last year sales and we have some gains and some losses in sales, which are indicated by positive and negative values. The last row represents the current year sales, which is the sum of all the rows:

`sales = read.csv("waterf.csv")`

The waterfall plot is constructed in R using the *staircase.plot()* function:

```
library(plotrix)  # provides staircase.plot()
staircase.plot(sales$value, totals = sales$logic, labels = sales$labels,
  total.col = c("lightgreen"),
  inc.col = c("blue", "red", "red", "blue", "blue", "blue",
    "red", "red", "blue", "blue", "red", "blue"),
  main = "Waterfall Plot showing financial data")
```

The first argument in the *staircase.plot()* function refers to the height of the columns. The second argument is a logical vector wherein *TRUE* refers to the total columns in our data and *FALSE* corresponds to the incremental change. The *labels* argument is used to apply the labels to our plot. The *total.col* argument is used to apply color to the total columns (in our case, the first and last columns).

We specify the colors for incremental change columns under the *inc.col* argument. For the incremental change columns, we have repeated the blue and red colors. We would prefer to use negative values to have red colored columns and positive incremental change to have blue colored columns.
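Rather than typing the colors by hand, they can be derived from the sign of each incremental change. The sketch below uses a made-up data frame, *sales_demo*, with the same three columns as the recipe's data (*labels*, *value*, and *logic*); the figures are hypothetical:

```r
# Made-up figures: opening total, three monthly changes, closing total
sales_demo <- data.frame(
  labels = c("Last year", "Jan", "Feb", "Mar", "This year"),
  value  = c(1000, 120, -50, 80, NA),
  logic  = c(TRUE, FALSE, FALSE, FALSE, TRUE))

# Closing total = opening total plus all incremental changes
sales_demo$value[5] <- sum(sales_demo$value[1:4])

# Blue for gains and red for losses, one color per incremental row
changes <- sales_demo$value[!sales_demo$logic]
inc_colors <- ifelse(changes < 0, "red", "blue")
inc_colors
```

A vector built this way could be supplied as the *inc.col* argument, assuming, as in the recipe, that one color is given per incremental column.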

**Summary**

To learn more about R Data Visualization Cookbook, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:

- Instant R Starter (https://www.packtpub.com/big-data-and-business-intelligence/instant-r-starter-instant)
- R for Data Science (https://www.packtpub.com/big-data-and-business-intelligence/r-data-science)
