Customizing Graphics and Creating a Bar Chart and Scatterplot in R

by John M. Quick | October 2010 | Open Source

The R Project for Statistical Computing (or just R for short) is a powerful data analysis tool. It is both a programming language and a computational and graphical environment.

R is free, open source software made available under the GNU General Public License. It runs on Mac, Windows, and Unix operating systems.

The official R website is available at the following site:

http://www.r-project.org

In this article by John M. Quick, author of the book Statistical Analysis with R, you will learn how to:

  • Create different charts, graphs, and plots in R
  • Customize your R visuals using text, colors, axes, and legends


Charts, graphs, and plots in R

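R features several options for creating charts, graphs, and plots. Across this article and the next, we will explore the generation and customization of these visuals, as well as methods for saving and exporting them for use outside of R. This article covers bar graphs and scatterplots; the remaining visuals, along with exporting, are covered in the next article. The following visuals will be covered across the two articles: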

  • Bar graphs
  • Scatterplots
  • Line charts
  • Box plots
  • Histograms
  • Pie charts

Time for action — creating a bar chart

A bar chart or bar graph is a common visual that uses rectangles to depict the values of different items. Bar graphs are especially useful when comparing data over time or between diverse groups. Let us create a bar chart in R:

  1. Open R and set your working directory:

    > #set the R working directory
    > #replace the sample location with one that is relevant to you
    > setwd("/Users/johnmquick/rBeginnersGuide/")

  2. Use the barplot(...) function to create a bar chart:

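    > #create a bar chart that compares the mean durations of the battle methods
    > #assumes the subsetFire, subsetAmbush, subsetHeadToHead, and subsetSurround data already exist in your workspace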
    > #calculate the mean duration of each battle method
    > meanDurationFire <- mean(subsetFire$DurationInDays)
    > meanDurationAmbush <- mean(subsetAmbush$DurationInDays)
    > meanDurationHeadToHead <-
    mean(subsetHeadToHead$DurationInDays)
    > meanDurationSurround <- mean(subsetSurround$DurationInDays)
    > #use a vector to define the chart's bar values
    > barAllMethodsDurationBars <- c(meanDurationFire,
    meanDurationAmbush, meanDurationHeadToHead,
    meanDurationSurround)
    > #use barplot(...) to create and display the bar chart
    > barplot(height = barAllMethodsDurationBars)

  3. Your chart will be displayed in the graphic window, similar to the following:

What just happened?

You created your first graphic in R. Let us examine the barplot(...) function that we used to generate our bar chart, along with the new R components that we encountered.

barplot(...)

We created a bar chart that compared the mean durations of battles between the different combat methods. As it turns out, the barplot(...) function has only one required argument, height, which receives a series of values that specify the height of each bar. Therefore, the barplot(...) function, at its simplest, takes on the following form:

barplot(height = heightValues)

Accordingly, our bar chart function reflected this same format:

> barplot(height = barAllMethodsDurationBars)
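The code in step 2 depends on the subsetFire, subsetAmbush, subsetHeadToHead, and subsetSurround data prepared earlier in the book, which are not reproduced in this article. The following is a minimal, self-contained sketch of the same steps, assuming hypothetical data frames that contain only a DurationInDays column. The duration values are made up for illustration and are not the book's data.

```r
# Hypothetical stand-ins for the book's battle subsets; values are made up
subsetFire <- data.frame(DurationInDays = c(40, 55, 61))
subsetAmbush <- data.frame(DurationInDays = c(12, 20, 16))
subsetHeadToHead <- data.frame(DurationInDays = c(90, 75, 105))
subsetSurround <- data.frame(DurationInDays = c(30, 45, 60))

# Calculate the mean duration of each battle method
meanDurationFire <- mean(subsetFire$DurationInDays)
meanDurationAmbush <- mean(subsetAmbush$DurationInDays)
meanDurationHeadToHead <- mean(subsetHeadToHead$DurationInDays)
meanDurationSurround <- mean(subsetSurround$DurationInDays)

# Combine the means into a vector that defines the bar heights
barAllMethodsDurationBars <- c(meanDurationFire, meanDurationAmbush,
                               meanDurationHeadToHead, meanDurationSurround)

# height is the only required argument; barplot(...) invisibly returns the bar midpoints
barMidpoints <- barplot(height = barAllMethodsDurationBars)
```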

Vectors

We stored the heights of our chart's bars in a vector variable. In R, a vector is an ordered series of values of a single type, such as numbers or text. R's c(...) function can be used to create a vector from one or more data points. For example, the numbers 1, 2, 3, 4, and 5 can be arranged into a vector like so:

> #arrange the numbers 1, 2, 3, 4, and 5 into a vector
> numberVector <- c(1, 2, 3, 4, 5)

Text data can also be placed into vector form, so long as each value is enclosed in quotation marks:

> #arrange the letters a, b, c, d, and e into a vector
> textVector <- c("a", "b", "c", "d", "e")

Our vector defined the values for our bars:

> #use a vector to define the chart's bar values
> barAllMethodsDurationBars <- c(meanDurationFire, meanDurationAmbush, meanDurationHeadToHead, meanDurationSurround)

Many function arguments in R require vector input. Hence, it is very common to use and encounter the c(...) function when working in R.
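A short sketch of the vector behavior described above, using only base R: c(...) builds numeric and text vectors, joins existing vectors end to end, and converts everything to text when numbers and text are mixed. The length(...) function, which we will meet again when generating colors, reports how many elements a vector holds. The variable names are illustrative.

```r
# Numeric and character vectors built with c(...)
numberVector <- c(1, 2, 3, 4, 5)
textVector <- c("a", "b", "c", "d", "e")

# class(...) confirms each vector's type
numberType <- class(numberVector)  # "numeric"
textType <- class(textVector)      # "character"

# c(...) also joins existing vectors and single values end to end
combinedVector <- c(numberVector, 6, 7)

# Mixing numbers and text converts every value to text
mixedVector <- c(1, "a")           # "1", "a"

# length(...) counts the elements in a vector
numberCount <- length(numberVector)      # 5
combinedCount <- length(combinedVector)  # 7
```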

Graphic window

When you executed your barplot(...) function in the R console, the graphic window opened to display it. The graphic window will have different names across different operating systems, but its purpose and function remain the same. For example, in Mac OS X, the graphic window is named Quartz.

For the remainder of this article, all R graphics will be displayed without the graphics window frame, which will allow us to focus on the visuals themselves.

Pop quiz

  1. When entering text into a vector using the c(...) function, what characters must surround each text value?
    1. Quotation marks
    2. Parentheses
    3. Asterisks
    4. Percent signs
  2. What is the purpose of the R graphic window?
    1. To debug graphics functions
    2. To execute graphics functions
    3. To edit graphics
    4. To display graphics

Time for action — customizing graphics

Although the barplot(...) function only requires the height of each bar to be specified, creating a chart in this manner leaves us with a bland visual that is difficult to decipher. In most cases, you will want to customize your R graphics by incorporating additional arguments into your functions. Let us explore how to use graphic customization arguments by expanding our bar chart:

  1. Expand your bar chart using graphic customization arguments:

    > #use additional arguments to customize a graphic
    > #define a title for the bar chart
    > barAllMethodsDurationLabelMain <- "Average Duration by Battle Method"
    > #define x and y axis labels for the bar chart
    > barAllMethodsDurationLabelX <- "Battle Method"
    > barAllMethodsDurationLabelY <- "Duration in Days"
    > #set the x and y axis scales
    > barAllMethodsDurationLimX <- c(0, 5)
    > barAllMethodsDurationLimY <- c(0, 120)
    > #define rainbow colors for the bars
    > barAllMethodsDurationRainbowColors <- rainbow(length(barAllMethodsDurationBars))
    > #incorporate customizations into the graphic function using the main, xlab, ylab, xlim, ylim, and col arguments
    > #use barplot(...) to create and display the bar chart
    > barplot(height = barAllMethodsDurationBars,
    main = barAllMethodsDurationLabelMain,
    xlab = barAllMethodsDurationLabelX,
    ylab = barAllMethodsDurationLabelY,
    xlim = barAllMethodsDurationLimX,
    ylim = barAllMethodsDurationLimY,
    col = barAllMethodsDurationRainbowColors)

  2. Your chart will be displayed in the graphic window, as shown in the following screenshot:

  3. Add a legend to the chart, using the following snippet:

    > #add a legend to the bar chart
    > #the x and y arguments position the legend
    > #x and y can be defined using words or numerical coordinates
    > #the legend argument receives a vector containing the labels for the legend
    > barAllMethodsDurationLegendLabels <- c("Fire", "Ambush", "Head to Head", "Surround")
    > #the fill argument contains the colors for the legend
    > legend(x = 0, y = 120, legend = barAllMethodsDurationLegendLabels,
    fill = barAllMethodsDurationRainbowColors)

  4. Your legend will be added to the existing chart.
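For readers working without the book's dataset, the following sketch reproduces steps 1 and 3 with hypothetical bar heights (made-up values, not results from the battle data). It applies the same main, xlab, ylab, xlim, ylim, and col arguments, and it adds the legend only after the chart has been drawn.

```r
# Hypothetical mean durations standing in for barAllMethodsDurationBars
barAllMethodsDurationBars <- c(52, 16, 90, 45)

# Text, axis limits, and colors stored in variables for reuse
barAllMethodsDurationLabelMain <- "Average Duration by Battle Method"
barAllMethodsDurationLabelX <- "Battle Method"
barAllMethodsDurationLabelY <- "Duration in Days"
barAllMethodsDurationLimX <- c(0, 5)
barAllMethodsDurationLimY <- c(0, 120)
barAllMethodsDurationRainbowColors <- rainbow(length(barAllMethodsDurationBars))

# Draw the customized bar chart first
barplot(height = barAllMethodsDurationBars,
        main = barAllMethodsDurationLabelMain,
        xlab = barAllMethodsDurationLabelX,
        ylab = barAllMethodsDurationLabelY,
        xlim = barAllMethodsDurationLimX,
        ylim = barAllMethodsDurationLimY,
        col = barAllMethodsDurationRainbowColors)

# Then add a legend whose labels and colors match the bars
barAllMethodsDurationLegendLabels <- c("Fire", "Ambush", "Head to Head", "Surround")
legend(x = 0, y = 120,
       legend = barAllMethodsDurationLegendLabels,
       fill = barAllMethodsDurationRainbowColors)
```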

What just happened?

The barplot(...) function, as well as the other graphic functions that we will use in this article, accepts a variable number of arguments. In fact, R graphics functions have many customizable options and therefore tend to accept several arguments. We expanded our bar chart using a collection of the most common customization arguments, which apply to nearly all R graphics functions.

Graphic customization arguments

We used six arguments to customize our bar chart:

  • main: a text title for the graphic
  • xlab: a text label for the x axis
  • ylab: a text label for the y axis
  • xlim: a vector containing the lower and upper limits for the x axis
  • ylim: a vector containing the lower and upper limits for the y axis
  • col: a vector containing the colors to be used in the graphic

The general format for these arguments is as follows:

argument = value

When incorporated into a graphics function, these arguments take on the following form:

graphicsFunction(..., argument = value)

Recognize that these six arguments can be applied to nearly every R graphics function, either individually or in combination. We will use these arguments throughout the article to refine and improve our visuals.

main, xlab, and ylab

The main, xlab, and ylab arguments are all used to add clarifying text to graphics. A primary title for a graphic is defined by main, while labels for the x and y axes are specified using xlab and ylab, respectively.

Our barplot(...) function made use of the main, xlab, and ylab arguments. We saved our argument values into variables prior to incorporating them into the barplot(...) function. First, we defined our text values as variables.

> #define a title for the bar chart
> barAllMethodsDurationLabelMain <- "Average Duration by Battle Method"
> #define x and y axis labels for the bar chart
> barAllMethodsDurationLabelX <- "Battle Method"
> barAllMethodsDurationLabelY <- "Duration in Days"

Then, we used our variables in the final barplot(...) function:

> barplot(height = barAllMethodsDurationBars,
main = barAllMethodsDurationLabelMain,
xlab = barAllMethodsDurationLabelX,
ylab = barAllMethodsDurationLabelY,
xlim = barAllMethodsDurationLimX,
ylim = barAllMethodsDurationLimY,
col = barAllMethodsDurationRainbowColors)

This variable technique has the advantages of rendering our code more decipherable and making it easier for us to return to and reuse our data in future graphics.
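As a small illustration of that reuse, the sketch below defines the customization values once and passes them, unchanged, to both barplot(...) and plot(...). The data values and variable names are hypothetical and chosen only for the demonstration.

```r
# Hypothetical values used only to demonstrate reusing argument variables
values <- c(3, 7, 5, 9)

# Customization values defined once, in variables
labelMain <- "Same Arguments, Different Graphics"
labelX <- "Item"
labelY <- "Value"
limX <- c(0, 5)
limY <- c(0, 10)
itemColors <- c("red", "green", "blue", "yellow")

# Reused in a bar chart...
barplot(height = values, main = labelMain, xlab = labelX, ylab = labelY,
        xlim = limX, ylim = limY, col = itemColors)

# ...and, unchanged, in a scatterplot of each value against its position
plot(x = seq_along(values), y = values, main = labelMain,
     xlab = labelX, ylab = labelY, xlim = limX, ylim = limY, col = itemColors)
```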

xlim and ylim

The xlim and ylim arguments receive a vector containing the minimum and maximum values for the x and y axes respectively. Thus, in:

xlim = c(50, 250)

A graphic's x axis is set to present the data that fall between 50 and 250. The ylim argument operates identically to xlim, except that it acts upon the y axis. These arguments are useful for rescaling a graphic's axes to improve its visual presentation. They can also emphasize or deemphasize certain data ranges.

In our chart, we used xlim to set a minimum of 0 and a maximum of 5 for the x axis. This evenly and comfortably spaced our bars within the graphic window. We used ylim to set a minimum of 0 and maximum of 120 for the y axis. This ensured that all of our data were represented and that our bars were displayed at a reasonable height.

> barplot(height = barAllMethodsDurationBars,
main = barAllMethodsDurationLabelMain,
xlab = barAllMethodsDurationLabelX,
ylab = barAllMethodsDurationLabelY,
xlim = barAllMethodsDurationLimX,
ylim = barAllMethodsDurationLimY,
col = barAllMethodsDurationRainbowColors)
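To see why an x axis running from 0 to 5 suits four bars, it helps to inspect the value that barplot(...) returns invisibly: the x coordinate of each bar's midpoint. The sketch below relies on barplot's default bar width of 1 and spacing of 0.2, under which four bars occupy roughly 0.2 to 4.8 on the x axis. The bar heights are hypothetical.

```r
# Hypothetical bar heights (four bars, as in the battle-method chart)
heights <- c(52, 16, 90, 45)

# barplot(...) invisibly returns the x coordinate of each bar's midpoint
midpoints <- barplot(height = heights, xlim = c(0, 5), ylim = c(0, 120))

# With default width = 1 and space = 0.2, midpoints fall at 0.7, 1.9, 3.1, 4.3
print(as.vector(midpoints))

# Each bar extends half its width (0.5) on either side of its midpoint
barEdges <- range(midpoints) + c(-0.5, 0.5)  # about 0.2 to 4.8
```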

col

The col argument accepts colors in two main forms: an automatically generated sequence, such as rainbow colors, or specific colors of your choice.

Rainbow colors

R can generate an automatic sequence of colors for a chart with the rainbow(...) function. For our purposes, we simply identified the number of colors that we wished to generate for our chart. To obtain the appropriate number of colors, we used the length(object) command. This function tells us the number of items contained in a given object. In our case, using length(object) on the barAllMethodsDurationBars yielded a result of 4, which represents each of our chart's bars:

> barAllMethodsDurationRainbowColors <- rainbow(length(barAllMethodsDurationBars))

Consequently, the rainbow(...) function generated four colors. These colors were applied to the chart's bars when we included the barAllMethodsDurationRainbowColors variable in the col argument of our barplot(...) function.

> barplot(height = barAllMethodsDurationBars,
main = barAllMethodsDurationLabelMain,
xlab = barAllMethodsDurationLabelX,
ylab = barAllMethodsDurationLabelY,
xlim = barAllMethodsDurationLimX,
ylim = barAllMethodsDurationLimY,
col = barAllMethodsDurationRainbowColors)
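The following sketch confirms what rainbow(...) hands to col: a character vector with one hexadecimal color code per requested color. The bar heights are hypothetical placeholders for barAllMethodsDurationBars, and the exact codes printed may differ slightly between R versions.

```r
# Hypothetical bar heights
barAllMethodsDurationBars <- c(52, 16, 90, 45)

# length(...) supplies the number of colors; rainbow(...) returns that many codes
barAllMethodsDurationRainbowColors <- rainbow(length(barAllMethodsDurationBars))
print(barAllMethodsDurationRainbowColors)

# The codes are passed to col exactly like color names would be
barplot(height = barAllMethodsDurationBars, col = barAllMethodsDurationRainbowColors)
```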

Specific colors

Alternatively, specific colors can be defined by passing the col argument a vector of color names. Common color names such as red, green, blue, and yellow are valid inputs. In this situation, the col argument takes on the following form:

col = colorVector

Where colorVector is a variable storing a vector of color values like the following:

c("red", "green", "blue", "yellow")

You can see a complete list of the colors available in R by executing the colors() function.

Had we wanted to use specific colors in our bar chart, we could have employed the following code:

> #define specific colors for the bars
> barAllMethodsDurationSpecificColors <- c("red", "green", "blue", "yellow")
> #use barplot(...) to create and display the bar chart
> barplot(height = barAllMethodsDurationBars,
main = barAllMethodsDurationLabelMain,
xlab = barAllMethodsDurationLabelX,
ylab = barAllMethodsDurationLabelY,
xlim = barAllMethodsDurationLimX,
ylim = barAllMethodsDurationLimY,
col = barAllMethodsDurationSpecificColors)
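Because colors() returns the recognized color names as a character vector, it can also serve as a quick validity check before a vector of names is passed to col. The sketch below does that, and uses col2rgb(...), a base R function from the grDevices package, to show the red, green, and blue components behind each name.

```r
# Specific colors chosen for the four bars
barAllMethodsDurationSpecificColors <- c("red", "green", "blue", "yellow")

# colors() lists every named color R recognizes
allColorNames <- colors()
namesAreValid <- all(barAllMethodsDurationSpecificColors %in% allColorNames)

# col2rgb(...) returns a 3-row matrix of red, green, and blue values (0 to 255)
rgbValues <- col2rgb(barAllMethodsDurationSpecificColors)
print(rgbValues)
```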

legend(...)

The finishing touch to our bar chart was a legend, or key, that indicated what our bars represented. In R, the legend(...) function accepts many arguments, including the following:

  • x: the x position of the legend, in the chart's numeric coordinates (by default, the point given by x and y marks the legend's top-left corner); alternatively, you can set the overall position of the legend using one of the text values topleft, top, topright, left, center, right, bottomleft, bottom, or bottomright
  • y: the y position of the legend, in the chart's numeric coordinates; if a text value is used for x, omit this argument
  • legend: a vector containing the labels to be used in the legend
  • fill: a vector containing the colors to be used in the legend

The basic format for the legend function is as follows:

legend(x = xPosition, y = yPosition, legend = labelVector, fill = colorVector)

For instance, consider the following code:

> legend(x = "topleft", legend = c("a", "b"), fill = rainbow(2))

This code yields a legend placed in the top-left corner, with labels for a and b whose colors are generated by the rainbow(...) function. Note that the x argument uses a text value and y is omitted, as an alternative to defining the exact numerical position of the legend.

Our function used the x and y coordinates from our chart to position the legend in the upper left-hand corner. When using numbers to define the x and y arguments, the values will always depend on the limits of the x and y axes. For instance, a position of (0, 120) specified the upper left-hand corner in our chart, but a graphic with a maximum y value of 50 would have an upper left-hand corner position of (0, 50). Our legend and fill arguments incorporated the same labels and colors that were used to generate our bar chart. Thus, our legend was matched to the information depicted in our chart:

> legend(x = 0, y = 120,
legend = barAllMethodsDurationLegendLabels,
fill = barAllMethodsDurationRainbowColors)
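The sketch below contrasts the two positioning styles on a hypothetical chart whose y axis runs from 0 to 50: a keyword such as topright, which needs no y value, and numeric coordinates, which must lie within the chart's axis limits. It also queries par("usr"), a base R call that returns the current plotting region, to confirm those limits; note that R pads each axis range slightly by default.

```r
# Hypothetical chart whose y axis runs from 0 to 50
legendHeights <- c(20, 35, 45)
legendColors <- rainbow(length(legendHeights))
legendLabels <- c("a", "b", "c")
barplot(height = legendHeights, xlim = c(0, 4), ylim = c(0, 50), col = legendColors)

# Keyword positioning: x holds the position name and y is omitted
legend(x = "topright", legend = legendLabels, fill = legendColors)

# Numeric positioning: (0, 50) is near the upper left-hand corner of this chart;
# by default, the point marks the legend's top-left corner
legend(x = 0, y = 50, legend = legendLabels, fill = legendColors)

# par("usr") reports the plotting region as c(xmin, xmax, ymin, ymax)
plotLimits <- par("usr")
print(plotLimits)
```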

Notice that the legend(...) function behaves differently from the functions we have used so far. As we will see with other graphics functions, legend(...) does not stand alone. To be properly employed, a compatible graphic must already exist for legend(...) to act upon. In this situation, legend(...) adds a new legend on top of the visual that is displayed in the graphic window. However, if no graphic is currently displayed when the legend(...) function is executed, an error message is returned. This is demonstrated in the following code:

> #using the legend(...) function when no graphic already exists results in the following error
> legend(x = "topleft", legend = c("a", "b"), fill = rainbow(2))
Error in strwidth(legend, units = "user", cex = cex) :
plot.new has not been called yet

Therefore, to add a legend to your graphics in R, be sure to always create the graphic first, then apply the legend(...) function.
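The ordering rule can be checked in a script. The sketch below opens a fresh off-screen graphics device with pdf(NULL), a base R call that writes nothing to disk, so that no plot exists yet. It then captures the error from calling legend(...) too early with tryCatch(...), draws a chart, and calls legend(...) again successfully. The exact wording of the error may vary between R versions.

```r
# Open a new, empty off-screen device so that no plot exists yet
pdf(NULL)

# Calling legend(...) before any graphic: capture the error message instead of stopping
earlyMessage <- tryCatch(
  {
    legend(x = "topleft", legend = c("a", "b"), fill = rainbow(2))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(earlyMessage)  # expected to mention that plot.new has not been called yet

# Correct order: create the graphic first, then add the legend
barplot(height = c(3, 5), col = rainbow(2))
legend(x = "topleft", legend = c("a", "b"), fill = rainbow(2))

# Close the off-screen device
invisible(dev.off())
```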

Pop quiz

  1. An xlim value of c(100, 300) means which of the following?
    1. Present the data that are not equal to 100 or 300 on the x axis.
    2. Present the data that are equal to 100 or 300 on the x axis.
    3. Present the data that are less than 100 or greater than 300 on the x axis.
    4. Present the data that are between 100 and 300 on the x axis.
  2. When should the legend(...) function be called?
    1. Before a graphic function is called.
    2. During a graphic function, included as an argument.
    3. After a graphic function.
    4. When a compatible graphic is displayed in the graphic window.

Time for action — creating a scatterplot

A scatterplot is a fundamental statistics graphic that can be used to better understand the relationships underlying a dataset. Like descriptive statistics and correlations, scatterplots are especially useful as a precursor to more extensive data analyses, such as linear regression modeling. We can use R to generate scatterplots that depict a single relationship between two variables or the relationships between all of the variables in a dataset. We will practice both of these methods:

  1. Use the plot(...) function to create a scatterplot depicting a single relationship between two variables:

    > #create a scatterplot that depicts the relationship between the number of Shu and Wei soldiers engaged in past fire attacks
    > #get the data to be used in the plot
    > scatterplotFireWeiSoldiersData <- subsetFire$WeiSoldiers
    > scatterplotFireShuSoldiersData <- subsetFire$ShuSoldiers
    > #customize the plot
    > scatterplotFireSoldiersLabelMain <- "Soldiers Engaged in Past Fire Attacks"
    > scatterplotFireSoldiersLabelX <- "Wei"
    > scatterplotFireSoldiersLabelY <- "Shu"
    > #use plot(...) to create and display the scatterplot
    > plot(x = scatterplotFireWeiSoldiersData,
    y = scatterplotFireShuSoldiersData,
    main = scatterplotFireSoldiersLabelMain,
    xlab = scatterplotFireSoldiersLabelX,
    ylab = scatterplotFireSoldiersLabelY)

  2. Your plot will be displayed in the graphic window, as shown in the following:

  3. Use the plot(...) function to simultaneously depict the relationships between all of the variables in the dataset:

    > #create a scatterplot that depicts the relationships between all of the variables in our fire attack dataset
    > plot(x = subsetFire)

  4. A grouping of several plots will be displayed in the graphic window:

What just happened?

We created two scatterplots using R's plot(...) function, one portraying a single relationship and one displaying all of the relationships in our dataset.

Single scatterplot

To plot a single relationship between two variables, use R's plot(...) function. The primary arguments for plot(...) are:

  • x: the variable to be plotted on the x axis
  • y: the variable to be plotted on the y axis

Thus, the simplest form of plot(...) contains arguments only for the x and y variables:

plot(x = xVariable, y = yVariable)

We used the plot(...) function to visualize the relationship between the number of Shu and Wei soldiers involved in past fire attacks. To add relevant text to our graphic, we included the main, xlab, and ylab arguments:

> plot(scatterplotFireWeiSoldiersData,
scatterplotFireShuSoldiersData,
main = scatterplotFireSoldiersLabelMain,
xlab = scatterplotFireSoldiersLabelX,
ylab = scatterplotFireSoldiersLabelY)
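Because subsetFire comes from the book's battle data, the sketch below recreates a small hypothetical version so that the single scatterplot can be drawn in any R session. The soldier counts are invented for illustration.

```r
# Hypothetical stand-in for subsetFire; the soldier counts are made up
subsetFire <- data.frame(
  WeiSoldiers = c(1200, 2500, 1800, 3000, 2200),
  ShuSoldiers = c(900, 2100, 1500, 2600, 1700)
)

# Data and labels for the plot
scatterplotFireWeiSoldiersData <- subsetFire$WeiSoldiers
scatterplotFireShuSoldiersData <- subsetFire$ShuSoldiers
scatterplotFireSoldiersLabelMain <- "Soldiers Engaged in Past Fire Attacks"
scatterplotFireSoldiersLabelX <- "Wei"
scatterplotFireSoldiersLabelY <- "Shu"

# One point per fire attack: Wei soldiers on the x axis, Shu soldiers on the y axis
plot(x = scatterplotFireWeiSoldiersData,
     y = scatterplotFireShuSoldiersData,
     main = scatterplotFireSoldiersLabelMain,
     xlab = scatterplotFireSoldiersLabelX,
     ylab = scatterplotFireSoldiersLabelY)
```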

Multiple scatterplots

We also used the plot(...) function to simultaneously explore all of the relationships within our dataset. This yielded a grid of scatterplots covering every pair of variables, with each pair appearing twice (once on each side of the diagonal, with the axes swapped). The format for creating this type of scatterplot is:

plot(x = dataset)

Where dataset is a set of data containing multiple variables. For us, the dataset argument contained our fire attack data.

> plot(x = subsetFire)

The resulting plot allowed us to visualize all of the relationships between our variables in a single graphic.
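For a data frame with more than two columns, plot(x = dataset) is handled by R's data frame method, which draws a scatterplot matrix through the base pairs(...) function, so the two calls below produce essentially the same grid. The three-column data frame is hypothetical. With k variables, the grid is k by k, with variable names on the diagonal and k(k - 1) scatterplots in the remaining cells.

```r
# Hypothetical fire attack data with three numeric variables
subsetFire <- data.frame(
  WeiSoldiers = c(1200, 2500, 1800, 3000, 2200),
  ShuSoldiers = c(900, 2100, 1500, 2600, 1700),
  DurationInDays = c(40, 55, 61, 48, 70)
)

# plot(...) on a data frame with three or more columns draws a scatterplot matrix
plot(x = subsetFire)

# The same kind of grid drawn directly with pairs(...)
pairs(subsetFire)

# Off-diagonal panels: each pair of variables appears twice, with axes swapped
k <- ncol(subsetFire)
panelCount <- k * (k - 1)  # 6 for three variables
```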

Pop quiz

  1. Assume that a and b are data variables. Which of the following best describes the graphic that would result from the following line of code?

    > plot(x = a, y = b)

    1. A scatterplot with a on the x axis and b on the y axis.
    2. A scatterplot with b on the x axis and a on the y axis.
    3. A scatterplot containing all of the relationships in the dataset.
    4. A scatterplot containing none of the relationships in the dataset.
  2. Assume that a is a dataset. Which of the following best describes the graphic that would result from the following line of code?

    > plot(x = a)

    1. A scatterplot with a on the x axis.
    2. A scatterplot with a on the y axis.
    3. A scatterplot containing all of the relationships in the dataset.
    4. A scatterplot containing none of the relationships in the dataset.

Summary

In this article, you created a bar chart and scatterplots. This process entailed using R's graphical capabilities to generate and customize visual representations of your data. At this point, you should be able to:

  • Use R to create various charts, graphs, and plots
  • Customize your R visuals using text, colors, axes, and legends

In the next article we will take a look at some more charts, graphs, and plots in R. We will also take a look at exporting graphics for use outside of R.


About the Author:


John M. Quick

He is an Educational Technology doctoral student at Arizona State University who is interested in the design, research, and use of educational innovations. Currently, his work focuses on mobile, game-based, and global learning, interactive mixed-reality systems, and innovation adoption. John's blog, which provides articles, tutorials, reviews, perspectives, and news relevant to technology and education, is available from http://www.johnmquick.com. In his spare time, John enjoys photography, nature, and travel.
