About this book

Data is arriving faster, messier, and in ever greater volume. Statisticians and analysts across the globe increasingly have to handle many complex statistical analysis projects at once. This book shows you how to deal with that, giving you an edge and improving your productivity.

"Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio.

This book will help you to learn and understand RStudio features to effectively perform statistical analysis and reporting, code editing, and R development.

The book starts with a quick introduction where you will learn to load data, perform simple analysis, plot a graph, and generate automatic reports. You will then be able to explore the available features for effective coding, graphical analysis, report generation, and project management.

"Learning RStudio for R Statistical Computing" is packed with feature-rich, easy-to-understand examples and step-by-step instructions that help you quickly master the most popular IDE for R development.

Publication date: December 2012
Publisher: Packt
Pages: 126
ISBN: 9781782160601

 

Chapter 1. Getting Started

This chapter shows how to obtain R and RStudio and introduces the concepts of reproducible research. We will first show a simple RStudio session that already results in a simple, fully reproducible report.

If you have ever had to analyze data for work, study, or a research project, you have probably run into a situation where you ended up with a messy kludge of temporary files, scripts, and intermediate results that are almost impossible to untangle. If this sounds familiar, you probably also had to rewrite pieces of your report while debugging your analyses, or when receiving updates of your data sets. Re-running calculations and re-inserting figures, tables, and results can take a lot of time. Moreover, as a project turns more and more into a spaghetti of files and folders, reproducing exactly what you did becomes harder and harder. Needless to say, things can become even more difficult when collaborating with a number of people on such projects.

RStudio™ is a free and open source tool that makes it easier for you to do the following:

  • Work with R and R's graphics interactively

  • Organize your code and maintain multiple projects

  • Make your research reproducible

  • Maintain the packages in your R installation

  • Create and share your reports

  • Share your code and collaborate with other users

RStudio runs on all the major operating systems, including Windows, Linux, and Mac OS X. Additionally, it can be used to run R on a remote web server. In that case, RStudio's interface will run in your browser.

This book is aimed at beginning and intermediate R users who want to get the most out of R and RStudio. In the coming chapters we will cover most of RStudio's features, and emphasize some best practices in statistical data analyses.

A few words about R: R is a free software tool for statistical analyses, consisting of the R programming language and the R environment. Here, free means not only free of charge (as in free beer) but also free as in freedom. That is, you are allowed to download and use R, inspect or alter its source code, and redistribute it as you like. Note that this freedom is in fact a requirement to perform truly reproducible research, as it allows one, in principle, to check exactly how data is processed in a certain project, down to R's source code itself.

R is distributed via the Comprehensive R Archive Network (CRAN), a network of servers around the world from where you can download R and its extension packages. You can access it via www.r-project.org. There are a few other sites offering extension package repositories; the most noteworthy are Bioconductor (www.bioconductor.org) and the Omega project for statistical computing (www.omegahat.org).

The R environment is a so-called REPL, which stands for read-evaluate-print loop. That is, it offers a text-based interface where you can enter R commands. After a command is entered, the R engine processes it (evaluation) and possibly prints a result to the screen. Alternatively (and more commonly), the commands can be stored in a text file to be run by R.
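To make the read-evaluate-print cycle concrete, the following minimal sketch imitates one pass of it by hand with base R's parse(), eval(), and print(). The variable names are illustrative only; the console performs these steps for you every time you press Enter.

```r
# One hand-made pass through the read-evaluate-print cycle.
command <- "1 + 2"                 # read: the text a user would type
expr    <- parse(text = command)   # turn the text into an R expression
result  <- eval(expr)              # evaluate the expression
print(result)                      # print the value, as the console would
```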

Users who are accustomed to point-and-click interfaces for using statistical functionality may find the first encounter with such an interface daunting, and to be honest, the learning curve for R can be steep at times. However, in order to make work reproducible, it is unavoidable to store the steps of your analyses as source code. Moreover, being a true programming language makes R a much more versatile and powerful tool than any point-and-click software that only offers a predefined functionality.

Fortunately for us, writing code is nothing new, and over the past decades many good ideas have been developed in the software industry to make coding and code management a lot easier. RStudio implements many of those ideas for R users.

Some important tips for maintaining your R installation are as follows:

  • Always use the latest stable version. This is the version likely to have the fewest bugs in the older functionality. You can read about the latest features by reading the news file, for example by running View(news()) from the R command line. See the Installing R section for an easier way to install R.

  • Frequently update your installed packages. This is simply done by running the update.packages() command from your R console.
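The two tips above can also be checked from the console. The following is a minimal sketch using base R functions only. The helper name update_all and the mirror URL are assumptions chosen for illustration, and the update itself is wrapped in a function so that nothing is downloaded until you call it.

```r
# Which R version is this session running?
r_version <- R.version.string
print(r_version)

# Installed packages and their versions (first few rows).
installed <- installed.packages()[, c("Package", "Version"), drop = FALSE]
print(head(installed))

# Hypothetical helper: update all installed packages without prompting.
# Needs Internet access; the mirror URL is an assumption.
update_all <- function(repos = "https://cloud.r-project.org") {
  update.packages(ask = FALSE, repos = repos)
}
```

Calling update_all() from the console would then perform the actual update.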

 

RStudio at a glance


Like R, RStudio is a free and open source project. Founded by JJ Allaire, RStudio is also a company that sells services related to their open source product, such as consulting and training.

RStudio is an Integrated Development Environment (IDE) for R. The term IDE comes from the software industry and refers to a tool that makes it easy to develop applications in one or more programming languages. Typical IDEs offer tools to easily write and document code, compile and perform tests, and offer integration with a version control tool.

RStudio integrates the R environment, a highly advanced text editor, R's help system, version control, and much more into a single application. RStudio does not perform any statistical operations; it only makes it easier for you to perform such operations with R. Most importantly, RStudio offers many facilities that make working reproducibly a lot easier.

The following table gives an overview of some of the most important features of RStudio that you will learn to use with this book:

Feature | Description
--- | ---
Integration of the R console | Type commands directly in the R console within RStudio.
Code execution | Directly execute code from your script file.
Syntax highlighting | Color (possibly self-defined) keywords and functions for easy reading.
Bracket support | Matching brackets are highlighted upon selection. When you type a square bracket "[", parenthesis "(", curly brace "{", or a single or double quote, RStudio autocompletes it for you.
Command completion | Press Tab halfway while typing a command and RStudio shows a menu of matching R functions. When a function is chosen, its arguments and "help" can be shown as well.
Keyboard shortcuts | Common tasks can be accessed quickly by pressing a key or key combination.
Help integration | RStudio allows for browsing and searching R's native help files, and offers context-related help as well.
Object browser | You can inspect every object defined in the running R session.
History browser | RStudio makes it easy to see what commands you used and re-execute them.
Code navigation | Jump from the use of a function to its definition. Jump from code in a report to the code in the source.
Data viewer | A spreadsheet-like view of tables (data.frames).
Data import menus | For some of the most common data types RStudio has a menu that generates the R read command for you.
Graphics integration | Zoom, manipulate, and export graphics interactively.
Project management | Easily switch between several projects.
Version control | RStudio integrates the popular version control systems Git and Subversion (svn).
Document generation | Generate PDF, HTML, or other report formats using RMarkdown, Sweave, or knitr with the push of a button.
Publishing | Publish your reports and scripts online at RPubs.com so that others may learn from your examples.

Readers with some programming experience might wonder why a feature such as debugging support is not in the list. The answer is that it is just not there yet. RStudio is continuously being improved and updated, and according to the forums at RStudio's web pages, support for debugging is certainly on the to-do list of the makers.

 

Installing RStudio


Before you install RStudio, you need to install R. It is possible to have multiple versions of R installed side by side. RStudio will use the latest version by default, but can be configured to use a different installed version.

Installing R

RStudio needs at least R version 2.11, but we highly recommend that you install the latest version.
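If R is already installed, you can check from the R console whether it meets this requirement. The sketch below uses the base functions getRversion() and R.home(); the variable names are illustrative only.

```r
# Compare the running R version with the minimum mentioned in the text.
minimum_version <- "2.11.0"
current_version <- getRversion()
version_ok <- current_version >= minimum_version
print(current_version)
print(version_ok)

# The directory of the R installation that this session is using.
print(R.home())
```

When several R versions are installed side by side, R.home() is a quick way to see which one a given session is actually running.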

Installing R on Windows and Mac OS X

To download and install R, point your browser to www.r-project.org, click on Download R (in the text underneath the graphics), and choose a server near where you are. From there, follow the instructions in the Download and install R box. Alternatively, use the Download R! button at www.inside-R.org. This website automatically offers you the most recent R version fitting your computer and operating system.

Installing R on Linux

Automatic R installation is supported for several popular Linux flavors, including Debian, openSUSE, and Ubuntu.

For openSUSE, the default installation can be obtained by pointing your web browser to http://software.opensuse.org/search, searching for r-base, and installing from there. At the time of writing, the newest R version is available from there.

The R version offered by the package installer is frozen when the operating system is released. We assume that you are familiar enough with tools such as Synaptic or aptitude in order to install the R version that comes with those operating systems. Here, we provide some details on how to install the latest R version on Ubuntu or Debian.

CRAN hosts repositories for Debian and Ubuntu. To use them, proceed as follows:

  1. Add the repository for Ubuntu 12.04 (Precise Pangolin) by adding (as root) the following line to your /etc/apt/sources.list file:

    deb http://<your_nearest_cran_mirror>/bin/linux/ubuntu precise/
    
  2. Replace <your_nearest_cran_mirror> with a server near where you live. A list of mirrors can be found at http://cran.r-project.org/mirrors.html. Next, register the security key by typing the following:

    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
    
  3. Type the following commands to install R:

    sudo apt-get update
    sudo apt-get install r-base
    

Alternatively, you can now install the latest R via Synaptic. For Debian 6.0.5 (squeeze), the line to add to your /etc/apt/sources.list file is deb http://<your_nearest_cran_mirror>/bin/linux/debian squeeze-cran/.

The security key is installed with the following command:

sudo apt-key adv --keyserver subkeys.pgp.net --recv-keys 381BA480

After this, installation proceeds as in Ubuntu.

 

Building R from source


If you wish, you can download the R source code and compile the executables yourself. This is really only for expert users, so to paraphrase r-project.org: "if you are not sure what compiling means, you most probably do not want to do this".

To make sure that RStudio can talk with the compiled binaries, you need to configure the Makefile using the --enable-R-shlib flag. So after downloading and unpacking the source tarball, change the directory to R2.XX.X, and type the following commands:

./configure --enable-R-shlib
make
make install
 

Building R using Windows


Most Windows users will use the default installer, but if you want to, you can compile R under Windows. You need to download the latest version of Rtools (http://cran.r-project.org/bin/windows/Rtools) and follow the instructions on the Rtools web page.

 

Installing RStudio


The desktop version of RStudio can be downloaded from http://www.rstudio.com/ide for Windows XP and higher, Mac OS X 10.6 or higher, and several Linux flavors. To install it, click on the link for your platform and follow the instructions. We strongly recommend that you check www.rstudio.com once in a while for new updates. Alternatively, you can check for updates from within RStudio by clicking on Help | Check for updates.

Installing RStudio Server

RStudio Server is currently only available for Linux-based systems. Before you install it you need to have R installed, as described in the previous section.

  1. Go to http://www.rstudio.com/ide/download/server and follow the instructions there to download and install RStudio Server. Once RStudio Server is installed, you can run it by typing the following:

    sudo rstudio-server start
    
  2. To log on you need to know the server's URL. If you have installed it locally, you can access it by pointing your browser to the following path:

    http://localhost:8787
    

RStudio allows the users of your Linux system to log on with their standard password and username, so user management can be done as in Linux.

Installing R packages

One of the most attractive features of R is the abundance of freely available extension packages. The installation of R comes bundled with many important packages, but newly developed statistical methods come readily available in packages. These packages are published on the Comprehensive R Archive Network (CRAN) and can be easily installed in RStudio. To get started, we will install the knitr package, which we'll need in our first session.

One of the tabs in the bottom right-hand side of RStudio is a package panel that allows you to browse the currently installed packages. These packages can be updated by clicking on Check for Updates. RStudio will check which packages have newer versions and will give you the option to select which of these packages should be updated. Alternatively, you can use Tools | Check for Package Updates from the main menu.

To install the packages click on the Packages tab in the bottom right-hand side panel. Each tab has its own menu items at the top of the panel. Click on the Install button to start the installation. The pop-up menu that appears allows you to choose either a CRAN server or a local repository. If you have Internet access, choose a mirror somewhere near you. Next, type the first letters of the package you wish to install. Here, we will install the knitr package. When typing, RStudio will show suggestions of packages with similar names. Choose knitr and hit Enter. RStudio generates the command that installs the package, copies it to the console, and executes it.

To load the package, scroll down the list of installed packages and tick the checkbox next to knitr. The package is now loaded.
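The same installation and loading steps can be carried out from the console with install.packages() and library(). The following is a minimal sketch; the helper name ensure_package and the mirror URL are assumptions made for illustration, and the helper only contacts CRAN when the package is actually missing.

```r
# Hypothetical helper: install a package only if it is missing.
# Returns TRUE when the package is available afterwards.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # needs Internet access
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage: ensure_package("knitr"), followed by library(knitr) to load it.
```

Ticking the checkbox in the Packages panel plays the same role as the library() call in the usage comment.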

Tip

Trying to update a package that is currently loaded may fail. The easiest solution is to close and restart RStudio and update again without the package being loaded.

 

Overview: A first R session


Now that we have R and RStudio installed, we can start our first R session from within RStudio. It is good practice to use an RStudio project for all your data analysis with R, for reasons we will encounter later in this book.

We create an R project using the menu Project | New Project. Choose New Directory and name the project file Abalone.

Note

In this session, we download and manipulate the abalone file. This file will be used in examples throughout the book.

Abalones are a very common type of edible sea snail (sometimes called sea ear) occurring in waters around the world. The data in the file used in this book was compiled and published by Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn, and Wes B. Ford in 1994 [Sea fisheries division Technical Report No. 48 (ISSN 1034-3288)]. It was generously donated to the UCI machine learning repository in 1995.

If you are a beginner in R programming, the RStudio menus facilitate many R commands. When you click on a menu item, RStudio generates and executes the corresponding R commands in the console window. It is a good (and a reproducible!) practice to put your R code in script files as much as possible; but for now we will use some menu commands.

Select Workspace | Import DataSet | From Web URL.

RStudio (and R) can import text files from the disk and over the Internet as well, as shown in the following example:

Type (or paste) the following URL: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data.

RStudio downloads the file and shows the Import Dataset dialog:

The top left-hand side shows the name (abalone) of the resulting data.frame. On the bottom left-hand side are the settings for reading the data file that RStudio deduced from the data file. You can alter these; however, in this example they are fine. On the top right-hand side RStudio shows the first 25 lines of the data file. On the bottom right-hand side it shows the first 25 records of the resulting data.frame. Click on the Import button.

RStudio imports the data and creates a data.frame with the name abalone using the R command read.table and the options that you have set in the Import DataSet dialog. Also, it automatically runs View(abalone), which shows the data we just imported. Notice that the Workspace panel on the right-hand side now contains the variable abalone. Also, notice that the column names of the data are missing, so we need to add them.
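In code, the import performed by the dialog amounts to a read.table call. The sketch below is an approximation rather than the exact command RStudio generates. The helper import_abalone is a hypothetical name, and header = FALSE with sep = "," are the settings assumed here for a comma-separated file without a header row.

```r
# Hypothetical helper mirroring the dialog settings assumed for this file:
# no header row, comma-separated fields.
import_abalone <- function(source) {
  read.table(source, header = FALSE, sep = ",")
}

abalone_url <- paste0("http://archive.ics.uci.edu/ml/",
                      "machine-learning-databases/abalone/abalone.data")
# abalone <- import_abalone(abalone_url)  # needs Internet access
# View(abalone)                           # opens the data viewer in RStudio
```

Because the file has no header row, read.table assigns the default column names V1, V2, and so on, which is why the names have to be set by hand afterwards.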

In the console panel we type the following:

names(abalone) <- c("Sex","Length","Diameter","Height","Whole weight"
                    ,"Shucked weight","Viscera weight","Shell weight"
                    ,"Rings")
write.csv(abalone, "abalone.csv", row.names=FALSE)

This sets the correct names for the data set and stores the data in your project directory, so you don't have to download it again. This data file is part of your compendium.
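One detail is worth knowing when the stored file is read back later: by default, read.csv turns column names into syntactically valid R names, so a name such as "Whole weight" comes back as "Whole.weight". The sketch below demonstrates this on a tiny made-up data frame written to a temporary file; the data values and variable names are illustrative only.

```r
# A tiny made-up data frame with a space in one column name.
toy <- data.frame(Sex = c("M", "F"), weight = c(0.5, 0.7))
names(toy)[2] <- "Whole weight"

path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Default behaviour: names are made syntactic (the space becomes a dot).
names_default <- names(read.csv(path))
# check.names = FALSE keeps the names exactly as written in the file.
names_kept <- names(read.csv(path, check.names = FALSE))

print(names_default)
print(names_kept)
```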

We will start our first data analysis within RStudio with an R script.

Follow the next few steps in order to start the data analysis:

  1. Create a new R script by navigating to File | New | R script (Ctrl+Shift+N or Command+Shift+N) and type the following:

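    # Read the data; treat text columns such as Sex as factors
    abalone <- read.csv("abalone.csv", stringsAsFactors = TRUE)
    table(abalone$Sex)                  # count the records per Sex category
    plot(Length ~ Sex, data = abalone)  # box plot of Length by Sex

    These commands load the data, count the records in each Sex category, and draw a box plot of Length by Sex for abalone.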

  2. Save your R script as abalone.R using File | Save (Ctrl+S or Command+S).

  3. Execute your R script with Ctrl+Shift+Enter or Command+Shift+Return.
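Running a whole script file can also be done from the console with the base function source(); setting echo = TRUE prints each command along with its output. The sketch below uses a small stand-in script written to a temporary file so that it runs anywhere; the script contents are made up for illustration.

```r
# Write a small stand-in script to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3)", "total <- sum(x)", "total"), script)

# Run the whole file, echoing each command and its output.
source(script, echo = TRUE)
```

Inside the Abalone project, the corresponding call would be source("abalone.R", echo = TRUE).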

Et voila! We have run a small R script from within RStudio. Notice that the panel on the bottom right-hand side shows the plot that we have created.

But we can do better than that. If you did not follow the previous instructions to install knitr, now is the time to do it after all. You may also install it by typing install.packages("knitr") in the console.

  1. Choose File | Compile Notebook.

  2. Close the Abalone project with Project | Close Project. Choose Save.

    We now have a new, empty RStudio session.

  3. Open your newly created Abalone project by navigating to Project | Recent Projects | Abalone.

Your environment is restored, including all the commands that you typed, thanks to R and RStudio.

Keyboard shortcuts

Besides the standard keyboard shortcuts that you likely use in everyday computer use (cut-copy-paste, or to undo an activity), RStudio supports many keyboard shortcuts specifically for R code editing, execution, and more. Although you are unlikely to learn or use all of them, it is useful to get used to at least a few. We will highlight a few of the most useful keyboard shortcuts in every chapter.

Panel | Windows & Linux | Mac | Description
--- | --- | --- | ---
Source, console | Tab or Ctrl+space bar | Tab or Command+space bar | Command completion.
Source | Ctrl+Enter | Command+Return | Run current line or selection.
Source | Ctrl+Shift+Enter | Command+Shift+Return | Source with echo (run whole file).
Any | Ctrl+1 | Command+1 | Move cursor to source editor.
Any | Ctrl+2 | Command+2 | Move cursor to console.

Getting help

If you run into trouble with RStudio, there are several ways to get help online.

  • The developers of RStudio have proven to be amazingly responsive on the help forum at http://support.rstudio.org/. There are many people using R and RStudio, so chances are that someone has already posted the same question somewhere and had it answered. So, before posting a question, make sure to take a look at the troubleshooting guide at RStudio's support page.

  • Search whether your question has been answered before in the FAQs or the forum.

  • Google your question. It may have been answered on another Q&A forum, such as Stack Exchange.

When you post a question, it helps a lot to include a small example that reproduces your problem. Also, you may want to attach the output of R's sessionInfo() command to show in what context the problem occurred. Finally, it can be helpful if you attach RStudio's log file. You can find the folder where it is stored by opening Help | Diagnostics | Show log files. If RStudio fails to start, you can find it in the following folder:

Operating system | Folder path
--- | ---
Windows XP | %USERPROFILE%\Local Settings\Application Data\RStudio-Desktop\log
Windows Vista, 7 | %localappdata%\RStudio-Desktop\log
Linux, Mac OS X | ~/.rstudio-desktop/log/
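To attach the output of sessionInfo() to a question, as suggested above, it helps to capture it as plain text first. The following is a minimal sketch using base R only; the file name session-info.txt and its location in the temporary directory are assumptions for illustration.

```r
# Capture the session details as text so they can be pasted into a post.
info_lines <- capture.output(sessionInfo())
info_file <- file.path(tempdir(), "session-info.txt")  # illustrative path
writeLines(info_lines, info_file)
cat(info_lines, sep = "\n")
```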

What if I uninstall RStudio?

Although you may find this hard to believe, this is absolutely no problem. Each RStudio project is just a folder, containing your scripts, reports, and data in their original form. Additionally there is an .Rproj file that holds some project settings for RStudio and possibly an .RData file. So even if you wish to uninstall RStudio, your work is as accessible as before. You can still re-open your last-closed R session by starting the default Rgui and opening the .RData file in that folder. Scripts are stored as simple text files.
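That a saved workspace file does not depend on RStudio can be seen with the base functions save() and load(). The sketch below writes a single made-up object to a file in the temporary directory and reads it back into a fresh environment; the file name and the object are illustrative only.

```r
# Save a workspace-style file, then read it back with base R only.
rdata_file <- file.path(tempdir(), "example.RData")  # illustrative path
answer <- 42
save(answer, file = rdata_file)

restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)
print(loaded_names)
print(restored$answer)
```

Loading into a separate environment, as done here, avoids overwriting objects of the same name in the current session.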

It is important to note that RStudio does not alter the storage format of your data in any way. In contrast, many proprietary products force you to import your data and store it in some binary format that cannot be opened with other products.

Further reading

The paper Statistical Analyses and Reproducible Research by Robert Gentleman and Duncan Temple Lang offers a thorough description of methods for reproducible research. It can be downloaded for free from http://biostats.bepress.com/bioconductor/paper2/.

There are many books for learning about R, a lot of which are dedicated to specific subjects. Two recent general books on R that have quickly gained popularity are R in a Nutshell by Joseph Adler, 2010, O'Reilly, and The Art of R Programming by Norman Matloff, 2011, No Starch Press, Inc. The former discusses R as a language as well as many statistical features, while the latter thoroughly discusses R as a programming language.

Two books focusing on general statistics with R are worth mentioning here as well. The first is Introductory Statistics with R (2nd ed. 2008, Springer) by Peter Dalgaard. The second is Introductory Probability and Statistics Using R by G. Jay Kerns. The latter book is developed as an open source project and can be downloaded from http://ipsur.org/.

To keep up to date with what happens in the R community, we highly recommend frequent visits to Tal Galili's r-bloggers.com. This website collects a large number of R-related blogs in a convenient newspaper-like layout. Subscribing with an RSS reader for smartphone or PC is also possible.

Summary

In this chapter we emphasized the importance of making your analyses reproducible and introduced the concepts of reproducible research and the compendium. We showed how to install R and RStudio in several environments. RStudio supports the concept of a compendium through projects, and if you followed the first session carefully, you have learned to read, alter, and store a simple CSV file, perform some simple analyses, make a simple plot, and automatically generate an HTML report that you can share with your coworkers.

In the next chapter we will take a deeper dive into writing scripts with RStudio.

About the Authors

  • Mark P.J. van der Loo

    Mark van der Loo obtained his PhD at the Institute for Theoretical Chemistry at the University of Nijmegen (The Netherlands). Since 2007 he has worked at the statistical methodology department of the Dutch official statistics office (Statistics Netherlands). His research interests include automated data cleaning methods and statistical computing. At Statistics Netherlands he is responsible for the local R center of expertise, which supports and educates users on statistical computing with R. Mark has been teaching R for several years and coauthored a number of R packages that are available via CRAN: editrules, deducorrect, rspa, and extremevalues. A list of publications can be found via http://www.markvanderloo.eu.

  • Edwin de Jonge

    Edwin de Jonge has worked for more than 15 years at the Dutch official statistics office (Statistics Netherlands). With a background in theoretical and computational solid state physics (MSc), he started in the statistical computing department. Currently he works in the statistical methodology department. His research interests include data visualization, data analysis, and statistical computing. He trained over 150 people in a workshop entitled “Graphical Analysis with R”. Edwin has coauthored several R packages that are available via CRAN: tabplot, tabplotd3, ffbase, whisker, editrules, and deducorrect.


Latest Reviews

(4 reviews total)
Feels out of date or training is based on a different OS than Windows. Or there are steps that aren't spelled out and are assumed. I only began, and had simple issues starting.
Useful follow up to study of R.
This book got me off to a good start and I came to rapidly appreciate the features in RStudio in composing my code but also in applying a structure that makes it easier to pickup from where you left off. Highly recommended.