This chapter shows how to obtain R and RStudio and introduces the concepts of reproducible research. We will first show a simple RStudio session that already results in a simple, fully reproducible report.
If you have ever had to analyze data for work, study, or a research project, you have probably run into a situation where you ended up with a messy kludge of temporary files, scripts, and intermediate results that is almost impossible to untangle. If this sounds familiar, you probably also had to rewrite pieces of your report while debugging your analyses, or when receiving updates of your data sets. Re-running calculations and re-inserting figures, tables, and results can take a lot of time. Moreover, as a project turns more and more into a spaghetti of files and folders, reproducing exactly what you did becomes harder and harder. Needless to say, things can become even more difficult when collaborating with a number of people on such projects.
RStudio™ is a free and open source tool that makes it easier for you to do the following:
Work with R and R's graphics interactively
Organize your code and maintain multiple projects
Make your research reproducible
Maintain the packages in your R installation
Create and share your reports
Share your code and collaborate with other users
RStudio runs on all the major operating systems, including Windows, Linux, and Mac OS X. Additionally, it can be used to run R on a remote web server. In that case, RStudio's interface will run in your browser.
This book is aimed at beginning and moderately experienced R users who want to get the most out of R and RStudio. In the coming chapters we will cover most of RStudio's features and emphasize some best practices in statistical data analyses.
A few words about R: R is a free software tool for statistical analyses, composed of the R programming language and the R environment. Here, free means not only free of charge (as in free beer) but also free as in freedom. That is, you are allowed to download and use R, inspect or alter its source code, and redistribute it as you like. Note that this freedom is in fact a requirement to perform truly reproducible research, as it allows one, in principle, to check exactly how data is processed in a certain project, down to R's source code itself.
R is distributed via the Comprehensive R Archive Network (CRAN), a network of servers around the world from which you can download R and its extension packages. You can access it via www.r-project.org. There are a few other sites offering extension package repositories; the most noteworthy are Bioconductor (www.bioconductor.org) and the Omega project for statistical computing (www.omegahat.org).
The R environment is a so-called REPL, which stands for read-evaluate-print loop. That is, it offers a text-based interface where you can enter R commands. After a command is entered, the R engine processes it (evaluation) and possibly prints a result to the screen. Alternatively (and more commonly), the commands can be stored in a text file to be run by R.
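The two ways of working described above can be sketched as follows. This is only an illustration: the file name and the variable names are made up for the example.

```r
# Interactive style: each expression is read, evaluated, and its value printed.
x <- c(2, 4, 6)   # an assignment is evaluated, but nothing is printed
mean(x)           # the value is printed when typed at the console

# Script style: store commands in a text file and let R run them.
script <- file.path(tempdir(), "example.R")   # hypothetical file name
writeLines(c("y <- c(2, 4, 6)", "y_mean <- mean(y)"), script)
source(script)    # runs the file; y and y_mean now exist in the workspace
y_mean
```

Storing the commands in a file is what makes the analysis repeatable: the same file can be run again later, by you or by someone else.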
Users who are accustomed to point-and-click interfaces for using statistical functionality may find the first encounter with such an interface daunting, and to be honest, the learning curve for R can be steep at times. However, in order to make work reproducible, it is unavoidable to store the steps of your analyses as source code. Moreover, being a true programming language makes R a much more versatile and powerful tool than any point-and-click software that only offers a predefined functionality.
Fortunately for us, writing code is nothing new, and over the past decades many good ideas have been developed in the software industry to make coding and code management a lot easier. RStudio implements many of those ideas for R users. Some important tips for maintaining your R installation are as follows:
Always use the latest stable version. This is the version likely to have the fewest bugs in the older functionality. You can read about the latest features in the news file, for example by running
View(news())
from the R command line. See the Installing R section for an easier way to install R.
Frequently update your installed packages. This is simply done by running the
update.packages()
command from your R console.
Like R, RStudio is a free and open source project. Founded by JJ Allaire, RStudio is also a company that sells services related to their open source product, such as consulting and training.
RStudio is an Integrated Development Environment (IDE) for R. The term IDE comes from the software industry and refers to a tool that makes it easy to develop applications in one or more programming languages. Typical IDEs offer tools to easily write and document code, compile and perform tests, and offer integration with a version control tool.

RStudio integrates the R environment, a highly advanced text editor, R's help system, version control, and much more into a single application. RStudio does not perform any statistical operations; it only makes it easier for you to perform such operations with R. Most importantly, RStudio offers many facilities that make working reproducibly a lot easier.
The following table gives an overview of some of the most important features of RStudio that you will learn to use with this book:
Readers with some programming experience might wonder why a feature such as debugging support is not in the list. The answer is that it is just not there yet. RStudio is continuously being improved and updated, and according to the forums at RStudio's web pages, support for debugging is certainly on the to-do list of the makers.
Before you install RStudio, you need to install R. It is possible to have multiple versions of R installed side by side. RStudio will use the latest version by default, but can be configured to use a different installed version.
RStudio needs at least R version 2.11, but we highly recommend that you install the latest version.
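If R is already installed, you can check from the R console whether it meets this minimum. The following is a small sketch; the variable names are chosen for illustration only.

```r
# Compare the running R version against the minimum mentioned in the text.
minimum_version <- "2.11.0"
current_version <- getRversion()
meets_minimum <- current_version >= minimum_version

cat("Running", R.version.string, "\n")
if (!meets_minimum) {
  message("R is older than ", minimum_version, "; consider upgrading.")
}
```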
To download and install R, point your browser to www.r-project.org, click on Download R (in the text underneath the graphics), and choose a server near where you are. From there, follow the instructions in the Download and install R box. Alternatively, use the Download R! button at www.inside-R.org. This website automatically offers you the most recent R version fitting your computer and operating system.
Automatic R installation is supported for several popular Linux flavors, including Debian, openSUSE, and Ubuntu.
For openSUSE, the default installation can be obtained by pointing your web browser to http://software.opensuse.org/search, searching for r-base, and installing from there. At the moment, the newest R version is available from there.
On Debian and Ubuntu, the R version offered by the package installer is frozen when the operating system is released. We assume that you are familiar enough with tools such as Synaptic or aptitude to install the R version that comes with those operating systems. Here, we provide some details on how to install the latest R version on Ubuntu or Debian.
CRAN hosts Debian and Ubuntu repositories that offer the latest R version.
Add the repository for Ubuntu 12.04 (Precise Pangolin) by adding (as root) the following line to your /etc/apt/sources.list file:
deb http://<your_nearest_cran_mirror>/bin/linux/ubuntu precise/
Replace <your_nearest_cran_mirror> with a server near where you live. A list of mirrors can be found at http://cran.r-project.org/mirrors.html. Next, register the security key by typing the following:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
Type the following commands to install R:
sudo apt-get update
sudo apt-get install r-base
Alternatively, you can now install the latest R via Synaptic.
For Debian 6.0.5 (squeeze), the line to add to your /etc/apt/sources.list file is:
deb http://<your_nearest_cran_mirror>/bin/linux/debian squeeze-cran/
The security key is installed with the following command:
sudo apt-key adv --keyserver subkeys.pgp.net --recv-keys 381BA480
After this, installation proceeds as in Ubuntu.
If you wish, you can download the R source code and compile the executables yourself. This is really only for expert users, so to paraphrase r-project.org: "if you are not sure what compiling means, you most probably do not want to do this".
To make sure that RStudio can talk with the compiled binaries, you need to configure the build using the --enable-R-shlib flag. So after downloading and unpacking the source tarball, change the directory to R-2.XX.X, and type the following commands:
./configure --enable-R-shlib
make
make install
Most Windows users will use the default installer, but if you want to, you can compile R under Windows. You need to download the latest version of Rtools (http://cran.r-project.org/bin/windows/Rtools) and follow the instructions on the Rtools web page.
The desktop version of RStudio can be downloaded from http://www.rstudio.com/ide for Windows XP and higher, Mac OS X 10.6 or higher, and several Linux flavors. It can be installed easily by clicking on the link for your platform and following the instructions. We strongly recommend that you check www.rstudio.com once in a while for new updates. Alternatively, you can check for updates from within RStudio by clicking on Help | Check for updates.
RStudio Server is currently only available for Linux-based systems. Before you install it you need to have R installed, as described in the previous paragraph.
Go to http://www.rstudio.com/ide/download/server and follow the instructions there to download and install the RStudio server. Once RStudio is installed, you can run it by typing the following:
sudo rstudio-server start
To log on you need to know the server's URL. If you have installed it locally, you can access it by pointing your browser to the following path:
http://localhost:8787
RStudio allows the users of your Linux system to log on with their standard password and username, so user management can be done as in Linux.
One of the most attractive features of R is the abundance of freely available extension packages. The installation of R comes bundled with many important packages, but newly developed statistical methods come readily available in packages. These packages are published on the Comprehensive R Archive Network (CRAN) and can be easily installed in RStudio. To get started, we will install the knitr package, which we'll need in our first session.
One of the tabs in the bottom right-hand side of RStudio is a package panel that allows you to browse the currently installed packages. These packages can be updated by clicking on Check for Updates. RStudio will check what packages have newer versions and will give you the option to select which of these packages should be updated. Alternatively you can use the General menu's Tools | Check for Package Updates.
To install packages, click on the Packages tab in the bottom right-hand side panel. Each tab has its own menu items at the top of the panel. Click on the Install button to start the installation. The pop-up menu that appears allows you to choose either a CRAN server or a local repository. If you have Internet access, choose a mirror somewhere near you. Next, type the first letters of the package you wish to install. Here, we will install the knitr package. When typing, RStudio will show suggestions of packages with similar names. Choose knitr and hit Enter. RStudio generates the command that installs the package, copies it to the console, and executes it.

To load the package, scroll down the window with installed packages and check it. The package is now loaded.
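The same install-and-load steps can also be typed in the console. The sketch below wraps them in a small helper; ensure_package is a hypothetical name introduced here for illustration, not part of R or RStudio. It is demonstrated with the tools package, which ships with R, so the example never needs to download anything.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs Internet access or a local repository
  }
  library(pkg, character.only = TRUE)   # same effect as checking the box
  invisible(pkg %in% loadedNamespaces())
}

# "tools" is bundled with R, so no installation is triggered here.
ensure_package("tools")
```

For this chapter you would call ensure_package("knitr") instead, which installs knitr from your chosen repository if it is not yet present.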
Now that we have R and RStudio installed, we can start our first R session from within RStudio. It is good practice to use an RStudio project for all your data analysis with R, for reasons we will encounter later in this book.
We create an R project using the menu Project | New Project. Choose New Directory and name the project file Abalone.

Note
In this session, we download and manipulate the abalone file. This file will be used in examples throughout the book.
Abalones are a very common type of edible sea snail (sometimes called sea ear) occurring in waters around the world. The data in the file used in this book was compiled and published by Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn, and Wes B. Ford in 1994 [Sea fisheries division Technical Report No. 48 (ISSN 1034-3288)]. It was generously donated to the UCI machine learning repository in 1995.
If you are a beginner in R programming, the RStudio menus facilitate many R commands. When you click on a menu item, RStudio generates and executes the corresponding R commands in the console window. It is a good (and a reproducible!) practice to put your R code in script files as much as possible; but for now we will use some menu commands.
Select Workspace | Import DataSet | From Web URL.
RStudio (and R) can import text files from the disk and over the Internet as well, as shown in the following example:
Type (or paste) the following URL: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data.
RStudio downloads the file and shows the Import Dataset dialog:

The top left-hand side shows the name (abalone) of the resulting data.frame. On the bottom left-hand side are the settings for reading the data file that RStudio deduced from the data file. You can alter these; however, in this example they are fine. On the top right-hand side RStudio shows the first 25 lines of the data file. On the bottom right-hand side it shows the first 25 records of the resulting data.frame. Click on the Import button.
RStudio imports the data and creates a data.frame with the name abalone using the R command read.table and the options that you have set in the Import DataSet dialog. Also, it automatically runs View(abalone), which shows the data we just imported. Notice that the Workspace panel on the right-hand side now contains the variable abalone. Also, notice that the column names of the data are missing, so we need to add them.
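To see what such a read.table call does with a headerless, comma-separated file, here is a minimal sketch. The two rows are illustrative lines in the file's layout (they are not guaranteed to match the real data), and abalone_sample is a name chosen for this example; in the real session the URL is read instead of inline text.

```r
# Illustrative rows in the file's comma-separated layout, without a header line.
sample_lines <- c(
  "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
  "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9"
)

# header = FALSE tells R that the first line is data, not column names.
abalone_sample <- read.table(text = sample_lines, sep = ",", header = FALSE)

names(abalone_sample)   # default names V1 ... V9, since the file has no header
dim(abalone_sample)     # rows and columns read
```

The default names V1 to V9 are why the column names have to be set by hand after importing.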
In the console panel we type the following:
names(abalone) <- c("Sex", "Length", "Diameter", "Height", "Whole weight",
  "Shucked weight", "Viscera weight", "Shell weight", "Rings")
write.csv(abalone, "abalone.csv", row.names=FALSE)
This sets the correct names for the data set and stores the data in your project directory, so you don't have to download it again. This data file is part of your compendium.
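One detail worth knowing about this round trip: by default, read.csv converts column names into syntactically valid R names, so a name containing a space comes back with a dot. The sketch below shows this on a tiny made-up data frame written to a temporary file; all object names are illustrative.

```r
# A tiny made-up data frame with one column name that contains a space.
small <- data.frame(a = c("M", "F"), b = c(0.514, 0.677))
names(small) <- c("Sex", "Whole weight")

csv_path <- tempfile(fileext = ".csv")
write.csv(small, csv_path, row.names = FALSE)

# Default behaviour: names are made syntactically valid ("Whole.weight").
reread_default <- read.csv(csv_path)
names(reread_default)

# check.names = FALSE keeps the names exactly as stored in the file.
reread_verbatim <- read.csv(csv_path, check.names = FALSE)
names(reread_verbatim)
```

With the default settings, a column stored as "Whole weight" is therefore accessed as abalone$Whole.weight after the file is read back.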
We will start our first data analysis within RStudio with an R script.

Follow the next few steps in order to start the data analysis:
Create a new R script by navigating to File | New | R script (Ctrl+Shift+N or Command+Shift+N) and type the following:
# stringsAsFactors=TRUE keeps Sex a factor, so plot() draws a box plot on any R version
abalone <- read.csv("abalone.csv", stringsAsFactors=TRUE)
table(abalone$Sex)
plot(Length ~ Sex, data=abalone)
These commands load the data, tabulate the frequencies of Sex in the data, and draw a box plot of Length by Sex for abalone.
Save your R script as abalone.R using File | Save (Ctrl+S or Command+S).
Execute your R script with Ctrl+Shift+Enter or Command+Shift+Return.
Et voilà! We have run a small R script from within RStudio. Notice that the panel on the bottom right-hand side shows the plot that we have created.
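The box plot summarizes, per group, quantities that can also be computed directly. As a rough sketch, using a small made-up data frame (not the real abalone data) with the same two columns:

```r
# Made-up values for illustration only; the real data has thousands of rows.
sample_abalone <- data.frame(
  Sex    = factor(c("M", "F", "I", "M", "F")),
  Length = c(0.455, 0.53, 0.33, 0.35, 0.44)
)

# Frequencies per level of Sex, as computed by table() in the script.
counts <- table(sample_abalone$Sex)
counts

# Median Length per Sex: the thick line inside each box of the box plot.
medians <- tapply(sample_abalone$Length, sample_abalone$Sex, median)
medians
```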
But we can do better than that. If you did not follow the previous instructions to install knitr, now is the time to do it after all. You may also install it by typing install.packages("knitr") in the console.
Choose File | Compile Notebook.
Close the Abalone project with Project | Close Project. Choose Save.
We have now a new empty RStudio session.
Open your newly created Abalone project by navigating to Project | Recent Projects | Abalone.
Your environment is restored, including all the commands that you typed, thanks to R and RStudio.

Besides the standard keyboard shortcuts that you likely use in everyday computer use (cut-copy-paste, or to undo an activity), RStudio supports many keyboard shortcuts specifically for R code editing, execution, and more. Although you are unlikely to learn or use all of them, it is useful to get used to at least a few. We will highlight a few of the most useful keyboard shortcuts in every chapter.
Panel | Shortcut | Description
---|---|---
Source, console | Tab or Command+space bar | Command completion.
Source | Command+Return | Run current line or selection.
Source | Command+Shift+Return | Source with echo (run whole file).
Any | Command+1 | Move cursor to source editor.
Any | Command+2 | Move cursor to console.
If you run into trouble with RStudio, there are several ways to get help online.
The developers of RStudio have proven to be amazingly responsive on the help forum at http://support.rstudio.org/. There are many people using R and RStudio, so chances are that someone has already posted the same question somewhere and had it answered. So, before posting a question, make sure to take a look at the troubleshooting guide at RStudio's support page.
Search whether your question has been answered before in the FAQs or the forum.
Google your question. It may have been answered on another Q&A forum, such as Stack Exchange.
When you post a question, it helps a lot to include a small example that reproduces your problem. Also, you may want to attach the output of R's sessionInfo() command to show in what context the problem occurred. Finally, it can be helpful if you attach RStudio's log file. You can find the folder where it is stored by opening Help | Diagnostics | Show log files. If RStudio fails to start, you can find it in the following folder:
Operating system | Folder path
---|---
Windows XP |
Windows Vista, 7 |
Linux, Mac OS X |
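To attach the output of sessionInfo() to a forum post, you can write it to a text file first. A small sketch; the file name is made up for the example:

```r
# Capture what sessionInfo() would print and store it in a text file.
report_file <- file.path(tempdir(), "session-info.txt")   # hypothetical name
session_text <- capture.output(sessionInfo())
writeLines(session_text, report_file)

# Show the first few lines so you can check what will be shared.
cat(head(session_text, 3), sep = "\n")
```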
If you ever decide to stop using RStudio, this is absolutely no problem, although you may find that hard to believe. Each RStudio project is just a folder, containing your scripts, reports, and data in their original form. Additionally there is a .Rproj file that holds some session information for RStudio and possibly an .RData file. So even if you wish to uninstall RStudio, your work is as accessible as before. You can still re-open your last-closed R session by starting the default Rgui and opening the .RData file in that folder. Scripts are stored as simple text files.
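Opening a saved workspace file does not require RStudio at all; plain R reads it with load(). The sketch below saves one made-up object to a temporary file and restores it into a separate environment, so nothing in your current workspace is overwritten. The file and object names are illustrative.

```r
# Save a made-up object to a workspace file (stand-in for a project's file).
workspace_file <- file.path(tempdir(), "example.RData")
scores <- c(3, 5, 8)
save(scores, file = workspace_file)

# Restore it into a fresh environment; load() returns the restored names.
restored <- new.env()
loaded_names <- load(workspace_file, envir = restored)
loaded_names
restored$scores
```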
It is important to note that RStudio does not alter the storage format of your data in any way. In contrast, many proprietary products force you to import your data and store it in some binary format that cannot be opened with other products.
The paper Statistical Analyses and Reproducible Research by Robert Gentleman and Duncan Temple Lang offers a thorough description of methods for reproducible research. It can be downloaded for free from http://biostats.bepress.com/bioconductor/paper2/. There are many books for learning about R, a lot of which are dedicated to specific subjects. Two recent books that discuss R in general and have quickly gained popularity are R in a Nutshell by Joseph Adler (2010, O'Reilly) and The Art of R Programming by Norman Matloff (2011, No Starch Press, Inc.). The former discusses R as a language as well as many statistical features, while the latter thoroughly discusses R as a programming language. Two books focusing on general statistics with R are worth mentioning here as well. The first is Introductory Statistics with R (2nd ed. 2008, Springer) by Peter Dalgaard. The second is Introductory Probability and Statistics Using R by G. Jay Kerns. The latter book is developed as an open source project and can be downloaded from http://ipsur.org/.
To keep up to date with what happens in the R community, we highly recommend frequent visits to Tal Galili's r-bloggers.com. This website collects a large number of R-related blogs in a convenient newspaper-like layout. Subscribing with an RSS reader for smartphone or PC is also possible.
In this chapter we emphasized the importance of making your analyses reproducible and introduced the concepts of reproducible research and the compendium. We showed how to install R and RStudio in several environments. RStudio supports the concept of a compendium through projects. If you followed the first session carefully, you have learned to read, alter, and store a simple CSV file, perform some simple analyses, make a simple plot, and automatically generate an HTML report that you can share with your coworkers.
In the next chapter we will take a deeper dive into writing scripts with RStudio.