About this book

R is a highly flexible and powerful tool for analyzing and visualizing data. Shiny is the perfect companion to R, making it quick and simple to share analysis and graphics from R that users can interact with and query over the Web. Let Shiny do the hard work and spend your time generating content and styling, not writing code to handle user inputs.

Web Application Development with R Using Shiny is an accessible introduction to sharing interactive content from R using Shiny. You will learn by doing, with each chapter including code and examples to use and adapt for your own applications. By the end of this book, you will be able to build useful and engaging web applications with only a few lines of code – no JavaScript required.

Web Application Development with R Using Shiny will show you how to begin analyzing, visualizing, and sharing your data using practical examples.

This book will teach you how to rapidly prototype and build interactive data summaries using Shiny's built-in widgets and functions. You will learn how to integrate Shiny applications with your existing HTML and CSS, how to greatly extend the power and usability of your applications using JavaScript, and how to quickly deploy them over the Web.

The book uses practical examples to show you how to get the best out of R and Shiny, helping you to produce and share cutting-edge analytics with minimal effort.

Publication date: October 2013
Publisher: Packt
Pages: 110
ISBN: 9781783284474

 

Chapter 1. Installing R and Shiny and Getting Started!

If you have heard about R, you probably know that it's free and open source and well on its way to becoming a preeminent tool for statisticians and data scientists. You may be aware that there are over 4000 user-contributed packages available for R, which help users with tasks as diverse as computational chemistry, physics, finance, clinical trials, medical imaging, psychometrics, machine learning, statistical methods, and extremely powerful and flexible statistical graphics.

The Shiny package is a free contributed package for R that makes it incredibly easy to deliver interactive data summaries and queries to end users through any modern web browser. Shiny comes with a variety of widgets for rapidly building user interfaces and does all of the heavy lifting in terms of setting up interactive user interfaces. The default styling of a Shiny application is clean and effective; however, Shiny is very extensible, and it is easy to integrate Shiny applications with your own web content using HTML and CSS. JavaScript and jQuery can also be used to further extend the scope of Shiny applications.

This book will show you how to build your own web interfaces with Shiny, right from starting with R to integrating them with your own websites. In this chapter, we are going to do the following:

  • Install R, choose an IDE, and have a look at the power and flexibility of R

  • Run some examples within R and learn a bit of the R language

  • Look at resources to help you learn more about R and Shiny

  • Install Shiny, and run and browse the examples

R is a big subject and this is a brief tour. So if you get a little lost along the way, don't worry. This chapter is really all about getting started and helping you recognize some of the languages and data structures you will come across later. You can come back to this chapter once you have got the basics of Shiny and want to start delving a bit deeper; and as you write more and more R code, it will all start to sink in.

 

Installing R


R is available for Windows, OS X, and Linux at http://cran.r-project.org . The source code is also available at the same address. It is also included in many Linux package management systems. Linux users are advised to check before downloading from the web. Details on installing from source or binary for Windows, OS X, and Linux are all available at http://www.cran.r-project.org/doc/manuals/R-admin.html.

The R console

Windows and OS X users can run the R application to launch the R console. Linux and OS X users can also run the R console straight from the terminal by typing R.

In either case, the R console will look as shown in the following screenshot:

R will respond to your commands right from the terminal. Let's have a go:

> 2 + 2
[1] 4

The [1] is the index of the first value printed on that line; here R returned a single result, 4. Text is printed in the same way:

> print("Hello world!")
[1] "Hello world!"

Multiples of pi:

> 1:10 * pi
[1]  3.141593  6.283185  9.424778 12.566371 15.707963 18.849556
[7] 21.991149 25.132741 28.274334 31.415927

This example illustrates vector-based programming in R. 1:10 generates the numbers 1 to 10 as a vector, and each is then multiplied by pi, returning another vector, the elements each being pi times larger than the original. Operating on vectors is an important part of writing simple and efficient R code. As you can see, R again numbers the values it returns at the console, with the seventh value being 21.99.
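To make the idea of operating on whole vectors a little more concrete, here is a minimal sketch using made-up prices and quantities (the variable names and values are purely illustrative and do not come from any dataset used in this book):

```r
# Hypothetical prices and quantities, purely for illustration
prices <- c(2.50, 4.00, 10.00)
quantities <- c(4, 1, 2)

# Element-wise multiplication: each price is paired with its quantity
line_totals <- prices * quantities
print(line_totals)

# A single number is recycled across every element of the vector
with_tax <- line_totals * 1.2
print(with_tax)

# Many functions take a whole vector and return a single summary value
print(sum(with_tax))
```

No loop is needed at any point; the arithmetic operators work on every element at once, which is usually both shorter and faster than looping in R.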

Before we leave the console, let's have a quick look at some of the graphics capability within R:

> demo(graphics)

Or:

> demo(persp)

Code editors and IDEs

The Windows and OS X versions of R both come with built-in code editors which allow code to be edited, saved, and sent to the R console. Choice of code editors and IDEs is a highly personal decision, and if you are just starting out with R, you would be best advised to try a few before settling on one. Following are some choices in this area, available for all three platforms except where specified otherwise.

Simple and well-featured

These are ideal for beginners:

  • Notepad++ with the NppToR plugin (Windows only): This supports code highlighting, execution of blocks of code, and a few other useful features

  • RKWard: This includes data editing, data import, and package management

  • Tinn-R (Windows only): This supports some other languages as well as LaTeX, and includes project management functions

  • RStudio: It is very well-featured (and my personal favorite), with project management and version control (including support for Git), viewing of data and graphics, code-completion, package management, and many other features

Complex and extensible

These are ideal for those who are already using other text editors and IDEs. The following plugins are available for R:

  • Emacs with the Emacs Speaks Statistics plugin: Emacs is favored by many for its level of extensibility and support for, well, everything (programming languages, markup languages, project management, e-mail, and even web browsing)

  • Vim with the Vim-R plugin: Like Emacs, Vim is a highly extensible package which supports many programming and markup languages and is extremely powerful

  • Eclipse with the StatET plugin: It is a very well-featured and extensible IDE for R, Java, HTML, and many others

 

Learning R


There are almost as many uses of R as there are people using it. It is not possible to cover all your specific needs within this book. However, it is likely that you may wish to use R to process, query, and visualize data, such as sales figures, satisfaction surveys, concurrent users, sporting results, or whatever type of data your organization processes. The next chapters will concentrate on Google Analytics data downloaded from the Application Programming Interface (API), but for now, let's just have a look at the basics.

Getting help

There are many books and online materials covering all the aspects of R. The name R can make it difficult to come up with useful web-search hits (substituting CRAN for R can sometimes help); nonetheless, searching for R tutorial does give useful results. Some useful resources include the following:

An excellent introduction to the syntax and data structures in R can be found at http://goo.gl/M0RQ5z.

You can watch videos on using R from Google at http://goo.gl/A3uRsh.

Quick-R provides a lot of useful code and examples that can be found at http://www.statmethods.net/.

At the R console, typing ? followed by the function name (for example, ?help) brings up help materials, and the command ??help will bring up a list of potentially relevant functions from the installed packages.

Subscribing to and asking questions on the R-help mailing list at http://www.r-project.org/mail.html allows you to communicate with some of the leading figures in the R community as well as many other talented enthusiasts. Do read the posting guide and research your question before you ask any questions because contributors to the list are often busy and can be unforgiving of poor questions.

There are two Stack Exchange communities which can provide further help that can be accessed at http://stats.stackexchange.com/ (for questions on statistics and visualization with R) and http://stackoverflow.com/ (for questions on programming with R).

Loading data

The simplest way to load data into R is probably from a comma-separated values (.csv) file, which can be downloaded from many data sources and opened and saved in all spreadsheet software (such as Excel or LibreOffice). The read.table() command imports data of this type when the separator is specified as a comma and header = TRUE tells it that the first row holds the variable names; alternatively, there is a function specifically for .csv files, read.csv(), which makes both of these the default:

> analyticsData <-
      read.table("C:\\Mydocuments\\Data\\Analytics.csv",
      sep = ",", header = TRUE)

Or:

> analyticsData <-
    read.csv("C:\\Mydocuments\\Data\\Analytics.csv")

Note that, unlike many other languages, R uses <- as well as = for assignment. Assignment can also be made in the other direction using ->, so y can be given the value 4 either with y <- 4 or with 4 -> y. There are some other, more advanced, things that can be done with assignment in R, but don't worry about them now. Just write code using the <- operator as shown in the previous example and your code will look just like the code you come across on forums and blog posts.

Either of the previous code examples will assign the contents of the Analytics.csv file to a dataframe called analyticsData, with the first row of the spreadsheet providing the variable names. A dataframe is a special type of object in R which is designed to be useful for the storage and analysis of data.
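If you do not have a .csv file to hand, the following sketch lets you try both functions. It writes a small made-up dataset to a temporary file and reads it back; the column names and values are assumptions for illustration only and are not the contents of Analytics.csv:

```r
# A small made-up dataset standing in for a real analytics export
example_data <- data.frame(
  date = c("2013-06-01", "2013-06-02", "2013-06-03"),
  pageViews = c(836, 676, 940)
)

# Write it to a temporary file so the example does not rely on any real path
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)

# read.csv() treats the first row as the variable names by default
from_csv <- read.csv(csv_path)

# read.table() has to be told about both the separator and the header row
from_table <- read.table(csv_path, sep = ",", header = TRUE)

# str() gives a compact description of the dataframe and its columns
str(from_csv)
```

Running str() straight after an import is a quick way to check that the variable names and types are what you expected.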

Dataframes, lists, arrays, and matrices

Dataframes have several important features which make them useful for data analysis:

  • Rectangular data structures: In general, the pieces of data will read down the rows (for example, consecutive dates in June) and each variable (for example, unique visitors or time spent on the site) for these cases will read across the columns. A mix of datatypes is supported. A typical dataframe might include variables containing dates, numbers (integer or float), and text.

  • Subsetting and variable extraction can be easily done. R provides a lot of built-in functionality to select rows and variables within a dataframe.

  • Many functions include a data argument, which makes it very simple to pass dataframes to functions and process only those variables and cases that are relevant. This makes for cleaner and simpler code.

We can inspect the first few rows of the dataframe using the head(analyticsData) command as shown in the following screenshot:

As you can see, there are four variables within the dataframe: one contains dates, two are integer variables, and the last is a numeric variable. Variable types in R are covered in more detail later in this chapter.

Variables can be extracted from dataframes simply using the $ operator:

> analyticsData$pageViews
 [1] 836 676 940 689 647 899 934 718 776 570 651 816
[13] 731 604 627 946 634 990 994 599 657 642 894 983
[25] 646 540 756 989 965 821

Or using []:

> analyticsData[, "pageViews"]

Note the use of the comma with nothing before it to indicate that all the rows are required. If a subset of rows were required, it could be achieved through the following command line:

> analyticsData[1:10,"pageViews"]
[1] 836 676 940 689 647 899 934 718 776 570

In the same way, leaving a blank space after the comma returns all the variables:

> analyticsData[1:3,]
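Rows can also be selected by a condition rather than by position. The sketch below uses a small made-up dataframe (the name visits and its columns are hypothetical, chosen only to resemble the analytics data) to show selection by position, by logical condition, and with the base R subset() function:

```r
# Made-up values; the dataframe and column names are illustrative assumptions
visits <- data.frame(
  date = as.Date("2013-06-01") + 0:4,
  pageViews = c(836, 676, 940, 689, 647),
  visitors = c(410, 352, 478, 366, 341)
)

# Rows by position, columns by name
first_two <- visits[1:2, c("date", "pageViews")]
print(first_two)

# Rows by logical condition: only the days with more than 680 page views
busy_days <- visits[visits$pageViews > 680, ]
print(busy_days)

# subset() expresses the same selection without repeating the dataframe name
same_days <- subset(visits, pageViews > 680)
print(same_days)
```

The expression visits$pageViews > 680 produces a vector of TRUE and FALSE values, one per row, and only the rows marked TRUE are kept.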

Dataframes are a special type of list. Lists can hold many different types of data, including lists. As with many datatypes in R, their elements can be named, which can be very useful for writing code that is easy to understand. Let's make a list of the options for dinner, with drink quantities expressed in milliliters.

In the following example, please note the use of the c() function, which combines its comma-separated arguments into a vector. R will pick an appropriate class for the return value: character for vectors that contain strings, numeric for those that only contain numbers, logical for boolean values, and so on:

> dinnerList <- list("Vegetables" =
    c("Potatoes", "Cabbage", "Carrots"),
    "Dessert" = c("Ice cream", "Apple pie"),
    "Drinks" = c(250, 330, 500)
    )

Indexing is similar to that of dataframes (which are, after all, special instances of a list). They can be indexed by number as shown in the following command lines:

> dinnerList[1:2]
$Vegetables
[1] "Potatoes" "Cabbage"  "Carrots"

$Dessert
[1] "Ice cream" "Apple pie"

This returns a list. To extract the element itself, as an object of its own class rather than wrapped in a list, use [[]]:

> dinnerList[[3]]
[1] 250 330 500

In this case, a numeric vector is returned. They can be indexed by name also:

> dinnerList["Drinks"]
$Drinks
[1] 250 330 500

Note that this also returns a list.
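The difference between single and double brackets is easy to check with class(). This short sketch rebuilds the same dinnerList and also shows the $ operator, which behaves like [[]] for named elements:

```r
dinnerList <- list(
  "Vegetables" = c("Potatoes", "Cabbage", "Carrots"),
  "Dessert" = c("Ice cream", "Apple pie"),
  "Drinks" = c(250, 330, 500)
)

# Single brackets keep the list wrapper
class(dinnerList["Drinks"])     # "list"

# Double brackets and $ both return the element itself
class(dinnerList[["Drinks"]])   # "numeric"
class(dinnerList$Drinks)        # "numeric"

# Because the element is a plain numeric vector, arithmetic works directly
mean(dinnerList$Drinks)         # 360
```

As a rule of thumb, use [] when you want a smaller list and [[]] or $ when you want to work with the contents of one element.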

Matrices and arrays, unlike dataframes, only hold one type of data; like dataframes, they are indexed using square brackets. Thus, the command analyticsMatrix[, 3:6] returns all the rows from the third to the sixth column; analyticsMatrix[1, 3] returns just the first row of the third column; and analyticsArray[1, 2, ] returns the first row of the second column across all the elements within the third dimension.
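The objects analyticsMatrix and analyticsArray are not created anywhere in this chapter, so here is a small stand-alone sketch of the same indexing patterns using a matrix and an array built from simple integer sequences:

```r
# A 2 x 3 matrix, filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

m[1, 3]     # row 1, column 3: the single value 5
m[, 2:3]    # all rows, columns 2 to 3

# A 2 x 3 x 2 array: two "layers", each of which is a 2 x 3 matrix
a <- array(1:12, dim = c(2, 3, 2))

a[1, 2, ]   # row 1, column 2, across both layers: 3 and 9
```

Printing m first makes it easier to see how the column-by-column fill determines which value sits at each position.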

Variable types

R is a dynamically typed language and so you are not required to declare the type of your variables. It is worth knowing, of course, about the different types of variables that you might read or write using R. The different types of variables can be stored in a variety of structures, such as vectors, matrices, and dataframes, although some restrictions apply as detailed previously (for example, matrices must contain only one variable type). Declaring a variable with at least one string will produce a vector of strings (in R, the character datatype):

> c("First", "Third", 4, "Second")
[1] "First"  "Third"  "4"      "Second"

Declaring a variable with just numbers will produce a numeric vector:

> c(15, 10, 20, 11, 0.4, -4)
[1] 15.0 10.0 20.0 11.0  0.4 -4.0

R includes a logical datatype also:

> c(TRUE, FALSE, TRUE, TRUE, FALSE)
[1]  TRUE FALSE  TRUE  TRUE FALSE

A datatype exists for dates as well and is often a problem for beginners:

> as.Date(c("2013/10/24", "2012/12/05", "2011/09/02"))
[1] "2013-10-24" "2012-12-05" "2011-09-02"
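One common source of trouble is text that is not in year, month, day order. A minimal sketch of handling this, using made-up dates written day first, is to give as.Date() an explicit format string:

```r
# Dates written day/month/year need an explicit format
uk_dates <- as.Date(c("24/10/2013", "05/12/2012"), format = "%d/%m/%Y")
print(uk_dates)

# Once parsed, dates support arithmetic
days_between <- uk_dates[1] - uk_dates[2]
print(days_between)

one_week_later <- uk_dates + 7
print(one_week_later)

# format() turns a Date back into text in whatever layout is needed
year_month <- format(uk_dates[1], "%Y-%m")
print(year_month)
```

The format codes (%d for day, %m for month, %Y for four-digit year) are documented under ?strptime.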

The use of the factor datatype tells R of all the possible values of a categorical variable, such as gender or species:

> factor(c("Male", "Female", "Female", "Male", "Male"),
                   levels = c("Female", "Male"))
[1] Male   Female Female Male   Male  
Levels: Female Male
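Declaring the levels matters most when some possible values do not appear in the data. In this sketch (the survey responses are invented for illustration), the level Maybe never occurs, but R still knows about it and reports it with a count of zero:

```r
# Invented survey responses; "Maybe" is a valid answer that nobody gave
responses <- factor(c("Yes", "No", "Yes"),
                    levels = c("Yes", "No", "Maybe"))

levels(responses)

# table() counts every declared level, including the unused one
response_counts <- table(responses)
print(response_counts)

# Internally, each value is stored as the position of its level
as.integer(responses)
```

This is useful later when plotting, because categories with no observations can still be given a place on an axis or in a legend.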

Functions

As you grow in confidence with R, you will wish to begin writing your own functions. This is achieved very simply and in a manner quite reminiscent of many other languages. You will undoubtedly wish to read more about writing functions in R in a fuller treatment, but just to give you an idea, here is a function called sumMultiply which adds together x and y and multiplies that value by z:

sumMultiply <- function(x, y, z) {
  # add x and y together, then multiply the result by z
  final <- (x + y) * z
  return(final)
}
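To give an idea of how such a function is used, here is a short sketch. It defines a variant of sumMultiply in which z has a default value of 1; the default is an addition made for this illustration and is not part of the function shown above:

```r
# Variant of sumMultiply with a default for z (the default is an
# illustrative addition). The value of the last expression is returned,
# so an explicit return() is optional.
sumMultiply <- function(x, y, z = 1) {
  (x + y) * z
}

sumMultiply(1, 2, 3)            # (1 + 2) * 3 = 9

# Arguments can be given by name, in any order
sumMultiply(z = 3, x = 1, y = 2)

# With z omitted, the default of 1 is used
sumMultiply(1, 2)               # 3

# Because R arithmetic is vectorised, x can be a whole vector
sumMultiply(1:3, 1, 2)          # 4 6 8
```

Note that nothing in the function had to change for it to accept a vector; it simply inherits that behavior from the + and * operators.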

Objects

There are many special object types within R designed to make it easier to analyze data. Functions in R can be polymorphic, that is, they can respond to different datatypes in different ways in order to produce the output that the user desires. For example, the plot() function in R responds to a wide variety of datatypes and objects, including single dimension vectors (each value of y plotted sequentially) and two dimensional matrices (producing a scatterplot), as well as specialized statistical objects such as regression models and time series data. In the latter case, plots specialized for these purposes are produced.
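One mechanism behind this behavior is R's S3 system, in which a generic function such as summary() looks at the class of its argument and calls a matching method. The sketch below defines a hypothetical class called survey, invented purely for this illustration, and gives it its own summary() method:

```r
# Constructor for a hypothetical "survey" class (not part of any package)
new_survey <- function(scores) {
  structure(list(scores = scores), class = "survey")
}

# A summary() method that is called for objects of class "survey"
summary.survey <- function(object, ...) {
  cat("Survey with", length(object$scores), "responses; mean score",
      mean(object$scores), "\n")
  invisible(object)
}

s <- new_survey(c(4, 5, 3, 4))

# The same generic function, two different behaviors:
summary(s)                # dispatches to summary.survey()
summary(c(4, 5, 3, 4))    # uses the built-in method for numeric vectors
```

The plot() function works in the same way, which is how it can draw something sensible for vectors, matrices, and fitted models alike.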

As with the rest of this introduction, don't worry if you haven't written functions before, or don't understand object concepts and aren't sure what all this means. You can produce great applications without understanding all these things, but as you work more and more with R, you will start wanting to learn in more detail about how R works and how experts produce R code. This introduction is designed to give you a jumping-off point to learn more about how to get the best out of R (and Shiny).

 

Base graphics and ggplot2


There are a lot of user-contributed graphics packages in R that can produce some wonderful graphics. You may wish to have a look for yourself at the CRAN task view that can be found at http://cran.r-project.org/web/views/Graphics.html. We will have a very quick look at two approaches: base graphics, so called because it is the default graphical environment within a vanilla install of R; and ggplot2, a highly popular user-contributed package produced by Hadley Wickham which is a little trickier to master than base graphics but can rapidly produce a wide range of graphical data summaries. We will cover two graphs familiar to all: the bar chart and the line chart.

Bar chart

Useful when comparing quantities across categories, bar charts are very simple to use in base graphics, particularly when combined with the table() command. We will use the mpg dataset which comes with the ggplot2 package; it summarizes different characteristics of a range of cars. First, let's install the ggplot2 package. You can do this straight from the console:

> install.packages("ggplot2")

You can also use the built-in package functions in IDEs, such as RStudio or RKWard. We will need to load the package every time we wish to use this dataset or the ggplot2 package itself. We need to give the following command at the console:

> library(ggplot2)

We will use the table() command to count the number of each type of car featured in the dataset:

> table(mpg$class)

This returns a table object (another special object type within R) that contains a frequency count for each type of car as seen in the following screenshot:

Producing a bar chart of this object is achieved through the following command line:

> barplot(table(mpg$class), main = "Base graphics")

The barplot() function takes a vector of frequencies. When they are named, as in the previous example (the table() command returns the named frequencies in the table form), the names are automatically included on the x-axis. The defaults for this graph are rather plain. Explore ?barplot and ?par to learn more about fine-tuning your graphics.

We have already loaded the ggplot2 package in order to use the mpg dataset, but if you have shut down R in between these two examples, you will need to reload it by the following command line:

> library(ggplot2)

The same graph is produced in ggplot2 in the following way:

> ggplot(data = mpg, aes(x = class)) + geom_bar() +
    ggtitle("ggplot2")

This ggplot call shows the three fundamental elements of ggplot calls: the use of a dataframe (data = mpg); the setting up of aesthetics (aes(x = class)), which determine how variables are mapped onto axes, colors, and other visual features; and the use of + geom_xxx(). A ggplot call on its own sets up the data and aesthetics but does not plot anything. Functions such as geom_bar() (there are many others, see ??geom) tell ggplot what type of graph to plot and can take optional arguments. For example, geom_bar() optionally takes a position argument which defines whether the bars should be stacked, offset, or stretched to a common height to show proportions instead of frequencies.

These elements are the key to the power and flexibility that ggplot2 offers. Once the data structure is defined, ways of visualizing it can be added and taken away easily, not only in terms of the type of graphic (bar, line, scatter) but also the scales and co-ordinate system (log10, polar co-ordinates), and statistical transformations (smoothing data, summarizing over spatial co-ordinates). The appearance of plots can be easily changed with pre-set and user-defined themes, and multiple plots can be added in layers (that is, adding to one plot) or facets (that is, drawing multiple plots with one function call).
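As a rough sketch of how these pieces combine, the following builds three variations on the bar chart from the mpg dataset. The choice of the drv variable (drive type) for the fill color and the facets is just one possibility, picked here for illustration:

```r
library(ggplot2)

# Bars split by drive type; position = "dodge" places them side by side
p_dodge <- ggplot(data = mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge") +
  ggtitle("Cars by class and drive type")

# position = "fill" stretches every bar to the same height,
# showing proportions instead of frequencies
p_fill <- ggplot(data = mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill")

# Facets: one small bar chart per drive type from a single call
p_facet <- ggplot(data = mpg, aes(x = class)) +
  geom_bar() +
  facet_wrap(~ drv)
```

Each plot is stored as an object, and typing its name at the console (for example, p_dodge) draws it. Storing plots in this way also means that further layers or titles can be added later with +.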

Line chart

Line charts are most often used to indicate change, particularly over a period of time. This time we will use the longley dataset, featuring economic variables between 1947 and 1962:

> plot(x = 1947 : 1962, y = longley$GNP, type = "l",
         xlab = "Year", main = "Base graphics")

The x axis is given by 1947 : 1962, which enumerates all the numbers between 1947 and 1962, and the type = "l" argument specifies the plotting of lines. For other graphs, you may prefer to specify p for just drawing each individual datapoint, or b for drawing both datapoints and lines.

The ggplot call looks a lot like it did in the case of the bar chart except with an x and y dimension in the aesthetics this time:

> ggplot(longley, aes(x = 1947 : 1962, y = GNP)) + geom_line() +
           xlab("Year") + ggtitle("ggplot2")

Base graphics and ggplot versions of the bar chart are shown in the following screenshot for the purpose of comparison:

 

Installing Shiny and running the examples


RKWard, RStudio, and other GUIs include package management functions which can be used to install Shiny, or else it can be very easily installed by typing install.packages("shiny") at the console.

Let's run some of the examples:

> library(shiny)
> runExample("01_hello")

Your web browser should launch and display the following:

The previous graph shows the frequency of a set of random numbers drawn from a statistical distribution known as the normal distribution, and the slider allows users to select the size of the draw from 0 to 1000. You will notice that when you move the slider, the graph gets updated automatically. This is a fundamental feature of Shiny, which makes use of a reactive programming paradigm. Put simply, this is a type of programming which uses reactive expressions that keep track of the values on which they are based that can change (known as reactive values) and update themselves whenever any of their reactive values change. So, in this example, the function that generates the random data and draws the graph is a reactive expression, and the number of random draws which it makes is a reactive value on which the expression depends. Thus whenever the number of draws changes, the function re-executes.
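To show roughly what a reactive dependency looks like in code, here is a minimal sketch of an application similar in spirit to this example. It is not the source code of 01_hello: the input and output names (obs and distPlot) and the slider limits are assumptions made for illustration, and it uses the single-object shinyApp() style, which may differ from the file layout used elsewhere in this book:

```r
library(shiny)

# User interface: one slider and one placeholder for a plot
ui <- fluidPage(
  sliderInput("obs", "Number of observations:",
              min = 1, max = 1000, value = 500),
  plotOutput("distPlot")
)

# Server logic: input$obs is a reactive value, and the code inside
# renderPlot() re-runs whenever that value changes
server <- function(input, output) {
  output$distPlot <- renderPlot({
    hist(rnorm(input$obs),
         main = "Random draws from a normal distribution")
  })
}

# Bundle the two parts into an application object
app <- shinyApp(ui = ui, server = server)

# To launch it in your browser, run the following at the console:
# runApp(app)
```

Notice that there is no code that says "when the slider moves, redraw the plot". Simply reading input$obs inside renderPlot() is enough for Shiny to register the dependency.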

You can find more information on this example, as well as a comprehensive tutorial for Shiny at http://rstudio.github.io/shiny/tutorial/.

Notice the layout and style of the web page. Shiny is based on the Twitter Bootstrap theme by default. However, you are not limited by the styling at all and can build the whole UI using a mix of HTML, CSS, and Shiny code.

Let's look at an interface made with bare-bones HTML and Shiny. Note that in this and all the subsequent examples, we're going to assume that you run library(shiny) at the beginning of each session. You don't have to run it before each example but just at the beginning of each R session. So, if you have closed R and come back, do run it at the console. If you can't remember whether you have already done so, run it again to be sure; it won't do any harm:

> runExample("08_html")

And here it is in all its customizable glory:

This time there are a few different statistical distributions to pick from, and a different method for selecting the number of observations. By now, you should be looking at the web page and imagining all the possibilities that exist to produce your own interactive data summaries and styling them just how you want, quickly and simply. By the end of the next chapter, you will have made your own application with the default UI, and by the end of the book, you will have gained complete control over the styling and be pondering about where else you can go.

There are a lot of other examples included within the Shiny package. Just type runExample() at the console to be provided with the list.

To see some really powerful and well-featured Shiny applications, have a look at the showcase available at http://www.rstudio.com/shiny/showcase/.

 

Summary


In this chapter, we learned how to install R and explored the different options for GUIs and IDEs, and looked at some examples of the graphical power of R. We also learned a little about the data structures of R and looked at some basic visualization code. Finally, we installed Shiny, ran the examples included in the package, and got introduced to a couple of basic concepts within Shiny.

In the next chapter, we will go on to build our own Shiny application using the default UI.

About the Author

  • Chris Beeley

    Chris Beeley has been using R and other open source software for ten years to better capture, analyze, and visualize data in the healthcare sector in the UK. He is the author of Web Application Development with R Using Shiny. He works full-time, developing software to store, collate, and present questionnaire data using open technologies (MySQL, PHP, R, and Shiny), with a particular emphasis on using the web and Shiny to produce simple and attractive data summaries. Chris is working hard to increase the use of R and Shiny, both within his own organization and throughout the rest of the healthcare sector, as well as to enable his organization to better use a variety of other data science tools. Chris has also delivered talks about Shiny all over the country.

