R Data Visualization Recipes

By Vitor Bianchi Lanzetta

About this book

R is an open source language for data analysis and graphics that allows users to load various packages for effective and better data interpretation. Its popularity has soared in recent years because of its powerful capabilities when it comes to turning different kinds of data into intuitive visualization solutions. This book is an update to our earlier R data visualization cookbook with 100 percent fresh content, covering all the cutting-edge R data visualization tools. This book is packed with practical recipes, designed to provide you with all the guidance needed to get to grips with data visualization using R. It starts off with the basics of the ggplot2, ggvis, and plotly visualization packages, along with an introduction to creating maps and customizing them, before progressively taking you through various ggplot2 extensions, such as ggforce, ggrepel, and gganimate. Using real-world datasets, you will analyze and visualize your data as histograms, bar graphs, and scatterplots, and customize your plots with various themes and coloring options. The book also covers advanced visualization aspects such as creating interactive dashboards using Shiny. By the end of the book, you will be equipped with key techniques to create impressive data visualizations with professional efficiency and precision.

Publication date: November 2017
Publisher: Packt
Pages: 366
ISBN: 9781788398312
Download code from GitHub

Chapter 1. Installation and Introduction

The following recipes are covered in this chapter:

Installing and loading graphics packages
Using ggplot2, plotly, and ggvis
Making plots using primitives

Introduction

R is a free, open source language and environment for statistical computing and graphics. It has gained wide popularity among scientists from different fields, journalists, and private companies. There are various reasons for that; being open and free of charge may be a couple of them. Also, R requires minimal programming background and has a vibrant online community.

A bunch of useful graphical packages have come from this community. This chapter covers basic aspects of three of them: ggplot2, plotly, and ggvis. The first one (ggplot2) has been around for a long time, is very mature, and is very useful for building non-interactive graphics.

Both plotly and ggvis are much younger packages that can build interactive plots. Both are compatible with shiny and are well suited to web applications. Beginning with installation and loading, this chapter walks through the basic framework of all three packages, while demonstrating how to use ggplot2 primitives.

Installing and loading graphics packages

Before starting, there are some habits you may want to cultivate in order to keep improving your R skills. First of all, whenever you program, there may be some challenges to face. Usually those are tackled either by out-thinking the problem or by doing some research. Keep a record of problems and their solutions, whether for the times you face them again later or simply for study.

Note

Speaking for myself, making a library-like folder and gathering some commented examples of problems and resolutions was, and still is, of great help. Naming files properly and making good use of comments (# starts a comment in R) makes reviewing much easier.

R Markdown documents are pretty useful if you want to keep track of your own development and optionally publish it for others to see. Publishing the learning process is a good way to self-promote. Also, keep in mind that R is a programming language, and programming languages can often solve a problem correctly in more than one way; be open-minded and seek different solutions.

First things first, in order to make good use of a package, you need to install the package and know how to call a package's function.

Note

If your R Session is running for a long time, there is a good chance that a bunch of packages are already loaded. Before installing or updating a package it's a good practice to restart R so that the installation won't mess with related loaded packages.

How to do it...

Run the following code to install the graphics packages properly:

> install.packages(c('devtools','plotly','ggvis'))
> devtools::install_github('hadley/ggplot2')

How it works...

Most of the book covers three graphics packages: ggplot2, plotly, and ggvis. In order to install a new package, you can type the function install.packages() into the console. That function works for packages available at CRAN-like repositories and for local files. In order to install packages from local files, you need to name more than just the first argument. Entering ?install.packages into the RStudio console leads you to the function documentation in the Help tab.

Moments after running the recipe, all the packages covered in this chapter (devtools included) should be properly installed. Check the Packages tab in your RStudio application (speed up the search by typing into the search box); if everything went fine, these four should be shown under User Library. The following image shows how it might look:

Figure 1.1 - RStudio package window (bottom right corner).

If it fails, you may want to check the spelling and the internet connection. This function also prints output with warnings, progress reports, and results. Look for a message similar to package '<Package Name>' successfully unpacked and MD5 sums checked to make sure that all went fine. Checking the output is good practice in order to know whether the plan worked. It also gives good clues for troubleshooting.

You may want to try installing a non-existent package (be creative here) and a package that is already installed and see what happens. Sometimes incompatibilities prevent proper download and installation. For example, missing Java or the wrong architecture of Java may prevent you from installing the rJava package.

Note that a package's name must be given as a string in order to work (remember to use ' '). It's also important to check the spelling. The function (both the call and its arguments) is case sensitive; if you get even one letter or case wrong, the desired package will not be found. Also note that the package names were combined using the c() function, which creates a vector (try ?c in the console).

Note

The ? sign is actually a function that comes with a base package called utils. Typing ?<function name> leads you to the documentation whenever there is one to display. All functions coming from CRAN packages and base R, and maybe the majority of those from GitHub, have related documentation files; yet, if the function is not from base R, do not forget to have the respective package loaded. Alternatively, you can make calls like this: ?<package name>::<function name>.
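To see that ? behaves like any other function, a small sketch follows. It assumes nothing beyond base R; the help calls are wrapped in interactive() only so that the snippet also runs as a script, and the variable name question_is_function is made up for illustration.

```r
# `?` is an ordinary function exported by the utils package.
question_is_function <- is.function(utils::`?`)
question_is_function

# In an interactive session, these calls open the same help page.
if (interactive()) {
  ?install.packages
  help("install.packages")
  ?utils::install.packages
}
```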

A vector of strings was given as the first argument of the install.packages() function. That way, multiple packages can be downloaded and installed in a single call. The same function may install not only the packages asked for, but also the packages each of them relies on.

Note

Once the packages are installed, you have a bunch of new functions at your disposal. In order to get to know these functions, you can look up the packages' documentation online. Usually, the documentation can be found at repositories (CRAN, GitHub, and so on).

Now with a bunch of new functions at hand, the next step is to call a function from a specific package. There are several ways of doing that. One possible way is typing <package name>::<package function>. The latest code block did that when it called install_github(), a function coming from the devtools package: devtools::install_github().
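The same syntax works for any installed package. As a minimal sketch, the following calls functions from the utils and stats packages, which ship with R, on the cars data set from the datasets package; the object names first_rows and middle_speed are made up for illustration.

```r
# Call functions through their namespace, without relying on library().
first_rows <- utils::head(datasets::cars, 3)
middle_speed <- stats::median(datasets::cars$speed)

first_rows
middle_speed
```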

There are pros and cons to calling a function this way. As for the pros, you mostly avoid any name conflict that could possibly happen between packages. Other than that, you also avoid attaching the whole package when you only need to call a single function. Thus, calling a function this way may be useful on two occasions:

Name conflict is expected
Only a few functions from that package are needed, and only a few times

Otherwise, if a package is required many times, typing <package name>:: before every function is counterproductive. It's possible to load and attach the whole package at once. In the RStudio interface, right below the pane that shows environment objects, there is a pane with a Packages tab. In the Packages tab, it's possible to check a box in order to load a package and uncheck it to detach the package.

Try to detach ggplot2 by unchecking the box; keep an eye on that box. You can also load packages using functions. The require() and library() functions are meant for this task. Unlike install.packages(), neither needs ' ' around the package name in order to work, but if you give the package name as a string it still works. Note that both functions can only load one package at a time.

Although require() and library() work in a very similar way, they do not work exactly the same. If require() fails, it throws a warning; library(), on the other hand, throws an error. There is more: require() returns a logical value, TRUE when the load succeeds and FALSE when it fails, while library() returns no such flag (it invisibly returns the names of the attached packages).
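The contrast is easy to observe with a package that does not exist. The sketch below assumes that no package named notARealPackage123 (a made-up name) is installed; character.only = TRUE is used so the name can be passed as a stored string.

```r
missing_pkg <- "notARealPackage123"  # hypothetical name, assumed not installed

# require() warns and returns FALSE when the package cannot be loaded.
loaded <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() stops with an error instead; tryCatch() turns that into a flag.
library_failed <- tryCatch({
  library(missing_pkg, character.only = TRUE)
  FALSE
}, error = function(e) TRUE)

loaded
library_failed
```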

For common loading procedures, that is not a difference that needs to be taken into account, but if you want to create a function or loop that depends on loading a package and checking whether it succeeded, you may find it easier to do so using require(). Using the logical operator & (and), it's possible to load all three packages at once and store the result in a single variable. Calling this variable will give TRUE if all of them succeed and FALSE if a single one fails. This is done as follows:

> lcheck <- require(ggplot2) & require(plotly) & require(ggvis)
> lcheck

Note

lcheck won't tell you which and how many packages failed. Try assigning c(require(ggplot2), require(plotly), require(ggvis)) instead. Each element returning FALSE marks a package that is giving you trouble; this means better chances at troubleshooting.
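A related sketch returns a named result, so the failing packages can be read off directly. Note that it swaps in requireNamespace(), which checks that a package can be loaded without attaching it, so it is not a drop-in replacement for require(). The package list is illustrative, and notARealPackage123 is a made-up name assumed not to be installed.

```r
pkgs <- c("stats", "utils", "notARealPackage123")

# One named TRUE/FALSE per package; quietly = TRUE suppresses the messages.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that could not be loaded.
failed <- names(available)[!available]

available
failed
```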

By now, you should be able to install R packages (from CRAN, Git repositories, or local files), load them, and call functions from a specific package. Now that you are familiar with R package installation and loading procedures, the next section gives an introduction to the ggplot2 package framework.

There's more

Installation is also possible via RStudio features, which may seem more user friendly for newcomers. Open RStudio, go to Tools > Install Packages..., type the packages' names (separate them with a space or comma), and hit Install. RStudio fills in the install.packages() function and shows it in your console.

This is most useful when you are not absolutely sure about the package name, but have a good clue. An automatic suggestion feature helps you figure out exactly what the package name is. You can also install packages from local files by using this feature. Look for an option called Install from and switch it to Package Archive File instead of Repository.

RStudio also gives you a Check for Package Updates... option right below Install Packages... Hit it once in a while to make sure your packages are up to date. Along with the packages to be updated, it also shows what is new about them.

Using ggplot2, plotly, and ggvis

ggplot2, ggvis, and plotly have proven to be very useful graphical packages in the R universe. Each of them has gained a respectable amount of popularity among R users, being known for the several graphical tasks each of them can handle in a very elegant manner.

The purpose of this section is to give a brief introduction to the general framework of ggplot2 via some basic examples, and to show how to tackle similar tasks using ggvis and plotly. Along the way, some pros and cons of each package will be highlighted.

Note

Whenever you need to choose between some packages (and base R), it's important to weigh the tasks each one was designed to handle, the amount of work it will require for you to achieve your goal (learning time included), and the time you actually have. It's also good to consider gains of scale in future uses. For example, mastering ggplot2 may not seem a smart choice for a one-time task but might pay off if you're expecting lots of graphical challenges in the future.

Keep in mind that all three packages are suitable for a wide range of tasks. There are some jobs that a specific package is more suitable for and even some tasks that can be considered almost impracticable for the others. This point will become clearer as the book goes on.

Getting ready

The only requirement of this section is to have the ggplot2, ggvis, and plotly packages properly installed. Go back to the Installing and loading graphics packages recipe if that is not the case. Once the installation is checked, it's time to get to know the ggplot2 framework.

How to do it...

First things first: in order to plot using ggplot2, data must come from a data frame object. Data can come from more than one data frame, but it's mandatory to have it arranged into objects of the data frame class.
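As a minimal sketch of what being arranged into a data frame means, the following builds one from two plain vectors using base R's data.frame(). The vector names and values (speed_kmh, road) are made up for illustration and are not part of the recipe.

```r
# Two plain vectors of the same length (hypothetical values).
speed_kmh <- c(30, 50, 80)
road <- c("urban", "suburban", "highway")

# data.frame() binds them into columns of a single data frame.
trips <- data.frame(speed_kmh = speed_kmh, road = road)

class(trips)
trips
```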

We took the cars data set to fit this first graphic. It's good to actually get to know the data before plotting, so let's do it using the ?, class(), and head() functions:

> ?cars
> class(cars)
> head(cars)

Plots coming from ggplot2 can be stored in objects. They belong to two classes at the same time, gg and ggplot:

> library(ggplot2)
> plot1 <- ggplot(cars, aes(x = speed,y = dist))

Note

Objects created by the ggplot() function belong to the classes gg and ggplot at the same time. That said, you can refer to a plot crafted by ggplot2 as a ggplot.

The three packages work more or less in a layered way. To add what we call layers to a ggplot, we can use the + operator:

> plot1 + geom_point()

Note

The + operator is in reality a function.

The result is shown in the following figure:

Figure 1.2 - Simple ggplot2 scatterplot.

Once you learn this framework, getting to know how ggvis works becomes much easier, and vice-versa. A similar graphic can be crafted with the following code:

> library(ggvis)
> ggvis(data = cars, x = ~speed, y = ~dist) %>% layer_points()

plotly would feel a little bit different, but it's not difficult at all to grasp how it works:

> library(plotly)
> plot_ly(data = cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers')

Let's give these nuts and bolts some explanations.

How it works...

In order to have a brief data introduction, step 1 starts by calling ?cars. This is a very useful way to get to know the variables and background of almost every data set coming from a package. Since ggplot2 requires data to come from data frames, the class() function checks whether that is the case; the answer is affirmative. At the end of this step, the head() function shows the first six observations.
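The checks from step 1 can also be stored and inspected programmatically. This is only a sketch using base R; the object names data_class and first_six are made up for illustration.

```r
# cars ships with the datasets package that comes with R.
data_class <- class(datasets::cars)
first_six <- head(datasets::cars)

data_class        # confirms cars is a data frame
names(first_six)  # the two variables: speed and dist
nrow(first_six)   # head() returns six rows by default
```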

Moving on to step 2, after loading ggplot2, it demonstrates how to store the basic coordinate mapping and aesthetics into an object called plot1 (try it on the class() function). In order to set the basics, it uses a function (ggplot()) that initializes every single ggplot.
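Trying plot1 with class() could look like the sketch below. The exact class vector printed may differ between ggplot2 versions, so the sketch relies on inherits(), which only asks whether a given class is present; the names is_gg and is_ggplot are made up for illustration.

```r
library(ggplot2)

plot1 <- ggplot(cars, aes(x = speed, y = dist))

# Print the full class vector (it may vary across ggplot2 versions).
class(plot1)

# Check for the two classes mentioned in the recipe.
is_gg <- inherits(plot1, "gg")
is_ggplot <- inherits(plot1, "ggplot")
```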

Note

Storing a plot coming from the ggplot2, ggvis, or plotly package in an object is optional, though it is a very useful way to proceed.

To properly set up ggplot(), start by declaring the data set using the data argument. After that, some basic aesthetics and coordinates are assigned. Different figures can ask for and work with different aesthetics; in the majority of cases, those are named inside the aes() function.

Note

As the book goes on, you're going to get used to the ways aesthetics can be declared, inside or outside the aes() function. For now, let's acknowledge that inside aes() it's possible to call data frame variables by name and that they may be displayed in legends.

Checking ?aes shows "..." as an argument, popularly known as three-dots but technically named ellipsis. It allows the user to pass an arbitrary number and variety of arguments. Since ggplot2 relies on lazy evaluation (arguments are only evaluated as they are requested), you could make up arguments and pass them into the aes() function with little or no trouble to the function. Consider the following:

> plot1 <- ggplot(cars, aes(x = speed,y = dist, gorillaTroubleShooter = T, sight = 'Legolas'))

It would work as well as the earlier version. Just don't forget to name the arguments and you've got yourself a good way to create some Easter eggs in your code (also a good way to confuse unaware developers). Both aes() and ggplot() play core roles in building graphics within this package.

Up to step 2, only the coordinate mapping was set in the object named plot1; calling it alone displays an empty graphic. Step 3 uses + to add a layer; the layer called (geom_point()) took care of attaching a geometry to the graphic. Besides the plus sign, ggplots are usually constructed with two families of functions (layers): geom_* and stat_*. While the first family comes with a fixed geometry and a default statistical transformation, the second one comes with a fixed statistical transformation and a default geometry (this is the grammar of graphics for real); the defaults can be tweaked.

Note

plot1 + stat_identity(geom = 'point') works just the same as step 3. The geom argument is set to 'point' by default for stat_identity(), so it's fine to skip it. The reason I declared it was to reinforce that if you call for a statistical transformation you can pick the geometry, and it goes the other way round too (if you call for a geometry you can change the statistical transformation).
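One way to check this equivalence, sketched here under the assumption that ggplot2 is installed, is to compare the data each layer ends up drawing. layer_data() is a ggplot2 helper that returns the computed data of a layer; the names by_geom, by_stat, and same_points are made up for illustration.

```r
library(ggplot2)

plot1 <- ggplot(cars, aes(x = speed, y = dist))

# Computed layer data when starting from the geometry...
by_geom <- layer_data(plot1 + geom_point())

# ...and when starting from the statistical transformation.
by_stat <- layer_data(plot1 + stat_identity(geom = "point"))

# Both routes should place the points at the same coordinates.
same_points <- isTRUE(all.equal(by_geom[c("x", "y")], by_stat[c("x", "y")]))
same_points
```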

Behind the scenes, geom_point() called the layer() function, which set a couple of arguments that culminated in the creation of a scatterplot. One may want to modify the axis labels and add a regression line. This can be done by simply adding more layers to the plot using the plus sign. One can stack as many layers as desired, as shown next:

> plot1 + geom_point() +
    labs(x = "Speed (mph)", y = "Distance (ft)") +
    geom_smooth(method = "lm", se = FALSE) +
    scale_y_continuous(breaks = seq(0, 125, 25))

The result is shown in Figure 1.3:

Figure 1.3 - Adding up several layers to a ggplot.

Combining ggplot2's plus operator (which is actually a function) with layer functions allows the user to make plots in a layered, iterative way. It splits the construction of complex graphics into several simple steps. It's also very intuitive and does not get any harder as you practice.

Yet, there are limitations. The difficulty of making interactive graphics by itself may be one. These tasks, in the majority of cases, are very well handled by both ggvis and plotly as standalone packages. This leads us to steps 4 and 5.
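A quick way to convince yourself that + is a function is to call it by name. The sketch below uses only base R with made-up object names (three, shifted); ggplot2 supplies its own method for the same operator, which is what makes plot1 + geom_point() possible.

```r
# Calling the operator like a regular function, using backticks.
three <- `+`(1, 2)

# Being a function, it can be passed to other functions as well.
shifted <- sapply(1:3, `+`, 10)

three
shifted
```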

Note

Calling plotly::ggplotly() after bringing a ggplot up will coerce it into an interactive plot. It may fail sometimes. Do not forget to have plotly installed.

Step 4 loads the ggvis package using library() and then gives birth to an interactive plot. It holds many similarities with ggplot2. The ggvis() function handles the basic coordinate mapping, while the pipe operator (%>%) is used to add a layer created by the layer_points() function. Remember: pipe operator, not plus sign.

Note

ggvis distinguishes between arguments declared using = (always scaled) and := (never scaled). Also, ~ must come before the variable names.

From ggplot2 to ggvis, function names change and so does the operator used to add layers, but essentially the underlying logic stays the same. Layers coming from ggvis have several counterparts in ggplot2; refer to the See also section to track some. In comparison with ggplot2, ggvis is much younger and some utilities may be yet to come; also, data doesn't need to come from a data frame object.

Step 5 draws an interactive plotly graph. A single function (plot_ly()) takes care of coordinate mapping and geometry. It can be designed in a slightly more layered way using the add_trace() function, but there is no real need for that when the plot is this simple. Instead of having many functions for statistical transformations and geometries, those are declared by arguments inside the main function.

These three packages, ggplot2, ggvis, and plotly, are well coded and powerful graphics packages. Right before picking one of them to handle a task, always consider points like:

What the package is able to do
Time needed to master the skill set required
Time required to handle the task
Amount of time available
Time to be saved later by what you learned

Base R is also a feasible possibility. Whenever you face new challenges, it is a good thing to think through these points.

There's more

To have data coming solely from data frames is a strong restriction, but it does obligate the user to be explicit about the data and also draw a very clear line on what is ggplot2's concern (data visualization) and what is not (model visualization). In order to avoid headaches that come from downloading spreadsheets, setting up working directories, and loading data from files, we're taking an alternative way: getting data from packages instead.

Note

data.frame() may be the most convenient function to coerce vectors into data frames in R.

By doing this, we ensure that readers only need the R console to reproduce the recipes; we want nothing to do with web browsers (we're too cool for school, school meaning web browsers). We shall follow this approach to the end of the book. This recipe relies on the datasets base package to do so. ggplot2 has some data frames of its own.

Note

Enter library(help = 'datasets') to get general information on the other data sets.

It's also important to point out that the gg in ggplot2 and ggvis refers to the Grammar of Graphics. That's a very important and inspiring theory that has influenced ggplot2, ggvis, and plotly. The layered, iterative way that these packages handle plots might come from the Grammar of Graphics and makes building graphics much easier and more reasonable. Learning this theory may give you a head start in learning these packages, while learning these packages may give you a head start when it comes to learning the Grammar of Graphics.

Making plots using primitives

Previously, a brief introduction to the frameworks of the ggplot2, ggvis, and plotly packages was given. Next, we get started with ggplot2 graphical primitives, using them in a series of recipes with related examples made with ggvis and plotly.

There are a total of eight graphical primitives in ggplot2, one of them already covered in this chapter (geom_point()). It's important to know the primitives well: what they do and when to use them. As fundamental building blocks, they play an essential role in the drawing process. A series of tasks can be handled by relying on primitives when there is no dedicated function for them; sometimes, even if there is one, primitives can handle the task much better.

Dot plots are a good example. They have a dedicated geom_dotplot() function, but sometimes it is much easier to draw dot plots using geom_point(). Now, let's see how ggplot2 can brew figures using primitives and create related ones using ggvis and plotly.

How to do it...

After loading the package, primitives geom_point() and geom_path() can be stacked in order to plot lines with markers:

> library(ggplot2)
> plot1 <- ggplot( cars, aes(x = speed, y = dist))
> plot1 + geom_point() + geom_path()

The resulting output is shown in the following figure:

Figure 1.4 - Lines with markers plot made by ggplot2's primitives.

The same task can be accomplished with the ggvis package, relying on the following code:

> library(ggvis)
> ggvis(cars, x = ~speed, y = ~dist) %>% layer_points() %>% layer_paths()

Figure 1.5 displays a representation of the resulting graphic (only the default theme will look different):

Figure 1.5 - Similar lines and markers plot done by ggvis.

Without using the translation function (ggplotly()) from plotly package, it's also possible to code a similar graphic from scratch relying only on plotly:

> library(plotly)
> plot_ly(cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'lines+markers')

Figure 1.6 shows a snapshot of the graphic brewed by the preceding code:

Figure 1.6 - Similar lines and markers plot done by plotly.

Let's see how each of these works.

How it works...

The complete list of ggplot2's primitives is: geom_blank(), geom_path(), geom_ribbon(), geom_polygon(), geom_segment(), geom_rect(), geom_text(), and geom_point(). Every primitive starts with geom_, but not every geom_* function is a primitive. In fact, most of them are not.

geom_blank() is arguably the simplest of the primitives. Calling it right after ggplot() displays a blank plot with the axes already adjusted to the data. It's mostly used to check the axis limits implied by the data itself. You may find it useful for other tasks as well.
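As a minimal sketch of that behaviour, assuming the built-in cars data set used earlier in this recipe (the object name blank_plot is only illustrative), the following draws nothing but the scaled axes:

```r
library(ggplot2)

# Map the aesthetics, then add geom_blank(): no marks are drawn,
# but the scales are still trained on the mapped variables.
blank_plot <- ggplot(cars, aes(x = speed, y = dist)) + geom_blank()
blank_plot

# The axes span these data ranges (plus ggplot2's default padding)
range(cars$speed)
range(cars$dist)
```

Comparing the printed ranges with the axes of the empty plot is a quick way to check the limits before adding any real layer.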

Other primitives work in a similar way. That is the case for geom_path(), geom_ribbon(), and geom_polygon(). The first draws lines between coordinates. The second resembles the first but draws a band rather than a thin line, which requires two additional aes() arguments (ymin and ymax). The last draws filled polygons.

Given only starting and ending points, geom_segment() adds a line segment. geom_rect() adds a rectangle to the plot, defined by four boundaries (xmin, xmax, ymin, and ymax). geom_text() adds text at the given coordinates. Some graphics display text for each observation instead of points, which is also a good way to show additional information.
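The aesthetics these primitives require can be sketched as follows. This is only an illustration on the cars data: the width of the ribbon band and the coordinates of the segment and the rectangle are arbitrary values, and the object names are assumptions rather than anything defined by the recipe.

```r
library(ggplot2)

base_plot <- ggplot(cars, aes(x = speed, y = dist))

# geom_ribbon() needs ymin and ymax; the +/- 10 band is an arbitrary choice
ribbon_plot <- base_plot +
  geom_ribbon(aes(ymin = dist - 10, ymax = dist + 10), alpha = 0.3) +
  geom_path()

# One-row data frames (illustrative values) for a segment and a rectangle
one_segment <- data.frame(x = 5, y = 0, xend = 25, yend = 100)
one_rect <- data.frame(xmin = 10, xmax = 15, ymin = 20, ymax = 60)

# inherit.aes = FALSE stops these layers from looking for speed and dist
shapes_plot <- base_plot +
  geom_point() +
  geom_segment(data = one_segment,
               aes(x = x, y = y, xend = xend, yend = yend),
               inherit.aes = FALSE) +
  geom_rect(data = one_rect,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            inherit.aes = FALSE, alpha = 0.2)

ribbon_plot
shapes_plot
```

Giving the segment and the rectangle their own small data frames keeps each shape drawn once instead of once per row of cars.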

The remaining primitive is geom_point(). It's the only primitive called directly so far, and it plots points at the given coordinates. Two important points must be highlighted here. First, getting to know the primitives might give you an idea of which functions you will need the most and which the least, but they are not all that ggplot2 is capable of. Primitives are simply the building blocks used by other functions.

Second, as the previous recipe stated, you can stack as many layers as you like. That is just as true for primitive functions, but it's good to know how they interact with one another. For example, calling geom_blank() after geom_point() does not cover the points with a blank space.

After loading ggplot2 and setting the base aes(), step 1 creates a simple plot with lines and markers. While geom_point() displays the markers, geom_path() draws the lines between them. Note that geom_path() draws lines following the order of the rows in the data set, so we can call this function order-sensitive.
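The geom_blank() case can be checked with a short sketch. It assumes the cars data and an illustrative object name, points_first:

```r
library(ggplot2)

# geom_blank() is added after geom_point(); it draws nothing on top
points_first <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_blank()

# Both layers are stored, in the order they were added
length(points_first$layers)

# The points remain visible when the plot is printed
points_first
```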

Note

In many situations, reordering the data will improve the visualization. This may be the case for dot, box, violin, and bar plots, among others. If you want paths to be ordered along the x variable, geom_line() does that by itself, though it is not a primitive.

In this particular plot, the lines carry no meaning; they actually mislead. Lines are better suited to indicating some sort of order within the data, such as chronological order. The only reason they were used here was to demonstrate how primitives can be stacked to produce a visualization different from the one made before.
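The difference between geom_path() and geom_line() becomes visible once the rows are shuffled. The following is a rough sketch on the cars data; the seed and the object names are arbitrary choices made for illustration:

```r
library(ggplot2)

# Shuffle the rows so that row order no longer follows speed
set.seed(1)
shuffled <- cars[sample(nrow(cars)), ]

# geom_path() connects observations in row order: a tangled zigzag
path_plot <- ggplot(shuffled, aes(x = speed, y = dist)) + geom_path()

# geom_line() connects observations ordered along x
line_plot <- ggplot(shuffled, aes(x = speed, y = dist)) + geom_line()

path_plot
line_plot
```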

Step 2 draws a plot similar to the one from step 1, but using ggvis instead. library() loads the package, while the ggvis() function maps the basic aesthetics. The next function, layer_points(), sets up the points that work as our markers, and layer_paths() draws the lines between them.

An earlier section argued that ggvis is very similar to ggplot2 in how graphics are coded, and this section demonstrates it. First, the function receives the data set, and the variables are given as arguments. Pipe operators (%>%) are used instead of the plus sign to stack up the layers, and layer_* functions work much like geom_* functions do.

In step 3, a similar plotly graphic is crafted. The same function that sets the basic aesthetic mapping (plot_ly()) also handles the geometries. The type and mode arguments set the geometries; both take strings, and the two are meant to work together.

Setting type = 'scatter' enables the lines and markers modes. Each type has its own set of modes; consult the reference manual to see them all. We wanted markers and lines at the same time, so we built a string containing those two elements separated by a plus sign ('lines+markers') and assigned it to the mode argument.

Note

mode = 'lines+markers' works just as well as mode = 'markers+lines'. Modes can be combined, and the order does not matter.

Figures 1.4 to 1.6 look much like a time series, but they are not, and that may give the wrong intuition. There are observations for two variables, and neither one is time. Notice how some speed values have several different stopping distances. Note also that the cars data frame is ordered first by speed and then by distance; paths follow the row order of the data, while for the point geometry the order doesn't really matter.

Adding the path geometry was misleading; geom_point() would have been enough. The goal here was to demonstrate how primitives interact, not to produce a meaningful figure. Next, let's build fictional data and draw a graphic that tells the story the right way. Picture a small classroom with only 7 students. The teacher builds a data frame with study hours and grades for each student.
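Both observations about the cars data can be verified with base R alone. This is a small sketch; the names obs_per_speed and rows_sorted are introduced here only for illustration:

```r
# Number of observations recorded at each speed value
obs_per_speed <- table(cars$speed)

# Speeds that appear more than once, with their counts
obs_per_speed[obs_per_speed > 1]

# TRUE when the rows are already sorted by speed and then by dist
rows_sorted <- identical(order(cars$speed, cars$dist), seq_len(nrow(cars)))
rows_sorted
```

Repeated speed values are what make the path double back vertically, which is part of why the lines suggest a pattern that is not there.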

Data can be created like this:

> allnames <- c('Phill','Ross','Kate','Patrice','Peter','James','Monica')
> classr <- data.frame(names = allnames)
> classr$hours <- c(4, 16, 8, 11, 6, 14, 8)
> classr$grades <- c(4, 9.5, 6, 4, 6, 9, 7.5)

The geom_text() primitive can be used to draw a meaningful graphic:

> library(ggplot2)
> plot2 <- ggplot( classr, aes(x = hours, y = grades))
> plot2 + geom_text( aes( label = names))

The result is shown in figure 1.7:

Figure 1.7 - Plotting grades and hours as texts using ggplot2's primitive.

The related ggvis and plotly code is shown next:

> library(ggvis)
> ggvis(classr, x = ~hours, y = ~grades, text := ~names) %>% layer_text()
> library(plotly)
> plot_ly(classr, x = ~hours, y = ~grades, type = 'scatter', mode = 'text', text = ~names)

This last brief example illustrates how to brew graphics using only primitives in a more meaningful way. It's very important to keep this in mind: the best graphic is the one that tells the right story objectively, not the one with the most layers.

There's more...

Did you know that both ggvis and plotly can guess which geometry you are looking for? Based on the basic aesthetics defined, they make a guess and adopt a certain geometry. They look at how many variables were supplied and of what kind (discrete or continuous), and for some combinations they are able to make a guess. For the most recent example, they would have guessed the points geometry.

Figures produced by both packages are displayed in the Viewer tab if you're using RStudio (they are interactive; try hovering the mouse over a plotly figure). Figures can be exported as web pages. They can also be exported as PNG, JPEG, or BMP images, which loses the interactivity.
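To watch the guessing happen, leave out the layer and trace arguments. The sketch below uses the cars data, and the object names are illustrative; both packages are expected to report their guess through a console message when the figure is printed:

```r
library(ggvis)
library(plotly)

# No layer_*() call: ggvis picks a layer from the mapped variables
guessed_ggvis <- ggvis(cars, x = ~speed, y = ~dist)
guessed_ggvis

# No type or mode arguments: plotly picks a trace type on its own
guessed_plotly <- plot_ly(cars, x = ~speed, y = ~dist)
guessed_plotly
```

Relying on the guess is handy for quick exploration, but stating the geometry explicitly keeps the code readable.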
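One possible way to export a plotly figure as a web page is saveWidget() from the htmlwidgets package, which plotly builds on. This is a sketch under assumptions: the file name cars_plot.html and the use of a temporary directory are hypothetical choices, not something prescribed by the recipe.

```r
library(plotly)
library(htmlwidgets)

interactive_plot <- plot_ly(cars, x = ~speed, y = ~dist,
                            type = 'scatter', mode = 'markers')

# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "cars_plot.html")

# selfcontained = FALSE writes the JavaScript dependencies to a side
# folder next to the HTML file instead of bundling them into it
saveWidget(interactive_plot, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```

The saved page keeps the interactivity, unlike a static image export.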

This recipe aimed to demonstrate how to construct plots using ggplot2 primitives and how to build similar graphs using other packages. A question you should always ask yourself is whether the geometry adopted suits the data used; in other words, whether the graphic tells the story you want to tell.

The recipe's goal was to introduce you to the graphical primitives of ggplot2 and to draw simple graphics using only primitives. An additional goal was to draw related graphics using the ggvis and plotly packages.

The next chapters dive deeper; each one tackles some families of graphics, highlighting the nuts and bolts of building high-quality plots. As the book advances, so does the complexity involved. At some point, we are going to be plotting interactive globes and 3D surfaces and developing web applications. I find it pretty cool, and I hope you enjoy it.

Chapter 2, Plotting Two Continuous Variables, takes care of scatterplots. It's a very popular and very useful kind of plot, but it has a big problem: over-plotting. The next chapter will not only teach you how to craft scatterplots, but also how to deal with that problem and how to improve scatterplots by adding marginal plots. Let it rip!

About the Author

Vitor Bianchi Lanzetta

Vitor Bianchi Lanzetta (@vitorlanzetta) has a master's degree in Applied Economics (University of São Paulo, USP) and works as a data scientist in a tech start-up named RedFox Digital Solutions. He has also authored a book called R Data Visualization Recipes. The things he enjoys the most are statistics, economics, and sports of all kinds (electronics included). His blog, made in partnership with Ricardo Anjoleto Farias (@R_A_Farias), can be found at ArcadeData dot org; they kindly call it R-Cade Data.

High quality and high availability

Several good ideas, with R code that is clear and easy to apply.

I have recently started teaching Biopython, and a book with a guide to visualizing biological data is very useful to me.
