
You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author (1)
Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a Data Scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on Data Science concepts at an online school. He has a background in Marketing, is certified as a Data Scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Enhanced Visualizations with ggplot2

Data visualization is an art. Choosing the right graphic type, the right x and y variables, and the right colors, shapes, and titles can be challenging. We must be careful not to make our graphic too crowded with information or too lacking in information.

Sometimes, it is necessary to add other resources to the plot that will help us to deliver the right message, or at least to make it easier for the audience to understand. This is when additions such as facet grids, interactivity, and maps can be helpful.

In this chapter, we will go over some of these additional elements that can improve the readability or the interpretability of a graphic. We will start with facet grids, one of the grammatical elements that we haven’t covered yet; then, we will move on to 3D plots and when to use them. After that, we will learn about map plots, a valuable tool to have these days, as the world is more connected every day. And finally, there...

Technical requirements

We will use the diamonds dataset, which is included in the ggplot2 library.

All the code can be found in the book’s GitHub repository: https://github.com/PacktPublishing/Data-Wrangling-with-R/tree/main/Part3/Chapter11.

The libraries in this chapter are as follows:

library(tidyverse)
library(lubridate)
library(datasets)
library(patchwork)
library(plotly)
data("diamonds")

Off we go.

Facet grids

Facet grids create a figure in the form of a matrix of rows and columns to plot multiple graphics side by side. Each subplot shows a subset of the data defined by one or more variables, which makes it easier to inspect the relationship between variables within each group separately. In short, a facet grid is a set of small plots, each representing a subgroup of the data.

We can see what a facet grid looks like using the diamonds dataset, which is built into ggplot2 (type ?diamonds into R’s console for the documentation). This data has the cuts, dimensions, colors, prices, and other attributes of almost 54,000 diamonds. If we want to see a scatterplot of the prices by carat, the graphic will look busy, as we can see in Figure 11.1. Notice that it is difficult to see the trends and relationships for each cut type, such as Fair or Good, because their points are hidden under other points. What we see is the general trend and relationship for the entire dataset.

Figure 11.1 – Scatterplot of...
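As a rough illustration of the idea described above (not necessarily the book's own code), the sketch below draws the crowded price-by-carat scatterplot and then splits it into one panel per cut. The choice of facet_grid() with cols = vars(cut) and the alpha value are assumptions made for this example; facet_wrap(~ cut) would be a reasonable alternative.

```r
library(ggplot2)

# Crowded scatterplot: every cut is drawn in the same panel
p_all <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)

# Same plot split into one column per level of `cut`
# (assumption: the layout is chosen here only for illustration)
p_facets <- p_all +
  facet_grid(cols = vars(cut))

print(p_facets)
```

With the panels side by side, the trend for each cut type can be read separately instead of being hidden under the other points.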

Map plots

We live in the information era. Enormous amounts of data are created each day, from all parts of the world. Part of that data has location information attached to it (latitude and longitude), enabling the data scientists who have access to it to create visualizations using maps. Analysis of store sales by city, state tax collection, tourism destinations, and internet access by country are only a few examples of a large spectrum of possibilities. That is enough reason to learn how to use ggplot2 to create plots using maps.

A side note before we jump into the action: map plotting is a vast subject in its own right. It belongs to the domain of spatial data analysis, which is outside the scope of this book. Here, the intention is only to show the capabilities of ggplot2. To learn about map plotting in more depth, see the material listed in the Further reading section.

To plot a map, the geometry used is geom_map(). But before we can plot anything, ggplot2 requires us to load...
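The sketch below shows one possible way to use geom_map(); it is not necessarily the approach the chapter goes on to describe. It assumes the maps package is installed (map_data() depends on it) and uses the built-in USArrests dataset purely as illustrative data; the names us_map and arrests are made up for this example.

```r
library(ggplot2)

# Polygon coordinates for US states (requires the 'maps' package)
us_map <- map_data("state")

# Illustrative data: one row per state, with the state name in lowercase
# so that it matches the `region` column of us_map
arrests <- data.frame(state = tolower(rownames(USArrests)), USArrests)

p_map <- ggplot(arrests, aes(map_id = state)) +
  geom_map(aes(fill = Murder), map = us_map) +      # fill each state polygon
  expand_limits(x = us_map$long, y = us_map$lat) +  # set the axes to the map extent
  coord_quickmap()                                  # keep a sensible aspect ratio

print(p_map)
```

geom_map() matches the map_id aesthetic against the region column of the map data, so states that are missing from the map data (here, Alaska and Hawaii) are simply not drawn.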

Time series plots

A time series is a sequence of data points ordered by time. The data points are measurements of a given variable taken at successive time intervals, such as hours, days, months, or any other time frame. We can visualize a time series using ggplot2 as long as the dataset contains a date or datetime variable. A line plot is usually the best way to visualize data organized by time. Let’s set a seed so that you can reproduce the same random numbers, create a sample data frame, and then see how to visualize a time series:

# Set seed to reproduce the same random numbers
set.seed(10)
# Create a dataset: one row per day, with a noisy, upward-trending measure
ts <- data.frame(
  date = seq(ymd('2022-01-01'), ymd('2022-06-30'), by = 'days'),
  measure = as.integer(runif(181, min = 600, max = 1000) + sort(rexp(181, 0.001)))
)

The preceding code creates a data.frame object in which we assign a sequence of dates to the name date, from January 1 to June 30, 2022,...
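To make the line-plot idea concrete, here is a minimal, self-contained sketch. It rebuilds a data frame like the one described above using base as.Date() instead of lubridate's ymd(), and names it ts_df (a name chosen for this example) to avoid masking the base ts() function. The monthly axis breaks are an assumption made for readability.

```r
library(ggplot2)

set.seed(10)

# 181 daily dates: January 1 to June 30, 2022
ts_df <- data.frame(
  date = seq(as.Date("2022-01-01"), as.Date("2022-06-30"), by = "day"),
  measure = as.integer(runif(181, min = 600, max = 1000) + sort(rexp(181, 0.001)))
)

# Line plot with time on the x axis
p_ts <- ggplot(ts_df, aes(x = date, y = measure)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "Date", y = "Measure")

print(p_ts)
```

Because the exponential component is sorted before being added, the series drifts upward over time while the uniform component adds day-to-day noise.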

3D plots

Three-dimensional plots are beautiful and often make a good impression on an audience, but the truth is that they are rarely the best type of graphic to use. To plot a 3D graphic on a 2D surface, such as a computer screen or paper, the third dimension has to simulate depth that does not exist. Plotting in 3D is not recommended in most cases; in general, a good old 2D plot will be the simplest and best option.

Sometimes, though, 3D plots can be useful. Surface graphics, which can represent the terrain of a given place, such as a mountain, are one interesting case.

3D graphics can be created using the plotly library in R (loaded with library(plotly)). Let’s create a random surface and plot it. The surface graphics require the input data to be a matrix, thus we are creating one and then using the plot_ly() function, passing the z= ~surface argument to it to indicate that we want a 3D graphic. Remember that x and y are...
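A minimal sketch of the surface plot described above might look like the following. The matrix size, the seed, and the use of runif() are assumptions made for this example, and add_surface() is added explicitly to request the surface trace.

```r
library(plotly)

set.seed(42)

# A random 20 x 20 matrix: each cell is the height (z) of the surface
surface <- matrix(runif(400, min = 0, max = 10), nrow = 20, ncol = 20)

# x and y default to the column and row indices of the matrix
fig <- plot_ly(z = ~surface) %>%
  add_surface()

fig
```

The resulting figure can be rotated and zoomed with the mouse, which helps compensate for the simulated depth of a 3D plot on a flat screen.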

Adding interactivity to graphics

Images are interpreted by our brains faster than words or numbers (https://tinyurl.com/nhtbw9jk). That makes graphics an interesting way to show data, as we have learned throughout this book. But there is still more enhancement to be done when working with data visualization, and one of these enhancements is interactivity.

The ggplot2 library creates static graphics. Hence, the plots will not show values at the tops of bars or names of points on a scatterplot, for example. If that is a requirement for a visualization, it must be added using an annotation or text. However, when you combine the graphic’s code with plotly, some interaction is added to the visualization, such as making values appear just by hovering over a data point or zooming in and out of the graphic.

To create an interactive scatterplot from the same code that generated Figure 11.1, we only have to wrap the entire ggplot code in the ggplotly() function. See the following...
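As a sketch of this wrapping step, the example below assumes that Figure 11.1 is a plain price-by-carat scatterplot of the diamonds data; the exact code behind the figure is not shown here. Sampling 2,000 rows is an extra assumption, made only to keep the interactive widget responsive.

```r
library(ggplot2)
library(plotly)

set.seed(1)

# Optional: work on a sample so the interactive plot stays light
diamonds_sample <- diamonds[sample(nrow(diamonds), 2000), ]

# Static ggplot2 scatterplot
p_static <- ggplot(diamonds_sample, aes(x = carat, y = price)) +
  geom_point(alpha = 0.4)

# Wrap the ggplot object in ggplotly() to add hover tooltips and zooming
p_interactive <- ggplotly(p_static)

p_interactive
```

Hovering over a point now shows its carat and price values, which the static version would need annotations to display.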

Summary

After reading this chapter, you should be able to make enhanced plots, such as facet grids, maps, and 3D plots.

We started by learning about facet grids, which are one of the grammatical elements of the grammar of graphics. With facet grids, a graphic can be divided into subplots, making the interpretation easier for the reader. The next topics were how to plot maps and time series in R using ggplot2. These are vast subjects that lie within geospatial data analysis and time series analysis in data science, so we just covered the basics, but that should be enough for you to create great visualizations.

3-dimensional plots are beautiful and impactful, no doubt. However, they are not well suited for big data or for visualizations where precision is a requirement. They are good, though, for plotting surfaces or viewing the separation of data points that is only visible with the addition of a third dimension.

Finally, we closed the chapter with a function that combines...

Exercises

  1. What is a facet grid and what is the function used to create it using ggplot2?
  2. What step is required before plotting a map with ggplot2?
  3. What is the geometry used to create maps with ggplot2?
  4. What library is used to create 3D plots?
  5. What characterizes a dataset as a time series?
  6. List two use cases where 3D plots can be recommended.
  7. What is the function from plotly for adding interactivity to ggplot2 graphics?

Further reading
