You're reading from Applied Data Visualization with R and ggplot2

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
ISBN-13: 9781789612158
Edition: 1st Edition

Author: Dr. Tania Moulik

Tania Moulik has a PhD in particle physics. She has worked at CERN, the European Organization for Nuclear Research, and on the Tevatron at Fermi National Accelerator Laboratory in IL, USA. She has years of programming experience in C++, Python, and R. She has also worked in the field of big data, with technologies such as grid computing. She has a passion for data analysis and would like to share it with others who want to delve into the world of data analytics. She especially likes R and ggplot2 as a powerful analytics package.

Chapter 2. Grammar of Graphics and Visual Components

In this chapter, we will explore the concept of the Grammar of Graphics in detail and use it to customize graphs to create better visualizations.

We need to customize graphs because default graphs may use fonts that are hard to read in a presentation or document, or scales that do not convey much information about the data. Sometimes, a company may require a uniform style for all its graphs to distinguish itself, in which case you would need to define and use the same style for every graph. We may also need to split data into subsets in order to understand it in greater detail. This chapter explores these aspects and explains how to change the default structure of a graph.

By the end of this chapter, you will be able to:

  • Apply the Grammar of Graphics techniques to layers, scales, and coordinates
  • Utilize faceting to make multiplots and divide data into subplots
  • Utilize colors in plots effectively
  • Modify the appearance...

More on the Grammar of Graphics


The Grammar of Graphics is the language used to describe the various components of a graphic that represent data in a visualization. In this topic, you will learn more about the Grammar of Graphics and use it to make plots. You will re-encounter some of the Grammar of Graphics terms used in the previous chapter.

We will now break down the Grammar of Graphics language, in order to understand the terms in greater detail. 

Layers

In ggplot2, every plot is built up from layers. Layers are made up of geometric objects (geoms), their statistical transformations (stats), and their thematic aspects. Hence, a plot is an object in its own right: it can be stored in a variable and built up layer by layer. Aesthetic mappings, defined with aes(), describe how variables are mapped to visual properties, or aesthetics. The following diagram depicts the use of the df data frame and the aes() function:
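To make the idea of layers concrete, here is a minimal sketch. It assumes ggplot2 is installed and uses R's built-in mtcars data set as a stand-in for gapminder; the columns wt and mpg belong to mtcars, not to the book's data. The plot is stored in a variable, and two layers are added, each with its own geom and stat.

```r
library(ggplot2)

# Hypothetical example on the built-in mtcars data set (a stand-in for gapminder).
# aes() maps the column wt to the x aesthetic and mpg to the y aesthetic.
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# The plot is an ordinary R object; each "+" adds one layer to it.
p <- p + geom_point()                            # layer 1: points, stat "identity"
p <- p + geom_smooth(method = "lm", se = FALSE)  # layer 2: fitted line, stat "smooth"

length(p$layers)  # 2
print(p)          # draw both layers
```

Because p is a normal object, you can keep adding layers, scales, or themes to it later without rebuilding the plot from scratch.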

Let's look at an example. We will use the gapminder dataset. You can see the available variables in the following snippet...

Facets


In data visualization, we sometimes need to compare different groups by looking at their data side by side. One way of doing this is to create a subplot for each group. Such plots are known as Trellis displays; in ggplot2, they're called facets. Facets divide the data by a discrete or categorical variable and display the same type of graph for each data subset.
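A minimal faceting sketch, assuming ggplot2 and substituting the built-in mtcars data for the chapter's data set: facet_wrap() splits the data by the discrete variable cyl and repeats the same scatter plot for each subset. layer_data() is used here only to count the resulting panels.

```r
library(ggplot2)

# Sketch on the built-in mtcars data (an assumption; not the book's data set).
# facet_wrap(~ cyl) splits the rows by the discrete variable cyl and draws
# the same scatter plot once for each subset.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)

# layer_data() returns the computed data; PANEL records each point's subplot.
n_panels <- length(unique(layer_data(p)$PANEL))
n_panels  # 3: one panel each for 4, 6 and 8 cylinders
print(p)
```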

Let's look at electricity consumption versus GDP for different countries, which we calculated in the previous activity.

From this combined plot, we can't tell which country has the highest GDP or electricity consumption. Let's split the data now.

Using Facets to Split Data

In this section, we'll plot subsets of data as separate subplots. Let's begin by implementing the following steps:

  1. Use the gapminder.csv dataset.
  2. Make a scatter plot of Electricity_consumption_per_capita versus gdp_per_capita:
library(ggplot2)
p <- ggplot(df, aes(x = gdp_per_capita, y = Electricity_consumption_per_capita)) + geom_point()
  3. Use facet_grid() to specify the variables...
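The facet_grid() step can be sketched as follows. Because gapminder.csv is not bundled with R, this assumes the built-in mtcars data instead and splits rows by am and columns by cyl; with the book's data, you would facet by the relevant categorical column.

```r
library(ggplot2)

# Sketch on built-in mtcars (a stand-in for gapminder, which is not bundled with R).
# facet_grid(rows ~ cols) lays subsets out on a grid: one row per value of am
# (0 = automatic, 1 = manual) and one column per value of cyl.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)

n_panels <- length(unique(layer_data(p)$PANEL))
n_panels  # 6 = 2 transmission types x 3 cylinder counts
print(p)
```

facet_wrap() fits one variable's panels into a wrapped strip, whereas facet_grid() gives each of two variables its own dimension, which makes row-versus-column comparisons easier.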

Changing Styles and Colors


Aside from faceting, we can also produce a color-differentiated plot. This is useful when points from different groups have very similar shapes and overlap, because color makes small differences between groups easier to see. For example, we can plot electricity consumption versus GDP, using different colors or shapes for the countries.
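Here is a minimal sketch of a color-differentiated scatter plot. It assumes ggplot2 and uses the built-in iris data, where the three species overlap in places. Mapping Species to both colour and shape inside aes() distinguishes the groups.

```r
library(ggplot2)

# Sketch on the built-in iris data (an assumption; the text uses gapminder countries).
# Mapping Species to colour and shape inside aes() gives each group its own
# colour and point symbol, and ggplot2 adds a legend automatically.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                      colour = Species, shape = Species)) +
  geom_point()

d <- layer_data(p)
length(unique(d$colour))  # 3 colours, one per species
length(unique(d$shape))   # 3 shapes, one per species
print(p)
```

Putting colour inside aes() maps it to a variable. Writing a constant such as color = 2 directly in geom_point() instead paints every point the same.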

Using Different Colors to Group Points by a Variable

In this section, we'll produce a color-differentiated scatter plot with respect to a third variable. Let's begin by implementing the following steps:

  1. Choose a subset of dataset 1 (gapminder) and select a few countries. Use the following subset command:
dfs <- subset(df, Country %in% c("Germany", "India", "China", "United States"))
  2. Make a scatter plot of the two variables and change the x and y titles:
# var1 and var2 hold the column names as strings; name1 and name2 hold the axis titles
p1 <- ggplot(df, aes(x = .data[[var1]], y = .data[[var2]])) + geom_point(color = 2, shape = 2) + xlim(0, 10000) + xlab(name1) + ylab(name2)
  3. Then, change the colors and shapes of the points for...
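The three steps can be sketched end to end with built-in data. This assumes ggplot2 and uses mtcars, with the gear column standing in for Country. var1 and var2 are hypothetical column-name strings looked up with the .data pronoun, and the axis titles are illustrative. Unlike a fixed color = 2, colour and shape are mapped to the grouping variable here.

```r
library(ggplot2)

# Sketch on built-in mtcars (a stand-in for the gapminder countries).
cars <- mtcars
cars$gear <- factor(cars$gear)   # discrete variable -> discrete colours and shapes

# Keep only some groups with %in%, as the subset() command does for countries.
sub <- subset(cars, gear %in% c("3", "5"))

# var1/var2 are hypothetical column-name strings; .data[[...]] looks them up.
var1 <- "wt"
var2 <- "mpg"
p2 <- ggplot(sub, aes(x = .data[[var1]], y = .data[[var2]],
                      colour = gear, shape = gear)) +
  geom_point(size = 2) +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per gallon")

nrow(sub)  # 20 cars: 15 with 3 gears + 5 with 5 gears
print(p2)
```

Unused factor levels (here, gear = 4) are dropped from the colour and shape scales by default, so the legend lists only the selected groups.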

Geoms and Statistical Summaries


Sometimes, you will need to calculate statistical summaries, such as the mean, median, or a quartile of a variable, and view changes with respect to another variable. This can be done by using grouping commands.
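As an alternative to computing summaries by hand, ggplot2's stat_summary() can compute a per-group statistic while plotting. This minimal sketch assumes ggplot2 3.3.0 or later (for the fun argument) and uses the built-in mtcars data. It overlays the mean mpg for each cylinder count on the raw points.

```r
library(ggplot2)

# Sketch on built-in mtcars (an assumption). stat_summary() computes a summary
# of y for every x value inside ggplot2 itself; here, the mean mpg for each
# cylinder count. This is an alternative to summarising the data beforehand.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point(alpha = 0.3) +                                           # raw values
  stat_summary(fun = mean, geom = "point", size = 4, colour = "red")  # group means

means <- layer_data(p, i = 2)$y  # data computed by the second layer
round(means, 2)  # 26.66 19.74 15.10
print(p)
```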

Let's plot AudienceScore against Genre for the HollywoodMovies dataset. Tilt the x-axis labels to make them less cluttered, using the following command:

ggplot(HollywoodMovies,aes(Genre,AudienceScore))+geom_point()+theme(axis.text.x=element_text(angle=40))

You'll get the following output:

Using Grouping to Create a Summarized Plot

In this section, we'll use grouping to summarize multiple y values for a given x value. Let's begin by implementing the following steps:

  1. Use grouping to group by genre and remove missing (NA) values:
library(dplyr)
gp_scr <- group_by(HollywoodMovies, Genre)
gp_scr <- na.omit(gp_scr)  # drops every row that has an NA in any column
  2. Calculate the mean and standard deviation using the summarise function and make a new dataset, as follows:
dfnew <- dplyr::summarise(gp_scr...
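The summarise step is truncated here, so the following is only a sketch of how the group-then-summarise pattern can be completed, not the book's exact code. It assumes dplyr and ggplot2 are installed and uses mtcars in place of HollywoodMovies. It groups by cyl, computes the mean and standard deviation of mpg, and plots the means with error bars of plus or minus one standard deviation.

```r
library(ggplot2)
library(dplyr)

# Sketch on built-in mtcars instead of HollywoodMovies (an assumption).
# Group by cyl, then collapse each group to its mean and standard deviation.
summ <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))

# Plot the group means with +/- one standard deviation as error bars.
p <- ggplot(summ, aes(x = factor(cyl), y = mean_mpg)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_mpg - sd_mpg, ymax = mean_mpg + sd_mpg),
                width = 0.2) +
  xlab("Cylinders") +
  ylab("Mean miles per gallon")

summ      # one row per cylinder count: 4, 6, 8
print(p)
```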

Summary


In this chapter, you learned about the Grammar of Graphics in detail and how to change the theme and color aspects of graphs to create better visuals and reveal further details about data. You also learned how to provide more useful information in scatter plots by grouping and summarizing data to calculate quantities such as the mean, median, and quartiles.

In the next chapter, we will work on more advanced plotting techniques, which are not needed quite as often but may be required in some cases.
