
You're reading from Applied Data Visualization with R and ggplot2

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
ISBN-13: 9781789612158
Edition: 1st Edition

Author: Dr. Tania Moulik

Tania Moulik has a PhD in particle physics. She has worked at CERN, the European Organization for Nuclear Research, and on the Tevatron at Fermi National Accelerator Laboratory in IL, USA. She has years of programming experience in C++, Python, and R. She has also worked in the field of big data, with technologies such as grid computing. She has a passion for data analysis and would like to share it with others who want to delve into the world of data analytics. She especially likes R and ggplot2 as powerful analytics tools.

Chapter 2: Grammar of Graphics and Visual Components


The following are the activity solutions for this chapter.

Activity: Applying Grammar of Graphics to Create a Complex Visualization

Steps for Completion:

  1. Use the commands that we just explored to create the scatterplot.
  2. For this activity, you will use the gapminder dataset.
  3. You can use the help command to explore the options.
  4. To change scales, you will have to use one of the preceding label formats.
  5. Use labels=scales::unit_format(unit = "K", scale = 1e-3) for labeling.
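You can try the labeler from step 5 on its own before wiring it into a scale. The sketch below assumes a recent version of the scales package. In recent versions, the first positional argument of `unit_format()` is `accuracy` rather than `unit`, so passing `unit` and `scale` by name avoids ambiguity. `label_number()` is the newer interface and is shown for comparison. The break values are arbitrary.

```r
library(scales)

# Named arguments: multiply by 1e-3 and append " K" (unit_format's default sep is a space)
k_label <- unit_format(unit = "K", scale = 1e-3, accuracy = 1)

# The same labeler built with the newer label_number() interface
k_label2 <- label_number(scale = 1e-3, suffix = " K", accuracy = 1)

breaks <- c(0, 5000, 10000, 50000)
k_label(breaks)   # "0 K"  "5 K"  "10 K" "50 K"
k_label2(breaks)
```

Setting `accuracy = 1` fixes the rounding explicitly. Without it, scales picks a precision from the spread of the values.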

Outcome:

The output code is as follows:

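library(ggplot2)

# df holds the gapminder data used in this activity
ggplot(df, aes(x = gdp_per_capita, y = Electricity_consumption_per_capita)) +
    geom_point() +
    # Breaks every 5,000 (x) and 2,000 (y), labeled in thousands, e.g. "5 K"
    scale_x_continuous(name = "GDP", breaks = seq(0, 50000, 5000),
                       labels = scales::unit_format(unit = "K", scale = 1e-3)) +
    scale_y_continuous(name = "Electricity Consumption",
                       breaks = seq(0, 20000, 2000),
                       labels = scales::unit_format(unit = "K", scale = 1e-3))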

Activity: Using Faceting to Understand Data

Steps for Completion:

  1. Use the loan data and plot a histogram (use fill color=cadetblue4 and bins=10).
  2. Use facet_wrap() to plot the loan data for the different credit grades.
  3. Change the default options of facet_wrap() to produce the following plots. Run ?facet_wrap in the R console to see which options can be changed.
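The three steps above can be sketched as follows, assuming a data frame with the LoanStats columns `loan_amnt` and `grade`. The `loans` data frame here is a randomly generated stand-in, not the real loan data, so the histogram shapes mean nothing; the point is the `facet_wrap()` call. With the default `scales = "fixed"`, every panel shares one y axis. With `scales = "free_y"`, each panel gets its own y range, and the x axis stays common to all panels.

```r
library(ggplot2)

# Hypothetical stand-in for the loan data: three grades, 50 loans each
set.seed(42)
loans <- data.frame(
  grade = rep(c("A", "B", "C"), each = 50),
  loan_amnt = c(runif(50, 1000, 10000), runif(50, 1000, 20000), runif(50, 5000, 35000))
)

p <- ggplot(loans, aes(x = loan_amnt)) +
  geom_histogram(fill = "cadetblue4", bins = 10) +
  # One panel per credit grade; each panel gets its own y-axis range
  facet_wrap(~ grade, scales = "free_y")

# Panel layout of the built plot: one row per panel
panel_layout <- ggplot_build(p)$layout$layout
panel_layout
```

`panel_layout` has one row per grade. Its `SCALE_Y` column takes a different value in each panel, which is how ggplot2 records the free y scales.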

Outcome:

Refer to the complete code at the following path: https://goo.gl/RheL2G. The answers to the questions are given here:

  1. scales="free_y".
  2. A, B, and C have maximum loan amounts below 10,000. (A, B, C, and D is also an acceptable answer.)
  3. F and G show uniform distributions.
  4. No, none of the distributions are normally distributed.
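You can check answer 2 numerically instead of reading it off the histograms. Here is a minimal base R sketch, assuming a data frame with `loan_amnt` and `grade` columns. The five rows below are made up for illustration.

```r
# Hypothetical loan records (not the real LoanStats data)
loans <- data.frame(
  grade = c("A", "A", "B", "B", "C"),
  loan_amnt = c(5000, 9000, 12000, 3000, 25000)
)

# Largest loan amount within each credit grade
max_by_grade <- aggregate(loan_amnt ~ grade, data = loans, FUN = max)

# Grades whose largest loan stays below 10,000
below <- max_by_grade$grade[max_by_grade$loan_amnt < 10000]
below
```

Run on the full loan data, the same `aggregate()` call would list the maximum loan amount for every grade.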

Activity: Using Color Differentiation in Plots

Steps for Completion:

  1. Use the LoanStats dataset and make a subset using the following variables:
     dfn <- df3[,c("home_ownership","loan_amnt","grade")]
  2. Clean the dataset (removing the NONE and NA cases), using the following code:
     dfn <- na.omit(dfn)
     dfn <- subset(dfn, !dfn$home_ownership %in% c("NONE"))
  3. Create a boxplot showing the loan amount versus home ownership.
  4. Color differentiate by credit grade.
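Here is a minimal sketch of the four steps above. It uses a few made-up rows in place of the `home_ownership`, `loan_amnt`, and `grade` columns of LoanStats. Mapping `grade` to `fill` is one way to color-differentiate; mapping it to `colour` instead would color only the box outlines.

```r
library(ggplot2)

# Hypothetical stand-in for df3[, c("home_ownership", "loan_amnt", "grade")]
dfn <- data.frame(
  home_ownership = c("RENT", "MORTGAGE", "OWN", "NONE", "RENT", "MORTGAGE", NA),
  loan_amnt = c(5000, 12000, 8000, 3000, 7000, 20000, 6000),
  grade = c("A", "B", "A", "C", "B", "G", "A")
)

# Cleaning: drop rows containing NA, then drop the NONE category
dfn <- na.omit(dfn)
dfn <- subset(dfn, !home_ownership %in% "NONE")

# Loan amount versus home ownership, with one box per credit grade in each category
p <- ggplot(dfn, aes(x = home_ownership, y = loan_amnt, fill = grade)) +
  geom_boxplot()
```

After cleaning, five rows remain. Each home ownership and grade combination becomes its own box.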

Outcome:

Refer to the following URL for the output: https://goo.gl/RheL2G.

The answers to question 5 are as follows:

  1. Credit grades F and G are the highest. Credit grades A and B are the lowest.
  2. They are higher for a person who has a mortgage.
  3. The median value for A is 2,000, and the median value for G is 20,000, so the difference is 18,000.
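The difference in answer 3 is one group median minus another. Here is a minimal base R sketch. The loan amounts below are made up, chosen only so that the medians for A and G equal the 2,000 and 20,000 quoted above.

```r
# Hypothetical loan amounts for two credit grades (illustrative values only)
loans <- data.frame(
  grade = c("A", "A", "A", "G", "G", "G"),
  loan_amnt = c(1000, 2000, 3000, 15000, 20000, 25000)
)

# Median loan amount per grade, indexed by grade name
med <- tapply(loans$loan_amnt, loans$grade, median)

# Difference between the G and A medians: 20,000 - 2,000
diff_g_a <- med[["G"]] - med[["A"]]
diff_g_a   # 18000
```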

Activity: Using Themes and Color Differentiation in a Plot

Steps for Completion:

  1. Make a scatterplot of female versus male BMIs.
  2. Build your plot in layers, to avoid creating three separate plots.
    1. Create the default plot. Store this plot as p1.
    2. Differentiate the points by country using color, and set the point size to 2.
    3. Change the color scheme by using scale_color_brewer. The palette used is Dark2. Store this plot as p2.
    4. Add a plot title: BMI female vs BMI Male.
    5. Change more of the theme's aspects to produce plot p3. The theme aspects to be changed, and their values, are as follows:
      • Panel Background: azure; Color: black
      • No grid lines
      • Axis Title Size: 15; Axis Title Color: cadetblue4
      • Change the x and y titles to BMI Male and BMI female
      • Legend: Position bottom, Left justified, No legend title, Legend key (fill: gray97, line color: 3)
      • Plot Title Color: cadetblue4; Size: 18; Face: bold.italic
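You can preview the colors that `scale_color_brewer(palette = "Dark2")` hands out with `scales::brewer_pal()`. That function reads the same ColorBrewer palettes through the RColorBrewer package. Here is a minimal sketch. Dark2 is a qualitative palette with at most 8 colors, so it suits plots with up to eight countries.

```r
library(scales)

# First three colors of the qualitative Dark2 palette, assigned in factor-level order
dark2 <- brewer_pal(palette = "Dark2")(3)
dark2   # "#1B9E77" "#D95F02" "#7570B3"
```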

Outcome:

The output code is as follows:

library(ggplot2)

# Base plot: male BMI on x, female BMI on y (df holds the BMI data)
pd1 <- ggplot(df, aes(x = BMI_male, y = BMI_female))
# Default scatterplot
pd2 <- pd1 + geom_point()
# Points colored by country, size 2, Dark2 palette
pd3 <- pd1 + geom_point(aes(color = Country), size = 2) +
    scale_colour_brewer(palette = "Dark2")
# Theme changes, axis titles, and plot title
pd4 <- pd3 +
    theme(axis.title = element_text(size = 15, color = "cadetblue4", face = "bold"),
          plot.title = element_text(color = "cadetblue4", size = 18, face = "bold.italic"),
          panel.background = element_rect(fill = "azure", color = "black"),
          panel.grid = element_blank(),
          legend.position = "bottom",
          legend.justification = "left",
          legend.title = element_blank(),
          legend.key = element_rect(color = 3, fill = "gray97")) +
    xlab("BMI Male") +
    ylab("BMI female") +
    ggtitle("BMI female vs BMI Male")
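Building in layers works because each `+` returns a new plot object and leaves its starting object unchanged. That is why `pd1` can serve as the base for both `pd2` and `pd3`. Here is a minimal sketch on a made-up three-row data frame with the same column names as above (`BMI_male`, `BMI_female`, `Country`). The numbers are placeholders, not real BMI values.

```r
library(ggplot2)

# Hypothetical placeholder data with the same columns as df above
bmi <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  BMI_male = c(24.1, 26.3, 22.8),
  BMI_female = c(25.0, 27.2, 23.5)
)

p_base <- ggplot(bmi, aes(x = BMI_male, y = BMI_female))
p_col <- p_base + geom_point(aes(color = Country), size = 2) +
  scale_colour_brewer(palette = "Dark2")
p_theme <- p_col + theme(legend.position = "bottom")

# p_base still has no layers; p_col and p_theme each carry one point layer
length(p_base$layers)
length(p_col$layers)
```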