You're reading from  Applied Data Visualization with R and ggplot2

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
ISBN-13: 9781789612158
Edition: 1st Edition
Author: Dr. Tania Moulik

Tania Moulik has a PhD in particle physics. She has worked at CERN, the European Organization for Nuclear Research, and on the Tevatron at Fermi National Accelerator Laboratory in IL, USA. She has years of programming experience in C++, Python, and R. She has also worked in the field of big data and with technologies such as grid computing. She has a passion for data analysis and would like to share it with others who want to delve into the world of data analytics. She especially likes R and ggplot2 as a powerful analytics package.

Chapter 1:  Basic Plotting in ggplot2


The following are the activity solutions for this chapter.

Activity: Creating a Histogram and Explaining its Features

Steps for Completion:

  1. Use the template code Lesson1_student.R.

Note

This is an empty template script in which the libraries are already loaded. Write your code in it.

  2. Load the dataset temperature.csv from the directory data.
  3. Create histograms for two cities (Vancouver and Miami) by using the command discussed previously.
  4. Once the histograms are ready, run the code.
  5. Analyze the two histograms by giving three points for each histogram, and two points of difference between the two.

Outcome:

Two histograms should be created and compared. The complete code is as follows:

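# Load the temperature data (the path must stay on a single line)
df_t <- read.csv("data/historical-hourly-weather-data/temperature.csv")
# One histogram per city
ggplot(df_t, aes(x = Vancouver)) + geom_histogram()
ggplot(df_t, aes(x = Miami)) + geom_histogram()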
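
When comparing the two histograms, it can help to draw both cities on a shared x-axis and to look at a numeric summary alongside the plots. The following is only a sketch under stated assumptions: df_demo is a synthetic data frame of random values (not the real temperature.csv), and the names df_long, temperature, and city are hypothetical. With the real data, df_t[, c("Vancouver", "Miami")] would take the place of df_demo.

```r
library(ggplot2)

# Synthetic stand-in for temperature.csv; the values are random and made up.
set.seed(1)
df_demo <- data.frame(
  Vancouver = rnorm(500, mean = 283, sd = 6),
  Miami     = rnorm(500, mean = 298, sd = 4)
)

# Reshape to long format with base R: one row per (temperature, city) pair.
df_long <- stack(df_demo)                   # columns: values, ind
names(df_long) <- c("temperature", "city")  # hypothetical column names

# Stacked panels share the same x-axis, so the shift and spread are easy to compare.
p <- ggplot(df_long, aes(x = temperature)) +
  geom_histogram(bins = 30, na.rm = TRUE) +
  facet_wrap(~ city, ncol = 1)
print(p)

# Numeric summary (min, quartiles, mean, max) to back up the written observations.
print(summary(df_demo))
```

Faceting is a design choice here, not a requirement of the activity; two separate ggplot() calls, as in the solution above, answer the task equally well.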

Activity: Creating One- and Two-Dimensional Visualizations with a Given Dataset

Steps for Completion:

  1. Load the given dataset, xAPI-Edu-Data.csv, and investigate it by using the appropriate commands.
  2. Decide which visualizations to use for the given variables: Topic, gender, and VisitedResources.
  3. Create one-dimensional visualizations and explain why you chose that type of visual (one per variable). Provide one point of observation for each visualization.
  4. Create two-dimensional boxplots or scatterplots for VisitedResources versus Topic, VisitedResources versus AnnouncementsView, and Discussion versus gender. What are your observations? Write at least five points.
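
One way to approach the choice of visualization in step 2 is to look at each column's type: categorical columns usually suit a bar chart, and numeric columns a histogram. The sketch below illustrates that rule of thumb. It is not the book's method; df_demo is a small synthetic data frame (made-up values, not the real xAPI-Edu-Data.csv), and choose_plot_type() and plot_1d() are hypothetical helpers.

```r
library(ggplot2)

# Synthetic stand-in for xAPI-Edu-Data.csv; the values are made up.
df_demo <- data.frame(
  Topic            = c("Math", "IT", "Math", "Biology", "IT", "Math"),
  gender           = c("F", "M", "M", "F", "F", "M"),
  VisitedResources = c(12, 80, 45, 90, 33, 67)
)

# Hypothetical helper: name the 1-D plot type that suits a column.
choose_plot_type <- function(x) {
  if (is.numeric(x)) "histogram" else "bar"
}

# Hypothetical helper: build the matching 1-D plot for one column,
# where col is the column name passed as a string.
plot_1d <- function(df, col, bins = 10) {
  p <- ggplot(df, aes(x = .data[[col]])) + xlab(col)
  if (choose_plot_type(df[[col]]) == "histogram") {
    p + geom_histogram(bins = bins)
  } else {
    p + geom_bar()
  }
}

# Inspect the structure first, then decide per column.
str(df_demo)
plot_types <- sapply(df_demo, choose_plot_type)
print(plot_types)

# Build and draw one plot per column.
plots <- lapply(names(df_demo), function(col) plot_1d(df_demo, col))
for (p in plots) print(p)
```

The type check is only a starting point; a numeric column with very few distinct values may still read better as a bar chart.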

Outcome:

Three one-dimensional plots and three two-dimensional plots should be created, with appropriate axes (for example, count versus Topic) and accompanying observations. (Note that students may provide different observations, so the instructor should verify the answers.)

The complete code is as follows:

df_edu <- read.csv("data/xAPI-Edu-Data.csv")
str(df_edu)

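# Helper functions for plotting a bar chart or a histogram of one column.
# mytxt is the column name, passed as a string.
# Note: aes_string() is deprecated in newer ggplot2 releases;
# aes(x = .data[[mytxt]]) is the current equivalent.
plotbar <- function(df, mytxt) {
  ggplot(df, aes_string(x = mytxt)) + geom_bar()
}
plothist <- function(df, mytxt) {
  ggplot(df, aes_string(x = mytxt)) + geom_histogram()
}

# The helper functions are optional; students can also call ggplot()
# directly at this point.
# 1-D plots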
plotbar(df_edu,"Topic")
plotbar(df_edu,"gender")
plotbar(df_edu,"ParentschoolSatisfaction")
plothist(df_edu,"VisitedResources")

#2-D Plots
ggplot(df_edu,aes(x=Topic,y=VisitedResources)) + geom_boxplot()
ggplot(df_edu,aes(x=AnnouncementsView,y=VisitedResources)) + geom_point()
ggplot(df_edu,aes(x=gender,y=Discussion)) + geom_boxplot()
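
Observations read off a boxplot or scatterplot are easier to defend when a number backs them up. As a hedged sketch, the base R calls below compute a per-group median (the center line of each box) and a correlation coefficient (the trend in a scatterplot). The data frame df_demo is synthetic, with made-up values and only the column names borrowed from the activity; med_by_topic and r are hypothetical names.

```r
# Synthetic stand-in for xAPI-Edu-Data.csv; the values are made up.
df_demo <- data.frame(
  Topic             = c("Math", "Math", "Math", "IT", "IT", "IT"),
  VisitedResources  = c(10, 20, 60, 70, 80, 90),
  AnnouncementsView = c(5, 15, 40, 50, 70, 85)
)

# Median of VisitedResources per Topic: the value each boxplot's center line marks.
med_by_topic <- aggregate(VisitedResources ~ Topic, data = df_demo, FUN = median)
print(med_by_topic)

# Pearson correlation between the two numeric variables shown in the scatterplot.
r <- cor(df_demo$AnnouncementsView, df_demo$VisitedResources)
print(r)
```

With the real data, df_edu would replace df_demo; the numbers printed here say nothing about the actual dataset.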

Activity: Improving the Default Visualization

Steps for Completion:

  1. Use the basic ggplot commands to create two of the plots from Activity B (Topic and VisitedResources).
  2. Use the Grammar of Graphics to improve your graphics by layering upon the base graphic. The graph should follow these guidelines:
    1. Histograms should be rebinned.
    2. Change the fill colors of one- and two-dimensional objects. The line colors should be black.
    3. Add a title to the graph.
    4. Apply the appropriate font sizes and colors to the x- and y-axes.
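
Rebinning a histogram in ggplot2 can be done either by fixing the number of bins (bins) or by fixing the width of each bin (binwidth). The sketch below shows both, assuming a synthetic column of made-up values between 0 and 100 in place of the real VisitedResources data; df_demo, p_bins, and p_width are hypothetical names.

```r
library(ggplot2)

# Synthetic stand-in for the VisitedResources column; the values are made up.
set.seed(42)
df_demo <- data.frame(VisitedResources = sample(0:100, 300, replace = TRUE))

# Option 1: fix the number of bins.
p_bins <- ggplot(df_demo, aes(x = VisitedResources)) +
  geom_histogram(bins = 20, fill = "white", color = "black")

# Option 2: fix the bin width (10 units, with bin edges at multiples of 10).
p_width <- ggplot(df_demo, aes(x = VisitedResources)) +
  geom_histogram(binwidth = 10, boundary = 0, fill = "white", color = "black")

print(p_bins)
print(p_width)

# layer_data() returns the computed bins, one row per bin, which lets you
# check how the rebinning changed the counts.
bins_a <- layer_data(p_bins)
bins_b <- layer_data(p_width)
print(nrow(bins_a))
print(nrow(bins_b))
```

Fixing binwidth tends to give bin edges that are easier to read off the axis, while bins is quicker when only the overall shape matters.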

Outcome:

The complete code is as follows:

# Base plots: map the variable once, then add layers.
p1 <- ggplot(df_edu, aes(x = Topic))
p2 <- ggplot(df_edu, aes(x = VisitedResources))

# Bar chart of Topic. Numeric colors index R's palette:
# 1 = black, 3 = green, 4 = blue.
p1 +
    geom_bar(color = 1, fill = 3) +
    ylab("Count") +
    theme(axis.text.y = element_text(size = 10),
          axis.text.x = element_text(size = 10),
          axis.title.x = element_text(size = 15, color = 4),
          axis.title.y = element_text(size = 15, color = 4)) +
    ggtitle("Topics in Education data")

# Histogram of VisitedResources, rebinned to 20 bins.
p2 +
    geom_histogram(bins = 20, fill = "white", color = 1) +
    ggtitle("Visited Resources for Education data") +
    xlab("Visited Resources") +
    theme(axis.text.x = element_text(size = 12),
          axis.text.y = element_text(size = 12),
          axis.title.x = element_text(size = 15, color = 4),
          axis.title.y = element_text(size = 15, color = 4))
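
Both plots repeat nearly the same theme() call, so one possible refinement is to store the axis styling once and add it to each plot. This is a sketch rather than part of the book's solution: df_demo is synthetic (made-up values), axis_theme is a hypothetical name, and the sketch uses a single text size of 12 for both plots instead of the 10 and 12 used above.

```r
library(ggplot2)

# Synthetic stand-in for xAPI-Edu-Data.csv; the values are made up.
df_demo <- data.frame(
  Topic            = c("Math", "IT", "Math", "Biology", "IT", "Math"),
  VisitedResources = c(12, 80, 45, 90, 33, 67)
)

# Hypothetical shared styling: axis.text and axis.title apply to both axes.
axis_theme <- theme(
  axis.text  = element_text(size = 12),
  axis.title = element_text(size = 15, color = "blue")
)

# Bar chart with black outlines and a green fill.
p_topic <- ggplot(df_demo, aes(x = Topic)) +
  geom_bar(color = "black", fill = "green3") +
  ylab("Count") +
  ggtitle("Topics (synthetic data)") +
  axis_theme

# Histogram with black outlines and a white fill, rebinned to 5 bins.
p_visits <- ggplot(df_demo, aes(x = VisitedResources)) +
  geom_histogram(bins = 5, fill = "white", color = "black") +
  xlab("Visited Resources") +
  ggtitle("Visited Resources (synthetic data)") +
  axis_theme

print(p_topic)
print(p_visits)
```

Named colors such as "black" and "blue" are used here in place of the numeric palette indices, since they do not depend on the current palette() setting.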