Applied Data Visualization with R and ggplot2

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
ISBN-13: 9781789612158
Edition: 1st Edition
Tools: ggplot
Concepts: Data Visualization
Author: Dr. Tania Moulik

Appendix 1. Solutions

This section contains the worked-out answers for the activities in each lesson. For descriptive questions, your answers might not match the ones provided here exactly. As long as the essence of an answer remains the same, you can consider it correct.

Chapter 1: Basic Plotting in ggplot2

The following are the activity solutions for this chapter.

Activity: Creating a Histogram and Explaining its Features

Steps for Completion:

Use the template code Lesson1_student.R.

Note

This is an empty script in which the libraries are already loaded. You will write your code here.

Load the dataset temperature.csv from the directory data.
Create the histogram for two cities (Vancouver and Miami) by using the command discussed previously.
Once the histogram is ready, run the code.
Analyze the two histograms by giving three points for each histogram, and two points of difference between the two.

Outcome:

Two histograms should be created and compared. The complete code is as follows:

library(ggplot2)
# Load the hourly temperature data
df_t <- read.csv("data/historical-hourly-weather-data/temperature.csv")
# One histogram per city
ggplot(df_t,aes(x=Vancouver))+geom_histogram()
ggplot(df_t,aes(x=Miami))+geom_histogram()
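
Comparing the two histograms is easier when both cities share an x-axis and a bin width. The following is a minimal sketch of one way to do that, not part of the original solution. It uses synthetic temperatures (the Kelvin values, df_long, and p_cities are assumptions made for illustration) so that it runs without temperature.csv.

```r
library(ggplot2)

# Synthetic stand-in for temperature.csv (values in Kelvin are an assumption)
set.seed(1)
df_t <- data.frame(
  Vancouver = rnorm(500, mean = 284, sd = 6),
  Miami     = rnorm(500, mean = 298, sd = 4)
)

# Reshape to long format: one row per (city, temperature) reading
df_long <- data.frame(
  city        = rep(c("Vancouver", "Miami"), each = nrow(df_t)),
  temperature = c(df_t$Vancouver, df_t$Miami)
)

# Same binwidth and x-axis for both cities makes spread and centre comparable
p_cities <- ggplot(df_long, aes(x = temperature)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~city, ncol = 1)
p_cities
```

With the panels stacked, differences in the centre and width of the two distributions can be read directly off the shared axis.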

Activity: Creating One- and Two-Dimensional Visualizations with a Given Dataset

Steps for Completion:

Load the given dataset, xAPI-Edu-Data.csv, and investigate it by using the appropriate commands.
Decide which visualizations to use for the given variables: Topic, gender, and VisitedResources.
Create one-dimensional visualizations and explain why you chose that type of visual (one per variable). Provide one point of observation for each visualization.
Create two-dimensional boxplots or scatterplots for VisitedResources versus Topic, VisitedResources versus AnnouncementsView, and Discussion versus Gender. What are your observations? Write at least five points.

Outcome:

Three one-dimensional plots and three two-dimensional plots should be created, with the following axes (count versus topic) and observations. (Note that the students may provide different observations, so the instructor should verify the answers.)

The complete code is as follows:

df_edu <- read.csv("data/xAPI-Edu-Data.csv")
str(df_edu)

#Functions for Plotting a barchart/Histogram
plotbar <- function(df,mytxt) {
  ggplot(df,aes_string(x=mytxt)) + geom_bar()
}
plothist <- function(df,mytxt) {
  ggplot(df,aes_string(x=mytxt)) + geom_histogram()
}

#The helper functions above are optional; at this point students can
#also call ggplot() directly for each variable.
#1-D Plots
plotbar(df_edu,"Topic")
plotbar(df_edu,"gender")
plotbar(df_edu,"ParentschoolSatisfaction")
plothist(df_edu,"VisitedResources")

#2-D Plots
ggplot(df_edu,aes(x=Topic,y=VisitedResources)) + geom_boxplot()
ggplot(df_edu,aes(x=AnnouncementsView,y=VisitedResources)) + geom_point()
ggplot(df_edu,aes(x=gender,y=Discussion)) + geom_boxplot()
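
Recent ggplot2 releases mark aes_string() as deprecated. One possible alternative, sketched here under the assumption of ggplot2 3.0.0 or later, looks the column up through the .data pronoun. The names plotbar2, plothist2, and df_demo are hypothetical, and the small data frame only stands in for the education data.

```r
library(ggplot2)

# Hypothetical miniature of the education data
df_demo <- data.frame(
  Topic = c("Math", "IT", "IT", "Arabic", "Math", "IT"),
  VisitedResources = c(10, 55, 80, 32, 47, 91)
)

# Same helpers, with the column name looked up through the .data pronoun
plotbar2 <- function(df, mytxt) {
  ggplot(df, aes(x = .data[[mytxt]])) + geom_bar() + xlab(mytxt)
}
plothist2 <- function(df, mytxt, bins = 10) {
  ggplot(df, aes(x = .data[[mytxt]])) + geom_histogram(bins = bins) + xlab(mytxt)
}

p_bar  <- plotbar2(df_demo, "Topic")
p_hist <- plothist2(df_demo, "VisitedResources", bins = 5)
```

The explicit xlab() call keeps the axis label equal to the column name, which aes_string() produced automatically.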

Activity: Improving the Default Visualization

Steps for Completion:

Use the basic ggplot commands to create two of the plots from Activity B (Topic and VisitedResources).
Use the Grammar of Graphics to improve your graphics by layering upon the base graphic. The graph should follow these guidelines:
1. Histograms should be rebinned.
2. Change the fill colors of one- and two-dimensional objects. The line colors should be black.
3. Add a title to the graph.
4. Apply the appropriate font sizes and colors to the x- and y-axes.

Outcome:

The complete code is as follows:

p1 <- ggplot(df_edu,aes(x=Topic))
p2 <- ggplot(df_edu,aes(x=VisitedResources))

p1 +
    geom_bar(color=1,fill=3) +
    ylab("Count")+
    theme(axis.text.y=element_text(size=10),
          axis.text.x=element_text(size = 10),
          axis.title.x=element_text(size=15,color=4),
          axis.title.y=element_text(size=15,color=4))+
    ggtitle("Topics in Education data")

p2 +
    geom_histogram(bins=20,fill="white",color=1)+
    ggtitle("Visited Resources for Education data")+
    xlab("Visited Resources")+
    theme(axis.text.x=element_text(size = 12),
          axis.text.y=element_text(size=12),
          axis.title.x=element_text(size=15,color=4),
          axis.title.y=element_text(size=15,color=4))

Chapter 2: Grammar of Graphics and Visual Components

The following are the activity solutions for this chapter.

Activity: Applying Grammar of Graphics to Create a Complex Visualization

Steps for Completion:

Use the commands that we just explored to create the scatterplot.
For this activity, you will use the gapminder dataset.
You can use the help command to explore the options.
To change scales, you will have to use one of the preceding label formats.
Use labels=scales::unit_format(unit="K", scale=1e-3) for labeling.

Outcome:

The output code is as follows:

# The unit_format arguments are named because their positional order
# differs between versions of the scales package.
ggplot(df, aes(x=gdp_per_capita,y=Electricity_consumption_per_capita))+
    geom_point()+
    scale_x_continuous(name="GDP",breaks = seq(0,50000,5000),
                       labels=scales::unit_format(unit="K", scale=1e-3)) +
    scale_y_continuous(name="Electricity Consumption",
                       breaks = seq(0,20000,2000),
                       labels=scales::unit_format(unit="K", scale=1e-3))
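
To see what the label formatter does on its own, it can be called outside of a plot. This is a small sketch with hypothetical tick positions; fmt_k, ticks, and labels are names chosen for illustration.

```r
library(scales)

# Formatter that multiplies by 1e-3 and appends "K" (arguments passed by name)
fmt_k <- unit_format(unit = "K", scale = 1e-3)

# Hypothetical tick positions like those produced by seq(0, 50000, 5000)
ticks  <- c(5000, 20000, 50000)
labels <- fmt_k(ticks)
print(labels)
```

Each label is the tick value divided by 1,000 followed by the unit, which keeps long axis numbers compact.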

Activity: Using Faceting to Understand Data

Steps for Completion:

Use the loan data and plot a histogram (use fill color=cadetblue4 and bins=10).
Use facet_wrap() to plot the loan data for the different credit grades.
Now, you will need to change the default options for facet_wrap, in order to produce the following plots. Use ?facet_wrap on the command line to view the options that can be changed.

Outcome:

Refer to the complete code at the following path: https://goo.gl/RheL2G. The answers to the questions are given here:

scales="free_y".
A, B, and C have maximum loan amounts below 10,000. (A, B, C, and D is also an acceptable answer.)
F and G show uniform distributions.
No, none of the distributions are normally distributed.
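
The full solution sits behind the link above. As a rough sketch of the faceting step only, the following uses synthetic loan amounts (df_loans and its grade mix are assumptions) to show how a free y-axis keeps sparsely populated grades readable.

```r
library(ggplot2)

# Synthetic loan data: grade A is common, grade G is rare (an assumption)
set.seed(42)
df_loans <- data.frame(
  grade     = rep(c("A", "D", "G"), times = c(600, 200, 30)),
  loan_amnt = c(runif(600, 1000, 15000),
                runif(200, 1000, 30000),
                runif(30, 5000, 35000))
)

# scales = "free_y" gives each panel its own count axis,
# so the shape of the small G panel stays readable
p_facets <- ggplot(df_loans, aes(x = loan_amnt)) +
  geom_histogram(bins = 10, fill = "cadetblue4") +
  facet_wrap(~grade, scales = "free_y")
p_facets
```

With the default fixed scales, the G panel would be squashed against the x-axis by the much larger counts in A.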

Activity: Using Color Differentiation in Plots

Steps for Completion:

Use the LoanStats dataset and make a subset using the following variables:

dfn <- df3[,c("home_ownership","loan_amnt","grade")]

Clean the dataset (removing the NONE and NA cases), using the following code:

dfn <- na.omit(dfn)
dfn <- subset(dfn, !home_ownership %in% c("NONE"))

Create a boxplot showing the loan amount versus home ownership.
Color differentiate by credit grade.

Outcome:

Refer to the following URL for the output: https://goo.gl/RheL2G.

The answers to question 5 are as follows:

Credit grades F and G are the highest. Credit grades A and B are the lowest.
They are higher for a person who has a mortgage.
The median value for A is 2,000, and the median value for G is 20,000, so the difference is 18,000.
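
The plot and the median comparison can be sketched as follows. This is an illustration rather than the linked solution: dfn_demo is a hypothetical data frame whose values were chosen so that grades A and G have medians of 2,000 and 20,000, and mapping grade to fill (rather than color) is an assumption.

```r
library(ggplot2)

# Hypothetical cleaned subset with the same three columns as dfn
dfn_demo <- data.frame(
  home_ownership = rep(c("MORTGAGE", "RENT"), each = 6),
  grade          = rep(c("A", "A", "A", "G", "G", "G"), times = 2),
  loan_amnt      = c(1000, 2000, 3000, 18000, 20000, 22000,
                     1500, 2000, 2500, 19000, 20000, 21000)
)

# Boxplot of loan amount by home ownership, filled by credit grade
p_box <- ggplot(dfn_demo, aes(x = home_ownership, y = loan_amnt, fill = grade)) +
  geom_boxplot()

# Median loan amount per grade, and the gap between G and A
med <- tapply(dfn_demo$loan_amnt, dfn_demo$grade, median)
gap <- unname(med["G"] - med["A"])
print(med)
print(gap)
```

Computing the medians with tapply() is a quick check on values read off the boxplot by eye.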

Activity: Using Themes and Color Differentiation in a Plot

Steps for Completion:

Make a scatterplot of female versus male BMIs.
Build your plot in layers, to avoid creating three separate plots.
1. Create the default plot. Store this plot as p1.
2. Points should be differentiated by color. Differentiate the two BMIs by country using color. The size of the points should be 2.
3. Change the color scheme by using scale_color_brewer. The palette used is Dark2. Store this plot as p2.
4. Add a plot title: BMI female vs BMI Male.
5. Change more of the theme's aspects to produce plot p3. The theme aspects to be changed, and their values, are as follows:
  - Panel Background: azure; Color: black
  - No grid lines
  - Axis Title Size: 15; Axis Title Color: cadetblue4
  - Change the x and y titles to BMI Male and BMI female, respectively
  - Legend: Position bottom, Left justified, No Legend Title, legend key (fill: gray97, color of the line=3)
  - Plot Title Color: cadetblue4; Size: 18; Face: bold.italic

Outcome:

The output code is as follows:

pd1 <- ggplot(df,aes(x=BMI_male,y=BMI_female))
pd2 <- pd1+geom_point()
pd3 <- pd1+geom_point(aes(color=Country),size=2)+
    scale_colour_brewer(palette="Dark2")
pd4 <- pd3+theme(axis.title=element_text(size=15,color="cadetblue4",
                 face="bold"),
                 plot.title=element_text(color="cadetblue4", size=18,
                 face="bold.italic"),
                 panel.background = element_rect(fill="azure",color="black"),
                 panel.grid=element_blank(),
                 legend.position="bottom",
                 legend.justification="left",
                 legend.title = element_blank(),
                 legend.key = element_rect(color=3,fill="gray97")
)+
    xlab("BMI Male")+
    ylab("BMI female")+
    ggtitle("BMI female vs BMI Male")

Chapter 3: Advanced Geoms and Statistics

The following are the activity solutions for this chapter.

Activity: Using Density Plots to Compare Distributions

Steps for Completion:

Use the RestaurantTips dataset in Lock5Data.
Compare the tip amounts for various days. Map the day to the color aesthetic in the geom_density command.
Superimpose all of the plots.
Use the scale_x_continuous command for the x-axis tick marks.
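
No outcome code is listed for this activity. The steps above could be sketched as follows, using synthetic data so that the example runs without Lock5Data. The column names Day and Tip, the gamma-distributed values, and the tick spacing are all assumptions.

```r
library(ggplot2)

# Synthetic stand-in for RestaurantTips; the column names Day and Tip are assumed
set.seed(7)
tips_demo <- data.frame(
  Day = rep(c("Mon", "Wed", "Fri"), each = 50),
  Tip = c(rgamma(50, shape = 4, rate = 1.2),
          rgamma(50, shape = 5, rate = 1.2),
          rgamma(50, shape = 6, rate = 1.2))
)

# One density curve per day, superimposed on shared axes
p_tips <- ggplot(tips_demo, aes(x = Tip, color = Day)) +
  geom_density() +
  scale_x_continuous(breaks = seq(0, 16, 2))
p_tips
```

Mapping the day to color inside aes() is what makes ggplot2 draw a separate curve for each day in a single panel.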

Activity: Plot the Monthly Closing Stock Prices and the Mean Values

Steps for Completion:

Use the strftime command to get the month from each date and make another variable (Month), as follows:

df_fb$Month <- strftime(df_fb$Date,"%m")

Change the month to a numerical value by using as.numeric:

df_fb$Month <- as.numeric(df_fb$Month)

Now, use ggplot to make a plot of closing prices versus months.
Plot the data using geom_point (color=red).
Change the x scale to show each month, and label the x-axis, such that each month is shown.
Title your plot Monthly closing stock prices: Facebook.
Use geom_line(stat='summary',fun.y=mean) to plot the mean.

Outcome:

The complete code is shown as follows:

ggplot(df_fb, aes(Month,Close)) +
    geom_point(color="red",alpha=1/2,position = position_jitter(h=0.0,w=0.0))+
    # fun.y is the spelling used before ggplot2 3.3.0; newer releases call it fun
    geom_line(stat='summary',fun.y=mean, color="blue",size=1)+
    scale_x_continuous(breaks=seq(0,13,1))+
    ggtitle("Monthly Closing Stock Prices: Facebook")+theme_classic()
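
The month extraction and the per-month mean that the summary line connects can be checked without plotting. This sketch uses a hypothetical handful of prices (fb_demo is not the book's df_fb).

```r
# Hypothetical closing prices on a handful of dates
fb_demo <- data.frame(
  Date  = as.Date(c("2018-01-05", "2018-01-20", "2018-02-10",
                    "2018-02-25", "2018-03-15")),
  Close = c(180, 190, 176, 184, 170)
)

# "%m" returns the month as a zero-padded string; as.numeric drops the padding
fb_demo$Month <- as.numeric(strftime(fb_demo$Date, "%m"))

# The per-month mean, one value per month on the x-axis
monthly_mean <- tapply(fb_demo$Close, fb_demo$Month, mean)
print(monthly_mean)
```

Converting Month to a number matters because scale_x_continuous() only accepts a continuous variable.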

Activity: Creating a Variable-Encoded Regional Map

Steps for Completion:

Merge the USStates data with states_map.
Before merging, change the states variable in USStates to the same format used in states_map.

Use the ggplot options geom_polygon and coord_map to create the map.
For aesthetics, run the following code and specify x=long, y=lat, group=group, and fill=ObamaVote.

Outcome:

The complete code is shown as follows:

USStates$Statelower <- as.character(tolower(USStates$State))
glimpse(USStates)
us_data <- merge(USStates,states_map,by.x="Statelower",by.y="region")
head(us_data)
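
The code above stops at the merge. A minimal sketch of the remaining plotting step follows, using two made-up square "states" so that it runs without the maps package; states_demo, votes_demo, and their values are hypothetical. coord_map() is left out here because it needs the mapproj package, and the re-sorting by the order column is an added precaution, since merge() does not promise to keep polygon vertices in drawing order.

```r
library(ggplot2)

# Two made-up square "states" in the layout that map_data() uses
states_demo <- data.frame(
  long   = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  group  = rep(c(1, 2), each = 4),
  order  = 1:8,
  region = rep(c("alpha", "beta"), each = 4)
)
votes_demo <- data.frame(State = c("Beta", "Alpha"), ObamaVote = c(61, 38))

# Match the key format, merge, then restore the vertex order
votes_demo$Statelower <- tolower(votes_demo$State)
map_demo <- merge(votes_demo, states_demo, by.x = "Statelower", by.y = "region")
map_demo <- map_demo[order(map_demo$order), ]

# One filled polygon per group, shaded by the merged variable
p_map <- ggplot(map_demo, aes(x = long, y = lat, group = group, fill = ObamaVote)) +
  geom_polygon(color = "black")
```

With real state outlines, adding coord_map() to p_map would apply a map projection instead of plain Cartesian axes.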

Activity: Studying Correlated Variables

Steps for Completion:

Make a subset of the loan dataset by using some of the following variables:

df3_1 <- df3[,c("funded_amnt","annual_inc","dti","inq_last_6mths",
                "total_acc","total_pymnt_inv")]

Use cor for the preceding loan data subset, and then choose two highly correlated variables in the loan dataset. Use pairs, as follows:

total_rec_prncp and total_pymnt_int
funded_amnt and total_pymnt_inv

Make a scatterplot for the preceding pairs for grade A, then fit a linear regression model.
Determine the correlations of the preceding pairs.
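
The correlation, scatterplot, and regression steps can be sketched on synthetic data. The numbers below are generated rather than taken from the loan dataset, so the printed correlation is not the activity's answer; loans_a, the slope of 1.1, and the noise level are assumptions.

```r
library(ggplot2)

# Synthetic grade A loans: payments track the funded amount with noise
set.seed(123)
funded <- runif(200, 1000, 35000)
loans_a <- data.frame(
  funded_amnt     = funded,
  total_pymnt_inv = 1.1 * funded + rnorm(200, sd = 2000)
)

# Pearson correlation of the pair
r <- cor(loans_a$funded_amnt, loans_a$total_pymnt_inv)

# Linear fit, and the same fit drawn over the scatterplot
fit <- lm(total_pymnt_inv ~ funded_amnt, data = loans_a)
p_fit <- ggplot(loans_a, aes(x = funded_amnt, y = total_pymnt_inv)) +
  geom_point() +
  geom_smooth(method = "lm")
print(r)
print(coef(fit))
```

On the real data, the grade A rows would first be selected with a subset on the grade column before calling cor() and lm().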

Outcome:

Answer to step 4: The correlations are as follows:


Author (1)

Dr. Tania Moulik

Tania Moulik has a PhD in particle physics. She has worked at CERN, the European Organization for Nuclear Research, and on the Tevatron at Fermi National Accelerator Laboratory in IL, USA. She has years of programming experience in C++, Python, and R. She has also worked in the field of big data, with technologies such as grid computing. She has a passion for data analysis and would like to share it with others who want to delve into the world of data analytics. She especially likes R and ggplot2 as a powerful analytics package.