You're reading from  Applied Data Visualization with R and ggplot2

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
ISBN-13: 9781789612158
Edition: 1st Edition
Author (1)
Dr. Tania Moulik

Tania Moulik has a PhD in particle physics. She has worked at CERN, the European Organization for Nuclear Research, and on the Tevatron at Fermi National Accelerator Laboratory in IL, USA. She has years of programming experience in C++, Python, and R. She has also worked in the field of big data, with technologies such as grid computing. She has a passion for data analysis and would like to share it with others who want to delve into the world of data analytics. She especially likes R and ggplot2 as a powerful analytics package.

Chapter 3:  Advanced Geoms and Statistics


The following are the activity solutions for this chapter.

Activity: Using Density Plots to Compare Distributions

Steps for Completion:

  1. Use the RestaurantTips dataset from the Lock5Data package.
  2. Compare the Tip amounts for the various days by mapping Day to the color aesthetic, aes(color=Day), in the geom_density command.
  3. Superimpose all of the plots.
  4. Use the scale_x_continuous command for the x-axis tick marks.
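
The steps above can be sketched roughly as follows. This is a minimal sketch, not the book's solution: it assumes the ggplot2 and Lock5Data packages are installed and that RestaurantTips has Tip and Day columns. The tick spacing in seq(0, 16, 2) is an arbitrary choice made for illustration.

```r
library(ggplot2)
library(Lock5Data)

data(RestaurantTips)

# One density curve per day, all drawn on the same axes (superimposed).
p_tips <- ggplot(RestaurantTips, aes(x = Tip, color = Day)) +
  geom_density() +
  # Assumed tick spacing: every 2 units from 0 to 16.
  scale_x_continuous(breaks = seq(0, 16, 2))
```

Call print(p_tips) to draw the plot. Because Day is mapped to color inside aes, ggplot2 draws a separate curve for each day and adds a legend automatically.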

Activity: Plot the Monthly Closing Stock Prices and the Mean Values

Steps for Completion:

  1. Use the strftime command to get the month from each date and make another variable (Month), as follows:
df_fb$Month <- strftime(df_fb$Date,"%m")
  2. Change the month to a numerical value by using as.numeric:
df_fb$Month <- as.numeric(df_fb$Month)
  3. Now, use ggplot to make a plot of closing prices versus months.
  4. Plot the data using geom_point (color="red").
  5. Change the x scale so that a tick mark and label are shown for each month.
  6. Title your plot Monthly Closing Stock Prices: Facebook.
  7. Use geom_line(stat='summary',fun.y=mean) to plot the mean.
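
To see what the month conversion and the summary line compute, here is a small base R sketch. The data frame df_demo is a hypothetical stand-in for df_fb with made-up prices; only the column names Date, Close, and Month follow the activity.

```r
# Hypothetical stand-in for df_fb (the values are invented for illustration).
df_demo <- data.frame(
  Date  = as.Date(c("2018-01-02", "2018-01-31", "2018-02-01", "2018-02-28")),
  Close = c(10, 12, 20, 24)
)

# strftime returns the month as text ("01", "02", ...); as.numeric turns it into 1, 2, ...
df_demo$Month <- as.numeric(strftime(df_demo$Date, "%m"))

# The per-month mean of Close, which is the quantity the summary line traces.
monthly_mean <- tapply(df_demo$Close, df_demo$Month, mean)
```

Here monthly_mean holds one value per month, and the summary line in the plot connects exactly these per-month means.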

Outcome:

The complete code is shown as follows:

ggplot(df_fb, aes(Month,Close)) +
    geom_point(color="red",alpha=1/2,position = position_jitter(h=0.0,w=0.0))+
    geom_line(stat='summary',fun.y=mean, color="blue",size=1)+
    scale_x_continuous(breaks=seq(0,13,1))+
    ggtitle("Monthly Closing Stock Prices: Facebook")+theme_classic()
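
In ggplot2 3.3.0 and later, the fun.y argument is deprecated in favor of fun. The variant below is a sketch under that assumption, not the book's code. It uses a synthetic stand-in for df_fb (df_demo, with invented prices) and labels the ticks with R's built-in month.abb abbreviations, which is one possible way to show each month on the x-axis.

```r
library(ggplot2)

# Synthetic stand-in for df_fb: three invented closing prices per month.
df_demo <- data.frame(Month = rep(1:12, each = 3))
df_demo$Close <- 100 + 2 * df_demo$Month + rep(c(-1, 0, 1), times = 12)

p_monthly <- ggplot(df_demo, aes(Month, Close)) +
  geom_point(color = "red", alpha = 1/2) +
  # fun replaces fun.y in ggplot2 >= 3.3.0
  geom_line(stat = "summary", fun = mean, color = "blue") +
  # One tick per month, labelled Jan ... Dec
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  ggtitle("Monthly Closing Stock Prices: Facebook") +
  theme_classic()
```

Call print(p_monthly) to draw the plot. On older ggplot2 versions, keep fun.y=mean as in the book's code.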

Activity: Creating a Variable-Encoded Regional Map

Steps for Completion:

  1. Change the State variable in USStates to the same lowercase format used by the region variable in states_map.
  2. Merge the USStates data with states_map.
  3. Use the ggplot options geom_polygon and coord_map to create the map.
  4. For aesthetics, specify x=long, y=lat, group=group, and fill=ObamaVote.

Outcome:

The complete code is shown as follows:

USStates$Statelower <- as.character(tolower(USStates$State))
glimpse(USStates)
us_data <- merge(USStates,states_map,by.x="Statelower",by.y="region")
head(us_data)
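
The code above stops at the merge. The plotting step could look roughly like the sketch below, which rests on several assumptions: states_map was created with map_data("state") (this needs the maps package), coord_map has the mapproj package available, and the installed Lock5Data version ships USStates with an ObamaVote column. The re-sorting line is an addition: merge() does not guarantee that rows stay in drawing order, and geom_polygon connects points in row order.

```r
library(ggplot2)    # map_data(), geom_polygon(), coord_map()
library(Lock5Data)  # USStates

data(USStates)
states_map <- map_data("state")  # assumed source of states_map; needs 'maps'

USStates$Statelower <- tolower(as.character(USStates$State))
us_data <- merge(USStates, states_map, by.x = "Statelower", by.y = "region")

# Restore the drawing order of the polygon vertices after the merge.
us_data <- us_data[order(us_data$order), ]

p_map <- ggplot(us_data,
                aes(x = long, y = lat, group = group, fill = ObamaVote)) +
  geom_polygon(color = "white") +
  coord_map()  # needs 'mapproj'
```

Call print(p_map) to draw the map. If the states look scrambled or crossed by stray lines, the row order is the first thing to check.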

Activity: Studying Correlated Variables

Steps for Completion:

  1. Make a subset of the loan dataset by using some of the following variables:
df3_1 <- df3[,c("funded_amnt","annual_inc","dti","inq_last_6mths",
                "total_acc","total_pymnt_inv")]
  2. Use cor on the preceding loan data subset, and then choose two highly correlated pairs of variables in the loan dataset. Use the following pairs:
total_rec_prncp and total_pymnt_int
funded_amnt and total_pymnt_inv
  3. Make a scatterplot for each of the preceding pairs for grade A, then fit a linear regression model.
  4. Determine the correlations of the preceding pairs.
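
The loan data (df3) is not included here, so the sketch below uses R's built-in mtcars dataset as a stand-in to show the same workflow: a correlation matrix, a filtered subset, a scatterplot with a fitted line, and a linear model. The variable names and the cyl == 4 filter are stand-ins only. With the loan data, the filter would be grade == "A" and the variables would be the pairs listed above.

```r
library(ggplot2)

# Step 1-2 analogue: correlation matrix for a subset of numeric columns.
vars    <- mtcars[, c("mpg", "disp", "hp", "wt")]
cor_mat <- cor(vars)

# Step 3 analogue: restrict to one group (stand-in for grade == "A").
four_cyl <- subset(mtcars, cyl == 4)

# Scatterplot with a least-squares line for one correlated pair.
p_corr <- ggplot(four_cyl, aes(x = wt, y = disp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The same linear regression, fitted explicitly.
fit <- lm(disp ~ wt, data = four_cyl)

# Step 4 analogue: the correlation of the chosen pair within the group.
pair_cor <- cor(four_cyl$wt, four_cyl$disp)
```

Use round(cor_mat, 2) to scan the matrix for highly correlated pairs, summary(fit) to inspect the regression, and print(p_corr) to draw the plot.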

Outcome:

Answer to step 4: The correlations are as follows:

  1. 93%
  2. 85%