You're reading from The Statistics and Machine Learning with R Workshop

Product type: Book
Published in: Oct 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781803240305
Edition: 1st Edition
Concepts: Machine Learning
Author: Liu Peng

Data Visualization with ggplot2

The previous chapter covered intermediate data processing techniques, focusing on dealing with string data. When the raw data has been transformed and processed into a clean and structured shape, we can take the analysis to the next level by visualizing the clean data in a graph, which we aim to accomplish in this chapter.

By the end of this chapter, you will be able to plot standard graphs using the ggplot2 package and add customizations to present excellent visuals.

In this chapter, we will cover the following topics:

Introducing ggplot2
Understanding the grammar of graphics
Geometries in graphics
Controlling themes in graphics

Technical requirements

To complete the exercises in this chapter, you will need to have the latest versions of the following packages:

The ggplot2 package, version 3.3.6. Alternatively, install the tidyverse package and load ggplot2 directly.
The ggthemes package, version 4.2.4.

The versions listed above were the latest available at the time of writing.
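As a quick check before starting, the following sketch reports which of the two packages are available in the current R library and prints their installed versions. The install line is commented out on purpose, and the variable names (pkgs, installed) are illustrative only.

```r
# Uncomment to install from CRAN if either package is missing:
# install.packages(c("ggplot2", "ggthemes"))

pkgs <- c("ggplot2", "ggthemes")

# TRUE for each package that can be loaded from the current library
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report the installed version of each available package
for (pkg in pkgs[installed]) {
  message(pkg, " ", as.character(packageVersion(pkg)))
}
```

Newer versions than the ones listed should generally work for the plots in this chapter, although defaults may differ slightly.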

All the code and data for this chapter are available at https://github.com/PacktPublishing/The-Statistics-and-Machine-Learning-with-R-Workshop/tree/main/Chapter_4.

Introducing ggplot2

Conveying information via graphs tends to be more effective and visually appealing than using tables alone. After all, humans process visual information quickly, such as when recognizing a car in an image. When building machine learning (ML) models, we are often interested in the training and test loss profile: a line chart showing how the training and test set losses decrease as the model is trained for longer. Observing these performance metrics helps us diagnose whether a model is underfitting or overfitting, in other words, whether the current model is too simple or overly complex. Note that the test set is used to approximate a future dataset, so a low test set error suggests that the model will generalize to new data. (Training itself minimizes the loss on the training set, an approach known as empirical risk minimization.) Underfitting refers to the case when the model does poorly in both training and test sets due to insufficient fitting power, while overfitting...

Understanding the grammar of graphics

The previous example contained the three essential layers that need to be specified when plotting a graph: data, aesthetics, and geometries. The primary purpose of each layer is listed as follows:

The data layer specifies the dataset to be plotted. This corresponds to the mtcars dataset we specified earlier.
The aesthetics layer specifies the scale-related items that map the variables to the visual properties of the plot. Examples include the variables to be shown for the x axis and y axis, the size and color, and other plot aesthetics. This corresponds to the cyl and mpg variables we specified earlier.
The geometry layer specifies the visual elements used for the data, such as presenting the data via points, lines, or other forms. The geom_point() command we set in the previous example tells the plot to be shown as a scatter plot.
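The three layers described above can be sketched in a few lines. This is a minimal reconstruction that uses the mtcars dataset, the cyl and mpg variables, and geom_point() mentioned in the text. The exact call used earlier in the chapter may differ.

```r
library(ggplot2)

# Data layer: the mtcars dataset
# Aesthetics layer: cyl mapped to the x axis, mpg mapped to the y axis
p <- ggplot(data = mtcars, mapping = aes(x = cyl, y = mpg)) +
  # Geometry layer: show each observation as a point (scatter plot)
  geom_point()

print(p)
```

Each layer is added with the + operator, so the geometry can be swapped without touching the data or aesthetics.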

Other layers, such as the theme layer, also help beautify the plot, which we will cover later...

Geometries in graphics

The previous section mostly covered scatter plots. In this section, we will go over two additional common types of plots: bar charts and line plots. We will discuss different ways to construct these plots, focusing on the geometries that can be used to control layer-specific visual properties of the graph.

Understanding geometry in scatter plots

Let us revisit the scatter plot and zoom in on the geometry layer. The geometry layer determines how the plot actually looks, which makes it essential to visual communication. At the time of writing, there are over 50 geometries we can choose from, all of which start with the geom_ prefix.

Some overall guidelines apply when deciding which type of geometry to use. For example, the following list contains the possible kinds of applicable geometries for a typical scatter plot:

Point, which visualizes the data as points
Jitter, which adds positional jittering to a scatter plot
Abline, which...
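To illustrate the first two geometries in the list, the sketch below draws the same data once with geom_point() and once with geom_jitter(). The choice of mtcars with cyl and mpg follows the earlier example. The width and height values are assumptions picked for illustration.

```r
library(ggplot2)

# Shared data and aesthetics layers
base <- ggplot(mtcars, aes(x = cyl, y = mpg))

# Point: observations with the same cyl value stack in one vertical line
p_point <- base + geom_point()

# Jitter: add a small random horizontal offset so overlapping points separate;
# height = 0 leaves the mpg values unchanged
p_jitter <- base + geom_jitter(width = 0.2, height = 0)

print(p_point)
print(p_jitter)
```

Jittering is mainly useful when one axis takes only a few distinct values, as cyl does here.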

Controlling themes in graphics

The theme layer specifies all non-data-related properties on the plot, such as the background, legend, axis labels, and so on. Proper control of the themes in the plot could aid visual communication by highlighting critical information and directing users’ attention to the intended message we would like to convey.

There are three types of visual elements controlled by the theme layer, as follows:

Text, used to specify the textual display (for example, color) of the axis label
Line, used to specify the visual properties of the axes such as color and line type
Rectangle, used to control the borders and backgrounds of the plot

All three types are specified using functions that start with element_, including examples such as element_text() and element_line(). We will go over these functions in the following section.
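As a rough illustration of the three element types, the sketch below applies one of each to a scatter plot through theme(). The specific theme arguments chosen here (axis.title, axis.line, panel.background) and the colors are assumptions for demonstration, not necessarily the ones used later in the chapter.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point()

p_themed <- p +
  theme(
    # Text element: color of the axis titles
    axis.title = element_text(color = "blue"),
    # Line element: color and line type of the axis lines
    axis.line = element_line(color = "black", linetype = "dashed"),
    # Rectangle element: background fill and border of the plotting panel
    panel.background = element_rect(fill = "white", color = "grey50")
  )

print(p_themed)
```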

Adjusting themes

The theme layer can be easily applied as an additional layer on the existing graph. Let...

Summary

In this chapter, we introduced essential graphics techniques based on the ggplot2 package. We started by going over the basic scatter plot and learned the grammar of developing layers in a plot. To build, edit, and improve a plot, we need to specify three essential layers: data, aesthetics, and geometries. For example, the geom_point() function used to build a scatter plot allows us to control the size, shape, and color of the points on a graph. We can also label the points with text using the geom_text() function.
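A compact sketch of these two functions follows. The size, shape, and color values are arbitrary, and the model column (built from the row names of mtcars) is a hypothetical helper added here only to supply text labels.

```r
library(ggplot2)

# Hypothetical helper column: car model names taken from the row names
cars <- transform(mtcars, model = rownames(mtcars))

p_labelled <- ggplot(cars, aes(x = cyl, y = mpg)) +
  # Fixed (non-mapped) point properties are set outside aes()
  geom_point(size = 3, shape = 17, color = "steelblue") +
  # Text layer: print each model name slightly above its point
  geom_text(aes(label = model), vjust = -1, size = 3)

print(p_labelled)
```

Because cyl takes only three values, many labels overlap in this example. It is meant to show the layering, not a polished figure.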

We also covered the layer-specific control provided by the geometry layer and showed examples using bar charts and line plots. A bar chart can help represent the frequency distribution of categorical variables and the histogram of continuous variables. A line chart supports time series data and can help identify trends and patterns if appropriately plotted.
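The chart types recapped above can be sketched as follows. The use of mtcars for the bar chart and histogram, the economics dataset bundled with ggplot2 for the line plot, and the bin count are all assumptions chosen for illustration.

```r
library(ggplot2)

# Bar chart: frequency of each category (number of cars per cylinder count)
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Histogram: distribution of a continuous variable, split into 10 bins
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Line plot: a time series from the economics dataset shipped with ggplot2
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

print(p_bar)
print(p_hist)
print(p_line)
```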

Finally, we also covered the theme layer, which allows us to control all non...


Author (1)

Liu Peng

Peng Liu is an Assistant Professor of Quantitative Finance (Practice) at Singapore Management University and an adjunct researcher at the National University of Singapore. He holds a Ph.D. in statistics from the National University of Singapore and has ten years of working experience as a data scientist across the banking, technology, and hospitality industries.
