Data Wrangling with R

Product type: Book
Published: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Pages: 384
Edition: 1st Edition
Author: Gustavo R Santos

Table of Contents (21 chapters)

Preface
Part 1: Load and Explore Data
Chapter 1: Fundamentals of Data Wrangling
Chapter 2: Loading and Exploring Datasets
Chapter 3: Basic Data Visualization
Part 2: Data Wrangling
Chapter 4: Working with Strings
Chapter 5: Working with Numbers
Chapter 6: Working with Date and Time Objects
Chapter 7: Transformations with Base R
Chapter 8: Transformations with Tidyverse Libraries
Chapter 9: Exploratory Data Analysis
Part 3: Data Visualization
Chapter 10: Introduction to ggplot2
Chapter 11: Enhanced Visualizations with ggplot2
Chapter 12: Other Data Visualization Options
Part 4: Modeling
Chapter 13: Building a Model with R
Chapter 14: Build an Application with Shiny in R
Conclusion
Other Books You May Enjoy

What is data wrangling?

Data wrangling is the process of modifying, cleaning, organizing, and transforming data from one given state to another, with the objective of making it more appropriate for use in analytics and data science.

This concept is also referred to as data munging, and both terms refer to the act of changing, manipulating, transforming, and augmenting your dataset.

I bet you’ve already performed data wrangling. It is a common task for all of us. Since primary school, we have been taught to draw up a table and tally counts to organize people’s opinions into a dataset. If you are familiar with MS Excel or similar tools, remember all the times you have sorted, filtered, or added columns to a table, not to mention all of the lookups you may have performed. All of that is part of the data-wrangling process. Any task performed to improve the data and make it more suitable for analysis can be considered data wrangling.
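To make those spreadsheet habits concrete, here is a minimal sketch of the same four operations (sort, filter, add a column, look up) in base R. The data frames `opinions` and `cities`, their columns, and every value in them are hypothetical, invented only for illustration; this is one possible way to express these steps, not the book's own code.

```r
# Hypothetical survey data; all names and values are made up for illustration.
opinions <- data.frame(
  person  = c("Ana", "Ben", "Caio", "Dina", "Eli"),
  city_id = c(2, 1, 2, 3, 1),
  score   = c(4, 2, 5, 3, 4)
)

# Hypothetical lookup table, playing the role of the range a spreadsheet
# lookup would search.
cities <- data.frame(
  city_id = c(1, 2, 3),
  city    = c("Lisbon", "Porto", "Faro")
)

# Sort: order the rows by score, highest first.
sorted <- opinions[order(-opinions$score), ]

# Filter: keep only the rows with a score of 4 or more.
high <- opinions[opinions$score >= 4, ]

# Add a column: flag whether each opinion is favorable.
opinions$favorable <- opinions$score >= 4

# Lookup: attach the city name to each row by matching on city_id.
looked_up <- merge(opinions, cities, by = "city_id")

print(sorted)
print(high)
print(looked_up)
```

Each step returns a new data frame (or adds a column) rather than editing cells by hand, which is what makes the same wrangling repeatable on new data.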

As a data scientist, you will constantly be handed different kinds of data, with the mission of transforming each dataset into insights that will, consequently, form the basis for business decisions. Unlike a few years ago, when the majority of data arrived in a structured form such as text or tables, nowadays data can come in many other forms, including unstructured formats such as video, audio, or even a combination of those. Thus, most of the time, data will not arrive ready to use and will require some effort to get it into a usable state, sometimes more than others.

Figure 1.1 – Data before and after wrangling

Figure 1.1 is a visual representation of data wrangling. On the left-hand side, we see three kinds of data points mixed together; after sorting and tabulating, the data is much easier to analyze.
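The idea behind the figure can be sketched in a few lines of base R. The vector `data_points` and its three kinds of values are hypothetical stand-ins for the mixed points on the left-hand side of the figure; they are assumptions for illustration, not data from the book.

```r
# Hypothetical mixed data points of three kinds, in no particular order.
data_points <- c("circle", "square", "triangle", "square",
                 "circle", "circle", "triangle")

# Sort so that points of the same kind sit next to each other.
sorted_points <- sort(data_points)

# Tabulate: count how many points of each kind there are,
# and store the result as a small table with named columns.
tally <- as.data.frame(table(kind = sorted_points), responseName = "count")

print(sorted_points)
print(tally)
```

The resulting `tally` has one row per kind and a `count` column, which mirrors the tidy table on the right-hand side of the figure.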

A wrangled dataset is easier to understand and to work with, paving the way for better analysis and modeling, as we will see in the next section, which explains why data wrangling is important to a data science project.
