You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author (1)
Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a Data Scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on Data Science concepts at an online school. He has a background in Marketing, is certified as a Data Scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Preparing data for modeling in R

We must wrangle the data to prepare it for modeling. Since we know where we want to go at the end of this project, the next step is a matter of finding a way to get there.

First, we load the libraries used for wrangling and modeling the data: tidyverse for data wrangling and visualization, skimr for a descriptive statistics summary, patchwork for placing graphics side by side, randomForest to create the model, caret to create the confusion matrix, and ROCR to plot the ROC curve of model performance.
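
As a minimal sketch of that loading step, the snippet below attaches the six packages named above. The check for missing packages is an addition for illustration and is not taken from the book; the variable names `pkgs` and `not_installed` are hypothetical.

```r
# Packages named in the text above.
pkgs <- c("tidyverse", "skimr", "patchwork", "randomForest", "caret", "ROCR")

# Assumption (not from the book): report packages that are not installed
# instead of stopping at the first failing library() call.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) > 0) {
  message("Install these packages first: ", paste(not_installed, collapse = ", "))
}

# Attach every package that is available.
invisible(lapply(pkgs[is_installed], library, character.only = TRUE))
```

If all six packages are already installed, this is equivalent to calling `library()` once per package.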

To load the dataset, the best option is to pull it directly from the internet, without the need to save it locally on our machine. Just use the read_csv() function and point to the web address where the raw dataset is located, as we’ve done previously in this book. Here, we are using the trim_ws=TRUE argument to trim any unwanted white spaces and the col_names=headers argument, where...
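
The `read_csv()` call described above might look like the sketch below. The excerpt does not show the dataset's web address or its column names, so both are stand-ins here: `headers` holds hypothetical names, and a small temporary file plays the role of the remote file. With a real dataset, the URL string would be passed to `read_csv()` in place of `csv_path`.

```r
library(readr)

# Hypothetical column names; the real headers are not shown in this excerpt.
headers <- c("id", "label", "value")

# Stand-in for the remote file: a header-less CSV with stray spaces around fields.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("1,  alpha ,0.5", "2, beta,1.25"), csv_path)

# col_names supplies the header row the file lacks;
# trim_ws = TRUE strips leading and trailing white space from each field.
df <- read_csv(csv_path, col_names = headers, trim_ws = TRUE)

print(df)
```

Because the names come from `col_names`, the first line of the file is read as data rather than as a header.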
