
You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author: Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a data scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also occasionally lectures on data science concepts at an online school. He has a background in marketing, is certified as a data scientist by Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Summary

In this chapter, we worked through an exploratory data analysis (EDA) project, from loading the data into RStudio to writing an analysis report.

After loading the data, we examined the shape of the dataset and its data types, and we converted some variables to factors. Next, we cleaned the data of missing values and moved on to exploration and visualization. This began with a check of the descriptive statistics, followed by a look at the distributions of the data and outlier detection. We then looked at a bivariate chart and a pair plot showing correlations and scatterplots, which help us understand the relationships between the variables and get a first sense of which ones are best suited for modeling.
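The workflow recapped above can be sketched in base R. This is a minimal sketch, not the book's code: it assumes the built-in `airquality` dataset as a stand-in for the chapter's data and uses only base functions (`str()`, `factor()`, `na.omit()`, `summary()`, `boxplot.stats()`, `pairs()`, `cor()`); the chapter itself may use other packages for its charts.

```r
# Minimal EDA sketch. Assumption: the built-in airquality dataset
# stands in for the chapter's data.
data("airquality")
df <- airquality

# Shape and data types
dim(df)
str(df)

# Convert a categorical variable to a factor (Month is stored as 5..9)
df$Month <- factor(df$Month, labels = month.abb[5:9])

# Remove rows with missing values (Ozone and Solar.R contain NAs)
df <- na.omit(df)

# Descriptive statistics
summary(df)

# Distribution and outliers of one variable
hist(df$Ozone, main = "Ozone distribution", xlab = "Ozone (ppb)")
boxplot(df$Ozone, main = "Ozone")
# Values beyond the boxplot whiskers (default coef = 1.5)
ozone_outliers <- boxplot.stats(df$Ozone)$out
print(ozone_outliers)

# Pairwise relationships: scatterplot matrix plus correlation matrix
num_vars <- df[, c("Ozone", "Solar.R", "Wind", "Temp")]
pairs(num_vars)
round(cor(num_vars), 2)
```

Dropping incomplete rows with `na.omit()` is the simplest cleaning choice; imputation is an alternative when too many rows would be lost.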

Next, we posed questions to guide the exploration and answered each one with data and statistical tests. Finally, to close the chapter, we presented an example analysis report that highlights the findings in text form.
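The question-driven step can be illustrated in the same way. The question below (is ozone higher in August than in May?) is hypothetical and uses the built-in `airquality` dataset rather than the chapter's data. The Welch two-sample t-test from base R's `t.test()` is one reasonable choice of test, not necessarily the one used in the chapter.

```r
# Hypothetical question: is ozone higher in August than in May?
# Assumption: the built-in airquality dataset stands in for the chapter's data.
data("airquality")

# Answer with data first: mean ozone per month (rows with NA Ozone are dropped)
aggregate(Ozone ~ Month, data = airquality, FUN = mean)

# Then with a statistical test: Welch two-sample t-test, May (5) vs. August (8)
may_aug <- subset(airquality, Month %in% c(5, 8))
ozone_test <- t.test(Ozone ~ Month, data = may_aug)
print(ozone_test)

# A small p-value (for example, below 0.05) suggests the monthly means differ
ozone_test$p.value
```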
