Reader small image

You're reading from  Data Wrangling with R

Product typeBook
Published inFeb 2023
PublisherPackt
ISBN-139781803235400
Edition1st Edition
Concepts
Right arrow
Author (1)
Gustavo R Santos
Gustavo R Santos
author image
Gustavo R Santos

Gustavo R Santos has worked in the Technology Industry for 13 years, improving processes, and analyzing datasets and creating dashboards. Since 2020, he has been working as a Data Scientist in the retail industry, wrangling, analyzing, visualizing and modeling data with the most modern tools like R, Python and Databricks. Gustavo also gives lectures from time to time at an online school about Data Science concepts. He has a background in Marketing, is certified as Data Scientist by the Data Science Academy Brazil and pursues his specialist MBA in Data Science at the University of São Paulo
Read more about Gustavo R Santos

Right arrow

Exploring and visualizing the data

The exploration phase of an EDA is the main portion of it, naturally. In this section, the idea is to take a thorough look at the variables, understand their distributions, start creating some questions that will lead the exploration, and use the data to answer them.

Univariate analysis

The first step to take concerns univariate analysis—looking at one variable at a time. The best approach is to create some histograms to look at the distribution of the variables. According to Hair Jr. et al. (2019), plotting the variables’ distributions and looking at their shape is a good point to start understanding the nature of those variables.

In the next code snippet, we will loop through all the numeric variables and plot one histogram for each. It starts with a for loop to iterate over each variable in the column names that presents the numeric type (there is the importance of knowing the data types, from previous sections). If it is...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Data Wrangling with R
Published in: Feb 2023Publisher: PacktISBN-13: 9781803235400

Author (1)

author image
Gustavo R Santos

Gustavo R Santos has worked in the Technology Industry for 13 years, improving processes, and analyzing datasets and creating dashboards. Since 2020, he has been working as a Data Scientist in the retail industry, wrangling, analyzing, visualizing and modeling data with the most modern tools like R, Python and Databricks. Gustavo also gives lectures from time to time at an online school about Data Science concepts. He has a background in Marketing, is certified as Data Scientist by the Data Science Academy Brazil and pursues his specialist MBA in Data Science at the University of São Paulo
Read more about Gustavo R Santos