
You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author (1)
Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a data scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on data science concepts at an online school. He has a background in marketing, is certified as a data scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Machine learning concepts

Before we move on to the project itself, let’s build some background on machine learning concepts. This content is outside the main scope of this book, so we will quickly go over a couple of definitions to put us on the same page for the remainder of the book.

A model is a representation of a theory (Hair Jr. et al., 2019), but it is also defined as a simplification or approximation of reality (Burnham & Anderson, 2002). In other words, modeling data involves finding patterns that help us explain a response, that is, the most probable outcome for a given observation.

That said, a model only reflects the data it receives. For that reason, it is crucial that the input data is clean and representative of the reality we are trying to model. As an example, think of a dataset with so many missing values that they must be either removed or imputed. Both approaches will certainly have an impact on the...
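To make the point about missing values concrete, here is a minimal sketch in Python using only the standard library. The sample values are hypothetical and are not taken from the book; the sketch simply compares removing missing observations with filling them in using the mean of the observed values, one common imputation choice among many.

```python
from statistics import mean, stdev

# Hypothetical sample with missing values marked as None (illustrative only)
values = [10.0, 12.0, None, 11.0, None, 30.0, 9.0]

# Option 1: remove the observations that are missing
dropped = [v for v in values if v is not None]

# Option 2: impute the missing observations with the mean of the observed ones
fill_value = mean(dropped)
imputed = [fill_value if v is None else v for v in values]

# Compare sample size, mean, and spread under each approach
print("dropped:", len(dropped), mean(dropped), stdev(dropped))
print("imputed:", len(imputed), mean(imputed), stdev(imputed))
```

Under these assumptions, removal shrinks the sample, while mean imputation keeps every row and the same mean but reduces the standard deviation, because the filled-in values sit exactly at the center of the data. Either way, the data that reaches a model is no longer the data that was collected.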
