You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author (1)
Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a data scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on data science concepts at an online school. He has a background in marketing, is certified as a data scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Summary

In this chapter, we built an end-to-end machine learning project. We started by reviewing some basic machine learning concepts to establish a common footing. Then, we worked out what the main goal of the project required: understanding the problem and knowing where we want to go makes the solution clearer. In this case, our client was a digital marketing company that wanted to reduce the risk of its messages ending up in spam filters, so we had to create a classification model to predict the probability of a message being marked as spam.

We loaded a dataset from UCI that records, for each email, the percentage of certain words and characters associated with spam messages. Then, we explored the data and created some visualizations to learn which of those elements were most strongly associated with messages classified as spam. From those, we created a new dataset with just six explanatory variables, down from the original 57 columns.

Next, we trained and tested...
