
You're reading from The Statistics and Machine Learning with R Workshop

Product type: Book
Published in: Oct 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781803240305
Edition: 1st Edition
Author: Liu Peng

Peng Liu is an Assistant Professor of Quantitative Finance (Practice) at Singapore Management University and an adjunct researcher at the National University of Singapore. He holds a Ph.D. in statistics from the National University of Singapore and has ten years of working experience as a data scientist across the banking, technology, and hospitality industries.

EDA fundamentals

When facing a new dataset in tabular form, such as an Excel spreadsheet or a DataFrame, EDA helps us gain insight into the underlying patterns and irregularities of the variables in the dataset. This is an important first step before building any predictive model. As the saying goes, garbage in, garbage out. When the input variables used for model development suffer from problems such as missing values or very different scales, the resulting model may perform poorly, converge slowly, or even fail with an error during training. Understanding your data and making sure the raw materials are in good shape are therefore critical steps toward a well-performing model later on.

This is where EDA comes in. Rather than being a rigid statistical procedure, EDA is a set of exploratory analyses that helps you better understand the features in the data and the potential relationships among them. It serves as a transitional analysis that guides later modeling, involving...
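
A minimal base-R sketch can surface two of the problems named earlier: missing values and variables on very different scales. It uses the built-in airquality data frame as a stand-in for a new dataset, which is an assumption for illustration and not necessarily the chapter's data. The per-column summary and the standardization with scale() are illustrative choices, not the author's own code.

```r
# Minimal EDA sketch in base R (assumption: airquality stands in for a new dataset)
data(airquality)

# 1. Structure: number of rows and columns, plus each column's type
str(airquality)

# 2. Missing values: count the NAs in each column
na_counts <- colSums(is.na(airquality))
print(na_counts)

# 3. Scale: compare the min, max, and standard deviation of each numeric column
numeric_cols <- airquality[sapply(airquality, is.numeric)]
scale_summary <- data.frame(
  min = sapply(numeric_cols, min, na.rm = TRUE),
  max = sapply(numeric_cols, max, na.rm = TRUE),
  sd  = sapply(numeric_cols, sd,  na.rm = TRUE)
)
print(round(scale_summary, 2))

# 4. One common remedy for differing scales: standardize each column
#    to mean 0 and standard deviation 1 (scale() skips NAs when it
#    computes each column's centering and scaling values)
standardized <- as.data.frame(scale(numeric_cols))
print(round(colMeans(standardized, na.rm = TRUE), 10))
```

In this dataset every column is numeric, so Month and Day are standardized too. That is not meaningful, and in practice you would restrict the scaling step to genuinely continuous variables and decide separately how to handle the missing values.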
