Reader small image

You're reading from  The Applied Data Science Workshop - Second Edition

Product typeBook
Published inJul 2020
Reading LevelIntermediate
PublisherPackt
ISBN-139781800202504
Edition2nd Edition
Languages
Tools
Concepts
Right arrow
Author (1)
Alex Galea
Alex Galea
author image
Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a masters degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Read more about Alex Galea

Right arrow

Assessing Models with k-Fold Cross Validation

Thus far, we have trained models on a subset of the data and then assessed performance on the unseen portion, called the test set. This is good practice because the model's performance on data that's used for training is not a good indicator of its effectiveness as a predictor. It's very easy to increase accuracy on a training dataset by overfitting a model, which results in a poorer performance on unseen data.

That being said, simply training models on data that's been split in this way is not good enough. There is a natural variance in data that causes accuracies to be different (if even slightly), depending on the training and test splits. Furthermore, using only one training/test split to compare models can introduce bias toward certain models and lead to overfitting.

k-Fold cross validation offers a solution to this problem and allows the variance to be accounted for by way of an error estimate on each accuracy...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
The Applied Data Science Workshop - Second Edition
Published in: Jul 2020Publisher: PacktISBN-13: 9781800202504

Author (1)

author image
Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a masters degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Read more about Alex Galea