Reader small image

You're reading from  The Applied Data Science Workshop - Second Edition

Product typeBook
Published inJul 2020
Reading LevelIntermediate
PublisherPackt
ISBN-139781800202504
Edition2nd Edition
Languages
Tools
Concepts
Right arrow
Author (1)
Alex Galea
Alex Galea
author image
Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a masters degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Read more about Alex Galea

Right arrow

Introduction

As we've seen in the previous chapters, it's easy to train models with scikit-learn using just a few lines of Python code. This is possible by abstracting away the computational complexity of the algorithm, including details such as constructing cost functions and optimizing model parameters. In other words, we deal with a black box where the internal operations are hidden from us.

While the simplicity offered by this approach is quite nice on the surface, it does nothing to prevent the misuse of algorithms—for example, by selecting the wrong model for a dataset, overfitting on the training set, or failing to test properly on unseen data.

In this chapter, we'll show you how to avoid some of these pitfalls while training classification models and equip you with the tools to produce trustworthy results. We'll introduce k-fold cross validation and validation curves, and then look at ways to use them in Jupyter.

We'll also introduce...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
The Applied Data Science Workshop - Second Edition
Published in: Jul 2020Publisher: PacktISBN-13: 9781800202504

Author (1)

author image
Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a masters degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Read more about Alex Galea