Learn Python by Building Data Science Applications

Product type: Book
Published: Aug 2019
Publisher: Packt
ISBN-13: 9781789535365
Pages: 482
Edition: 1st Edition
Authors: Philipp Kats, David Katz

Chapter 14

What is overfitting?

Many ML models (for example, decision trees) actively adjust themselves to perform well on the training set at hand. At some point, however, this process goes beyond learning generalizable patterns that are valuable for the task and starts capturing noise and quirks specific to the training data, which are irrelevant to the test set. This extra fit is not only meaningless but also hurts the model's performance on new data. This phenomenon is known as overfitting, and there are ways to overcome it.
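
To make this concrete, here is a minimal sketch using only the standard library and a made-up dataset: noisy points along y = 2x + 1. The data, the noise level, and the function names (`make_data`, `fit_linear`, `fit_memorizer`, `mse`) are assumptions for illustration. A 1-nearest-neighbour "memorizer" stands in for a model that fits the training set perfectly, and it is compared with a plain linear fit.

```python
import random

def make_data(n, rng):
    """Hypothetical toy dataset: noisy samples of the line y = 2x + 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 2.0) for x in xs]
    return xs, ys

def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """1-nearest-neighbour: returns the label of the closest training point,
    so it reproduces the training set (noise included) exactly."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(predict, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(200, rng)

results = {}
for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:>9}: train MSE={results[name][0]:.2f}, "
          f"test MSE={results[name][1]:.2f}")
```

The memorizer scores a perfect 0 on the training set, because each training point is its own nearest neighbour, yet its test error is noticeably higher than the linear model's: it learned the noise along with the signal. That gap between training and test performance is the usual symptom of overfitting.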

Why should we use cross-validation?

Cross-validation is a technique aimed at detecting and countering overfitting. In its basic form (k-fold cross-validation), it splits the training set into k folds, trains k models with the same settings, each on a different combination of k-1 folds, measures each model's performance on the fold it did not see, and then averages the performance across all models. As a result, this sampling and prediction on the...
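
In practice you would likely use scikit-learn's `KFold` or `cross_val_score` for this. The sketch below instead spells out the k-fold mechanics in plain Python so each step is visible. The helper names (`k_fold_indices`, `cross_validate`), the toy dataset, and the use of mean squared error as the score are assumptions for illustration, not the book's code.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(predict, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, xs, ys, k=5, seed=0):
    """Train k models with identical settings, each on k-1 folds,
    score each on its held-out fold, and return the k fold scores."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model,
                          [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return scores

# Hypothetical toy data: noisy samples of y = 2x + 1.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 2.0) for x in xs]

scores = cross_validate(fit_linear, xs, ys, k=5)
print("fold MSEs:", [round(s, 2) for s in scores])
print("mean CV MSE:", round(sum(scores) / len(scores), 2))
```

Every row lands in exactly one validation fold, so each data point is used for evaluation once and for training k-1 times. Averaging the fold scores gives a less noisy estimate of performance on unseen data than a single train/test split would.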