You're reading from Learn Python by Building Data Science Applications

Product type: Book

Published: Aug 2019

Publisher: Packt

ISBN-13: 9781789535365

Pages: 482

Edition: 1st Edition

Languages

Python

Concepts

Application Development

Authors (2):

Philipp Kats

David Katz

Table of Contents (26) Chapters

Preface

Section 1: Getting Started with Python

Preparing the Workspace

First Steps in Coding - Variables and Data Types

Functions

Data Structures

Loops and Other Compound Statements

First Script – Geocoding with Web APIs

Scraping Data from the Web with Beautiful Soup 4

Simulation with Classes and Inheritance

Shell, Git, Conda, and More – at Your Command

Section 2: Hands-On with Data

Python for Data Applications

Data Cleaning and Manipulation

Data Exploration and Visualization

Training a Machine Learning Model

Improving Your Model – Pipelines and Experiments

Section 3: Moving to Production

Packaging and Testing with Poetry and PyTest

Data Pipelines with Luigi

Let's Build a Dashboard

Serving Models with a RESTful API

Serverless API Using Chalice

Best Practices and Python Performance

Assessments

Other Books You May Enjoy

Leave a review - let other readers know what you think

Chapter 14

What is overfitting?

Many ML models (for example, decision trees) fit the training set at hand so aggressively that, past a certain point, they stop picking up generalizable knowledge that's valuable for the task and start learning details specific to the training data that are irrelevant to the test set. Those details are not only meaningless; they also hurt the model's performance on other data. This phenomenon is known as overfitting, and there are ways to overcome it.
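The gap between training and test performance can be illustrated with a minimal, standard-library-only sketch. It is not the book's code: a 1-nearest-neighbour "memorizer" stands in for an unconstrained decision tree, and the synthetic data, the 20% label noise, and every function name below are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n, noise=0.2):
    """Hypothetical data: x in [0, 1], true rule is x > 0.5, some labels flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        label = x > 0.5
        if random.random() < noise:
            label = not label  # label noise: nothing generalizable to learn here
        data.append((x, label))
    return data


def predict_memorizer(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def predict_simple(x):
    """A deliberately simple rule that ignores the noise (given, not fitted)."""
    return x > 0.5


def accuracy(predict, data):
    """Share of points whose predicted label matches the recorded label."""
    return sum(predict(x) == label for x, label in data) / len(data)


train, test = make_data(200), make_data(200)

memorizer_train = accuracy(lambda x: predict_memorizer(train, x), train)
memorizer_test = accuracy(lambda x: predict_memorizer(train, x), test)
simple_test = accuracy(predict_simple, test)

print(f"memorizer:   train={memorizer_train:.2f}  test={memorizer_test:.2f}")
print(f"simple rule: test={simple_test:.2f}")
```

The memorizer reproduces every training label, including the flipped ones, so its training accuracy is perfect while its accuracy on fresh data drops. That gap is the symptom of overfitting described above.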

Why should we use cross-validation?

Cross-validation is a technique aimed at overcoming the issue of overfitting. In its basic form, it splits the training set into multiple folds, trains multiple models with the same settings on different combinations of those folds, measures each model's performance on the folds it was not trained on, and then averages the performance across all models. As a result, this sampling and prediction on the...
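The basic procedure described above (split into folds, train on all but one, score on the held-out fold, average) can be sketched in plain Python. This is an illustrative sketch only, not the chapter's implementation: the helper names `k_fold_indices` and `cross_validate`, the toy mean-predictor model, and the synthetic data are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, fit, score, k=5):
    """Train k models with the same settings; score each on its held-out fold."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for held_out in range(k):
        validation = [data[i] for i in folds[held_out]]
        training = [
            data[i]
            for j, fold in enumerate(folds)
            if j != held_out
            for i in fold
        ]
        model = fit(training)  # fitted only on the other k - 1 folds
        scores.append(score(model, validation))
    return sum(scores) / k, scores


# Hypothetical toy model: always predict the mean target seen during training.
def fit_mean(rows):
    return sum(y for _, y in rows) / len(rows)


def mean_absolute_error(model, rows):
    return sum(abs(y - model) for _, y in rows) / len(rows)


data = [(x, 2 * x + 1) for x in range(20)]  # synthetic (feature, target) pairs
mean_score, fold_scores = cross_validate(data, fit_mean, mean_absolute_error, k=5)

print("per-fold MAE:", [round(s, 2) for s in fold_scores])
print("average MAE: ", round(mean_score, 2))
```

Because every row is used for validation exactly once and never by the model that was trained on it, the averaged score depends less on one lucky or unlucky split than a single train/test division would.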

Authors (2)

Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having earned a bachelor's degree in architectural design and followed the (at first) rocky path of a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.

David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
