
You're reading from The Applied Data Science Workshop - Second Edition

Product type: Book
Published in: Jul 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800202504
Edition: 2nd Edition
Author: Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a master's degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex currently works in web data analytics, where Python continues to play a key role. He blogs frequently about data-centric projects that involve Python and Jupyter Notebooks.

Introduction

In the previous chapters, we walked through the steps that we need to take in a data science project before we can train a machine learning model. This included the planning phase, that is, identifying business problems, assessing data sources for suitability, and deciding on modeling approaches.

Having decided on a general modeling approach, we should be careful to avoid the common pitfalls of training ML models as we proceed. First, remember that training data is very important. In fact, increasing the amount of training data can have a larger impact on scoring performance than model selection. One issue is that there may not be enough data available, which can make patterns difficult to find and cause models to perform poorly on test data. Data quality also has a large effect on model performance. Possible issues include the following:

  • Non-representative training data (sampling bias)
  • Errors in the record sets (such as recorded...
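The claim that more training data can matter more than model selection can be checked empirically with a learning curve: train on increasing fractions of the data and track the cross-validated score at each size. The following is a minimal sketch, assuming scikit-learn is installed. The synthetic dataset from `make_classification` stands in for real project data, and the two models, the training-set fractions, and the `results` dictionary are illustrative choices, not code from the book.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real dataset (assumption: binary classification task)
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)

# Two illustrative model choices to compare against the effect of data volume
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Fractions of the available training fold to train on
train_sizes = np.linspace(0.05, 1.0, 5)

results = {}
for name, model in models.items():
    # 5-fold cross-validation at each training-set size
    sizes, train_scores, test_scores = learning_curve(
        model, X, y, train_sizes=train_sizes, cv=5, scoring="accuracy"
    )
    # Mean validation accuracy across the folds for each training-set size
    results[name] = (sizes, test_scores.mean(axis=1))

for name, (sizes, scores) in results.items():
    print(name)
    for n, s in zip(sizes, scores):
        print(f"  n_train={int(n):5d}  validation accuracy={s:.3f}")
```

If validation accuracy is still rising at the largest training size, collecting more data is likely to help; if both models plateau at a similar level, further gains probably depend on better features or better data quality rather than on more rows.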
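Sampling bias, the first issue listed above, can often be spotted by comparing category shares in the training sample against a known reference population. The following sketch uses only the Python standard library. The customer segments, their shares, and the helper functions `proportions` and `max_share_gap` are hypothetical, and the biased sample simulates a collection process that never reaches rural customers.

```python
import random
from collections import Counter


def proportions(labels):
    """Return each category's share of the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def max_share_gap(sample, population):
    """Largest absolute difference in category share between sample and population."""
    p_pop = proportions(population)
    p_smp = proportions(sample)
    categories = set(p_pop) | set(p_smp)
    # A category missing from one side counts as a share of 0.0
    return max(abs(p_smp.get(c, 0.0) - p_pop.get(c, 0.0)) for c in categories)


rng = random.Random(42)

# Hypothetical population: customer segments with known shares (60% / 30% / 10%)
population = ["urban"] * 6000 + ["suburban"] * 3000 + ["rural"] * 1000

# Random sample: every record is equally likely to be drawn
random_sample = rng.sample(population, 1000)

# Biased sample: simulates a collection channel that never reaches rural customers
reachable = [x for x in population if x != "rural"]
biased_sample = rng.sample(reachable, 1000)

print("random sample gap:", round(max_share_gap(random_sample, population), 3))
print("biased sample gap:", round(max_share_gap(biased_sample, population), 3))
```

A large gap does not prove a model will fail, but it flags segments that the model may rarely or never see during training.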