
You're reading from Hands-On Predictive Analytics with Python

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789138719
Edition: 1st Edition
Author: Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the 'Big Three' global management consulting firms, leading advanced analytics projects in different industries like banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through different platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Training versus testing error

The point of splitting the dataset into training and testing sets was to simulate the situation of using the model to make predictions on data it has not seen. As we said before, the goal is to generalize what we have learned from the observed data. The training MSE (or any metric calculated on the training dataset) may give us a biased view of the performance of our model, especially because of the possibility of overfitting: performance metrics computed on the training dataset tend to be too optimistic. Let's take a look again at our illustration of overfitting:

If we calculate the training MSE for these three cases, we will definitely get the lowest one (and hence, apparently, the best) for the third model, the polynomial of degree 16; as we can see, this model passes through many of the points, making the error at those points exactly 0. However...
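To make the comparison concrete, here is a minimal sketch using only NumPy. It assumes a synthetic dataset (a noisy sine curve), a random split into 20 training and 40 testing points, and polynomial degrees 1, 3, and 16. Degrees 1 and 3 are stand-ins for the first two models in the illustration, not necessarily the book's exact setup. `Polynomial.fit` from `numpy.polynomial` performs the least-squares fit; `mse` is a hypothetical helper defined here.

```python
import numpy as np
from numpy.polynomial import Polynomial


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical data: a noisy sine curve (an assumption, not the book's dataset).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Randomly split the observations: 20 for training, the remaining 40 for testing.
idx = rng.permutation(x.size)
train, test = idx[:20], idx[20:]

results = {}
for degree in (1, 3, 16):
    # Least-squares polynomial fit that only ever sees the training points.
    model = Polynomial.fit(x[train], y[train], deg=degree)
    results[degree] = {
        "train_mse": mse(y[train], model(x[train])),
        "test_mse": mse(y[test], model(x[test])),
    }

for degree, r in results.items():
    print(f"degree {degree:2d}: train MSE = {r['train_mse']:.4f}, "
          f"test MSE = {r['test_mse']:.4f}")
```

Because each higher-degree polynomial family contains the lower-degree ones, least squares guarantees that the degree-16 fit has the smallest training MSE; with 17 coefficients and only 20 training points it nearly interpolates them. Its test MSE, however, is computed on points the model never saw, and that is exactly the gap the training metric hides.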
