scikit-learn Cookbook - Second Edition

Product type Book
Published in Nov 2017
Publisher Packt
ISBN-13 9781787286382
Pages 374 pages
Edition 2nd Edition
Author Trent Hauck

Table of Contents (13 chapters)

Preface
1. High-Performance Machine Learning – NumPy
2. Pre-Model Workflow and Pre-Processing
3. Dimensionality Reduction
4. Linear Models with scikit-learn
5. Linear Models – Logistic Regression
6. Building Models with Distance Metrics
7. Cross-Validation and Post-Model Workflow
8. Support Vector Machines
9. Tree Algorithms and Ensembles
10. Text and Multiclass Classification with scikit-learn
11. Neural Networks
12. Create a Simple Estimator

Balanced cross-validation

When you split a dataset into folds, you might wonder: couldn't the subsets in each fold of k-fold cross-validation differ substantially from one another? The distribution of the data, and of the target in particular, could vary from fold to fold, and these differences could make the scores volatile.

Stratified cross-validation addresses this. Each subset of the dataset looks like a smaller version of the whole dataset, at least with respect to the target variable.
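To make the contrast concrete, here is a minimal sketch that compares plain and stratified splitting on an imbalanced target. The arrays `X_demo` and `y_demo` and the helper `minority_share` are hypothetical names introduced only for this illustration; they do not come from the book. The sketch assumes scikit-learn's `KFold` and `StratifiedKFold` from `sklearn.model_selection`, both left unshuffled.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced target: 90 samples of class 0 followed by 10 of class 1.
y_demo = np.array([0] * 90 + [1] * 10)
X_demo = np.zeros((100, 1))  # feature values do not affect how the folds are built


def minority_share(splitter, X, y):
    """Return the fraction of class-1 samples in each test fold."""
    return [float(np.mean(y[test_idx] == 1)) for _, test_idx in splitter.split(X, y)]


# Plain k-fold cuts the data into consecutive blocks, ignoring the labels.
plain_shares = minority_share(KFold(n_splits=5), X_demo, y_demo)

# Stratified k-fold keeps the class proportions roughly equal in every fold.
stratified_shares = minority_share(StratifiedKFold(n_splits=5), X_demo, y_demo)

print("KFold:          ", plain_shares)
print("StratifiedKFold:", stratified_shares)
```

With the data sorted by class, the unshuffled plain split leaves most test folds without any minority samples, while each stratified fold holds about the same 10% share as the full dataset.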

Getting ready

Create a toy dataset as follows:

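import numpy as np
# Eight samples with two features each; the last four rows repeat the first four
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [1, 2], [3, 4], [5, 6], [7, 8]])
# Balanced target: four samples labeled 1 and four labeled 2
y = np.array([1, 1, 1, 1, 2, 2, 2, 2])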
...
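The rest of the recipe is not shown here, so the following is only a sketch of how the toy dataset might be split with stratification, not necessarily the steps the book takes. It assumes scikit-learn's `StratifiedKFold` with `n_splits=2`; the fold count and the `fold_counts` variable are choices made for this illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy dataset from the "Getting ready" step, repeated so the block runs on its own.
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 1, 1, 1, 2, 2, 2, 2])

# Assumption: two folds, so each test fold receives half of each class.
skf = StratifiedKFold(n_splits=2)

fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count how many samples of each class ended up in this test fold.
    labels, counts = np.unique(y[test_idx], return_counts=True)
    fold_counts.append(dict(zip(labels.tolist(), counts.tolist())))
    print("train:", train_idx, "test:", test_idx)

print(fold_counts)
```

Each test fold should contain two samples of class 1 and two of class 2, mirroring the 50/50 balance of the full target.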