scikit-learn Cookbook - Second Edition

Learn to use scikit-learn operations and functions for Machine Learning and deep learning applications.

Julian Avila, Trent Hauck

Book Details

ISBN-13: 9781787286382
Paperback: 374 pages

Book Description

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. This book includes walkthroughs of and solutions to common as well as not-so-common problems in machine learning, and shows how scikit-learn can be leveraged to perform various machine learning tasks effectively.

The second edition begins by taking you through recipes for evaluating the statistical properties of data and generating synthetic data for machine learning modelling. As you progress through the chapters, you will come across recipes that teach you to implement techniques such as data pre-processing, linear regression, logistic regression, k-NN, Naïve Bayes, classification, decision trees, ensembles, and much more. Furthermore, you’ll learn to optimize your models with multi-class classification, cross-validation, and model evaluation, and dive deeper into implementing feed-forward neural networks with scikit-learn. Along with covering the enhanced model selection features, the API, and new features such as classifiers, regressors, and estimators, the book also contains recipes on evaluating and fine-tuning the performance of your model.

By the end of this book, you will have explored a plethora of features offered by scikit-learn for Python to tackle the machine learning problems you come across.

Table of Contents

Chapter 1: High-Performance Machine Learning – NumPy
Introduction
NumPy basics
Loading the iris dataset
Viewing the iris dataset
Viewing the iris dataset with Pandas
Plotting with NumPy and matplotlib
A minimal machine learning recipe – SVM classification
Introducing cross-validation
Putting it all together
Machine learning overview – classification versus regression
Chapter 2: Pre-Model Workflow and Pre-Processing
Introduction
Creating sample data for toy analysis
Scaling data to the standard normal distribution
Creating binary features through thresholding
Working with categorical variables
Imputing missing values through various strategies
A linear model in the presence of outliers
Putting it all together with pipelines
Using Gaussian processes for regression
Using SGD for regression
Chapter 3: Dimensionality Reduction
Introduction
Reducing dimensionality with PCA
Using factor analysis for decomposition
Using kernel PCA for nonlinear dimensionality reduction
Using truncated SVD to reduce dimensionality
Using decomposition to classify with DictionaryLearning
Doing dimensionality reduction with manifolds – t-SNE
Testing methods to reduce dimensionality with pipelines
Chapter 4: Linear Models with scikit-learn
Introduction
Fitting a line through data
Fitting a line through data with machine learning
Evaluating the linear regression model
Using ridge regression to overcome linear regression's shortfalls
Optimizing the ridge regression parameter
Using sparsity to regularize models
Taking a more fundamental approach to regularization with LARS
References
Chapter 5: Linear Models – Logistic Regression
Introduction
Loading data from the UCI repository
Viewing the Pima Indians diabetes dataset with pandas
Looking at the UCI Pima Indians dataset web page
Machine learning with logistic regression
Examining logistic regression errors with a confusion matrix
Varying the classification threshold in logistic regression
Receiver operating characteristic – ROC analysis
Plotting an ROC curve without context
Putting it all together – UCI breast cancer dataset
Chapter 6: Building Models with Distance Metrics
Introduction
Using k-means to cluster data
Optimizing the number of centroids
Assessing cluster correctness
Using MiniBatch k-means to handle more data
Quantizing an image with k-means clustering
Finding the closest object in the feature space
Probabilistic clustering with Gaussian mixture models
Using k-means for outlier detection
Using KNN for regression
Chapter 7: Cross-Validation and Post-Model Workflow
Introduction
Selecting a model with cross-validation
K-fold cross validation
Balanced cross-validation
Cross-validation with ShuffleSplit
Time series cross-validation
Grid search with scikit-learn
Randomized search with scikit-learn
Classification metrics
Regression metrics
Clustering metrics
Using dummy estimators to compare results
Feature selection
Feature selection on L1 norms
Persisting models with joblib or pickle
Chapter 8: Support Vector Machines
Introduction
Classifying data with a linear SVM
Optimizing an SVM
Multiclass classification with SVM
Support vector regression
Chapter 9: Tree Algorithms and Ensembles
Introduction
Doing basic classifications with decision trees
Visualizing a decision tree with pydot
Tuning a decision tree
Using decision trees for regression
Reducing overfitting with cross-validation
Implementing random forest regression
Bagging regression with nearest neighbors
Tuning gradient boosting trees
Tuning an AdaBoost regressor
Writing a stacking aggregator with scikit-learn
Chapter 10: Text and Multiclass Classification with scikit-learn
Using LDA for classification
Working with QDA – a nonlinear LDA
Using SGD for classification
Classifying documents with Naive Bayes
Label propagation with semi-supervised learning
Chapter 11: Neural Networks
Introduction
Perceptron classifier
Neural network – multilayer perceptron
Stacking with a neural network
Chapter 12: Create a Simple Estimator
Introduction
Create a simple estimator

What You Will Learn

  • Build predictive models in minutes by using scikit-learn.
  • Understand the differences and relationships between classification and regression, two types of supervised learning.
  • Use distance metrics to make predictions with clustering, a type of unsupervised learning.
  • Find points with similar characteristics with nearest neighbors.
  • Use automation and cross-validation to find the best model and focus on it for a data product.
  • Choose the best of many algorithms, or use them together in an ensemble.
  • Create your own estimator with the simple syntax of sklearn.
  • Explore the feed-forward neural networks available in scikit-learn.
