scikit-learn: Machine Learning Simplified

Implement scikit-learn into every step of the data science pipeline

Raúl Garreta et al.

Book Details

ISBN 13: 9781788833479
Paperback: 1035 pages

Book Description

Machine learning, the art of creating applications that learn from experience and data, has been around for many years. Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility; moreover, within the Python data space, scikit-learn is the unequivocal choice for machine learning. This course combines an introduction to the main concepts and methods of machine learning with practical, hands-on examples of real-world problems.

The course starts by walking through different methods to prepare your data, whether it is a dataset with missing values or text columns whose categories need to be turned into indicator variables. Once the data is ready, you'll learn techniques aligned with different objectives, from predicting known outcomes such as sales by state to more open-ended problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it is both accurate and resilient to new datasets.

You will learn to incorporate machine learning into your applications. Ranging from handwritten digit recognition to document classification, examples are solved step by step using scikit-learn and Python. By the end of this course, you will have learned how to build applications that learn from experience by applying the main concepts and techniques of machine learning.
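
As a taste of the workflow the description outlines, here is a minimal sketch in scikit-learn: impute missing values, turn a text column into indicator variables, fit a classifier, and polish it with cross-validated grid search. The toy data, column names, and parameter grid are illustrative assumptions, not examples taken from the book.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy customer data: a numeric column with a missing value and a text column.
df = pd.DataFrame({
    "income": [52, 61, np.nan, 48, 75, 39, 58, 44, 90, 33, 67, 51],
    "state": ["NY", "CA", "CA", "TX", "NY", "TX",
              "CA", "NY", "CA", "TX", "NY", "TX"],
    "purchased": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
X, y = df[["income", "state"]], df["purchased"]

# Prepare the data: fill in missing numbers, scale them, and expand the
# categorical column into indicator (one-hot) variables.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["state"]),
])

# Chain preprocessing with a classifier, then tune the regularization
# strength with cross-validated grid search.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
search = GridSearchCV(model, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
search.fit(X_train, y_train)
print("held-out accuracy:", search.score(X_test, y_test))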

Table of Contents

Chapter 1: Machine Learning – A Gentle Introduction
Installing scikit-learn
Our first machine learning method – linear classification
Evaluating our results
Machine learning categories
Important concepts related to machine learning
Summary
Chapter 2: Supervised Learning
Image recognition with Support Vector Machines
Text classification with Naïve Bayes
Explaining Titanic hypothesis with decision trees
Predicting house prices with regression
Summary
Chapter 3: Unsupervised Learning
Principal Component Analysis
Clustering handwritten digits with k-means
Alternative clustering methods
Summary
Chapter 4: Advanced Features
Feature extraction
Feature selection
Model selection
Grid search
Parallel grid search
Summary
Chapter 5: Premodel Workflow
Introduction
Getting sample data from external sources
Creating sample data for toy analysis
Scaling data to the standard normal
Creating binary features through thresholding
Working with categorical variables
Binarizing label features
Imputing missing values through various strategies
Using Pipelines for multiple preprocessing steps
Reducing dimensionality with PCA
Using factor analysis for decomposition
Kernel PCA for nonlinear dimensionality reduction
Using truncated SVD to reduce dimensionality
Decomposition to classify with DictionaryLearning
Putting it all together with Pipelines
Using Gaussian processes for regression
Defining the Gaussian process object directly
Using stochastic gradient descent for regression
Chapter 6: Working with Linear Models
Introduction
Fitting a line through data
Evaluating the linear regression model
Using ridge regression to overcome linear regression's shortfalls
Optimizing the ridge regression parameter
Using sparsity to regularize models
Taking a more fundamental approach to regularization with LARS
Using linear methods for classification – logistic regression
Directly applying Bayesian ridge regression
Using boosting to learn from errors
Chapter 7: Building Models with Distance Metrics
Introduction
Using KMeans to cluster data
Optimizing the number of centroids
Assessing cluster correctness
Using MiniBatch KMeans to handle more data
Quantizing an image with KMeans clustering
Finding the closest objects in the feature space
Probabilistic clustering with Gaussian Mixture Models
Using KMeans for outlier detection
Using k-NN for regression
Chapter 8: Classifying Data with scikit-learn
Introduction
Doing basic classifications with Decision Trees
Tuning a Decision Tree model
Using many Decision Trees – random forests
Tuning a random forest model
Classifying data with support vector machines
Generalizing with multiclass classification
Using LDA for classification
Working with QDA – a nonlinear LDA
Using Stochastic Gradient Descent for classification
Classifying documents with Naïve Bayes
Label propagation with semi-supervised learning
Chapter 9: Postmodel Workflow
Introduction
K-fold cross validation
Automatic cross validation
Cross validation with ShuffleSplit
Stratified k-fold
Poor man's grid search
Brute force grid search
Using dummy estimators to compare results
Regression model evaluation
Feature selection
Feature selection on L1 norms
Persisting models with joblib
Chapter 10: The Fundamentals of Machine Learning
Learning from experience
Machine learning tasks
Training data and test data
Performance measures, bias, and variance
An introduction to scikit-learn
Installing scikit-learn
Installing pandas and matplotlib
Summary
Chapter 11: Linear Regression
Simple linear regression
Evaluating the model
Multiple linear regression
Polynomial regression
Regularization
Applying linear regression
Fitting models with gradient descent
Summary
Chapter 12: Feature Extraction and Preprocessing
Extracting features from categorical variables
Extracting features from text
Extracting features from images
Data standardization
Summary
Chapter 13: From Linear Regression to Logistic Regression
Binary classification with logistic regression
Spam filtering
Binary classification performance metrics
Calculating the F1 measure
ROC AUC
Tuning models with grid search
Multi-class classification
Multi-label classification and problem transformation
Summary
Chapter 14: Nonlinear Classification and Regression with Decision Trees
Decision trees
Training decision trees
Decision trees with scikit-learn
Summary
Chapter 15: Clustering with K-Means
Clustering with the K-Means algorithm
Evaluating clusters
Image quantization
Clustering to learn features
Summary
Chapter 16: Dimensionality Reduction with PCA
An overview of PCA
Performing Principal Component Analysis
Using PCA to visualize high-dimensional data
Face recognition with PCA
Summary
Chapter 17: The Perceptron
Activation functions
Binary classification with the perceptron
Limitations of the perceptron
Summary
Chapter 18: From the Perceptron to Support Vector Machines
Kernels and the kernel trick
Maximum margin classification and support vectors
Classifying characters in scikit-learn
Summary
Chapter 19: From the Perceptron to Artificial Neural Networks
Nonlinear decision boundaries
Feedforward and feedback artificial neural networks
Approximating XOR with multilayer perceptrons
Classifying handwritten digits
Summary

What You Will Learn

  • Review fundamental concepts including supervised and unsupervised experiences, common tasks, and performance metrics
  • Classify objects (from documents to human faces and flower species) based on some of their features, using a variety of methods from Support Vector Machines to Naïve Bayes (see the sketch after this list)
  • Use Decision Trees to explain the main causes of certain phenomena such as passenger survival on the Titanic
  • Evaluate the performance of machine learning systems in common tasks
  • Master algorithms of various levels of complexity and learn how to analyze data at the same time
  • Learn just enough math to think about the connections between various algorithms
  • Customize machine learning algorithms to fit your problem, and learn how to modify them when the situation calls for it
  • Incorporate other packages from the Python ecosystem to munge and visualize your dataset
  • Improve the way you build your models using parallelization techniques
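
To ground a few of these points, the sketch below classifies flower species with a Support Vector Machine, evaluates it with cross-validation, and tunes it with a parallel grid search (n_jobs=-1 uses all available CPU cores). The dataset and parameter grid are illustrative choices, not the book's own examples.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Flower-species classification: four numeric features, three classes.
X, y = load_iris(return_X_y=True)

# Evaluate a baseline RBF-kernel SVM with 5-fold cross-validation.
baseline = SVC(kernel="rbf", gamma="scale")
print("baseline CV accuracy:", cross_val_score(baseline, X, y, cv=5).mean())

# Tune C and gamma over a small grid, fitting the folds in parallel.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5,
    n_jobs=-1,  # run the cross-validation fits across all CPU cores
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("best CV accuracy:", grid.best_score_)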
