Scikit-learn Cookbook: Over 80 recipes for machine learning in Python with scikit-learn, Third Edition

John Sukup

$19.99 per month

Paperback Sep 2025 414 pages 3rd Edition

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Scikit-learn Cookbook

Join our book community on Discord

https://packt.link/EarlyAccessCommunity

It’s hard to believe that the scikit-learn project started back in 2007 and officially launched in 2009. Even after so many years, it is hard to deny the impact the Python library has had on the world of data science and machine learning (ML). For many of us, scikit-learn is one of the first libraries we hear about when beginning our journey in ML programming and engineering, and that hasn’t changed: the library remains one of the most widely used in research, academia, and production applications at scale in the business world.

This chapter will cover the standard conventions and core API elements of scikit-learn, including the design principles behind estimators, transformers, and pipelines, as well as common methods like fit(), predict(), and transform(). The exercises found throughout the rest of the book will involve using these conventions to build and evaluate models, focusing on understanding...

Understanding Estimators

So, what exactly is an estimator? The concept of estimators lies at the heart of scikit-learn. Estimators are objects (in the Python Object-Oriented Programming (OOP) sense) that implement algorithms for learning from data, and their interface is consistent across the entire library. Every estimator in scikit-learn, whether a model or a transformer, follows a simple and intuitive interface. The two most essential methods of a predictive estimator are the previously mentioned fit() and predict() (transformers pair fit() with transform() instead). The fit() method trains the model by learning from data, while predict() is used to make predictions on new data based on the trained model. This is the raison d’être of ML.

For example, in one of the simplest ML models (yet still often powerful), LinearRegression(), calling fit() with training data allows the model to learn the optimal coefficients for predicting outcomes. Afterward, predict() can be used on new data to generate predictions.

from sklearn.linear_model import LinearRegression...
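The fit()/predict() pattern described above can be sketched as follows. This is a minimal illustration rather than the book's own listing: the toy data (an exact y = 2x + 1 relationship) and the variable names are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data that follows y = 2x + 1 exactly (invented for illustration)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# fit() learns the coefficients from the training data
model = LinearRegression()
model.fit(X_train, y_train)

# predict() applies the learned coefficients to unseen inputs
X_new = np.array([[5.0], [6.0]])
y_pred = model.predict(X_new)
print(y_pred)
```

Because the toy data is perfectly linear, the predictions should fall on the same line; with real data they would only approximate the targets.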

Transformers and the transform() Method

Transformers in scikit-learn are tools that modify data by applying transformations such as scaling, normalization, or encoding to prepare it for modeling. Each transformer follows a consistent interface, using the fit() method to learn any necessary parameters from the data and the transform() method to apply those transformations. For instance, StandardScaler() calculates the mean and standard deviation of each feature during fit() and uses those values in transform() to standardize the data: it subtracts the mean and divides by the standard deviation (if you remember back to high school statistics, this transformed value is called a z-score).

from sklearn.preprocessing import StandardScaler
import numpy as np
# Example data
X = np.array([[1, 2], [3, 4], [5, 6]])
# Create a StandardScaler instance
scaler = StandardScaler()
# Fit the scaler on the data
scaler.fit(X)
# Transform the data
X_scaled = scaler.transform(X)
print(X_scaled)

Another common shortcut like we saw before, fit_transform(), allows users to perform both...
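As a rough sketch of how fit_transform() relates to the two-step approach, the example below reuses the small array from the StandardScaler snippet and checks that both routes give the same result. The manual z-score line is an added illustration; it assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Two explicit steps: learn the statistics, then apply them
X_two_step = StandardScaler().fit(X).transform(X)

# One call that does both on the same data
X_one_step = StandardScaler().fit_transform(X)

# Manual z-score per column, using the population standard deviation
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_two_step, X_one_step))
print(np.allclose(X_one_step, X_manual))
```

In practice the separate fit() and transform() calls still matter: a scaler is usually fitted on training data only and then reused to transform validation or test data.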

Common Attributes and Methods

As model complexity grows, it becomes harder and harder to look inside and understand a model’s inner workings (especially with artificial neural networks). Thankfully, scikit-learn models share several key attributes and methods that provide valuable insight into how a model has learned from data. For instance, attributes like coef_ and intercept_ in linear models store the learned coefficients and intercepts, helping interpret model behavior.

Similarly, methods such as score() allow users to evaluate model performance, typically returning a default metric like accuracy for classifiers or R² for regressors. These common features ensure consistency across different models and simplify model analysis and interpretation.

from sklearn.linear_model import LinearRegression
import numpy as np
# Example data
X = np.array([[1], [2], [3], [4], [5]])  # Feature matrix
y = np.array([1, 2, 3, 3.5, 5])  # Target values
# Create and fit the model
model = LinearRegression...
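The listing above is cut off in this excerpt. As a hedged sketch of the attributes and methods discussed in this section (not necessarily how the book's own listing continues), the following fits a model on the same example data and inspects coef_, intercept_, and score():

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same example data as in the excerpt
X = np.array([[1], [2], [3], [4], [5]])  # Feature matrix
y = np.array([1, 2, 3, 3.5, 5])          # Target values

model = LinearRegression()
model.fit(X, y)

# Learned parameters are stored in attributes ending with an underscore
slope = model.coef_[0]
intercept = model.intercept_

# For regressors, score() returns R² on the data passed in
r2 = model.score(X, y)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R2={r2:.3f}")
```

Note that scoring on the training data, as done here for brevity, tends to overstate how well a model generalizes; a held-out set gives a fairer estimate.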

Hyperparameter Tuning with Search Methods

Hyperparameter tuning is crucial for optimizing candidate machine learning models, and scikit-learn makes this process easier with a variety of built-in search tools. The library provides the two most widely used, GridSearchCV() and RandomizedSearchCV(), as easy-to-implement classes, along with counterparts that implement a successive halving approach to hyperparameter search (HalvingGridSearchCV() and HalvingRandomSearchCV()).
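A minimal grid search might look like the sketch below. The choice of estimator (SVC), the iris dataset, and the parameter grid are assumptions picked to keep the example small; they are not taken from the book.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values to try; every combination is evaluated (3 x 2 = 6 here)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# 5-fold cross-validation is run for each combination
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

RandomizedSearchCV() follows the same fit-then-inspect pattern but samples a fixed number of combinations instead of trying them all, which can help when the grid is large.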

Scikit-learn also supports a manual approach if you want to adjust default hyperparameter values yourself: the set_params() and get_params() methods. set_params() allows users to adjust model hyperparameters programmatically, while get_params() retrieves the current hyperparameter settings. This functionality ensures flexibility when experimenting with different model configurations and can be paired with the search techniques mentioned earlier for efficient tuning.

from sklearn.ensemble import RandomForestClassifier
# Create a RandomForestClassifier...
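The listing above is truncated in this excerpt. A small sketch of get_params() and set_params() on a RandomForestClassifier, with hyperparameter values chosen arbitrarily for illustration, could look like this:

```python
from sklearn.ensemble import RandomForestClassifier

# Create a classifier with a couple of explicit hyperparameters
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# get_params() returns a dict of all hyperparameters and their current values
print(clf.get_params()["n_estimators"])

# set_params() updates hyperparameters in place on the existing object
clf.set_params(n_estimators=200, max_depth=5)

params = clf.get_params()
print(params["n_estimators"], params["max_depth"])
```

Parameters not passed to set_params() keep their previous values, so random_state is unchanged here.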

Key benefits

Solve complex business problems with data-driven approaches

Master tools associated with developing predictive/prescriptive models

Build robust ML pipelines for real-world applications

Avoid common pitfalls in ML pipeline development

Learn comprehensive, hands-on recipes tailored to Scikit-Learn version 1.5

Master ML with real-world examples and Scikit-Learn 1.5 features

Description

Scikit-Learn is a powerful, open-source ML library for Python that provides simple and efficient tools for model development and deployment. Data scientists, ML engineers, and software developers learn Scikit-Learn because it offers a versatile, user-friendly framework for implementing a wide range of ML algorithms, enabling efficient development and deployment of predictive models in real-world applications. Scikit-learn Cookbook (3rd Edition) takes the reader on a journey from understanding the fundamentals of ML and data preprocessing, through implementing advanced algorithms and techniques, to deploying and optimizing ML models in production. Along the way, readers will explore practical, step-by-step recipes that cover everything from feature engineering and model selection to hyperparameter tuning and model evaluation, all using Scikit-Learn. By the end of this book, readers will have the knowledge and skills to confidently build, evaluate, and deploy sophisticated ML models using Scikit-Learn, enabling them to tackle a wide range of data-driven challenges.

Who is this book for?

Are you a data scientist, machine learning engineer, or software development professional looking to deepen your understanding of advanced ML techniques? Then this book is for you! To get the most out of this book, you should be proficient in Python programming and familiar with commonly used data science libraries (e.g., pandas, NumPy, Matplotlib, and SciPy). Additionally, an understanding of basic ML concepts, like linear regression, decision trees, and model evaluation metrics, is helpful. Familiarity with mathematical concepts such as linear algebra, calculus, and probability is also invaluable.

What you will learn

Implement a variety of ML algorithms, from basic classifiers to complex ensemble methods, using Scikit-Learn

Perform data preprocessing, feature engineering, and model selection to prepare datasets for optimal model performance

Optimize ML models through hyperparameter tuning and cross-validation techniques to improve accuracy and reliability

Deploy ML models for scalable, maintainable real-world applications

Evaluate and interpret models with advanced metrics and visualizations in Scikit-Learn
