You're reading from Scikit-learn Cookbook Over 80 recipes for machine learning in Python with scikit-learn

Product type Paperback

Published in Dec 2025

Last Updated in Sep 2025

Publisher Packt

ISBN-13 9781836644453

Length 414 pages

Edition 3rd Edition

Languages

Python

Tools

Scikit-learn

Concepts

Machine Learning

Author (1):

John Sukup


Table of Contents (14) Chapters

1. Scikit-learn Cookbook, Third Edition: Over 80 recipes for machine learning in Python with scikit-learn

2. Chapter 1: Common Conventions and API Elements of scikit-learn

3. Chapter 2: Pre-Model Workflow and Data Preprocessing

4. Chapter 3: Dimensionality Reduction Techniques

5. Chapter 4: Building Models with Distance Metrics and Nearest Neighbors

6. Chapter 5: Linear Models and Regularization

7. Chapter 6: Advanced Logistic Regression and Extensions

8. Chapter 7: Support Vector Machines and Kernel Methods

9. Chapter 8: Tree-Based Algorithms and Ensemble Methods

10. Chapter 9: Text Processing and Multiclass Classification

11. Chapter 10: Clustering Techniques

12. Chapter 11: Novelty and Outlier Detection

13. Chapter 12: Cross-Validation and Model Evaluation Techniques

14. Chapter 13: Deploying scikit-learn Models in Production

Encoding Categorical Variables

Categorical variables are common in many datasets and represent discrete values such as categories, labels, or groups. However, most ML algorithms (and computers in general) operate on numbers, so categorical data must be converted into a numerical format before it can be used for modeling.

Categorical variables can be divided into two main types:

Nominal Variables: These represent categories without any intrinsic ordering (e.g., color, brand).
Ordinal Variables: These have a clear ordering among categories (e.g., ratings from 1 to 5).

Choosing the right encoding method depends on the type of categorical variable and the specific requirements of the ML algorithm being used.
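As a minimal sketch of how the two variable types map to different encoders, the example below applies scikit-learn's `OneHotEncoder` to a nominal feature and `OrdinalEncoder` to an ordinal one. The `colors` and `sizes` arrays and their category labels are made up for illustration and do not come from the recipe's dataset.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical nominal feature: colors have no natural order.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# Hypothetical ordinal feature: sizes have a natural order.
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])

# One-hot encoding creates one binary column per category.
# Categories are learned from the data and sorted alphabetically.
onehot = OneHotEncoder()
colors_encoded = onehot.fit_transform(colors).toarray()

# Ordinal encoding maps each category to an integer code.
# Passing the categories explicitly makes the codes follow the intended order.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

print(onehot.categories_[0])   # learned category order for the color column
print(colors_encoded)          # one row per record, one column per color
print(sizes_encoded.ravel())   # small -> 0, medium -> 1, large -> 2
```

One-hot encoding avoids implying an order that a nominal variable does not have, at the cost of one extra column per category. Ordinal encoding keeps a single column, which is appropriate only when the integer order is meaningful.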

Getting ready

To begin, as we did earlier, we will create a toy dataset, only this time our features will consist of qualitative data.

Load libraries

import numpy as np

Create sample categorical data with 20 records

np.random.seed(2024)  # for reproducibility...
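The listing above is cut off in this excerpt, so the book's actual dataset is not shown. As a stand-in only, the sketch below shows one way a toy categorical dataset with 20 records could be built with NumPy. The feature names (`colors`, `ratings`) and their category levels are assumptions made for illustration, not the author's data.

```python
import numpy as np

np.random.seed(2024)  # same seed as in the excerpt, for reproducibility

n_records = 20

# Hypothetical category levels, chosen for illustration only.
color_levels = ["red", "green", "blue"]    # nominal: no intrinsic order
rating_levels = ["low", "medium", "high"]  # ordinal: low < medium < high

# Draw one random category per record for each feature.
colors = np.random.choice(color_levels, size=n_records)
ratings = np.random.choice(rating_levels, size=n_records)

# Stack the features into a (20, 2) array of strings:
# one row per record, one column per feature.
X = np.column_stack([colors, ratings])

print(X.shape)
print(X[:5])  # first five records
```

An array like `X` holds only strings, which is the kind of input that needs encoding before it is passed to most scikit-learn estimators.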
