Scikit-learn Cookbook: Over 80 recipes for machine learning in Python with scikit-learn

Product type Paperback

Published in Dec 2025

Last Updated in Sep 2025

Publisher Packt

ISBN-13 9781836644453

Length 414 pages

Edition 3rd Edition

Author (1):

John Sukup

Table of Contents (14) Chapters

1. Scikit-learn Cookbook, Third Edition: Over 80 recipes for machine learning in Python with scikit-learn

2. Chapter 1: Common Conventions and API Elements of scikit-learn

3. Chapter 2: Pre-Model Workflow and Data Preprocessing

4. Chapter 3: Dimensionality Reduction Techniques

5. Chapter 4: Building Models with Distance Metrics and Nearest Neighbors

6. Chapter 5: Linear Models and Regularization

7. Chapter 6: Advanced Logistic Regression and Extensions

8. Chapter 7: Support Vector Machines and Kernel Methods

9. Chapter 8: Tree-Based Algorithms and Ensemble Methods

10. Chapter 9: Text Processing and Multiclass Classification

11. Chapter 10: Clustering Techniques

12. Chapter 11: Novelty and Outlier Detection

13. Chapter 12: Cross-Validation and Model Evaluation Techniques

14. Chapter 13: Deploying scikit-learn Models in Production

Density-Based Clustering with DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm capable of identifying clusters of varying shapes and sizes. It differs from K-means and hierarchical clustering, as they are typically used, by not requiring the number of clusters to be specified in advance and by explicitly labeling outliers as noise instead of forcing them into a cluster. Unlike K-means, it also does not compute centroids: clusters are formed by linking together points that lie in dense regions. This recipe applies DBSCAN to a non-spherical dataset.
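To make the noise handling concrete before we turn to the recipe's dataset, here is a minimal sketch on a toy array. The points and the eps and min_samples values are made up for illustration and are not taken from the recipe.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (made up for illustration): two tight groups and one isolated point.
points = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],   # dense group A
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],   # dense group B
    [10.0, 0.0],                          # isolated point
])

# eps and min_samples are illustrative choices, not tuned values.
# Note that no number of clusters is passed to the estimator.
toy_model = DBSCAN(eps=0.5, min_samples=3).fit(points)

# Each dense group gets its own label; the isolated point is labeled -1 (noise).
toy_labels = toy_model.labels_.tolist()
print(toy_labels)  # → [0, 0, 0, 1, 1, 1, -1]

# A fitted DBSCAN estimator exposes no centroids, unlike KMeans.
has_centroids = hasattr(toy_model, "cluster_centers_")
print(has_centroids)  # → False
```

The label -1 is how scikit-learn marks points that do not belong to any dense region, so they can be filtered out or inspected separately.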

Getting ready

Here, we are going to use another scikit-learn data generator function, make_moons(), which, like make_blobs(), is aptly named!

Load the libraries:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

Create a new dataset with noise:

X, _ = make_moons(n_samples=300, noise=0.1, random_state=2024)

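The make_moons() generator creates two interleaving crescent-shaped groups of points, which makes it useful for exploring clustering algorithms that work better on data not arranged in spherical groupings. The noise parameter sets the standard deviation of the Gaussian noise added to the points, and we discard the second return value (the true moon labels) because clustering is unsupervised.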
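As a quick sanity check on the generated data, the following sketch inspects its shape and runs DBSCAN once. The eps and min_samples values here are assumptions chosen for illustration, not the recipe's settings, so the number of clusters and noise points they produce may differ from what a tuned model would give.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Same generator call as above, but keeping the true labels for inspection.
X, y_true = make_moons(n_samples=300, noise=0.1, random_state=2024)

print(X.shape)                       # → (300, 2)
print(sorted(set(y_true.tolist())))  # → [0, 1]

# Assumed, untuned hyperparameters: eps is the neighborhood radius and
# min_samples is the number of neighbors needed for a point to be a core point.
model = DBSCAN(eps=0.2, min_samples=5)
labels = model.fit_predict(X)

# Count clusters (ignoring the noise label -1) and noise points.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int((labels == -1).sum())
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```

If the counts look off, eps is usually the first parameter to adjust: smaller values split clusters and mark more points as noise, while larger values merge neighboring clusters.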