You're reading from Bioinformatics with Python Cookbook Solve advanced computational biology problems and build production pipelines with Python and AI tools

Product type Paperback

Published in Dec 2025

Publisher Packt

ISBN-13 9781836642756

Length 618 pages

Edition 4th Edition

Author (1):

Shane Brubaker

K-means clustering

In this recipe, we’ll learn about another data science technique called clustering and revisit our breast cancer dataset.

K-means clustering is an example of an unsupervised algorithm. Unlike supervised algorithms, these do not need a labeled training dataset: the algorithm looks for structure in the data itself. Once fitted, it can also assign new samples to one of the clusters it found. In our case, we are hoping that the clusters recover the main classes in the population.

K-means comes from the idea of creating K centers, called centroids. Each point is assigned to the centroid nearest to it by Euclidean distance. We then move each centroid to the mean of the points assigned to it and repeat both steps until the assignments stop changing, which reduces the total distance between points and their centroids. In this way, we partition our data into K groups.
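To make the assign-and-update loop concrete, here is a minimal sketch in NumPy. It is not the recipe's own code: the function name `kmeans`, its parameters, and the two synthetic blobs used as toy data are all assumptions made for illustration. In practice you would more likely reach for a library implementation.

```python
import numpy as np


def nearest_center(points, centers):
    """Return, for each point, the index of the closest center (Euclidean)."""
    # Broadcasting gives an (n_points, k) matrix of point-to-center distances.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: cluster points (n_samples x n_features) into k groups."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centers with k distinct, randomly chosen points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        labels = nearest_center(points, centers)
        # Update step: move each center to the mean of its assigned points.
        # A cluster that ends up empty keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return nearest_center(points, centers), centers


# Toy data (an assumption, not the breast cancer dataset): two separated blobs.
data_rng = np.random.default_rng(42)
blob_a = data_rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = data_rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centers = kmeans(points, k=2)
print(labels[:5], labels[-5:])
print(centers)
```

The cluster numbers themselves are arbitrary (which blob is called 0 and which 1 depends on the initialization), so only the grouping is meaningful. Note also that no labels are passed in at any point, which is what makes the method unsupervised.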