You're reading from Bioinformatics with Python Cookbook: Solve advanced computational biology problems and build production pipelines with Python & AI tools

Product type Paperback

Published in Jan 2026

Last Updated in Oct 2025

Publisher Packt

ISBN-13 9781836642756

Length 617 pages

Edition 4th Edition

Languages

Python

Tools

Jupyter

Concepts

Data Science

Author (1):

Shane Brubaker


Table of Contents (14) Chapters

1. Bioinformatics with Python Cookbook, Fourth Edition: Solve advanced computational biology problems and build production pipelines with Python & AI tools

2. Chapter 1: Computer Specifications and Python Setup

3. Chapter 2: Basics of Data Manipulation

4. Chapter 3: Modern Coding Practices and AI-generated coding

5. Chapter 4: Data Science and Graphing

6. Chapter 5: Alignment and Variant Calling

7. Chapter 6: Annotation and Biological Interpretation

8. Chapter 7: Genomes and Genome Assembly

9. Chapter 8: Accessing Public Databases

10. Chapter 9: Protein Structure and Proteomics

11. Chapter 10: Phylogenetics

12. Chapter 11: Population Genetics

13. Chapter 12: Metabolic Modeling and Other Applications

14. Chapter 13: Genome Editing

Dealing with the pitfalls of joining pandas DataFrames

The previous recipe was a whirlwind tour that introduced pandas and covered most of the features we will use in this book. An exhaustive discussion of pandas would require a complete book, so in this recipe and the next we focus on topics that affect data analysis and are very important, yet are seldom discussed in the literature.

In this recipe, we discuss some pitfalls of relating DataFrames through joins: many data analysis errors are introduced by carelessly joining data. We will introduce techniques to reduce such problems here.
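To make the kind of problem concrete before touching the real data, here is a minimal sketch on two small, made-up tables. The table and column names (`events`, `doses`, `event_id`, and so on) are hypothetical and are not the recipe's dataset. The sketch shows how an inner join can silently drop unmatched rows and duplicate rows with repeated keys, and how the `indicator` and `validate` parameters of `DataFrame.merge` can help surface both cases.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative assumptions only.
events = pd.DataFrame({
    "event_id": [1, 2, 3, 4],
    "symptom": ["fever", "rash", "headache", "fatigue"],
})
doses = pd.DataFrame({
    "event_id": [1, 1, 2, 5],  # key 1 is repeated; key 5 has no matching event
    "vaccine": ["A", "B", "A", "C"],
})

# An inner join keeps only keys present on both sides, and repeats the
# left-hand row once for every matching right-hand row.
inner = events.merge(doses, on="event_id", how="inner")

# An outer join with indicator=True adds a "_merge" column that records
# whether each row came from the left table, the right table, or both.
audit = events.merge(doses, on="event_id", how="outer", indicator=True)
counts = audit["_merge"].value_counts()

# validate= makes pandas check the expected key relationship and raise
# MergeError if the data violates it.
try:
    events.merge(doses, on="event_id", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

print(len(events), len(inner))
print(counts)
print(one_to_one_ok)
```

Comparing row counts before and after a join, and inspecting the `_merge` column, is a cheap check that can catch many of these errors early. Which relationship to pass to `validate` depends on what you expect of your own data.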

Getting ready

We will be using the same data as in the previous recipe, but we will jumble it a bit so that we can discuss typical data analysis pitfalls. Once again, we will be joining the main adverse events table with the vaccination table, but we will randomly sample 90% of the data from each. This mimics, for...