Bioinformatics with Python Cookbook

Learn how to use modern Python bioinformatics libraries and applications to do cutting-edge research in computational biology

Tiago Antao


Book Details

ISBN 13: 9781782175117
Paperback: 306 pages

Book Description

If you are either a computational biologist or a Python programmer, you will probably relate to the expression "explosive growth, exciting times". Python is arguably the main programming language for big data, and the deluge of data in biology, mostly from genomics and proteomics, makes bioinformatics one of the most exciting fields in data science.

Using the hands-on recipes in this book, you'll be able to do practical research and analysis in computational biology with Python. We cover modern next-generation sequencing libraries and explore real-world examples of how to handle real data. The main focus of the book is the practical application of bioinformatics, but we also cover modern programming techniques and frameworks to deal with the ever-increasing deluge of bioinformatics data.

Table of Contents

Chapter 1: Python and the Surrounding Software Ecology
Introduction
Installing the required software with Anaconda
Installing the required software with Docker
Interfacing with R via rpy2
Performing R magic with IPython
Chapter 2: Next-generation Sequencing
Introduction
Accessing GenBank and moving around NCBI databases
Performing basic sequence analysis
Working with modern sequence formats
Working with alignment data
Analyzing data in the variant call format
Studying genome accessibility and filtering SNP data
Chapter 3: Working with Genomes
Introduction
Working with high-quality reference genomes
Dealing with low-quality genome references
Traversing genome annotations
Extracting genes from a reference using annotations
Finding orthologues with the Ensembl REST API
Retrieving gene ontology information from Ensembl
Chapter 4: Population Genetics
Introduction
Managing datasets with PLINK
Introducing the Genepop format
Exploring a dataset with Bio.PopGen
Computing F-statistics
Performing Principal Components Analysis
Investigating population structure with Admixture
Chapter 5: Population Genetics Simulation
Introduction
Introducing forward-time simulations
Simulating selection
Simulating population structure using island and stepping-stone models
Modeling complex demographic scenarios
Simulating the coalescent with Biopython and fastsimcoal
Chapter 6: Phylogenetics
Introduction
Preparing the Ebola dataset
Aligning genetic and genomic data
Comparing sequences
Reconstructing phylogenetic trees
Playing recursively with trees
Visualizing phylogenetic data
Chapter 7: Using the Protein Data Bank
Introduction
Finding a protein in multiple databases
Introducing Bio.PDB
Extracting more information from a PDB file
Computing molecular distances on a PDB file
Performing geometric operations
Implementing a basic PDB parser
Animating with PyMol
Parsing mmCIF files using Biopython
Chapter 8: Other Topics in Bioinformatics
Introduction
Accessing the Global Biodiversity Information Facility
Geo-referencing GBIF datasets
Accessing molecular-interaction databases with PSICQUIC
Plotting protein interactions with Cytoscape the hard way
Chapter 9: Python for Big Genomics Datasets
Introduction
Setting the stage for high-performance computing
Designing a poor human concurrent executor
Performing parallel computing with IPython
Computing the median in a large dataset
Optimizing code with Cython and Numba
Programming with laziness
Thinking with generators

What You Will Learn

  • Gain a deep understanding of Python's fundamental bioinformatics libraries and be exposed to the most important data science tools in Python
  • Process genome-wide data with Biopython
  • Analyze and perform quality control on next-generation sequencing datasets using libraries such as PyVCF or PySAM
  • Use DendroPy and Biopython for phylogenetic analysis
  • Perform population genetics analysis on large datasets
  • Simulate complex demographies and genomic features with simuPOP
