Bioinformatics with Python Cookbook

Learn how to use modern Python bioinformatics libraries and applications to do cutting-edge research in computational biology

Tiago Antao

Book Details

ISBN 13: 9781782175117
Paperback: 306 pages

Book Description

If you are either a computational biologist or a Python programmer, you will probably relate to the expression "explosive growth, exciting times". Python is arguably the main programming language for big data, and the deluge of data in biology, mostly from genomics and proteomics, makes bioinformatics one of the most exciting fields in data science.

Using the hands-on recipes in this book, you'll be able to do practical research and analysis in computational biology with Python. We cover modern, next-generation sequencing libraries and explore real-world examples of how to handle real data. The main focus of the book is the practical application of bioinformatics, but we also cover modern programming techniques and frameworks to deal with the ever-increasing deluge of bioinformatics data.

Table of Contents

Chapter 1: Python and the Surrounding Software Ecology
Introduction
Installing the required software with Anaconda
Installing the required software with Docker
Interfacing with R via rpy2
Performing R magic with IPython
Chapter 2: Next-generation Sequencing
Introduction
Accessing GenBank and moving around NCBI databases
Performing basic sequence analysis
Working with modern sequence formats
Working with alignment data
Analyzing data in the variant call format
Studying genome accessibility and filtering SNP data
Chapter 3: Working with Genomes
Introduction
Working with high-quality reference genomes
Dealing with low-quality genome references
Traversing genome annotations
Extracting genes from a reference using annotations
Finding orthologues with the Ensembl REST API
Retrieving gene ontology information from Ensembl
Chapter 4: Population Genetics
Introduction
Managing datasets with PLINK
Introducing the Genepop format
Exploring a dataset with Bio.PopGen
Computing F-statistics
Performing Principal Components Analysis
Investigating population structure with Admixture
Chapter 5: Population Genetics Simulation
Introduction
Introducing forward-time simulations
Simulating selection
Simulating population structure using island and stepping-stone models
Modeling complex demographic scenarios
Simulating the coalescent with Biopython and fastsimcoal
Chapter 6: Phylogenetics
Introduction
Preparing the Ebola dataset
Aligning genetic and genomic data
Comparing sequences
Reconstructing phylogenetic trees
Playing recursively with trees
Visualizing phylogenetic data
Chapter 7: Using the Protein Data Bank
Introduction
Finding a protein in multiple databases
Introducing Bio.PDB
Extracting more information from a PDB file
Computing molecular distances on a PDB file
Performing geometric operations
Implementing a basic PDB parser
Animating with PyMol
Parsing mmCIF files using Biopython
Chapter 8: Other Topics in Bioinformatics
Introduction
Accessing the Global Biodiversity Information Facility
Geo-referencing GBIF datasets
Accessing molecular-interaction databases with PSICQUIC
Plotting protein interactions with Cytoscape the hard way
Chapter 9: Python for Big Genomics Datasets
Introduction
Setting the stage for high-performance computing
Designing a poor human concurrent executor
Performing parallel computing with IPython
Computing the median in a large dataset
Optimizing code with Cython and Numba
Programming with laziness
Thinking with generators

What You Will Learn

  • Gain a deep understanding of Python's fundamental bioinformatics libraries and be exposed to the most important data science tools in Python
  • Process genome-wide data with Biopython
  • Analyze and perform quality control on next-generation sequencing datasets using libraries such as PyVCF or PySAM
  • Use DendroPy and Biopython for phylogenetic analysis
  • Perform population genetics analysis on large datasets
  • Simulate complex demographies and genomic features with simuPOP
