Bioinformatics with R Cookbook

Paurush Praveen Sinha

Over 90 practical recipes for computational biologists to model and handle real-life data using R

Book Details

ISBN-13: 9781783283132
Paperback: 340 pages

About This Book

  • Use existing R packages to handle biological data
  • Represent biological data with attractive visualizations
  • Follow an easy guide to handling real-life bioinformatics problems such as next-generation sequencing and microarray analysis

Who This Book Is For

This book is ideal for computational biologists and bioinformaticians with a basic knowledge of R programming, bioinformatics, and statistics. If you want to understand the critical concepts needed to develop your own computational models in bioinformatics, then this book is for you.

Table of Contents

Chapter 1: Starting Bioinformatics with R
Getting started and installing libraries
Reading and writing data
Filtering and subsetting data
Basic statistical operations on data
Generating probability distributions
Performing statistical tests on data
Visualizing data
Working with PubMed in R
Retrieving data from BioMart
Chapter 2: Introduction to Bioconductor
Installing packages from Bioconductor
Handling annotation databases in R
Performing ID conversions
The KEGG annotation of genes
The GO annotation of genes
The GO enrichment of genes
The KEGG enrichment of genes
Bioconductor in the cloud
Chapter 3: Sequence Analysis with R
Retrieving a sequence
Reading and writing the FASTA file
Getting the detail of a sequence composition
Pairwise sequence alignment
Multiple sequence alignment
Phylogenetic analysis and tree plotting
Handling BLAST results
Pattern finding in a sequence
Chapter 4: Protein Structure Analysis with R
Retrieving a sequence from UniProt
Protein sequence analysis
Computing the features of a protein sequence
Handling the PDB file
Working with the InterPro domain annotation
Understanding the Ramachandran plot
Searching for similar proteins
Working with the secondary structure features of proteins
Visualizing the protein structures
Chapter 5: Analyzing Microarray Data with R
Reading CEL files
Building the ExpressionSet object
Handling the AffyBatch object
Checking the quality of data
Generating artificial expression data
Data normalization
Overcoming batch effects in expression data
An exploratory analysis of data with PCA
Finding the differentially expressed genes
Working with the data of multiple classes
Handling time series data
Fold changes in microarray data
The functional enrichment of data
Clustering microarray data
Getting a co-expression network from microarray data
More visualizations for gene expression data
Chapter 6: Analyzing GWAS Data
The SNP association analysis
Running association scans for SNPs
The whole genome SNP association analysis
Importing PLINK GWAS data
Data handling with the GWASTools package
Manipulating other GWAS data formats
The SNP annotation and enrichment
Testing data for the Hardy-Weinberg equilibrium
Association tests with CNV data
Visualizations in GWAS studies
Chapter 7: Analyzing Mass Spectrometry Data
Reading the MS data of the mzXML/mzML format
Reading the MS data of the Bruker format
Converting the MS data in the mzXML format to MALDIquant
Extracting data elements from the MS data object
Preprocessing MS data
Peak detection in MS data
Peak alignment with MS data
Peptide identification in MS data
Performing protein quantification analysis
Performing multiple groups' analysis in MS data
Useful visualizations for MS data analysis
Chapter 8: Analyzing NGS Data
Querying the SRA database
Downloading data from the SRA database
Reading FASTQ files in R
Reading alignment data
Preprocessing the raw NGS data
Analyzing RNAseq data with the edgeR package
The differential analysis of NGS data using limma
Enriching RNAseq data with GO terms
The KEGG enrichment of sequence data
Analyzing methylation data
Analyzing ChipSeq data
Visualizations for NGS data
Chapter 9: Machine Learning in Bioinformatics
Data clustering in R using k-means and hierarchical clustering
Visualizing clusters
Supervised learning for classification
Probabilistic learning in R with Naïve Bayes
Bootstrapping in machine learning
Cross-validation for classifiers
Measuring the performance of classifiers
Visualizing an ROC curve in R
Biomarker identification using array data

What You Will Learn

  • Retrieve biological data from within an R environment without the hassle of navigating web pages
  • Annotate and enrich your data, and convert identifiers
  • Find relevant text from PubMed on which to perform text mining
  • Find phylogenetic relations between species
  • Infer relations between genomic content and diseases via GWAS
  • Classify patients based on biological or clinical features
  • Represent biological data with attractive visualizations, useful for publications and presentations

In Detail

Bioinformatics is an interdisciplinary field that develops and improves methods for storing, retrieving, organizing, and analyzing biological data. R is one of the primary languages used for data analysis in bioinformatics.

Bioinformatics with R Cookbook is a hands-on guide whose recipes offer solutions to common computational tasks in bioinformatics, presented in terms of packages and tested code.

With the help of this book, you will learn how to analyze biological data using R and infer new knowledge from data produced by different types of experiments, ranging from microarrays to NGS and mass spectrometry.

