You're reading from R Bioinformatics Cookbook - Second Edition

Product type: Book

Published in: Oct 2023

Publisher: Packt

ISBN-13: 9781837634279

Edition: 2nd Edition

Concepts

Bioinformatics

Author (1)

Dan MacLean

Phylogenetic Analysis and Visualization

Phylogenetics is the study of the evolutionary relationships among species or other groups of organisms. It involves the use of molecular and computational techniques to construct phylogenetic trees, which depict the evolutionary history of the organisms under study.

In bioinformatics, phylogenetics is studied using various computational tools and methods, including sequence alignment, distance-based methods, maximum likelihood, and Bayesian inference. These methods allow researchers to compare DNA or protein sequences from different organisms and infer their evolutionary relationships from the similarities and differences between those sequences. Phylogenetics has many applications in biology: it is used to understand the evolutionary history of species, to trace the origins and spread of infectious diseases, and to inform conservation efforts by identifying...

Technical requirements

We will use renv to manage packages in a project-specific way. To use renv to install packages, you will first need to install the renv package. You can do this by running the following commands in your R console:

Install renv:
```
install.packages("renv")
```
Create a new renv environment:
```
renv::init()
```
This will create a new directory called renv, a renv.lock lockfile, and a .Rprofile file in your current project directory.
You can then install packages with the following:
```
renv::install("package name")
```
You can also use the renv package manager to install Bioconductor packages by running the following command:
```
renv::install("bioc::package name")
```
For example, to install the Biobase package, you would run this:
```
renv::install("bioc::Biobase")
```
You can use renv to install development packages from GitHub like this:
```
renv::install("user name/repo name")
```
For example, to install the rbioinfcookbook package from the danmaclean GitHub account, you would run this:
```
renv::install("danmaclean...
```

Reading and writing varied tree formats with ape and treeio

Phylogenetic analysis is a cornerstone of biology and bioinformatics. The programs are diverse and complex, the computations are long-running, and the datasets are often large. Many programs are standalone and many have proprietary input and output formats. This has created a very complex ecosystem that we must navigate when dealing with phylogenetic data, meaning that often the simplest strategy is to use combinations of tools to load, convert, and save the results of analyses in order to be able to use them in different packages. In this recipe, we’ll look at dealing with phylogenetic tree data in R. To date, R support for the wide range of tree formats is restricted, but a few key packages have sufficient standardized objects such that workflows can focus on a few types and conversion to those types is streamlined. We’ll look at using the ape and treeio packages to get tree data into and out of R.
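As a minimal sketch of the load, convert, and save round trip described above, the following uses only ape and a toy Newick string invented for illustration (it is not one of the recipe's data files). It parses the tree, writes it to a temporary NEXUS file, reads it back, and re-serializes it as Newick.

```r
library(ape)

# Toy four-tip tree, made up for this sketch
newick <- "((A:1,B:1):1,(C:1,D:1):1);"

# Parse Newick text into a phylo object
tree <- read.tree(text = newick)

# Convert by writing NEXUS and reading it back in
nexus_file <- tempfile(fileext = ".nex")
write.nexus(tree, file = nexus_file)
tree_from_nexus <- read.nexus(nexus_file)

# With the default file = "", write.tree() returns the Newick string
newick_out <- write.tree(tree_from_nexus)
print(newick_out)
```

Formats produced by specific programs generally need the dedicated parsers in treeio, which this sketch does not cover.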

Getting...

Visualizing trees of many genes quickly with ggtree

Once you have computed a tree, the first thing you will want to do with it is take a look. That’s possible in many programs, but R has an extremely powerful, flexible, and fast system in the form of the ggtree package. In this recipe, we’ll learn how to get into ggtree and re-layout, highlight, and annotate tree images in just a few commands.
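To give a flavor of what re-layout, highlighting, and annotation look like in code, here is a minimal sketch. It assumes a toy five-tip tree invented for illustration rather than the recipe's itol.nwk file, and the clade chosen for highlighting is arbitrary.

```r
library(ape)
library(ggtree)

# Toy tree, made up for this sketch
tree <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")

# Find the internal node that is the most recent common ancestor of A and B
clade_node <- getMRCA(tree, c("A", "B"))

# Circular layout, tip labels, and a highlighted clade
p <- ggtree(tree, layout = "circular") +
  geom_tiplab() +
  geom_hilight(node = clade_node, fill = "steelblue", alpha = 0.3)

print(p)
```

Changing the layout argument (for example, to "rectangular") redraws the same tree in a different shape without touching the other layers.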

Getting ready

You’ll need the ggplot2, ggtree, and ape packages. You’ll also require the itol.nwk file from the rbioinfcookbook package. The file is a Newick tree of 191 species from the Interactive Tree of Life online tool’s public dataset. At the time of writing, there is an issue with an upstream dependency that causes this code to fail, though it is correct. We hope this will have gone away by the time you read this. If it hasn’t, a workaround is to install the source version of ggtree using BiocManager, like this:

BiocManager::install("...

Quantifying and estimating the differences between trees with treespace

Comparing trees to differentiate or group them can help researchers to see patterns of evolution. Multiple trees of a single gene tracked across species or strains can reveal differences in how that gene is changing across species. At the core of these approaches are metrics of distance between trees. In this recipe, we’ll calculate one such metric to find pairwise differences between the trees of 20 different genes in 15 different species, so every tree has the same 15 tips with identical names. The trees must share their tip set in this way for distances between them to be computed; we can’t do an analysis like this unless that condition is met.
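The idea of a pairwise tree distance can be sketched without the recipe's data. The example below is an illustration under stated assumptions: it uses 20 random trees generated by ape (not real gene trees), and it swaps in ape's topological distance (method "PH85", closely related to the Robinson-Foulds distance) rather than the metric computed by treespace.

```r
library(ape)

set.seed(1)

# 20 random trees that all share the same 15 tip labels (simulated, not real genes)
species <- paste0("sp", 1:15)
trees <- rmtree(20, 15, tip.label = species)
names(trees) <- paste0("gene", 1:20)

# Pairwise topological distances between all trees
d <- dist.topo(trees, method = "PH85")

# Group the trees by hierarchical clustering on those distances
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
print(table(groups))
```

Because the trees are random, the clusters carry no biological meaning; the point is only that a distance object between trees can feed standard clustering tools.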

Getting ready

For this recipe, we’ll use the treespace package to compute distances and clusters. We’ll use ape and adegraphics for accessory loading and visualization functions. The input data will be 20 files of Newick format trees, each of which represents a...

Extracting and working with subtrees using ape

A common but often frustrating task is cropping trees to look at a section in a new, clearer context or combining them with another tree in order to present two distant clades more clearly. In this short recipe, we’ll look at how easy it can be to manipulate trees: specifically, how to pull out a subtree as a new object and how to combine trees into other trees. We’ll use the ape package, the phylogenetic workhorse in R, which gives us the functionality to complete those tasks easily.

Getting ready

We’ll need a single example tree – the mammal_tree.nwk file in the rbioinfcookbook package will be fine. All the functions we require can be found in the ape package.
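As a self-contained illustration of the two operations, the sketch below uses a small tree invented for this example instead of mammal_tree.nwk. It extracts a clade as a new object, prunes to a chosen set of tips, and grafts one subtree onto the root of another.

```r
library(ape)

# Toy tree, made up for this sketch
tree <- read.tree(text = "(((human:1,chimp:1):1,mouse:2):1,(cow:1,pig:1):2);")

# Pull out the clade descending from the common ancestor of human and chimp
primate_node <- getMRCA(tree, c("human", "chimp"))
primates <- extract.clade(tree, primate_node)

# Alternatively, prune the tree down to a named set of tips
ungulates <- keep.tip(tree, c("cow", "pig"))

# Graft the ungulate subtree onto the root of the primate subtree
combined <- bind.tree(primates, ungulates, where = "root")
print(combined$tip.label)
```

The where argument of bind.tree() also accepts a node or tip number, which controls where in the receiving tree the second tree is attached.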

How to do it…

Extracting and working with subtrees in ape can be executed using the following steps:

Load the library and tree:

```
library(ape)
tree_file <- fs::path_package(
  "extdata",
  "...
```

Creating dot plots for alignment visualizations

Dot plots of pairs of aligned sequences are possibly the oldest alignment visualization. In these plots, the positions of two sequences are plotted on the x axis and y axis, and for every coordinate in that space, a point is drawn if the letters (nucleotides or amino acids) correspond at that (x,y) coordinate. Since the plot can show regions that match that aren’t generally in the same region of the two sequences (as lines away from the diagonal), the plot is a good way to visually spot insertions and deletions and structural rearrangements in the two sequences. In this recipe, we’ll look at a speedy method for constructing a dot plot using the dotplot package and a bit of code for getting a grid plot of all pairwise dot plots for sequences in a file.
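The rule described above (draw a point wherever the letters at position x and position y are equal) is simple enough to sketch in base R. This is an illustration of the principle only, not the dotplot package's implementation, and the two short sequences are invented for the example.

```r
# Build a logical matrix: TRUE where letter i of seq_x equals letter j of seq_y
dot_matrix <- function(seq_x, seq_y) {
  x <- strsplit(seq_x, "")[[1]]
  y <- strsplit(seq_y, "")[[1]]
  outer(x, y, "==")
}

# Two toy sequences differing at position 4
m <- dot_matrix("GATTACA", "GATCACA")

# Coordinates of every matching pair of positions
hits <- which(m, arr.ind = TRUE)

plot(
  hits[, "row"], hits[, "col"],
  pch = 15,
  xlab = "Position in sequence 1",
  ylab = "Position in sequence 2"
)
```

Matching every single letter produces a lot of background noise on real sequences, which is why practical tools usually compare short windows of letters rather than individual positions.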

Getting ready

We’ll need the bhlh.fa file, which contains three basic helix-loop-helix (bHLH) transcription factor sequences from pea, soy, and lotus. The file...

Reconstructing trees from alignments using phangorn

So far in this chapter, we’ve assumed that trees are already available and ready to use. Of course, there are many ways to make a phylogenetic tree and, in this recipe, we’ll take a look at some of the different methods available.

Getting ready

For this recipe, we’ll use the abc.fa file of yeast ABC transporter sequences, the Bioconductor Biostrings and msa packages, and the CRAN phangorn package.

How to do it…

Constructing trees using phangorn can be done like this:

Load in the libraries and sequences and make an alignment:

```
library(Biostrings)
library(msa)
library(phangorn)
seqfile <- fs::path_package(
  "extdata",
  "abc.fa",
  package = "rbioinfcookbook"
)
seqs <- readAAStringSet(seqfile)
aln <- msa::msa(seqs, method = c("ClustalOmega"))
```

Convert the alignment:
```
aln <- as.phyDat(aln, type = "AA")
```
Make...
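One common route from an alignment to a tree is distance-based reconstruction. The following is a minimal sketch under stated assumptions, not necessarily the steps this recipe goes on to use: it works on four short, already aligned DNA sequences invented for the example (not the abc.fa protein data), estimates pairwise distances with phangorn's default model, and builds neighbor-joining and UPGMA trees.

```r
library(phangorn)

# Toy aligned DNA sequences, made up for this sketch
toy_seqs <- c(
  s1 = "acgtacgtacgtacgtacgt",
  s2 = "acgtacgtacgtacgtacga",
  s3 = "acgtaagtacctacgtacgt",
  s4 = "acgtaagtacctacgaacgt"
)

# One row per sequence, one column per alignment position
aln_matrix <- do.call(rbind, strsplit(toy_seqs, ""))
toy_aln <- phyDat(aln_matrix, type = "DNA")

# Pairwise maximum likelihood distances between sequences
d <- dist.ml(toy_aln)

# Two distance-based trees built from the same distances
nj_tree <- NJ(d)
upgma_tree <- upgma(d)
print(nj_tree$tip.label)
```

Neighbor joining returns an unrooted tree, while UPGMA returns a rooted, ultrametric one, so the two results are not directly interchangeable.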

Finding orthologue candidates using reciprocal BLASTs

In genomics, orthology refers to the relationship between genes from different species that evolved from a common ancestral gene through speciation. Orthologous genes typically have the same function and structure and play similar roles in different organisms, even if they have diverged over time.

Orthology has many important uses in bioinformatics. Orthology can be used to infer the function of a gene in a newly sequenced genome based on its similarity to known genes in other species. This can be especially useful for identifying genes that are involved in specific biological processes or pathways. Orthologous genes can be used to compare the genomes of different organisms and study the evolution of gene families. By identifying which genes are conserved across different species, researchers can gain insights into the evolutionary history of those genes and the organisms that carry them.

Orthology can be inferred using various...
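The reciprocal best hit idea named in this recipe's title can be sketched in base R. The example assumes two small hit tables invented for illustration, with hypothetical column names (query, subject, bitscore) standing in for parsed BLAST output; it does not run BLAST itself. A pair of genes is kept only if each is the other's top-scoring hit.

```r
# Keep the highest-scoring subject for each query
best_hits <- function(hits) {
  ordered <- hits[order(hits$query, -hits$bitscore), ]
  ordered[!duplicated(ordered$query), c("query", "subject")]
}

# Hypothetical hits of species A genes against species B
a_vs_b <- data.frame(
  query = c("a1", "a1", "a2", "a3"),
  subject = c("b1", "b2", "b2", "b3"),
  bitscore = c(200, 50, 180, 90)
)

# Hypothetical hits of species B genes against species A
b_vs_a <- data.frame(
  query = c("b1", "b2", "b3", "b3"),
  subject = c("a1", "a2", "a1", "a3"),
  bitscore = c(210, 170, 95, 60)
)

forward <- best_hits(a_vs_b)
reverse <- best_hits(b_vs_a)

# Flip the reverse table so both tables read "A gene, B gene"
names(reverse) <- c("subject", "query")

# Pairs present in both directions are reciprocal best hits
rbh <- merge(forward, reverse, by = c("query", "subject"))
print(rbh)
```

In this toy data, a3's best hit is b3, but b3's best hit is a1, so the a3/b3 pair is dropped and only a1/b1 and a2/b2 remain as orthologue candidates.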

Author (1)

Dan MacLean

Professor Dan MacLean has a PhD in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now an honorary professor at the School of Computing Sciences at the University of East Anglia. He has worked in bioinformatics and plant pathogenomics, specializing in R and Bioconductor, and has developed analytical workflows in bioinformatics, genomics, genetics, image analysis, and proteomics at the Sainsbury Laboratory since 2006. Dan has developed and published software packages in R, Ruby, and Python, with over 100,000 downloads combined.