You're reading from R Bioinformatics Cookbook - Second Edition

Product type: Book

Published in: Oct 2023

Publisher: Packt

ISBN-13: 9781837634279

Edition: 2nd Edition

Concepts

Bioinformatics

Author (1)

Dan MacLean

ggplot2 and Extensions for Publication Quality Plots

Clear and informative data visualizations are among the most important tools that bioinformaticians have for communicating complex data and findings to other scientists in the field. They allow for easy and efficient exploration and understanding of large and complex datasets. Creating a good visualization is a highly iterative process, and many drafts are discarded before a final one is settled on, so it is important that we have plotting tools that allow for quick and easy plot creation and customization.

ggplot2 is a popular data visualization library in R that provides an elegant solution for bioinformaticians. It is based on the Grammar of Graphics, a principle that allows users to easily create complex and customizable visualizations by breaking them down into small, modular components, defined by a consistent interface. These make ggplot2 highly flexible and allow for the creation of a wide variety...

Technical requirements

We will use renv to manage packages in a project-specific way. To use renv to install packages, you will first need to install the renv package itself. Here’s how to install renv and then use it to install packages:

Run the following command in your R console:
```
install.packages("renv")
```
Next, you will need to create a new renv environment for your project by running the following command:
```
renv::init()
```
This will create a new directory called renv in your current project directory, along with a renv.lock lockfile and a project-level .Rprofile that activates the environment.
You can then install packages with the following command:
```
install.packages("<package name>")
```
You can also use the renv package manager to install Bioconductor packages by running the following command:
```
renv::install("bioc::<package name>")
```
For example, to install the Biobase package, you would run the following command:
```
renv::install("bioc::Biobase")
```
You can use renv to install development packages from GitHub...

Combining many plot types in ggplot2

The layer model of ggplot2 is a key feature of the library that allows users to create complex visualizations by building up layers of data, aesthetics, and geoms. Each layer represents a different aspect of the plot, and they are added on top of each other to create the final visualization. In this recipe, we’ll use the layer model to create a complex plot of data in the palmerpenguins package. It may be helpful to inspect the data in R directly by printing it to the screen. Also, the package is well documented at https://allisonhorst.github.io/palmerpenguins/, should you wish to look more into how it was generated.

Getting ready

Install the ggplot2 and palmerpenguins packages.

How to do it…

We can use the layer system to combine multiple plot types as follows:

Create the base for the plot:

library(ggplot2)
library(palmerpenguins)
p <- ggplot(data = penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm...
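The listing above is cut off in this excerpt, so the following is a separate, minimal sketch of the layer model rather than the recipe's own code. The choice of geom_point() and geom_smooth() as layers, and the colour mapping to species, are assumptions made for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Base of the plot: data and shared aesthetic mappings, no geoms yet
p <- ggplot(data = penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, colour = species)

# Each geom added with `+` becomes a new layer drawn on top of the last
p_layers <- p +
  geom_point(alpha = 0.6, na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE)

p_layers
```

Because the base object p holds no layers of its own, it can be reused to try out different geom combinations without repeating the data and aesthetic setup.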

Comparing changes in distributions with ggridges

Ridge plots, also known as joyplots, are a visualization tool for comparing multiple distributions in a single plot. The ggridges R package provides an easy-to-use implementation of ridge plots: the distributions of a single variable across several groups or categories are drawn as partially overlapping density curves, stacked along a shared axis. The package also allows for easy customization of plot features such as color, fill, and theme. In this recipe, we will look at implementing some useful ridge plots.

Getting ready

We will need the ggplot2, ggridges, and palmerpenguins packages.

How to do it…

We can look at the changes in distributions using the following steps:

Plot overlapping distributions:

library(ggplot2)
library(ggridges)
library(palmerpenguins)
ggplot...
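As the listing above is truncated here, the following independent sketch shows one way a basic ridge plot might be drawn with ggridges. The choice of flipper_length_mm as the variable, species as the grouping, and the alpha and scale values are all assumptions for illustration.

```r
library(ggplot2)
library(ggridges)
library(palmerpenguins)

# One density ridge per species; scale > 1 lets neighbouring ridges overlap
p_ridges <- ggplot(penguins) +
  aes(x = flipper_length_mm, y = species, fill = species) +
  geom_density_ridges(alpha = 0.7, scale = 1.5, na.rm = TRUE) +
  theme_ridges() +
  theme(legend.position = "none")

p_ridges
```

Lowering scale toward 1 separates the ridges completely, which can be easier to read when the groups have very different spreads.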

Customizing plots with ggeasy

Customizing plots in ggplot2 can be a little unintuitive. The key tool is the theme() function, which allows users to customize elements of the plot’s overall appearance. Although theme() is powerful, it requires the user to manually specify each element of the plot, such as axis labels, titles, colors, and shapes. The ggeasy package, built on top of ggplot2, aims to make plot customization more accessible by providing a set of simple wrapper functions around theme() with names that are much easier to remember. With this recipe, we’ll look at customizing labels, legends, and axes in a plot created initially in ggplot2.

Getting ready

We’ll need the ggplot2, ggeasy, and palmerpenguins packages.

How to do it…

We can customize a plot as follows.

Make a base plot...
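The recipe steps are not shown in this excerpt, so the following is only a minimal sketch of the ggeasy style of customization. The base plot (a boxplot of body mass by species) and the particular helpers chosen, easy_rotate_x_labels(), easy_move_legend(), and easy_center_title(), are assumptions for illustration.

```r
library(ggplot2)
library(ggeasy)
library(palmerpenguins)

# A plain ggplot2 base plot to customize
base_plot <- ggplot(penguins) +
  aes(x = species, y = body_mass_g, fill = island) +
  geom_boxplot(na.rm = TRUE) +
  labs(title = "Penguin body mass", x = "Species", y = "Body mass (g)")

# ggeasy helpers are added like any other ggplot2 component
easy_plot <- base_plot +
  easy_rotate_x_labels(angle = 45, side = "right") +
  easy_move_legend(to = "bottom") +
  easy_center_title()

easy_plot
```

Each helper stands in for a theme() call; for example, easy_move_legend(to = "bottom") does the job of theme(legend.position = "bottom").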

Highlighting selected values in busy plots with gghighlight

Bioinformatics datasets often comprise measurements of many items. The genomes we analyze have thousands of genes, but usually, we’re only interested in the few that respond to particular changes in the experiment we have designed. So, it’s of great use to be able to highlight those few in our plots. In this recipe, we’ll look at the gghighlight package, which can make that very easy.

Getting ready

We’ll need the gghighlight, ggplot2, and rbioinfcookbook packages for the main functions. We’ll also use dplyr briefly. The datasets are fission yeast wild-type (wt) versus mutant gene expression data and an Arabidopsis treatment timecourse. The columns in the expression data hold the log2 ratio of gene expression in mutant versus wt and the p-value from a statistical test.

How to do it…

We can highlight selected values in a plot such as a gene expression plot using the following steps...
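The recipe's own steps and datasets are not shown in this excerpt. As a rough sketch of the idea, the following uses a simulated expression table (the data frame expr, its column names, and the thresholds are all hypothetical) and highlights only the points that pass a filter with gghighlight().

```r
library(ggplot2)
library(gghighlight)

set.seed(1)
# Simulated stand-in for a differential expression table (made-up values)
expr <- data.frame(
  gene = paste0("gene_", 1:500),
  log2fc = rnorm(500, mean = 0, sd = 1.5),
  p_value = runif(500)
)

# Points that fail the predicate are greyed out; the rest keep their styling
p_highlight <- ggplot(expr, aes(x = log2fc, y = -log10(p_value))) +
  geom_point() +
  gghighlight(p_value < 0.05 & abs(log2fc) > 1, use_direct_label = FALSE)

p_highlight
```

The predicate is written with the data's column names, much like a dplyr filter() expression, so the highlighting rule stays next to the plot code rather than in a separate preprocessing step.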

Plotting variability and confidence intervals better with ggdist

Confidence intervals are used to make inferences about a population based on a sample of data. They capture the uncertainty in an estimate by providing a range of plausible values for some parameter, rather than a single point estimate. The confidence level attached to an interval (for example, 95%) expresses how sure we are that the interval contains the true population parameter. It is common to show distributions and annotate them with range markers or confidence intervals. With this recipe, we will look at how to use the ggdist extension to ggplot2 to make informative and great-looking plots of distributions.

Getting ready

For this recipe, we need the ggdist, ggplot2, and palmerpenguins packages.

How to do it…

We can create plots with confidence intervals as follows:

Create a raincloud plot:

library(ggplot2)
library(ggdist)
library(palmerpenguins)
ggplot(penguins) +
  aes(x = flipper_length_mm, y = island) +
  geom_dots...
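The raincloud listing above is truncated, so the following is a separate, simpler sketch of annotating distributions with intervals in ggdist. It uses stat_halfeye(), and the grouping by species and the .width values are assumptions for illustration. Note that the intervals drawn here are quantile intervals of the observed data around the median, not confidence intervals for a mean.

```r
library(ggplot2)
library(ggdist)
library(palmerpenguins)

# Density slab per species, with a point and two nested intervals beneath it
p_dist <- ggplot(penguins) +
  aes(x = flipper_length_mm, y = species) +
  stat_halfeye(.width = c(0.5, 0.95), na.rm = TRUE)

p_dist
```

The thicker line covers the central 50% of the values and the thinner line the central 95%, so the spread of each group can be read at a glance.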

Making interactive plots with plotly

Interactive plots are great tools for data exploration, allowing users to explore large datasets interactively to gain insights and identify patterns. They are useful for programmers wishing to create dashboards for visualizing real-time data, and they help with interactive presentations that communicate complex data relationships in an engaging manner. plotly is a data visualization library for creating interactive plots in Python, R, and JavaScript. It provides a high-level interface for drawing attractive and informative statistical graphics, and the ggplotly() function in the plotly R package converts static ggplot2 visualizations to interactive plots. In this recipe, we’ll create a fairly involved ggplot2 visualization of mutation sites on a genome and then convert it to plotly to get a great first-level interaction layer.

Getting ready

We’ll need the ggplot2, plotly, and rbioinfcookbook packages...
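The recipe's dataset and plot are not shown in this excerpt. As a minimal sketch of the conversion step only, the following builds a small static plot from simulated mutation sites (the mutations data frame and its columns are hypothetical) and passes it to ggplotly().

```r
library(ggplot2)
library(plotly)

set.seed(42)
# Simulated mutation sites (made-up positions, for illustration only)
mutations <- data.frame(
  chromosome = rep(c("chr1", "chr2", "chr3"), each = 20),
  position = round(runif(60, min = 1, max = 1e6)),
  type = sample(c("SNP", "indel"), 60, replace = TRUE)
)

# An ordinary static ggplot2 object
p_static <- ggplot(mutations) +
  aes(x = position, y = chromosome, colour = type) +
  geom_point()

# Convert to an interactive htmlwidget with hover, zoom, and pan
p_interactive <- ggplotly(p_static)

p_interactive
```

In an interactive session or an R Markdown HTML document, printing p_interactive renders the widget, and hovering over a point shows the mapped values.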

Clarifying label placement with ggrepel

Bioinformatics datasets often have many thousands of data points. These can be genomic positions or genes within a genome, and as part of our data analysis, we will frequently want to label positions or genes so that the reader can identify them. A problem arises in that the labels can easily overlap or clash in the plots. The ggrepel package provides geoms for ggplot2 that allow for labels to be positioned much more clearly, incorporating label layout algorithms that make labels and connecting lines repel intelligently. In this recipe, we’ll look at the most important options for applying that to a genomics dataset.

Getting ready

We’ll need the ggplot2 and ggrepel packages and the fission yeast gene expression dataset in the rbioinfcookbook data package. This data frame contains three columns: the yeast gene ID, the log2 fold change of gene expression for that gene, and the p-value from a statistical test.

How to do it…...
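The steps of this recipe are not included in the excerpt. The following is a minimal sketch of repelled labels on a simulated expression table; the data frame expr, its column names, and the p-value cutoff are hypothetical stand-ins for the fission yeast data.

```r
library(ggplot2)
library(ggrepel)

set.seed(7)
# Simulated stand-in for the gene expression data (made-up values)
expr <- data.frame(
  gene = paste0("gene_", 1:200),
  log2fc = rnorm(200),
  p_value = runif(200)
)

# Label only a subset of genes so the plot stays readable
to_label <- subset(expr, p_value < 0.05)

p_repel <- ggplot(expr, aes(x = log2fc, y = -log10(p_value))) +
  geom_point(colour = "grey60") +
  geom_text_repel(
    data = to_label,
    aes(label = gene),
    max.overlaps = Inf,      # never drop labels because of crowding
    min.segment.length = 0,  # always draw a connecting line
    seed = 7                 # reproducible label layout
  )

p_repel
```

Setting seed makes the layout repeatable between runs, which is helpful when a figure has to be regenerated for publication.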

Zooming and making callouts from selected plot sections with facetzoom

We’ve already seen in these recipes how bioinformatics datasets can encompass very large scales. Genomes can be billions of bases long and contain tens of thousands of genes, taxa can have thousands of members, and biomes can have billions of individuals living in areas of a wide range of sizes. Contextual information is therefore often important in analysis and visualization; we may want to see a detail of some subset of data in its original broader context. We can do that by using plots with callout-style subplots, which are zoomed-in areas drawn alongside the wider data. In this recipe, we will look at using the facet zoom functionality in the ggforce package to look at an area of interest in a ggplot.

Getting ready

We’ll use the ggplot2, ggforce, palmerpenguins, and rbioinfcookbook packages for the main part of this recipe. The allele_freq and penguins datasets will be the basis...
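The recipe's steps are not shown in this excerpt. As a minimal sketch of the facet zoom idea, the following uses facet_zoom() from ggforce on the penguins data only; zooming on the Gentoo species and dropping rows with missing bill measurements first are assumptions made for illustration.

```r
library(ggplot2)
library(ggforce)
library(palmerpenguins)

# Drop rows with missing bill measurements so the zoom range is well defined
penguins_complete <- subset(
  penguins,
  !is.na(bill_length_mm) & !is.na(bill_depth_mm)
)

# The full data is drawn in one panel and the x range covering the
# Gentoo points is drawn enlarged in a linked second panel
p_zoom <- ggplot(penguins_complete) +
  aes(x = bill_length_mm, y = bill_depth_mm, colour = species) +
  geom_point() +
  facet_zoom(x = species == "Gentoo")

p_zoom
```

facet_zoom() also accepts explicit limits through its xlim and ylim arguments, which is convenient when the region of interest is a coordinate range rather than a group.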


Author (1)

Dan MacLean

Professor Dan MacLean has a PhD in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now an honorary professor at the School of Computing Sciences at the University of East Anglia. He has worked in bioinformatics and plant pathogenomics, specializing in R and Bioconductor, and has developed analytical workflows in bioinformatics, genomics, genetics, image analysis, and proteomics at the Sainsbury Laboratory since 2006. Dan has developed and published software packages in R, Ruby, and Python, with over 100,000 downloads combined.