You're reading from R Bioinformatics Cookbook - Second Edition

Product typeBook

Published inOct 2023

PublisherPackt

ISBN-139781837634279

Edition2nd Edition

Concepts

Bioinformatics

Author (1)

Dan MacLean

Computing new data columns from existing ones and applying arbitrary functions using mutate()

Data frames are the core data structure in R for storing and manipulating tabular data. They are similar to a table in a relational database or a spreadsheet, with rows representing observations and columns representing variables. The mutate() function in dplyr is used to add new columns to a data frame by applying a function or calculation to existing columns; it can be used on both data frames and nested datasets. A nested dataset is a data structure that contains multiple levels of information, such as lists or data frames within data frames.

Getting ready

In this recipe, we’ll use a very small data frame that will be created in the recipe and the dplyr, tidyr, and purrr packages.

How to do it…

The functionality offered by the mutate() pattern is exemplified in the following steps:

Add some new columns:

library(dplyr)df <- data.frame(gene_id = c("gene1", "gene2", "gene3"),                 tissue1 = c(5.1, 7.3, 8.2),                 tissue2 = c(4.8, 6.1, 9.5))df <- df |> mutate(log2_tissue1 = log2(tissue1),                    log2_tissue2 = log2(tissue2))

Conditionally operate on columns:

df |> mutate_if(is.numeric, log2)df |> mutate_at(vars(starts_with("tissue")), log2)

Operate across columns:

library(dplyr)df <- data.frame(gene_id = c("gene1", "gene2", "gene3"),                 tissue1 = c(5.1, 7.3, 8.2),                 tissue2 = c(4.8, 6.1, 9.5),                 tissue3 = c(8.5, 12.5, 6.5))df <- df |>   rowwise() |>    mutate(mean = mean(c(tissue1, tissue2, tissue3)),         stddev = sd(c(tissue1, tissue2, tissue3))         )

Operate on nested data:

df <- data.frame(gene_id = c("gene1", "gene2", "gene3"),                 tissue1_value = c(5.1, 7.3, 8.2),                 tissue1_pvalue = c(0.01, 0.05, 0.001),                 tissue2_value = c(4.8, 6.1, 9.5),                 tissue2_pvalue = c(0.03, 0.04, 0.001)                 ) |>       tidyr::nest(-gene_id, .key = "tissue")df <- df |> mutate(tissue = purrr::map(tissue, ~ mutate(.x, value_log2 = log2(.[1]),pvalue_log2 = log2(.[2])) ))

These are the various ways we can work with mutate() on different data frames to different ends.

How it works…

In step 1, we start by creating a data frame containing three genes, with columns representing the values of each gene. Then, we use the [mutate(){custom-style='P - Code'} function in its most basic form to apply the log2 transformation of the values into new columns.

In step 2, we apply the transformation conditionally with mutate_if(), which applies the specified function to all columns that match the specified condition (in this case, is.numeric), and mutate_at() to apply the log2 function to all columns that start with the name tissue.

In step 3, we create a bigger data frame of expression data and then use the rowwise() function in conjunction with mutate() to add two new columns, mean and stddev, which contain the mean and standard deviation of the expression values across all three tissues for each gene. If we just use mutate() on the vector, the function will be applied to the entire column instead of the rows. As we need to calculate the mean and standard deviation of the expression values across all tissues, this would be difficult to achieve just by using mutate() on vectors in the standard column-wise fashion.

Finally, we look at using mutate() in nested data frames. We begin by creating a data frame of genes and expression values and use the nest() function to create a nested data frame. We can then use the combination of mutate() and map() functions from the purrr package, to extract the tissue1_value and tissue2_value columns from the nested part of the data frame.

You have been reading a chapter from

R Bioinformatics Cookbook - Second Edition

Published in: Oct 2023Publisher: PacktISBN-13: 9781837634279

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Dan MacLean

Professor Dan MacLean has a PhD in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now an honorary professor at the School of Computing Sciences at the University of East Anglia. He has worked in bioinformatics and plant pathogenomics, specializing in R and Bioconductor, and has developed analytical workflows in bioinformatics, genomics, genetics, image analysis, and proteomics at the Sainsbury Laboratory since 2006. Dan has developed and published software packages in R, Ruby, and Python, with over 100,000 downloads combined.
Read more about Dan MacLean

Personalised recommendations for you

Based on your interests and search pattern

Engineering Manager's Handbook

Engineering Manager's Handbook is a comprehensive guide for managers to excel in their role, foster customer-centric digital products, learn leadership, team building, and balancing technical work with management. You’ll also explore how to develop trust, authority, and collaboration to drive success and make a lasting impact.

BookSep 2023278 pages

C++ Game Animation Programming

Video game characters have a fascinating history, evolving from simple 2D sprites to high-polygon 3D models. Take a look behind the curtain and learn how to build a 3D renderer, load character models, play animations and blend between them, and create large crowds of animated people with this comprehensive C++ game animation programming guide.

BookDec 2023480 pages

Gamification for Product Excellence

This book helps you to take your product management strategy to the next level by standing out in crowded markets. Along with boosting user adoption rates by creating engaging products that incorporate playful elements, learn gamification theory and how to integrate it into your design, product development, and product management processes.

BookSep 2023350 pages

Supercharging Productivity with Trello

Supercharging Productivity with Trello is the ultimate guide for anyone looking to boost their productivity with digital tools. Whether you're new to Trello or a seasoned professional, this book covers everything from core features to advanced automation, and Power-Ups.

BookAug 2023342 pages

Automate It with Zapier and Generative AI

This comprehensive guide takes you through the concepts of business process automation, showing you how Zapier can facilitate it without having to write code and helping you to boost productivity. You’ll learn how to save time, reduce costs, and make your business recession-proof by using Zapier to automate tasks in your cloud-based business apps.

BookAug 2023706 pages

Scoring to Picture in Logic Pro

In this book, you’ll explore a variety of techniques to synchronize music to picture using Logic Pro. Though this is not a technical manual, it will teach you how to make the best use of Logic Pro and how to wield this technology to maximize your potential when scoring to picture.

BookSep 2023412 pages

Mastering Information Security Compliance Management

This concise book equips you with the knowledge and practices needed to establish and maintain an effective information security management system. The chapters provide insights into ISO/IEC 27001/27002:2022, risk management, ISMS development, incident management, audit processes, and strategies for continuous improvement.

BookAug 2023236 pages1

Implementing Atlassian Confluence

Implementing Atlassian Confluence provides both a high-level overview and an insightful path for remote collaboration with Atlassian Confluence. With this multi-layered yet practical guide, you’ll be able to set up Confluence-based collaboration with minimum external consultancy services to ensure smooth and close coordination between teams.

BookSep 2023406 pages

R Bioinformatics Cookbook

This book takes a unique problem–solution approach to handling complex tasks in the bioinformatics domain using different datasets present in the book. With the help of real-world examples, you’ll learn to put each independent recipe to use to tackle problems in the field of bioinformatics.

BookOct 2023396 pages

Build Your Own Metaverse with Unity

Build Your own Metaverse with Unity is a practical guide for developers to create their own metaverse - a virtual world with infinite possibilities. It empowers you to identify gaps in existing metaverses and improve upon them, enabling you to shape your virtual world.

BookSep 2023586 pages5