You're reading from R Bioinformatics Cookbook - Second Edition

Product type: Book

Published in: Oct 2023

Publisher: Packt

ISBN-13: 9781837634279

Edition: 2nd Edition

Concepts

Bioinformatics

Author (1)

Dan MacLean

Easily Performing Statistical Tests Using Linear Models

Linear models are a statistical tool used to model the relationship between a dependent variable and one or more independent variables. They are based on the assumption that the relationship between the variables is linear, meaning that the change in the dependent variable is proportional to the change in the independent variables.

Linear models are widely used in many fields, including bioinformatics. In bioinformatics, linear models can be used to analyze large datasets, such as gene expression data. For example, linear models can be used to identify differentially expressed genes between different experimental conditions or to predict the expression of genes based on other variables, such as clinical data.

Linear models are closely related to statistical tests, such as t-tests and analysis of variance (ANOVA). In fact, t-tests and ANOVA can be seen as special cases of linear models. For example, a two-sample t-test is...

Technical requirements

We will use renv to manage packages in a project-specific way. To use renv to install packages, you will first need to install the renv package. You can do this by running the following commands in your R console:

Install renv:
```
install.packages("renv")
```
Create a new renv environment:
```
renv::init()
```
This will create a new directory called renv in your current project directory, along with a lockfile called renv.lock.
You can then install packages with the following:
```
renv::install("package name")
```
You can also use the renv package manager to install Bioconductor packages by running the following command:
```
renv::install("bioc::package name")
```
For example, to install the Biobase package, you would run the following:
```
renv::install("bioc::Biobase")
```
You can use renv to install development packages from GitHub with this command:
```
renv::install("user name/repo name")
```
For example, to install the rbioinfcookbook package from the GitHub user danmaclean, you would run the following...

Modeling data with a linear model

Linear models are a type of statistical model used to analyze the relationship between a dependent variable and one or more independent variables. In essence, they seek to fit a line that best describes the relationship between these variables, allowing us to make predictions about the dependent variable based on the values of the independent variables. The equation for a simple linear model can be written as follows:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $y$ is the dependent variable, $x$ is the independent variable, $\beta_0$ and $\beta_1$ are coefficients that represent the intercept and slope of the line, respectively, and $\varepsilon$ is the error term.

The output of a linear model typically includes the coefficients of the model, which describe the strength and direction of the relationship between the variables, as well as measures of the model’s goodness of fit, such as the R-squared value.
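As a minimal sketch of the equation and output described above, the following fits a simple linear model with base R's lm(). The data are simulated and the names sim_df, x, and y are hypothetical, chosen only for illustration; the true intercept (2) and slope (0.5) are assumptions built into the simulation.

```r
# Simulated data (hypothetical): y depends linearly on x plus Gaussian noise.
set.seed(42)
x <- seq(1, 20, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
sim_df <- data.frame(x = x, y = y)

# Fit y = beta_0 + beta_1 * x + error by ordinary least squares.
fit <- lm(y ~ x, data = sim_df)

# Estimated intercept (beta_0) and slope (beta_1).
coef(fit)

# R-squared: the proportion of variance in y explained by the model.
summary(fit)$r.squared
```

Calling summary(fit) prints the coefficients, their standard errors, and the goodness-of-fit measures together.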

Linear models...

Using a linear model to compare the mean of two groups

The t-test is a statistical method used to help us decide whether there is likely to be a difference between the means of two groups. t-tests are probably the most widely used tests in bioinformatics and biology, yet they are often applied without considering whether the assumptions of the test hold, and their results are interpreted uncritically. By learning how to do the t-test through building a linear model, you will be able to check whether the assumptions hold, since a well-fitting model implies that the data meet those assumptions reasonably well. The t-test is a special case of the linear model because it can be framed as a linear regression problem with a binary predictor variable.

In the linear model, we try to fit a linear equation that describes the relationship between a response output variable (dependent variable) and one or more predictor input variables (independent variables). In the case of a t-test, we have one binary predictor variable, which...
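The equivalence between the t-test and a linear model with one binary predictor can be sketched in base R as follows. The data frame expr_df and its columns are hypothetical and simulated here purely for illustration; this is not the recipe's own dataset.

```r
# Simulated expression values for two groups (hypothetical data).
set.seed(1)
expr_df <- data.frame(
  group = factor(rep(c("control", "treated"), each = 10)),
  expression = c(rnorm(10, mean = 5), rnorm(10, mean = 6.5))
)

# Classic two-sample t-test, assuming equal variances in both groups.
tt <- t.test(expression ~ group, data = expr_df, var.equal = TRUE)

# The same comparison as a linear model with a binary predictor.
# The coefficient "grouptreated" is the difference between the group means.
fit <- lm(expression ~ group, data = expr_df)
lm_p <- summary(fit)$coefficients["grouptreated", "Pr(>|t|)"]

# Both approaches report the same p-value.
c(t_test = tt$p.value, linear_model = lm_p)
```

Because the comparison is now a fitted model, the usual diagnostics (for example, plot(fit)) can be used to examine the residuals when judging the test's assumptions.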

Using a linear model and ANOVA to compare multiple groups in a single variable

ANOVA is a statistical method used to test whether there is a significant difference between two or more groups. ANOVA compares the variance within groups to the variance between groups to determine if there is a statistically significant difference in the means of the groups. ANOVA is commonly used in experiments where a response variable is measured across several groups under different experimental conditions.

ANOVA can be used to compare gene expression levels across multiple samples under different experimental conditions. In that case, the response variable is the gene expression level, and the categorical variable is the experimental condition. ANOVA can also be used in clinical trials to compare the effectiveness of different treatments or interventions for a disease or medical condition.

Linear models can be used to perform ANOVA by fitting a linear model to the data with a categorical variable that represents...
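A minimal sketch of a one-way ANOVA carried out through a linear model is shown below, using only base R. The three conditions and the simulated expression values in anova_df are hypothetical, included only to make the example self-contained.

```r
# Simulated expression values under three conditions (hypothetical data).
set.seed(2)
anova_df <- data.frame(
  condition = factor(rep(c("A", "B", "C"), each = 8)),
  expression = c(rnorm(8, mean = 10), rnorm(8, mean = 10.5), rnorm(8, mean = 13))
)

# Fit a linear model with the categorical variable as the only predictor.
fit <- lm(expression ~ condition, data = anova_df)

# anova() turns the fitted model into an ANOVA table:
# between-group and residual sums of squares, F ratio, and p-value.
aov_table <- anova(fit)
aov_table
```

With three groups of eight observations, the table has 2 degrees of freedom for condition and 21 for the residuals.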

Using linear models and ANOVA to compare multiple groups in multiple variables

Two-way ANOVA is a statistical method used to analyze the effects of two categorical independent variables, also known as factors, on a continuous dependent variable. The two independent variables can be either fixed or random.

The main purpose of two-way ANOVA is to examine whether there is a significant interaction between the two independent variables, as well as to determine the main effects of each independent variable on the dependent variable.

The analysis involves calculating the sum of squares for each main effect, for the interaction, and for the residual error. Each sum of squares is divided by its degrees of freedom to give a mean square, and each effect's mean square is divided by the residual mean square to obtain an F ratio. The F ratios are then compared to critical values from an F-distribution to determine whether the effects are statistically significant.
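The F ratio calculation can be sketched in base R as follows. The factors genotype and treatment, the data frame two_way_df, and the simulated effect sizes are all hypothetical; the example assumes a balanced design with fixed effects.

```r
# Balanced two-factor design with six replicates per cell (hypothetical data).
set.seed(3)
two_way_df <- expand.grid(
  genotype = c("wt", "mutant"),
  treatment = c("mock", "drug"),
  replicate = 1:6
)
two_way_df$response <- rnorm(nrow(two_way_df), mean = 10) +
  ifelse(two_way_df$genotype == "mutant", 1, 0) +
  ifelse(two_way_df$treatment == "drug", 2, 0)

# genotype * treatment expands to both main effects plus their interaction.
fit <- lm(response ~ genotype * treatment, data = two_way_df)
tab <- anova(fit)
tab

# Recompute the F ratios by hand: mean square = sum of squares / df,
# then divide each effect's mean square by the residual mean square.
mean_sq <- tab[["Sum Sq"]] / tab[["Df"]]
f_by_hand <- mean_sq[1:3] / mean_sq[4]
f_by_hand
```

The hand-computed values match the F value column of the table produced by anova().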

Like the one-way ANOVA seen in the "Using a linear model and ANOVA to compare multiple groups in a single variable" recipe, the basis is...

Testing and accounting for interactions between variables in linear models

An interaction between variables occurs when the effect of one predictor variable on the response variable depends on the level of another predictor variable. In other words, the effect of one variable is not constant across different levels of the other variable. Interactions can occur, for example, between different drug regimes in medical trials or, more generally, whenever multiple experimental conditions are changed at once.

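Linear models can model interactions by including interaction terms in the model formula. An interaction term is the product of two or more predictor variables. The predictors are often centered to have a mean of zero before the product is formed, which makes the main-effect coefficients easier to interpret, but centering is not required.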

Suppose we have a linear regression model with two predictor variables, x1 and x2, and we want to examine their interaction. The interaction term can be included in the model as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1...$$
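As a hedged sketch of how an interaction term enters a model formula in R, the example below simulates two continuous predictors whose product contributes to the response. The data frame int_df and the simulated coefficient values are assumptions made for illustration only.

```r
# Simulated data in which the effect of x1 depends on x2 (hypothetical).
set.seed(4)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 1 * x2 + 1.5 * x1 * x2 + rnorm(n, sd = 0.5)
int_df <- data.frame(y = y, x1 = x1, x2 = x2)

# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2,
# where x1:x2 is the interaction (product) term.
fit_int <- lm(y ~ x1 * x2, data = int_df)
names(coef(fit_int))

# Compare against the additive model to test whether the
# interaction term improves the fit.
fit_add <- lm(y ~ x1 + x2, data = int_df)
anova(fit_add, fit_int)
```

A small p-value in the model comparison suggests that the interaction term explains variation that the additive model cannot.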

Doing tests for differences in data in two categorical variables

Categorical output variables are response (dependent) variables that take on discrete values from a finite set of possible outcomes. We can consider that there are two types of categorical variables: nominal and ordinal.

Ordinal variables have a natural ordering among the categories. Examples of ordinal variables include education level, income bracket, and satisfaction ratings. In linear models, ordinal variables can be represented using their numerical values or by assigning each category a numerical rank. For example, in a linear model predicting job satisfaction based on salary, the ordinal variable income bracket could be assigned a numerical rank from one to five based on the size of the income range. Ranking helps us to use the linear model framework fairly easily.
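The ranking idea above can be sketched in base R by storing the categories as an ordered factor and converting it to integer ranks. The five bracket labels, the data frame survey_df, and the simulated satisfaction scores are hypothetical.

```r
# Five income brackets in their natural order (hypothetical labels).
brackets <- c("very low", "low", "medium", "high", "very high")

set.seed(5)
survey_df <- data.frame(
  income_bracket = factor(
    sample(brackets, 60, replace = TRUE),
    levels = brackets,
    ordered = TRUE
  )
)

# as.integer() on an ordered factor returns the rank of each level (1 to 5).
survey_df$income_rank <- as.integer(survey_df$income_bracket)

# Simulated job satisfaction that rises with income rank (assumed effect).
survey_df$satisfaction <- 3 + 0.8 * survey_df$income_rank + rnorm(60)

# The rank can now be used like any numeric predictor.
fit <- lm(satisfaction ~ income_rank, data = survey_df)
coef(fit)
```

Treating ranks as numbers assumes that the step between adjacent categories is equally large everywhere, which is worth checking against the data.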

Nominal variables are variables that have no inherent order or ranking among the categories. Examples of...

Making predictions using linear models

Linear models are commonly used in bioinformatics for prediction tasks due to their simplicity, interpretability, and ability to handle high-dimensional datasets. In bioinformatics, researchers often work with large datasets that have a large number of features (such as gene expression data or sequence data), making it challenging to analyze them with more complex models. Linear models offer a straightforward and computationally efficient way to analyze these datasets. Linear models can help researchers identify genes or genetic variants that are associated with a particular trait or disease. They can also be used in feature selection, which is an important step in bioinformatics data analysis. Feature selection aims to identify the most relevant features (genes, proteins, etc.) that are associated with the outcome of interest (disease, drug response, etc.). Linear models can be used to rank features based on their importance and select the most...
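A minimal sketch of prediction from a fitted linear model, using base R's predict(), might look like the following. The variables marker and trait and the data frames train_df and new_df are hypothetical and simulated for illustration.

```r
# Simulated training data: a trait that depends on a marker value (hypothetical).
set.seed(6)
train_df <- data.frame(marker = runif(40, min = 0, max = 10))
train_df$trait <- 4 + 1.2 * train_df$marker + rnorm(40)

fit <- lm(trait ~ marker, data = train_df)

# New observations must use the same predictor names as the training data.
new_df <- data.frame(marker = c(2, 5, 8))

# interval = "prediction" adds lower and upper bounds for each new observation.
pred <- predict(fit, newdata = new_df, interval = "prediction")
pred
```

The result is a matrix with one row per new observation and columns fit, lwr, and upr.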


Author (1)

Dan MacLean

Professor Dan MacLean has a PhD in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now an honorary professor at the School of Computing Sciences at the University of East Anglia. He has worked in bioinformatics and plant pathogenomics, specializing in R and Bioconductor, and has developed analytical workflows in bioinformatics, genomics, genetics, image analysis, and proteomics at the Sainsbury Laboratory since 2006. Dan has developed and published software packages in R, Ruby, and Python, with over 100,000 downloads combined.