You're reading from  R Statistics Cookbook

Product type: Book
Published in: Mar 2019
Reading level: Expert
Publisher: Packt
ISBN-13: 9781789802566
Edition: 1st Edition
Author (1)
Francisco Juretig

Francisco Juretig has worked for over a decade in a variety of industries such as retail, gambling and finance deploying data-science solutions. He has written several R packages, and is a frequent contributor to the open source community.

Mixed Effects Models

We will cover the following recipes in this chapter:

  • The standard model and ANOVA
  • Some useful plots for mixed effects models
  • Nonlinear mixed effects models
  • Crossed and nested designs
  • Robust mixed effects models with robustlmm
  • Choosing the best linear mixed model
  • Mixed generalized linear models

Introduction

In Chapter 2, Univariate and Multivariate Tests for Equality of Means, we discussed mixed effects models in the context of the analysis of variance (ANOVA). These models arise when we have a mixture of fixed and random effects. Fixed effects are associated with the standard coefficients that appear in every regression problem, and random effects are variance components that govern shocks shared by members of the same group. For example, the grade of any student can be thought of as the sum of an effect from how many hours the student spent studying (this would be the fixed effect) and a random shock that is shared across all students from the same school. The idea is to capture the fact that students belonging to the same school have correlated grades.
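The school example can be sketched with simulated data. Everything here is an assumption made for illustration: the variable names (grade, hours, school), the sample sizes, and the true parameter values are hypothetical, and the lme4 package is assumed to be installed.

```r
library(lme4)

set.seed(1)
n_schools  <- 30
n_students <- 20
school_id  <- rep(seq_len(n_schools), each = n_students)

# One random shock per school, shared by all of its students
school_shock <- rnorm(n_schools, sd = 5)

hours <- runif(n_schools * n_students, min = 0, max = 10)
grade <- 50 + 3 * hours + school_shock[school_id] +
  rnorm(n_schools * n_students, sd = 5)

d <- data.frame(grade, hours, school = factor(school_id))

# Fixed effect for hours, random intercept per school
fit <- lmer(grade ~ hours + (1 | school), data = d)
fixef(fit)

# Share of the unexplained variance due to the school shock
# (the intraclass correlation between students of the same school)
vc  <- as.data.frame(VarCorr(fit))
icc <- vc$vcov[vc$grp == "school"] / sum(vc$vcov)
icc
```

The intraclass correlation is the quantity that measures how strongly two students from the same school are correlated once the hours studied are accounted for.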

The standard model and ANOVA

In this recipe, we will be more interested in the regression part than in the ANOVA part. In the previous ANOVA chapter, we only used random effects for the intercepts, but this is usually not the only way that random effects are introduced. Imagine that we model sales in terms of price for certain customers, where we have several observations for each one of them. The standard ordinary least squares (OLS) approach would be to ignore this heterogeneity and pool all the observations together.

Naturally, this would introduce a problem, because the residuals would then be correlated (observations belonging to the same individual will produce similar residuals). The correct approach would be to introduce a random effect per individual, but there is a subtle point here: we are not expecting the response to differ in terms of an intercept...

Some useful plots for mixed effects models

In this recipe, we will explore some interesting plots that are useful for presenting and analyzing the results from mixed effects models. In the simplest formulation of mixed effects models, we have a random intercept by group. Every observation belonging to the same group will share that very same shock, rendering all of them correlated. But this can be extended to other coefficients (not just the intercept). A slope coefficient, beta, could be the sum of beta1 (which would be fixed) and beta_random (which would be a random effect). This implies that the slope relating the regressor to the response has two parts: a part that is the same for all the observations, and another part that depends on each group.
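A minimal sketch of such a random slope, on simulated data, could look as follows. The names (sales, price, customer) and the true values are hypothetical, and lme4 and lattice are assumed to be installed. The dot plot of the estimated group-level slopes is one common way to visualize them, not necessarily the plot used in the recipe.

```r
library(lme4)
library(lattice)

set.seed(2)
n_cust <- 25
n_obs  <- 12
customer_id <- rep(seq_len(n_cust), each = n_obs)

beta_fixed  <- -2                        # part of the slope common to everyone
beta_random <- rnorm(n_cust, sd = 0.7)   # part of the slope specific to each customer

price <- runif(n_cust * n_obs, min = 1, max = 10)
sales <- 100 + (beta_fixed + beta_random[customer_id]) * price +
  rnorm(n_cust * n_obs, sd = 2)

d <- data.frame(sales, price, customer = factor(customer_id))

# Random slope for price by customer, no random intercept
fit <- lmer(sales ~ price + (0 + price | customer), data = d)

fixef(fit)[["price"]]        # estimate of the common slope
head(coef(fit)$customer)     # per-customer slopes (fixed + random part)

# Dot plot of the estimated random slopes with their conditional intervals
slope_plot <- dotplot(ranef(fit, condVar = TRUE))
# print(slope_plot)  # draws the plot on the active graphics device
```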

...

Nonlinear mixed effects models

Linear mixed effects models assume that a linear relationship exists between the predictors and the target variable. In many cases, this is a problematic assumption; whenever the target is expected to show any kind of saturation effect or have an exponential response with respect to any of the regressors, the linearity assumption needs to be removed.

In medicine and biology, this is usually the case, as dose response studies almost always exhibit a certain kind of saturation effect. The same happens in marketing studies: spending increasing amounts of resources in order to drive sales up might be effective up to a point, but the returns diminish once that spend becomes too large.
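As an illustration of a saturating response with a group-level random effect, the following sketch uses the nlme package and the Loblolly pine growth data that ship with R. This choice of package, dataset, and starting values follows the example in the nlme documentation and is an assumption here; it is not necessarily the approach taken in this recipe.

```r
library(nlme)

# Asymptotic growth curve: height approaches Asym as age increases.
# SSasymp(age, Asym, R0, lrc) = Asym + (R0 - Asym) * exp(-exp(lrc) * age)
# Loblolly is a groupedData object, so the grouping factor (Seed) is implied.
fit_nl <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
               data   = Loblolly,
               fixed  = Asym + R0 + lrc ~ 1,   # population-level parameters
               random = Asym ~ 1,              # each seed source gets its own asymptote
               start  = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fixef(fit_nl)    # population-level curve parameters
VarCorr(fit_nl)  # variability of the asymptote across groups
```

Unlike the linear case, the fit needs starting values and iterates numerically, so a poor choice of start can prevent convergence.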

Fitting nonlinear mixed effects models is much harder than fitting their linear counterparts. Here, we can’t rely on any matrix techniques and we need to attack the problem...

Crossed and nested designs

Whenever we collect data for a model with the intention of testing something, we are implicitly working with an experimental design. Experimental design refers to the setup that defines which experimental units are used, and how they are allocated to each treatment. For example, if we want to measure whether clients are more likely to buy a product after receiving a discount, we need to define which clients will be in the control group and which in the test group. Furthermore, we need to define how many of them will fall in each group. All these decisions have implications for the effects and contrasts that we can estimate, and for the precision of each one. This is why experimental design has far-reaching consequences for our ANOVA and regression models.
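To make the two designs named in this recipe's title concrete, here is a hedged sketch of how they are commonly written in lme4 formula syntax. It uses the Penicillin and Pastes datasets bundled with lme4 as stand-ins (an assumption, since the recipe's own data is not shown here): in a crossed design every level of one factor appears with every level of the other, whereas in a nested design each level of the inner factor belongs to exactly one level of the outer factor.

```r
library(lme4)

# Crossed: each of the 6 samples is assayed on each of the 24 plates
xtabs(~ plate + sample, data = Penicillin)
fit_crossed <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample),
                    data = Penicillin)

# Nested: each cask belongs to a single batch;
# (1 | batch/cask) expands to (1 | batch) + (1 | batch:cask)
fit_nested <- lmer(strength ~ 1 + (1 | batch/cask), data = Pastes)

ngrps(fit_crossed)  # number of levels per grouping factor
ngrps(fit_nested)
```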

Understanding the underlying design for an experiment is of prime importance. The design type...

Robust mixed effects models with robustlmm

The lme4 package is the de facto standard package for linear mixed models. Its syntax has become a standard in the industry, and most researchers working with applied linear models use it. As we have seen with many techniques so far, the problem with it is that it can be greatly affected by outliers. Even minor contamination can cause major estimation problems.

Getting ready

The lme4 and robustlmm packages are needed for this recipe. They can be installed using install.packages().
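Once both packages are installed, a quick way to check the setup is to fit the same model with lmer() and with rlmer() from robustlmm. The sketch below is only an assumption-laden illustration: it uses the sleepstudy data from lme4 rather than the recipe's data, and the contamination (one inflated observation) is hypothetical.

```r
# install.packages(c("lme4", "robustlmm"))  # run once if needed
library(lme4)
library(robustlmm)

d <- sleepstudy
d$Reaction[1] <- d$Reaction[1] * 5   # hypothetical outlier in a single row

# Same formula, classical versus robust estimation
fit_classic <- lmer(Reaction ~ Days + (1 | Subject), data = d)
fit_robust  <- rlmer(Reaction ~ Days + (1 | Subject), data = d)

# Side-by-side fixed effects
cbind(classic = fixef(fit_classic), robust = fixef(fit_robust))
```

Comparing the two columns with and without the contaminated row shows how much each estimator moves in response to a single outlier.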

How to do it...

In this recipe, we will use the robustlmm...

Choosing the best linear mixed model

When using OLS models, choosing the best one is not a complex task: we have a set of variables that we use, and we just pick whichever model has the lowest Akaike information criterion (AIC) (or any other appropriate metric that we choose).

Mixed models entail an extra level of complexity, as we can define the random effects in many ways. Returning to our previous example of deal_size versus time_spent and salespeople, we could choose a model with random effects only for the deal_size or for both the deal_size and salespeople. We can also decide whether or not to add a random intercept, and we can force the model to assume that the shocks impacting each one of these are either uncorrelated or correlated.
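The different random-effects structures described above map onto lme4 formula syntax roughly as follows. This sketch uses the sleepstudy data from lme4 as a stand-in for the deal_size example (an assumption, since that dataset is not shown here), and fits by maximum likelihood so that the information criteria are computed on the same footing.

```r
library(lme4)

# Random intercept only
m_int   <- lmer(Reaction ~ Days + (1 | Subject),
                data = sleepstudy, REML = FALSE)
# Random slope only (no random intercept)
m_slope <- lmer(Reaction ~ Days + (0 + Days | Subject),
                data = sleepstudy, REML = FALSE)
# Random intercept and slope, forced to be uncorrelated
m_unc   <- lmer(Reaction ~ Days + (Days || Subject),
                data = sleepstudy, REML = FALSE)
# Random intercept and slope, correlation estimated
m_cor   <- lmer(Reaction ~ Days + (Days | Subject),
                data = sleepstudy, REML = FALSE)

aic_tab <- AIC(m_int, m_slope, m_unc, m_cor)
aic_tab

# Likelihood ratio test between the two nested random structures
anova(m_unc, m_cor)
```

The df column of the AIC table counts the fixed coefficients plus the variance and covariance parameters, which makes explicit how each structure adds complexity.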

Choosing models by comparing the AIC is quite hard for mixed models, since we have a random and a fixed part. There are two types of analysis that we might...

Mixed generalized linear models

Generalized linear models are a set of techniques that generalize the linear regression model (which assumes that the dependent variable is Gaussian) to a wide variety of distributions for the response variable. The response no longer needs to be Gaussian, but can follow any distribution that is part of the so-called exponential family. Many distributions fall into this category, such as the binomial, gamma, Poisson, or negative binomial distributions. This allows us to work with a wide array of situations, such as count data or binary responses.
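As a small illustration of a non-Gaussian response combined with a random effect, the following sketch fits a binomial mixed model with glmer() from lme4. It uses the cbpp dataset bundled with lme4 (an assumption made for illustration, not the recipe's data), where the response is the number of disease cases out of the herd size and each herd gets its own random intercept.

```r
library(lme4)

# Binomial response: cases (incidence) out of animals at risk (size),
# with a fixed effect for period and a random intercept per herd
fit_glmm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                  data = cbpp, family = binomial)

fixef(fit_glmm)     # coefficients on the logit scale
VarCorr(fit_glmm)   # between-herd variability

# Baseline incidence probability for a typical herd in period 1
plogis(fixef(fit_glmm)[["(Intercept)"]])
```

Switching the family argument (for example to poisson for counts) changes the assumed distribution while the random-effects syntax stays the same.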

Generalized linear models (referred to as GLMs in the literature) are defined by three things: first, a linear predictor that relates the covariates with the response variable; second, a probability distribution for the dependent variable from the exponential...
