Chapter 3. More Than Just One Predictor – MLR

In Chapter 2, Basic Concepts – Simple Linear Regression, we covered simple linear regression, which models the relationship between a single independent variable (explanatory variable) and the dependent variable (response variable). It's not often that we find a variable that depends solely on one other. Usually, we find that the response variable depends on at least two predictors.

Let's take a look at an example. Getting to the workplace is often a journey full of variables. Scheduling your departure to arrive on time can be a difficult task, which is why you need to take different variables into account: the distance from your home, the type of route to follow (street type), traffic along the route, the number of stops (if you need to drop your children off at school), weather conditions, and so on. Have you ever thought about it? Every morning, we have to plan all these things in order to get to work on time.

In this chapter...

Multiple linear regression concepts


So far, we have solved simple linear regression problems; they study the relation between a dependent variable, y, and an independent variable, x, based on the regression equation:

$y = \beta_0 + \beta_1 x + \epsilon$

In this equation, the explanatory variable is represented by x and the response variable is represented by y. To solve this problem, the least squares method was used: we find the best fit by minimizing the sum of the squared vertical distances from each data point to the line. As mentioned before, a variable rarely depends on a single predictor. In practice, we will have to create models in which the response variable depends on more than one predictor. These models are called multiple linear regression, a straightforward generalization of single-predictor models. According to multiple linear regression models, the dependent variable is related to two or...
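In its general form, stated here for reference, the multiple linear regression model with k predictors can be written as:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon$

Here, each coefficient $\beta_j$ measures the expected change in the response for a unit change in $x_j$, holding the other predictors fixed.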

Building a multiple linear regression model


In Chapter 2, Basic Concepts – Simple Linear Regression, we learned to use the lm() function to create a simple linear regression model. We can also use it to solve this kind of problem.

To practice this method, we can draw on the many datasets available on the internet. In this case, we will load a .csv file named EscapingHydrocarbons.csv into the R environment; it contains measurements of the quantity of hydrocarbons escaping as a function of several variables.

Note

Source: Linear Regression Datasets offered by the Department of Scientific Computing, Florida State University (http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html).

When petrol is pumped into tanks, hydrocarbons escape. To evaluate the effectiveness of pollution controls, experiments were performed. The quantity of hydrocarbons escaping was measured as a function of the tank temperature, the temperature of the petrol pumped in, the initial pressure in the tank, and the pressure of...
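As a sketch of the workflow this section follows, the model can be fitted with lm() once the file is loaded. Note that the column names used below, including the response HydrocarbonsEscaping, are assumptions and should be checked against the actual file header:

```r
# A minimal sketch of the workflow; the column names used below are
# assumptions -- check names(hydrocarbons) against the actual .csv header.
hydrocarbons <- read.csv("EscapingHydrocarbons.csv", header = TRUE)
str(hydrocarbons)    # inspect the variables and their types

# Regress the escaped quantity on all other columns ("." = all remaining
# variables), assuming the response column is named HydrocarbonsEscaping
model <- lm(HydrocarbonsEscaping ~ ., data = hydrocarbons)
summary(model)       # coefficients, R-squared, residual standard error
```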

Multiple linear regression with categorical predictor


After dealing with several examples of linear regression, we can certainly claim to have understood the mechanisms underlying this statistical technique. So far, we've used only continuous variables as predictors. What happens when the predictors are categorical variables? Don't worry, because the underlying principles of regression techniques remain the same.

Categorical variables

Categorical variables are variables that are not numerical. They do not derive from measurement operations (and do not have units of measurement), but from classification and comparison operations; that is, they describe data that fits into specific categories. Categorical variables can be further grouped as nominal, dichotomous, or ordinal:

  • Nominal variables are variables that have two or more categories but do not have an intrinsic order. For example, the blood group variable, limited to the ABO system, can assume the values A, B, AB, and O. If we...
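To illustrate how lm() handles a categorical predictor, here is a minimal sketch on synthetic data, loosely based on the blood group example above. The numeric response and its relationship to blood group are invented for illustration:

```r
# A minimal sketch with a nominal predictor; the data are synthetic and
# the relationship between response and blood group is made up.
set.seed(10)
blood_group <- factor(sample(c("A", "B", "AB", "O"), 40, replace = TRUE))
response <- 5 + 2 * (blood_group == "O") + rnorm(40)

model <- lm(response ~ blood_group)
summary(model)            # one dummy coefficient per non-reference level
contrasts(blood_group)    # the dummy (treatment) coding used by lm()
```

Internally, lm() encodes the factor as dummy variables, taking the first level as the reference category.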

Gradient Descent and linear regression


Gradient Descent (GD) is an iterative approach for minimizing a given function or, in other words, a way to find a local minimum of a function. The algorithm starts with an initial estimate of the solution, which we can obtain in several ways: one approach is to randomly sample values for the parameters. We then evaluate the slope of the function at that point, move the solution in the negative direction of the gradient, and repeat the process. The algorithm eventually converges where the gradient is zero, which corresponds to a local minimum.

At each iteration, the steepest descent step size is updated based on the step size used in the previous iteration. The gradient is basically defined as the slope of the curve, as shown in the following figure:

[Figure: the gradient as the slope of the curve]
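To make the procedure concrete, here is a minimal gradient descent sketch for a simple linear regression on synthetic data. This is not the book's example; the learning rate and iteration count are arbitrary choices:

```r
# A minimal sketch of gradient descent for a simple linear regression,
# using synthetic data; alpha and the iteration count are arbitrary.
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100)      # true intercept 2, true slope 3

b0 <- 0                          # initial estimate of the intercept
b1 <- 0                          # initial estimate of the slope
alpha <- 0.01                    # learning rate (step size)
n <- length(y)

for (i in 1:5000) {
  resid <- y - (b0 + b1 * x)            # current residuals
  grad_b0 <- -2 * sum(resid) / n        # partial derivative of MSE wrt b0
  grad_b1 <- -2 * sum(resid * x) / n    # partial derivative of MSE wrt b1
  b0 <- b0 - alpha * grad_b0            # step in the negative gradient direction
  b1 <- b1 - alpha * grad_b1
}

c(b0, b1)          # should be close to the OLS estimates
coef(lm(y ~ x))    # compare with the closed-form least squares fit
```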

In Chapter 2, Basic Concepts – Simple Linear Regression, we saw that the goal of OLS regression is to find the line that best fits the predictor in terms of minimizing the overall squared distance between itself and the response. In...

Polynomial regression


Polynomial models can be used in situations where the relationship between response and explanatory variables is curvilinear. Sometimes, a nonlinear relationship in a small range of explanatory variables can also be modeled by polynomials.

A polynomial quadratic (squared) or cubic (cubed) term turns a linear regression model into a polynomial curve. However, since it is the explanatory variable that is squared or cubed, and not the beta coefficient, the model still qualifies as linear. This makes polynomials a nice, straightforward way to model curves, without having to fit complicated nonlinear models.

In polynomial regression, some predictors appear in degrees equal to or greater than two. The model continues to be linear in its parameters. For example, a second-degree parabolic regression model looks like this:

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$

This model can easily be estimated by introducing a second-degree term in the regression model. The difference is that in polynomial regression, the equation produces...
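As an illustration (not the book's exact example), one way to fit such a second-degree model in R is to wrap the squared term in I() inside lm(), here on synthetic curvilinear data:

```r
# A minimal sketch of a second-degree polynomial fit on synthetic data.
set.seed(5)
x <- seq(0, 10, length.out = 50)
y <- 1 + 0.5 * x - 0.2 * x^2 + rnorm(50, sd = 0.5)

poly_model <- lm(y ~ x + I(x^2))   # I() keeps x^2 as a literal squared term
summary(poly_model)                # still a linear model in its parameters

plot(x, y)
lines(x, fitted(poly_model), col = "red")   # fitted parabolic curve
```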

Summary


In this chapter, we learned the basic concepts of multiple linear regression, where linear regression is extended to extract predictive information from more than one feature. We saw how to tune a multiple linear regression model for higher performance and examined each of its parameters in depth. We explored the information contained in the linear regression models that we can build with the lm() function. Furthermore, we learned to carry out a proper residual analysis to understand, in depth, whether the model we built was effective in predicting our system. We also dealt with the case of a linear regression model with categorical variables.

We then explored the Stochastic Gradient Descent (SGD) technique, an optimization algorithm used in regression to find a good set of model parameters given a training dataset. After analyzing the GD algorithm in detail, we solved a multiple linear regression problem using the sgd package.

Finally, polynomial regression was introduced where linear regression...
