Chapter 8. Beyond Linearity – When Curving Is Much Better

Some problems cannot be solved with linear models. Often, we must go beyond the simple linearity of models by introducing features that take into account the complexity of the phenomenon. Nonlinear models are more complex (and more prone to overfitting), but sometimes they are the only solution.

In this chapter, we will introduce the most widely used nonlinear models and see how to train and apply them. First, the nonlinear least squares method will be treated, where the parameters of the regression function to be estimated are nonlinear. In this technique, given the nonlinearity of the coefficients, the problem is solved by means of iterative numerical methods. Then, Multivariate Adaptive Regression Splines (MARS) will be covered. This is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the response and predictor variables. This relationship...

Nonlinear least squares


In Chapter 3, More Than Just One Predictor – MLR, we already handled a case in which linear regression was unable to model the relationship between the response and predictors. In that case, we solved the problem by applying polynomial regression. When the relationships between variables are not linear, three solutions are possible:

  • Linearize the relationship by transforming the data
  • Fit polynomial or complex spline models
  • Fit a nonlinear model

The first two solutions you have already faced in some manner in the previous chapters. Now we will focus on the third solution. If the regression function to be estimated is nonlinear in its parameters, that is, the parameters appear at a degree other than the first, Ordinary Least Squares (OLS) can no longer be applied and other methods are needed.

In a multiple nonlinear regression model, the dependent variable is related to two or more independent variables as follows:

y = f(x1, x2, …, xr; β1, β2, …, βp) + ε

Here, the model is not linear with respect...
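As a minimal sketch of how such a model can be fitted in R (on simulated data, not a dataset used in this book), the built-in nls() function estimates the parameters iteratively; because the solution is found numerically, starting values for the parameters must be supplied:

# Nonlinear least squares with nls(): fitting an exponential decay model
# on simulated data (hypothetical example)
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 5 * exp(-0.4 * x) + rnorm(100, sd = 0.2)

# Starting values are required because the fit is computed iteratively
# (by default, nls() uses the Gauss-Newton algorithm)
fit <- nls(y ~ a * exp(-b * x), start = list(a = 4, b = 0.5))
summary(fit)                               # estimated coefficients and standard errors
predict(fit, newdata = data.frame(x = 2))  # prediction at a new point

Note that, unlike OLS, the iterations may fail to converge if the starting values are far from the solution, so choosing reasonable starting values is part of the modeling work.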

Multivariate Adaptive Regression Splines


MARS is a form of regression analysis introduced by Jerome H. Friedman (1991); its main purpose is to predict the values of a response variable from a set of predictor variables.

MARS is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the response and predictor variables.

This relationship is constructed from a set of coefficients and basis functions that are derived from the regression data. The method divides the input space into regions, each with its own regression equation. This makes MARS particularly suitable for problems with a large number of predictors. The following figure shows a distribution with two regression regions:

The MARS algorithm operates as a multiple piecewise linear regression, where each breakpoint (estimated from the data) defines the region of application for a very simple linear regression equation.

The general MARS model equation is as follows...
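As a minimal sketch, assuming the third-party earth package (an implementation of Friedman's MARS algorithm) is installed, a MARS model can be fitted as follows, here on R's built-in trees dataset:

# MARS with the 'earth' package (install.packages("earth") if needed)
library(earth)

data(trees)                                 # built-in dataset: girth, height, volume of trees
mars_fit <- earth(Volume ~ Girth + Height, data = trees)
summary(mars_fit)                           # selected hinge basis functions and coefficients
predict(mars_fit, newdata = trees[1:3, ])   # predictions for the first three observations

The summary lists the hinge functions selected by the algorithm, with the breakpoints (knots) estimated from the data.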

Generalized Additive Model


A GAM is a GLM in which the linear predictor is given by a user-specified sum of smooth functions of the covariates plus a conventional parametric component. Assume that a sample of n objects has a response variable y and r explanatory variables x1, …, xr. Under these assumptions, the regression equation becomes:

y = β0 + f1(x1) + f2(x2) + … + fr(xr) + ε

Here, the functions f1, f2, …, fr are different nonlinear functions of the variables x. In a GAM, the linear relationships between the response and the predictors are replaced by several nonlinear smooth functions to model and capture the nonlinearities in the data.

We can see the GAM as a generalization of a multiple regression model without interactions between predictors. Among the advantages of this approach, in addition to greater flexibility than the linear model, its good algorithmic convergence rate on problems with many explanatory variables should also be mentioned. The biggest drawback lies in the complexity of the parameter...
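As a minimal sketch using the mgcv package (which is distributed with R and specifies smooth terms with s()), here on the built-in airquality dataset:

# GAM with the 'mgcv' package (distributed with R)
library(mgcv)

data(airquality)                       # built-in dataset
aq <- na.omit(airquality)              # drop rows with missing values

# Ozone modeled as a sum of smooth functions of temperature and wind
gam_fit <- gam(Ozone ~ s(Temp) + s(Wind), data = aq)
summary(gam_fit)                       # effective degrees of freedom of each smooth
plot(gam_fit, pages = 1)               # plot the estimated smooth functions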

Regression trees


Decision trees are used to predict a response or class y from several input variables x1, x2, …, xn. If y is a continuous response, the tree is called a regression tree; if y is categorical, it is called a classification tree. That's why these methods are often referred to as Classification and Regression Trees (CART). The algorithm is based on the following procedure: at each node of the tree, we check the value of one of the inputs xi and, depending on the (binary) answer, we continue to the left or the right branch. When we reach a leaf, we find the prediction.

The algorithm starts with the data grouped into a single node (the root node) and, at every step, performs an exhaustive search over all possible subdivisions. At each step, the best subdivision is chosen, that is, the one that produces branches that are as homogeneous as possible.

In regression trees, we try to partition the data space into small enough parts so that we can apply a different simple model to each part. The non-leaf part of...
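As a minimal sketch using the rpart package (which is distributed with R and implements CART-style trees), here on the built-in mtcars dataset:

# Regression tree with the 'rpart' package (distributed with R)
library(rpart)

data(mtcars)                                  # built-in dataset
# method = "anova" requests a regression tree (continuous response)
tree_fit <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

printcp(tree_fit)                             # complexity table, useful for pruning
plot(tree_fit); text(tree_fit)                # draw the tree and label the splits
predict(tree_fit, newdata = mtcars[1:3, ])    # predictions at the leaves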

Support Vector Regression


SVR is based on the same principles as the Support Vector Machine (SVM). In fact, SVR is the adapted form of SVM when the dependent variable is numeric rather than categorical. One of the main advantages of using SVR is that it is a nonparametric technique.

To build the model, the SVR technique uses kernel functions. The commonly used kernel functions are:

  • Linear
  • Polynomial
  • Sigmoid
  • Radial basis

This technique allows us to fit a nonlinear model without transforming the explanatory variables, which helps in interpreting the resulting model.

In SVR, we do not have to worry about prediction errors as long as they remain below a certain value (ε); errors smaller than ε are simply ignored. This method is called the maximal margin principle, and it allows SVR to be formulated as a convex optimization problem.

The regression can also be penalized using a cost parameter, which is useful for avoiding overfitting. SVR is a useful technique that provides the user with a great flexibility in distributing...
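As a minimal sketch, assuming the third-party e1071 package (an interface to libsvm) is installed, an SVR model with a radial basis kernel can be fitted as follows, here on the built-in mtcars dataset:

# SVR with the 'e1071' package (install.packages("e1071") if needed)
library(e1071)

data(mtcars)
svr_fit <- svm(mpg ~ wt + hp, data = mtcars,
               type = "eps-regression",   # epsilon-SVR rather than classification
               kernel = "radial",         # radial basis kernel
               cost = 1,                  # penalty parameter, guards against overfitting
               epsilon = 0.1)             # width of the error-insensitive tube
summary(svr_fit)
predict(svr_fit, newdata = mtcars[1:3, ])

Increasing cost penalizes training errors more heavily, so it trades flexibility against the risk of overfitting.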

Summary


In this chapter, several advanced techniques for solving regression problems that cannot be handled with linear models were treated. First, the nonlinear least squares method was explored, where the parameters of the regression function to be estimated were nonlinear. In this technique, given the nonlinearity of the coefficients, the problem is solved by means of iterative numerical methods. Then, MARS was presented. This is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the response and predictor variables. This relationship is constructed from a set of coefficients and basis functions that are derived from the regression data.

Later, we focused our attention on GAMs. A GAM is a GLM in which the linear predictor is given by a user-specified sum of smooth functions of the covariates plus a conventional parametric component. Then, we introduced regression trees...
