You're reading from Regression Analysis with R

Product type: Book
Published in: Jan 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788627306
Edition: 1st Edition

Author: Giuseppe Ciaburro

Giuseppe Ciaburro holds a PhD and two master's degrees. He works at the Built Environment Control Laboratory, Università degli Studi della Campania "Luigi Vanvitelli". He has over 25 years of programming experience, first in the field of combustion and then in acoustics and noise control. His core programming knowledge is in MATLAB, Python, and R. As an expert in AI applications to acoustics and noise-control problems, Giuseppe has wide experience in research and teaching. He has several publications to his credit: monographs, scientific journals, and thematic conferences. He was recently included in the world's top 2% scientists list by Stanford University (2022).

Chapter 4. When the Response Falls into Two Categories – Logistic Regression

In previous chapters, we studied linear regression models in detail. In all of those models, the response variable takes quantitative values. In everyday life, however, response variables are often qualitative. For example, we may want to determine whether a device is on or off from the noise detected in the environment; decide whether to grant credit on the basis of financial and other personal information; or make a preliminary diagnosis of a patient's disease in order to choose an immediate treatment while awaiting the final test results.

In each of these cases, we want to explain the probability of having an attribute, or of an event occurring, as a function of one or more explanatory variables. In other words, we are trying to classify a phenomenon. The term classification can cover any context in which certain decisions or forecasts are made on the basis...

Understanding logistic regression


In linear regression, the dependent variable y (the response variable) is continuous, and its estimated value can be thought of as a conditional mean for each value of x. In this case, the variable y is assumed to follow a normal distribution. When the dependent variable is dichotomous and can be coded with two values, zero and one (for example, on = 1, off = 0), the appropriate theoretical distribution of reference is not the normal but the binomial distribution.

In fact, as we saw in Chapter 2, Basic Concepts – Simple Linear Regression, the linear model is based on the following regression equation:

y = β0 + β1x + ε

Here, the predicted values of the dependent variable can range from -∞ to +∞. This does not agree with the expected values for a dichotomous variable, which, as we have said, takes only the two values 0 and 1.
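To see why an unbounded linear predictor is awkward here, consider the logistic (sigmoid) function, which maps any real value into the interval (0, 1). This is a minimal sketch in R; the function definition is the standard one, not code taken from the book:

```r
# The logistic (sigmoid) function squeezes any real number into (0, 1),
# which makes it a natural choice for modeling a probability.
logistic <- function(z) 1 / (1 + exp(-z))

logistic(-10)  # close to 0
logistic(0)    # exactly 0.5
logistic(10)   # close to 1
```

However large or small the linear predictor z becomes, the output stays strictly between 0 and 1, which is exactly the range a probability must respect.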

Let's try to understand this concept better by analyzing a simple example. Let's suppose we've put the data of a certain observation on a...

Generalized Linear Model


In the previous chapters, we worked with regression models in which the response variable is quantitative and normally distributed. Now, we turn our attention to models in which the response variable is discrete and the error terms do not follow a normal distribution. Such models are called generalized linear models (GLMs).

GLMs extend traditional regression models by allowing the mean of the response to depend on the explanatory variables through a link function, and by allowing the response variable to be any member of the exponential family of distributions (such as the binomial, Gaussian, and Poisson distributions, among others).

In R, we can fit GLMs with the glm() function. The model is specified by giving a symbolic description of the linear predictor and a description of the error distribution. Its usage is similar to that of the lm() function, which we previously used for multiple linear regression. The main difference is that we need to include an additional family argument to describe the error distribution and...
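As a quick illustration of this usage, here is a sketch with made-up data (the hours/passed variables are hypothetical, not the book's example); family = binomial is what selects logistic regression:

```r
# Hypothetical data: hours of study and a pass/fail outcome (0 or 1)
hours  <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)
passed <- c(0,   0,   0,   0,   1,   0,   1,   1,   1,   1)

# family = binomial selects logistic regression (logit link by default)
model <- glm(passed ~ hours, family = binomial)
summary(model)

# Predicted probability of passing for a new value of the predictor
predict(model, newdata = data.frame(hours = 2.75), type = "response")
```

Note that predict() with type = "response" returns the fitted probability itself, while the default type = "link" would return the value on the logit scale.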

Multiple logistic regression


In the previous section, we introduced the simple logistic regression model, where the dichotomous response depends on only one explanatory variable. As in the case of linear regression, which we analyzed in Chapter 2, Basic Concepts – Simple Linear Regression, and Chapter 3, More Than Just One Predictor – MLR, the popularity of a modeling technique lies in its ability to model many variables, which can be on different measurement scales. Now, we will generalize the logistic model to the case of more than one independent variable.

Central topics in dealing with multiple logistic models will be the estimation of the coefficients in the model and the tests of their significance. This will follow the same lines as the univariate model seen in the previous section. In multiple regression, the coefficients are called partial because they express the specific relationship that an independent variable has with the dependent variable, net of the other independent...
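These ideas can be sketched with simulated data (assumed for illustration, not the book's own example): a dichotomous response generated from two explanatory variables, fitted with glm(), with the coefficients, their odds ratios, and their Wald significance tests extracted:

```r
# Simulated data: a dichotomous response depending on two predictors
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- 1 / (1 + exp(-(0.5 + 1.2 * x1 - 0.8 * x2)))  # true model on the logit scale
y  <- rbinom(n, 1, p)

fit <- glm(y ~ x1 + x2, family = binomial)
coef(fit)        # partial coefficients on the logit (log-odds) scale
exp(coef(fit))   # odds ratios for a one-unit increase in each predictor
summary(fit)$coefficients[, "Pr(>|z|)"]  # Wald tests for significance
```

Exponentiating a partial coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor, holding the other predictors fixed.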

Multinomial logistic regression


A generalization of logistic regression techniques makes it possible to deal with the case where the dependent variable is categorical with more than two levels. This is the case of multinomial (also called polytomous) logistic regression.

A first distinction to make is between nominal and ordinal logistic regression. We speak of nominal logistic regression when there is no natural order among the categories of the dependent variable, as in a choice between four pizza types or between several singers. When, on the other hand, the levels of the dependent variable can be arranged on an ordered scale, we speak of ordinal logistic regression.

To perform multinomial logistic regression analysis, we can use the mlogit package. mlogit is an R package that enables the estimation of multinomial logit models with individual- and/or alternative-specific variables. The main extensions of the basic multinomial model (heteroscedastic, nested, and random-parameter models...
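Since mlogit requires the data to be reshaped into a long "choice" format first, a quicker way to see the basic technique is multinom() from the nnet package, which ships with standard R installations; this is a sketch using the built-in iris data, not the book's own example:

```r
library(nnet)  # a recommended package bundled with R

# Species has three unordered levels, so a nominal multinomial logit applies
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

head(predict(fit, type = "probs"))       # one probability per category; rows sum to 1
predict(fit, newdata = iris[c(1, 51, 101), ])  # predicted category labels
```

The model estimates one set of coefficients per non-reference category, and predict() can return either the fitted class probabilities or the most probable category.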

Summary


In this chapter, we introduced classification through logistic regression techniques. We first described the logistic model and then provided some intuition behind its mathematical formulation. We learned how to define a classification problem and how to apply logistic regression to solve this type of problem. We introduced the simple logistic regression model, where the dichotomous response depends on only one explanatory variable.

Then, we generalized the logistic model to the case of more than one independent variable with multiple logistic regression. Central topics in dealing with multiple logistic models have been the estimation of the coefficients in the model and the tests of their significance. This followed the same lines as the univariate model. In multiple regression, the coefficients are called partial because they express the specific relationship that an independent variable has with the dependent variable, net of the other independent variables considered...
