Linear Regression in R

In this chapter, we will introduce linear regression, a fundamental statistical approach used to model the relationship between a target variable and one or more explanatory (also called independent) variables. We will cover the basics of linear regression, starting with simple linear regression and then extending the concepts to multiple linear regression. We will learn how to estimate the model coefficients, evaluate the goodness of fit, and test the significance of the coefficients using hypothesis testing. Additionally, we will discuss the assumptions underlying linear regression and explore techniques to address potential issues, such as nonlinearity, interaction effects, multicollinearity, and heteroskedasticity. We will also introduce two widely used regularization techniques: the ridge and Least Absolute Shrinkage and Selection Operator (lasso) penalties.

By the end of this chapter, you will learn the core principles of linear regression...

Introducing linear regression

At the core of linear regression is the concept of fitting a straight line – or more generally, a hyperplane – to the data points. Such fitting aims to minimize the deviation between the observed and predicted values. In simple linear regression, one target variable is regressed on a single predictor, and the goal is to fit a straight line that best captures the relationship between the two variables. In multiple linear regression, there is more than one predictor, and the goal is to fit a hyperplane that best describes the relationship among the variables. Both tasks can be achieved by minimizing a measure of deviation between the predictions and the corresponding targets.

In linear regression, obtaining an optimal model means identifying the best coefficients that define the relationship between the target variable and the input predictors. These coefficients represent the change in the target associated with a single unit change...
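As a quick illustrative sketch (not part of the original text), base R's lm() function fits both model types; the built-in mtcars dataset and the predictors chosen below are assumptions made purely for demonstration:

# Simple linear regression: model fuel efficiency (mpg) on a single predictor (wt)
slr_fit <- lm(mpg ~ wt, data = mtcars)
summary(slr_fit)   # coefficient estimates, standard errors, R-squared, and t-tests

# Multiple linear regression: add horsepower (hp) as a second predictor
mlr_fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(mlr_fit)      # intercept plus one coefficient per predictor

Each coefficient reported by coef(mlr_fit) estimates the change in the target associated with a one-unit change in the corresponding predictor, holding the other predictors fixed.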

Introducing penalized linear regression

Penalized regression models, such as ridge and lasso, are techniques that are used to handle problems such as multicollinearity, reduce overfitting, and even perform variable selection, especially when dealing with high-dimensional data with multiple input features.

Ridge regression (also called L2 regularization) is a method that adds a penalty equivalent to the square of the magnitude of coefficients. We would add this term to the loss function after weighting it by an additional hyperparameter, often denoted as λ, to control the strength of the penalty term.

Lasso regression (L1 regularization), on the other hand, is a method that, similar to ridge regression, adds a penalty for non-zero coefficients, but unlike ridge regression, it can force some coefficients to be exactly equal to zero when the penalty tuning parameter is large enough. The larger the value of the hyperparameter, λ, the greater the amount of shrinkage. The...
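As a hedged sketch of how both penalties are commonly fitted in R (assuming the glmnet package is installed; the data and the fixed λ value below are illustrative rather than taken from the chapter):

library(glmnet)

# Illustrative predictor matrix x and numeric response y built from the built-in mtcars data
x <- as.matrix(mtcars[, c("wt", "hp", "disp", "drat")])
y <- mtcars$mpg

# In glmnet, alpha = 0 applies the ridge (L2) penalty and alpha = 1 the lasso (L1) penalty;
# lambda controls the strength of the penalty term
ridge_fit <- glmnet(x, y, alpha = 0, lambda = 1)
lasso_fit <- glmnet(x, y, alpha = 1, lambda = 1)

coef(ridge_fit)   # coefficients shrunk toward zero, but typically all non-zero
coef(lasso_fit)   # some coefficients may be exactly zero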

Working with ridge regression

Ridge regression, also referred to as L2 regularization, is a commonly used technique for alleviating overfitting in linear regression models by penalizing the magnitude of the estimated coefficients.

Recall that in a linear regression model fitted by ordinary least squares (OLS), we seek to minimize the sum of the squared differences between the predicted and actual values, which we refer to as the least squares method. The loss function we wish to minimize is the residual sum of squares (RSS):

\text{RSS} = \sum_{i=1}^{n} \left( y_i - \left( \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} \right) \right)^2

Here, y_i is the actual target value, β_0 is the intercept term, the β_j values are the coefficient estimates for each predictor x_ij, and the summations run over all observations and predictors, respectively.

Purely minimizing the RSS can give us an overfitted model, often reflected in the large magnitude of the resulting coefficients. As a remedy, we could apply...
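For reference, the standard ridge objective adds the squared (L2) coefficient penalty, weighted by λ, to the RSS, mirroring the lasso cost function shown in the next section:

L_{\text{ridge}} = \text{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2

Larger values of λ shrink the coefficients more aggressively toward zero, while λ = 0 recovers the ordinary least squares solution.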

Working with lasso regression

Lasso regression is another type of regularized linear regression. It is similar to ridge regression but differs in how the magnitude of the coefficients enters the penalty. Specifically, it uses the L1 norm of the coefficients, that is, the sum of their absolute values, as the penalty that is added to the OLS loss function.

The lasso regression cost function can be written as follows:

L_{\text{lasso}} = \text{RSS} + \lambda \sum_{j=1}^{p} \left| \beta_j \right|

The key characteristic of lasso regression is that it can reduce some coefficients exactly to 0, effectively performing variable selection. This is a consequence of the L1 penalty term and is not the case for ridge regression, which can only shrink coefficients close to 0. Therefore, lasso regression is particularly useful when we believe that only a subset of the predictors matters when it comes to predicting the outcome.
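As a brief sketch of this variable-selection behavior (assuming the glmnet package; the dataset and object names below are illustrative), cross-validation is typically used to choose λ before inspecting which coefficients have been driven to zero:

library(glmnet)

# Illustrative data: all mtcars columns as predictors, fuel efficiency (mpg) as the response
x <- model.matrix(mpg ~ . - 1, data = mtcars)   # predictor matrix without an intercept column
y <- mtcars$mpg

set.seed(123)                                   # cross-validation folds are drawn at random
cv_fit <- cv.glmnet(x, y, alpha = 1)            # alpha = 1 selects the lasso (L1) penalty
best_lambda <- cv_fit$lambda.min                # lambda with the smallest cross-validated error

lasso_fit <- glmnet(x, y, alpha = 1, lambda = best_lambda)
coef(lasso_fit)                                 # predictors with a zero coefficient are effectively dropped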

In addition...

Summary

In this chapter, we covered the nuts and bolts of the linear regression model. We started by introducing the SLR model, which consists of only one input variable and one target variable, and then extended it to the MLR model with two or more predictors. Both models can be assessed using R^2 or, preferably, the adjusted R^2 metric. Next, we discussed specific scenarios, such as working with categorical variables and interaction terms, handling nonlinear terms via transformations, working with the closed-form solution, and dealing with multicollinearity and heteroskedasticity. Lastly, we introduced widely used regularization techniques, namely the ridge and lasso penalties, which are incorporated into the loss function as a penalty term to produce a regularized model and, in the case of lasso regression, a sparse solution.

In the next chapter, we will cover another type of widely used linear model: the logistic regression model.
