You're reading from Extending Excel with Python and R

Product type: Book

Published in: Apr 2024

Publisher: Packt

ISBN-13: 9781804610695

Edition: 1st Edition

Concepts

Data Analysis

Authors (2):

Steven Sanderson

David Kun

Statistical Analysis: Linear and Logistic Regression

Welcome to our comprehensive guide on linear and logistic regression in R and Python, where we will explore these essential statistical techniques using base R, the tidymodels framework, and Python. Whether you’re a data science enthusiast or a professional looking to sharpen your skills, this tutorial will help you gain a deep understanding of linear and logistic regression and how to implement them in R and Python. It is possible to perform linear and logistic regression in Excel. The issue is that linear regression there can only be performed on a single series of ungrouped, non-nested data, and performing logistic regression is cumbersome and may require the use of external solver add-ins. In R and Python, we do not have such limitations.

In this chapter, we will cover the following topics in both base R and Python and using the tidymodels framework...

Technical requirements

All code for this chapter can be found on GitHub at this URL: https://github.com/PacktPublishing/Extending-Excel-with-Python-and-R/tree/main/Chapter9. You will need the following R packages installed to follow along:

readxl 1.4.3
performance 0.10.8
tidymodels 1.1.1
purrr 1.0.2

We will begin by learning what linear and logistic regression are and then move on to implementing them.

Linear regression

Linear regression is a fundamental statistical method used for modeling the relationship between a dependent variable (usually denoted as “Y”) and one or more independent variables (often denoted as “X”). It aims to find the best-fitting linear equation that describes how changes in the independent variables affect the dependent variable. The most common way to fit this equation is the ordinary least squares (OLS) method, and many of you may know linear regression by that name.

In simpler terms, linear regression helps us predict a continuous numeric outcome based on one or more input features. For the results to be trustworthy, several assumptions must hold, such as a linear relationship between the inputs and the outcome, independent errors, constant error variance, and approximately normally distributed residuals. If you would like to understand these in more depth, a simple search will bring up a lot of good information on them. In this tutorial, we will delve into both simple linear regression (one independent variable) and multiple linear regression (multiple independent variables).
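
To make the difference between simple and multiple linear regression concrete, the following is a minimal sketch using only NumPy. The data is synthetic and the helper name `fit_ols` is hypothetical (neither comes from the chapter); the sketch solves the least squares problem directly rather than using any of the modeling libraries covered later.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 1 + 2*x1 + 3*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 0.1, n)


def fit_ols(features, target):
    """Return OLS coefficients [intercept, slope_1, ...] (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features.reshape(-1, 1)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Simple linear regression: one independent variable
simple_coef = fit_ols(x1, y)

# Multiple linear regression: two independent variables
multiple_coef = fit_ols(np.column_stack([x1, x2]), y)

print("simple:  ", simple_coef)
print("multiple:", multiple_coef)
```

With both features included, the estimated coefficients should land close to the values used to generate the data, while the simple model has to absorb the effect of the omitted feature into its intercept and noise.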

Logistic regression

Logistic regression is another crucial statistical technique, which is primarily used for binary classification problems. Instead of predicting continuous outcomes, logistic regression predicts the probability of an event occurring, typically expressed as a “yes” or “no” outcome. This method is particularly useful for scenarios where we need to model the likelihood of an event, such as whether a customer will churn or not or whether an email is spam or not. Logistic regression models the relationship between the independent variables and the log odds of the binary outcome.
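
The link between the linear predictor, the log odds, and the predicted probability can be checked numerically. The following is a small sketch with made-up coefficients (`intercept` and `slope` are assumptions for illustration, not fitted values): the sigmoid function turns the linear predictor into a probability, and taking the log odds of that probability recovers the linear predictor.

```python
import numpy as np


def sigmoid(z):
    """Map a linear predictor (log odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_odds(p):
    """Map a probability back to log odds (the logit function)."""
    return np.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration
intercept, slope = -1.0, 2.0
x = np.array([0.0, 0.5, 1.0])

linear_predictor = intercept + slope * x   # the modeled log odds
probability = sigmoid(linear_predictor)    # probability of the "yes" outcome
predicted_class = probability >= 0.5       # a common default threshold

print(probability)
print(log_odds(probability))               # matches linear_predictor
```

The 0.5 threshold is only a convention; in practice it is often tuned to the cost of false positives versus false negatives.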

Frameworks

We will explore two approaches to implementing linear and logistic regression in R. First, we will use the base R framework, which is an excellent starting point to understand the underlying concepts and functions. Then, we will dive into tidymodels, a modern and tidy approach to modeling and machine learning in R. tidymodels provides a consistent and...

Performing linear regression in R

In this section, we are going to perform linear regression in R, both in base R and by way of the tidymodels framework, on a dataset that contains several groups. We take this approach because once you can fit a model per group, fitting a model to a single group is simpler: there is no need to group the data and perform actions by group. Working with grouped data therefore teaches you an extra skill along the way.
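
The idea of fitting one model per group is not tied to R. As a rough illustration in Python, the chapter's other language, the sketch below fits a separate straight line to each group of a synthetic dataset. The group labels, column names, and slopes are all assumptions made for the example; it is not the iris workflow used in the R code.

```python
import numpy as np
import pandas as pd

# Synthetic grouped data (assumed for illustration): each group has its own slope
rng = np.random.default_rng(1)
true_slopes = {"a": 1.0, "b": 2.0, "c": 3.0}
frames = []
for group, slope in true_slopes.items():
    x = rng.uniform(0, 10, 50)
    y = slope * x + rng.normal(0, 0.5, 50)
    frames.append(pd.DataFrame({"group": group, "x": x, "y": y}))
df = pd.concat(frames, ignore_index=True)

# Fit a separate degree-1 polynomial (a straight line) within each group
rows = []
for name, g in df.groupby("group"):
    slope, intercept = np.polyfit(g["x"].to_numpy(), g["y"].to_numpy(), deg=1)
    rows.append({"group": name, "intercept": intercept, "slope": slope})
per_group = pd.DataFrame(rows).set_index("group")

print(per_group)
```

Dropping the loop and calling `np.polyfit` once on the whole frame is the single-group case, which is why the grouped version is the more general skill.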

Linear regression in base R

The first example we are going to show is using the lm() function to perform a linear regression in base R. Let’s dive right into it with the iris dataset.

We will break the code down into chunks and discuss what is happening at each step. The first step is to use the library command to load the necessary packages into our development environment:

library(readxl)

In this section, we’re loading a library called...

Performing logistic regression in R

As we did in the section on linear regression, we will perform logistic regression both in base R and with the tidymodels framework. We will work through a single, simple binary classification problem using the Titanic dataset, where the goal is to predict whether a passenger survived. Let’s dive right into it.

Logistic regression with base R

We start with a base R implementation of logistic regression on the Titanic dataset, where we will be modeling the response variable Survived.

The following is the code that will perform the data modeling along with explanations of what is happening:

library(tidyverse)
df <- Titanic |>
       as.data.frame() |>
       uncount(Freq)

This block of code starts by loading a library called tidyverse, which contains...
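
The `uncount(Freq)` step expands a frequency table into one row per observation, which is the shape a classification model expects. For readers following the Python side of the chapter, the same reshaping can be sketched in pandas. The small table below is made up for illustration and is not the real Titanic data.

```python
import pandas as pd

# Hypothetical frequency table: one row per category combination plus a count
freq_table = pd.DataFrame({
    "Sex": ["Male", "Male", "Female", "Female"],
    "Survived": ["No", "Yes", "No", "Yes"],
    "Freq": [3, 1, 0, 2],
})

# Repeat each row Freq times, then drop the count column;
# rows with a count of zero disappear entirely
expanded = (
    freq_table.loc[freq_table.index.repeat(freq_table["Freq"])]
    .drop(columns="Freq")
    .reset_index(drop=True)
)

print(expanded)
```

After the expansion, the number of rows equals the sum of the original counts, so each row can be treated as one individual.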

Performing linear regression in Python using Excel data

Linear regression in Python can be carried out with the help of libraries such as pandas, scikit-learn, statsmodels, and matplotlib. The following is a step-by-step code example:

First, import the necessary libraries:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from statsmodels.graphics.regressionplots import plot_regress_exog
from statsmodels.graphics.gofplots import qqplot

Then, we create an Excel file with test data. Of course, in a real-life scenario, you would not need the mock data – you would skip this step and load the data from Excel (see the next step) after loading the necessary libraries:
```
# Step 0: Generate sample data and save as Excel file
np.random.seed(0)
n_samples = 100
X = np.random.rand(n_samples, 2)  # Two features
y = 2 * X[:, 0] + 3 * X[:, 1] ...
```
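
Because the listing above is cut off, the remaining steps can only be sketched under assumptions. The example below builds a stand-in DataFrame in memory instead of reading a workbook, splits it into training and test sets, and fits a linear model with scikit-learn. The column names, the noise term, and the file name in the comment are hypothetical; only the coefficients 2 and 3 are taken from the visible part of the listing.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# In practice the data would come from a workbook, for example:
# data = pd.read_excel("regression_data.xlsx")  # hypothetical file name
rng = np.random.default_rng(0)
features = rng.uniform(0, 1, size=(100, 2))
noise = rng.normal(0, 0.1, 100)  # assumed noise level, not from the book
data = pd.DataFrame({
    "Feature1": features[:, 0],  # hypothetical column names
    "Feature2": features[:, 1],
    "Target": 2 * features[:, 0] + 3 * features[:, 1] + noise,
})

X = data[["Feature1", "Feature2"]]
y = data["Target"]

# Hold out 20% of the rows to check the fit on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("R^2 on test data:", r2)
```

scikit-learn is convenient for prediction; when coefficient standard errors and p-values are needed, statsmodels (imported earlier in this section) is the more natural choice.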

Logistic regression in Python using Excel data

In the following code, we generate random sample data with two features (Feature1 and Feature2) and a binary target variable (Target) based on a simple condition. We perform logistic regression, evaluate the model using accuracy, the confusion matrix, and a classification report, visualize the results for binary classification, and interpret the coefficients.

The following is a step-by-step code example:

Again, we start by importing the necessary libraries:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

For this example, we will use a different sample dataset:

# Step 0: Generate sample data
np.random.seed(0)
n_samples = 100
X = np.random.rand(n_samples, 2)  # Two features
y = (X...
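
The listing above is truncated, so the fitting and evaluation steps described at the start of this section are sketched here under stated assumptions. The labeling rule (the target is 1 when the two features sum to more than 1) and the sample size are hypothetical and are not claimed to be the rule used in the book.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))  # two features
# Hypothetical labeling rule, assumed only for this sketch
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Stratify so both classes appear in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

accuracy = accuracy_score(y_test, pred)
# Rows are true classes, columns are predicted classes, in the order [0, 1]
cm = confusion_matrix(y_test, pred, labels=[0, 1])
# Estimated probability of class 1 for each test row
proba = model.predict_proba(X_test)[:, 1]

print("accuracy:", accuracy)
print(cm)
print("coefficients (change in log odds per unit):", model.coef_[0])
```

Each coefficient is the change in the log odds of the positive class for a one-unit increase in that feature, which is how the model's coefficients are usually interpreted.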

Summary

In this chapter, we explored linear and logistic regression using Excel data. Linear regression, a fundamental statistical technique, allows us to model relationships between dependent and independent variables. We discussed its assumptions and applications, and walked through the entire process of loading data from Excel, preparing it for analysis, and fitting linear regression models in both R (with base R and tidymodels) and Python (with the scikit-learn and statsmodels libraries).

Through comprehensive code examples, you learned how to perform regression analysis, assess model accuracy, and generate valuable statistics and metrics to interpret model results. We gained insights into creating diagnostic plots, such as residual plots and Q-Q plots, which aid in identifying issues such as heteroscedasticity and outliers.

Additionally, we delved into logistic regression, a powerful tool for class probability prediction and binary classification...

Authors (2)

Steven Sanderson

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor's degree in economics and his master's in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.

David Kun

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.