You're reading from Regression Analysis with Python

Product type: Book
Published in: Feb 2016
Reading level: Intermediate
ISBN-13: 9781785286315
Edition: 1st
Authors (2): Luca Massaron, Alberto Boschetti

Luca Massaron

Having joined Kaggle over 10 years ago, Luca Massaron is a Kaggle Grandmaster in discussions and a Kaggle Master in competitions and notebooks. In Kaggle competitions, he reached no. 7 in the worldwide rankings. On the professional side, Luca is a data scientist with more than a decade of experience in transforming data into smarter artifacts, solving real-world problems, and generating value for businesses and stakeholders. He is a Google Developer Expert (GDE) in machine learning and the author of best-selling books on AI, machine learning, and algorithms.

Alberto Boschetti

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges ranging from natural language processing (NLP) and behavioral analysis to machine learning and distributed processing. He is very passionate about his job and always tries to stay up to date with the latest developments in data science technologies, attending meet-ups, conferences, and other events.

Chapter 2. Approaching Simple Linear Regression

Having set up all your working tools (either by installing Python and IPython directly or by using a scientific distribution), you are now ready to start using linear models to incorporate new abilities into the software you plan to build, especially predictive capabilities. Up to now, you have developed software solutions based on certain specifications you defined (or specifications that others have handed to you). Your approach has always been to tailor the response of the program to particular inputs, by writing code that carefully maps every single situation to a specific, predetermined response. On reflection, by doing so you were simply incorporating into your code practices that you (or others) had learned from experience.

However, the world is complex, and sometimes your experience is not enough to make your software smart enough to stand out in a fairly competitive business or to handle challenging problems with many different and changing facets.

In this chapter...

Defining a regression problem


Thanks to machine learning algorithms, deriving knowledge from data is possible. Machine learning has solid roots in years of research: it has been a long journey since the end of the fifties, when Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed."

The data explosion (the availability of previously unrecorded amounts of data) has enabled the widespread use of both recent and classic machine learning techniques and turned them into high-performing techniques. If nowadays you can talk to your mobile phone and expect it to answer you properly, acting as your secretary (think of Siri or Google Now), it is precisely because of machine learning. The same holds true for every application based on machine learning, such as face recognition, search engines, spam filters, recommender systems for books/music/movies, handwriting recognition, and automatic language...

Starting from the basics


We will start by exploring our first dataset, the Boston dataset, but before delving into the numbers, we will import a series of helpful packages that will be used during the rest of the chapter:

In: import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import matplotlib as mpl

If you are working from an IPython Notebook, running the following command in a cell will instruct the Notebook to display any graphic output inside the Notebook itself (if you are not working in IPython, just ignore the command, because it won't work in IDEs such as Python's IDLE or Spyder):

In: %matplotlib inline
  # If you are using IPython, this will make the images available in the Notebook

To immediately select the variables that we need, we just frame all the available data into a pandas data structure, the DataFrame.

Inspired by a similar data structure present in the R statistical language, a DataFrame renders data vectors of different types easy to handle under...
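
As a minimal sketch (not the book's own cell), assuming the data is fetched through scikit-learn's load_boston function, which shipped with the library at the time of writing but has since been removed in scikit-learn 1.2, the framing step looks like this:

In: from sklearn.datasets import load_boston
  boston = load_boston()
  # Put the predictors into a DataFrame, one named column per variable
  dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
  # Append the target (the median house value) as an extra column
  dataset['target'] = boston.target
  dataset.head()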

Extending to linear regression


Linear regression tries to fit a line through a given set of points, choosing the best fit. The best fit is the line that minimizes the summed squared difference between the value predicted by the line for a certain value of x and the corresponding observed y value. (It is optimizing the same squared error that we met before when checking how good the mean was as a predictor.)

Since linear regression is a line, in bi-dimensional space (x, y) it takes the form of the classical formula of a line in a Cartesian plane: y = mx + q, where m is the angular coefficient (the slope, which determines the angle between the line and the x axis) and q is the intercept, the point where the line crosses the y axis.
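
As a quick, hypothetical illustration (not taken from the book), NumPy can recover m and q for a handful of made-up points; polyfit with degree 1 performs exactly this least-squares line fit:

In: x = np.array([1., 2., 3., 4., 5.])
  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
  m, q = np.polyfit(x, y, deg=1)  # slope first, then intercept
  print(m, q)  # roughly m = 1.96 and q = 0.14 for these points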

Formally, machine learning indicates the correct expression for a linear regression as follows:

y = Xβ + β0

Here, again, X is a matrix of the predictors, β is a matrix of coefficients, and β0 is a constant value called the bias (it is the same as the Cartesian formulation; only the notation is different).

We can better...

Minimizing the cost function


At the core of linear regression is the search for the equation of a line that is able to minimize the sum of the squared differences between the line's y values and the original ones. As a reminder, let's say our regression function is called h, and its predictions h(X), as in this formulation:

h(X) = Xβ + β0

Consequently, our cost function to be minimized is as follows:

J(β) = Σi (h(Xi) - yi)²

There are quite a few methods to minimize it, some performing better than others in the presence of large quantities of data. Among the better performers, the most important ones are the pseudoinverse (which you can find in statistics textbooks), QR factorization, and gradient descent.
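
As an illustrative sketch (not the book's code), the pseudoinverse route takes only a few lines of NumPy: add a column of ones to X to account for the bias β0, then solve the least-squares problem directly:

In: # Hypothetical toy data: one predictor with a true slope of 2, a bias of 1, and some noise
  rng = np.random.RandomState(0)
  X = rng.rand(100, 1)
  y = 1.0 + 2.0 * X[:, 0] + rng.randn(100) * 0.1
  Xb = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the bias term
  coeffs = np.linalg.pinv(Xb).dot(y)          # beta = pinv(Xb) y minimizes the squared error
  print(coeffs)                               # roughly [1.0, 2.0]: bias first, then slope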

Explaining the reason for using squared errors

Looking under the hood of a linear regression analysis, at first it could be puzzling to realize that we are striving to minimize the squared differences between our estimates and the data from which we are building the model. Squared differences are not as intuitively explainable...
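
A small numerical check (an addition to the text, not part of the book) makes the connection with the mean concrete: among all constant predictions, the arithmetic mean is the one that minimizes the sum of squared differences:

In: values = np.array([1., 2., 3., 4., 10.])
  candidates = np.linspace(0, 10, 1001)
  sq_errors = [np.sum((values - c) ** 2) for c in candidates]
  best = candidates[np.argmin(sq_errors)]
  print(best, values.mean())  # both print 4.0: the mean minimizes the squared error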

Summary


In this chapter, we introduced linear regression as a supervised machine learning algorithm. We explained its functional form and its relationship with the statistical measures of mean and correlation, and we built a simple linear regression model on the Boston house prices data. After doing that, we finally glanced at how regression works under the hood by presenting its key mathematical formulations and their translation into Python code.

In the next chapter, we will continue our discussion of linear regression, extending our predictors to multiple variables and picking up our explanation where we left off in the initial single-variable illustration. We will also point out the most useful transformations you can apply to data to make it suitable for processing by a linear regression algorithm.
