2. An Introduction to Regression
Overview
In this chapter, you will be introduced to regression. Regression comes in handy when you are trying to predict future values using historical data. You will learn various regression techniques, such as linear regression with single and multiple variables, along with polynomial regression and Support Vector Regression (SVR). You will use these techniques to predict future stock prices from stock price data. By the end of this chapter, you will be comfortable using regression techniques to solve practical problems in a variety of fields.
Introduction
In the previous chapter, you were introduced to the fundamentals of Artificial Intelligence (AI), which helped you create the game Tic-Tac-Toe. In this chapter, we will be looking at regression, a machine learning technique that measures how closely one or more independent variables, called features, relate to a dependent variable, called a label.
Linear regression is a concept with many applications in a variety of fields, ranging from finance (predicting the price of an asset) to business (predicting the sales of a product) and even economics (predicting economic growth).
Most of this chapter will deal with different forms of linear regression, including linear regression with one variable, linear regression with multiple variables, polynomial regression with one variable, and polynomial regression with multiple variables. Python provides strong support for performing regression operations, and we will be looking at this support later in the chapter.
Linear Regression with One Variable
A general regression problem can be defined with the following example. Suppose we have a set of data points and we need to figure out the best-fit curve to approximately fit them. This curve will describe the relationship between our input variable, x, which is the data point, and the output variable, y, which the curve predicts from x.
Remember, in real life, we often have more than one input variable determining the output variable. However, linear regression with one variable will help us to understand how the input variable impacts the output variable.
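As a minimal sketch of linear regression with one variable (the data points below are made up for illustration), we can fit a straight line with NumPy's polyfit, which solves the least-squares problem for us:

```python
import numpy as np

# Hypothetical data points: x is the input variable, y is the output variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a degree-1 polynomial (a straight line y = a*x + b) by least squares.
# polyfit returns the coefficients from the highest power down.
a, b = np.polyfit(x, y, deg=1)

# Use the fitted line to predict the output for a new input value.
y_pred = a * 6.0 + b
print(a, b, y_pred)
```

The fitted slope and intercept describe how strongly y changes with x; prediction is then a single evaluation of the line.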
Types of Regression
In this chapter, we will work with regression on the two-dimensional plane. This means that our data points are two-dimensional, and we are looking for a curve to approximate how to calculate one variable from another.
We will come across the following types of regression in this chapter:
- Linear regression with one variable using a polynomial of degree 1
- Linear regression with multiple variables
- Polynomial regression with one variable or multiple variables
- Support Vector Regression (SVR)
Linear Regression with Multiple Variables
In the previous section, we dealt with linear regression with one variable. Now we will learn an extended version of linear regression, where we will use multiple input variables to predict the output.
Multiple Linear Regression
If you recall the formula for the line of best fit in linear regression, it was defined as y = a*x + b, where a is the slope of the line, b is the y-intercept of the line, x is the feature value, and y is the calculated label value.
In multiple regression, we have multiple features and one label. If we have three features, x1, x2, and x3, our model changes to y = a1*x1 + a2*x2 + a3*x3 + b.
In NumPy array format, we can write this equation as follows:
y = np.dot(np.array([a1, a2, a3]), np.array([x1, x2, x3])) + b
For convenience, it makes sense to define the whole equation in a vector multiplication format. The coefficient of b is going to be 1:
y = np.dot(np.array([b, a1, a2, a3]), np.array([1, x1, x2, x3]))
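As a sketch with made-up numbers (the coefficient and feature values below are illustrative, not taken from the chapter's dataset), the vectorized form computes the same value as the term-by-term equation:

```python
import numpy as np

# Illustrative coefficients and feature values (not from any real dataset).
b, a1, a2, a3 = 0.5, 2.0, -1.0, 3.0
x1, x2, x3 = 1.0, 4.0, 2.0

# y = a1*x1 + a2*x2 + a3*x3 + b, written as a single dot product
# by prepending b to the coefficients and a constant 1 to the features.
coefficients = np.array([b, a1, a2, a3])
features = np.array([1.0, x1, x2, x3])
y = np.dot(coefficients, features)
print(y)  # 0.5*1 + 2.0*1.0 + (-1.0)*4.0 + 3.0*2.0 = 4.5
```

Folding the intercept into the vectors this way is the standard trick that lets the whole model be expressed as one matrix-vector product.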
Multiple linear regression...
Polynomial and Support Vector Regression
When performing a polynomial regression, the relationship between x and y, or, to use their other names, the features and the label, is not a linear equation but a polynomial equation. This means that instead of the y = a*x + b equation, we can have multiple coefficients and multiple powers of x in the equation.
To make matters even more complicated, we can perform polynomial regression using multiple variables, where each feature may have coefficients multiplying different powers of the feature.
Our task is to find a curve that best fits our dataset. Once polynomial regression has been extended to multiple variables, we will learn how to use the SVM model to perform regression.
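As a hedged sketch of polynomial regression with multiple variables (the data and the generating rule below are made up), we can build the expanded feature matrix by hand, one column per polynomial term, and solve for the coefficients by least squares:

```python
import numpy as np

# Made-up samples: two features per row.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
              [1.5, 0.5], [2.5, 2.0], [0.5, 1.5],
              [3.5, 1.0], [4.0, 2.5], [2.0, 3.5]])
x1, x2 = X[:, 0], X[:, 1]

# Labels generated from a known rule so we can check the fit:
# y = 1 + 2*x1 - x2 + 0.5*x1^2
y = 1.0 + 2.0 * x1 - x2 + 0.5 * x1 ** 2

# Degree-2 polynomial terms in two variables:
# [1, x1, x2, x1^2, x1*x2, x2^2]
design = np.column_stack(
    [np.ones_like(x1), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

# Least-squares fit of one coefficient per polynomial term.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 3))
```

Because the labels follow the generating rule exactly, the recovered coefficients match it: the cross term x1*x2 and the x2^2 term come out as zero.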
Polynomial Regression with One Variable
As a recap, we have performed two types of regression so far:
- Simple linear regression: y = a*x + b
- Multiple linear regression: y = a1*x1 + a2*x2 + a3*x3 + b
We will now learn how to perform polynomial regression with one variable. The equation for polynomial...
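As a minimal sketch of polynomial regression with one variable (the data below is generated from a made-up quadratic so the fit can be checked), the same NumPy polyfit call used for a straight line handles higher degrees:

```python
import numpy as np

# Made-up data following an exact quadratic: y = 3 - 2x + 0.5x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 - 2.0 * x + 0.5 * x ** 2

# Fit a degree-2 polynomial; polyfit returns coefficients
# from the highest power down to the constant term.
c2, c1, c0 = np.polyfit(x, y, deg=2)
print(round(c2, 3), round(c1, 3), round(c0, 3))
```

Since the data is exactly quadratic, the fitted coefficients recover the generating values; with noisy data they would be least-squares estimates instead.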
Support Vector Regression
SVMs are binary classifiers and are usually used in classification problems (you will learn more about this in Chapter 3, An Introduction to Classification). An SVM classifier takes a data point and tries to predict which class it belongs to. Once the classification of a data point is determined, it gets labeled. But SVMs can also be used for regression; that is, instead of labeling data, they can predict future values in a series.
The SVR model uses the space between our data as a margin of error. Based on the margin of error, it makes predictions regarding future values.
If the margin of error is too small, we risk overfitting the existing dataset. If the margin of error is too big, we risk underfitting the existing dataset.
In the case of a classifier, the kernel describes the surface dividing the state space, whereas, in a regression, the kernel measures the margin of error. This kernel can use a linear model, a polynomial model, or many other possible...
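As a minimal sketch (the series below is made up, and scikit-learn is assumed to be installed), SVR with a polynomial kernel can fit a quadratic trend; the epsilon parameter sets the width of the error margin discussed above:

```python
import numpy as np
from sklearn.svm import SVR  # scikit-learn is assumed to be available

# Made-up series following a quadratic trend (no noise).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2

# epsilon is the margin of error: points inside the epsilon tube
# contribute no loss. A degree-2 polynomial kernel matches the trend.
model = SVR(kernel="poly", degree=2, C=100.0, epsilon=0.1)
model.fit(X, y)

# Predict a value within the range of the training series.
prediction = model.predict(np.array([[5.0]]))[0]
print(prediction)
```

A small epsilon with a large C fits the data tightly (risking overfitting on noisy data), while a large epsilon tolerates bigger deviations (risking underfitting), mirroring the trade-off described above.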
Summary
In this chapter, we have learned the fundamentals of linear regression. After going through some basic mathematics, we looked at linear regression using one variable and multiple variables.
Then, we learned how to load external data from sources such as a CSV file, Yahoo Finance, and Quandl. After loading the data, we learned how to identify features and labels, how to scale data, and how to format data to perform regression.
We learned how to train and test a linear regression model, and how to predict future values. Our results were visualized using an easy-to-use Python graph plotting library called pyplot.
We also learned about a more complex form of regression: polynomial regression using arbitrary degrees. We learned how to define these regression problems with multiple variables and compared their performance on the Boston House Price dataset. As an alternative to polynomial regression, we also introduced SVMs as a regression model and...