Exploring the Housing dataset
Before we implement our first linear regression model, we will introduce a new dataset, the Housing dataset, which contains information about houses in the suburbs of Boston collected by D. Harrison and D.L. Rubinfeld in 1978. The Housing dataset is freely available and is included in the code bundle of this book. It was recently removed from the UCI Machine Learning Repository but remains available online at https://raw.githubusercontent.com/rasbt/python-machine-learning-book-2nd-edition/master/code/ch10/housing.data.txt. As with any new dataset, it helps to explore the data through a simple visualization to get a better feel for what we are working with.
Loading the Housing dataset into a data frame
In this section, we will load the Housing dataset using the pandas read_csv function, which is fast and versatile, and a recommended tool for working with tabular data stored in a plaintext format.
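The loading step could be sketched as follows. This is a minimal example, not a definitive implementation. It assumes the file is whitespace-separated with no header row, and that the 14 columns carry the abbreviations conventionally used for this dataset. The helper load_housing and the two-row sample are illustrative names introduced here; the sample stands in for the real file so that the sketch runs without network access.

```python
import io

import pandas as pd

# Conventional column abbreviations for the Housing dataset
# (an assumption; the file itself has no header row).
COLUMNS = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
           'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']


def load_housing(source):
    """Read a whitespace-separated Housing file into a DataFrame.

    `source` may be a local path, a URL, or a file-like object.
    """
    return pd.read_csv(source, header=None, sep=r'\s+', names=COLUMNS)


# Two-row stand-in in the same layout as the file, for offline use.
sample = io.StringIO(
    "0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00\n"
    "0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60\n"
)

df = load_housing(sample)
print(df.shape)
print(df.head())
```

To load the full dataset, pass the URL given earlier (or the path to the copy in the code bundle) to load_housing in place of sample. Calling df.head() afterwards is a quick check that the columns were parsed as expected.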
The features of the 506 samples in...