Problem 4 – using Python to create models of housing data
Let’s take a look at a problem where we want to display trends and information about the housing market in Brooklyn, New York. The data comes from the NYC Housing Sales Data for 2003 to 2017. The version we’ll be using has already been merged into a single, usable format and can be found on Kaggle (https://www.kaggle.com/tianhwu/brooklynhomes2003to2017). In addition, a copy of the .csv file can be found in this book’s GitHub repository under the name brooklyn_sales_map.csv.
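Before going further, it can help to confirm that the file opens and to get a rough feel for its contents. The following is a minimal sketch using only Python's standard library. The helper name count_sales_by_neighborhood and the column name neighborhood are assumptions made for illustration, so check them against the header row of your copy of the file. The three sample rows are made-up placeholders, not values from the dataset.

```python
import csv
import io
from collections import Counter


def count_sales_by_neighborhood(csv_file, column="neighborhood"):
    """Count how many rows (sales) appear for each value of `column`.

    `csv_file` is any open text file object whose first line is a header.
    The default column name is an assumption about the dataset.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Made-up placeholder rows standing in for brooklyn_sales_map.csv.
sample = io.StringIO(
    "neighborhood,sale_price,year_of_sale\n"
    "EXAMPLE A,500000,2005\n"
    "EXAMPLE B,750000,2010\n"
    "EXAMPLE A,620000,2016\n"
)

counts = count_sales_by_neighborhood(sample)
print(counts.most_common())

# With the real file, assuming it sits in the working directory:
# with open("brooklyn_sales_map.csv", newline="") as f:
#     counts = count_sales_by_neighborhood(f)
```

Passing an open file object rather than a path keeps the helper easy to try on a small in-memory sample before pointing it at the full file, which is large.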
Defining the problem
We have a large data file for this particular problem. We can look at information by neighborhood, examine sale prices by year, and compare the year built with the neighborhood to find trends, history, and so on. We could spend hours, days, or even weeks on this one dataset alone. So, let’s focus our energy on what we are going to accomplish with this example. For this, we’re going to...