Chapter 5: Ensemble Modeling
Activity 14: Stacking with Standalone and Ensemble Algorithms
Solution
Import the relevant libraries:
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
Read the data and print the first five rows:
data = pd.read_csv('house_prices.csv')
data.head()
The output will be as follows:

Figure 5.19: The first 5 rows
Preprocess the dataset to remove null values and one-hot encode categorical variables to prepare the data for modeling.
First, we remove all columns where more than 10% of the values are null. To do this, calculate the fraction of missing values...
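The two preprocessing steps described above can be sketched on a small stand-in DataFrame. This is a minimal illustration, not necessarily the exact code of the activity: the `toy` frame and its column names are hypothetical placeholders for the house prices data, and the only value taken from the text is the 10% threshold. Null values remaining in the columns that are kept would still need to be handled separately.

```python
import pandas as pd

# Hypothetical stand-in for the house prices data (column names are assumptions)
toy = pd.DataFrame({
    'LotArea': [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382, 6120, 7420],
    'Alley': [None, None, None, None, None, None, None, None, 'Grvl', 'Pave'],
    'Street': ['Pave', 'Pave', 'Grvl', 'Pave', 'Pave', 'Pave', 'Grvl', 'Pave', 'Pave', 'Pave'],
    'SalePrice': [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000, 129900, 118000],
})

# Fraction of missing values in each column (mean of a boolean mask)
null_fraction = toy.isnull().mean()

# Columns where more than 10% of the values are null
cols_to_drop = null_fraction[null_fraction > 0.1].index

# Drop those columns from the data
reduced = toy.drop(columns=cols_to_drop)

# One-hot encode the remaining categorical (string) columns
encoded = pd.get_dummies(reduced)

print(list(cols_to_drop))
print(list(encoded.columns))
```

Taking the mean of `isnull()` gives the per-column fraction directly, so the threshold can be compared against it without counting rows by hand.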