Clustering model
Now that you have learned how to perform prediction and classification tasks in data analytics, this chapter turns to clustering analysis. In clustering, we strive to group the data objects in a dataset in a meaningful way. We will learn about clustering analysis through an example.
Clustering example using a two-dimensional dataset
In this example, we will use WH Report_preprocessed.csv to cluster the countries based on their 2019 values of two scores, Life_Ladder and Perceptions_of_corruption.
The following code reads the data into report_df and uses Boolean masking to preprocess the dataset into report2019_df, which only includes the data of 2019:
import pandas as pd
report_df = pd.read_csv('WH Report_preprocessed.csv')
# Boolean mask: True for the rows whose year is 2019
BM = report_df.year == 2019
report2019_df = report_df[BM]
The preceding code produces a DataFrame, report2019_df, that includes only the 2019 data, which is what this example needs.
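To see what Boolean masking does row by row, the following minimal sketch applies the same pattern to a small made-up DataFrame. The names toy_df and toy2019_df, the Name column, and all of the values are hypothetical and chosen only for illustration; they are not taken from WH Report_preprocessed.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for the real report (values are made up)
toy_df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B'],
    'year': [2018, 2019, 2018, 2019],
    'Life_Ladder': [5.1, 5.3, 7.0, 7.2],
})

# Comparing a column to a value yields a Boolean Series, one entry per row
BM = toy_df['year'] == 2019

# Indexing the DataFrame with the mask keeps only the rows marked True
toy2019_df = toy_df[BM]

print(BM.tolist())   # [False, True, False, True]
print(toy2019_df)
```

The filtered DataFrame keeps the original row labels (1 and 3 here), so calling reset_index(drop=True) afterward is an option if consecutive labels are preferred.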
Since we only have two dimensions...