Chapter 2: Exploratory Data Analysis and Visualization
Activity 2: Summary Statistics and Missing Values
Solution
The steps to complete this activity are as follows:
Read the data. Use the pandas read_csv() function (with pandas imported as pd) to read the CSV file into a pandas DataFrame:
data = pd.read_csv('house_prices.csv')
Use pandas' .info() and .describe() methods to view the summary statistics of the dataset:
data.info()
data.describe().T
The output of info() will be:

Figure 2.39: The output of the info() method
The output of describe() will be:

Figure 2.40: The output of the describe() method
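For readers without the figures at hand, the following is a minimal sketch of the same two calls on a small, hypothetical DataFrame. The column names and values are illustrative assumptions and are not taken from house_prices.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset (NOT the real house_prices.csv)
toy = pd.DataFrame({
    'LotArea': [8450, 9600, 11250, 9550],
    'LotFrontage': [65.0, 80.0, np.nan, 60.0],
    'Alley': [None, 'Grvl', None, None],
})

# info() reports column names, non-null counts, and dtypes;
# writing to a buffer lets us keep the text as well as print it
buf = io.StringIO()
toy.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# describe() summarizes numeric columns by default;
# .T transposes the result so each column becomes one row
summary = toy.describe().T
print(summary)
```

Transposing with .T is mainly a readability choice: on a wide dataset, one row per column is usually easier to scan than one column per column.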
Find the total count and total percentage of missing values in each column of the DataFrame and display them for columns having at least one null value, in descending order of missing percentages.
As we did in Exercise 12: Visualizing Missing Values, we will use the .isnull() function on the DataFrame to get a mask, find the count of null values in each column by using the .sum() function over the mask DataFrame and the fraction of null...
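The step described above can be sketched as follows. This is one possible way to do it, shown on a small hypothetical DataFrame rather than the real dataset; the names toy, mask, total, percent, and missing are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset (NOT the real house_prices.csv)
toy = pd.DataFrame({
    'LotArea': [8450, 9600, 11250, 9550],
    'LotFrontage': [65.0, 80.0, np.nan, 60.0],
    'Alley': [None, 'Grvl', None, None],
})

# Boolean mask: True wherever a value is null
mask = toy.isnull()

# Summing the mask counts nulls per column;
# the mean of the mask is the fraction of nulls, scaled here to a percentage
total = mask.sum()
percent = 100 * mask.mean()

# Combine both into one table, keep only columns with at least one null,
# and sort by missing percentage in descending order
missing = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing = missing[missing['Total'] > 0].sort_values('Percent', ascending=False)
print(missing)
```

Taking the mean of a Boolean mask works because True counts as 1 and False as 0, so the average is the share of null entries in each column.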