EXPLORATORY DATA ANALYSIS
In this chapter, we introduce how to manage data as a Pandas dataframe, along with common exploratory data analysis (EDA) techniques.
As a key part of data inspection, EDA involves summarizing the salient characteristics of your dataset before you process and analyze it further. This includes understanding the shape and distribution of the data, scanning for missing values, learning which features are most relevant based on correlation, and familiarizing yourself with the overall contents of the dataset. Gathering this intel helps to inform algorithm selection and highlight parts of the data that require cleaning.
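As a rough preview of what these checks can look like in Pandas, here is a minimal sketch. The dataframe, its column names, and its values are hypothetical and invented purely for illustration; they are not part of any dataset used in this chapter.

```python
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration
df = pd.DataFrame({
    "bedrooms": [2, 3, 3, 4, None],
    "price": [300000, 450000, 420000, 610000, 380000],
    "suburb": ["North", "South", "South", "East", "North"],
})

# Shape: number of rows and columns
shape = df.shape

# Distribution: count, mean, std, min, quartiles, and max of numeric columns
summary = df.describe()

# Missing values: number of empty cells in each column
missing = df.isnull().sum()

# Correlation: pairwise correlation between the numeric columns only
correlation = df.select_dtypes("number").corr()

# Overall contents: the first few rows
preview = df.head()

print(shape)
print(summary)
print(missing)
print(correlation)
print(preview)
```

Each call returns a plain Pandas object (a tuple, dataframe, or series), so in Jupyter Notebook you can also inspect any of them by placing the variable name on the last line of a cell.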
Pandas offers a range of simple techniques for summarizing data, and Seaborn and Matplotlib provide additional options for visualizing it.
Let’s begin by importing Pandas, Seaborn, and Matplotlib inline using the following code in Jupyter Notebook.
import...