Visualizing and understanding your data in Python
In this recipe, we will load the sample dataset and generate a scatter plot to explore the relationship between its variables. As the following screenshot shows, we start with a DataFrame containing the management_experience_months and monthly_salary values and generate a visualization that lets us observe the linear relationship between these two variables:
Figure 1.34 – Using matplotlib to generate a scatter plot chart from a DataFrame
The objective of this recipe is to understand the data first, using a plotting library (for example, matplotlib), before diving into the other steps of the ML process. We will start by loading a sample dataset from a CSV file into a pandas DataFrame and then use matplotlib to generate a scatter plot.
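As a rough preview of that flow, the following minimal sketch loads CSV data into a pandas DataFrame and draws a scatter plot with matplotlib. It is an illustration under stated assumptions, not the recipe's exact code: the helper names load_dataset and plot_experience_vs_salary are hypothetical, and the inline sample values are invented for this example and stand in for the recipe's real CSV file. Only the two column names come from the dataset described above.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values; the recipe's actual CSV file is not reproduced here.
SAMPLE_CSV = """management_experience_months,monthly_salary
0,900
12,1500
24,2100
36,2700
48,3300
"""


def load_dataset(source):
    """Read CSV data (a file path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


def plot_experience_vs_salary(df):
    """Draw a scatter plot of the two columns and return the Axes object."""
    fig, ax = plt.subplots()
    ax.scatter(df["management_experience_months"], df["monthly_salary"])
    ax.set_xlabel("management_experience_months")
    ax.set_ylabel("monthly_salary")
    return ax


# io.StringIO lets the sketch run without a file on disk;
# a real CSV path could be passed to load_dataset instead.
df = load_dataset(io.StringIO(SAMPLE_CSV))
ax = plot_experience_vs_salary(df)
```

In a Jupyter notebook, the figure typically renders inline once the cell finishes. In a plain script, calling plt.show() or ax.figure.savefig(...) afterward would display or save the chart.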
Getting ready
This recipe continues on from the Preparing the Amazon S3 bucket and the training dataset for the linear...