
You're reading from Building Data Science Solutions with Anaconda

Product type: Book
Published in: May 2022
Publisher: Packt
ISBN-13: 9781800568785
Edition: 1st Edition
Author (1)
Dan Meador

Dan Meador is an Engineering Manager at Anaconda and is the creator of Conda as well as a champion of open source at Anaconda. With a history of engineering and client-facing roles, he has the ability to jump into any position. He has a track record of delivering as a leader and a follower in companies from the Fortune 10 to startups.

Exploring and cleaning the data

Now we move on to what might be the most important and time-consuming part of the data science workflow: exploring and cleaning the data. We'll begin by grabbing some basic statistics for the data that we have.

Type the following into another Jupyter notebook cell and run it:

df_raw.describe()

You will see basic summary statistics across all our columns. Note that in the following example, I grabbed only a subset of the columns, purely so that the output fits here:

Figure 9.8 – Combined wine basic statistics
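If you want to reproduce this kind of trimmed-down summary, one way is to select a few columns before calling describe(). The following is a minimal sketch, not the exact code behind the figure: the small DataFrame stands in for df_raw, and its column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_raw; these values are invented for illustration
# and are not taken from the wine dataset.
df_raw = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 12.0, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.30],
    "quality": [5, 5, 6, 7, 6],
})

# Select a subset of columns first so the summary table stays narrow.
summary = df_raw[["alcohol", "quality"]].describe()
print(summary)
```

The result is itself a DataFrame, so individual statistics can be looked up by row label, for example summary.loc["mean", "quality"].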

There are a few things we can pick out. One is that the mean quality is 5.8, so that is the number we would want to beat. If we are looking to be at the higher end of wine quality, though, we would want to shoot for something above 6, which is the 75th percentile. Nothing gets above a 9, so perhaps that could be our lofty goal.
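The three numbers discussed here (mean, 75th percentile, and maximum) can also be pulled out directly rather than read off the describe() table. This is a sketch under stated assumptions: the quality Series below is a small made-up sample, so its statistics will not match the 5.8 and 6 quoted for the real data.

```python
import pandas as pd

# Made-up quality scores standing in for df_raw["quality"].
quality = pd.Series([3, 5, 5, 6, 6, 6, 7, 9], name="quality")

mean_quality = quality.mean()         # average score, the baseline to beat
p75_quality = quality.quantile(0.75)  # threshold for the higher end of quality
max_quality = quality.max()           # best score observed in the sample

print(mean_quality, p75_quality, max_quality)
```

With the real dataset, you would replace the hand-built Series with df_raw["quality"].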

Note that the quality is a discrete integer in the range 3-9. Should we one-hot encode it...
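Before deciding how to treat the quality column, it can help to confirm that it really is a small set of integer values. The following sketch shows one way to check that with pandas; the Series is a made-up sample rather than the wine data, and it takes no position on the encoding question raised above.

```python
import pandas as pd

# Made-up quality scores standing in for df_raw["quality"].
quality = pd.Series([3, 5, 5, 6, 6, 6, 7, 9], name="quality")

# Count how often each distinct score occurs, ordered by score.
counts = quality.value_counts().sort_index()
print(counts)

# Confirm the column holds integers and inspect its range.
is_integer = pd.api.types.is_integer_dtype(quality)
print(is_integer, quality.min(), quality.max())
```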
