Reader small image

You're reading from  Data Science Projects with Python - Second Edition

Product typeBook
Published inJul 2021
Reading LevelIntermediate
PublisherPackt
ISBN-139781800564480
Edition2nd Edition
Languages
Concepts
Right arrow
Author (1)
Stephen Klosterman
Stephen Klosterman
author image
Stephen Klosterman

Stephen Klosterman is a Machine Learning Data Scientist with a background in math, environmental science, and ecology. His education includes a Ph.D. in Biology from Harvard University, where he was an assistant teacher of the Data Science course. His professional experience includes work in the environmental, health care, and financial sectors. At work, he likes to research and develop machine learning solutions that create value, and that stakeholders understand. In his spare time, he enjoys running, biking, paddleboarding, and music.
Read more about Stephen Klosterman

Right arrow

Data Quality Assurance and Exploration

So far, we remedied two data quality issues just by asking basic questions or by looking at the .info() summary. Let's now take a look at the first few columns of data. Before we get to the historical bill payments, we have the credit limits of the LIMIT_BAL accounts, and the SEX, EDUCATION, MARRIAGE, and AGE demographic features. Our business partner has reached out to us, to let us know that gender should not be used to predict credit-worthiness, as this is unethical by their standards. So we keep this in mind for future reference. Now we'll explore the rest of these columns, making any corrections that are necessary.

In order to further explore the data, we will use histograms. Histograms are a good way to visualize data that is on a continuous scale, such as currency amounts and ages. A histogram groups similar values into bins and shows the number of data points in these bins as a bar graph.

To plot histograms, we will start...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Data Science Projects with Python - Second Edition
Published in: Jul 2021Publisher: PacktISBN-13: 9781800564480

Author (1)

author image
Stephen Klosterman

Stephen Klosterman is a Machine Learning Data Scientist with a background in math, environmental science, and ecology. His education includes a Ph.D. in Biology from Harvard University, where he was an assistant teacher of the Data Science course. His professional experience includes work in the environmental, health care, and financial sectors. At work, he likes to research and develop machine learning solutions that create value, and that stakeholders understand. In his spare time, he enjoys running, biking, paddleboarding, and music.
Read more about Stephen Klosterman