Reader small image

You're reading from  Learn Python by Building Data Science Applications

Product typeBook
Published inAug 2019
Reading LevelIntermediate
PublisherPackt
ISBN-139781789535365
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
Philipp Kats
Philipp Kats
author image
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and a having followed the rocky path (at first) of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.
Read more about Philipp Kats

David Katz
David Katz
author image
David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
Read more about David Katz

View More author details
Right arrow

Quality assurance

I know we have spent a lot of time cleaning the data, but there is still one last task we need to perform – quality assurance. Proper quality assurance is a very important practice. In a nutshell, you need to define certain assumptions about the dataset (for example, minimum and maximum values, the acceptable number of missing values, standard deviation, medians, the number of unique values, and many more). The key is to start with something that is somewhat reasonable, and then run tests to check whether the data fits your assumptions. If not, investigate specific data points to check whether your assumptions were incorrect (and update them), or whether there are still some issues with the data. It just gets a little more tricky for the multilevel columns. Consider the following code:

assumptions = {
'killed': [0, 1_500_000],
'wounded&apos...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Learn Python by Building Data Science Applications
Published in: Aug 2019Publisher: PacktISBN-13: 9781789535365

Authors (2)

author image
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and a having followed the rocky path (at first) of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.
Read more about Philipp Kats

author image
David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
Read more about David Katz