
You're reading from  Designing Machine Learning Systems with Python

Product type: Book
Published in: Apr 2016
Reading level: Beginner
ISBN-13: 9781785882951
Edition: 1st Edition
Author (1)
David Julian

David Julian is a freelance technology consultant and educator. He has worked as a consultant for government, private, and community organizations on a variety of projects, including using machine learning to detect insect outbreaks in controlled agricultural environments (Urban Ecological Systems Ltd., Bluesmart Farms), designing and implementing event management data systems (Sustainable Industry Expo, Lismore City Council), and designing multimedia interactive installations (Adelaide University). He has also written Designing Machine Learning Systems With Python for Packt Publishing and was a technical reviewer for Python Machine Learning and Hands-On Data Structures and Algorithms with Python - Second Edition, published by Packt.

Cleaning data


To understand which cleaning operations a particular dataset requires, we need to consider how the data was collected. One of the major cleaning operations is dealing with missing data. We encountered an example of this in the last chapter, when we examined the temperature data. In that instance, the data had a quality parameter, so we could simply exclude the incomplete records. However, excluding data is not the best solution for many applications; it may be necessary to fill in the missing values instead. How do we decide what values to use? In the case of our temperature data, we could fill each missing value with the average for that time of year. Notice that this presupposes some domain knowledge: the data is more or less periodic, following the seasonal cycle. It is therefore a fair assumption that we can fill a missing value with the average for that particular date, taken over every year for which we have a reliable record. However, consider that we are attempting...
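The idea of filling a missing temperature with the average for the same calendar date can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's own code: the function name `fill_with_seasonal_mean`, the dictionary layout, and the sample readings are all hypothetical, and a missing reading is assumed to be represented by `None`.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def fill_with_seasonal_mean(readings):
    """Fill missing temperatures with the mean for the same calendar day.

    `readings` maps a datetime.date to a temperature, or to None when the
    reading is missing (an assumed convention). Returns a new dict; the
    input is not modified.
    """
    # Collect the reliable readings for each (month, day) across all years.
    by_day = defaultdict(list)
    for day, temp in readings.items():
        if temp is not None:
            by_day[(day.month, day.day)].append(temp)

    filled = {}
    for day, temp in readings.items():
        if temp is None:
            same_day = by_day.get((day.month, day.day))
            # Leave the gap in place if no other year has a reliable value.
            temp = mean(same_day) if same_day else None
        filled[day] = temp
    return filled


# Hypothetical sample data: 1 July is missing in 2015 only.
readings = {
    date(2013, 7, 1): 12.0,
    date(2014, 7, 1): 14.0,
    date(2015, 7, 1): None,
    date(2015, 7, 2): None,  # no other year has a reading for this day
}
filled = fill_with_seasonal_mean(readings)
print(filled[date(2015, 7, 1)])  # 13.0
print(filled[date(2015, 7, 2)])  # None
```

Note that the sketch leaves a gap unfilled when no reliable record exists for that date, rather than guessing. Whether that is appropriate depends on the application, which is the kind of domain judgment the text describes.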
