You're reading from Machine Learning for Time-Series with Python, 1st Edition, by Ben Auffarth (Packt, October 2021, ISBN-13: 9781801819626).

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds of thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.

What Is Preprocessing?

Anyone who's ever worked on a machine learning project in a company knows that real-world data is messy. It's often aggregated from multiple sources, platforms, or recording devices, and it's incomplete and inconsistent. In preprocessing, we want to improve the data quality so that we can successfully apply a machine learning model.

Data preprocessing includes the following set of techniques:

  • Feature transforms
    • Scaling
    • Power/log transforms
    • Imputation
  • Feature engineering

These techniques fall largely into two classes: either they tailor the data to the assumptions of the machine learning algorithm (feature transforms) or they are concerned with constructing more complex features from multiple underlying features (feature engineering). We'll only deal with univariate feature transforms, that is, transforms that apply to one feature at a time. We won't discuss multivariate feature...
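To make the transforms above concrete, here is a minimal sketch of univariate preprocessing on a toy series; it is not code from the book, and the column name, sample values, and the choice of mean imputation, a log transform, and scikit-learn's StandardScaler are illustrative assumptions.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy univariate series with a missing value (values are made up for illustration)
ts = pd.DataFrame(
    {"demand": [12.0, 15.0, np.nan, 900.0, 14.0, 13.0]},
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)

# Imputation: fill the missing point with the series mean (a simple univariate choice)
ts["demand_imputed"] = ts["demand"].fillna(ts["demand"].mean())

# Power/log transform: log1p compresses the heavy right tail caused by the outlier
ts["demand_log"] = np.log1p(ts["demand_imputed"])

# Scaling: standardize to zero mean and unit variance, as many algorithms assume
ts["demand_scaled"] = StandardScaler().fit_transform(ts[["demand_log"]]).ravel()

print(ts)

Each step operates on a single column at a time, which is what makes these transforms univariate; scikit-learn's PowerTransformer (Box-Cox or Yeo-Johnson) is a common data-driven alternative to a fixed log transform.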
