
You're reading from Machine Learning Engineering with Python - Second Edition

Product type: Book
Published in: Aug 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781837631964
Edition: 2nd Edition
Author (1)
Andrew P. McMahon

Andrew P. McMahon has spent years building high-impact ML products across a variety of industries. He is currently Head of MLOps for NatWest Group in the UK and has a PhD in theoretical condensed matter physics from Imperial College London. He is an active blogger, speaker, podcast guest, and leading voice in the MLOps community. He is co-host of the AI Right podcast and was named ‘Rising Star of the Year’ at the 2022 British Data Awards and ‘Data Scientist of the Year’ by the Data Science Foundation in 2019.

Engineering features for machine learning

Before we feed any data into an ML model, it has to be transformed into a form the model can understand. We should also apply these transformations only to the data we deem useful for improving the model's performance, as it is far too easy to explode the number of features and fall victim to the curse of dimensionality. This term refers to a set of related observations about high-dimensional problems: as the number of dimensions grows, data becomes increasingly sparse in the feature space, so achieving statistical significance can require exponentially more data. In this section, we will not cover the theoretical basis of feature engineering. Instead, we will focus on how we, as ML engineers, can help automate some of the steps in production. To this end, we will quickly recap the main types of feature preparation and feature engineering steps so that we have the necessary pieces to add to our pipelines later in this chapter.
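
As a rough illustration of the sparsity described above, the following sketch estimates how many uniformly sampled points land inside a fixed sub-region of the feature space as the number of features grows. It is not taken from the book's code: the function name `fraction_in_subcube`, the sub-cube side of 0.5, and the sample size are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random


def fraction_in_subcube(n_points: int, dim: int, side: float = 0.5, seed: int = 0) -> float:
    """Estimate the share of uniform points in [0, 1]^dim that fall inside [0, side]^dim.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(n_points):
        # A point is inside the sub-cube only if every coordinate is below `side`.
        if all(rng.random() < side for _ in range(dim)):
            hits += 1
    return hits / n_points


for dim in (1, 2, 5, 10):
    estimate = fraction_in_subcube(10_000, dim)
    exact = 0.5 ** dim  # volume of the sub-cube, which shrinks exponentially with dim
    print(f"dim={dim:2d}  estimated={estimate:.4f}  exact={exact:.4f}")
```

With the same number of samples, the share of points in the sub-cube falls roughly as `0.5 ** dim`, so keeping a region equally well populated would need a sample size that grows exponentially with the number of features. This is one reason to be selective about which features we engineer.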

Engineering categorical features...
