You're reading from Mastering Predictive Analytics with scikit-learn and TensorFlow

Product type: Book

Published in: Sep 2018

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781789617740

Edition: 1st Edition

Languages

Python

Tools

Scikit-learn, TensorFlow

Concepts

Predictive Analytics

Author (1)

Alvaro Fuentes

Working with Features

In this chapter, we are going to take a close look at features and the role they play in feature engineering. We'll learn some techniques that will allow us to improve our predictive analytics models in two ways: by improving the performance metrics of our models, and by helping us understand the relationship between the features and the target variable that we are trying to predict.

In this chapter, we are going to cover the following topics:

Feature selection methods
Dimensionality reduction and PCA
Creating new features
Improving models with feature engineering

Feature selection methods

Feature selection methods are used for selecting features that are likely to help with predictions. The following are the three methods for feature selection:

Removing dummy features with low variance
Identifying important features statistically
Recursive feature elimination

When building predictive analytics models, some features won't be related to the target, so they will be of little help in prediction. The problem is that including irrelevant features can introduce noise into the model and increase the risk of overfitting. Feature selection techniques are therefore used to select the most relevant and useful features, those that help either with prediction or with understanding our model.
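The three methods listed above can be sketched with scikit-learn's `VarianceThreshold`, `SelectKBest`, and `RFE` classes. This is a minimal illustration on synthetic data, not the chapter's own notebook: the data, the variance threshold, the choice of `f_classif` as the statistical test, and the logistic regression estimator are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n = 500

# Synthetic data (an assumption for illustration): four numeric features,
# of which only the first two drive the target, plus two dummy features.
X_num = rng.normal(size=(n, 4))
common_dummy = (rng.rand(n) < 0.5).astype(float)
rare_dummy = (rng.rand(n) < 0.01).astype(float)  # almost always 0
X = np.column_stack([X_num, common_dummy, rare_dummy])
y = (2 * X_num[:, 0] + X_num[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)

# 1. Remove low-variance features. A dummy that equals 1 with probability p
#    has variance p * (1 - p); this threshold drops dummies that take the
#    same value in more than about 90% of the rows.
vt = VarianceThreshold(threshold=0.9 * (1 - 0.9))
X_vt = vt.fit_transform(X)
print("Kept after variance threshold:", vt.get_support())

# 2. Identify important features statistically with a univariate F-test.
skb = SelectKBest(score_func=f_classif, k=2).fit(X_vt, y)
print("Kept by SelectKBest:", skb.get_support())

# 3. Recursive feature elimination: repeatedly fit the estimator and
#    discard the feature with the smallest coefficient.
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2).fit(X_vt, y)
print("Kept by RFE:", rfe.support_)
```

In practice the variance threshold would usually be applied to the dummy columns only, since the variance of a numeric feature depends on its scale.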

Removing dummy features with low variance

...

Dimensionality reduction and PCA

Dimensionality reduction is the process of reducing the number of features under consideration by obtaining a smaller set of principal variables. Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction. Here, we will talk about why we need dimensionality reduction, and we will also see how to perform PCA in scikit-learn.

These are the reasons for reducing the number of features while working on predictive analytics:

It enables the simplification of models, in order to make them easier to understand and to interpret. There might be some computational considerations if you are dealing with thousands of features. It might be a good idea to reduce the number of features in order to save computational resources.
Another reason is to avoid the "curse of dimensionality...
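As a rough sketch of how PCA can be performed in scikit-learn, the following example standardizes a dataset and keeps as many principal components as are needed to explain 95% of the variance. The breast cancer dataset bundled with scikit-learn and the 95% target are assumptions chosen for illustration; the chapter's own example may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset shipped with scikit-learn (30 numeric features).
X, y = load_breast_cancer(return_X_y=True)

# PCA is sensitive to feature scale, so standardize first.
# A float n_components asks PCA to keep enough components to explain
# that fraction of the total variance.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_reduced = pipeline.fit_transform(X)

pca = pipeline.named_steps["pca"]
print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
print("Cumulative explained variance:", pca.explained_variance_ratio_.cumsum())
```

The reduced matrix can then be passed to any estimator in place of the original features.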

Feature engineering

Feature engineering plays a vital role in making machine learning algorithms work and, if carried out properly, it enhances their predictive ability. In other words, feature engineering is the process of selecting existing features or creating new features from the raw data, using domain knowledge, the context of the problem, or specialized techniques, so that the resulting predictive models are more accurate. It is an activity in which domain knowledge and creativity play a very important role, and it can significantly improve the performance of our predictive models. The more context you have about a problem, the better your ability to create new and useful features. Basically, the feature engineering process converts the raw data into input values that algorithms can understand.
There are various ways of implementing...
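To make the idea of creating new features from raw data concrete, here is a small sketch using pandas. The DataFrame, its column names (`limit_bal`, `bill_amt`, `pay_amt`, `education`), and the derived features are hypothetical, chosen only to resemble a credit dataset; they are not taken from the chapter's notebook.

```python
import pandas as pd

# Hypothetical raw data: credit limit, latest bill, latest payment.
df = pd.DataFrame({
    "limit_bal": [20000, 50000, 120000],
    "bill_amt": [3913, 46990, 2682],
    "pay_amt": [0, 2000, 1000],
    "education": ["university", "graduate", "university"],
})

# Ratio features often carry more signal than the raw amounts:
# how much of the credit limit is used, and how much of the bill was paid.
df["utilization"] = df["bill_amt"] / df["limit_bal"]
df["paid_share"] = df["pay_amt"] / df["bill_amt"]

# Convert a categorical column into dummy features that algorithms
# can consume; drop_first avoids a redundant column.
df = pd.get_dummies(df, columns=["education"], drop_first=True)

print(df)
```

Which ratios or combinations are worth building depends on domain knowledge about the problem.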

Improving models with feature engineering

Now that we have seen how feature engineering techniques help in building predictive models, let's try to improve the performance of these models and evaluate whether the newly built model works better than the previously built one. Then, we will talk about two very important concepts that you must always keep in mind when doing predictive analytics: the reducible and irreducible errors in your predictive models.

Let's first import the necessary modules, as shown in the following screenshot:

So, let's go to the Jupyter Notebook and take a look at the imported credit card default dataset that we saw earlier in this chapter. As you can see, some modifications have been made to this dataset:

For this model, instead of transforming the sex and marriage features into two dummy features, the ones that we have...
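One way to check whether a model with engineered features works better than the previous one is to compare cross-validated scores. The sketch below uses synthetic data rather than the credit card dataset: the target depends on the product of two features, which a linear model cannot capture from the raw inputs alone. The data, the logistic regression model, and the 5-fold setup are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)

# Synthetic data: the class depends on the sign of x1 * x2.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Engineered dataset: the original features plus their product.
X_engineered = np.column_stack([X, X[:, 0] * X[:, 1]])

model = LogisticRegression()

# Same model and same folds; only the features differ.
baseline_score = cross_val_score(model, X, y, cv=5).mean()
engineered_score = cross_val_score(model, X_engineered, y, cv=5).mean()

print(f"Baseline accuracy:   {baseline_score:.3f}")
print(f"Engineered accuracy: {engineered_score:.3f}")
```

Evaluating both feature sets with the same model and the same cross-validation scheme keeps the comparison fair.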

Reducible and irreducible error

Before moving on, there are two really important concepts to be covered for predictive analytics. Errors can be divided into the following two types:

Reducible errors: These errors can be reduced by making certain improvements to the model
Irreducible errors: These errors cannot be reduced by improving the model, because they come from noise that is inherent in the data

Let's assume that, in machine learning, there is a relationship between features and target that is represented with a function, as shown in the following screenshot:

The underlying supposition of machine learning is that the relationship between the features and the target (y) is given by a function. Since, in most cases, we consider that there is some randomness in the relationship between features and target, we add a noise term here, which will always be present in reality. This is the underlying supposition...
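The relationship described above, a function of the features plus a noise term, is commonly written as follows. The symbols $f$, $\hat{f}$, and $\varepsilon$ are notation assumed here, since the original formula is not reproduced in the text. Assuming the noise has zero mean and is independent of the features, the expected squared error of a fixed fitted model $\hat{f}$ at a given $X$ splits into a reducible part and an irreducible part:

```latex
y = f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0

\mathbb{E}\left[(y - \hat{f}(X))^2\right]
  = \underbrace{\left(f(X) - \hat{f}(X)\right)^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
```

A better model can shrink the first term by bringing $\hat{f}$ closer to $f$, but no model can remove the second term.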

Summary

In this chapter, we talked about feature selection methods and how to distinguish between useful features and features that are not likely to be helpful in prediction. We talked about dimensionality reduction and we learned how to perform PCA in scikit-learn. We also talked about feature engineering, and we tried to come up with new features in the datasets that we have been using so far. Finally, we tried to improve our credit card model by coming up with new features, and by working with all of the techniques that we learned in this chapter. I hope you have enjoyed this chapter.

In the next chapter, we will learn about artificial neural networks and how the TensorFlow library is used when working with neural networks and artificial intelligence.


Author (1)

Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the ‘Big Three' global management consulting firms, leading advanced analytics projects in different industries like banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through different platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.