You're reading from  Hands-On Genetic Algorithms with Python

Product type: Book
Published in: Jan 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781838557744
Edition: 1st

Author: Eyal Wirsansky

Eyal Wirsansky is a senior data scientist, an experienced software engineer, a technology community leader, and an artificial intelligence researcher. Eyal began his software engineering career over twenty-five years ago as a pioneer in the field of Voice over IP. He currently works as a member of the data platform team at Gradle, Inc. During his graduate studies, he focused his research on genetic algorithms and neural networks. A notable result of this research is a novel supervised machine learning algorithm that integrates both approaches. In addition to his professional roles, Eyal serves as an adjunct professor at Jacksonville University, where he teaches a class on artificial intelligence. He also leads both the Jacksonville, Florida Java User Group and the Artificial Intelligence for Enterprise virtual user group, and authors the developer-focused artificial intelligence blog, ai4java.

Enhancing Machine Learning Models Using Feature Selection

This chapter describes how genetic algorithms can be used to improve the performance of supervised machine learning models by selecting the best subset of features from the provided input data. The chapter starts with a brief introduction to machine learning and then describes the two main types of supervised machine learning tasks: regression and classification. We will then discuss the potential benefits of feature selection for the performance of these models. Next, we will demonstrate how genetic algorithms can be utilized to pinpoint the genuine features that are generated by the Friedman-1 regression problem. Then, we will use the real-life Zoo dataset to create a classification model and improve its accuracy, again by applying genetic algorithms to isolate the best features for...

Technical requirements

In this chapter, we will be using Python 3 with the following supporting libraries:

  • deap
  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn (sklearn) – introduced in this chapter

In addition, we will be using the UCI Zoo Dataset (https://archive.ics.uci.edu/ml/datasets/zoo).

The programs that will be used in this chapter can be found in this book's GitHub repository at https://github.com/PacktPublishing/Hands-On-Genetic-Algorithms-with-Python/tree/master/Chapter07.

Check out the following video to see the code in action:
http://bit.ly/37HCKyr

Supervised machine learning

The term machine learning typically refers to a computer program that receives inputs and produces outputs. Our goal is to train this program, also known as the model, to produce the correct outputs for given inputs without being explicitly programmed to do so.

During this training process, the model learns the mapping between the inputs and the outputs by adjusting its internal parameters. One common way to train the model is by providing it with a set of inputs, for which the correct output is known. For each of these inputs, we tell the model what the correct output is so that it can adjust, or tune itself, aiming to eventually produce the desired output for each of the given inputs. This tuning is at the heart of the learning process.
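This tuning loop can be illustrated with a minimal scikit-learn snippet (a sketch of my own, not the book's code; the choice of a decision tree is arbitrary): we hand the model inputs with known correct outputs, let `fit()` adjust its internal parameters, and then ask it for a prediction.

```python
# A minimal supervised-learning round trip: fit on labeled inputs,
# then predict the output for one of them.
from sklearn.tree import DecisionTreeClassifier

# Inputs (features) together with their known correct outputs (labels)
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 1, 1, 0]  # an XOR-like mapping

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)       # the model tunes its internal parameters
print(model.predict([[0, 1]]))    # -> [1]
```

Since an unconstrained decision tree grows until its leaves are pure, it reproduces the training mapping exactly on these four samples.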

Over the years, many types of machine learning models have been developed. Each model has its own particular internal...

Feature selection in supervised learning

As we saw in the previous section, a supervised learning model receives a set of inputs, called features, and maps them to a set of outputs. The assumption is that the information described by the features is useful for determining the value of the corresponding outputs. At first glance, it may seem that the more information we can use as input, the better our chances of predicting the output(s) correctly. However, in many cases, the opposite holds true; if some of the features we use are irrelevant or redundant, the consequence could be a (sometimes significant) decrease in the accuracy of the models.

Feature selection is the process of selecting the most beneficial and essential set of features out of the entire given set of features. Besides increasing the accuracy of the model, a successful feature selection can provide the following...
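In the genetic-algorithm approach used later in this chapter, a candidate feature subset is encoded as a binary mask over the features, and its fitness is the accuracy a model achieves using only the selected features. The following is a hedged sketch of such a fitness function (the dataset, model, and function names are mine for illustration, not the book's code):

```python
# Fitness of a feature-subset mask: mean cross-validated accuracy of a
# classifier trained only on the features the mask switches on.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def mask_fitness(mask):
    """Mean 3-fold CV accuracy using only the features where mask == 1."""
    if not any(mask):
        return 0.0  # an empty feature set cannot be evaluated
    selected = X[:, np.asarray(mask, dtype=bool)]
    model = DecisionTreeClassifier(random_state=42)
    return cross_val_score(model, selected, y, cv=3).mean()

full_score = mask_fitness([1, 1, 1, 1])   # all four iris features
petal_score = mask_fitness([0, 0, 1, 1])  # petal measurements only
print(round(full_score, 3), round(petal_score, 3))
```

A genetic algorithm would evolve a population of such masks, using this score as the fitness to maximize.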

Selecting the features for the Friedman-1 regression problem

The Friedman-1 regression problem, described by Friedman and Breiman, defines a single output value, y, as a function of five input values, x0..x4, and randomly generated noise, according to the following formula:

y = 10·sin(π·x0·x1) + 20·(x2 − 0.5)² + 10·x3 + 5·x4 + noise·N(0, 1)

The input variables, x0..x4, are independent and uniformly distributed over the interval [0, 1]. The last component of the formula is the randomly generated noise: it is normally distributed and multiplied by the constant noise, which determines its level.
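The formula can be rendered directly in NumPy (a sketch; the variable and function names below are mine, not the book's):

```python
# Computing the Friedman-1 output y for one sample of x0..x4.
import numpy as np

def friedman1_y(x, noise_term=0.0):
    """y as a function of x0..x4, plus an already-scaled noise term."""
    return (10 * np.sin(np.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2
            + 10 * x[3]
            + 5 * x[4]
            + noise_term)

rng = np.random.default_rng(seed=42)
x = rng.uniform(0.0, 1.0, size=5)         # x0..x4, uniform over [0, 1]
noise = 1.0                               # the noise-level constant
y = friedman1_y(x, noise * rng.normal())  # N(0, 1) noise scaled by `noise`
print(y)
```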

In Python, the scikit-learn (sklearn) library provides us with the make_friedman1() function, which can be used to generate a dataset containing the desired number of samples. Each of the samples consists of randomly generated x0..x4 values and their corresponding calculated y value. The interesting part, however, is that we can tell...
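A minimal call to `make_friedman1()` looks as follows. Requesting more than five features (here, `n_features=15`) pads the data with additional, irrelevant features that play no part in computing y — exactly the kind of features a selection method should learn to discard:

```python
# Generate 100 Friedman-1 samples with 5 genuine features plus
# 10 irrelevant ones, and noise level 1.0.
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=100, n_features=15, noise=1.0,
                      random_state=42)
print(X.shape, y.shape)  # -> (100, 15) (100,)
```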

Selecting the features for the classification Zoo dataset

The UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) maintains over 350 datasets as a service to the machine learning community. These datasets can be used for experimentation with various models and algorithms. A typical dataset contains a number of features (inputs) and the desired output, in the form of columns, along with a description of their meaning.

In this section, we will use the UCI Zoo dataset (https://archive.ics.uci.edu/ml/datasets/zoo). This dataset describes 101 different animals using the following 18 features:

No. | Feature Name | Data Type
--- | ------------ | ------------------------
1   | animal name  | Unique for each instance
2   | hair         | Boolean
3   | feathers     | Boolean
4   | eggs         | Boolean
5   | milk         | Boolean
6   | airborne     | Boolean
7   | aquatic      | Boolean
8   | predator     | Boolean
... | ...          | ...
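The dataset can be loaded with pandas. The following is a hedged sketch: the raw file has no header row, so the 18 column names are attached manually, following the attribute descriptions on the UCI dataset page (verify the names and the URL against that page before relying on them):

```python
# Load the UCI Zoo dataset and attach its 18 column names.
import pandas as pd

ZOO_URL = ("https://archive.ics.uci.edu/ml/"
           "machine-learning-databases/zoo/zoo.data")

COLUMN_NAMES = [
    "animal_name", "hair", "feathers", "eggs", "milk", "airborne",
    "aquatic", "predator", "toothed", "backbone", "breathes", "venomous",
    "fins", "legs", "tail", "domestic", "catsize", "type",
]

def load_zoo(source=ZOO_URL):
    """Read the raw CSV (no header row) and name the columns."""
    return pd.read_csv(source, header=None, names=COLUMN_NAMES)

# Example usage (requires network access):
# df = load_zoo()
# X = df.drop(columns=["animal_name", "type"])  # candidate features
# y = df["type"]                                # class label to predict
```

Note that `animal_name` is a unique identifier rather than a predictive feature, and `type` is the class label, leaving 16 candidate input features.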

Summary

In this chapter, you were introduced to machine learning and the two main types of supervised machine learning tasks – regression and classification. Then, you were presented with the potential benefits of feature selection on the performance of the models carrying out these tasks. At the heart of this chapter were two demonstrations of how genetic algorithms can be utilized to enhance the performance of such models via feature selection. In the first case, we pinpointed the genuine features that were generated by the Friedman-1 Test regression problem, while, in the other case, we selected the most beneficial features of the Zoo classification dataset.

In the next chapter, we will look at another possible way of enhancing the performance of supervised machine learning models, namely hyperparameter tuning.

Further reading

For more information about the topics that were covered in this chapter, please refer to the following resources:
