You're reading from Hands-On Artificial Intelligence with Java for Beginners

Product type: Book
Published in: Aug 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789537550
Edition: 1st Edition
Author (1)
Nisheeth Joshi

Nisheeth Joshi is an associate professor and a researcher at Banasthali University. He has also done a PhD in Natural Language Processing. He is an expert with the TDIL Program, Department of IT, Government of India, the premier organization overseeing language technology funding and research in India. He has several publications to his name in various journals and conferences, and also serves on the program committees and editorial boards of several conferences and journals.

Chapter 5. Handling Attributes

In this chapter, you will learn how to filter attributes, how to discretize attributes, and how to perform attribute selection. Filtering attributes means removing certain attributes from a dataset. To do so, we will use the Remove class from the unsupervised filters package, along with its -R option, which lists the attributes to remove. We will also discretize numeric attributes by binning.

We will cover the following topics in this chapter:

  • Filtering attributes
  • Discretizing attributes
  • Attribute selection

Let's begin!

Filtering attributes


We will learn how to filter attributes in this section. Let's start with the code.

We will first import the following packages and classes:

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import java.io.File;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

We imported the Instances, ArffSaver, File, and DataSource classes from their respective packages, as seen in the preceding code. We used them in the previous chapter as well. The Instances class holds the dataset in memory, where we will work with it. The ArffSaver class saves the dataset to disk. The File class names the file to write on disk, and the DataSource class opens the dataset from disk.

As you can see in the preceding code snippet, we imported a new class, Filter, from the weka.filters package. We can apply filters using the Filter class. The filter...
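The book's own listing is cut off in this excerpt. As a stand-in, here is a minimal sketch of how the imported classes can fit together, assuming Weka 3.8 is on the classpath. The class name RemoveAttributeSketch, the helper removeAttributes, the tiny inline dataset, and the temporary output file are all hypothetical and chosen only for illustration.

```java
import java.io.File;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

final class RemoveAttributeSketch {

    // Hypothetical three-attribute dataset in ARFF format, used only for illustration.
    static final String SAMPLE_ARFF = String.join("\n",
        "@relation weather",
        "@attribute temperature numeric",
        "@attribute humidity numeric",
        "@attribute play {yes,no}",
        "@data",
        "85,85,no",
        "80,90,no",
        "83,86,yes");

    /** Returns a copy of data without the columns in range (1-based, e.g. "1" or "1,3-4"). */
    static Instances removeAttributes(Instances data, String range) throws Exception {
        Remove remove = new Remove();
        remove.setOptions(new String[] {"-R", range}); // -R lists the columns to delete
        remove.setInputFormat(data);                   // must come after the options are set
        return Filter.useFilter(data, remove);
    }

    public static void main(String[] args) throws Exception {
        // Parse the inline ARFF text into an in-memory dataset.
        Instances data = new Instances(new StringReader(SAMPLE_ARFF));
        Instances filtered = removeAttributes(data, "1"); // drop "temperature"

        // Save the filtered dataset to a temporary ARFF file.
        File out = File.createTempFile("filtered", ".arff");
        out.deleteOnExit();
        ArffSaver saver = new ArffSaver();
        saver.setInstances(filtered);
        saver.setFile(out);
        saver.writeBatch();

        // Read the file back from disk through DataSource.
        Instances reloaded = new DataSource(out.getPath()).getDataSet();
        System.out.println(reloaded.numAttributes() + " attributes remain");
    }
}
```

Setting the options before calling setInputFormat matters here, because the filter works out its output format at that point.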

Discretizing attributes


We will now look at how to discretize attributes using Weka. First, let's explain what discretization is. Discretizing attributes means converting a range of numeric attributes in the dataset into nominal attributes. In other words, discretization turns numeric data into categories. We will use binning for this; the filter skips the class attribute, if one is set.

 

Suppose that we have values from 1 to 60, and we want to sort them into three categories. Instead of keeping the data numeric, we want categorical data, so we create three bins: one for the values from 0 to 20, another for the values above 20 and up to 40, and a third for the values above 40 and up to 60. After discretization, every numeric value is replaced by the category of the bin it falls into.
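To make the three-bin example concrete, here is a small plain-Java sketch of equal-width binning. It does not use Weka, and it is not the book's code: the class and method names are hypothetical, and treating each bin as closed on its upper boundary (so that 20 lands in the first bin) is an assumption made for this illustration.

```java
final class EqualWidthBinning {

    /**
     * Returns the zero-based index of the bin that value falls into when the
     * range [min, max] is split into numBins bins of equal width.
     * Values outside the range are clamped to the first or last bin.
     */
    static int binIndex(double value, double min, double max, int numBins) {
        if (numBins < 1 || max <= min) {
            throw new IllegalArgumentException("need numBins >= 1 and max > min");
        }
        double width = (max - min) / numBins;
        // ceil(...) - 1 places a boundary value such as 20 in the lower bin.
        int index = (int) Math.ceil((value - min) / width) - 1;
        return Math.max(0, Math.min(numBins - 1, index));
    }

    public static void main(String[] args) {
        double[] values = {1, 20, 21, 40, 41, 60};
        for (double v : values) {
            System.out.println(v + " -> bin " + binIndex(v, 0, 60, 3));
        }
    }
}
```

With three bins over the range 0 to 60, the values 1 and 20 fall into bin 0, 21 and 40 into bin 1, and 41 and 60 into bin 2.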

We will now use the following options:

  • -B <num>: This specifies the number of bins in which to divide the numeric attributes. The default value is 10.
  • -R <col1,col2-col4,...>: We have to assign the columns...
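The two options above can be passed to Weka's unsupervised Discretize filter roughly as follows. This is a sketch under stated assumptions, not the book's listing: it assumes Weka 3.8 on the classpath, and the class name DiscretizeSketch, the helper discretize, and the inline dataset with its attribute names are hypothetical.

```java
import java.io.StringReader;

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

final class DiscretizeSketch {

    // Hypothetical data: "age" spans 1 to 60; "income" is left numeric.
    static final String SAMPLE_ARFF = String.join("\n",
        "@relation people",
        "@attribute age numeric",
        "@attribute income numeric",
        "@data",
        "1,1000",
        "15,1500",
        "25,2200",
        "38,3100",
        "44,2800",
        "60,4000");

    /** Bins the numeric columns listed in columns (1-based) into the given number of bins. */
    static Instances discretize(Instances data, int bins, String columns) throws Exception {
        Discretize discretize = new Discretize();
        discretize.setOptions(new String[] {
            "-B", String.valueOf(bins), // number of bins
            "-R", columns               // columns to discretize
        });
        discretize.setInputFormat(data); // must come after the options are set
        return Filter.useFilter(data, discretize);
    }

    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new StringReader(SAMPLE_ARFF));
        Instances binned = discretize(data, 3, "1"); // three bins, first column only

        // Printing the attribute shows the nominal labels Weka generated for the bins.
        System.out.println(binned.attribute(0));
        System.out.println(binned.attribute(1));
    }
}
```

Note that Weka derives the bin boundaries from the minimum and maximum values actually present in the column, so the cut points here will not be exactly 20 and 40.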

Attribute selection


We will now look at how to perform attribute selection. Attribute selection is a technique for deciding which attributes are the most useful for performing classification or clustering.

So, let's take a look at the code and see what happens, as follows:

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import java.io.File;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;

The first five classes are the same as those we used earlier. We will also be using a new type of filter, a supervised one: the AttributeSelection class from the weka.filters.supervised.attribute package. Then, from the weka.attributeSelection package, we'll be using the CfsSubsetEval class and the GreedyStepwise class.

In the following code, we'll first read the...
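The book's listing is truncated in this excerpt. As a rough stand-in, the sketch below shows one common way to combine the AttributeSelection filter with CfsSubsetEval and GreedyStepwise, assuming Weka 3.8 on the classpath. The class name AttributeSelectionSketch, the helper selectAttributes, and the small nominal dataset are hypothetical, and the search is left at its default settings, which may differ from what the book configures.

```java
import java.io.StringReader;

import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

final class AttributeSelectionSketch {

    // Hypothetical nominal dataset; "play" is the class attribute.
    static final String SAMPLE_ARFF = String.join("\n",
        "@relation weather.nominal",
        "@attribute outlook {sunny,overcast,rainy}",
        "@attribute windy {TRUE,FALSE}",
        "@attribute play {yes,no}",
        "@data",
        "sunny,FALSE,no",
        "sunny,TRUE,no",
        "overcast,FALSE,yes",
        "rainy,FALSE,yes",
        "rainy,TRUE,no",
        "overcast,TRUE,yes");

    /** Keeps the attribute subset chosen by CfsSubsetEval with a GreedyStepwise search. */
    static Instances selectAttributes(Instances data) throws Exception {
        // Supervised filters need a class attribute; the caller must set it first.
        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new CfsSubsetEval()); // scores candidate attribute subsets
        filter.setSearch(new GreedyStepwise());   // decides which subsets to try
        filter.setInputFormat(data);
        return Filter.useFilter(data, filter);
    }

    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new StringReader(SAMPLE_ARFF));
        data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

        Instances reduced = selectAttributes(data);
        System.out.println("Attributes before: " + data.numAttributes());
        System.out.println("Attributes after:  " + reduced.numAttributes());
    }
}
```

The evaluator and the search method are set separately because they answer different questions: the evaluator rates a subset of attributes, while the search decides which subsets get rated.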

Summary


In this chapter, you learned how to filter attributes, how to discretize attributes using binning, and how to apply attribute selection. The processes of filtering and discretizing attributes use unsupervised filters, whereas attribute selection is performed by using supervised filters.

In the next chapter, you'll see how to apply supervised learning.
