Regression models in Weka

by Boštjan Kaluža | September 2013 | Java Open Source

Regression is a technique for predicting the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, a regression algorithm builds a model, usually an equation, that is used to compute the predicted class value. In this article by Boštjan Kaluža, the author of Instant Weka How-to, you will learn about the regression models in Weka.

Getting ready

Let's look at an example of a regression model for house prices, using a small dataset. These are actual figures from houses for sale, and we will try to estimate the price of the house we want to sell (the last row, whose price is unknown):

Size (m2)   Land (m2)   Rooms   Granite   Extra bathroom   Price
1076        2801        6       0         0                €324.500,00
990         3067        5       1         1                €466.000,00
1229        3094        5       0         1                €425.900,00
731         4315        4       1         0                €387.120,00
671         2926        4       0         1                €312.100,00
1078        6094        6       1         1                €603.000,00
909         2854        5       0         1                €383.400,00
975         2947        5       1         1                ??

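To load the data in Weka, we have to put the table in the ARFF file format and save it as house.arff (the code in this recipe reads it from dataset/house.arff). Make sure the attributes are numeric, and mark the unknown price with a question mark, as shown here: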

@RELATION house
@ATTRIBUTE size NUMERIC
@ATTRIBUTE land NUMERIC
@ATTRIBUTE rooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE extra_bathroom NUMERIC
@ATTRIBUTE price NUMERIC
@DATA
1076,2801,6,0,0,324500
990,3067,5,1,1,466000
1229,3094,5,0,1,425900
731,4315,4,1,0,387120
671,2926,4,0,1,312100
1078,6094,6,1,1,603000
909,2854,5,0,1,383400
975,2947,5,1,1,?
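
To make the layout of the file concrete, here is a minimal plain-Java sketch that reads the rows after @DATA and maps the ? marker to NaN. It is not how Weka loads ARFF files; the class name ArffDataSketch and the helper parseData are hypothetical, and the sample input is a cut-down, two-attribute version of the house data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only: Weka's own ARFF loader handles far more
// (quoted values, nominal and date attributes, sparse rows).
class ArffDataSketch {

    // Hypothetical helper: returns the rows that follow the @DATA line,
    // with "?" (a missing value) mapped to Double.NaN.
    static List<double[]> parseData(String arff) {
        List<double[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String line : arff.split("\\R")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData) {
                // header lines are skipped until the @DATA marker is seen
                inData = line.equalsIgnoreCase("@DATA");
                continue;
            }
            String[] cells = line.split(",");
            double[] row = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                String cell = cells[i].trim();
                row[i] = cell.equals("?") ? Double.NaN : Double.parseDouble(cell);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Cut-down sample: only size and price, two rows.
        String arff = String.join("\n",
            "@RELATION house",
            "@ATTRIBUTE size NUMERIC",
            "@ATTRIBUTE price NUMERIC",
            "@DATA",
            "1076,324500",
            "975,?");
        for (double[] row : parseData(arff)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The row with the missing price keeps its attribute values, which is what later allows the model to predict a price for it.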

How to do it…

Use the following snippet:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

public class Regression {
    public static void main(String[] args) throws Exception {
        // load data; the last attribute (price) is the class to predict
        Instances data = new Instances(new BufferedReader(
            new FileReader("dataset/house.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        // build model (the last instance, whose class is missing, is not used)
        LinearRegression model = new LinearRegression();
        model.buildClassifier(data);
        System.out.println(model);
        // classify the last instance
        Instance myHouse = data.lastInstance();
        double price = model.classifyInstance(myHouse);
        System.out.println("My house (" + myHouse + "): " + price);
    }
}

Here is the output:

Linear Regression Model
price =
195.2035 * size +
38.9694 * land +
76218.4642 * granite +
73947.2118 * extra_bathroom +
2681.136
My house (975,2947,5,1,1,?): 458013.16703945777
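
The printed model is an ordinary equation, so the prediction can be checked by hand. The sketch below plugs the attributes of the last house into the coefficients shown above; the class name HandPrediction is hypothetical. Note that rooms does not appear in the printed equation, presumably because LinearRegression's default attribute selection dropped it, so it plays no part in the calculation.

```java
class HandPrediction {

    // Coefficients copied from the printed model (rounded to four decimals).
    static double predict(double size, double land, double granite, double extraBathroom) {
        return 195.2035 * size
             + 38.9694 * land
             + 76218.4642 * granite
             + 73947.2118 * extraBathroom
             + 2681.136;
    }

    public static void main(String[] args) {
        // The house to be sold: 975 m2, 2947 m2 of land, granite, extra bathroom.
        double price = predict(975, 2947, 1, 1);
        System.out.printf("Predicted price: %.2f%n", price);
    }
}
```

The result should agree with the value Weka reports to within a fraction of a euro; the small difference comes from the rounding of the printed coefficients.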

How it works…

Import a basic regression model named weka.classifiers.functions.LinearRegression:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

Load the house dataset:

Instances data = new Instances(new BufferedReader(new
FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);

Initialize and build a regression model. Note that the last instance is not used for building the model, since its class value is missing:

LinearRegression model = new LinearRegression();
model.buildClassifier(data);
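
To give an idea of what building a linear model involves, here is a minimal sketch of ordinary least squares in plain Java, solving the normal equations with Gaussian elimination. It is not Weka's implementation, which among other things selects attributes and applies a small ridge penalty; the class LeastSquaresSketch and its methods are hypothetical. It is fitted on the seven priced houses using only the four attributes that appear in the model Weka printed, so its coefficients should come out close to the printed ones.

```java
import java.util.Arrays;

class LeastSquaresSketch {

    // Solves the square system a * x = b by Gaussian elimination with
    // partial pivoting. Assumes the system has a unique solution.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
            }
        }
        // back substitution on the upper-triangular system
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    // Fits weights w by solving the normal equations (X^T X) w = X^T y.
    static double[] fit(double[][] x, double[] y) {
        int p = x[0].length;
        double[][] xtx = new double[p][p];
        double[] xty = new double[p];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < p; j++) {
                xty[j] += x[i][j] * y[i];
                for (int k = 0; k < p; k++) xtx[j][k] += x[i][j] * x[i][k];
            }
        }
        return solve(xtx, xty);
    }

    // Dot product of the weights with one row of attribute values.
    static double predict(double[] w, double[] row) {
        double s = 0;
        for (int j = 0; j < w.length; j++) s += w[j] * row[j];
        return s;
    }

    public static void main(String[] args) {
        // Columns: size, land, granite, extra_bathroom, constant 1 (intercept).
        double[][] x = {
            {1076, 2801, 0, 0, 1}, {990, 3067, 1, 1, 1}, {1229, 3094, 0, 1, 1},
            {731, 4315, 1, 0, 1}, {671, 2926, 0, 1, 1}, {1078, 6094, 1, 1, 1},
            {909, 2854, 0, 1, 1}};
        double[] y = {324500, 466000, 425900, 387120, 312100, 603000, 383400};
        double[] w = fit(x, y);
        System.out.println("Weights: " + Arrays.toString(w));
        System.out.println("My house: " + predict(w, new double[] {975, 2947, 1, 1, 1}));
    }
}
```

Solving the normal equations directly is fine for a handful of attributes; for larger or badly scaled problems a library routine is the safer choice.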

Output the model:

System.out.println(model);

Use the model to predict the price of the last instance in the dataset:

Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house ("+myHouse+"): "+price);

There’s more

This section lists some additional algorithms.

Other regression algorithms

There is a wide variety of implemented regression algorithms one can use in Weka:

  • weka.classifiers.rules.ZeroR: The class for building and using a 0-R classifier. It predicts the mean (for a numeric class) or the mode (for a nominal class) and serves as a baseline; that is, if your model performs worse than this mean-value predictor, it is not worth considering.
  • weka.classifiers.trees.REPTree: The fast decision tree learner. Builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). It only sorts values for numeric attributes once. Missing values are dealt with by splitting the corresponding instances into pieces (that is, as in C4.5).
  • weka.classifiers.functions.SMOreg: SMOreg implements the support vector machine for regression. The parameters can be learned using various algorithms. The algorithm is selected by setting the RegOptimizer. The most popular algorithm (RegSMOImproved) is due to Shevade, Keerthi, and others, and this is the default RegOptimizer.
  • weka.classifiers.functions.MultilayerPerceptron: A neural network trained with backpropagation. The network can be built by hand, created by an algorithm, or both, and it can also be monitored and modified during training. All the nodes in the network are sigmoid, except when the class is numeric, in which case the output nodes become unthresholded linear units.
  • weka.classifiers.functions.GaussianProcesses: Implements Gaussian processes for regression without hyperparameter tuning.
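
To illustrate the ZeroR baseline mentioned in the list above, the following plain-Java sketch computes what a mean-value predictor would answer for the house data and how far off it is on the seven priced houses. It does not use Weka's ZeroR class; the class name MeanBaseline and its methods are hypothetical.

```java
class MeanBaseline {

    // The baseline predicts the same value for every instance: the mean.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    // Mean absolute error of a constant prediction against the actual values.
    static double meanAbsoluteError(double prediction, double[] actual) {
        double sum = 0;
        for (double v : actual) sum += Math.abs(v - prediction);
        return sum / actual.length;
    }

    public static void main(String[] args) {
        // Prices of the seven houses whose price is known.
        double[] prices = {324500, 466000, 425900, 387120, 312100, 603000, 383400};
        double baseline = mean(prices);
        System.out.printf("Baseline prediction: %.2f%n", baseline);
        System.out.printf("Baseline MAE on the training data: %.2f%n",
            meanAbsoluteError(baseline, prices));
    }
}
```

Any regression model worth keeping should beat this error, ideally on data it was not trained on.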

Summary

We learned how to use models that predict the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Using the house dataset, we built a LinearRegression model, printed its equation, and used it to predict the price of a house whose price was unknown.

Instant Weka How-to [Instant]
Published: June 2013

About the Author


Boštjan Kaluža

Boštjan Kaluža, PhD is a researcher in artificial intelligence and ubiquitous computing. Since October 2008, he has been working at Jozef Stefan Institute, Slovenia. His research focuses on the development of novel algorithms and approaches, with an emphasis on human behavior analysis from sensor data using machine learning and data mining techniques. Boštjan has extensive experience in Java and Python, and lectures Weka in the classroom. He spent a year as a visiting researcher at the University of Southern California, where he studied suspicious and anomalous agent behavior in the context of security applications. He has published over 40 journal articles and conference papers.
