Regression models in Weka

Instant Weka How-to


June 2013


Implement cutting-edge data mining aspects in Weka to your applications


Getting ready

Let's look at an example of a regression model for house prices, using some real data. These are actual figures for houses on the market, and we will try to estimate the value of a house we are supposed to sell (the last row, whose price is unknown):

Size (m²)  Land (m²)  Rooms  Granite  Extra bathroom  Price
1076       2801       6      0        0               €324.500,00
990        3067       5      1        1               €466.000,00
1229       3094       5      0        1               €425.900,00
731        4315       4      1        0               €387.120,00
671        2926       4      0        1               €312.100,00
1078       6094       6      1        1               €603.000,00
909        2854       5      0        1               €383.400,00
975        2947       5      1        1               ??

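To load the data in Weka, we have to convert the table into the ARFF file format and save it as house.arff. Make sure all attributes are declared NUMERIC and the prices are written as plain numbers. The unknown price is written as ?, which is ARFF's marker for a missing value: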

@RELATION house
@ATTRIBUTE size NUMERIC
@ATTRIBUTE land NUMERIC
@ATTRIBUTE rooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE extra_bathroom NUMERIC
@ATTRIBUTE price NUMERIC
@DATA
1076,2801,6,0,0,324500
990,3067,5,1,1,466000
1229,3094,5,0,1,425900
731,4315,4,1,0,387120
671,2926,4,0,1,312100
1078,6094,6,1,1,603000
909,2854,5,0,1,383400
975,2947,5,1,1,?
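The @DATA rows above are the table's values with the prices rewritten as plain numbers. If the table comes from a spreadsheet in the European number format used above (period as thousands separator, comma as decimal separator), the conversion can be scripted. The following is a minimal stdlib sketch. The class and method names (ArffRows, parseEuro, dataRow) are hypothetical, and it assumes every price is either in exactly that format or the placeholder ??.

```java
public class ArffRows {

    // Hypothetical helper: parses a European-formatted price such as "€324.500,00".
    // Returns Double.NaN for the "??" placeholder used for the unknown price.
    static double parseEuro(String s) {
        String t = s.trim();
        if (t.equals("??")) {
            return Double.NaN;
        }
        // keep only digits and the decimal comma (drops "€" and the thousands periods),
        // then turn the decimal comma into a decimal point
        t = t.replaceAll("[^0-9,]", "").replace(',', '.');
        return Double.parseDouble(t);
    }

    // Builds one comma-separated @DATA row; a missing price becomes ARFF's "?".
    // Known prices are rounded to whole euros, as in house.arff.
    static String dataRow(int size, int land, int rooms, int granite,
                          int extraBathroom, String price) {
        double p = parseEuro(price);
        String priceField = Double.isNaN(p) ? "?" : Long.toString(Math.round(p));
        return size + "," + land + "," + rooms + "," + granite + ","
                + extraBathroom + "," + priceField;
    }

    public static void main(String[] args) {
        System.out.println(dataRow(1076, 2801, 6, 0, 0, "€324.500,00")); // 1076,2801,6,0,0,324500
        System.out.println(dataRow(975, 2947, 5, 1, 1, "??"));           // 975,2947,5,1,1,?
    }
}
```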

How to do it…

Use the following snippet:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

public class Regression {
    public static void main(String[] args) throws Exception {
        // load data; the last attribute (price) is the class to predict
        Instances data = new Instances(
                new BufferedReader(new FileReader("dataset/house.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // build model; the last instance, whose class is missing, is not used
        LinearRegression model = new LinearRegression();
        model.buildClassifier(data);
        System.out.println(model);

        // predict the price of the last instance
        Instance myHouse = data.lastInstance();
        double price = model.classifyInstance(myHouse);
        System.out.println("My house (" + myHouse + "): " + price);
    }
}

Here is the output:

Linear Regression Model
price =
195.2035 * size +
38.9694 * land +
76218.4642 * granite +
73947.2118 * extra_bathroom +
2681.136
My house (975,2947,5,1,1,?): 458013.16703945777
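The printed model is just a weighted sum, so the prediction for the last house can be checked by hand. The rooms attribute does not appear in the equation (presumably removed by LinearRegression's attribute selection), so its value plays no part. The following stdlib sketch plugs the unknown house into the printed coefficients. The class name ManualPrediction is hypothetical. Because Weka prints the coefficients rounded to four decimals, the result differs slightly from the 458013.167 reported above.

```java
public class ManualPrediction {

    // Coefficients copied from the printed model (rounded to 4 decimals in Weka's output).
    // rooms is not an input because it does not appear in the printed equation.
    static double predict(double size, double land, double granite, double extraBathroom) {
        return 195.2035 * size
                + 38.9694 * land
                + 76218.4642 * granite
                + 73947.2118 * extraBathroom
                + 2681.136; // intercept
    }

    public static void main(String[] args) {
        // The house with the unknown price: size 975, land 2947, granite 1, extra bathroom 1
        double price = predict(975, 2947, 1, 1);
        System.out.printf("%.2f%n", price); // prints 458013.05
    }
}
```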

How it works…

Import the basic linear regression model, weka.classifiers.functions.LinearRegression, along with the classes needed to read the data:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

Load the house dataset:

Instances data = new Instances(
        new BufferedReader(new FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);

Initialize and build a regression model. Note that the last instance is not used to build the model, since its class value is missing:

LinearRegression model = new LinearRegression();
model.buildClassifier(data);
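To make the point about the missing class concrete, a learner fits its equation only on rows whose target is known. The stdlib sketch below drops the row with the unknown price and fits an ordinary least-squares line on a single attribute, size, using the closed-form slope and intercept. This is a deliberate simplification and not what Weka's LinearRegression does: Weka fits all attributes at once and, by default, applies attribute selection. Its coefficients will therefore not match the model printed by Weka. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleFit {

    // Fits y = slope * x + intercept by ordinary least squares,
    // skipping rows whose target is NaN (a missing class value).
    // Returns {slope, intercept, number of rows used}.
    static double[] fit(double[] x, double[] y) {
        List<Integer> used = new ArrayList<>();
        for (int i = 0; i < y.length; i++) {
            if (!Double.isNaN(y[i])) {
                used.add(i);
            }
        }
        int n = used.size();
        double meanX = 0, meanY = 0;
        for (int i : used) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        // slope = covariance(x, y) / variance(x), both over the used rows only
        double sxy = 0, sxx = 0;
        for (int i : used) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept, n};
    }

    public static void main(String[] args) {
        double[] size  = {1076, 990, 1229, 731, 671, 1078, 909, 975};
        double[] price = {324500, 466000, 425900, 387120, 312100, 603000, 383400, Double.NaN};
        double[] m = fit(size, price);
        System.out.printf("price = %.4f * size + %.4f (fitted on %d rows)%n",
                m[0], m[1], (int) m[2]);
    }
}
```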

Output the model:

System.out.println(model);

Use the model to predict the price of the last instance in the dataset:

Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house ("+myHouse+"): "+price);

There’s more

This section lists some additional algorithms.

Other regression algorithms

Weka implements a wide variety of regression algorithms, including the following:

  • weka.classifiers.rules.ZeroR: The class for building and using a 0-R classifier. It predicts the mean (for a numeric class) or the mode (for a nominal class) and serves as a baseline: if your model performs worse than this mean-value predictor, it is not worth considering.
  • weka.classifiers.trees.REPTree: A fast decision tree learner. It builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). It sorts values for numeric attributes only once. Missing values are dealt with by splitting the corresponding instances into pieces (that is, as in C4.5).
  • weka.classifiers.functions.SMOreg: SMOreg implements the support vector machine for regression. The parameters can be learned using various algorithms. The algorithm is selected by setting the RegOptimizer. The most popular algorithm (RegSMOImproved) is due to Shevade, Keerthi, and others, and this is the default RegOptimizer.
  • weka.classifiers.functions.MultilayerPerceptron: A classifier that uses backpropagation to classify instances. The network can be built by hand, created by an algorithm, or both, and it can be monitored and modified during training. All nodes in the network are sigmoid, except when the class is numeric, in which case the output nodes become unthresholded linear units.
  • weka.classifiers.functions.GaussianProcesses: Implements Gaussian Processes for regression without hyperparameter-tuning.
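ZeroR's baseline is easy to reproduce for the house data. For a numeric class, it predicts the mean of the known prices for every house. The following stdlib sketch (hypothetical class and method names) computes that constant prediction, skipping the missing price as in the training step, together with its mean absolute error on the known prices. A regression model that cannot beat this baseline on held-out data is not worth keeping.

```java
public class ZeroRBaseline {

    // Mean of the known targets; NaN entries (missing class values) are skipped.
    static double mean(double[] y) {
        double sum = 0;
        int n = 0;
        for (double v : y) {
            if (!Double.isNaN(v)) {
                sum += v;
                n++;
            }
        }
        return sum / n;
    }

    // Mean absolute error of a constant prediction over the known targets.
    static double maeOfConstant(double[] y, double prediction) {
        double sum = 0;
        int n = 0;
        for (double v : y) {
            if (!Double.isNaN(v)) {
                sum += Math.abs(v - prediction);
                n++;
            }
        }
        return sum / n;
    }

    public static void main(String[] args) {
        double[] price = {324500, 466000, 425900, 387120, 312100, 603000, 383400, Double.NaN};
        double baseline = mean(price);
        System.out.printf("ZeroR prediction for every house: %.2f%n", baseline); // 414574.29
        System.out.printf("Training-set MAE of the baseline: %.2f%n",
                maeOfConstant(price, baseline));
    }
}
```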

Summary

We learned how to use models that predict the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, a regression algorithm builds a model, usually an equation, that is used to compute the predicted class value.
