# Regression models in Weka

by Boštjan Kaluža | September 2013 | Java Open Source

Regression is a technique used to predict the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, a regression algorithm builds a model, usually an equation, that is used to compute the predicted class value. In this article by Boštjan Kaluža, the author of Instant Weka How-to, you will learn about the regression models in Weka.


Let's look at an example of a regression model for house prices, starting with some real data. These are actual numbers from houses for sale, and we will try to estimate the value of a house we are supposed to sell (the last row, whose price is unknown):

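| Size (m²) | Land (m²) | Rooms | Granite | Extra bathroom | Price |
|---|---|---|---|---|---|
| 1076 | 2801 | 6 | 0 | 0 | €324.500,00 |
| 990 | 3067 | 5 | 1 | 1 | €466.000,00 |
| 1229 | 3094 | 5 | 0 | 1 | €425.900,00 |
| 731 | 4315 | 4 | 1 | 0 | €387.120,00 |
| 671 | 2926 | 4 | 0 | 1 | €312.100,00 |
| 1078 | 6094 | 6 | 1 | 1 | €603.000,00 |
| 909 | 2854 | 5 | 0 | 1 | €383.400,00 |
| 975 | 2947 | 5 | 1 | 1 | ?? |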

To load files in Weka, we have to put the table in the ARFF file format and save it as house.arff. Make sure the attributes are numeric, as shown here:

```
@RELATION house

@ATTRIBUTE size NUMERIC
@ATTRIBUTE land NUMERIC
@ATTRIBUTE rooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE extra_bathroom NUMERIC
@ATTRIBUTE price NUMERIC

@DATA
1076,2801,6,0,0,324500
990,3067,5,1,1,466000
1229,3094,5,0,1,425900
731,4315,4,1,0,387120
671,2926,4,0,1,312100
1078,6094,6,1,1,603000
909,2854,5,0,1,383400
975,2947,5,1,1,?
```
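If the table lives in your own program rather than in a hand-written file, the ARFF text can also be generated. The following is a minimal sketch in plain Java (no Weka classes) under a few assumptions: the class name `HouseArffWriter`, the helper names, and the use of `Double.NaN` to mark the unknown price are illustrative choices, not part of the original recipe. The output path defaults to `house.arff` in the working directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes the house table as an ARFF file.
class HouseArffWriter {

    // Attribute names in column order; every attribute is declared NUMERIC.
    static final String[] ATTRIBUTES =
        {"size", "land", "rooms", "granite", "extra_bathroom", "price"};

    // The rows of the table; NaN (an assumption of this sketch) marks the unknown price.
    static double[][] houseRows() {
        return new double[][] {
            {1076, 2801, 6, 0, 0, 324500},
            {990, 3067, 5, 1, 1, 466000},
            {1229, 3094, 5, 0, 1, 425900},
            {731, 4315, 4, 1, 0, 387120},
            {671, 2926, 4, 0, 1, 312100},
            {1078, 6094, 6, 1, 1, 603000},
            {909, 2854, 5, 0, 1, 383400},
            {975, 2947, 5, 1, 1, Double.NaN}
        };
    }

    // Builds the ARFF lines: header, attribute declarations, then data rows.
    static List<String> toArff(String relation, double[][] rows) {
        List<String> lines = new ArrayList<>();
        lines.add("@RELATION " + relation);
        for (String name : ATTRIBUTES) {
            lines.add("@ATTRIBUTE " + name + " NUMERIC");
        }
        lines.add("@DATA");
        for (double[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                // ARFF uses '?' for a missing value; all known values here are whole numbers.
                sb.append(Double.isNaN(row[i]) ? "?" : String.valueOf((long) row[i]));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "house.arff";
        Files.write(Paths.get(path), toArff("house", houseRows()));
        System.out.println("Wrote " + path);
    }
}
```

Pass a path as the first argument to write the file wherever your loading code expects it.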

## How to do it…

Use the following snippet:

```java
import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

public class Regression {
    public static void main(String args[]) throws Exception {
        // load data
        Instances data = new Instances(new BufferedReader(
                new FileReader("dataset/house.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // build model
        LinearRegression model = new LinearRegression();
        model.buildClassifier(data); // the last instance with missing class is not used
        System.out.println(model);

        // classify the last instance
        Instance myHouse = data.lastInstance();
        double price = model.classifyInstance(myHouse);
        System.out.println("My house (" + myHouse + "): " + price);
    }
}
```

Here is the output:

```
Linear Regression Model

price =

    195.2035 * size +
     38.9694 * land +
  76218.4642 * granite +
  73947.2118 * extra_bathroom +
   2681.136

My house (975,2947,5,1,1,?): 458013.16703945777
```

## How it works…

Import a basic regression model named weka.classifiers.functions.LinearRegression, along with the classes needed to read the data:

```java
import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;
```

Load the data from the ARFF file and set the last attribute (price) as the class:

```java
Instances data = new Instances(new BufferedReader(
        new FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);
```

Initialize and build a regression model. Note that the last instance is not used for building the model, since its class value is missing:

```java
LinearRegression model = new LinearRegression();
model.buildClassifier(data);
```

Output the model:

```java
System.out.println(model);
```

Use the model to predict the price of the last instance in the dataset:

```java
Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house (" + myHouse + "): " + price);
```
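As a sanity check, the printed equation can be evaluated by hand for the last house. The sketch below is plain Java with the coefficients copied from the output shown earlier; the class and method names are illustrative. The printed model has no `rooms` term, presumably because the attribute selection that LinearRegression applies by default dropped it, so the sketch does not take the number of rooms as input.

```java
// Hypothetical helper: evaluates the printed regression equation by hand.
class HandPrediction {

    // Coefficients as printed by the model (rounded to four decimals there).
    static double predict(double size, double land,
                          double granite, double extraBathroom) {
        return 195.2035 * size
             + 38.9694 * land
             + 76218.4642 * granite
             + 73947.2118 * extraBathroom
             + 2681.136;
    }

    public static void main(String[] args) {
        // The last house: size 975, land 2947, granite 1, extra bathroom 1.
        double price = predict(975, 2947, 1, 1);
        System.out.printf("Hand-computed price: %.2f%n", price);
    }
}
```

The result is about 458013.05, which differs from the model's 458013.167 only because the printed coefficients are rounded.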

## There’s more

This section lists some additional algorithms.

### Other regression algorithms

There is a wide variety of implemented regression algorithms one can use in Weka:

• weka.classifiers.rules.ZeroR: The class for building and using a 0-R classifier. It predicts the mean (for a numeric class) or the mode (for a nominal class) and serves as a baseline; that is, if your model performs worse than this average-value predictor, it is not worth considering.
• weka.classifiers.trees.REPTree: The fast decision tree learner. Builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). It only sorts values for numeric attributes once. Missing values are dealt with by splitting the corresponding instances into pieces (that is, as in C4.5).
• weka.classifiers.functions.SMOreg: SMOreg implements the support vector machine for regression. The parameters can be learned using various algorithms. The algorithm is selected by setting the RegOptimizer. The most popular algorithm (RegSMOImproved) is due to Shevade, Keerthi, and others, and this is the default RegOptimizer.
• weka.classifiers.functions.MultilayerPerceptron: A classifier that uses backpropagation to classify instances. This network can be built by hand, or created by an algorithm, or both. The network can also be monitored and modified during training time. The nodes in this network are all sigmoid (except for when the class is numeric in which case the output nodes become unthresholded linear units).
• weka.classifiers.functions.GaussianProcesses: Implements Gaussian Processes for regression without hyperparameter-tuning.
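To make the baseline idea concrete, the following plain-Java sketch computes what a mean predictor in the spirit of 0-R would return for the seven houses with a known price, together with its mean absolute error on those same houses. It is an illustration only, not Weka's ZeroR implementation; the class and method names are assumptions.

```java
// Hypothetical illustration of a mean (0-R style) baseline for a numeric class.
class MeanBaseline {

    // Prices of the seven houses with a known price, from the table above.
    static final double[] PRICES =
        {324500, 466000, 425900, 387120, 312100, 603000, 383400};

    // The mean of the known class values: the single value the baseline
    // predicts for every instance.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    // Mean absolute error when one constant is predicted for every value.
    static double meanAbsoluteError(double[] values, double prediction) {
        double sum = 0;
        for (double v : values) sum += Math.abs(v - prediction);
        return sum / values.length;
    }

    public static void main(String[] args) {
        double baseline = mean(PRICES);
        System.out.printf("Baseline prediction: %.2f%n", baseline);
        System.out.printf("Baseline MAE on known prices: %.2f%n",
                meanAbsoluteError(PRICES, baseline));
    }
}
```

A regression model is only worth keeping if its error is clearly below the baseline's, ideally measured on data that was not used for training.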

## Summary

We learned how to use models that predict the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, a regression algorithm builds a model, usually an equation, that is used to compute the predicted class value.

## Instant Weka How-to [Instant]

 Implement cutting-edge data mining aspects in Weka to your applications with this book and ebook
Published: June 2013

## Boštjan Kaluža

Boštjan Kaluža, PhD is a researcher in artificial intelligence and ubiquitous computing. Since October 2008, he has been working at Jozef Stefan Institute, Slovenia. His research focuses on the development of novel algorithms and approaches, with an emphasis on human behavior analysis from sensor data using machine learning and data mining techniques. Boštjan has extensive experience in Java and Python, and lectures Weka in the classroom. He spent a year as a visiting researcher at the University of Southern California, where he studied suspicious and anomalous agent behavior in the context of security applications. He has published over 40 journal articles and conference papers.