This recipe is similar to the previous ones, but here we use Random Forest to solve a regression problem, that is, to predict a continuous value. As before, we limit the number of classes to two. The following parameter directs the algorithm to perform regression rather than classification:
val impurity = "variance" // USE variance for regression
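To make the role of the "variance" setting concrete, here is a minimal sketch in plain Scala (no Spark required) of how a variance-based impurity can score a candidate split on continuous labels. The object name `VarianceImpuritySketch`, the helpers `variance` and `splitImpurity`, and the sample labels are all hypothetical and exist only for illustration; they are not part of Spark's API, and Spark's internal computation may differ in detail.

```scala
// Illustrative sketch only: these helpers are hypothetical and are NOT Spark APIs.
object VarianceImpuritySketch {

  // Population variance of the labels that fall into one tree node.
  def variance(labels: Seq[Double]): Double =
    if (labels.isEmpty) 0.0
    else {
      val mean = labels.sum / labels.size
      labels.map(y => (y - mean) * (y - mean)).sum / labels.size
    }

  // Size-weighted impurity of the two child nodes produced by a split.
  def splitImpurity(left: Seq[Double], right: Seq[Double]): Double = {
    val n = (left.size + right.size).toDouble
    (left.size / n) * variance(left) + (right.size / n) * variance(right)
  }

  def main(args: Array[String]): Unit = {
    val parent = Seq(1.0, 2.0, 9.0, 10.0)        // made-up continuous labels
    val (left, right) = (Seq(1.0, 2.0), Seq(9.0, 10.0))
    // A larger reduction in variance indicates a more useful split.
    val gain = variance(parent) - splitImpurity(left, right)
    println(s"parent variance = ${variance(parent)}, gain = $gain")
  }
}
```

With classification, the impurity would instead be measured on class proportions (for example, Gini or entropy), which is why switching the impurity to variance is what steers the tree builder toward a continuous target.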
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter10
- Import the necessary packages from Spark:
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level...