Linear regression API with Lasso and L-BFGS in Spark 2.0
In this recipe, we demonstrate Spark 2.0's LinearRegression() API, a unified, parameterized API that tackles linear regression comprehensively and can be extended without the backward-compatibility issues of the RDD-based, named API. We show how to use setSolver() to set the optimization method to L-BFGS, a memory-efficient quasi-Newton method that needs only first-order (gradient) information and can handle a large number of parameters (especially in sparse configurations) with ease.
Note
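In this recipe, .setSolver() is set to l-bfgs, which makes L-BFGS (see RDD-based regression for more detail) the selected optimization method. The .setElasticNetParam() is not set, so the default of 0 remains in effect. Be aware that in Spark ML an elastic net mixing value of 0 corresponds to a pure L2 (ridge) penalty, while a value of 1 corresponds to a pure L1 (Lasso) penalty, so a Lasso regression requires .setElasticNetParam(1.0) together with a nonzero .setRegParam().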
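The solver and penalty settings described in this note can be sketched as follows. This is a minimal illustration rather than the recipe's own code: the object name LinearRegressionSolverSketch and the values passed to setMaxIter() and setRegParam() are assumptions chosen for the example, and setElasticNetParam(1.0) is set explicitly to request an L1 (Lasso) penalty. No dataset or SparkSession is needed just to configure and inspect the estimator.

```scala
import org.apache.spark.ml.regression.LinearRegression

// Hypothetical object name, used only for this sketch.
object LinearRegressionSolverSketch {

  // Illustrative parameter values; the recipe's actual settings may differ.
  val lr: LinearRegression = new LinearRegression()
    .setMaxIter(100)          // assumed cap on optimizer iterations
    .setRegParam(0.1)         // assumed regularization strength (must be > 0 for any penalty)
    .setElasticNetParam(1.0)  // 1.0 = pure L1 (Lasso); the default 0.0 = pure L2 (ridge)
    .setSolver("l-bfgs")      // select L-BFGS as the optimization method

  def main(args: Array[String]): Unit = {
    // Read the settings back to confirm what the estimator will use.
    println(s"solver          = ${lr.getSolver}")
    println(s"elasticNetParam = ${lr.getElasticNetParam}")
    println(s"regParam        = ${lr.getRegParam}")
  }
}
```

Calling lr.explainParams() prints every parameter with its documentation and current value, which is a convenient way to check the defaults before fitting.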
How to do it...
- We use a housing dataset from the UCI Machine Learning Repository.
- Download the entire data set from the following URLs: