Before we can program in Spark or use vector and matrix artifacts, we first need to import the right packages and then set up a SparkSession so that we can gain access to the cluster handle.
In this short recipe, we highlight a broad set of packages that covers most of the linear algebra operations in Spark. Each of the recipes that follow includes only the subset required for its specific program.
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter2
- Import the necessary packages for vector and matrix manipulation:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.{SparkSession...
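To illustrate how these imports and the SparkSession fit together, here is a minimal sketch. It is not the recipe's own listing: the object name `LinalgSetupSketch`, the application name, the `local[*]` master setting, and the sample values are all assumptions chosen for illustration. It obtains the cluster handle from the session and wraps a small RDD of vectors in a `RowMatrix`.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object LinalgSetupSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally on all available cores; the app name is arbitrary.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("LinalgSetupSketch")
      .getOrCreate()

    // The SparkContext is the cluster handle used to create RDDs.
    val sc = spark.sparkContext

    // Illustrative data: three rows of two values each.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0),
      Vectors.dense(3.0, 4.0),
      Vectors.dense(5.0, 6.0)
    ))

    // RowMatrix is a distributed matrix backed by an RDD of row vectors.
    val matrix = new RowMatrix(rows)
    println(s"rows = ${matrix.numRows()}, cols = ${matrix.numCols()}")

    // Release the resources held by the session.
    spark.stop()
  }
}
```

Running this requires the Spark core, SQL, and MLlib JAR files on the classpath, as noted in the first step. On a real cluster, the master would typically be supplied by the launcher rather than hardcoded.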