Apache SystemML in action
Let's take a look at a very simple example. We'll create a script in the Apache SystemML DSL, which has an R-like syntax, to multiply two matrices:
import org.apache.sysml.api.MLOutput
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.util.LinearDataGenerator
import org.apache.sysml.api.MLContext
import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
import org.apache.sysml.runtime.matrix.MatrixCharacteristics
val sqlContext = new SQLContext(sc)
val simpleScript =
"""
fileX = "";
fileY = "";
fileZ = "";
X = read (fileX);
Y = read (fileY);
Z = X %*% Y
write (Z,fileZ);
"""Then, we generate some test data:
// Generate data
val rawDataX = sqlContext.createDataFrame(LinearDataGenerator.generateLinearRDD(sc, 100, 10, 1))
val rawDataY = sqlContext.createDataFrame(LinearDataGenerator.generateLinearRDD(sc, 10, 100, 1))
// Repartition into a more parallelism-friendly number of partitions
val dataX = rawDataX...
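For intuition about what the `Z = X %*% Y` line in the script computes, here is a minimal plain-Scala sketch of matrix multiplication on small in-memory arrays. It does not use SystemML or Spark; the `MatMulSketch` object, the `Matrix` alias, and the `matMul` helper are hypothetical names introduced only for this illustration. The point it shows is the shape rule: an m x k matrix times a k x n matrix yields an m x n matrix.

```scala
// Plain-Scala illustration of matrix multiplication; not SystemML code.
// `MatMulSketch`, `Matrix` and `matMul` are hypothetical names for this sketch.
object MatMulSketch {
  type Matrix = Array[Array[Double]]

  // Multiplies an (m x k) matrix by a (k x n) matrix, giving an (m x n) matrix.
  def matMul(x: Matrix, y: Matrix): Matrix = {
    require(x.nonEmpty && y.nonEmpty, "matrices must be non-empty")
    require(x.forall(_.length == y.length), "columns of X must equal rows of Y")
    val n = y.head.length
    // Entry (i, j) is the dot product of row i of X and column j of Y.
    Array.tabulate(x.length, n) { (i, j) =>
      y.indices.map(k => x(i)(k) * y(k)(j)).sum
    }
  }

  def main(args: Array[String]): Unit = {
    val x: Matrix = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0)) // 3 x 2
    val y: Matrix = Array(Array(1.0, 0.0, 2.0), Array(0.0, 1.0, 3.0))        // 2 x 3
    val z = matMul(x, y)                                                      // 3 x 3
    z.foreach(row => println(row.mkString(" ")))
  }
}
```

The sketch computes the same product on a single machine, which is only practical for small inputs. Running it as a SystemML script instead leaves the choice of execution strategy on Spark to SystemML.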