Reader small image

You're reading from  Apache Spark 2.x Machine Learning Cookbook

Product typeBook
Published inSep 2017
Reading LevelIntermediate
PublisherPackt
ISBN-139781783551606
Edition1st Edition
Languages
Right arrow
Authors (5):
Mohammed Guller
Mohammed Guller
author image
Mohammed Guller

Author of Big Data Analytics with Spark - http://www.apress.com/9781484209653
Read more about Mohammed Guller

Siamak Amirghodsi
Siamak Amirghodsi
author image
Siamak Amirghodsi

Siamak Amirghodsi (Sammy) is interested in building advanced technical teams, executive management, Spark, Hadoop, big data analytics, AI, deep learning nets, TensorFlow, cognitive models, swarm algorithms, real-time streaming systems, quantum computing, financial risk management, trading signal discovery, econometrics, long-term financial cycles, IoT, blockchain, probabilistic graphical models, cryptography, and NLP.
Read more about Siamak Amirghodsi

Shuen Mei
Shuen Mei
author image
Shuen Mei

Shuen Mei is a big data analytic platforms expert with 15+ years of experience in designing, building, and executing large-scale, enterprise-distributed financial systems with mission-critical low-latency requirements. He is certified in the Apache Spark, Cloudera Big Data platform, including Developer, Admin, and HBase. He is also a certified AWS solutions architect with emphasis on peta-byte range real-time data platform systems.
Read more about Shuen Mei

Meenakshi Rajendran
Meenakshi Rajendran
author image
Meenakshi Rajendran

Meenakshi Rajendran is experienced in the end-to-end delivery of data analytics and data science products for leading financial institutions. Meenakshi holds a master's degree in business administration and is a certified PMP with over 13 years of experience in global software delivery environments. Her areas of research and interest are Apache Spark, cloud, regulatory data governance, machine learning, Cassandra, and managing global data teams at scale.
Read more about Meenakshi Rajendran

Broderick Hall
Broderick Hall
author image
Broderick Hall

Broderick Hall is a hands-on big data analytics expert and holds a masters degree in computer science with 20 years of experience in designing and developing complex enterprise-wide software applications with real-time and regulatory requirements at a global scale. He is a deep learning early adopter and is currently working on a large-scale cloud-based data platform with deep learning net augmentation.
Read more about Broderick Hall

View More author details
Right arrow

Package imports and initial setup for vectors and matrices


Before we can program in Spark or use and matrix artifacts, we need to first import the right packages and then set up SparkSession so we can gain access to the cluster handle.

In this short recipe, we highlight a comprehensive number of packages that can cover most of the linear algebra operations in Spark. The individual recipes that follow will include the subset required for the specific program.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter2
  1.  Import the necessary packages for vector and matrix manipulation:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.{SparkSession...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Apache Spark 2.x Machine Learning Cookbook
Published in: Sep 2017Publisher: PacktISBN-13: 9781783551606

Authors (5)

author image
Mohammed Guller

Author of Big Data Analytics with Spark - http://www.apress.com/9781484209653
Read more about Mohammed Guller

author image
Siamak Amirghodsi

Siamak Amirghodsi (Sammy) is interested in building advanced technical teams, executive management, Spark, Hadoop, big data analytics, AI, deep learning nets, TensorFlow, cognitive models, swarm algorithms, real-time streaming systems, quantum computing, financial risk management, trading signal discovery, econometrics, long-term financial cycles, IoT, blockchain, probabilistic graphical models, cryptography, and NLP.
Read more about Siamak Amirghodsi

author image
Shuen Mei

Shuen Mei is a big data analytic platforms expert with 15+ years of experience in designing, building, and executing large-scale, enterprise-distributed financial systems with mission-critical low-latency requirements. He is certified in the Apache Spark, Cloudera Big Data platform, including Developer, Admin, and HBase. He is also a certified AWS solutions architect with emphasis on peta-byte range real-time data platform systems.
Read more about Shuen Mei

author image
Meenakshi Rajendran

Meenakshi Rajendran is experienced in the end-to-end delivery of data analytics and data science products for leading financial institutions. Meenakshi holds a master's degree in business administration and is a certified PMP with over 13 years of experience in global software delivery environments. Her areas of research and interest are Apache Spark, cloud, regulatory data governance, machine learning, Cassandra, and managing global data teams at scale.
Read more about Meenakshi Rajendran

author image
Broderick Hall

Broderick Hall is a hands-on big data analytics expert and holds a masters degree in computer science with 20 years of experience in designing and developing complex enterprise-wide software applications with real-time and regulatory requirements at a global scale. He is a deep learning early adopter and is currently working on a large-scale cloud-based data platform with deep learning net augmentation.
Read more about Broderick Hall