Advanced Analytics and Real-Time Data Processing in Apache Spark [Video]

More Information
  • Understand real-time data streaming processes and operations with Spark Streaming
  • Implement high-velocity streaming and data processing use cases while working with the streaming API
  • Dive into MLlib, the machine learning library in Spark, with its highly scalable algorithms
  • Create machine learning pipelines to combine multiple algorithms in a single workflow
  • Understand graphs and the Apache Spark API for graphs—GraphX
  • Apply interesting graph algorithms and graph processing with GraphX in a distributed environment
  • Use R, the popular statistical language, to work with Spark—SparkR
  • See how SparkR allows users to create and transform RDDs in R
  • See analytical use case implementations using MLlib, GraphX, and Spark Streaming

This comprehensive tutorial will acquaint you with all the aspects of real-time analytics with Apache Spark, one of the trending Big Data processing frameworks on the market today. It will show you how to leverage the features of various components of the Spark framework to efficiently process, analyze, and visualize your data.

You will learn how to implement high-velocity streaming operations for data processing in order to perform efficient analytics on your real-time data. You’ll analyze data using machine learning techniques and graphs. You’ll learn about Spark Streaming and build real-world stream-processing applications that address the problems you need to solve. You’ll solve problems using machine learning techniques and find out about the tools available in the MLlib toolkit. You’ll also find out how to leverage graphs to solve real-world problems.

Toward the end of this video, you’ll also explore some useful machine learning algorithms with the help of Spark MLlib and integrate Spark with R. We’ll also make sure you’re confident and prepared for graph processing, as you’ll learn more about the GraphX API. By the end, you’ll be well-versed in the aspects of real-time analytics and able to implement them with Apache Spark.

Style and Approach

Filled with hands-on examples, this course will help you perform data analysis and take you from an intermediate level to an advanced approach to data analytics. You will perform graph analysis and handle high-velocity streaming through a set of analytical use cases.

  • Leverage the power of Apache Spark to perform efficient data processing and analytics on your data in real time
  • Process and analyze streams of data with ease and perform machine learning efficiently
  • A comprehensive tutorial to help you get the most out of the trending Big Data framework for all your data processing needs
Course Length: 3 hours 24 minutes
ISBN: 9781787282032
Date of Publication: 25 Jan 2018


Tomasz Lelek

Tomasz Lelek is a software engineer who programs mostly in Java and Scala. He has been working with the Spark and ML APIs for the past 6 years, with production experience in processing petabytes of data. He is passionate about nearly everything associated with software development and believes that we should always try to consider different solutions and approaches before attempting to solve a problem. Recently, he was also a speaker at the Confitura and JDD (Java Developers Day) conferences in Poland and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference.