Advanced Analytics and Real-Time Data Processing in Apache Spark [Video]

Tomasz Lelek

Implement high-velocity streaming for real-time data processing, along with machine learning and graph analysis operations using Spark MLlib, GraphX, and SparkR on Apache Spark, and explore some analytical use cases on Spark.

Video Details

ISBN 13: 9781787282032
Course Length: 3 hours and 24 minutes

Video Description

This comprehensive tutorial will acquaint you with all the aspects of real-time analytics with Apache Spark, one of the trending Big Data processing frameworks on the market today. It will show you how to leverage the features of various components of the Spark framework to efficiently process, analyze, and visualize your data.

You will learn how to implement high-velocity streaming operations for data processing in order to perform efficient analytics on your real-time data. You’ll analyze data using machine learning techniques and graphs. You’ll learn about Spark Streaming and build real-world stream-processing applications that address the problems that need to be solved. You’ll solve problems using machine learning techniques and find out about the tools available in the MLlib toolkit. You’ll also find out how to leverage graphs to solve real-world problems.

At the end of this video, you’ll also see some useful machine learning algorithms with the help of Spark MLlib and integrate Spark with R. We’ll also make sure you’re confident and prepared for graph processing, as you’ll learn more about the GraphX API. By the end, you’ll be well versed in the aspects of real-time analytics and able to implement them with Apache Spark.

Style and Approach

Filled with hands-on examples, this course will help you perform data analysis and take you from an intermediate level to an advanced approach to data analytics. You will perform graph analysis and handle high-velocity streaming through some analytical use cases.

Table of Contents

Spark Streaming
The Course Overview
Introducing Spark Streaming
Streaming Context
Processing Streaming Data
Use Cases
Spark Streaming Word Count Hands-On
Spark Streaming - Understanding Master URL
Integrating Spark Streaming with Apache Kafka
mapWithState Operation
Transform and Window Operation
Join and Output Operations
Output Operations - Saving Results to Kafka Sink
Advanced Streaming and Use Cases
Handling Time in High Velocity Streams
Connecting External Systems That Work with an At-Least-Once Guarantee - Deduplication
Building a Streaming Application - Handling Events That Are Not in Order
Filtering Bots from Stream of Page View Events
Spark MLlib and ML Pipelines
Introducing Machine Learning with Spark
Feature Extraction and Transformation
Transforming Text into Vector of Numbers - ML Bag-of-Words Technique
Logistic Regression
Model Evaluation
Clustering
Implementing GMM in Apache Spark
Principal Component Analysis and Distributing the Singular Value Decomposition (SVD)
Collaborative Filtering - Building Recommendation Engine
Spark GraphX
Introducing Spark GraphX - How to Represent a Graph?
Limitations of Graph-Parallel System - Why Spark GraphX?
Importing GraphX
Create a Graph Using GraphX and Property Graph
List of Operators
Perform Graph Operations Using GraphX
Triplet View
Performing Spark GraphX Operations
Perform Subgraph Operations
Neighbourhood Aggregations - Collecting Neighbours
Counting Degree of Vertex
Caching and Uncaching
GraphBuilder
Vertex and Edge RDD
Structural Operators - Connected Components
SparkR
Introduction to SparkR and How It's Used?
Setting Up from RStudio
Creating Spark DataFrames from Data Sources
SparkDataFrames Operations - Grouping, Aggregation
Run a Given Function on a Large Dataset Using dapply or dapplyCollect
Running Large Dataset by Input Column(s) and Using gapply or gapplyCollect
Run Local R Functions Distributed Using spark.lapply
Running SQL Queries from SparkR
Analytical Use Cases
PageRank Using Spark GraphX
Sending a Real-Time Notification When a User Wants to Buy a Product on the E-Commerce Site

What You Will Learn

  • Learn real-time data streaming processes and operations with Spark Streaming
  • Implement high-velocity streaming and data processing use cases while working with the streaming API
  • Dive into MLlib, the machine learning library in Spark, with its highly scalable algorithms
  • Create machine learning pipelines to combine multiple algorithms in a single workflow
  • Understand graphs and the Apache Spark API for graphs—GraphX
  • Apply interesting graph algorithms and graph processing with GraphX in a distributed environment
  • Use R, the popular statistical language, to work with Spark—SparkR
  • See how SparkR allows users to create and transform RDDs in R
  • See analytical use case implementations using MLlib, GraphX, and Spark Streaming
