Spark Analytics for Real-Time Data Processing [Video]

More Information
Learn
  • Loading data from a variety of structured sources (for example, JSON, Hive, and Parquet) using Spark SQL and schema RDDs.
  • Querying data with Spark SQL from external tools over JDBC/ODBC (for example, Tableau and Qlik) and from within a Spark program.
  • Integration between SQL and Java/Scala/Python code.
  • How Spark Streaming works on top of the Spark core and inherits all its features.
  • Architecture of Spark Streaming.
  • Spark Streaming programming and DStreams.
  • Best practices for managing high-velocity streaming data.
  • Best practices for external data sources.
About

This tutorial focuses on analytics and real-time data processing with Apache Spark. You will begin with Spark SQL, using its API and built-in functions, then work through some interactive analysis and look at how Spark SQL integrates with Java, Scala, and Python code.

You will then explore Spark Streaming, the streaming context, and DStreams. You will learn how Spark Streaming works on top of the Spark core and thereby inherits its features. You will stream data and learn best practices for managing high-velocity streams and external data sources.
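
As a rough illustration of the idea behind DStreams, a discretized stream can be thought of as a sequence of small batches, each processed with ordinary batch logic. The sketch below is plain Python, not the Spark API and not taken from the course: the names `micro_batches`, `word_count`, and the sample `events` are hypothetical, and the batching is simplified (for instance, empty intervals are skipped here).

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, line) events into time-ordered batches.

    Each batch holds the lines whose timestamp falls in the same
    interval of length `batch_interval` (in seconds).
    """
    batches = defaultdict(list)
    for timestamp, line in events:
        batches[int(timestamp // batch_interval)].append(line)
    # Simplification: intervals with no events produce no batch at all.
    return [batches[key] for key in sorted(batches)]


def word_count(batch):
    """Ordinary batch logic, applied independently to each micro-batch."""
    return Counter(word for line in batch for word in line.split())


# Hypothetical sample input: (arrival time in seconds, text line).
events = [
    (0.2, "spark streaming"),
    (0.7, "spark sql"),
    (1.1, "streaming data"),
    (2.5, "spark"),
]

counts_per_batch = [word_count(b) for b in micro_batches(events, batch_interval=1.0)]
for index, counts in enumerate(counts_per_batch):
    print(index, dict(counts))
```

The same per-batch function is reused for every interval, which is the sense in which a streaming layer built this way can reuse the batch engine beneath it.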

By the end of this course, you will be able to load data from a variety of structured sources (for example, JSON, Hive, and Parquet) using Spark SQL and schema RDDs, and to perform real-time data processing.

Style and Approach

Filled with examples, this course will help viewers perform real-time data analysis and help them get started with analytics. Viewers will learn to build streaming applications and handle high-velocity streaming.

Features
  • Query data using Spark SQL APIs
  • Handle streaming data using Spark Streaming
  • Best practices for streaming data
Course Length 1 hour 38 minutes
ISBN 9781787287402
Date Of Publication 27 Jun 2018
Spark SQL Introduction
Spark SQL – Core Abstractions
Creating DataFrames from RDD
Creating DataFrames from Files
Creating DataFrames from Data Sources
DataFrame API – Common Operations
DataFrame API – Query Operations
DataFrame API – Actions
DataFrame API – Built-In Functions
Spark Streaming – Quick Example
Spark Streaming – Architecture
Spark Streaming – Transformations
Spark Streaming – Input Sources
Spark Streaming – Performance Considerations

Authors

Nishant Garg

Nishant Garg has over 17 years of software architecture and development experience in various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, Shark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as Greenplum). He received his MS in software systems from the Birla Institute of Technology and Science, Pilani, India, and is currently working as a technical architect for the Big Data R&D Group with Impetus Infotech Pvt. Ltd. Previously, Nishant worked with some of the most recognizable names in the IT services and financial industries, employing full software life cycle methodologies such as Agile and Scrum. Nishant has also undertaken many speaking engagements on big data technologies and is the author of Apache Kafka and HBase Essentials, Packt Publishing.