Apache Spark 2.x for Java Developers

Unleash the data processing and analytics capability of Apache Spark with the language of choice: Java

Sourav Gulati, Sumit Kumar

Book Details

ISBN 13: 9781787126497
Paperback: 350 pages

Book Description

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone.

The book starts with an introduction to the Apache Spark 2.x ecosystem, explains how to install and configure Spark, and refreshes the Java concepts that will be useful when you consume Apache Spark's APIs. You will explore RDDs and their common action and transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark Streaming, machine learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages.

By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.

Table of Contents

Chapter 1: Introduction to Spark
  • Dimensions of big data
  • What makes Hadoop so revolutionary?
  • Why Apache Spark?
  • RDD - the first citizen of Spark
  • Exploring the Spark ecosystem
  • What's new in Spark 2.X?
  • References
  • Summary
Chapter 2: Revisiting Java
  • Why use Java for Spark?
  • Generics
  • Interfaces
  • Lambda expressions
  • Lexical scoping
  • Streams
  • Intermediate operations
  • Terminal operations
  • Summary
Chapter 3: Let Us Spark
  • Getting started with Spark
  • Spark REPL also known as CLI
  • Some basic exercises using Spark shell
  • Spark components
  • Spark Driver Web UI
  • Spark job configuration and submission
  • Spark REST APIs
  • Summary
Chapter 4: Understanding the Spark Programming Model
  • Hello Spark
  • Common RDD transformations
  • Common RDD actions
  • RDD persistence and cache
  • Summary
Chapter 5: Working with Data and Storage
  • Interaction with external storage systems
  • Working with different data formats
  • References
  • Summary
Chapter 6: Spark on Cluster
  • Spark application in distributed-mode
  • Cluster managers
  • Yet Another Resource Negotiator (YARN)
  • Summary
Chapter 7: Spark Programming Model - Advanced
  • RDD partitioning
  • Advanced transformations
  • Advanced actions
  • Shared variable
  • Broadcast variable
  • Summary
Chapter 8: Working with Spark SQL
  • SQLContext and HiveContext
  • Dataframe and dataset
  • Spark SQL operations
  • Hive integration
  • Summary
Chapter 9: Near Real-Time Processing with Spark Streaming
  • Introducing Spark Streaming
  • Understanding micro batching
  • Streaming sources
  • Kafka
  • Streaming transformations
  • Fault tolerance and reliability
  • Structured Streaming
  • Summary
Chapter 10: Machine Learning Analytics with Spark MLlib
  • Introduction to machine learning
  • Concepts of machine learning
  • Machine learning work flow
  • Operations on feature vectors
  • Summary
Chapter 11: Learning Spark GraphX
  • Introduction to GraphX
  • Introduction to Property Graph
  • Getting started with the GraphX API
  • Graph operations
  • Graph algorithms
  • Summary

What You Will Learn

  • Process data in different file formats such as XML, JSON, CSV, and plain and delimited text using the Spark core library
  • Perform analytics on data from various data sources such as Kafka and Flume using the Spark Streaming library
  • Learn SQL schema creation and the analysis of structured data using various SQL functions, including windowing functions, in the Spark SQL library
  • Explore Spark MLlib APIs while implementing machine learning techniques to solve real-world problems
  • Get to know Spark GraphX so you understand various graph-based analytics that can be performed with Spark
