Reader small image

You're reading from  Apache Spark Quick Start Guide

Product typeBook
Published inJan 2019
Reading LevelIntermediate
PublisherPackt
ISBN-139781789349108
Edition1st Edition
Languages
Right arrow
Authors (2):
Shrey Mehrotra
Shrey Mehrotra
author image
Shrey Mehrotra

Shrey Mehrotra has over 8 years of IT experience and, for the past 6 years, has been designing the architecture of cloud and big-data solutions for the finance, media, and governance sectors. Having worked on research and development with big-data labs and been part of Risk Technologies, he has gained insights into Hadoop, with a focus on Spark, HBase, and Hive. His technical strengths also include Elasticsearch, Kafka, Java, YARN, Sqoop, and Flume. He likes spending time performing research and development on different big-data technologies. He is the coauthor of the books Learning YARN and Hive Cookbook, a certified Hadoop developer, and he has also written various technical papers.
Read more about Shrey Mehrotra

Akash Grade
Akash Grade
author image
Akash Grade

Akash Grade is a data engineer living in New Delhi, India. Akash graduated with a BSc in computer science from the University of Delhi in 2011, and later earned an MSc in software engineering from BITS Pilani. He spends most of his time designing highly scalable data pipeline using big-data solutions such as Apache Spark, Hive, and Kafka. Akash is also a Databricks-certified Spark developer. He has been working on Apache Spark for the last five years, and enjoys writing applications in Python, Go, and SQL.
Read more about Akash Grade

View More author details
Right arrow

What this book covers

Chapter 1, Introduction to Apache Spark, provides an introduction to Spark 2.0. It provides a brief description of different Spark components, including Spark Core, Spark SQL, Spark Streaming, machine learning, and graph processing. It also discusses the advantages of Spark compared to other similar frameworks.

Chapter 2, Apache Spark Installation, provides a step-by-step guide to installing Spark on an AWS EC2 instance from scratch. It also helps you install all the prerequisites, such as Python, Java, and Scala.

Chapter 3, Spark RDD, explains Resilient Distributed Datasets (RDD) APIs, which are the heart of Apache Spark. It also discusses various transformations and actions that can be applied on an RDD.

Chapter 4, Spark DataFrame and Dataset, covers Spark's structured APIs: DataFrame and Dataset. This chapter also covers various operations that can be performed on a DataFrame or Dataset.

Chapter 5, Spark Architecture and Application Execution Flow, explains the interaction between different services involved in Spark application execution. It explains the role of worker nodes, executors, and drivers in application execution in both client and cluster mode. It also explains how Spark creates a Directed Acyclic Graph (DAG) that consists of stages and tasks.

Chapter 6, Spark SQL, discusses how Spark gracefully supports all SQL operations by providing a Spark-SQL interface and various DataFrame APIs. It also covers the seamless integration of Spark with the Hive metastore.

Chapter 7, Spark Streaming, Machine Learning, and Graph Analysis, explores different Spark APIs for working with real-time data streams, machine learning, and graphs. It explains the candidature of features based on the use case requirements.

Chapter 8, Spark Optimizations, covers different optimization techniques to improve the performance of your Spark applications. It explains how you can use resources such as executors and memory in order to better parallelize your tasks.

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Apache Spark Quick Start Guide
Published in: Jan 2019Publisher: PacktISBN-13: 9781789349108

Authors (2)

author image
Shrey Mehrotra

Shrey Mehrotra has over 8 years of IT experience and, for the past 6 years, has been designing the architecture of cloud and big-data solutions for the finance, media, and governance sectors. Having worked on research and development with big-data labs and been part of Risk Technologies, he has gained insights into Hadoop, with a focus on Spark, HBase, and Hive. His technical strengths also include Elasticsearch, Kafka, Java, YARN, Sqoop, and Flume. He likes spending time performing research and development on different big-data technologies. He is the coauthor of the books Learning YARN and Hive Cookbook, a certified Hadoop developer, and he has also written various technical papers.
Read more about Shrey Mehrotra

author image
Akash Grade

Akash Grade is a data engineer living in New Delhi, India. Akash graduated with a BSc in computer science from the University of Delhi in 2011, and later earned an MSc in software engineering from BITS Pilani. He spends most of his time designing highly scalable data pipeline using big-data solutions such as Apache Spark, Hive, and Kafka. Akash is also a Databricks-certified Spark developer. He has been working on Apache Spark for the last five years, and enjoys writing applications in Python, Go, and SQL.
Read more about Akash Grade