Apache Spark Graph Processing

Build, process and analyze large-scale graph data effectively with Spark

Rindra Ramamonjison

Book Details

ISBN-13: 9781784391805
Paperback: 148 pages

Book Description

Apache Spark is emerging as the next standard open-source cluster-computing engine for processing big data. Many practical computing problems concern large graphs, such as the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices and trillions of edges) makes them challenging to process efficiently. Apache Spark's GraphX API combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within Spark's data-parallel framework.

This book teaches you how to do graph programming in Apache Spark and explains the entire process of graph data analysis. You will journey through creating graphs, using them, exploring and analyzing them, and finally transforming graph elements and graph structures.

This book begins with an introduction to the Spark system, its libraries and the Scala Build Tool. Using a hands-on approach, it quickly teaches you how to install Spark and use it interactively on the command line and in a standalone Scala program. Then, it presents the methods for building Spark graphs using illustrative network datasets. Next, it walks you through the process of exploring, visualizing and analyzing different network characteristics. It also teaches you how to transform raw datasets into a usable form. In addition, you will learn powerful operations that can be used to transform graph elements and graph structures. Furthermore, the book teaches you how to create custom graph operations that are tailored to specific needs, with efficiency in mind. The later chapters cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms, and applying learning methods to graph data.

Table of Contents

Chapter 1: Getting Started with Spark and GraphX
Downloading and installing Spark 1.4.1
Experimenting with the Spark shell
Getting started with GraphX
Summary
Chapter 2: Building and Exploring Graphs
Network datasets
Graph builders
Building graphs
Computing the degrees of the network nodes
Summary
Chapter 3: Graph Analysis and Visualization
Network datasets
The graph visualization
The analysis of network connectedness
The network centrality and PageRank
Scala Build Tool revisited
Summary
Chapter 4: Transforming and Shaping Up Graphs to Your Needs
Transforming the vertex and edge attributes
Modifying graph structures
Joining graph datasets
Data operations on VertexRDD and EdgeRDD
Summary
Chapter 5: Creating Custom Graph Aggregation Operators
NCAA College Basketball datasets
The aggregateMessages operator
Joining average stats into a graph
Performance optimization
The mapReduceTriplets operator
Summary
Chapter 6: Iterative Graph-Parallel Processing with Pregel
The Pregel computational model
The Pregel API in GraphX
Community detection through label propagation
The Pregel implementation of PageRank
Summary
Chapter 7: Learning Graph Structures
Community clustering in graphs
Applications – music fan community detection
Summary

What You Will Learn

  • Write, build and deploy Spark applications with the Scala Build Tool
  • Build and analyze large-scale network datasets
  • Analyze and transform graphs using RDD and graph-specific operations
  • Implement new custom graph operations tailored to specific needs
  • Develop iterative and efficient graph algorithms using message aggregation and the Pregel abstraction
  • Extract subgraphs and use them to discover common clusters
  • Analyze graph data and solve various data science problems using real-world datasets

Recommended for You

  • Machine Learning with Spark
  • Spark Cookbook
  • Scala for Machine Learning
  • Practical Data Science Cookbook
  • Practical Data Analysis
  • Python Machine Learning