Hands-On Big Data Analytics with PySpark

Use PySpark to easily crush messy data at-scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs

Rudy Lai, Bartłomiej Potaczek


Book Details

ISBN-13: 9781838644130
Paperback: 182 pages

Book Description

Apache Spark is an open source parallel-processing framework, widely used for data analytics applications across clustered computers. In this book, you will learn how to use Spark and its Python API to create high-performance analytics with big data, and discover techniques for making Spark jobs testable, immutable, and easily parallelizable.

You will learn how to source data from popular data stores and formats, including HDFS, Hive, JSON, and S3, and work with large datasets in PySpark to gain practical big data experience. You will start with prototypes on a local machine and then move on to handling messy data in production and at scale. The book covers installing and setting up PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will also learn practical, proven techniques for improving programming and administration in Apache Spark.

By the end of the book, you will be able to build big data analytical solutions using the various PySpark offerings and also optimize them effectively.

Table of Contents

Chapter 2: Getting Your Big Data into the Spark Environment Using RDDs

What You Will Learn

  • Get practical big data experience while working on messy datasets
  • Analyze patterns with Spark SQL to improve your business intelligence
  • Use PySpark's interactive shell to speed up development time
  • Create highly concurrent Spark programs by leveraging immutability
  • Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation
  • Re-design your jobs to use reduceByKey instead of groupBy
  • Create robust processing pipelines by testing Apache Spark jobs
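
The points above about avoiding the shuffle and preferring reduceByKey can be illustrated with a small sketch. This is a plain-Python simulation, not code from the book and not the Spark API: the `partitions` data and both helper functions are hypothetical, and they only model the idea that a reduceByKey-style job combines values inside each partition before anything crosses the network, while a groupByKey-style job moves every record.

```python
from collections import defaultdict

# Hypothetical data: (word, count) pairs already split across two partitions.
partitions = [
    [("spark", 1), ("python", 1), ("spark", 1)],
    [("spark", 1), ("python", 1), ("python", 1)],
]


def shuffle_group_by_key(parts):
    """groupByKey-style: every record is shuffled, then values are summed."""
    shuffled = [record for part in parts for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(parts, func):
    """reduceByKey-style: combine inside each partition, then shuffle the partial results."""
    shuffled = []
    for part in parts:
        partial = {}
        for key, value in part:
            partial[key] = func(partial[key], value) if key in partial else value
        shuffled.extend(partial.items())
    totals = {}
    for key, value in shuffled:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals, len(shuffled)


group_totals, group_shuffled = shuffle_group_by_key(partitions)
reduce_totals, reduce_shuffled = shuffle_reduce_by_key(partitions, lambda a, b: a + b)

print(group_totals, "records shuffled:", group_shuffled)
print(reduce_totals, "records shuffled:", reduce_shuffled)
```

Both approaches produce the same totals, but the second moves fewer records between partitions. In PySpark itself, the corresponding RDD calls would be `rdd.groupByKey().mapValues(sum)` and `rdd.reduceByKey(lambda a, b: a + b)`.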

