Spark for Python Developers

A concise guide to implementing Spark Big Data analytics for Python developers, and building a real-time and insightful trend tracker data-intensive app

Amit Nandi

Book Details

ISBN 13: 9781784399696
Paperback: 206 pages

Book Description

Looking for a cluster computing system that provides high-level APIs? Apache Spark is your answer: an open source, fast, and general-purpose cluster computing system. Spark's multi-stage in-memory primitives can deliver performance up to 100 times faster than Hadoop MapReduce for certain workloads, and it is also well suited to machine learning algorithms.

Are you a Python developer inclined to work with the Spark engine? If so, this book will be your companion as you create a data-intensive app using Spark as a processing engine, Python visualization libraries, and web frameworks such as Flask.

To begin with, you will learn the most effective way to install the Python development environment powered by Spark, Blaze, and Bokeh. You will then find out how to connect with data stores such as MySQL, MongoDB, Cassandra, and Hadoop.

You’ll expand your skills throughout, becoming familiar with the various data sources (GitHub, Twitter, Meetup, and blogs), their data structures, and solutions to effectively tackle complexities. You’ll explore datasets using IPython Notebook and discover how to optimize the data models and pipeline. Finally, you’ll learn how to create training datasets and train machine learning models.

By the end of the book, you will have created a real-time and insightful trend tracker data-intensive app with Spark.

Table of Contents

Chapter 1: Setting Up a Spark Virtual Environment
Understanding the architecture of data-intensive applications
Understanding Spark
Understanding Anaconda
Setting up the Spark powered environment
Building our first app with PySpark
Virtualizing the environment with Vagrant
Moving to the cloud
Summary
Chapter 2: Building Batch and Streaming Apps with Spark
Architecting data-intensive apps
Connecting to social networks
Analyzing the data
Exploring the GitHub world
Previewing our app
Summary
Chapter 3: Juggling Data with Spark
Revisiting the data-intensive app architecture
Serializing and deserializing data
Harvesting and storing data
Exploring data using Blaze
Exploring data using Spark SQL
Summary
Chapter 4: Learning from Data Using Spark
Contextualizing Spark MLlib in the app architecture
Classifying Spark MLlib algorithms
Spark MLlib data types
Machine learning workflows and data flows
Clustering the Twitter dataset
Building machine learning pipelines
Summary
Chapter 5: Streaming Live Data with Spark
Laying the foundations of streaming architecture
Processing live data with TCP sockets
Manipulating Twitter data in real time
Building a reliable and scalable streaming app
Closing remarks on the Lambda and Kappa architecture
Summary
Chapter 6: Visualizing Insights and Trends
Revisiting the data-intensive apps architecture
Preprocessing the data for visualization
Gauging words, moods, and memes at a glance
Geo-locating tweets and mapping meetups
Summary

What You Will Learn

  • Create a Python development environment powered by Spark (PySpark), Blaze, and Bokeh
  • Build a real-time trend tracker data-intensive app
  • Visualize the trends and insights gained from data using Bokeh
  • Generate insights from data using machine learning through Spark MLlib
  • Juggle data using Blaze
  • Create training datasets and train the machine learning models
  • Test the machine learning models on test datasets
  • Deploy the machine learning algorithms and models and scale them for real-time events
