Spark for Python Developers

A concise guide to implementing Spark big data analytics for Python developers and building a real-time, insightful, data-intensive trend tracker app.

Amit Nandi

Book Details

ISBN 13: 9781784399696
Paperback: 206 pages

Book Description

Looking for a cluster computing system that provides high-level APIs? Apache Spark is your answer: an open source, fast, general-purpose cluster computing system. Spark's multi-stage in-memory primitives can deliver performance up to 100 times faster than Hadoop MapReduce for certain workloads, which also makes it well suited to machine learning algorithms.

Are you a Python developer who wants to work with the Spark engine? If so, this book will be your companion as you create a data-intensive app using Spark as the processing engine, together with Python visualization libraries and web frameworks such as Flask.

To begin with, you will learn the most effective way to set up a Python development environment powered by Spark, Blaze, and Bokeh. You will then find out how to connect to data stores such as MySQL, MongoDB, Cassandra, and Hadoop.

You’ll expand your skills throughout, becoming familiar with the various data sources (GitHub, Twitter, Meetup, and blogs), their data structures, and solutions for tackling their complexities. You’ll explore datasets using IPython Notebook and discover how to optimize the data models and pipeline. Finally, you’ll learn how to create training datasets and train machine learning models.

By the end of the book, you will have created a real-time and insightful trend tracker data-intensive app with Spark.

Table of Contents

Chapter 1: Setting Up a Spark Virtual Environment
  • Understanding the architecture of data-intensive applications
  • Understanding Spark
  • Understanding Anaconda
  • Setting up the Spark powered environment
  • Building our first app with PySpark
  • Virtualizing the environment with Vagrant
  • Moving to the cloud
  • Summary
Chapter 2: Building Batch and Streaming Apps with Spark
  • Architecting data-intensive apps
  • Connecting to social networks
  • Analyzing the data
  • Exploring the GitHub world
  • Previewing our app
  • Summary
Chapter 3: Juggling Data with Spark
  • Revisiting the data-intensive app architecture
  • Serializing and deserializing data
  • Harvesting and storing data
  • Exploring data using Blaze
  • Exploring data using Spark SQL
  • Summary
Chapter 4: Learning from Data Using Spark
  • Contextualizing Spark MLlib in the app architecture
  • Classifying Spark MLlib algorithms
  • Spark MLlib data types
  • Machine learning workflows and data flows
  • Clustering the Twitter dataset
  • Building machine learning pipelines
  • Summary
Chapter 5: Streaming Live Data with Spark
  • Laying the foundations of streaming architecture
  • Processing live data with TCP sockets
  • Manipulating Twitter data in real time
  • Building a reliable and scalable streaming app
  • Closing remarks on the Lambda and Kappa architecture
  • Summary
Chapter 6: Visualizing Insights and Trends
  • Revisiting the data-intensive apps architecture
  • Preprocessing the data for visualization
  • Gauging words, moods, and memes at a glance
  • Geo-locating tweets and mapping meetups
  • Summary

What You Will Learn

  • Create a Python development environment powered by Spark (PySpark), Blaze, and Bokeh
  • Build a real-time, data-intensive trend tracker app
  • Visualize the trends and insights gained from data using Bokeh
  • Generate insights from data using machine learning through Spark MLlib
  • Juggle data using Blaze
  • Create training datasets and train machine learning models
  • Test the machine learning models on test datasets
  • Deploy the machine learning algorithms and models and scale them for real-time events
