Data Science with Spark [Video]

Eric Charles

Get started with Spark for data science using this unique video tutorial

Video Details

ISBN 13: 9781786467935
Course Length: 3 hours 20 minutes

Video Description

The real power and value proposition of Apache Spark lie in its speed and its platform for executing data science tasks. What sets Spark apart is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization, allowing data scientists to tackle the complexities that come with raw, unstructured data sets. Spark also aims to ease the transition from working on a single machine to working on a cluster, which makes data science tasks far more agile.

This course is a hands-on technical resource that will help you become comfortable and confident working with Spark for data science. We won't just explore Spark’s data science libraries; we’ll dive deeper and expand on the topics.

This course starts by taking you through Spark and the steps needed to build machine learning applications. You will learn to collect, clean, and visualize data coming from Twitter with Spark Streaming. Then, you will get acquainted with Spark's machine learning algorithms and different machine learning techniques. You will also learn to apply statistical analysis and mining operations to the tweet dataset. Finally, the course ends with ideas on how to perform more advanced analyses, including graph processing. By the end of the course, you will be able to do your job as a data scientist in a very visual way that is comprehensive and appealing to business and other stakeholders.

Style and Approach

This practical, hands-on tutorial covers the fundamentals of Spark needed to get to grips with data science, using a single data set throughout. It covers the next step on the learning curve for those who are comfortable with Spark programming and want to apply Spark in the field of data science.

Table of Contents

Your Spark and Visualization Toolkit
  • The Course Overview
  • Spark: Origins and Ecosystem for Big Data Scientists, the Scala, Python, and R flavors
  • Install Spark on Your Laptop with Docker, or Scale Fast in the Cloud
  • Apache Zeppelin, a Web-Based Notebook for Spark with matplotlib and ggplot2

First Steps with Spark Visualization
  • Manipulating Data with the Core RDD API
  • Using DataFrame, Dataset, and SQL – Natural and Easy!
  • Manipulating Rows and Columns
  • Dealing with File Format
  • Visualizing More – ggplot2, matplotlib, and Angular.js at the Rescue

The Spark Machine Learning Algorithms
  • Discovering spark.ml and spark.mllib - and Other Libraries
  • Wrapping Up Basic Statistics and Linear Algebra
  • Cleansing Data and Engineering the Features
  • Reducing the Dimensionality
  • Pipeline for a Life

Collecting and Cleansing the Dirty Tweets
  • Streaming Tweets to Disk
  • Streaming Tweets on a Map
  • Cleansing and Building Your Reference Dataset
  • Querying and Visualizing Tweets with SQL

Statistical Analysis on Tweets
  • Indicators, Correlations, and Sampling
  • Validating Statistical Relevance
  • Running SVD and PCA
  • Extending the Basic Statistics for Your Needs

Extracting Features from the Tweets
  • Analyzing Free Text from the Tweets
  • Dealing with Stemming, Syntax, Idioms and Hashtags
  • Detecting Tweet Sentiment
  • Identifying Topics with LDA

Mine Data and Share Results
  • Word Cloudify Your Dataset
  • Locating Users and Displaying Heatmaps with GeoHash
  • Collaborating on the Same Note with Peers
  • Create Visual Dashboards for Your Business Stakeholders

Classifying the Tweets
  • Building the Training and Test Datasets
  • Training a Logistic Regression Model
  • Evaluating Your Classifier
  • Selecting Your Model

Clustering Users
  • Clustering Users by Followers and Friends
  • Clustering Users by Location
  • Running KMeans on a Stream

Your Next Data Challenges
  • Recommending Similar Users
  • Analyzing Mentions with GraphX
  • Where to Go from Here

What You Will Learn

  • Understand the Spark programming model and its ecosystem of packages for data science
  • Obtain and clean data before processing it
  • Understand Spark's machine learning algorithms and use them to build a simple pipeline
  • Work with interactive visualization packages in Spark
  • Apply data mining techniques on the available data sets
  • Build a recommendation engine
