![PySpark and AWS: Master Big Data with PySpark and AWS [Video]](https://content.packt.com/V18093/cover_image_small.jpeg)
PySpark and AWS: Master Big Data with PySpark and AWS [Video]
Introduction (Free Chapter)
Introduction to Hadoop, Spark Ecosystems and Architectures
Spark RDDs
- Spark RDDs
- Creating Spark RDD
- Running Spark Code Locally
- RDD Map (Lambda)
- RDD Map (Simple Function)
- Quiz (Map)
- Solution 1 (Map)
- Solution 2 (Map)
- RDD FlatMap
- RDD Filter
- Quiz (Filter)
- Solution (Filter)
- RDD Distinct
- RDD GroupByKey
- RDD ReduceByKey
- Quiz (Word Count)
- Solution (Word Count)
- RDD (Count and CountByValue)
- RDD (saveAsTextFile)
- RDD (Partition)
- Finding Average-1
- Finding Average-2
- Quiz (Average)
- Solution (Average)
- Finding Min and Max
- Quiz (Min and Max)
- Solution (Min and Max)
- Project Overview
- Total Students
- Total Marks by Male and Female Students
- Total Passed and Failed Students
- Total Enrollments per Course
- Total Marks per Course
- Average Marks per Course
- Finding Minimum and Maximum Marks
- Average Age of Male and Female Students
Spark DFs
- Introduction to Spark DFs
- Creating Spark DFs
- Spark Infer Schema
- Spark Provide Schema
- Create DF from RDD
- Rectifying the Error
- Select DF Columns
- Spark DF with Column
- Spark DF with Column Renamed and Alias
- Spark DF Filter Rows
- Quiz (Select, WithColumn, Filter)
- Solution (Select, WithColumn, Filter)
- Spark DF (Count, Distinct, Duplicate)
- Quiz (Distinct, Duplicate)
- Solution (Distinct, Duplicate)
- Spark DF (Sort, OrderBy)
- Quiz (Sort, OrderBy)
- Solution (Sort, OrderBy)
- Spark DF (Group By)
- Spark DF (Group By - Multiple Columns and Aggregations)
- Spark DF (Group By - Visualization)
- Spark DF (Group By - Filtering)
- Quiz (Group By)
- Solution (Group By)
- Quiz (Word Count)
- Solution (Word Count)
- Spark DF (UDFs)
- Quiz (UDFs)
- Solution (UDFs)
- Solution (Cache and Persist)
- Spark DF (DF to RDD)
- Spark DF (Spark SQL)
- Spark DF (Write DF)
- Project Overview
- Project (Count and Select)
- Project (Group By)
- Project (Group By, Aggregations and Order By)
- Project (Filtering)
- Project (UDF and WithColumn)
- Project (Write)
Collaborative Filtering
Spark Streaming
ETL Pipeline
Project - Change Data Capture / Replication Ongoing
- Introduction to Project
- Project Architecture
- Creating RDS MySQL Instance
- Creating S3 Bucket
- Creating DMS Source Endpoint
- Creating DMS Destination Endpoint
- Creating DMS Instance
- MySQL Workbench
- Connecting with RDS and Dumping Data
- Querying RDS
- DMS Full Load
- DMS Replication Ongoing
- Stopping Instances
- Glue Job (Full Load)
- Glue Job (Change Capture)
- Glue Job (CDC)
- Creating Lambda Function and Adding Trigger
- Checking Trigger
- Getting S3 File Name in Lambda
- Creating Glue Job
- Adding Invoke for Glue Job
- Testing Invoke
- Writing Glue Shell Job
- Full Load Pipeline
- Change Data Capture Pipeline
About this video
Python and Apache Spark are two of the hottest technologies in the big data analytics industry, and PySpark brings them together as the Python API for Apache Spark. In this course, you’ll start right from the basics and proceed to advanced levels of data analysis. From cleaning data to building features and implementing machine learning (ML) models, you’ll learn how to execute end-to-end workflows using PySpark.
Throughout the course, you’ll use PySpark to perform data analysis. You’ll explore Spark RDDs, DataFrames, and a bit of Spark SQL, along with the transformations and actions that can be applied to data through these APIs. You’ll also explore the Spark and Hadoop ecosystems and their underlying architectures, and you’ll use the Databricks environment to run your Spark scripts.
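As a quick illustration of the difference between the two APIs mentioned above, here is a minimal, hypothetical PySpark sketch (the app name and sample numbers are made up for this example) that applies the same filter-and-square logic first with RDD transformations and an action, and then with DataFrame column expressions:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical local session; the app name and sample data are illustrative only.
spark = SparkSession.builder.appName("rdd-vs-dataframe-sketch").getOrCreate()
sc = spark.sparkContext

# RDD API: filter and map are lazy transformations; collect is the action
# that actually triggers the computation.
numbers = sc.parallelize([1, 2, 3, 4, 5])
even_squares = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)
print(even_squares.collect())  # [4, 16]

# DataFrame API: the same logic expressed with columns and built-in functions.
df = spark.createDataFrame([(n,) for n in range(1, 6)], ["n"])
df.filter(F.col("n") % 2 == 0).select((F.col("n") * F.col("n")).alias("n_squared")).show()
```

The RDD version works with arbitrary Python functions, while the DataFrame version lets Spark optimize the query plan, which is the trade-off the course returns to repeatedly.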
Finally, you’ll have a taste of Spark with AWS cloud. You’ll see how we can leverage AWS storages, databases, computations, and how Spark can communicate with different AWS services and get its required data.
By the end of this course, you’ll be able to understand and implement the concepts of PySpark and AWS to solve real-world problems.
The code bundles are available here: https://github.com/PacktPublishing/PySpark-and-AWS-Master-Big-Data-with-PySpark-and-AWS
- Publication date: September 2021
- Publisher: Packt
- Duration: 16 hours 10 minutes
- ISBN: 9781803236698