The Ultimate Hands-on Hadoop [Video]

Frank Kane

Tame Your Big Data

Video Details

ISBN 13: 9781788478489
Course Length: 14 hours and 31 minutes

Video Description

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

Style and Approach

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you'll be able to apply Hadoop to real-world problems. Plus, a valuable completion certificate is waiting for you at the end!

Table of Contents

Learn all the buzzwords! And install Hadoop
[Activity] Introduction, and install Hadoop on your desktop!
Hadoop Overview and History
Overview of Hadoop Ecosystem
Tips for Using This Course
Using Hadoop's Core: HDFS and MapReduce
HDFS: What it is, and how it works
[Activity] Install the MovieLens dataset into HDFS using the Ambari UI
[Activity] Install the MovieLens dataset into HDFS using the command line
MapReduce: What it is, and how it works
How MapReduce distributes processing
MapReduce example: Break down movie ratings by rating score
[Activity] Installing Python, MRJob, and nano
[Activity] Code up the ratings histogram MapReduce job and run it
[Exercise] Rank Movies by their popularity
[Activity] Check your results against mine!
Programming Hadoop with Pig
Introducing Ambari
Introducing Pig
Example: Find the oldest movie with 5-star rating using Pig
[Activity] Find old 5-star movies with Pig
More Pig Latin
[Exercise] Find the most-rated one-star movie
Pig Challenge: Compare Your Results to Mine!
Programming Hadoop with Spark
Why Spark?
Resilient Distributed Datasets (RDD)
[Activity] Find the movie with the lowest average rating - with RDDs
Datasets and Spark 2.0
[Activity] Find the movie with the lowest average rating - with DataFrames
[Activity] Movie recommendations with MLlib
[Exercise] Filter the lowest-rated movies by number of ratings
[Activity] Check your results against mine!
Using relational data stores with Hadoop
What is Hive?
[Activity] Use Hive to find the most popular movie
How Hive works
[Exercise] Use Hive to find the movie with the highest average rating
Compare your solution to mine
Integrating MySQL with Hadoop
[Activity] Install MySQL and import our movie data
[Activity] Use Sqoop to import data from MySQL to HDFS/Hive
[Activity] Use Sqoop to export data from Hadoop to MySQL
Using non-relational data stores with Hadoop
Why NoSQL?
What is HBase?
[Activity] Import movie ratings into HBase
[Activity] Use HBase with Pig to import data at scale
Cassandra Overview
[Activity] Installing Cassandra
[Activity] Write Spark output into Cassandra
MongoDB overview
[Activity] Install MongoDB, and integrate Spark with MongoDB
[Activity] Using the MongoDB shell
Choosing a database technology
[Exercise] Choose a database for a given problem
Querying Your Data Interactively
Overview of Drill
[Activity] Setting up Drill
[Activity] Querying across multiple databases with Drill
Overview of Phoenix
[Activity] Install Phoenix and query HBase with it
[Activity] Integrate Phoenix with Pig
Overview of Presto
[Activity] Install Presto, and query Hive with it
[Activity] Query both Cassandra and Hive using Presto
Managing your Cluster
YARN Explained
Tez explained
[Activity] Use Hive on Tez and measure the performance benefit
Mesos explained
ZooKeeper explained
[Activity] Simulating a failing master with ZooKeeper
Oozie explained
[Activity] Set up a simple Oozie workflow
Zeppelin overview
[Activity] Use Zeppelin to analyze movie ratings, part 1
[Activity] Use Zeppelin to analyze movie ratings, part 2
Hue Overview
Other technologies worth mentioning
Feeding Data to your Cluster
Kafka explained
[Activity] Setting up Kafka, and publishing some data
[Activity] Publishing web logs with Kafka
Flume explained
[Activity] Set up Flume and publish logs with it
[Activity] Set up Flume to monitor a directory and store its data in HDFS
Analyzing Streams of Data
Spark Streaming: Introduction
[Activity] Analyze web logs published with Flume using Spark streaming
[Exercise] Monitor Flume-published logs for errors in real time
Exercise solution: Aggregating HTTP access codes with Spark Streaming
Apache Storm: Introduction
[Activity] Count words with Storm
Flink: An Overview
[Activity] Counting words with Flink
Designing Real-World Systems
The Best of the Rest
Review: How the pieces fit together
Understanding your requirements
Sample application: consume web server logs and keep track of top sellers
Sample application: serving movie recommendations to a website
[Exercise] Design a system to report web sessions per day
Exercise solution: Design a system to count daily sessions
Learning More
Books and online resources
Bonus lecture: Discounts on my other big data / data science courses!

What You Will Learn

  • Design distributed systems that manage "big data" using Hadoop and related technologies.
  • Use HDFS and MapReduce for storing and analyzing data at scale.
  • Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
  • Analyze relational data using Hive and MySQL.
  • Analyze non-relational data using HBase, Cassandra, and MongoDB.
  • Query data interactively with Drill, Phoenix, and Presto.
  • Choose an appropriate data storage technology for your application.
  • Understand how Hadoop clusters are managed by YARN, Tez, Mesos, ZooKeeper, Zeppelin, Hue, and Oozie.
  • Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume.
  • Consume streaming data using Spark Streaming, Flink, and Storm.
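
To give a flavor of the MapReduce material, here is a minimal sketch of the "break down movie ratings by rating score" idea in plain Python. It is not the course's own code and it does not use Hadoop or MRJob. The function names, the sample rows, and the tab-separated layout (user ID, movie ID, rating, timestamp) are all assumptions made for illustration. The shuffle step is simulated in memory, where a real cluster would do it across machines.

```python
from collections import defaultdict

# Hypothetical sample rows in the layout assumed here:
# user_id <TAB> movie_id <TAB> rating <TAB> timestamp
SAMPLE_LINES = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
    "298\t474\t4\t884182806",
]


def map_rating(line):
    """Map step: emit one (rating, 1) pair per input line."""
    _user_id, _movie_id, rating, _timestamp = line.split("\t")
    yield rating, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(rating, counts):
    """Reduce step: add up the ones emitted for a single rating value."""
    return rating, sum(counts)


def ratings_histogram(lines):
    """Run map, shuffle, and reduce in sequence over the given lines."""
    pairs = (pair for line in lines for pair in map_rating(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    for rating, count in ratings_histogram(SAMPLE_LINES).items():
        print(rating, count)
```

The mapper and reducer are kept as separate functions because that split is what lets a framework run many mappers in parallel and hand each reducer all the values for one key.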
