Learn By Example: Hadoop, MapReduce for Big Data problems [Video]

Loonycorn

A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel"

Video Details

ISBN 13: 9781788994491
Course Length: 13 hours 44 minutes

Video Description

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel. It is both broad and deep: it covers the individual components of Hadoop in great detail and also gives you a higher-level picture of how they interact with each other. You'll get hands-on with Hadoop very early on and learn how to set up your own cluster using both VMs and the cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort. MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art, and the examples in this course will train you to think in parallel.

Style and Approach

A hands-on workout involving Hadoop and MapReduce.

Table of Contents

Introduction
You, this course and Us
Why is Big Data a Big Deal
The Big Data Paradigm
Serial vs Distributed Computing
What is Hadoop?
HDFS or the Hadoop Distributed File System
MapReduce Introduced
YARN or Yet Another Resource Negotiator
Installing Hadoop in a Local Environment
Hadoop Install Modes
Hadoop Standalone mode Install
Hadoop Pseudo-Distributed mode Install
The MapReduce "Hello World"
The basic philosophy underlying MapReduce
MapReduce - Visualized And Explained
MapReduce - Digging a little deeper at every step
"Hello World" in MapReduce
The Mapper
The Reducer
The Job
Run a MapReduce Job
Get comfortable with HDFS
Run your first MapReduce Job
Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API
Parallelize the reduce phase - use the Combiner
Not all Reducers are Combiners
How many mappers and reducers does your MapReduce have?
Parallelizing reduce using Shuffle And Sort
MapReduce is not limited to the Java language - Introducing the Streaming API
Python for MapReduce
HDFS and Yarn
HDFS - Protecting against data loss using replication
HDFS - Name nodes and why they're critical
HDFS - Checkpointing to backup name node information
Yarn - Basic components
Yarn - Submitting a job to Yarn
Yarn - Plug in scheduling policies
Yarn - Configure the scheduler
MapReduce Customizations For Finer Grained Control
Setting up your MapReduce to accept command line arguments
The Tool, ToolRunner and GenericOptionsParser
Configuring properties of the Job object
Customizing the Partitioner, Sort Comparator, and Group Comparator
The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
The heart of search engines - The Inverted Index
Generating the inverted index using MapReduce
Custom data types for keys - The Writable Interface
Represent a Bigram using a WritableComparable
MapReduce to count the Bigrams in input text
Test your MapReduce job using MRUnit
Input and Output Formats and Customized Partitioning
Introducing the File Input Format
Text And Sequence File Formats
Data partitioning using a custom partitioner
Make the custom partitioner real in code
Total Order Partitioning
Input Sampling, Distribution, Partitioning and configuring these
Secondary Sort
Recommendation Systems using Collaborative Filtering
Introduction to Collaborative Filtering
Friend recommendations using chained MR jobs
Get common friends for every pair of users - the first MapReduce
Top 10 friend recommendation for every user - the second MapReduce
Hadoop as a Database
Structured data in Hadoop
Running an SQL Select with MapReduce
Running an SQL Group By with MapReduce
A MapReduce Join - The Map Side
A MapReduce Join - The Reduce Side
A MapReduce Join - Sorting and Partitioning
A MapReduce Join - Putting it all together
K-Means Clustering
What is K-Means Clustering?
A MapReduce job for K-Means Clustering
K-Means Clustering - Measuring the distance between points
K-Means Clustering - Custom Writables for Input/Output
K-Means Clustering - Configuring the Job
K-Means Clustering - The Mapper and Reducer
K-Means Clustering: The Iterative MapReduce Job
Setting up a Hadoop Cluster
Manually configuring a Hadoop cluster (Linux VMs)
Getting started with Amazon Web Services
Start a Hadoop Cluster with Cloudera Manager on AWS
Appendix
Set up a Virtual Linux Instance (For Windows users)
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables

What You Will Learn

  • Develop advanced MapReduce applications to process Big Data
  • Master the art of thinking parallel and how to break up a task into Map/Reduce transformations
  • Self-sufficiently set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster or in the cloud
  • Use Hadoop + MapReduce to solve a wide variety of problems: from NLP to Inverted Indices to Recommendations
  • Understand HDFS, MapReduce and YARN and how they interact with each other
  • Understand the basics of performance tuning and managing your own cluster
