Taming Big Data with MapReduce and Hadoop - Hands On! [Video]
Introduction, and Getting Started
Understanding MapReduce
- MapReduce Basic Concepts
- A Walkthrough of Rating Histogram Code
- Understanding How MapReduce Scales / Distributed Computing
- Average Friends by Age Example – Part 1
- Average Friends by Age Example – Part 2
- Minimum Temperature by Location Example
- Maximum Temperature by Location Example
- Word Frequency in a Book Example
- Making the Word Frequency Mapper Better with Regular Expressions
- Sorting the Word Frequency Results Using Multi-Stage MapReduce Jobs
- Activity: Design a Mapper and Reducer for Total Spent by Customer
- Activity: Write Code for Total Spent by Customer
- Compare Your Code to Mine – Sort Results by Amount Spent
- Compare Your Code to Mine for Sorted Results
- Combiners
Advanced MapReduce Examples
- Example – Most Popular Movie
- Including Ancillary Lookup Data in the Example
- Example – Most Popular Superhero Part 1
- Example – Most Popular Superhero Part 2
- Example: Degrees of Separation – Concepts
- Degrees of Separation – Preprocessing the Data
- Degrees of Separation – Code Walkthrough
- Degrees of Separation – Running and Analyzing the Results
- Example – Similar Movies Based on Ratings: Concepts
- Similar Movies – Code Walkthrough
- Similar Movies – Running and Analyzing the Results
- Learning Activity – Improving Our Movie Similarities MapReduce Job
Using Hadoop and Elastic MapReduce
- Fundamental Concepts of Hadoop
- The Hadoop Distributed File System (HDFS)
- Apache YARN
- Hadoop Streaming – How Hadoop Runs Your Python Code
- Setting Up Your Amazon Elastic MapReduce Account
- Linking Your EMR Account with MRJob
- Exercise – Run Movie Recommendations on Elastic MapReduce
- Analyze the Results of Your EMR Job
Advanced Hadoop and EMR
- Distributed Computing Fundamentals
- Activity – Running Movie Similarities on Four Machines
- Analyzing the Results of the Four-Machine Job
- Troubleshooting Hadoop Jobs with EMR and MRJob – Part 1
- Troubleshooting Hadoop Jobs – Part 2
- Analyzing One Million Movie Ratings across 16 Machines – Part 1
- Analyzing One Million Movie Ratings across 16 Machines – Part 2
Other Hadoop Technologies
Organizations now have to deal with large amounts of data on a daily basis, and processing that data to extract actionable insights is a major challenge; that is where Hadoop and MapReduce come to the rescue. This course teaches you how to use MapReduce for Big Data processing, with plenty of practical examples and use cases. You will start by understanding the Hadoop ecosystem and the basics of MapReduce. You will then see how MapReduce can be used to process different types of data, whether that means analyzing movie ratings or your social network data. You will also learn how to run MapReduce jobs on Hadoop clusters using Amazon Elastic MapReduce. The course wraps up with an overview of other Hadoop-based technologies such as Hive, Pig, and the in-demand Apache Spark.
- Publication date: September 2016
- Publisher: Packt
- Duration: 4 hours 58 minutes
- ISBN: 9781787125568