Solving 10 Hadoop'able Problems [Video]

More Information
  • Explore the Hadoop big data ecosystem in a nutshell
  • Payment Analyzer: process payment data from an event stream using the streaming API
  • Detect bot traffic using Spark Streaming, make log data queryable, and investigate customer data
  • Supply chain analysis: find top-selling items from a stream and enhance the top-seller items
  • Quantitatively analyze customer churn with DataFrame queries
  • Analyze IoT sensor data streams and how devices respond to system failures
  • Perform high-performance computation with neighborhood aggregations
  • Rank pages using Spark GraphX
  • Threat analysis: analyze weblogs for suspicious activity and anomalies in network traffic
  • Extract information from unstructured text via Spark DataFrames
  • Perform sentiment analysis of posts using logistic regression, and find the author of a post
  • Find out which products users want to buy using the Cloudera Sandbox Toolkit
  • Use movie history to suggest content, and test and experiment with a recommendation engine

The Apache Hadoop ecosystem is a popular and powerful set of tools for solving big data problems. With so many competing tools for processing data, many users want to know which problems are well suited to Hadoop, and how to implement those solutions.

To know which types of problems are Hadoop-able, it helps to start with a basic understanding of Hadoop's core components. You will learn about the ecosystem of tools designed to run on top of Hadoop, as well as software that is deployed alongside it. These tools provide the building blocks for data processing applications. The course first covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. It then presents a number of common problems that Hadoop can solve, as case-study projects. The course is divided into sections by project, each serving as a specific use case for solving big data problems.

By the end of this course, you will have been exposed to a wide variety of Hadoop software and examples of how it is used to solve common big data problems.

Style and Approach

This course is filled with hands-on exercises and implementation and execution techniques to help you solve 10 real-time big data problems. First, you'll get an overview of the Hadoop ecosystem, then set up a development environment and sandbox. Finally, you'll learn big data techniques for solving the problems you come across.

  • Learn how to crack big data projects with a concise tour of the Hadoop ecosystem
  • Implement practical code to solve common business and technical problems
  • Get hands-on solutions to perplexing, real-world big data problems

Course Length: 3 hours 12 minutes
ISBN: 9781788390118
Date of Publication: 28 Feb 2018


Tomasz Lelek

Tomasz Lelek is a software engineer who programs mostly in Java and Scala. He has been working with the Spark and ML APIs for the past 6 years, with production experience in processing petabytes of data. He is passionate about nearly everything associated with software development and believes that we should always consider different solutions and approaches before attempting to solve a problem. Recently, he spoke at conferences in Poland, including Confitura and JDD (Java Developers Day), and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference.