Hadoop Real-World Solutions Cookbook - Second Edition

Product type: Book
Published in: Mar 2016
ISBN-13: 9781784395506
Pages: 290
Edition: 2nd Edition
Author: Tanmay Deshpande

Table of Contents (18) Chapters

Hadoop Real-World Solutions Cookbook Second Edition
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
Getting Started with Hadoop 2.X
Exploring HDFS
Mastering Map Reduce Programs
Data Analysis Using Hive, Pig, and Hbase
Advanced Data Analysis Using Hive
Data Import/Export Using Sqoop and Flume
Automation of Hadoop Tasks Using Oozie
Machine Learning and Predictive Analytics Using Mahout and R
Integration with Apache Spark
Hadoop Use Cases
Index

Chapter 10. Hadoop Use Cases

In this chapter, we'll take a look at the following recipes:

  • Call Data Record analytics

  • Web log analytics

  • Sensitive data masking and encryption using Hadoop

Introduction


Throughout this book, we have discussed Hadoop and its real-world use cases. In this final chapter, we are going to discuss the end-to-end implementation of a few such use cases. The motivation for this chapter is to apply what you've learned in the earlier chapters. We will discuss use cases related to the telecom, finance, and e-commerce domains. So, let's get started.

Call Data Record analytics


Call data records are the data that telecom operators gather about individual customers. In this recipe, we are going to take a look at use cases specific to the telecom domain.

Getting ready

To perform this recipe, you should have a Hadoop cluster up and running. We need some sample data for these use cases; I have written a data generator, which can be used for your reference. You can find it at https://github.com/deshpandetanmay/cdr-data-generator.

How to do it...

Before jumping into the solution, let's first try to understand the problem statement.

Problem statement

Telecom companies keep records of each and every call made by their subscribers. They also keep information such as when a call was made, who it was made to, the start time, end time, and so on. Detailed information, such as SMSes, data sessions, and so on, is also stored by these companies. Here, the problem statement is how we can use this data to make company operations run smoother and help them...
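To make the idea of a call record concrete, here is a minimal sketch in Python of the kind of per-record logic such an analysis might start from. The record layout (caller, callee, start time, end time, separated by commas), the timestamp format, and the function names are all assumptions made for illustration; they are not the format produced by the data generator mentioned earlier.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record layout (an assumption, not the generator's real format):
# caller,callee,start_time,end_time
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_cdr(line):
    """Split one comma-separated record and compute the call duration in seconds."""
    caller, callee, start, end = line.strip().split(",")
    started = datetime.strptime(start, TIME_FORMAT)
    ended = datetime.strptime(end, TIME_FORMAT)
    return caller, callee, int((ended - started).total_seconds())


def total_seconds_per_caller(lines):
    """Sum the call durations for each calling number."""
    totals = defaultdict(int)
    for line in lines:
        caller, _callee, seconds = parse_cdr(line)
        totals[caller] += seconds
    return dict(totals)


# Made-up sample records that follow the assumed layout.
sample = [
    "9800000001,9800000002,2016-03-01 10:00:00,2016-03-01 10:02:30",
    "9800000001,9800000003,2016-03-01 11:00:00,2016-03-01 11:01:00",
    "9800000002,9800000001,2016-03-01 12:00:00,2016-03-01 12:00:45",
]

if __name__ == "__main__":
    print(total_seconds_per_caller(sample))
```

On a cluster, the same parse-then-aggregate shape would map naturally onto a mapper that emits the caller and duration and a reducer that sums them, but that mapping is only one possible design.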

Web log analytics


Web logs are the data generated by the web servers running a website. This use case applies to companies that host their own websites and want to know more about website performance and customer behavior.

Getting ready

To perform this recipe, you should have a Hadoop cluster up and running. I have uploaded some sample web log data, which is available at https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/mylog.txt.

How to do it...

Before jumping into the solution, let's first try to understand the problem statement.

Problem statement

Many companies run their businesses through their websites, and the performance of the website directly affects sales and profitability. Web servers generally log information about the user, browser, IP address, and so on. We can use this information to make the browsing experience smoother for users, which would help increase profitability.
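As an illustration of how such log fields can be pulled apart, here is a minimal sketch in Python that parses log lines and counts requests per IP address. It assumes the logs follow the Apache combined log format; the sample lines, the pattern name, and the function names are made up for this example, and the sample data for this recipe may be laid out differently.

```python
import re
from collections import Counter

# Assumed layout: Apache combined log format
# ip ident user [time] "request" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def requests_per_ip(lines):
    """Count well-formed requests per client IP, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        record = parse_log_line(line)
        if record is not None:
            counts[record["ip"]] += 1
    return counts


# Made-up sample lines that follow the assumed format.
sample_logs = [
    '192.168.1.10 - alice [01/Mar/2016:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '192.168.1.10 - alice [01/Mar/2016:10:00:05 +0000] "GET /cart HTTP/1.1" 200 2048 "http://example.com/index.html" "Mozilla/5.0 (X11; Linux x86_64)"',
    '10.0.0.5 - - [01/Mar/2016:10:01:00 +0000] "GET /missing HTTP/1.1" 404 - "-" "curl/7.43.0"',
    'this line is not a valid log entry',
]

if __name__ == "__main__":
    print(requests_per_ip(sample_logs))
```

Skipping lines that do not match, rather than failing, is a deliberate choice here, since real web logs often contain truncated or corrupted entries.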

Solution

Here, we assume that a company hosting its website on an Apache...

Sensitive data masking and encryption using Hadoop


A lot of companies handle sensitive information such as Social Security numbers (SSNs), names, credit card numbers, and so on. In this recipe, we are going to take a look at how to use Hadoop to mask or encrypt this data in order to secure it. This recipe applies to various domains, such as finance, retail, and telecom, and to anyone who handles critical information.

Getting ready

To perform this recipe, you should have a Hadoop cluster up and running.

How to do it...

Before jumping into the solution, let's first try to understand the problem statement.

Problem statement

Handling sensitive information is a critical part of today's data operations. Here, the problem statement is to transform critical information into masked data or completely encrypted data.
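To show what masking might look like for a single record, here is a minimal sketch in Python. It assumes a comma-separated record containing a name followed by a credit card number, and it keeps only the last four digits visible; the function names, the field layout, and the sample value (a commonly used test card number) are assumptions for illustration only. Encryption is not shown, since it would require a cryptography library and key management that are outside the scope of this sketch.

```python
def mask_card_number(card_number, visible=4, mask_char="*"):
    """Replace every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(1 for ch in card_number if ch.isdigit())
    keep_from = total_digits - visible  # index of the first digit left unmasked
    masked = []
    seen = 0
    for ch in card_number:
        if ch.isdigit():
            masked.append(ch if seen >= keep_from else mask_char)
            seen += 1
        else:
            masked.append(ch)  # keep dashes and spaces as they are
    return "".join(masked)


def mask_record(line):
    """Mask the card number in a hypothetical 'name,card_number' record."""
    name, card_number = line.strip().split(",")
    return name + "," + mask_card_number(card_number)


if __name__ == "__main__":
    print(mask_record("John Smith,4111-1111-1111-1111"))
```

Because the function works on one record at a time and keeps no state, the same logic could be applied independently to each line of a file stored in HDFS.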

Solution

Here, we assume that we already have the data in flat files and that it has been loaded into HDFS.

Let's say we have some sample data, as shown here, which has the name and credit card number of...
