Throughout this book, we have been discussing Hadoop and its real-world use cases. In this final chapter, we are going to discuss the end-to-end implementation of a few such use cases. The aim of this chapter is to apply what you have learned in the earlier chapters. We will discuss use cases related to the telecom, finance, and e-commerce domains. So, let's get started.
Call data records (CDRs) are customer-specific data gathered by telecom operators. In this recipe, we are going to take a look at telecom domain-specific use cases.
To perform this recipe, you should have a Hadoop cluster up and running. We need some sample data for these use cases; I have written a data generator, which can be used for your reference. You can find it at https://github.com/deshpandetanmay/cdr-data-generator.
Before jumping into the solution, let's first try to understand the problem statement:
Telecom companies keep records of each and every call made by their subscribers. They also keep information such as when a call was made, who it was made to, the start time, end time, and so on. Detailed information, such as SMSes, data sessions, and so on, is also stored by these companies. Here, the problem statement is how we can use this data to make company operations run smoother and help them...
Web logs are data generated by the web servers that run a website. This use case applies to companies that host their own websites and want to know more about website performance and customer behavior on the website.
To perform this recipe, you should have a Hadoop cluster up and running. I have uploaded some sample web log data to https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/mylog.txt.
Before jumping into the solution, let's first try to understand the problem statement:
Many companies run their businesses on their websites, so website performance directly affects sales and profitability. Web servers generally log information about the user, browser, IP address, and so on. We can use this information to make the browsing experience smoother for users, which would help increase profitability.
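To make the kind of information a web log carries more concrete, here is a minimal Python sketch that pulls the IP address, user, and browser (user agent) out of a log line and counts hits per IP. It assumes the lines follow the Apache combined log format, which may not match the sample file; the names `LOG_PATTERN`, `parse_line`, and `hits_per_ip` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache combined log format, for example:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://example.com/" "Mozilla/4.08"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_ip(lines):
    """Count requests per client IP, skipping lines that cannot be parsed."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["ip"]] += 1
    return counts
```

On a cluster, the same parsing logic could run inside a Hadoop Streaming mapper, or be replaced by an equivalent regular expression in Hive or Pig; the choice here is only meant to show which fields are available for analysis.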
A lot of companies handle sensitive information such as Social Security numbers (SSNs), names, credit card numbers, and so on. In this recipe, we are going to take a look at how to use Hadoop to mask or encrypt this data in order to secure it. This recipe is relevant to anyone who handles critical information, in domains such as finance, retail, and telecom.
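As a rough illustration of what masking means here, the following Python sketch replaces all but the last four digits of SSNs and card numbers in a line of text. It is a sketch under stated assumptions, not the recipe's implementation: it assumes SSNs are written as `123-45-6789` and card numbers as 16 digits with optional spaces or hyphens, and the names `mask_line` and `mask_lines` are hypothetical.

```python
import re

# Assumption: US-style SSNs written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
# Assumption: 16-digit card numbers, optionally grouped by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}(\d{4})\b")


def mask_line(line):
    """Mask SSNs and card numbers, keeping only the last four digits."""
    line = SSN_PATTERN.sub(r"XXX-XX-\1", line)
    line = CARD_PATTERN.sub(r"XXXX-XXXX-XXXX-\1", line)
    return line


def mask_lines(lines):
    """Lazily mask every line of an iterable, such as a file or stdin."""
    for line in lines:
        yield mask_line(line)
```

Because `mask_lines` accepts any iterable of strings, a Hadoop Streaming mapper could pass it `sys.stdin` and print the results. Note that masking is one-way; if the original values must be recoverable, encryption with a properly managed key would be needed instead.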