Free eBook: Big Data Analytics with Hadoop 3

Big Data Analytics with Hadoop 3
Explore big data concepts, platforms, analytics, and their applications using the power of Hadoop 3

Sridhar Alla, 482 pages, May 2018

Key Features

  • Learn Hadoop 3 to build effective big data analytics solutions on-premises and in the cloud
  • Integrate Hadoop with other big data tools such as R, Python, Apache Spark, and Apache Flink
  • Exploit big data using Hadoop 3 with real-world examples


Apache Hadoop is the most popular platform for big data processing and a foundation for building powerful analytics solutions. This book shows you how to do just that, with the help of practical examples. By the end of this book, you will be well versed in the analytical capabilities of the Hadoop ecosystem and able to use Apache Spark and Apache Flink to perform big data analytics.


Chapter 1


Introduction to Hadoop

We will start by introducing the changes and new features in the Hadoop 3 release. In particular, we will talk about the new features of HDFS and Yet Another Resource Negotiator (YARN), and changes to client applications. We will also install a Hadoop cluster locally and demonstrate t...

Chapter 2


Overview of Big Data Analytics

In this chapter, we will talk about big data analytics, starting with a general point of view and then taking a deep dive into some common technologies used to gain insights into data. This chapter introduces the reader to the process of examining large data sets to uncover patterns in data, gene...

Chapter 3


Big Data Processing with MapReduce

This chapter puts everything we have learned in the book into a practical use case: building an end-to-end pipeline that performs big data analytics using the MapReduce framework.
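
As a rough illustration of the MapReduce model this chapter is built around, the plain-Python sketch below counts words in three steps: map, shuffle, and reduce. It does not use Hadoop's APIs and is not taken from the book; the function names and the sample data are hypothetical, chosen only to show the shape of the computation.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, for illustration only.
sample = ["big data with Hadoop", "big data analytics"]
counts = word_count(sample)
print(counts)
```

On a real cluster the map and reduce steps would run in parallel across many nodes and the framework would perform the shuffle; here everything runs in a single process so the data flow is easy to follow.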

Chapter 4


Scientific Computing and Big Data Analysis with Python and Hadoop

In this chapter, we provide an introduction to Python and analyzing big data using Hadoop and Python packages. We will be looking at a basic Python installation, opening a Jupyter Notebook, and working through some examples.

Chapter 5


Statistical Big Data Computing with R and Hadoop

This chapter provides an introduction to R and how to use R to perform statistical computing on big data using Hadoop. We will see alternatives ranging from open source R on workstations to parallelized commercial products such as Revolution R Enterprise, and many other options in between will pr...

Chapter 6


Batch Analytics with Apache Spark

In this chapter, you will learn about Apache Spark and how to use it for big data analytics based on a batch processing model. Spark SQL is a component on top of Spark Core that can be used to query structured data. It is becoming the de facto tool, replacing Hive as the choice for batch analytic...

Related Titles

Modern Big Data Processing with Hadoop

A comprehensive guide to design, build and execute effective Big Data strategies using Hadoop

Scala and Spark for Big Data Analytics

Harness the power of Scala to program Spark and analyze tonnes of data in the blink of an eye!

Big Data Architect's Handbook

A comprehensive end-to-end guide that gives hands-on practice in big data and Artificial Intelligence