Free eBook: Big Data Analytics with Hadoop 3
Sridhar Alla, 482 pages, May 2018
- Learn Hadoop 3 to build effective big data analytics solutions on-premises and in the cloud
- Integrate Hadoop with other big data tools such as R, Python, Apache Spark, and Apache Flink
- Exploit big data using Hadoop 3 with real-world examples
Description
Apache Hadoop is the most popular platform for big data processing and for building powerful analytics solutions. This book shows you how to do just that, with the help of practical examples. By the end of this book, you will be well versed in the analytical capabilities of the Hadoop ecosystem, using Apache Spark and Apache Flink to perform big data analytics.
Introduction to Hadoop
We will start by introducing the changes and new features in the Hadoop 3 release. In particular, we will talk about the new features of HDFS and Yet Another Resource Negotiator (YARN), and changes to client applications. We will also install a Hadoop cluster locally and demonstrate t...
Overview of Big Data Analytics
In this chapter, we will talk about big data analytics, starting with a general point of view and then taking a deep dive into some common technologies used to gain insights into data. This chapter introduces the reader to the process of examining large data sets to uncover patterns in data, gene...
Big Data Processing with MapReduce
This chapter puts everything we have learned in the book into a practical use case: building an end-to-end pipeline that performs big data analytics using the MapReduce framework.
Scientific Computing and Big Data Analysis with Python and Hadoop
In this chapter, we provide an introduction to Python and to analyzing big data using Hadoop and Python packages. We will look at a basic Python installation, open a Jupyter Notebook, and work through some examples.
Statistical Big Data Computing with R and Hadoop
This chapter provides an introduction to R and how to use R to perform statistical computing on big data using Hadoop. We will see alternatives ranging from open source R on workstations to parallelized commercial products such as Revolution R Enterprise, and many other options in between will pr...
Batch Analytics with Apache Spark
In this chapter, you will learn about Apache Spark and how to use it for big data analytics based on a batch processing model. Spark SQL is a component on top of Spark Core that can be used to query structured data. It is becoming the de facto tool, replacing Hive as the choice for batch analytic...