Modern Big Data Processing with Hadoop

A comprehensive guide to design, build and execute effective Big Data strategies using Hadoop

V. Naresh Kumar, Prashant Shindgikar


Book Details

ISBN 13: 9781787122765
Paperback: 394 pages

Book Description

The complex structure of data these days requires sophisticated solutions for data transformation to make the information more accessible to users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools.

This book will give you a complete understanding of data lifecycle management with Hadoop, followed by modeling of structured and unstructured data in Hadoop. It will also show you how to design real-time streaming pipelines by leveraging tools such as Apache Spark, and how to build efficient enterprise search solutions using Elasticsearch. You will learn to build enterprise-grade analytics solutions on Hadoop and to visualize your data using tools such as Apache Superset. This book also covers techniques for deploying your Big Data solutions on the cloud with Apache Ambari, as well as expert techniques for managing and administering your Hadoop cluster.

By the end of this book, you will have all the knowledge you need to build expert Big Data systems.

Table of Contents

Chapter 1: Enterprise Data Architecture Principles
  • Data architecture principles
  • The importance of metadata
  • Data governance
  • Data security
  • Data as a Service
  • Evolution data architecture with Hadoop
  • Summary
Chapter 2: Hadoop Life Cycle Management
  • Data wrangling
  • Data masking
  • Data security
  • Summary
Chapter 3: Hadoop Design Consideration
  • Understanding data structure principles
  • Installing Hadoop cluster
  • Exploring HDFS architecture
  • Introducing YARN
  • Configuring HDFS high availability
  • Configuration of HA NameNodes with QJM
  • Hadoop cluster composition
  • Best practices Hadoop deployment
  • Hadoop file formats
  • Summary
Chapter 4: Data Movement Techniques
  • Batch processing versus real-time processing
  • Apache Sqoop
  • Flume
  • Apache NiFi
  • Kafka Connect
  • Summary
Chapter 5: Data Modeling in Hadoop
  • Apache Hive
  • Supported datatypes
  • How Hive works
  • Hive architecture
  • Hive data model management
  • JSON documents using Hive
  • Apache HBase
  • Summary
Chapter 6: Designing Real-Time Streaming Data Pipelines
  • Real-time streaming concepts
  • Real-time streaming components
  • Apache Storm
  • Other popular real-time data streaming frameworks
  • Apache Flink versus Spark
  • Apache Spark versus Storm
  • Summary
Chapter 7: Large-Scale Data Processing Frameworks
  • MapReduce
  • Hadoop MapReduce
  • Apache Spark 2
  • Summary
Chapter 8: Building Enterprise Search Platform
  • The data search concept
  • The need for an enterprise search engine
  • Elasticsearch
  • How to index documents in Elasticsearch?
  • Mapping
  • Elasticsearch-supported data types
  • Analyzer
  • Logstash
  • Kibana
  • Use case
  • Summary
Chapter 9: Designing Data Visualization Solutions
  • Data visualization
  • Practical data visualization in Hadoop
  • Summary
Chapter 10: Developing Applications Using the Cloud
  • What is the Cloud?
  • Available technologies in the Cloud
  • Planning the Cloud infrastructure
  • Building a Hadoop cluster in the Cloud
  • Data access in the Cloud
  • Summary
Chapter 11: Production Hadoop Cluster Deployment
  • Apache Ambari architecture
  • Setting up a Hadoop cluster with Ambari
  • Hadoop clusters
  • Summary

What You Will Learn

  • Build an efficient enterprise Big Data strategy centered around Apache Hadoop
  • Gain a thorough understanding of using Hadoop with various Big Data frameworks such as Apache Spark, Elasticsearch and more
  • Set up and deploy your Big Data environment on premises or on the cloud with Apache Ambari
  • Design effective streaming data pipelines and build your own enterprise search solutions
  • Utilize the historical data to build your analytics solutions and visualize them using popular tools such as Apache Superset
  • Plan, set up and administer your Hadoop cluster efficiently
