Hadoop Backup and Recovery Solutions

Learn the best strategies for data recovery from Hadoop backup clusters and troubleshoot problems

Gaurav Barot, Chintan Mehta, Amij Patel

Book Details

ISBN 13: 9781783289042
Paperback: 206 pages

Book Description

Hadoop offers distributed processing of large datasets across clusters and is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance. It enables computing solutions that are scalable, cost-effective, and flexible, and that protect very large data sets against hardware failures.

Starting with the basics of Hadoop administration, this book moves on to the best strategies for backing up distributed storage databases.

You will gradually learn the principles of backup and recovery, discover the common failure points in Hadoop, and see what is involved in backing up Hive metadata. A deep dive into Apache HBase shows the different ways of backing up data and compares them. Going forward, you'll learn how to define recovery strategies for various causes of failure, including failover recoveries, corruption, working drives, and metadata. Also covered are Hadoop metrics and MapReduce. Finally, you'll explore troubleshooting strategies and techniques to resolve failures.

Table of Contents

Chapter 1: Knowing Hadoop and Clustering Basics
Understanding the need for Hadoop
Understanding HDFS design
Understanding the basics of Hadoop cluster
Summary
Chapter 2: Understanding Hadoop Backup and Recovery Needs
Understanding the backup and recovery philosophies
Knowing the necessity of backing up Hadoop
Determining backup areas – what should I back up?
Is taking backup enough?
Summary
Chapter 3: Determining Backup Strategies
Knowing the areas to be protected
Understanding the common failure types
Learning a way to define the backup strategy
Understanding the need for backing up Hive metadata
Summary
Chapter 4: Backing Up Hadoop
Data backup in Hadoop
HBase
Approaches to backing up HBase
Summary
Chapter 5: Determining Recovery Strategy
Knowing the key considerations of recovery strategy
Disaster failure at data centers
Restoring a point-in-time copy for auditing
Restoring a data copy due to user error or accidental deletion
Defining recovery strategy
Summary
Chapter 6: Recovering Hadoop Data
Failover to backup cluster
Importing a table or restoring a snapshot
Pointing the HBase root folder to the backup location
Locating and repairing corruptions
Recovering a drive from the working state
Lost files
The recovery of NameNode
Summary
Chapter 7: Monitoring
Monitoring overview
Metrics of Hadoop
Monitoring node health
Cluster monitoring
Logging
Summary
Chapter 8: Troubleshooting
Understanding troubleshooting approaches
Understanding common failure points
Identifying the root cause
Knowing issue resolution techniques
Summary

What You Will Learn

  • Familiarize yourself with HDFS and daemons
  • Determine backup areas, disaster recovery principles, and backup needs
  • Understand the necessity for Hive metadata backup
  • Discover HBase to explore different backup styles, such as snapshot, replication, copy table, the HTable API, and manual backup
  • Learn the key considerations of a recovery strategy and restore data in the event of accidental deletion
  • Tune the performance of a Hadoop cluster and recover from scenarios such as failover, corruption, working-drive recovery, and NameNode failure
  • Monitor node health, and explore various techniques for checks, including HDFS checks and MapReduce checks
  • Identify common hardware failure points and discover mitigation techniques
