Chapter 6. Backup and Recovery
In this chapter, we will cover the following recipes:
Initiating Namenode saveNamespace
Using HDFS Image Viewer
Fetching parameters which are in-effect
Configuring HDFS and YARN logs
Backing up and recovering Namenode
Configuring Secondary Namenode
Promoting Secondary Namenode to Primary
Namenode recovery
Namenode roll edits – Online mode
Namenode roll edits – Offline mode
Datanode recovery – Disk full
Configuring NFS gateway to serve HDFS
Recovering deleted files
In this chapter, we will configure backup and restore processes, logging, and recovery using the Secondary Namenode. Even with high availability, it is very important to back up data against adverse situations, irrespective of having a Secondary or backup node running and constantly syncing from the Primary node.
In a master-slave architecture, if the slave is syncing data from the master and the data on the master gets corrupted, the slave will most likely pull the same corrupted data, leaving us with two bad copies. Although there are checks in place to catch corrupt data using checksums, for production-critical data there must always be a business continuity or recovery plan.
Initiating Namenode saveNamespace
In our earlier recipes, we configured the Hadoop cluster and went through various concepts of cluster operations. We saw that the Namenode stores the metadata, which is a combination of the fsimage file and the edits file, and that these two files are never merged by the Namenode unless it is restarted or some other node, such as the Secondary Namenode, does this. We will cover the Secondary Namenode later in this chapter.
Whenever the Namenode is started, it applies all the changes in the edits file to the fsimage file and starts with a clean edits file. Depending upon the size of the edits log, it could take a long time to start up the Namenode, and this adds to the total time the Namenode stays in safemode. Safemode, as discussed in the earlier chapters, is not just a factor of the edits file size, but also of the time taken to build up the block map, which is a mapping of blocks to the Datanodes.
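The merge described above can also be triggered manually. A hedged sketch using the standard dfsadmin commands (saveNamespace requires the namespace to be in safemode):

```shell
# Force the Namenode to merge its edits into a fresh fsimage checkpoint.
hdfs dfsadmin -safemode enter     # block namespace changes
hdfs dfsadmin -saveNamespace      # roll edits into fsimage on disk
hdfs dfsadmin -safemode leave     # resume normal operation
```

After this, the Namenode starts from a compact fsimage and a near-empty edits file, shortening the next restart.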
Note
If the edits
file cannot be applied to the fsimage...
It is important to understand how the metadata of the Namenode is stored and what changes to the filesystem have been rolled into the fsimage file. The fsimage file and the edits file cannot be viewed using cat or a vi editor; they need specialized tools.
Hadoop, by default, comes with utilities to view the fsimage file and the edits file and, in this recipe, we will cover how to use these tools.
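As a sketch of how these bundled viewers are typically invoked (the metadata path and transaction-numbered file names below are illustrative, not taken from this cluster):

```shell
# Offline Image Viewer (oiv) dumps the binary fsimage to readable XML;
# Offline Edits Viewer (oev) does the same for an edits segment.
hdfs oiv -p XML -i /data/namenode1/current/fsimage_0000000000000000042 \
         -o /tmp/fsimage.xml
hdfs oev -p xml -i /data/namenode1/current/edits_0000000000000000001-0000000000000000042 \
         -o /tmp/edits.xml
```

The resulting XML files can then be inspected with any text tools.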
To step through the recipe in this section, we need a Hadoop cluster set up and running. The reader is encouraged to understand the Namenode metadata and its location.
Connect to the master1.cyrus.com master node in the cluster and switch to user hadoop.
Confirm the location of the Namenode metadata by looking into the hdfs-site.xml file. There will be an entry similar to the following, which points to the location of the metadata. We will see in a later recipe how to check the value of any parameter without opening the configuration files. To know the location of...
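A typical entry might look like the following (the paths are illustrative, chosen to match the mount points used later in this chapter):

```xml
<!-- Illustrative hdfs-site.xml entry; the paths are assumptions -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/namenode1,file:///data/namenode2</value>
</property>
```

Listing two comma-separated directories makes the Namenode write identical copies of its metadata to both, which is itself a basic backup measure.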
Fetching parameters which are in-effect
In this recipe, we look at how we can fetch the configured parameters in the Hadoop cluster, without going through the files.
The parameters are either default values, defined by files such as hdfs-default.xml, core-default.xml, yarn-default.xml, and so on, or defined explicitly in the configuration files such as hdfs-site.xml, core-site.xml, mapred-site.xml, and a few others. The default files are part of the jars packaged with the distribution, and any changes we make override them.
To step through the recipe, the user needs at least one node in the cluster and needs to make sure that the Hadoop environment variables are in place. It is not necessary to start any of the daemons in the cluster.
Connect to the master1.cyrus.com master node in the cluster and switch to user hadoop.
To find the value of any parameter, use the following command:
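One way to do this, assuming the Hadoop environment variables are in place, is hdfs getconf, which resolves the defaults plus any site overrides without opening a file:

```shell
# Query the in-effect value of any configuration key.
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.replication
```

The same subcommand can also list role hostnames, for example hdfs getconf -namenodes.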
Configuring HDFS and YARN logs
In this recipe, we will configure logs for the HDFS and YARN, which is very important for troubleshooting and diagnosis of job failures.
For larger clusters, it is important to manage logs in terms of disk space usage, ease of retrieval, and performance. It is always recommended to store logs on separate hard disks, preferably RAIDed disks, for performance. The disks used by the Namenode or Datanodes for metadata or HDFS blocks must not be shared with logs.
To complete the recipe, the user must have a running cluster with HDFS and YARN configured and have played around with Chapter 1, Hadoop Architecture and Deployment and Chapter 2, Maintain Hadoop Cluster HDFS to understand things better.
Connect to the master1.cyrus.com master node in the cluster and switch to user hadoop.
By default, the location of HDFS and YARN logs is defined by the settings $HADOOP_LOG_DIR (which defaults to $HADOOP_HOME/logs) and $YARN_LOG_DIR in the files hadoop-env.sh and yarn-env.sh...
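A minimal sketch of such overrides, assuming a dedicated log disk mounted under /var/log (the paths are assumptions, not from this cluster):

```shell
# Add to hadoop-env.sh and yarn-env.sh respectively; point the log
# directories at a disk that is not shared with metadata or blocks.
export HADOOP_LOG_DIR=/var/log/hadoop/hdfs
export YARN_LOG_DIR=/var/log/hadoop/yarn
```

After changing these, the daemons must be restarted for the new log locations to take effect.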
Backing up and recovering Namenode
In this recipe, we will look at how to back up and restore the Namenode. As discussed previously, backup is important despite having high availability, and we will cover some ways to restore from a backup. The backup could be as simple as a copy of the metadata to another system, later copied back onto the new node before starting the Namenode process; alternatively, we can use the import command to point to the backup location and execute a command that copies the contents to the right location with the right permissions.
For this recipe, you will again need a running cluster with HDFS configured in the cluster. Readers are recommended to read the previous recipes in this chapter to understand this recipe better.
Connect to the master1.cyrus.com master node and switch to user hadoop.
For backup, copy the contents of the directory pointed to by dfs.namenode.name.dir to any other location, preferably outside the system. This could be doing a simple scp...
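A hedged sketch of both approaches; the metadata path and the backup host name are assumptions based on this chapter's layout:

```shell
# Back up the live metadata directory to another host.
scp -r /data/namenode1/current backuphost:/backup/namenode/

# To restore on a fresh Namenode, either copy the files back into
# dfs.namenode.name.dir with the right ownership, or point
# dfs.namenode.checkpoint.dir at the backup copy and import it:
hdfs namenode -importCheckpoint
```

The importCheckpoint option loads the image from the checkpoint directory and saves it into the configured name directories, which is useful when the original metadata is lost entirely.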
Configuring Secondary Namenode
In this recipe, we will be configuring the Secondary Namenode, which is a checkpointing node. In the very first recipe of this chapter, we saw that it is critical to manage metadata and keep it clean as often as possible.
The Secondary Namenode can have multiple roles, such as backup node, checkpointing node, and so on. The most common is the checkpointing node, which pulls the metadata from the Namenode, merges the fsimage and edits logs (this is called the checkpointing process), and pushes the rolled copy back to the Primary Namenode.
Make sure that the user has a running cluster with HDFS and has one more node to be used as the Secondary. The master2 node, from the Namenode HA using shared storage recipe in Chapter 4, High Availability, can be used as a Secondary Namenode, or the Secondary Namenode can co-exist with the Primary Namenode.
When running Namenode HA, there is no need to run a Secondary Namenode, as the standby Namenode will do the job...
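The configuration boils down to a few hdfs-site.xml properties; the host and path values below are assumptions matching this chapter's nodes:

```xml
<!-- Illustrative hdfs-site.xml entries for a checkpointing node -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master2.cyrus.com:50090</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///data/secondary</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints -->
</property>
```

With this in place, the daemon is started on master2 and will periodically pull, merge, and push the metadata back.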
Promoting Secondary Namenode to Primary
In this recipe, we will cover how to promote Secondary Namenode to be Primary Namenode.
In production, Datanodes never talk to the Secondary, and only the Primary node knows about the data block mappings. In a non-HA setup, if the Primary Namenode fails, there will be an outage, but we can still reduce the downtime by quickly promoting the Secondary to be the Primary.
For this recipe, make sure you have completed the previous recipe on Secondary Namenode configuration and have a running Secondary Namenode.
Connect to the master2.cyrus.com master node and switch to user hadoop.
The first thing is to check the seen_txid file under the location /data/secondary/current/, to determine up to what point the Secondary is in sync with the Primary.
If the lag is high, it is important that the metadata is copied from the NFS mount of the Primary Namenode. That is the reason for having at least one Primary Namenode metadata directory mounted on NFS.
Change...
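The promotion steps can be sketched roughly as follows; every path here is an assumption, and the exact sequence depends on how clients locate the Namenode:

```shell
# 1. Copy the latest metadata from the Primary's NFS mount into the
#    local metadata directory on master2 (paths are illustrative).
cp -r /mnt/nfs-namenode/current /data/namenode1/
chown -R hadoop:hadoop /data/namenode1

# 2. Point clients at master2 (fs.defaultFS in core-site.xml),
#    then start the Namenode daemon on master2.
hadoop-daemon.sh start namenode
```

Once started, the promoted node rebuilds its block map from Datanode block reports before leaving safemode.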
Namenode recovery
In this recipe, we will cover how to recover the Namenode from a corrupted edits or fsimage file. During the Namenode startup, the edits file is checkpointed into the fsimage file. What if the image is corrupted? Will the Namenode boot up?
In a worst-case scenario, the recovery process salvages as much data as possible and removes any corrupted entries from the metadata so that the Namenode can start up.
For this recipe, make sure you have completed the recipe on Hadoop cluster setup, Chapter 1, Hadoop Architecture and Deployment, and have at least HDFS running perfectly.
Connect to the master1.cyrus.com master node and switch to user hadoop.
While attempting to start the Namenode, it just fails with bad or corrupted blocks and we have no way to bring the Namenode up.
Remember, we cannot run fsck if the Namenode is not running.
As a last resort, we will try to skip the missing or corrupted metadata and start the Namenode with a cleaner image.
To start the Namenode recovery process, use the...
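A hedged sketch of the recovery mode; run it only with the daemon stopped, since it may discard corrupt metadata entries:

```shell
# Start the Namenode in recovery mode; it walks the metadata and
# prompts about entries it cannot apply.
hdfs namenode -recover
# Answer the interactive prompts (or add -force to always take the
# first choice), then start the Namenode normally.
```

Because recovery can drop transactions, it is prudent to back up the metadata directory before running it.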
Namenode roll edits – online mode
In this recipe, we will take a look at how to roll edits and keep the size of the metadata to a minimum. Over a period of time, the number of edits files grows, and the Namenode also keeps old versions of the fsimage file. In a large, busy cluster, the edits files could be large, each of about 1 GB.
This will use up disk space on the Namenode and can cause disk issues in the longer run of the cluster. Also, if the Secondary Namenode is not configured or is not working correctly, these edits files will be large in number, each file holding approximately 1 million transactions. Due to this, the Namenode start time will increase, and the Namenode might not even start if the memory is not sufficient to do the checkpoint operation.
For this recipe, make sure that you have completed the recipe on Hadoop cluster setup, Chapter 1, Hadoop Architecture and Deployment, and have at least HDFS running for a few hours with some data on HDFS.
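On a live cluster, one hedged way to roll the edits is the dfsadmin subcommand shown below, which closes the in-progress edits segment so that the finalized segments can be checkpointed or archived:

```shell
# Close the current edits segment and start a new one.
hdfs dfsadmin -rollEdits
```

This does not by itself merge edits into fsimage; the checkpoint is still performed by the Secondary Namenode or by a manual saveNamespace.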
Namenode roll edits – offline mode
In this recipe, we will look at how to roll edits in offline mode. What if the Namenode disk is completely full and it is not able to start at all? We cannot use the process described in the previous recipe.
For this, we will use another Namenode, mount the NFS share there, and perform the process on that node.
For this recipe, make sure you have completed the recipe on Hadoop cluster setup, Chapter 1, Hadoop Architecture and Deployment, and have one more node to use as just a Namenode.
Connect to the master1.cyrus.com master node and switch to user root.
Unmount the NFS share, which is used as a directory to store the metadata as shown in the following screenshot:
Again, we are keeping the other mount point /data/namenode1 safe this time and just playing around with /data/namenode2.
Mount the NFS share on the master2.cyrus.com node using the following command:
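An illustrative mount command; the NFS server and its exported path are assumptions, so substitute the values used in your shared-storage setup:

```shell
# Mount the shared metadata export at the same local path used
# on the Primary (/data/namenode2 matches this recipe).
mount -t nfs master1.cyrus.com:/export/namenode /data/namenode2
```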
On master2, create a directory owned by...
Datanode recovery – disk full
In this recipe, we will discuss the process to recover a Datanode once it is low on disk space. Usually, Datanodes are assumed to fail in the cluster, but sometimes it is important to know how to recover in the case of a disk being full.
This is a process which we have to perform when the replication factor is set to 1 and we have critical data to recover.
If the disk on the Datanode is bad and it cannot be read due to hardware issues such as controller failure, then we cannot follow this process. On the Datanode, which is low on disk space, we will add a new larger disk and mount it on the Datanode and start the Datanode daemon for the blocks that are available.
One thing we need to know here is how quickly, once we shut down the Datanode, the Namenode sees it as removed from the cluster. Remember, we are not decommissioning the node, but trying to replace the disk and start the Datanode service again, without any movement of the Datanode's blocks.
This could...
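The timing in question is governed by two hdfs-site.xml properties; the values shown are the usual defaults, so treat them as a reference rather than a prescription:

```xml
<!-- How quickly the Namenode marks a Datanode dead -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- seconds between Datanode heartbeats -->
</property>
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value> <!-- milliseconds -->
</property>
<!-- dead-node timeout = 2 * recheck-interval + 10 * heartbeat
     = 2*300s + 10*3s = 630 seconds (10.5 minutes) by default -->
```

As long as the disk replacement finishes inside this window, the Namenode will not schedule re-replication of the node's blocks.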
Configuring NFS gateway to serve HDFS
In this recipe, we will configure the NFS gateway to export HDFS as a filesystem that can be mounted onto another system, so that native operating system commands such as ls, cp, and so on will work efficiently.
As we have seen up to now, HDFS is a filesystem that is not understood by Linux shell commands such as cp, ls, and mkdir, so the user must use Hadoop commands to perform file operations.
Make sure that the user has a running cluster with at least HDFS configured and working perfectly. Users are expected to have a basic knowledge about Linux and NFS server. The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local filesystem.
Connect to the master1.cyrus.com master node and switch to user hadoop.
The user running the NFS gateway must be able to proxy all the users using the NFS mounts. Edit the core-site.xml file and add the configuration lines as shown here:
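A minimal sketch of the proxy-user entries, assuming the gateway runs as user hadoop; the wildcard values are permissive and should be tightened for production:

```xml
<!-- Allow the gateway user to impersonate NFS clients -->
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value> <!-- restrict to specific groups in production -->
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value> <!-- restrict to the gateway hosts in production -->
</property>
```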
Recovering deleted files
In this recipe, we will look at how we can recover deleted files from the Hadoop cluster. What if the user deletes a critical file with the -skipTrash option? Can it be recovered?
This recipe is more of a best effort to restore files after deletion. When the delete command is executed, the Namenode updates its metadata in the edits file and then fires the invalidate command to remove the blocks. If the cluster is very busy, the invalidation might take time, and we may be able to recover the files. But on an idle cluster, if we delete the files, the Namenode will immediately fire the invalidate command in response to the Datanode heartbeat, and as the Datanode does not have any pending operations to do, it will delete the blocks.
Make sure that the user has a running cluster with at least HDFS configured and working perfectly.
Connect to the master1.cyrus.com master node and switch to user hadoop.
Create any file and copy it to HDFS. Then, delete that file using...
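For the common case where -skipTrash was not used, a hedged example of recovery via the trash directory (file names are assumptions, and this relies on fs.trash.interval being non-zero):

```shell
# Put a file into HDFS, delete it, then restore it from trash.
hdfs dfs -put /etc/hosts /user/hadoop/important.txt
hdfs dfs -rm /user/hadoop/important.txt          # moved to .Trash
hdfs dfs -mv /user/hadoop/.Trash/Current/user/hadoop/important.txt \
             /user/hadoop/important.txt          # restored
```

With -skipTrash there is no trash copy at all, and recovery becomes the race against block invalidation described above.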