You're reading from Monitoring Hadoop

Product type: Book
Published in: Apr 2015
Publisher: Packt Publishing
ISBN-13: 9781783281558
Edition: 1st Edition
Author: Aman Singh

Gurmukh Singh is a seasoned technology professional with over 14 years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He has authored Monitoring Hadoop, published by Packt Publishing.

Chapter 4. HDFS Checks

The Hadoop Distributed File System (HDFS) is an important component of the cluster. The File System must be in a clean state at all times, and the components related to it must be healthy.

In this chapter, we will look at HDFS checks using the Hadoop commands, and we will also discuss how to set up Nagios monitoring for them.

The following topics will be covered in this chapter:

  • Replication consistency

  • Space utilization

  • CPU utilization

  • NameNode health checks

  • Number of DataNodes in a cluster

HDFS overview


HDFS is a distributed File System designed for robustness by keeping multiple copies of each block across the File System. The metadata for the File System is stored on the NameNode, and the actual data blocks are stored on the DataNodes. For a healthy File System, the metadata must be consistent, the DataNode blocks must be clean, and replication must be consistent. Let's look at each of these one by one and learn how they can be monitored. Control communication between the NameNode and the DataNodes uses RPC, while bulk data transfer uses HDFS's own streaming transfer protocol over TCP; HDFS is also accessible over HTTP through interfaces such as WebHDFS.
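Besides RPC, the NameNode publishes its internal metrics as JSON over HTTP at the /jmx endpoint of its web UI, which is convenient for building monitors. The following is a minimal Python sketch of extracting health figures from such a response; the payload below is a made-up sample, and while the FSNamesystemState bean and field names follow Hadoop's JMX metrics, their availability varies by Hadoop version:

```python
import json

# Made-up sample of a NameNode /jmx response; real payloads contain many
# more beans and fields (names follow the FSNamesystemState MBean).
SAMPLE_JMX = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystemState",
      "CapacityTotal": 1000000000,
      "CapacityUsed": 250000000,
      "NumLiveDataNodes": 3,
      "NumDeadDataNodes": 1,
      "UnderReplicatedBlocks": 12
    }
  ]
}
"""

def parse_fsnamesystem(jmx_json):
    """Extract basic HDFS health figures from a NameNode /jmx response."""
    beans = json.loads(jmx_json)["beans"]
    state = next(b for b in beans if b["name"].endswith("FSNamesystemState"))
    used_pct = 100.0 * state["CapacityUsed"] / state["CapacityTotal"]
    return {
        "live_datanodes": state["NumLiveDataNodes"],
        "dead_datanodes": state["NumDeadDataNodes"],
        "under_replicated": state["UnderReplicatedBlocks"],
        "used_pct": round(used_pct, 1),
    }

health = parse_fsnamesystem(SAMPLE_JMX)
print(health)
```

In a real check, the JSON would be fetched from the NameNode web port instead of a string, and the returned figures compared against alert thresholds.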

  • HDFS checks: Hadoop natively provides commands to verify the File System. These commands must be run as the user that the HDFS daemons run as — usually hdfs, though it can be any other user — but never as root. To run these commands, the PATH variable must be set to include the path to the Hadoop binaries.

    • hadoop dfsadmin -report: This command provides an extensive report of the HDFS...
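The textual report produced by this command can also be parsed to drive simple alerts before full Nagios integration is in place. Here is a minimal Python sketch, assuming a report in the format below; the excerpt and the 80% warning threshold are illustrative, and the exact field wording varies between Hadoop releases:

```python
import re

# Hypothetical excerpt of `hadoop dfsadmin -report` output.
SAMPLE_REPORT = """\
Configured Capacity: 1000000000 (953.67 MB)
Present Capacity: 900000000 (858.31 MB)
DFS Remaining: 650000000 (619.89 MB)
DFS Used: 250000000 (238.42 MB)
DFS Used%: 27.78%
Under replicated blocks: 12
"""

def parse_report(text):
    """Pull the leading numeric value out of each report field."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"([^:]+):\s+([\d.]+)", line)
        if m:
            fields[m.group(1)] = float(m.group(2))
    return fields

stats = parse_report(SAMPLE_REPORT)
# Raise a warning when space usage crosses a threshold (80% is an example)
warn = stats["DFS Used%"] > 80.0
```

The same parsed figures can feed the replication-consistency and space-utilization checks discussed in this chapter.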

Nagios master configuration


As discussed in Chapter 1, Introduction to Monitoring, Nagios is a monitoring platform that works very well for Hadoop monitoring needs. Let's see how to configure Nagios for the Hadoop service checks.

On the Nagios server, called mnode, we need to set up the service definitions, the command definitions, and the host definitions, as shown here. These definitions enable the checks, and by using them we can gather the status of a service or a node. The plugins need to be downloaded and installed from http://www.nagios.org/download.

  • HDFS space check: Check the HDFS space usage on the cluster.

    define command{
      command_name check_hadoop_space
      command_line $USER1$/check_hadoop_namenode.pl -H $HOSTADDRESS$ -u $USER8$ -P $ARG1$ -s $ARG2$ -w $ARG3$ -c $ARG4$
    }

    define host {
      use hadoop-server
      host_name hadoopnode1
      alias Remote
      address 192.168.0.1
      contact_groups admins
    }

    Service definition:

    define service {
      use generic-service
      service_description...
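For completeness, a full service definition that ties the command to the host could look like the following sketch. The service description, port, and threshold arguments are illustrative; the bang-separated values map positionally to the $ARG1$ through $ARG4$ macros in the command definition:

```
define service {
  use                 generic-service
  host_name           hadoopnode1
  service_description HDFS Space
  check_command       check_hadoop_space!50070!space!80!90
  contact_groups      admins
}
```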

The Nagios client configuration


Every Hadoop node, whether a NameNode, DataNode, or ZooKeeper node, is a client of the Nagios server. Each node must have the NRPE plugin installed, with the check scripts placed under /usr/local/nagios/libexec and the commands specified in /usr/local/nagios/etc/nrpe.cfg, as shown here:

command[check_balancer]=/usr/local/nagios/libexec/check_hadoop_namenode.pl -H localhost -u hdfs -P 50070 -b $ARG1$
command[check_zkp]=/usr/local/nagios/libexec/check_zkpd

Similarly, entries need to be made for each check that is executed on the nodes.

In addition to the aforementioned plugins, checks must be in place for hardware, disk, CPU, and memory. You should check the number of processes running on a system by using the check_procs plugin, and check the open ports by using check_tcp. Make sure that all the nodes have ntp running and that the time is synced, by using check_ntp. All of these are provided as standard Nagios plugins, and they must be placed on each...
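These hardware and system checks can be wired into the same nrpe.cfg on each node. A sketch of such entries follows; the thresholds, the 8020 NameNode RPC port, and the NTP host are illustrative values that should be adapted to the cluster:

```
# Process count sanity check (warning/critical thresholds are examples)
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 250 -c 400
# Verify the NameNode RPC port is listening (8020 is a common default)
command[check_nn_port]=/usr/local/nagios/libexec/check_tcp -H localhost -p 8020
# Confirm the node's clock is in sync with an NTP server
command[check_ntp]=/usr/local/nagios/libexec/check_ntp_time -H pool.ntp.org -w 0.5 -c 1
```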

Summary


In this chapter, we looked at how to set up monitoring for the HDFS components, such as HDFS space utilization, the number of DataNodes in a cluster, heap usage, replication, and the ZooKeeper state. In the next chapter, we will look at checks and monitoring for the MapReduce components, such as the JobTracker, the TaskTracker, and the various utilization parameters.
