Monitoring Hadoop

Monitoring Hadoop: Get to grips with the intricacies of Hadoop monitoring using the power of Ganglia and Nagios

By Aman Singh
Product Details


Publication date : Apr 28, 2015
Length 100 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781783281558

Chapter 1. Introduction to Monitoring

In any enterprise, no matter how big or small, it is very important to monitor the health of all its components, such as servers, network devices, and databases, and to make sure that things are working as intended. Monitoring is critical for any business that depends on its infrastructure, because it raises the signals that trigger the necessary actions when something fails.

In a real production environment, monitoring can be very complex with many components and configurations. There might be different security zones, different ways in which servers are set up, or the same database might have been used in many different ways with servers listening to various service ports.

Before diving into setting up monitoring and logging for Hadoop, it is very important to understand the basics of monitoring, how it works, and some commonly used tools in the market. In Hadoop, we can monitor the resources, services, and also collect the metrics of various Hadoop counters. In this book, we will be looking at monitoring and metrics collection.

In this chapter, we will begin our journey by exploring the open source monitoring tools that we use in enterprises, and learn how to configure them.

The following topics will be covered in this chapter:

  • Some of the widely used monitoring tools

  • Installing and configuring Nagios

  • Installing and configuring Ganglia

  • Understanding how system logging works

The need for monitoring


If we have tested our code and found that the functionality and everything else is fine, then why do we need monitoring?

The production load might be different from the load we tested with. There could be human errors in day-to-day operations: someone could execute a wrong command or add a wrong configuration. There could also be hardware or network failures that make your application unavailable. How long can you afford to keep the application down? Perhaps a few minutes or a few hours, but what about the revenue loss, or what if it is a critical application that carries out financial transactions? We need to respond to failures as quickly as possible, and that is only possible if we detect them early and send out notifications.

The monitoring tools available in the market


Many monitoring tools are available in the market, but the important things to keep in mind when choosing one are as follows:

  • How easy it is to deploy and maintain the tool

  • The license costs, but more importantly the TCO (Total Cost of Ownership)

  • Whether it can perform standard checks, and how easy it is to write custom plugins

  • Overhead in terms of CPU and memory usage

  • User interface

Some of the monitoring tools available in the market are BandwidthD, EasyNetMonitor, Zenoss, NetXMS, Splunk, and many more.

Of the many tools available, Nagios and Ganglia are the most widely deployed for monitoring Hadoop clusters. Many Hadoop vendors, such as Cloudera and Hortonworks, use Nagios and Ganglia for monitoring their clusters.

Nagios

Nagios is a powerful monitoring system that provides you with instant awareness about your organization's mission-critical IT infrastructure.

By using Nagios, you can do the following:

  • Plan release cycles and rollouts before things become outdated

  • Detect problems early, before they cause an outage

  • Automate and improve the response across the organization

  • Find bottlenecks in the infrastructure that could impact the SLAs

Nagios architecture

The Nagios architecture was designed with flexibility and scalability in mind. It consists of a central server, referred to as the Monitoring Server, and clients, the Nagios agents, which run on each node that needs to be monitored.

Checks can be performed for services, ports, memory, disk, and so on, using either active or passive checks. Active checks are initiated by the Nagios server, and passive checks are initiated by the client. Nagios is also flexible enough to offer programmable APIs and customizable plugins for monitoring.
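To make the plugin idea concrete, here is a minimal sketch of a custom check written as a shell function. It follows the usual Nagios plugin convention: print one status line, optionally followed by performance data after a "|", and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The name check_usage and its arguments are hypothetical; a real plugin would be a standalone script in /usr/local/nagios/libexec that measures the value itself instead of receiving it as an argument.

```shell
# check_usage VALUE WARN CRIT
# Hypothetical custom check: compares a usage percentage against the
# warning and critical thresholds and reports the result the way Nagios
# plugins do (one status line on stdout plus an exit code).
check_usage() {
  value=$1 warn=$2 crit=$3
  # Anything that is not a plain non-negative integer is reported as UNKNOWN.
  case $value in
    ''|*[!0-9]*) echo "USAGE UNKNOWN - invalid value '$value'"; return 3 ;;
  esac
  if [ "$value" -ge "$crit" ]; then
    status=CRITICAL rc=2
  elif [ "$value" -ge "$warn" ]; then
    status=WARNING rc=1
  else
    status=OK rc=0
  fi
  # Text before "|" is the status text; text after it is performance data.
  echo "USAGE $status - ${value}% used | usage=${value}%;${warn};${crit}"
  return "$rc"
}
```

For example, check_usage 85 80 90 prints "USAGE WARNING - 85% used | usage=85%;80;90" and returns 1, which Nagios would display as a WARNING state.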

Prerequisites for installing and configuring Nagios

Nagios is an enterprise-class monitoring solution that can manage a large number of nodes. It scales easily, and you can write custom plugins for your applications. Nagios is quite flexible and powerful, and it supports many configurations and components.

Tip

Nagios is such a vast and extensive product that this chapter is in no way a reference manual for it. This chapter is written with the primary aim of setting up monitoring, as quickly as possible, and familiarizing the readers with it.

Prerequisites

Always set up a separate host as the monitoring node/server and do not install other critical services on it. The number of hosts that are monitored can be a few thousand, with each host having from 15 to 20 checks that can be either active or passive.
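As a rough sizing aid, the average check load on the monitoring server is the number of hosts multiplied by the checks per host, divided by the check interval. The sketch below assumes that every check runs exactly once per interval; the 5-minute interval used in the example is an assumption for illustration, not a Nagios requirement.

```shell
# checks_per_second HOSTS CHECKS_PER_HOST INTERVAL_MINUTES
# Rough average number of checks the monitoring server must run per second,
# assuming every check is scheduled once per interval (integer result).
checks_per_second() {
  hosts=$1 checks=$2 interval_min=$3
  echo $(( hosts * checks / (interval_min * 60) ))
}
```

With 2,000 hosts, 20 checks per host, and a 5-minute interval, checks_per_second 2000 20 5 prints 133: 40,000 checks spread over 300 seconds.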

Before starting the installation of Nagios, make sure that Apache HTTP Server version 2.0 is running and that gcc and gd are installed. Make sure that you are logged in as root or as a user with sudo privileges. Nagios runs on many platforms, such as RHEL, Fedora, Windows, and CentOS; in this book, we will use CentOS 6.5. You can check the prerequisites with the following commands:

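$ ps -ef | grep httpd        # Is the Apache HTTP Server running?
$ service httpd status
$ rpm -q gcc                 # Is gcc installed?
$ rpm -q gd                  # "rpm -qa | grep gd" would also match unrelated packages such as gdbm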

Installing Nagios

Let's look at how to install and set up Nagios. The following steps are for RHEL, CentOS, Fedora, and Ubuntu:

  • Download Nagios and the Nagios plugins from the Nagios repository at http://www.nagios.org/download/. You can also download them from http://sourceforge.net/ or from other commercial sites, but a few sites might ask you to register.

  • The latest stable version of Nagios at the time of writing this chapter was nagios-4.0.8.tar.gz.

  • Create a nagios user to manage the Nagios interface. You have to execute the commands either as root or with sudo privileges. If you prefer, you can then download the files directly into this user's home directory.

    $ sudo /usr/sbin/useradd -m nagios
    $ sudo passwd nagios
    
  • Create a new nagcmd group so that external commands can be submitted through the web interface, and add both the nagios user and the Apache user to this group (on Ubuntu, the Apache user is usually www-data rather than apache):

    $ sudo /usr/sbin/groupadd nagcmd
    $ sudo /usr/sbin/usermod -a -G nagcmd nagios
    $ sudo /usr/sbin/usermod -a -G nagcmd apache
    

Let's start with the configuration.

Navigate to the directory where the package was downloaded; this could be your Downloads folder or the present working directory. Then extract the package and run the configure script:

$ tar zxvf nagios-4.0.8.tar.gz
$ cd nagios-4.0.8/
$ ./configure --with-command-group=nagcmd

Tip

On Red Hat, the ./configure command might not work and might hang while displaying a message. As a workaround, add --enable-redhat-pthread-workaround to the ./configure command, as follows:

$ ./configure --with-command-group=nagcmd --enable-redhat-pthread-workaround

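Next, compile Nagios and install the binaries, the init script, the sample configuration files, and the permissions for the external command directory:

$ make all && sudo make install && sudo make install-init
$ sudo make install-config && sudo make install-commandmode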

Web interface configuration

  • After installing Nagios, we need to do a minimal level of configuration. Explore the /usr/local/nagios/etc directory for a few samples.

  • Update /usr/local/nagios/etc/objects/contacts.cfg with the e-mail address at which you want to receive the alerts.

  • Secondly, we need to configure the web interface through which we will monitor and manage the services. Install the Nagios web configuration file in the Apache configuration directory using the following command:

    $ sudo make install-webconf
    
  • The preceding command works only in the directory into which you extracted the Nagios source. Make sure that you have extracted Nagios from the TAR file and are in that directory.

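  • Create a nagadm account for logging in to the Nagios web interface using the following command. Note that the sample /usr/local/nagios/etc/cgi.cfg grants access only to the nagiosadmin user by default, so either name the account nagiosadmin or add nagadm to the authorized_for_* directives in that file: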

    $ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagadm
    
  • Restart Apache so that it reads the changes, and restart Nagios, using the following commands:

    $ sudo service httpd restart
    $ sudo /etc/init.d/nagios restart
    
  • Open http://localhost/nagios/ in any browser on your machine.

If the right panel shows a message such as "Return code of 127 is out of bounds - plugin may be missing", your configuration is correct so far. The message only indicates that the Nagios plugins are missing; we will install them in the next step.
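The number 127 is not specific to Nagios: it is the exit status a shell reports when the command it was asked to run does not exist, while Nagios plugins are only expected to return 0 to 3. The run_check helper below is a hypothetical sketch (not part of Nagios) that runs a plugin and translates its exit status into a state name, so you can reproduce the message by hand before the plugins are installed.

```shell
# run_check COMMAND [ARGS...]
# Hypothetical helper: runs a plugin and translates its exit status into a
# Nagios state name. It returns the plugin's own exit status unchanged.
run_check() {
  rc=0
  output=$("$@" 2>/dev/null) || rc=$?
  case $rc in
    0) state=OK ;;
    1) state=WARNING ;;
    2) state=CRITICAL ;;
    3) state=UNKNOWN ;;
    # 127 means "command not found", for example a plugin that is not installed.
    *) echo "Return code of $rc is out of bounds"; return "$rc" ;;
  esac
  # Only the first line of the plugin output is used as the status text.
  printf '%s: %s\n' "$state" "$(printf '%s\n' "$output" | head -n 1)"
  return "$rc"
}
```

Before the plugins are installed, run_check /usr/local/nagios/libexec/check_load -w 5,4,3 -c 10,8,6 prints "Return code of 127 is out of bounds"; after installation, the same call reports the load state instead.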

Nagios plugins

Nagios provides many useful plugins, such as check_disk and check_load, to get us started with monitoring all the basics, and we can write our own custom checks and use them alongside these plugins. Download the latest stable version of the plugins and extract it. The following commands extract and install the Nagios plugins:

$ tar zxvf nagios-plugins-2.x.x.tar.gz
$ cd nagios-plugins-2.x.x/
$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios
$ make && sudo make install

After installing the core and the plugin packages, we are ready to start Nagios.

Verification

Before starting the Nagios service, make sure that there are no configuration errors by using the following command:

$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Start the Nagios service and enable it at boot time by using the following commands:

$ sudo service nagios start
$ sudo chkconfig --add nagios && sudo chkconfig nagios on

Configuration files

There are many configuration files in Nagios, but the major ones are located under the /usr/local/nagios/etc directory:

Configuration file    Description
nagios.cfg            Controls Nagios behavior and contains the global directives.
cgi.cfg               The user interface (CGI) configuration file.
resource.cfg          Holds sensitive information, such as passwords; to safeguard it, the file is readable only by the nagios user.

The other configuration files under the /usr/local/nagios/etc/objects directory are described as follows:

Configuration file    Description
contacts.cfg          The list of users who need to be notified of alerts.
commands.cfg          Defines all the commands used to check services. Macros can be used for command substitution.
localhost.cfg         A baseline file for defining the other hosts that you want to monitor.

The nagios.cfg file under /usr/local/nagios/etc/ is the main configuration file. Its directives specify which other configuration files are included, for example, cfg_file=<file_name> for a single file and cfg_dir=<directory_name> for a whole directory.

With cfg_dir, Nagios recursively processes all the configuration files (those ending in .cfg) in the directory that you specify and in its subdirectories, as follows:

cfg_dir=/usr/local/nagios/etc/commands
cfg_dir=/usr/local/nagios/etc/services
cfg_dir=/usr/local/nagios/etc/hosts
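To see which object files these directives pull in, you can expand them yourself. The list_cfg_sources function below is a hypothetical helper, not part of Nagios: it prints every cfg_file entry and, for every cfg_dir entry, every file ending in .cfg below that directory. Running Nagios in verification mode (nagios -v) remains the authoritative check.

```shell
# list_cfg_sources NAGIOS_CFG
# Hypothetical helper: prints the object configuration files that the
# cfg_file and cfg_dir directives in NAGIOS_CFG refer to.
list_cfg_sources() {
  # Commented-out directives (lines starting with "#") are not matched.
  grep -E '^[[:space:]]*cfg_(file|dir)=' "$1" |
  while IFS='=' read -r key value; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    if [ "$key" = cfg_file ]; then
      echo "$value"
    elif [ -d "$value" ]; then
      # cfg_dir: every *.cfg file in the directory and its subdirectories.
      find "$value" -type f -name '*.cfg' | sort
    fi
  done
}
```

For example, list_cfg_sources /usr/local/nagios/etc/nagios.cfg lists the host, service, and command files that would be read, while files such as README.txt inside a cfg_dir directory are skipped.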

Setting up monitoring for clients

The Nagios server can use active or passive checks. If the Nagios server initiates a check itself, it is an active check; if the result is submitted to the server by the client or another external process, it is a passive check.
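A passive result reaches Nagios as an external command written to its command file, which is why the nagcmd group was created earlier. The sketch below assumes a source installation with the command file at /usr/local/nagios/var/rw/nagios.cmd and check_external_commands enabled in nagios.cfg. The submit_passive_result name is hypothetical; the PROCESS_SERVICE_CHECK_RESULT line format is Nagios's own.

```shell
# submit_passive_result HOST SERVICE RETURN_CODE OUTPUT [COMMAND_FILE]
# Hypothetical helper: writes one passive service check result to the
# Nagios external command file (a named pipe on a running server).
submit_passive_result() {
  # Assumed default location for a source installation.
  cmd_file=${5:-/usr/local/nagios/var/rw/nagios.cmd}
  now=$(date +%s)
  # Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
    "$now" "$1" "$2" "$3" "$4" >> "$cmd_file"
}
```

For example, submit_passive_result remotehost "Root Partition" 0 "DISK OK" reports an OK result for the Root Partition service on remotehost; the user running it must be allowed to write to the command file.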

The following are the steps for setting up monitoring for clients:

  1. Download the NRPE add-on from http://www.nagios.org and install the check_nrpe plugin on the Nagios server.

  2. Create host and service definitions for the host to be monitored in a new configuration file for that particular group of nodes, such as /usr/local/nagios/etc/objects/clusterhosts.cfg, and include it from nagios.cfg with a cfg_file directive.

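Tip

Configuring a disk check

Host definition sample:

define host {
  use             linux-server
  host_name       remotehost
  alias           Remote Host
  address         192.168.0.1
  contact_groups  admins
}

Service definition sample:

define service {
  use                  generic-service
  host_name            remotehost
  service_description  Root Partition
  contact_groups       admins
  check_command        check_nrpe!check_disk
}

The check_nrpe!check_disk syntax requires a check_nrpe command definition, for example in commands.cfg, that passes its first argument to the plugin's -c option:

define command {
  command_name  check_nrpe
  command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}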

Communication among the NRPE components works as follows:

  • The check_nrpe plugin on the Nagios server asks the NRPE daemon on the remote host to execute the check

  • The NRPE daemon runs the corresponding local plugin and returns the result to the Nagios server

On each of the client hosts, perform the following steps:

  1. Install the Nagios Plugins and the NRPE addon, as explained earlier.

  2. Create an account for running the Nagios plugins; any username works, but nagios is used here:

    [client]# useradd nagios; passwd nagios
    
  3. Install the Nagios plugins with LDFLAGS set:

    [client]# tar xvfz nagios-plugins-2.x.x.tar.gz; cd nagios-plugins-2.x.x/
    [client]# export LDFLAGS=-ldl
    [client]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
    [client]# make && make install
    
  4. Give the nagios user ownership of the directories where Nagios was installed:

    [client]# chown nagios:nagios /usr/local/nagios
    [client]# chown -R nagios:nagios /usr/local/nagios/libexec/
    
  5. Install NRPE and run it as a daemon:

    [client]# tar xvfz nrpe-2.x.tar.gz; cd nrpe-2.x
    [client]# ./configure && make all && make install-plugin && make install-daemon && make install-daemon-config && make install-xinetd
    
  6. Add the IP address of the Nagios server to the only_from line in the /etc/xinetd.d/nrpe file, and then restart the xinetd service:

    [client]# service xinetd restart
    
  7. Modify the /usr/local/nagios/etc/nrpe.cfg configuration file:

     command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
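When check_nrpe asks a client for check_disk, the NRPE daemon looks up the command[check_disk] entry in its nrpe.cfg and runs that command line. The nrpe_lookup function below is a hypothetical sketch that mimics only this lookup; it can be handy for confirming that a command name used in a service definition actually exists in a client's nrpe.cfg.

```shell
# nrpe_lookup NRPE_CFG COMMAND_NAME
# Hypothetical helper: prints the command line that nrpe.cfg defines for
# COMMAND_NAME, or fails if no command[COMMAND_NAME]= entry exists.
nrpe_lookup() {
  awk -v name="$2" '
    index($0, "command[" name "]=") == 1 {
      print substr($0, length("command[" name "]=") + 1)
      found = 1
      exit
    }
    END { exit !found }' "$1"
}
```

For example, nrpe_lookup /usr/local/nagios/etc/nrpe.cfg check_disk prints the check_disk command line configured in step 7. From the Nagios server, /usr/local/nagios/libexec/check_nrpe -H 192.168.0.1 -c check_disk then runs that command on the client over the network.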
    

After getting a good insight into Nagios, we are ready to understand its deployment in the Hadoop clusters.

The second tool that we will look into is Ganglia. It is a beautiful tool for aggregating statistics and plotting them nicely. Whereas Nagios provides events and alerts, Ganglia aggregates metrics and presents them in a meaningful way. What if you want to see the total CPU and memory usage of a 2,000-node cluster, or the total free disk space on 1,000 nodes? Plotting CPU and memory usage for a single node is easy, but aggregating them across a group of nodes requires a tool built for the job.

Ganglia

Ganglia is an open source, distributed monitoring platform for collecting metrics across a cluster. It can aggregate CPU, memory, disk I/O, and many other metrics across a group of nodes. There are alternative tools, such as Cacti and Munin, but Ganglia scales very well in large enterprise environments.

Some of the key features of Ganglia are as follows:

  • You can view historical and real-time metrics of a single node or of an entire cluster

  • You can use the data to make decisions on the cluster sizing and the performance

Ganglia components

We will now discuss some components of Ganglia.

  • Ganglia Monitoring Daemon (gmond): It runs on the nodes that need to be monitored, captures state changes, and sends updates to a central daemon using XDR (External Data Representation).

  • Ganglia Meta Daemon (gmetad): It collects data from gmond and from other gmetad daemons. The data is indexed and stored on disk in round-robin databases. There is also a Ganglia web front-end that displays the collected information in a meaningful way.

Ganglia installation

Let's begin by setting up Ganglia and looking at the important parameters that need attention. Ganglia can be downloaded from http://ganglia.sourceforge.net/. Perform the following steps to install it:

  1. Install gmond on the nodes that need to be monitored, and then configure /etc/ganglia/gmond.conf:

    $ sudo apt-get install ganglia-monitor

    globals {
      daemonize = yes
      setuid = yes
      user = ganglia
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      host_dmax = 0
      cleanup_threshold = 600
      gexec = no
      /* With a unicast udp_send_channel, resend metadata periodically (seconds) */
      send_metadata_interval = 30
    }
    cluster {
      name = "my cluster"
    }
    udp_send_channel {
      host = gmetad.cluster1.com
      port = 8649
    }
    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }

  2. Restart the Ganglia service:

    $ sudo service ganglia-monitor restart
    
  3. Install gmetad on the master node. It can be downloaded from http://ganglia.sourceforge.net/:

    $ sudo apt-get install gmetad
    
  4. Update the gmetad.conf file to tell gmetad where to collect the data from. Each data_source line gives the cluster name, an optional polling interval in seconds, and the host(s) to poll:

    $ sudo vi /etc/ganglia/gmetad.conf
    data_source "my cluster" 120 localhost
    
  5. Update the gmond.conf file on all the nodes so that they point to the master node and use the same cluster name.
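This is also where the aggregation described earlier becomes visible: gmond answers on its tcp_accept_channel port (8649 in the configuration above) with an XML dump of the metrics it knows about, with one METRIC element per host and metric. The sum_metric function below is a hypothetical sketch that totals one metric across all hosts in such a dump. It assumes that each METRIC element starts on its own line and carries NAME and VAL attributes; gmetad and the Ganglia front-end do this work for you, so treat it only as an illustration.

```shell
# sum_metric METRIC_NAME < gmond_xml
# Hypothetical helper: sums the VAL attribute of every METRIC element
# named METRIC_NAME and reports the total and the number of hosts seen.
sum_metric() {
  awk -v want="$1" '
    /<METRIC / {
      name = $0; sub(/.*NAME="/, "", name); sub(/".*/, "", name)
      if (name != want) next
      val = $0; sub(/.*VAL="/, "", val); sub(/".*/, "", val)
      total += val; n++
    }
    END { printf "%s total=%g hosts=%d\n", want, total, n }'
}
```

Assuming a netcat binary is installed, nc localhost 8649 | sum_metric disk_free on the gmetad host would total the free disk space reported by every node in the cluster.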

System logging


Logging is an important part of any application or system, as it tells you about progress, errors, the states of services, security breaches, and repeated user failures, which helps you troubleshoot and analyze these events. The important aspects of handling logs are collecting, transporting, storing, alerting on, and analyzing the events.

Collection

Logs can be generated in many ways. They can be generated either through system facilities, such as syslog or through applications that can directly write their logs. In either case, the collection of the logs must be organized so that they can be easily retrieved when needed.

Transportation

The logs can be transferred from multiple nodes to a central location so that, instead of parsing logs on hundreds of servers individually, you can manage them easily in one place. The size of the logs transferred across the network, and how often they need to be transferred, are also matters of concern.

Storage

The storage needs will depend upon the retention policy of the logs, and the cost will also vary according to the storage media or the location of storage, such as cloud storage or local storage.

Alerting and analysis

The collected logs need to be parsed, and alerts should be sent for any errors. Errors need to be detected within a specified time frame, and remediation should be provided.

Analyzing logs is also important for identifying the traffic patterns of a website. For example, the logs of the Apache web server that hosts a website can be analyzed to find out which IP addresses visited the site and which user agents or operating systems they used. All of this information can be used to target advertisements at various sections of the Internet user base.
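As a small example of this kind of analysis, the sketch below counts requests per client IP address in an Apache access log. It assumes the common or combined log format, in which the client address is the first field; top_ips is a hypothetical helper name.

```shell
# top_ips N < access_log
# Prints the N client IP addresses with the most requests, as "count ip".
top_ips() {
  # Count hits per first field (the client address), then sort by count.
  awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
    sort -rn | head -n "$1"
}
```

On CentOS, for instance, top_ips 10 < /var/log/httpd/access_log shows the ten busiest clients.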

The syslogd and rsyslogd daemons

Logging on a Linux system is handled by the syslogd daemon and, in more recent releases, by the rsyslogd daemon. A separate logger, klogd, logs kernel messages.

syslogd is configured through /etc/syslog.conf (rsyslogd uses /etc/rsyslog.conf), and each rule in the file has the form facility.priority log_location.

The logging facilities and priorities are described in the following tables:

Facility      Description
authpriv      Security/authorization messages
cron          Clock daemons (atd and crond)
kern          Kernel messages
local[0-7]    Reserved for local use
mail          The e-mail system

The following table describes the priorities, from lowest to highest:

Priority    Description
debug       Debugging information
info        General informational messages
warning     Warning messages
err         An error condition
crit        A critical condition
alert       Immediate action is required
emerg       The system is no longer available

For example, logging for e-mail events can be configured with the following rule:

mail.* /var/log/mail

This rule logs all e-mail messages, of every priority, to the /var/log/mail file.
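In a traditional syslog selector, the priority means "this priority or higher", so kern.warning also captures crit and emerg messages, while * matches every priority. The selector_matches function below is a hypothetical sketch of just that rule for simple facility.priority selectors; it ignores extended forms such as facility.=priority and facility.none.

```shell
# Severity levels from lowest to highest, as used in syslog selectors.
# "notice" is a standard level that sits between info and warning.
SYSLOG_LEVELS="debug info notice warning err crit alert emerg"

# level_rank LEVEL: print the position of LEVEL in SYSLOG_LEVELS (0 = debug).
level_rank() {
  i=0
  for l in $SYSLOG_LEVELS; do
    [ "$l" = "$1" ] && { echo "$i"; return 0; }
    i=$((i + 1))
  done
  return 1
}

# selector_matches SELECTOR FACILITY LEVEL: succeed if a rule such as
# "mail.*" or "kern.warning" would capture a message with this facility/level.
selector_matches() {
  sel_fac=${1%%.*}
  sel_lvl=${1#*.}
  [ "$sel_fac" = "*" ] || [ "$sel_fac" = "$2" ] || return 1
  [ "$sel_lvl" = "*" ] && return 0
  want=$(level_rank "$sel_lvl") || return 1
  got=$(level_rank "$3") || return 1
  [ "$got" -ge "$want" ]
}
```

To check a real rule end to end, logger -p mail.info "test message" sends a message with the mail facility and info priority, which the mail.* rule should write to /var/log/mail.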

After changing the configuration, restart the logging daemon so that it captures logs from the various daemons and applications according to the new rules:

$ sudo service rsyslog restart    # On RHEL/CentOS 5 and earlier: sudo service syslog restart

Note

In releases after RHEL 5 and CentOS 5, syslogd has been replaced by rsyslogd.

Summary


This chapter has built the base for monitoring, logging, and log collection. We talked about monitoring concepts and how to set up Nagios and Ganglia for monitoring. We also discussed how the configuration files are structured and how they can be segregated into various sections for ease of use.

Using this as a baseline, we will move on to understand the Hadoop services, the ports used by Hadoop, and then configure monitoring for them in the upcoming chapters of this book.

In the next chapter, we will deal with the Hadoop daemons and services.


What you will learn

  • Install Nagios and Ganglia and understand logging at the operating system level

  • Create and configure Nagios nodes for monitoring with custom checks

  • Monitor Hadoop daemons such as NameNode, DataNode, JobTracker, and so on

  • Configure logs for various daemons and set up audits for the options done on the cluster

  • Track important parameters for the File System, MapReduce, and other counters

  • Set up Nagios master and client nodes with checks for the system and applications running on it

  • Configure the Hadoop metrics collection and visualize it for nontechnical users

  • Understand the communication between different daemons and protocols and the ports they use


Table of Contents

14 Chapters
Monitoring Hadoop
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Introduction to Monitoring
Hadoop Daemons and Services
Hadoop Logging
HDFS Checks
MapReduce Checks
Hadoop Metrics and Visualization Using Ganglia
Hive, HBase, and Monitoring Best Practices
Index

