You're reading from Monitoring Hadoop

Product type: Book
Published in: Apr 2015
Publisher: Packt Publishing
ISBN-13: 9781783281558
Edition: 1st Edition

Author: Aman Singh

Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He authored Monitoring Hadoop, published by Packt Publishing.
Chapter 7. Hive, HBase, and Monitoring Best Practices

In this chapter, we will look at monitoring and metrics collection for Hive and HBase. In addition to this, we will look at best practices for tuning Nagios and other improvements that are especially helpful in large enterprise setups.

This chapter builds on the metrics collection and monitoring concepts covered in the earlier chapters.

The following topics will be covered in this chapter:

  • Hive monitoring

  • HBase monitoring

  • Metrics collections

  • Tuning and improvements for large cluster setups

Hive monitoring


Apache Hive is a data warehousing tool for Hadoop with a query language similar to SQL. It provides a query layer on top of Hadoop, easing the learning curve for traditional DBAs moving from SQL to the Hadoop framework.

In Apache Hive, the query language is referred to as HiveQL. Hive uses a Metastore, which can be embedded, implying that it is internal and stored in the default database called Derby, or stored externally in an RDBMS such as MySQL. External storage is considered a best practice, as it lets multiple users connect to Hive; in the embedded mode, only one user can connect to the Hive prompt at a time.

It is very important to make sure that Hive components, such as the Metastore, and the health of the hosts are constantly monitored. A few important things need to be kept track of in Hive, such as the following:

  • Hive Metastore health checks: Irrespective of whether the Metastore is local or remote, it is important to monitor its health. Important things to keep track of are as follows...
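The first of those checks can be as simple as probing the Metastore thrift port. The following is a minimal sketch in the Nagios plugin convention (exit code 0 = OK, 2 = CRITICAL); the helper name is illustrative, and 9083 is the stock Metastore port, which may differ in your deployment:

```python
import socket

# Nagios plugin exit codes
OK, CRITICAL = 0, 2

def check_metastore(host="localhost", port=9083, timeout=5):
    """Probe the Hive Metastore thrift port; return (exit_code, message).

    9083 is the default Metastore port; adjust for your deployment.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, "METASTORE OK - {}:{} is accepting connections".format(host, port)
    except OSError:
        return CRITICAL, "METASTORE CRITICAL - cannot connect to {}:{}".format(host, port)
```

A real plugin would print the message and call sys.exit() with the returned code so that NRPE can pick up the state.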

Hive metrics


Apache Hive provides fairly basic metrics for JVM profiling, which can be handy from both the monitoring and the performance perspectives.

It makes sense to enable JMX when running the Hive thrift server by using the following code snippet:

JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"

The thrift server actually executes hadoop jar under the hood and passes these options on to the JVM; for this, $HIVE_OPTS must be set in the hive-env.sh file.
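For example (a sketch, reusing the JMX_OPTS options shown above), the hive-env.sh entry could look like this:

```shell
# hive-env.sh -- pass the JMX options through to the Hive JVM
export HIVE_OPTS="$HIVE_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.port=8008"
```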

The Java package called org.apache.hadoop.hive.common.metrics can be tapped for Hive metrics collection.

HBase monitoring

HBase is a NoSQL database designed to work very well on a distributed framework such as Hadoop. It has the concept of master and slave servers (region servers), much like the Hadoop architecture. Because it is a database holding large amounts of data, it is important to keep its state consistent and its performance optimal.

Knowing what's happening at a given time...

HBase Nagios monitoring


To monitor the HBase master and region servers, there are Nagios plugins that can be downloaded from the exchange.nagios.org website and configured to monitor the HBase components; the plugins are also available at https://github.com/harisekhon/nagios-plugins. As discussed in the earlier chapters, each check needs a service defined on the Nagios master and a corresponding NRPE check configured on the client hosts. For example, the check_hbase_tables_jsp.pl check can verify HBase connectivity and table states by using the JSP interface of the HBase server.

In addition to this, HBase comes with a tool, hbase hbck, which provides a lot of useful information about the state of the master and each region server. The command lists a lot of information about the tables; with a custom plugin, we can filter out the ROOT and META tables and pull the status into Nagios.
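The filtering step of such a custom plugin can be sketched as follows; the function maps captured `hbase hbck` output to a Nagios state. The status strings (`Status: OK`, `Status: INCONSISTENT`) match what hbck prints at the end of its report, but the function and message formats are illustrative:

```python
import re

# Nagios plugin exit codes
OK, CRITICAL, UNKNOWN = 0, 2, 3

def parse_hbck(output):
    """Map the output of `hbase hbck` to a (exit_code, message) pair."""
    status = re.search(r"^Status:\s*(\w+)", output, re.MULTILINE)
    if status is None:
        return UNKNOWN, "HBCK UNKNOWN - no status line in hbck output"
    # hbck also reports a count of inconsistencies; include it if present.
    count = re.search(r"\d+ inconsistencies detected", output)
    detail = " ({})".format(count.group(0)) if count else ""
    if status.group(1) == "OK":
        return OK, "HBCK OK - cluster consistent"
    return CRITICAL, "HBCK CRITICAL - status {}{}".format(status.group(1), detail)
```

In the full plugin, the output would come from something like `subprocess.run(["hbase", "hbck"], capture_output=True, text=True).stdout` before being passed to this function.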

As usual, first define a service for this in the Nagios server...
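A sketch of what that definition could look like, with illustrative host, plugin path, and command names:

```cfg
# On the Nagios server (for example, in services.cfg)
define service {
    use                   generic-service
    host_name             hbase-master1
    service_description   HBase hbck status
    check_command         check_nrpe!check_hbase_hbck
}

# On the client host, in nrpe.cfg
command[check_hbase_hbck]=/usr/lib/nagios/plugins/check_hbase_hbck
```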

HBase metrics


HBase provides an interface to tap into the various metrics that it provides. The new improved Metrics2 system has a lot of metrics for looking into how the HBase components perform. The main motivation behind any metrics collection is to understand the behavior of the system, debug any issues, or give us a forecast for our requirements.

The HBase master and the region servers each have a Metrics2 system to tap into for minute details of their memory, CPU, and I/O parameters.

We can get metrics from many components in the case of HBase, as shown in the following diagram:

The collection method can be as simple as writing to a file, or it can go through the web UI, JMX, or Ganglia. To collect metrics in any of these forms, HBase must first generate them; this is enabled per context in hadoop-metrics.properties.

These contexts can be RPC, region server, or JVM contexts; accordingly, the metrics will be written either to a file or sent to the Ganglia gmetad daemon.
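For instance, enabling the Ganglia context for the HBase and JVM metrics in hadoop-metrics.properties might look like the following; the context class name depends on your Hadoop/HBase version, and the Ganglia host and port here are assumptions:

```properties
# Emit HBase and JVM metrics to Ganglia every 10 seconds
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=ganglia-host:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=ganglia-host:8649
```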

For region servers...

Monitoring best practices


Until now, we have talked about monitoring and metrics collection for the Hadoop components, HBase, Hive, and others. However, it is very important to understand what should be collected; otherwise, we might find it difficult to manage the collected data and extract any meaningful information from it.

It is good to enable logging, but at what level? Are we fine with logging every event that is generated? Will that be helpful to us in any way? These are the questions we need to ask ourselves while designing a monitoring and logging system.

Some of the key points to keep in mind while designing a monitoring and metrics collection system are as follows:

  • How easily it can be scaled

  • How easily we can extract information from the system

  • What we should log and collect

  • How long should we keep the data

We cannot log or collect all the metrics. For example, let's say we have a 200-node cluster running HBase region servers, we collect 20 metrics per region, 500 regions are live at a time, and...
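A quick back-of-the-envelope calculation makes the scale of that example concrete; reading the figures as 500 live regions per region server, and assuming a 10-second collection interval (the interval is an assumption, not from the text):

```python
nodes = 200               # region servers in the cluster
regions_per_node = 500    # live regions on each server
metrics_per_region = 20   # metrics collected per region

# Data points produced in every collection interval
points_per_interval = nodes * regions_per_node * metrics_per_region
print(points_per_interval)        # 2000000

# With a 10-second interval, data points generated per day
intervals_per_day = 24 * 3600 // 10
points_per_day = points_per_interval * intervals_per_day
print(points_per_day)             # 17280000000
```

Two million data points every few seconds is far more than most monitoring backends can usefully store, which is why filtering what gets collected matters.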

The Filter class


To address the issue discussed above, Hadoop provides a filter class, which uses regular expressions to filter the metrics and make the output more compact and meaningful, as follows:

*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}

The syntax of the filter class is explained as follows:

subsystem.[sink|source].sink_name.[sources|record|metric].filter.[include|exclude]

  • subsystem: The daemon, such as hbase, yarn, or hdfs
  • sink|source: Whether the filter applies to a sink or a source of the feed
  • sink_name: The name of the sink used
  • sources|record|metric: The level at which the filter operates
  • include|exclude: Whether the filter includes or excludes the matched metrics

The filters can be applied at the source, record, or metric level and are constructed with regexes to trim the information generated by the metrics system.
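Putting the pieces together, an exclude filter for a Ganglia sink could be written as follows; the sink name and glob patterns are illustrative:

```properties
*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter

# Drop the very chatty per-region metrics from the Ganglia sink
hbase.sink.ganglia.metric.filter.exclude=*Region*

# Or whitelist only the metrics of interest
# hbase.sink.ganglia.metric.filter.include=*requests*
```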

Nagios and Ganglia best practices


To make sure that the monitoring and metrics collection system works at optimal performance, it must be designed and tuned with the following points in mind:

  • In the case of Nagios, make sure to have the right mix of active and passive checks for services.

  • The number of checks that can be deployed as active checks, and the number of nodes on which they can be executed, depend on the resources the Nagios server has in terms of memory and CPU cores.

  • The network also plays an important role, as it is important to understand how much bandwidth the monitoring traffic will consume.

  • Another best practice is to always have a hierarchy in the Nagios configuration layout. Make use of host groups, service groups, and templates; having groups for everything makes adding nodes very easy.

  • Define smart checks rather than running every check every minute. For example, running a disk usage check every minute might not make sense, as disk usage does not grow that fast.

  • Optimize plugins so as to reduce the load on the system...
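As a sketch of the "smart checks" point, the check frequency can be relaxed per service in the Nagios service definition (directive names follow Nagios 3; the host name and values here are illustrative):

```cfg
define service {
    use                   generic-service
    host_name             datanode1
    service_description   Disk usage
    check_command         check_nrpe!check_disk
    check_interval        60    ; check hourly -- disk usage grows slowly
    retry_interval        5     ; recheck every 5 minutes after a failure
    max_check_attempts    3
}
```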

Summary


In this chapter, we looked at how to monitor Hive and HBase and collect their metrics. We also looked at monitoring best practices for the enterprise, in addition to the filtering of alerts.
