You're reading from Hadoop 2.x Administration Cookbook

Product type: Book
Published in: May 2017
Publisher: Packt
ISBN-13: 9781787126732
Edition: 1st Edition

Author: Aman Singh

Gurmukh Singh is a seasoned technology professional with more than 14 years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last five years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo, and is the author of Monitoring Hadoop, published by Packt Publishing.

Chapter 11. Troubleshooting, Diagnostics, and Best Practices

In this chapter, we will cover the following recipes:

  • Namenode troubleshooting

  • Datanode troubleshooting

  • Resourcemanager troubleshooting

  • Diagnose communication issues

  • Parse logs for errors

  • Hive troubleshooting

  • HBase troubleshooting

  • Hadoop best practices

Introduction


In this chapter, we will look at best practices and troubleshooting techniques for the various components of Hadoop. The same techniques can be used to troubleshoot any other service or application.

With distributed systems and the scale at which Hadoop operates, troubleshooting can become cumbersome. In production, most teams use log management and parsing tools such as Splunk, together with Ganglia, Nagios, or other tools for monitoring and alerting.

In this chapter, we will build basic troubleshooting skills and learn how to quickly look for keywords that point to common errors in a Hadoop cluster. Users are encouraged to read this chapter after Chapter 8, Performance Tuning, to better relate to and understand the recipes here.

Namenode troubleshooting


In this recipe, we will see how to find issues with the Namenode and resolve them. As this is a recipe book, we will keep the theory to a minimum, but users must understand the motive behind the commands and how the mentioned tools work.

Getting ready

To step through the recipes in this chapter, make sure you have gone through the steps to install a Hadoop cluster with HDFS and YARN enabled. Use a multi-node Hadoop cluster for better understanding and troubleshooting practice.

It is assumed that the user has basic knowledge of networking fundamentals, Linux commands, and filesystems.

How to do it...

Scenario 1: Namenode not starting due to permission issues on the Namenode directory.

  1. Connect to the master1.cyrus.com master node in the cluster and change to user hadoop.

  2. Try to write a test file to the Namenode directory using the following command. If it succeeds, then the permissions are fine:

    $ touch /data/namenode1/test
    
  3. Otherwise, make sure the permission of the directory...
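The permission check in steps 2-3 can be sketched as follows. The path /data/namenode1 and the hadoop owner are this lab's values; the fallbacks are only so the commands can be tried anywhere, so adjust them to match your dfs.namenode.name.dir.

```shell
# A sketch of steps 2-3, parameterized so it can be pointed at the real
# storage directory (lab value: /data/namenode1, owned by user hadoop).
NN_DIR="${NN_DIR:-$(mktemp -d)}"       # lab value: /data/namenode1
OWNER="${OWNER:-$(id -un)}"            # lab value: hadoop
chown -R "$OWNER" "$NN_DIR" 2>/dev/null || true
chmod 700 "$NN_DIR"                    # Namenode storage dirs expect drwx------
touch "$NN_DIR/test" && rm "$NN_DIR/test" && echo "writable: $NN_DIR"
```

If the `touch` fails even after the ownership fix, check the mount options and free space on the underlying disk as well.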

Datanode troubleshooting


In this recipe, we will look at some of the common issues with Datanode and how to resolve them.

Getting ready

The user is expected to complete the previous recipe and must have completed the Setting up multi-node HBase cluster recipe in Chapter 9, HBase Administration. In this recipe, we will be using the already configured Hadoop cluster.

How to do it...

Scenario 1: Datanode not starting due to permission issues on the Datanode directory specified by dfs.datanode.data.dir:

  1. Connect to the dn1.cyrus.com Datanode in the cluster and change to user hadoop.

  2. Try to write a test file to the location using the following command:

    $ touch /space/dn1/test
    

    If it succeeds, then the permissions are fine.

  3. Otherwise, make sure the directories pointed to by dfs.datanode.data.dir are owned by the correct user. This is shown in the following screenshot:

  4. The user could be hadoop or hdfs. Also, the directory permission is 755 for the top directory, as shown in the following...
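The steps above can be sketched for every directory listed in dfs.datanode.data.dir. The lab value is /space/dn1; the scratch-dir fallback below is only so the loop runs anywhere.

```shell
# A sketch of steps 2-4 for each Datanode data directory
# (lab value: /space/dn1, owned by user hadoop or hdfs).
DN_DIRS="${DN_DIRS:-$(mktemp -d) $(mktemp -d)}"
for d in $DN_DIRS; do
  chown -R "$(id -un)" "$d" 2>/dev/null || true
  chmod 755 "$d"                 # 755 on the top-level directory, per step 4
  touch "$d/test" && rm "$d/test" && echo "writable: $d"
done
```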

Resourcemanager troubleshooting


In this recipe, we will look at common Resourcemanager issues and how these can be addressed.

Getting ready

To step through the recipe in this section, make sure the users have completed the Setting up multi-node HBase cluster recipe in Chapter 9, HBase Administration.

How to do it…

Scenario 1: Resourcemanager daemon not starting.

  1. The Resourcemanager, by default, binds to ports 8030 to 8033 and 8088. These ports can be configured in the yarn-site.xml file; make sure they are unique and not used by any other service. In our labs, we used the ports shown in the following screenshot:

  2. The listening ports can be seen by using the following command:

    $ netstat -tlpn
    
  3. Look in the logs for any bind errors and make sure the hostname is resolvable. Check both forward and reverse lookup:

    $ nslookup <resource_manager_host>
    
  4. On the Node Manager, the important ports are 8040, 8041, and 8042. These are used for scheduling, localization, and so on. So,...
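The port check in steps 1-2 can be scripted. The port list assumes the YARN defaults (8030-8033 and 8088); compare it against the yarn.resourcemanager.*.address values in your yarn-site.xml.

```shell
# A sketch: confirm each ResourceManager port is actually bound.
listening=$(netstat -tln 2>/dev/null || ss -tln 2>/dev/null || true)
for p in 8030 8031 8032 8033 8088; do
  echo "$listening" | grep -q ":$p " \
    && echo "port $p: listening" \
    || echo "port $p: NOT listening"
done
```

A port reported as NOT listening, combined with a bind error in the Resourcemanager log, usually means another process grabbed the port or the daemon is bound to the wrong interface.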

Diagnose communication issues


In this recipe, we will look at how to troubleshoot communication issues between nodes and how we can quickly find common errors.

Getting ready

To step through the recipe, the user must have completed the Setting up multi-node HBase cluster recipe in Chapter 9, HBase Administration and have gone through the previous recipes in this chapter. It is good to have a basic knowledge of the DNS and TCP communication.

How to do it...

  1. Connect to the master1.cyrus.com master node in the cluster and switch to user hadoop.

  2. The first thing is to check which connections are already established to the nodes. This can be seen with the following command, as shown here:

  3. Check the reachability of nodes in the cluster using the following commands and also ensure reverse lookup for each host in the cluster:

    $ ping master1.cyrus.com
    $ ping dn1.cyrus.com
    $ nslookup "IP of Namenode, RM and Datanodes"
    
  4. If there is a reachability issue, check for firewall rules on any intermediate network devices...
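Steps 2-3 can be sketched as follows; the hostnames are from this lab's cluster, so substitute your own node names.

```shell
# List established TCP connections, then check that forward and reverse
# lookup agree for each cluster host.
( netstat -tn 2>/dev/null || ss -tn 2>/dev/null || true ) | head -15
for h in master1.cyrus.com dn1.cyrus.com; do
  ip=$(getent hosts "$h" | awk '{print $1}' | head -1)
  if [ -n "$ip" ]; then
    echo "$h -> $ip"
    getent hosts "$ip" || true   # reverse mapping should name the same host
  else
    echo "$h does not resolve"
  fi
done
```

If forward and reverse lookups disagree, many Hadoop daemons will refuse connections or log authentication errors, so fix DNS (or /etc/hosts) before chasing firewall rules.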

Parse logs for errors


In this recipe, we will look at how to parse logs and quickly find errors. There are job logs, which are aggregated on HDFS, as well as daemon logs, system logs, and so on.

We will look at some keywords and commands to find the errors in logs.

Getting ready

To complete the recipe, the user must have a running Hadoop cluster, must have completed the Setting up multi-node HBase cluster recipe in Chapter 9, HBase Administration, and know Bash or Perl/Python scripting basics.

How to do it...

  1. Connect to the edge1.cyrus.com node in the cluster and switch to user hadoop. We can, however, connect to any node in the cluster that has access to the logs.

  2. The YARN logs on the cluster are exported over NFS and mounted at /logs/hadoop on the Edge node. Refer to the HDFS as NFS export recipe.

  3. All the other logs, such as system and daemon logs, from the cluster are exported to the location /logs/system.

  4. If the user is not from a Linux system background...
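The keyword-driven parsing described above can be sketched as follows. In the lab, LOG_DIR would be the /logs/hadoop NFS mount from step 2; the sample file below is only so the commands can be tried anywhere, and the keyword list is a common starting set, not exhaustive.

```shell
LOG_DIR="${LOG_DIR:-$(mktemp -d)}"
[ -s "$LOG_DIR/sample.log" ] || printf 'INFO started\nERROR disk failure\nFATAL shutting down\n' > "$LOG_DIR/sample.log"
# Pull the interesting lines:
grep -hEi 'error|fatal|exception' "$LOG_DIR"/*.log | tail -50
# Count hits per file to see which daemon log is noisiest:
grep -cEi 'error|fatal|exception' "$LOG_DIR"/*.log | sort -t: -k2 -rn | head
```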

Hive troubleshooting


In this recipe, we will look at Hive troubleshooting steps and important keywords in the logs, which can help us to identify issues.

Getting ready

For this recipe, the user must have completed the Operating Hive with ZooKeeper recipe in Chapter 7, Data Ingestion and Workflow and have a basic understanding of database connectivity.

How to do it...

  1. Connect to the edge1.cyrus.com Edge node and switch to user hadoop.

  2. The Hive query log location is defined by hive.querylog.location and the HiveServer2 operation log location is defined by hive.server2.logging.operation.log.location.

  3. As an example, if we try to query a table that does not exist, we can see the errors in the Hive log, as shown in the following screenshot:

  4. Make it a habit to read logs while troubleshooting, as they will give hints about the errors.

  5. Make sure Hive is able to connect to the Hive metastore. To verify this, first connect manually, as shown here:

    $ mysql -u hadoop -h master1.cyrus.com -p
    
  6. Make sure the user used in Hive Hadoop...
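Before connecting manually as in step 5, it helps to confirm which metastore URL Hive is actually configured with. The sketch below greps it out of hive-site.xml; CONF falls back to a sample file here, but in the lab it would be $HIVE_HOME/conf/hive-site.xml, and the URL shown is illustrative.

```shell
CONF="${CONF:-$(mktemp)}"
[ -s "$CONF" ] || cat > "$CONF" <<'EOF'
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master1.cyrus.com:3306/metastore</value>
</property>
EOF
# Extract the metastore JDBC URL:
grep -A2 'javax.jdo.option.ConnectionURL' "$CONF" | grep -o 'jdbc:[^<]*'
# Then verify the database answers with the same credentials:
#   mysql -u hadoop -h master1.cyrus.com -p -e 'SHOW DATABASES;'
```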

HBase troubleshooting


In this recipe, we will look at HBase troubleshooting and how to identify some of the common issues in the HBase cluster.

Getting ready

Make sure that the user has completed the Setting up multi-node HBase cluster recipe in Chapter 9, HBase Administration for this section, and the assumption is that HDFS and YARN are working fine. Refer to previous recipes to troubleshoot any issues with the Hadoop cluster, before starting troubleshooting of HBase.

How to do it...

  1. Connect to the master1.cyrus.com master node and switch to user hadoop.

  2. Firstly, make sure ZooKeeper is up and the ensemble is healthy, as shown in the following screenshot (this applies only if an external ZooKeeper is used):

  3. Rather than starting the entire cluster in one go, start each component one by one. Start the HBase master using the following command:

    $ hbase-daemon.sh start master
    
  4. Quickly check which nodes and services the HBase master is talking to. In the following screenshot, we can see connections to ZooKeeper...
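The ensemble health check in step 2 can be sketched with ZooKeeper's ruok four-letter word: a healthy server replies "imok". The hostnames below are assumptions from this lab's setup.

```shell
# Poll each ZooKeeper server before starting HBase daemons one by one.
check_zk() {
  [ "$(echo ruok | nc -w 3 "$1" 2181 2>/dev/null)" = "imok" ]
}
for zk in master1.cyrus.com dn1.cyrus.com; do
  check_zk "$zk" && echo "$zk: healthy" || echo "$zk: no imok reply"
done
# Only once the ensemble is healthy, start the master, then the regionservers:
#   hbase-daemon.sh start master
```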

Hadoop best practices


In this section, we will cover some of the common best practices for the Hadoop cluster in terms of log management and troubleshooting tools.

These are not from a tuning perspective, but to make things easier to troubleshoot and diagnose.

Things to keep in mind:

  1. Always enable logs for each daemon running in the Hadoop cluster. Keep the logging level at INFO and, when needed, change it to DEBUG. Once the troubleshooting is done, revert to INFO.

  2. Implement log rotation and retention policies to manage the logs.

  3. Use tools such as Nagios to alert on any errors in the cluster before they become an issue.

  4. Use log aggregation and analysis tools such as Splunk to parse logs.

  5. Never co-locate the log disk with the data disks in the cluster.

  6. Use central configuration management systems such as Puppet or Chef to maintain consistent configuration across the cluster.

  7. Schedule a benchmarking job to run every day on the cluster and proactively predict any bottlenecks. This can be...
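The log rotation practice in item 2 can be sketched with logrotate. This is only an illustrative fragment; the path assumes daemon logs are kept under /var/log/hadoop, so adjust it to your HADOOP_LOG_DIR, and tune the retention to your own policy.

```
# /etc/logrotate.d/hadoop -- a sketch, assuming logs under /var/log/hadoop
/var/log/hadoop/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```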

