Learning YARN


Product type Book
Published in Aug 2015
ISBN-13 9781784393960
Pages 278 pages
Edition 1st Edition


Chapter 5. Understanding YARN Life Cycle Management

The YARN framework consists of the ResourceManager and NodeManager services. These services manage the life cycles of the different YARN components, such as an application, a container, and a resource. This chapter focuses on the core implementation of the YARN framework and describes how ResourceManager and NodeManager manage application execution in a distributed environment.

Whether you are a Java developer, an open source contributor, a cluster administrator, or a user, this chapter provides a simple approach to gaining insight into YARN. In this chapter, we'll discuss the following topics:

  • Introduction to state management analogy

  • ResourceManager's view for a node, an application, an application attempt, and a container

  • NodeManager's view for an application, a container, and a resource

  • Analyzing transitions through logs

An introduction to state management analogy


Life cycle management is central to the event-driven implementation of components in any system. Each component of the system passes through a predefined series of valid states. A transition between states is governed by the events associated with the current state and the actions performed to handle the event that occurred.

Here are some key terms that are used in this chapter:

  • State: In computer science, the state of a computer program is a technical term for all the stored information, at a given instant in time, to which the program has access.

  • Event: An event is an action or occurrence detected by the program that may be handled by the program. Typically, events are handled synchronously with the program flow, that is, the program has one or more dedicated places where events are handled.

  • Event handle: A handle is associated with an event and describes the next state, and the information to store for the process, when that event occurs.

  • State...
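The terms above can be sketched as a small table-driven state machine. This is only a minimal illustration in Python, not YARN's implementation: the `State` and `Event` values, the `StateMachine` class, and the `build_task_machine` helper are all hypothetical names chosen for this example.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states for a generic task (not YARN's own states).
    NEW = auto()
    RUNNING = auto()
    FINISHED = auto()
    FAILED = auto()


class Event(Enum):
    # Hypothetical events that can move the task between states.
    START = auto()
    COMPLETE = auto()
    ERROR = auto()


class InvalidTransition(Exception):
    """Raised when an event is not valid in the current state."""


class StateMachine:
    """Table-driven machine: (state, event) -> (next state, handler)."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions
        self.history = [initial]

    def handle(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise InvalidTransition(
                f"{event.name} is not valid in state {self.state.name}"
            )
        next_state, handler = self.transitions[key]
        if handler is not None:
            # The handler performs the action tied to this event.
            handler(self.state, event)
        self.state = next_state
        self.history.append(next_state)
        return next_state


def build_task_machine(audit_log):
    """Build a machine whose handler records each handled event."""

    def record(old_state, event):
        audit_log.append(f"{old_state.name} on {event.name}")

    transitions = {
        (State.NEW, Event.START): (State.RUNNING, record),
        (State.RUNNING, Event.COMPLETE): (State.FINISHED, record),
        (State.RUNNING, Event.ERROR): (State.FAILED, record),
    }
    return StateMachine(State.NEW, transitions)
```

Calling `handle(Event.START)` on a new machine moves it from `NEW` to `RUNNING`, while an event with no entry in the table (for example, `START` in the `FINISHED` state) raises `InvalidTransition`. Keeping the valid transitions in a single table makes the set of legal state changes easy to read and audit.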

The ResourceManager's view


Being the master service, the ResourceManager service manages the following:

  • Cluster resources (nodes in the cluster)

  • Applications submitted to the cluster

  • Attempts of running applications

  • Containers running on cluster nodes

The ResourceManager service has its own view of the different processes associated with YARN management and application execution. The following are the views of ResourceManager:

  • Node: This is the machine with the NodeManager daemon

  • Application: This is the code submitted by any client to the ResourceManager

  • Application attempt: This is an attempt associated with the execution of an application

  • Container: This is the process running the business logic of the submitted application

View 1 – Node

The node view of ResourceManager manages the life cycle of the NodeManager nodes within a cluster. For every node in the cluster, the ResourceManager maintains an RMNode object. The states and event types of a node are defined in the enumerations NodeState and RMNodeEventType...
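To make the idea of a node life cycle concrete, here is a simplified and partial sketch in Python. It is not the ResourceManager's code. The state and event names are a subset based on our understanding of the Hadoop 2.x enumerations and should be verified against the Hadoop source; the transition table and the `next_node_state` function are hypothetical and exist only for illustration.

```python
from enum import Enum


class NodeState(Enum):
    # Assumed subset of node states; verify against the Hadoop source.
    NEW = "NEW"
    RUNNING = "RUNNING"
    UNHEALTHY = "UNHEALTHY"
    DECOMMISSIONED = "DECOMMISSIONED"
    LOST = "LOST"
    REBOOTED = "REBOOTED"


class NodeEvent(Enum):
    # Assumed subset of node event types; verify against the Hadoop source.
    STARTED = "STARTED"
    STATUS_UPDATE = "STATUS_UPDATE"
    DECOMMISSION = "DECOMMISSION"
    EXPIRE = "EXPIRE"
    REBOOTING = "REBOOTING"


ACTIVE = (NodeState.RUNNING, NodeState.UNHEALTHY)

# Transitions whose target does not depend on the event payload.
FIXED = {(NodeState.NEW, NodeEvent.STARTED): NodeState.RUNNING}
for _state in ACTIVE:
    FIXED[(_state, NodeEvent.DECOMMISSION)] = NodeState.DECOMMISSIONED
    FIXED[(_state, NodeEvent.EXPIRE)] = NodeState.LOST
    FIXED[(_state, NodeEvent.REBOOTING)] = NodeState.REBOOTED


def next_node_state(state, event, healthy=True):
    """Return the next state of a node for the given event.

    A status update on an active node can lead to one of two states,
    depending on the health reported with the update.
    """
    if event is NodeEvent.STATUS_UPDATE and state in ACTIVE:
        return NodeState.RUNNING if healthy else NodeState.UNHEALTHY
    try:
        return FIXED[(state, event)]
    except KeyError:
        raise ValueError(
            f"{event.name} is not valid in state {state.name}"
        ) from None
```

For example, `next_node_state(NodeState.RUNNING, NodeEvent.STATUS_UPDATE, healthy=False)` returns `NodeState.UNHEALTHY`, and a later healthy status update returns the node to `NodeState.RUNNING`.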

The NodeManager's view


The NodeManager service in YARN reports its resource capabilities to the ResourceManager and tracks the execution of containers running on the node.

Other than the health of a node, the NodeManager service is responsible for the following:

  • Executing an application and its associated containers

  • Providing localized resources for the execution of containers related to applications

  • Managing logs of different applications

The NodeManager service has its own view for the following:

  • Application: This manages the application's execution, logs, and resources

  • Container: This manages the execution of containers as a separate process

  • Localized resource: This involves the files required for the container's execution

View 1 – Application

NodeManager manages the life cycle of the application's containers and resources used during application execution. The NodeManager view of an application represents how NodeManager manages the container's execution, resources, and logs of the application...

Analyzing transitions through logs


Both YARN services, ResourceManager and NodeManager, generate logs and store them in .log files locally inside the folder specified by the HADOOP_LOGS_DIR variable. By default, the logs are stored in HADOOP_PREFIX/logs. All state transitions in YARN are recorded in the log files. In this section, we'll cover a few state transitions and the logs generated during those transitions.
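As a rough sketch of how such transitions could be pulled out of a log file, the following Python snippet scans lines for a state change message. The message shape (`<id> State change from <OLD> to <NEW>`) is an assumption, the sample lines are made up for illustration, and `extract_transitions` is a hypothetical helper; check the pattern against the messages in your own log files before relying on it.

```python
import re

# Assumed message shape: "<id> State change from <OLD> to <NEW>".
# Adjust the pattern to match the messages in your own logs.
TRANSITION = re.compile(
    r"(?P<entity>\S+) State change from (?P<old>[A-Z_]+) to (?P<new>[A-Z_]+)"
)


def extract_transitions(lines):
    """Return (entity, old_state, new_state) for each matching line."""
    found = []
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            found.append(
                (match.group("entity"), match.group("old"), match.group("new"))
            )
    return found


# Made-up sample lines, used only to exercise the parser.
SAMPLE = [
    "2015-08-01 10:00:01,000 INFO RMAppImpl: "
    "application_0000000000000_0001 State change from NEW to NEW_SAVING",
    "2015-08-01 10:00:01,050 INFO RMAppImpl: "
    "application_0000000000000_0001 State change from NEW_SAVING to SUBMITTED",
    "2015-08-01 10:00:02,000 INFO SomeOtherClass: unrelated message",
]

for entity, old, new in extract_transitions(SAMPLE):
    print(f"{entity}: {old} -> {new}")
```

To scan a real file, pass an open file object instead of `SAMPLE`, because iterating over a file yields its lines one at a time.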

Note

Setting the log level: Hadoop-YARN uses the Apache Log4j library, which reads a log4j.properties file located in the configuration folder of the Hadoop-YARN bundle at HADOOP_PREFIX/etc/hadoop.

The Log4j library supports six log levels: TRACE, DEBUG, INFO, WARN, ERROR, and FATAL. A cluster administrator sets the log level for Hadoop-YARN services; the default log level is INFO. The hadoop.root.logger property is used to update the log level for Hadoop-YARN services. To read more about the Apache Log4j library, you can refer to the official site at http://logging.apache.org/log4j...
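As an illustration of updating this property, the following Python sketch rewrites the level in the text of a properties file. It assumes that the property value has the form `LEVEL,appender` (for example, `INFO,console`); `set_root_log_level` is a hypothetical helper written for this example and is not part of Hadoop.

```python
LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def set_root_log_level(properties_text, level):
    """Return the properties text with hadoop.root.logger set to level.

    Assumes the value looks like "LEVEL,appender"; any appenders after
    the first comma are kept unchanged. Commented-out lines are ignored.
    """
    if level not in LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    updated = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hadoop.root.logger":
            # Keep everything after the first comma (the appenders).
            _, comma, appenders = value.partition(",")
            line = f"hadoop.root.logger={level}{comma}{appenders}"
        updated.append(line)
    return "\n".join(updated)


original = "hadoop.root.logger=INFO,console\nhadoop.log.dir=."
print(set_root_log_level(original, "DEBUG"))
```

A more verbose level such as DEBUG produces more log output, so it is usually enabled only while investigating a problem. The service normally needs to be restarted before a change to the properties file takes effect.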

Summary


In this chapter, we learned about the state management analogy of YARN and why it is important. We discussed the ResourceManager and NodeManager views for the different processes associated with the YARN framework. This chapter provides core concepts about how YARN monitors and manages resources and application execution. You can now scan the logs of ResourceManager or NodeManager and observe the messages generated during the state transitions of a node, an application, a container, and so on.

In the next chapter, we'll talk about the execution of MapReduce applications over a YARN cluster and how you can migrate from MRv1 to MRv2.
