YARN Essentials

More Information
Learn
  • Understand how existing MapReduce applications can run on top of YARN and how they are backward compatible
  • Explore the YARN concepts, terminologies, architecture, key components, and interaction between the components
  • Set up a standalone and multi-node clustered YARN environment
  • Design, develop, and run different frameworks such as MapReduce, Apache Storm, Apache Tez, and Apache Giraph on top of YARN
  • Get to grips with the built-in support for multitenancy in YARN
  • Discover the motivation behind YARN's architecture design, implementations, and why YARN was needed
  • Learn how failures at each level are gracefully handled by the new framework to achieve fault tolerance and scalability
About

YARN is the next-generation, generic resource platform used to manage resources in a typical cluster, and it is designed to support multitenancy in its core architecture. Because optimal resource utilization is central to YARN's design, learning how to fully utilize the fine-grained resources available in the cluster (RAM, CPU cycles, and so on) becomes vital.

This book is an easy-to-follow, self-learning guide to help you start working with YARN. Beginning with an overview of YARN and Hadoop, you will dive into the pitfalls of Hadoop 1.x and see how YARN moves beyond them. You will learn the concepts, terminology, architecture, core components, and key interactions, cover the installation and administration of a YARN cluster, and learn about YARN application development with new and emerging data processing frameworks.

Features
  • Learn the inner workings of YARN and how its robust and generic framework enables optimal resource utilization across multiple applications
  • Get to grips with single and multi-node installation, administration, and real-time distributed application development
  • A step-by-step self-learning guide to help you achieve optimal resource utilization in a cluster
Page Count 176
Course Length 5 hours 16 minutes
ISBN 9781784391737
Date Of Publication 24 Feb 2015

Authors

Amol Fasale

Amol Fasale has more than 4 years of industry experience in the fields of big data and distributed computing; he is also an active blogger and a contributor to the open source community. Amol works as a senior data system engineer at MakeMyTrip.com, a well-known travel and hospitality portal in India, where he is responsible for real-time personalization of the online user experience with Apache Kafka, Apache Storm, Apache Hadoop, and other technologies. Amol also has hands-on experience in Java/J2EE, the Spring Framework, Python, machine learning, Hadoop framework components, SQL, NoSQL, and graph databases.

You can follow Amol on Twitter at @amolfasale or on LinkedIn. Amol is very active on social media, and you can reach him online for technical assistance; he would be happy to help.

Amol completed his bachelor's degree in engineering (electronics and telecommunication) at Pune University and a postgraduate diploma in computers at CDAC.

Nirmal Kumar

Nirmal Kumar is a lead software engineer at iLabs, the R&D team at Impetus Infotech Pvt. Ltd. He has more than 8 years of experience in open source technologies such as Java, JEE, Spring, Hibernate, web services, Hadoop, Hive, Flume, Sqoop, Kafka, and Storm; NoSQL databases such as HBase and Cassandra; and MPP databases such as Teradata.

You can follow him on Twitter at @nirmal___kumar. He spends most of his time reading about and playing with different technologies. He has also given many tech talks and training sessions on big data technologies.

He earned his master's degree in computer applications from Harcourt Butler Technological Institute (HBTI), Kanpur, India, and is currently part of the big data R&D team in iLabs at Impetus Infotech Pvt. Ltd.