You're reading from Securing Hadoop

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783285259
Edition: 1st

Author: Sudheesh Narayanan

Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he has provided expertise in architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His Big Data expertise includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, Cassandra, and HBase), Text Analytics (GATE and OpenNLP), Machine Learning (Mahout, Weka, and R), and Complex Event Processing. Sudheesh is currently working with Genpact as Assistant Vice President and Chief Architect – Big Data, with a focus on driving innovation and building Intellectual Property assets, frameworks, and solutions. Prior to Genpact, he was the co-inventor and Chief Architect of the Infosys BigDataEdge product.

Chapter 2. Hadoop Security Design

In Chapter 1, Hadoop Security Overview, we discussed the security considerations for an end-to-end Hadoop-based Big Data ecosystem. In this chapter, we narrow our focus and take a deep dive into the security design of the Hadoop platform. Hadoop security was implemented as part of the HADOOP-4487 Jira issue, starting in late 2009 (https://issues.apache.org/jira/browse/HADOOP-4487). There are ongoing efforts to implement SSO authentication in Hadoop, but as these are not yet production-ready, they are out of the scope of this book.

The Hadoop security implementation is based on Kerberos. So in this chapter, we will first go through a high-level overview of key Kerberos terminology and concepts, and then look into the details of the Hadoop security implementation.

The following are the topics we'll be covering in this chapter:

  • What is Kerberos?

  • The Hadoop default security model

  • The Hadoop Kerberos security implementation

What is Kerberos?


In any distributed system, when two parties (the client and server) have to communicate over the network, the first step in this communication is to establish trust between these parties. This is usually done through the authentication process, where the client presents its password to the server and the server verifies this password. If the client sends passwords over an unsecured network, there is a risk of passwords getting compromised as they travel through the network.

Kerberos is a secure network authentication protocol that provides strong authentication for client/server applications without transferring the password over the network. Kerberos works by using time-sensitive tickets that are generated using symmetric-key cryptography. The name Kerberos comes from Greek mythology, in which Kerberos was the three-headed dog that guarded the gates of Hades. The three heads of Kerberos in the security paradigm are:

  • The user who is trying to authenticate.

  • The service to...
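The core idea can be illustrated with a toy sketch: a key distribution center (KDC) shares a secret key with each service and issues time-limited tickets sealed with that key, so the client's password never crosses the network. This is not the real protocol; real Kerberos encrypts tickets (for example, with AES), whereas this illustrative sketch only seals a ticket with an HMAC, and the key, principal name, and function names are all hypothetical:

```python
import hmac, hashlib, json, time

# Illustrative shared secret between the KDC and the service.
SERVICE_KEY = b"shared-secret-between-kdc-and-service"

def kdc_issue_ticket(principal, lifetime_secs=300):
    """The KDC issues a time-limited ticket sealed with the service's key."""
    payload = json.dumps({"principal": principal,
                          "expires": time.time() + lifetime_secs}).encode()
    seal = hmac.new(SERVICE_KEY, payload, hashlib.sha256).hexdigest()
    return payload, seal

def service_verify(payload, seal):
    """The service checks the seal and expiry; no password crossed the wire."""
    expected = hmac.new(SERVICE_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, seal):
        return False  # tampered or forged ticket
    return json.loads(payload)["expires"] > time.time()

ticket, seal = kdc_issue_ticket("alice@EXAMPLE.COM")
print(service_verify(ticket, seal))         # valid ticket -> True
print(service_verify(ticket + b" ", seal))  # tampered ticket -> False
```

The expiry timestamp captures what "time-sensitive" means in practice: a stolen ticket is only useful until it expires, which sharply limits replay attacks.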

The Hadoop default security model without Kerberos


Now that we understand how the Kerberos security protocol works, let us look at the details of the Hadoop default security model and its limitations.

Hadoop implements a security model similar to the POSIX filesystem, which gives the ability to apply file permissions and restrict read/write access to files and directories in HDFS. Users and admins can use the chmod and chown commands to change the permissions and ownership of files and directories, just as in a POSIX filesystem. Hadoop does not provide any user management functionality; it relies on the operating system's users and groups.
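Because HDFS mirrors POSIX permission semantics, the same octal permission model applies: for example, hadoop fs -chmod 640 on an HDFS path leaves the owner with read/write and the group with read-only access. The sketch below demonstrates that octal model on a local file (used here only because it runs without a cluster; the HDFS command above is the real-world analog):

```python
import os, stat, tempfile

# Create a scratch file and apply the same octal mode that
# `hadoop fs -chmod 640 <hdfs-path>` would apply in HDFS:
# owner rw-, group r--, other ---.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)

info = os.stat(path)
print(oct(stat.S_IMODE(info.st_mode)))  # 0o640
print(stat.filemode(info.st_mode))      # -rw-r-----
os.remove(path)
```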

By default, Hadoop doesn't perform any authentication of users or Hadoop services. A user authenticates only with the operating system during the logon process. After that, when the user invokes a Hadoop command, the user ID and groups are determined by executing whoami and bash -c groups respectively. So if a user writes their own whoami script and adds it to the...
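This weakness is easy to demonstrate. The sketch below drops a fake whoami script into a temporary directory and puts that directory first on the PATH, which is exactly the kind of identity spoofing the default model cannot prevent (the spoofed user name here is purely illustrative):

```python
import os, subprocess, tempfile

# Create a fake `whoami` that claims to be a privileged user, mimicking
# how a user could impersonate another identity under Hadoop's default
# (simple) authentication, which trusts the output of `whoami`.
d = tempfile.mkdtemp()
fake = os.path.join(d, "whoami")
with open(fake, "w") as f:
    f.write("#!/bin/sh\necho hdfs-superuser\n")
os.chmod(fake, 0o755)

# Put the fake script first on PATH, then run `whoami` as Hadoop would.
env = dict(os.environ, PATH=d + os.pathsep + os.environ.get("PATH", ""))
result = subprocess.run(["whoami"], env=env, capture_output=True, text=True)
print(result.stdout.strip())  # hdfs-superuser, not the real user
```

Anything that derives identity from an unauthenticated shell command inherits this flaw, which is why Kerberos-based authentication is needed.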

Hadoop Kerberos security implementation


Enforcing security within a distributed system such as Hadoop is complex. The detailed requirements for securing Hadoop were identified by Owen O'Malley and others as part of the Hadoop security design. The detailed document is attached to the ticket HADOOP-4487 at https://issues.apache.org/jira/browse/HADOOP-4487. These requirements are summarized in this section.

User-level access controls

In brief, the user-level access controls are:

  • Users of Hadoop should only be able to access data that is authorized for them

  • Only authenticated users should be able to submit jobs to the Hadoop cluster

  • Users should be able to view, modify, and kill only their own jobs

  • Only authenticated services should be able to register themselves as DataNodes or TaskTrackers

  • Data block access within a DataNode needs to be secured, and only authenticated users should be able to access the data stored in the Hadoop cluster

Service-level access controls

Here's a gist of the service...

Summary


In this chapter, we looked at the Kerberos authentication protocol and understood the key concepts involved in implementing Kerberos. We understood the default security implementation in Hadoop and how a Hadoop process gets the logged-in user and group details. The default security implementation has many gaps and can't be used in production.

In a production scenario, securing Hadoop with Kerberos is essential. So we looked at the user-level and service-level requirements that Hadoop supports to secure the Hadoop cluster. We looked at the various internal secret keys (Delegation Token, Block Access Token, and Job Token) that are exchanged by the various Hadoop processes to ensure a secured ecosystem. Understanding the need for and use of these tokens is vital for debugging and troubleshooting any configuration issues in a secured Hadoop cluster. In the next chapter, we will detail the procedure for securing a Hadoop cluster.
