Mastering Hadoop 3


Product type Book
Published in Feb 2019
Publisher Packt
ISBN-13 9781788620444
Pages 544 pages
Edition 1st Edition
Authors (2): Chanchal Singh, Manish Kumar

Table of Contents (23) Chapters

Title Page
Dedication
About Packt
Foreword
Contributors
Preface
Journey to Hadoop 3
Deep Dive into the Hadoop Distributed File System
YARN Resource Management in Hadoop
Internals of MapReduce
SQL on Hadoop
Real-Time Processing Engines
Widely Used Hadoop Ecosystem Components
Designing Applications in Hadoop
Real-Time Stream Processing in Hadoop
Machine Learning in Hadoop
Hadoop in the Cloud
Hadoop Cluster Profiling
Who Can Do What in Hadoop
Network and Data Security
Monitoring Hadoop
Other Books You May Enjoy
Index

Chapter 13. Who Can Do What in Hadoop

This chapter introduces security in the Hadoop ecosystem. Security becomes critical once you adopt Hadoop in your enterprise: unauthorized users must not be able to reach the data stored in HDFS. Security is not a single concern; you have to consider several aspects when securing a Hadoop-based enterprise application. Let's look at these aspects and the role each plays.

In this chapter, we will cover the following topics:

  • Different aspects of Hadoop security pillars
  • Security systems
  • Kerberos authentication
  • User authorization

Hadoop security pillars


Before designing security for your Hadoop cluster, you should be clear about the different aspects of security that have to be incorporated. This section briefly introduces the pillars of security that are required to secure your Hadoop cluster against threats. Later sections of this chapter discuss each pillar in detail.

The following diagram gives you a glimpse of the Hadoop security pillars:

Before discussing the rings in the preceding diagram, we should understand one important aspect of securing the Hadoop cluster, and that is Security Administration. As a Hadoop security administrator, you have to perform the following activities at a high level:

  • Obtain or develop a centralized way to enable Hadoop security, such as automated scripts or security tools.
  • Obtain or develop a centralized Hadoop monitoring and alerting system. This system should comply with...

System security


System security mostly concerns Operating System (OS) security and remote Secure Shell (SSH) access to nodes. OS security consists of regularly checking for OS vulnerabilities and resolving them by applying patches or workarounds. As an administrator, you need to be aware of OS vulnerabilities and newly released malware, as well as the security patches and workarounds available for them.

Note

The following link has details of the latest OS vulnerabilities and malware: https://www.cvedetails.com/

 

A Hadoop cluster consists of a variety of nodes with different profiles. Some are master nodes running NameNodes and JournalNodes. Some are worker nodes running HDFS DataNodes and HBase region servers. Your firewall rules, especially for remote SSH access, may vary with the node profile. However, the following table is specific to SSH access and what kind of roles should have SSH access to...

Kerberos authentication


Big Data involves various challenges, such as storing, processing, analyzing, managing, and securing large data assets. When enterprises started implementing Hadoop, securing it in an enterprise context became challenging because of the distributed nature of the ecosystem and the wide range of applications built on top of Hadoop. One of the key considerations in securing Hadoop is authentication.

The Hadoop team chose Kerberos as the component for implementing authentication in Hadoop. Kerberos is a secure network authentication protocol that provides strong authentication for client-server applications without transferring the password over the network. Kerberos uses time-sensitive tickets that are created using symmetric key cryptography; this approach was chosen over the more widely used SSL-based authentication.
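
To make the idea of a time-sensitive ticket more concrete, the following is a minimal sketch in Python. It is not the Kerberos protocol: real Kerberos encrypts tickets and involves a Key Distribution Center with several message exchanges. Here, as a simplifying assumption, a toy issuer seals a ticket with an HMAC computed from a key it shares with the service, and the service accepts the ticket only if the seal is valid and the ticket has not expired. The function names issue_ticket and verify_ticket, and the principal alice@EXAMPLE.COM, are hypothetical.

```python
import hashlib
import hmac
import json


def issue_ticket(service_key: bytes, principal: str, lifetime_s: int, now: int) -> str:
    """Toy issuer (hypothetical): seal the principal and an expiry time
    with a symmetric key shared only between the issuer and the service."""
    payload = json.dumps(
        {"principal": principal, "expires": now + lifetime_s}, sort_keys=True
    )
    tag = hmac.new(service_key, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + tag


def verify_ticket(service_key: bytes, ticket: str, now: int):
    """Toy service (hypothetical): return the principal if the ticket is
    genuine and unexpired, otherwise None. No password is involved."""
    payload, _, tag = ticket.rpartition(".")
    expected = hmac.new(service_key, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the seal; reject forged or altered tickets.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    claims = json.loads(payload)
    # Time-sensitive: the ticket is only valid before its expiry time.
    if now >= claims["expires"]:
        return None
    return claims["principal"]


if __name__ == "__main__":
    key = b"shared-service-key"  # assumption: provisioned out of band
    ticket = issue_ticket(key, "alice@EXAMPLE.COM", lifetime_s=300, now=1000)
    print(verify_ticket(key, ticket, now=1100))  # within the ticket lifetime
    print(verify_ticket(key, ticket, now=2000))  # after the ticket expired
```

The point of the sketch is that the client presents only the ticket, never a password, and that a captured ticket stops being useful once its lifetime ends.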

Kerberos advantages

Kerberos offers the following advantages:

  • Better performance: Kerberos...

User authorization


Once the identity of the end user is established via Kerberos authentication, the next step in Hadoop security is to determine which actions or services those established identities are allowed to use. This is what authorization deals with. In the following sections, we will look into how authorization rules can be established for different users across different services and how data is stored in HDFS. We will look into two different types of tools that facilitate centralized security policy management for authorization. Let's look into these in brief.
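
As a rough illustration of what an authorization rule decides, the following Python sketch checks whether an already authenticated user may perform an action on an HDFS-style path. The Policy class, the is_allowed function, and the example paths and users are all hypothetical; this is a deny-by-default toy model, not the policy format or API of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: these users may perform these actions
    on the given path and everything beneath it."""
    path_prefix: str
    users: frozenset
    actions: frozenset


def covers(prefix: str, path: str) -> bool:
    """True if path is the prefix itself or lies under it as a directory."""
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


def is_allowed(policies, user: str, path: str, action: str) -> bool:
    """Deny by default; allow only if some policy matches all three parts."""
    return any(
        covers(p.path_prefix, path) and user in p.users and action in p.actions
        for p in policies
    )


if __name__ == "__main__":
    policies = [
        Policy("/data/sales", frozenset({"alice"}), frozenset({"read"})),
    ]
    print(is_allowed(policies, "alice", "/data/sales/2019.csv", "read"))
    print(is_allowed(policies, "alice", "/data/sales/2019.csv", "write"))
```

Keeping all such rules in one place, rather than scattered across services, is the motivation for centralized policy management.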

Ranger

The following diagram represents the architecture of the Ranger tool, which lets you centrally manage security policies for different Hadoop services:

As shown in the preceding diagram, all policies are centrally managed through an administrative web portal. The portal has three distinct parts, namely auditing, KMS, and the policy server. The policy server has several different functionalities. The following are the prime functions of...

List of security features that have been worked upon in Hadoop 3.0


The following is a list of the JIRAs that have been resolved or worked on in the Hadoop 3.0 release:

AliyunOSS: update oss-sdk version to 3.0.0 | Major | Resolved | Fixed

Summary


In this chapter, we covered the Hadoop security pillars, so the different aspects of security that have to be incorporated should now be clear, along with system security and Kerberos authentication. We studied the advantages of Kerberos and how Kerberos authentication flows. This chapter also discussed user authorization, including two tools, namely Ranger and Sentry. Lastly, we looked at the list of JIRAs that have been resolved or worked on in the Hadoop 3.0 release.

In the next chapter, we will study network and data security, which includes aspects such as Hadoop networks, perimeter security, data encryption, data masking, and row- and column-level security.


| Issue Type | Issue key | Issue ID | Parent ID | Summary |
|---|---|---|---|---|
| Bug | HADOOP-15866 | 13192984 | | Renamed HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT keys break compatibility |
| Task | HADOOP-15816 | 13189089 | | Upgrade Apache ZooKeeper version due to security concerns |
| Bug | HADOOP-15861 | 13192122 | | Move DelegationTokenIssuer to the right path |
| Bug | HADOOP-15523 | 13164833 | | Shell command timeout given is in seconds whereas it is taken as millisec while scheduling |
| Improvement | HADOOP-15609 | 13172372 | | Retry KMS calls when SSLHandshakeException occurs |
| Improvement | HADOOP-15804 | 13188359 | | Upgrade to commons-compress 1.18 |
| Bug | HADOOP-15698 | 13181280 | | KMS log4j is not initialized properly at startup |
| Bug | HADOOP-15864 | 13192767 | | Job submitter/executor fail when SBN domain name can not resolved |
| Subtask | HADOOP-15607 | 13172317 | 12989378 | AliyunOSS: fix duplicated... |