Securing Hadoop

Product type: Book
Published: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783285259
Edition: 1st Edition
Author: Sudheesh Narayanan

Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he has applied his expertise to architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His expertise in Big Data includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, Cassandra, and HBase), Text Analytics (GATE and OpenNLP), Machine Learning (Mahout, Weka, and R), and Complex Event Processing. Sudheesh is currently working with Genpact as the Assistant Vice President and Chief Architect – Big Data, with a focus on driving innovation and building Intellectual Property assets, frameworks, and solutions. Prior to Genpact, he was the co-inventor and Chief Architect of the Infosys BigDataEdge product.

Chapter 5. Integrating Hadoop with Enterprise Security Systems

In the previous chapter, we looked at how to establish Kerberos authentication for the Hadoop ecosystem components. Establishing authentication is only the first step towards providing secured access to the Hadoop ecosystem. In this chapter, we will focus on centrally managing the authentication and authorization of the various Hadoop users, and address the challenges of integrating Enterprise Security Systems with a secured Hadoop cluster.

Once Hadoop users are centrally managed, these users need to access and work on the Hadoop cluster directly. However, Hadoop service daemons communicate with each other over multiple protocols, which requires multiple unsecured ports to be open between the cluster machines. This raises a security concern for the organization deploying Hadoop. So, usually, Hadoop clusters are isolated in a separate network and user access is only provided through...

Integrating Enterprise Identity Management systems


Typically, organizations have a central user identity management system, known as an Enterprise Identity Management (EIM) system, built on products such as IBM Tivoli Identity Manager, Oracle Identity Manager, and Windows Active Directory. Enterprise users' access privileges are centrally managed in these systems, which store the user credentials and model user roles as security groups. Authorization is managed through these groups: users are assigned to groups, each group has specific authorization and access privileges defined, and a user inherits the privileges of every group they belong to.
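The group-based inheritance described above can be sketched in a few lines of Python. This is only an illustrative model, not the API of any EIM product: the `Directory` class, the group names, and the privilege strings such as `hdfs:read` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Directory:
    """Toy stand-in for an EIM directory (hypothetical, for illustration only)."""

    # group name -> privileges granted to that group
    group_privileges: Dict[str, Set[str]] = field(default_factory=dict)
    # user name -> groups the user is a member of
    user_groups: Dict[str, Set[str]] = field(default_factory=dict)

    def privileges_for(self, user: str) -> Set[str]:
        """Return the union of the privileges of every group the user belongs to."""
        privileges: Set[str] = set()
        for group in self.user_groups.get(user, set()):
            privileges |= self.group_privileges.get(group, set())
        return privileges

    def is_allowed(self, user: str, privilege: str) -> bool:
        """A user holds a privilege only if one of their groups grants it."""
        return privilege in self.privileges_for(user)


# Hypothetical groups and users.
directory = Directory(
    group_privileges={
        "analysts": {"hdfs:read"},
        "etl": {"hdfs:read", "hdfs:write"},
    },
    user_groups={
        "alice": {"analysts"},
        "bob": {"analysts", "etl"},
    },
)
```

Here `alice` can read but not write, while `bob` inherits both privileges through his membership of `etl`. No privilege is attached to a user directly, so changing what a group may do changes it for every member at once.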

By default, Hadoop uses the logged-in Operating System (OS) users and their corresponding user groups to provide authorization within Hadoop. Hadoop daemons (NameNode, DataNode, and so on) and ecosystem components such as Oozie, Hive, and HBase use these group memberships to determine the level of authorization allowed for the user. By default...

Accessing a secured Hadoop cluster from an enterprise network


The typical deployment architecture of a secured Hadoop cluster in an enterprise context is shown in the following diagram:

The corporate network is separated from the Hadoop cluster by a firewall, and connectivity is provided only through the EdgeNodes (also known as Gateway Servers). The Gateway Server provides an entry point to the secured Hadoop cluster for external applications, tools, and users. It is deployed between the Hadoop cluster and the corporate network. Because all users log in to this machine, and the user credentials defined on it are used when accessing the Hadoop cluster, this node can provide access control, policy enforcement, logging, and gateway services to the Hadoop environment. Depending on the number of users accessing the Hadoop cluster, there can be more than one Gateway Server in a Hadoop cluster.
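As a rough illustration of what "access only through the gateway" means for a client, the sketch below builds a WebHDFS-style REST URL that targets a gateway host rather than the NameNode. It assumes the gateway runs HttpFS (one of the services named in this chapter's summary), which exposes the WebHDFS REST API under the `/webhdfs/v1` prefix. The host name `gateway.example.com` and the helper `webhdfs_url` are hypothetical, and port 14000 is assumed as the commonly documented HttpFS default. No request is sent; on a Kerberos-secured cluster the client would also have to authenticate (for example via SPNEGO) before the call succeeds.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url: str, hdfs_path: str, op: str, **params) -> str:
    """Build a WebHDFS-style REST URL against a gateway endpoint.

    base_url  -- scheme, host, and port of the gateway (hypothetical here)
    hdfs_path -- absolute HDFS path to operate on
    op        -- WebHDFS operation name, for example LISTSTATUS or OPEN
    params    -- any additional query parameters for the operation
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute path")
    query = urlencode({"op": op, **params})
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


# The client only ever addresses the gateway, never a NameNode or DataNode.
GATEWAY = "http://gateway.example.com:14000"  # hypothetical host, assumed port
listing_url = webhdfs_url(GATEWAY, "/user/alice/reports", "LISTSTATUS")
```

Because every URL is rooted at the gateway, the firewall only needs to admit traffic to that one host and port, and the gateway becomes the natural place to apply logging and policy checks.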

Clients in the corporate network can't directly access the Hadoop cluster. They log in...

Summary


In this chapter, we looked at the various challenges of integrating a secured Hadoop cluster with Enterprise systems. One of the main concerns for organizations adopting Big Data is its security. The ability to manage Hadoop users' identities and authorizations centrally using existing EIM systems clears the first hurdle in the Big Data adoption journey. We looked at the implementation details for integrating EIM systems with the Hadoop KDC and at how an enterprise's existing security processes can be extended to Hadoop. Another big concern for organizations is network security. We detailed the implementation approach for enforcing perimeter security around the Hadoop cluster and showed how to give end users seamless access from corporate networks to the secured Hadoop cluster using HttpFS, HUE, and the Knox Gateway Server.

In the next chapter, we look at another important security concern for Big Data adoption and that...
