Chapter 3. Setting Up a Secured Hadoop Cluster

In Chapter 2, Hadoop Security Design, we looked at the internals of the Hadoop security design, which prepared us to set up a secured Hadoop cluster. In this chapter, we will look at how to set up Kerberos authentication and then get into the details of how to set up and configure a secured Hadoop cluster.

To set up a secured Hadoop cluster, we need to set up Kerberos authentication on the nodes. Kerberos authentication requires reverse DNS lookup to work on all the nodes as it uses the hostname to resolve the principal name. Once Kerberos is installed and configured, we set up the Hadoop service principals and all user principals. After that, we update the Hadoop configurations to enable Kerberos authentication on all nodes and start the Hadoop cluster.

These are the topics we'll be covering in this chapter:

  • Prerequisites for setting up a secure Hadoop cluster

  • Setting up Kerberos

  • Configuring Hadoop with Kerberos authentication

  • Configuring Hadoop...

Prerequisites


The following are the prerequisites for installing a secure Hadoop cluster:

  • Root or sudo access for the user installing the cluster.

  • A Hadoop cluster is configured and running in non-secured mode.

  • Proper file permissions are assigned to local and Hadoop system directories.

  • In case we are building Kerberos from the source code, we will need the GCC compiler to compile it. On RHEL/CentOS, run the yum groupinstall 'Development Tools' command to install all the build dependencies.

  • DNS resolution and host mappings are working for all machines in the cluster. Kerberos doesn't work with IP addresses; reverse DNS lookup on all nodes should work and return the fully qualified hostname (a quick verification sketch follows this list).

  • The ports required for Kerberos are port 88 for the KDC and port 749 for the admin service. Since all nodes have to connect to the KDC for authentication, port 88 should be open to all nodes in the cluster running the Hadoop daemons.

  • The name of the Kerberos realm that will be used for authenticating...
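A quick way to verify the DNS and port prerequisites from any cluster node is shown in the following sketch. The hostnames node1.mydomain.com and kdc.mydomain.com and the IP address used here are illustrative placeholders for your own cluster and KDC hosts:

# Forward lookup should return the node's IP address
host node1.mydomain.com

# Reverse lookup on that IP should return the fully qualified hostname
host 192.168.1.10

# The KDC port should be reachable from every node running Hadoop daemons
nc -zv kdc.mydomain.com 88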

Setting up Kerberos


The first step in the process of establishing a secure Hadoop cluster is to set up Kerberos authentication and ensure that Kerberos authentication for the Hadoop service principals is working on all the nodes of the cluster. To set up Kerberos, we establish a Kerberos Key Distribution Center (KDC) on a separate node and install the Kerberos client on all nodes of the Hadoop cluster, as shown in the following figure:

The following figure illustrates the high-level steps involved in installing and configuring Kerberos. It also shows the various Kerberos utilities that are available.
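For reference, here are a few of these utilities in action; the user alice is a hypothetical principal in the default realm:

kinit alice      # obtain a ticket-granting ticket from the KDC
klist            # list the tickets in the credential cache
kdestroy         # discard the cached credentials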

We will use the following realm and domain for the rest of this chapter:

Domain name: mydomain.com

Realm name: MYREALM.COM
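To tie these names together, the following is a minimal /etc/krb5.conf sketch for this realm; the KDC hostname kdc.mydomain.com is an assumption for illustration:

[libdefaults]
    default_realm = MYREALM.COM

[realms]
    MYREALM.COM = {
        kdc = kdc.mydomain.com:88
        admin_server = kdc.mydomain.com:749
    }

[domain_realm]
    .mydomain.com = MYREALM.COM
    mydomain.com = MYREALM.COM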

Installing the Key Distribution Center

To set up Kerberos, we need to install the Key Distribution Center (KDC) on a secured server.

On RHEL/CentOS/Fedora, to install Kerberos, run the following command with root privileges:

yum install krb5-server krb5-libs krb5-workstation

Detailed instructions...
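As a generic sketch of what typically follows the package installation on RHEL/CentOS (your environment and the detailed instructions may differ), the KDC database is initialized and the daemons started as follows:

# Create the KDC database for the realm (prompts for a master password)
kdb5_util create -s -r MYREALM.COM

# Add an administrative principal for managing the KDC
kadmin.local -q "addprinc admin/admin@MYREALM.COM"

# Start the KDC and the admin service
service krb5kdc start
service kadmin start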

Configuring Hadoop with Kerberos authentication


Once the Kerberos setup is complete and the user principals are added to the KDC, we can configure Hadoop to use Kerberos authentication. It is assumed that a Hadoop cluster in non-secured mode is configured and available. We will begin the configuration using Cloudera's Distribution Including Apache Hadoop (CDH4).

The steps involved in configuring Kerberos authentication for Hadoop are shown in the following figure:

Setting up the Kerberos client on all the Hadoop nodes

On each of the Hadoop nodes (master and slave nodes), we need to install the Kerberos client. This is done by installing the client packages and libraries on the Hadoop nodes.

For RHEL/CentOS/Fedora, we will use the following command:

yum install krb5-libs krb5-workstation

For Ubuntu, we will use the following command:

apt-get install krb5-user
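Once the client packages are installed, the /etc/krb5.conf file from the KDC host needs to be copied to every node; a quick smoke test with kinit then confirms that the node can authenticate against the KDC. The admin/admin principal here follows the earlier sketch and is an assumption:

# Copy the Kerberos client configuration from the KDC host
scp root@kdc.mydomain.com:/etc/krb5.conf /etc/krb5.conf

# Verify that this node can obtain a ticket from the KDC
kinit admin/admin@MYREALM.COM
klist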

Setting up Hadoop service principals

In CDH4, there are three users (hdfs, mapred, and yarn) that are used to run the various Hadoop daemons. All the...
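As a rough sketch, creating per-host service principals and exporting their keys to a keytab with kadmin looks like the following; node1.mydomain.com is a placeholder, and the exact principal names and keytab layout should follow the CDH4 documentation:

# Create service principals with randomly generated keys
kadmin -p admin/admin@MYREALM.COM -q "addprinc -randkey hdfs/node1.mydomain.com@MYREALM.COM"
kadmin -p admin/admin@MYREALM.COM -q "addprinc -randkey mapred/node1.mydomain.com@MYREALM.COM"
kadmin -p admin/admin@MYREALM.COM -q "addprinc -randkey HTTP/node1.mydomain.com@MYREALM.COM"

# Export the hdfs and HTTP keys into a keytab for the node
kadmin -p admin/admin@MYREALM.COM -q "xst -k hdfs.keytab hdfs/node1.mydomain.com HTTP/node1.mydomain.com"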

Configuring users for Hadoop


All users required to run MapReduce jobs on the cluster need to be set up on all the nodes in the cluster. In a large cluster, setting up these users is very time consuming, so the best practice is to integrate the existing enterprise users in Active Directory or LDAP using cross-realm authentication in Kerberos.

Users are centrally managed in Active Directory or LDAP, and we set up a one-way cross-realm trust between Active Directory/LDAP and the KDC on the cluster. Thus, the Hadoop service principals don't have to be set up in Active Directory/LDAP; they authenticate locally with the KDC on the cluster. This also ensures that the cluster's authentication load is isolated from the rest of the enterprise. We will look at how to integrate Hadoop security with enterprise security systems in subsequent chapters.
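As a minimal sketch, such a one-way trust is established by creating a shared cross-realm krbtgt principal; AD.EXAMPLE.COM stands in for a hypothetical Active Directory realm, and the same principal with the same password must also be created on the Active Directory side:

# On the cluster KDC: allow principals from the AD realm to access this realm
kadmin.local -q "addprinc krbtgt/MYREALM.COM@AD.EXAMPLE.COM"

Hadoop then maps the incoming enterprise principals to local usernames through the hadoop.security.auth_to_local rules in core-site.xml.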

Automation of a secured Hadoop deployment


In a production environment, there are hundreds (sometimes even thousands) of nodes in a Hadoop cluster. Managing and configuring such a large cluster is not done manually, as it is laborious and error prone. Traditionally, enterprises used Chef/Puppet or a similar solution for cluster configuration management and deployment. In this approach, organizations had to continuously update their Chef recipes based on the changes in Apache Hadoop releases. Instead, organizations typically automate Hadoop cluster deployment based on the Hadoop distribution they work with. For example, with a Cloudera-based Hadoop distribution, organizations leverage Cloudera Manager to provide cluster deployment, automation, and management capabilities. For Hortonworks-based distributions, organizations prefer Ambari. Similarly, the Intel distribution has Intel Manager for Apache Hadoop. Each of these deployment managers supports secured Hadoop deployment. The approach...

Summary


In this chapter, we looked at the steps to set up the Kerberos authentication protocol and how to add the required principals to the KDC. We then looked at the overall process of configuring Hadoop security with Kerberos. The Hadoop configurations have to be replicated on all the nodes of the cluster, and all users running MapReduce jobs need to be set up on all nodes of the cluster. Setting up users across all the cluster nodes can be challenging, and an Active Directory- or LDAP-based authentication mechanism avoids the problem of manually creating users on each node.

In the next chapter, we will look at how we can configure Kerberos security for the rest of the Hadoop ecosystem, such as Hive, WebHDFS, Oozie, and Flume.
