
You're reading from Securing Hadoop

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783285259
Edition: 1st Edition

Author: Sudheesh Narayanan

Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he provided his expertise in architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His expertise in Big Data includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, Cassandra, and HBase), Text Analytics (GATE and OpenNLP), Machine Learning (Mahout, Weka, and R), and Complex Event Processing. Sudheesh is currently working with Genpact as the Assistant Vice President and Chief Architect – Big Data, with focus on driving innovation and building Intellectual Property assets, frameworks, and solutions. Prior to Genpact, he was the co-inventor and Chief Architect of the Infosys BigDataEdge product.

Chapter 4. Securing the Hadoop Ecosystem

In Chapter 3, Setting Up a Secured Hadoop Cluster, we looked at how to set up Kerberos authentication for the HDFS and MapReduce components within a secured Hadoop cluster. But in our secured Big Data journey, this is only half the work. The Hadoop ecosystem consists of various components such as Hive, Oozie, and HBase, and all of these components need to be secured as well. In this chapter, we will look at each of the ecosystem components, the security challenges specific to each of them, and how to set up secured authentication and user authorization for each one.

Each ecosystem component has its own security challenges and needs to be configured uniquely, based on its architecture, to be secured. Each of these components is accessed either directly by end users or by a backend service that in turn accesses the Hadoop core components (HDFS and MapReduce).

The following are the topics that we'll be covering in this chapter:

  • Configuring...

Configuring Kerberos for Hadoop ecosystem components


The Hadoop ecosystem is growing continuously and maturing with increasing enterprise adoption. In this section, we look at some of the most important Hadoop ecosystem components, their architecture, and how they can be secured.
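Before securing the individual components, recall that all of them rely on the underlying cluster already running in secure mode. As a minimal sketch, the core-site.xml settings below enable Kerberos authentication and service-level authorization cluster-wide; the property names are the standard Hadoop security settings, and any host or realm values would be specific to your environment.

```xml
<!-- core-site.xml: switch the cluster from simple (trusted) auth to Kerberos -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<!-- Enable service-level authorization checks for RPC callers -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

Every ecosystem component discussed in this section assumes these settings are already in place, as covered in Chapter 3.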

Securing Hive

Hive provides the ability to run SQL queries over data stored in HDFS. Its query engine converts each Hive query submitted by the user into a pipeline of MapReduce jobs, which are submitted to Hadoop (JobTracker or ResourceManager) for execution. The results of the MapReduce jobs are then presented back to the user or stored in HDFS. The following figure shows the high-level interaction of a business user working with Hive to run Hive queries on Hadoop:

There are multiple ways a Hadoop user can interact with Hive and run Hive queries; these are as follows:

  • The user can directly run the Hive queries using Command Line Interface (CLI). The CLI connects to the Hive metastore using...
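To secure the metastore path described above, Hive's SASL/Kerberos support has to be switched on in hive-site.xml. The following is a sketch of the relevant standard properties; the principal, keytab path, and realm shown are placeholders, and the _HOST token is expanded by Hive to the local hostname at runtime.

```xml
<!-- hive-site.xml: require Kerberos (SASL) for metastore connections -->
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<!-- Service principal for the metastore; _HOST is resolved at runtime -->
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
</property>
<!-- Keytab path is environment-specific; shown here as a placeholder -->
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/hive/conf/hive.keytab</value>
</property>
```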

Best practices for securing the Hadoop ecosystem components


We have looked at the various Hadoop ecosystem components and understood how to set up secured authentication for each of them. In this section, let us summarize the best practices:

  • All services that are running within the Hadoop ecosystem need to be authenticated with the KDC. This ensures that no rogue process can carry out malicious activity.

  • It is a best practice to store the KDC credentials in an LDAP store, so that the credentials and authorizations can be centrally managed.

  • The keytab file needs to be secured, and only the user for whom the file is created should be provided with read access to the file.

  • Whenever a Java client is accessing the service, client authentication should be done by the service using the RPC authentication mechanism.

  • Whenever a service user impersonates an end user, the service process has to be fully secured with Kerberos, and also the host running...
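The keytab guidance above can be sketched as the following shell commands. The filename and ownership here are illustrative: a real keytab would be generated with kadmin's ktadd and owned by the service account rather than created with touch.

```shell
# Stand-in for a keytab produced by kadmin's ktadd for the hive service user
touch hive.keytab

# Owner-only, read-only access: no group or other user can read the keytab
chmod 400 hive.keytab

# Confirm the resulting mode is 400 (r-- --- ---)
stat -c '%a' hive.keytab
```

In practice you would also chown the file to the service user (for example, `chown hive:hive hive.keytab`) so that the read permission applies to the right account.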

Summary


In this chapter, we looked at the steps that need to be adopted to secure the various Hadoop ecosystem components. At a high level, the process involves creating a Kerberos principal for each component and then securing the keytab file under the service user's home directory. If the service has to impersonate the end user, the service principal is configured as a superuser in Hadoop. Each ecosystem component has specific configurations that need to be updated to support secured authentication with Kerberos. Some components, such as Sqoop and Sqoop2, leave certain security holes when used in production, so they have to be used with caution and deployed with additional security measures.

In the next chapter, we will look at how to integrate the authentication and authorization of these ecosystem components with the Enterprise Identity Management systems.
