You're reading from Securing Hadoop

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783285259
Edition: 1st Edition
Author (1)
Sudheesh Narayanan

Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he has applied his expertise to architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His expertise in Big Data includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, Cassandra, and HBase), Text Analytics (GATE and OpenNLP), Machine Learning (Mahout, Weka, and R), and Complex Event Processing. Sudheesh is currently working with Genpact as the Assistant Vice President and Chief Architect – Big Data, with a focus on driving innovation and building Intellectual Property assets, frameworks, and solutions. Prior to Genpact, he was the co-inventor and Chief Architect of the Infosys BigDataEdge product.

Chapter 6. Securing Sensitive Data in Hadoop

In Chapter 5, Integrating Hadoop with Enterprise Security Systems, we looked at integrating a secured Hadoop cluster with an Enterprise Identity Management system and enforcing user authorization within Hadoop. User privileges are managed centrally and then synchronized with the secured Hadoop cluster. This enables enterprise users to access secured Hadoop services seamlessly. As organizations mature in their Big Data implementations, there is an increasing need to move sensitive information into the Hadoop ecosystem to generate valuable insights. Sensitive data in the cluster needs special protection and should be secured both at rest and in motion.

In this chapter, we look at how to secure sensitive data within a Hadoop ecosystem.

These are the topics we'll be covering in this chapter:

  • Securing sensitive data in Hadoop

  • Encrypting sensitive data in Hadoop

  • Implementing data encryption in Hadoop

Securing sensitive data in Hadoop


Sensitive data inside Hadoop can be classified into two high-level categories:

  • Sensitive data that exists in enterprise systems, such as customers' personal and financial information, and that needs to be brought into Hadoop for analysis.

  • Sensitive insights that the Hadoop analytical process generates after processing the data stored inside Hadoop. These insights are more valuable and more sensitive than the raw source data used to generate them. For example, a retail e-commerce enterprise has detailed records of customer purchases. These transaction details might not be very sensitive on their own. The data is brought into Hadoop to generate various insights. By correlating a customer's historical purchases with the purchases of the customer's household, insights about customer purchase patterns, behavior patterns, customer sentiment, and customer life events can be inferred. This information is highly sensitive compared...

Summary


In this chapter, we looked at how to secure sensitive data in the Hadoop cluster. We looked at approaches for encrypting data in motion and at block-level encryption for data at rest. We also looked at MapReduce processing and ways to enforce data encryption on the input data, the intermediate data, and the final results created by the MapReduce program. Encryption degrades performance, so its cost has to be evaluated carefully and encryption applied only to data that is actually sensitive.

In the next chapter, we will look at how to identify security incidents and events in a secured Hadoop cluster, and at how to implement auditing and logging of user activities in the cluster.
