
You're reading from Securing Hadoop

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783285259
Edition: 1st Edition
Author: Sudheesh Narayanan

Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he provided his expertise in architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His expertise in Big Data includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, Cassandra, and HBase), Text Analytics (GATE and OpenNLP), Machine Learning (Mahout, Weka, and R), and Complex Event Processing. Sudheesh is currently working with Genpact as the Assistant Vice President and Chief Architect – Big Data, with focus on driving innovation and building Intellectual Property assets, frameworks, and solutions. Prior to Genpact, he was the co-inventor and Chief Architect of the Infosys BigDataEdge product.

Chapter 7. Security Event and Audit Logging in Hadoop

In Chapter 6, Securing Sensitive Data in Hadoop, we looked at the approach to securing sensitive data in a Hadoop cluster and how block-level encryption can be implemented to protect it. In this chapter, we look at the security incident and event monitoring that needs to be implemented in a secured Hadoop cluster. We then discuss the best practices in security procedures and policies that need to be adopted to secure the Hadoop ecosystem, and how some of these policies can be configured as rules in the security event and audit logging system.

A Hadoop cluster in production hosts sensitive customer information. Security of data assets is of prime importance for organizations to have a successful big data journey. While we focus on ensuring that the Hadoop cluster is secured through various measures such as enforcing perimeter security, Kerberos authentication, and authorization, there is always a possibility of security breaches...

Security Incident and Event Monitoring in a Hadoop Cluster


A Security Incident and Event Monitoring (SIEM) system is responsible for collecting, monitoring, analyzing, and generating various security alerts for any suspicious activity in the cluster. SIEM systems usually collect the various system logs, network logs, and application logs to identify these security incidents and events. Hadoop itself can be used to perform the analysis and correlation of these security events in a batch mode.

The first step in any SIEM system is to collect the various system logs and identify corresponding events. The following are the events that need to be monitored in a Hadoop cluster to detect any security incidents:

  • User login and authorization events: User login events in a secured Hadoop cluster are generated when end users or service principals authenticate with the KDC or EIM system. The krb5kdc.log file for the KDC in the local Hadoop realm contains the service login events. The central...
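As a rough illustration of the kind of batch analysis a SIEM rule might perform on collected events, the following Python sketch counts denied operations per user and flags users who reach a threshold. It is only a sketch under stated assumptions: the key=value line layout is modelled on HDFS-style audit entries and may differ in your cluster, and the sample lines, user names, function names, and threshold are all hypothetical.

```python
import re
from collections import Counter

# Assumed key=value layout, modelled on HDFS-style audit entries.
# Adjust the pattern to the log format your cluster actually produces.
AUDIT_PATTERN = re.compile(
    r"allowed=(?P<allowed>true|false)\s+ugi=(?P<user>\S+)"
    r".*?cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)"
)


def denied_counts(lines):
    """Count denied operations per user across the given audit lines."""
    counts = Counter()
    for line in lines:
        match = AUDIT_PATTERN.search(line)
        if match and match.group("allowed") == "false":
            counts[match.group("user")] += 1
    return counts


def suspicious_users(lines, threshold=3):
    """Return, in sorted order, users whose denied count reaches the threshold."""
    return sorted(
        user for user, count in denied_counts(lines).items() if count >= threshold
    )


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "2013-11-01 10:00:01,000 INFO FSNamesystem.audit: allowed=false\tugi=mallory (auth:KERBEROS)\tip=/10.0.0.9\tcmd=open\tsrc=/secure/a\tdst=null\tperm=null",
    "2013-11-01 10:00:02,000 INFO FSNamesystem.audit: allowed=false\tugi=mallory (auth:KERBEROS)\tip=/10.0.0.9\tcmd=open\tsrc=/secure/b\tdst=null\tperm=null",
    "2013-11-01 10:00:03,000 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:KERBEROS)\tip=/10.0.0.5\tcmd=listStatus\tsrc=/data\tdst=null\tperm=null",
    "2013-11-01 10:00:04,000 INFO FSNamesystem.audit: allowed=false\tugi=mallory (auth:KERBEROS)\tip=/10.0.0.9\tcmd=delete\tsrc=/secure/c\tdst=null\tperm=null",
    "2013-11-01 10:00:05,000 INFO FSNamesystem.audit: allowed=false\tugi=bob (auth:KERBEROS)\tip=/10.0.0.7\tcmd=open\tsrc=/secure/a\tdst=null\tperm=null",
]

if __name__ == "__main__":
    for user in suspicious_users(SAMPLE):
        print(f"ALERT: repeated access denials for {user}")
```

On a real cluster the same counting logic could run as a batch job over the collected logs rather than over an in-memory list; the threshold would be one of the policies configured as a rule in the monitoring system.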

Setting up audit logging in a secured Hadoop cluster


To enable security event monitoring and auditing in Hadoop, we need to configure the logging framework to write detailed audit trails to a logfile. This is done in the log4j.properties file in the Hadoop configuration directory. By default, the Hadoop security and audit logfile appenders are set to the null appender and are therefore disabled. They need to be changed to appenders that write to the correct logfile locations for the audit and security logs. We also need to enable the capture of authentication logs from the local KDC. Enabling detailed audit logs needs careful planning: these logs can grow very quickly if there are continuous exceptions and can fill up the disk. There should be a system that monitors log growth and takes corrective actions such as cleaning up and compressing old logs.
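The log growth monitoring mentioned above could be sketched in Python as follows. This is a minimal sketch, not a production tool: it assumes that the active logfile name ends in .log and that rolled-over files carry a numeric suffix (for example, a hypothetical hdfs-audit.log.1). The function names and the size limit are illustrative assumptions.

```python
import gzip
import shutil
from pathlib import Path


def directory_size(log_dir):
    """Return the total size in bytes of the regular files in log_dir."""
    return sum(p.stat().st_size for p in Path(log_dir).iterdir() if p.is_file())


def compress_rolled_logs(log_dir):
    """Gzip rolled-over logs, skipping active (.log) and already compressed (.gz) files.

    Assumes rolled files are named like '<name>.log.1'; this is an assumption
    about the appender's naming scheme, not something guaranteed by Hadoop.
    """
    archived = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file() or path.name.endswith((".log", ".gz")):
            continue
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original only after the archive is written
        archived.append(target)
    return archived


def compress_if_over(log_dir, max_bytes):
    """Compress rolled logs only when the directory exceeds max_bytes."""
    if directory_size(log_dir) <= max_bytes:
        return []
    return compress_rolled_logs(log_dir)
```

A script like this could be run periodically (for example, from cron) with something like compress_if_over("/var/log/hadoop", 5 * 1024**3), where both the path and the limit are placeholders. Note that renaming rolled files may interfere with an appender that manages its own backup files, so in practice the archives might be moved to a separate directory.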

Configuring Hadoop audit logs

The following configuration shows the audit and security logging configurations that need to be done on all the nodes...

Summary


In this chapter, we looked at the general approach for identifying security incidents and events in a secured Hadoop cluster. A SIEM system consists of a collection agent that gathers the events from the cluster and publishes them to the monitoring server. The monitoring server is configured with rules and policies that are applied to the collected events to generate security alerts and reports. We also looked at how to configure the audit and security logs for the various components in a secured Hadoop cluster.
