You're reading from Apache Hive Essentials

1st Edition, published February 2015 by Packt (ISBN-13: 9781783558575). Reading level: Intermediate.

Author: Dayong Du

Dayong Du has dedicated his career to enterprise data and analytics for more than 10 years, focusing on enterprise use cases with open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner as well as an author and coach. He has published the first and second editions of Apache Hive Essentials and coached many people interested in learning and using big data technology. In addition, he is a seasoned blogger, contributor, and advisor for big data start-ups, and a co-founder of the Toronto big data professional association.

Chapter 9. Security Considerations

In most open source software, security is one of the most important areas, yet it is often addressed only at a later stage of development. As the main SQL-like interface to data in Hadoop, Hive must ensure that data is securely protected and accessed. For this reason, security in Hive is now considered an integral and important part of the Hadoop ecosystem. Earlier versions of Hive relied mainly on HDFS for security. Hive security gradually matured after HiveServer2 was released as an important milestone of the Hive server.

This chapter will discuss Hive security in the following areas:

  • Authentication

  • Authorization

  • Encryption

Authentication


Authentication is the process of verifying the identity of a user by obtaining the user's credentials. Hive has offered authentication since HiveServer2 was introduced. With the previous HiveServer, anyone who could reach the host and port over the network could access the data. The Hive Metastore Server can authenticate thrift clients using Kerberos. As mentioned in Chapter 2, Setting Up the Hive Environment, it is strongly recommended to upgrade to HiveServer2 for both security and reliability. In this section, we will briefly cover authentication configurations for both the Metastore Server and HiveServer2.
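As an illustration, Kerberos authentication for both servers is configured through properties in hive-site.xml. The property names below are the standard ones shipped with Hive; the principal and keytab path values are placeholders that depend on your environment:

```xml
<!-- Metastore Server: authenticate thrift clients with Kerberos over SASL -->
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/hive/conf/hive.keytab</value>
</property>

<!-- HiveServer2: use Kerberos for client authentication -->
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/hive/conf/hive.keytab</value>
</property>
```

After obtaining a ticket with kinit, a client typically connects with a JDBC URL that carries the server principal, for example jdbc:hive2://host:10000/default;principal=hive/_HOST@YOUR-REALM.COM.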

Note

Kerberos

Kerberos is a network authentication protocol developed by MIT as part of Project Athena. It uses time-sensitive tickets, generated using symmetric key cryptography, to securely authenticate a user in an unsecured network environment. The name Kerberos comes from Greek mythology, where Kerberos was the three-headed dog that guarded...

Authorization


Authorization in Hive is used to verify whether a user has permission to perform a certain action, such as creating, reading, or writing data or metadata. Hive provides three authorization modes: legacy mode, storage-based mode, and SQL standard-based mode.
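For reference, the storage-based and SQL standard-based modes are selected by pointing hive.security.authorization.manager at the corresponding authorizer class in hive-site.xml. The class names below are the stock implementations shipped with Hive; pick one per deployment:

```xml
<property>
  <name>hive.security.authorization.manager</name>
  <!-- storage-based mode (defers to HDFS file permissions):
       org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
       SQL standard-based mode (HiveServer2), shown as the value below: -->
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
```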

Legacy mode

This is the default authorization mode in Hive, providing column- and row-level authorization through HQL statements. However, it is not a completely secure authorization mode and has a couple of limitations. It is mainly useful for preventing well-meaning users from accidentally doing bad things rather than for stopping malicious operations. In order to enable the legacy authorization mode, we need to set the following properties in hive-site.xml:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
  <description>enables or disables the hive client authorization
  </description>
</property>
<property>
  <name>hive.security.authorization...
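Once authorization is enabled, legacy mode grants and revokes privileges through HQL statements. The following is a sketch using the legacy GRANT/REVOKE/SHOW GRANT syntax; the employee table and user1 account are hypothetical examples:

```sql
-- grant a single privilege on a table to a user
GRANT SELECT ON TABLE employee TO USER user1;

-- verify which privileges the user holds on that table
SHOW GRANT USER user1 ON TABLE employee;

-- take the privilege back
REVOKE SELECT ON TABLE employee FROM USER user1;
```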

Encryption


For sensitive and legally protected data, such as personally identifiable information (PII), it is often required to store the data in encrypted form in the filesystem. However, Hive does not yet natively support encryption and decryption (see https://issues.apache.org/jira/browse/HIVE-5207).

Alternatively, we can use third-party tools to encrypt and decrypt data after exporting it from Hive, but this requires additional postprocessing. The new HDFS encryption (see https://issues.apache.org/jira/browse/HDFS-6134) offers transparent encryption and decryption of data on HDFS, which satisfies the requirement when we want to encrypt an entire dataset. However, it cannot be applied at the column or row level within a Hive table, and most PII that needs encryption is only a part of the raw data. In this case, the best solution for now is to use a Hive UDF to plug encryption and decryption implementations into selected columns or partial data in Hive tables.
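As a usage sketch, assume a custom encryption UDF has already been built and packaged in a jar; both the function name aesencrypt, the class name, and the jar path below are hypothetical. It could then be applied to just the sensitive column while leaving the rest of the row in plain text:

```sql
-- register a hypothetical encryption UDF from a user-provided jar
ADD JAR hdfs:///tmp/hive-udf-crypto.jar;
CREATE TEMPORARY FUNCTION aesencrypt AS 'com.example.hive.udf.AESEncrypt';

-- encrypt only the PII column on the way into a protected table
INSERT OVERWRITE TABLE employee_protected
SELECT name, aesencrypt(ssn) AS ssn_enc, department
FROM employee;
```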

Sample UDF implementations...

Summary


In this chapter, we introduced the three main areas of Hive security: authentication, authorization, and encryption. We covered authentication in the Metastore Server and HiveServer2. Then, we discussed the default (legacy), storage-based, and SQL standard-based authorization modes. At the end of the chapter, we discussed using Hive UDFs for encryption and decryption. After going through this chapter, we should clearly understand the different areas that help us address Hive security.

In the next chapter, we'll talk about using Hive with other tools.

