You're reading from Apache Hive Essentials - Second Edition

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788995092
Edition: 2nd Edition
Author (1)
Dayong Du

Dayong Du has dedicated his career of more than 10 years to enterprise data and analytics, especially enterprise use cases built on open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner as well as an author and coach. He has published the first and second editions of Apache Hive Essentials and coached many people who are interested in learning and using big data technology. In addition, he is a seasonal blogger, contributor, and advisor for big data start-ups, and a co-founder of the Toronto big data professional association.

Security Considerations

For most open source software, security is a critical area to address before production release. As the leading SQL-like interface for Hadoop data, Hive must ensure that data is securely protected and accessed. For this reason, security in Hive is always considered an integral and important part of the ecosystem. Earlier versions of Hive relied mainly on HDFS for security. Hive security gradually matured after hiveserver2 was released.

This chapter will discuss Hive security in the following areas:

  • Authentication
  • Authorization
  • Mask and encryption

Authentication

Authentication is the process of verifying the identity of a user by obtaining the user's credentials. Hive has offered authentication since hiveserver2. In older versions of Hive, hiveserver1 did not support Kerberos authentication for Thrift clients. As a result, anyone who could reach the host and port over the network could access the server. In that case, we can instead leverage the metastore server, which supports Kerberos, for authentication. In this section, we will briefly talk about authentication configurations in both the metastore server and hiveserver2.

Kerberos is a network authentication protocol developed by MIT as part of Project Athena. It uses time-sensitive tickets that are generated using symmetric key cryptography to securely authenticate a user in an unsecured network environment. Kerberos, in Greek mythology, was the three-headed dog that guarded the...

Authorization

Authorization is used to verify whether a user has permission to perform a certain action, such as creating, reading, or writing data or metadata. Hive provides three authorization modes: legacy mode, storage-based mode, and SQL standard-based mode.
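The idea of checking whether a user may perform an action can be sketched in a few lines of Python. This is only a conceptual illustration, not how Hive implements any of its authorization modes; the `GRANTS` table, the user names, and the `is_authorized` helper are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical grant table: (user, object) -> set of allowed actions.
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("analyst", "sales.orders"): {"SELECT"},
    ("etl", "sales.orders"): {"SELECT", "INSERT"},
}


def is_authorized(user: str, action: str, obj: str) -> bool:
    """Return True only if the action was explicitly granted (deny by default)."""
    return action.upper() in GRANTS.get((user, obj), set())


if __name__ == "__main__":
    print(is_authorized("analyst", "select", "sales.orders"))  # granted
    print(is_authorized("analyst", "insert", "sales.orders"))  # not granted
```

The sketch denies anything that was not explicitly granted, which is the usual safe default for an authorization check.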

Legacy mode

This is the default authorization mode in Hive, providing column- and row-level authorization through HQL statements. However, it is not a completely secure authorization mode and has a couple of limitations. It can be mainly used to prevent good users from accidentally doing bad things rather than preventing malicious user operations. In order to enable legacy authorization mode, we need to set the following properties in hive-site.xml:

<property>...

Mask and encryption

For sensitive and legally protected data, such as Personally Identifiable Information (PII) or Personal Confidential Information (PCI), it is necessary to store data in an encrypted or masked format in the filesystem. Since Hive v0.13.0, its data security features have matured in the areas of data hashing, data masking, and data encryption/decryption functions.
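To make the idea of a masked format concrete, here is a minimal Python sketch that hides all but the last few characters of a value. It illustrates the concept only and does not reproduce Hive's built-in masking functions; the helper name `mask_show_last` and its parameters are assumptions made for this example.

```python
def mask_show_last(value: str, visible: int = 4, mask_char: str = "x") -> str:
    """Replace every character except the last `visible` ones with mask_char."""
    if visible <= 0:
        # Nothing should stay readable.
        return mask_char * len(value)
    # Number of leading characters to hide; never negative for short inputs.
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


if __name__ == "__main__":
    # A made-up 16-digit value; only the last four digits remain readable.
    print(mask_show_last("4111111111111111"))
```

Masking keeps the stored value readable enough for tasks such as matching the last four digits, while the original cannot be recovered from the masked form.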

The data-hashing function

Before data masking was supported, the built-in hash function, available since Hive v1.3.0, served as an alternative. A hash function reads an input string and produces a fixed-size alphanumeric output string. Since the output generally maps uniquely to the input string (with very little chance of collision), the hashed value is quite often...
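The fixed-size property described above can be seen in a short Python sketch. Python's `hashlib` stands in here for Hive's built-in functions, which this excerpt does not show; the choice of SHA-256 and the helper name `hash_value` are assumptions made for illustration.

```python
import hashlib


def hash_value(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    short = hash_value("a")
    long_ = hash_value("a much longer input string " * 100)
    # Both digests have the same length regardless of input size.
    print(len(short), len(long_))
    # The same input always yields the same digest.
    print(hash_value("a") == short)
```

Because the same input always produces the same digest, hashed columns can still be compared for equality or used in joins without exposing the original values.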

Summary

In this chapter, we introduced the Hive security areas of authentication, authorization, masking, and encryption. We covered authentication in the metastore server and hiveserver2. Then, we talked about the legacy (default), storage-based, and SQL standard-based authorization modes. At the end of this chapter, we discussed various ways of applying data masking and encryption in Hive. After going through this chapter, you should be able to address security concerns with the appropriate authentication, authorization, and data-masking or encryption methods.

In the next chapter, we'll talk about using Hive with other tools in the big data ecosystem.
