You're reading from Apache Hive Essentials - Second Edition

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788995092
Edition: 2nd Edition
Author (1)
Dayong Du

Dayong Du has dedicated his career of more than 10 years to enterprise data and analytics, especially enterprise use cases built on open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner as well as an author and coach. He has published the first and second editions of Apache Hive Essentials and has coached many people who are interested in learning and using big data technology. In addition, he is a seasonal blogger, contributor, and advisor for big data start-ups, and a co-founder of the Toronto big data professional association.

Setting Up the Hive Environment

This chapter introduces how to install and set up the Hive environment in a cluster and in the cloud. It also covers the usage of basic Hive commands and the Hive integrated development environment.

In this chapter, we will cover the following topics:

  • Installing Hive from Apache
  • Installing Hive from vendors
  • Using Hive in the cloud
  • Using the Hive command
  • Using the Hive IDE

Installing Hive from Apache

To introduce the Hive installation, we will use Hive version 2.3.3 as an example. The pre-installation requirements for this installation are as follows:

  • JDK 1.8
  • Hadoop 2.x.y
  • Ubuntu 16.04/CentOS 7

Since we focus on Hive in this book, the installation steps for Java and Hadoop are not provided here. For steps on installing them, please refer to https://www.java.com/en/download/help/download_options.xml and http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.
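
Before starting, it can help to confirm that the prerequisites listed above are visible on the PATH. The following is only a minimal sketch and not part of the original installation steps; the helper name check_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
        return 0
    else
        echo "$1: NOT found"
        return 1
    fi
}

# Check the two software prerequisites; '|| true' keeps the script
# running even when one of them is missing.
check_cmd java || true
check_cmd hadoop || true

# Print the first line of version output only when the tool is present.
if command -v java >/dev/null 2>&1; then java -version 2>&1 | head -n 1; fi
if command -v hadoop >/dev/null 2>&1; then hadoop version | head -n 1; fi
```

If either tool is reported as missing, install it from the links above before continuing.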

The following steps describe how to install Apache Hive in the command-line environment:

  1. Download Hive from Apache Hive and unpack it:
      $ cd /opt
      $ wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
      $ tar -zxvf apache-hive-2.3.3-bin.tar.gz
      $ ln -sfn /opt/apache-hive-2.3.3-bin /opt/hive
  2. Add the necessary system...
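
The symbolic link created in step 1 gives Hive a stable path that does not change between releases. As a rough illustration of why the -sfn options are used, the sketch below repeats the link step in a throwaway directory instead of /opt. The second release directory (9.9.9) is a made-up name used only to simulate an upgrade.

```shell
#!/bin/sh
# Demonstrate 'ln -sfn' in a temporary directory (a stand-in for /opt).
base=$(mktemp -d)

# The second directory is a made-up "newer" release, for illustration only.
mkdir "$base/apache-hive-2.3.3-bin" "$base/apache-hive-9.9.9-bin"

# First install: create the stable link.
ln -sfn "$base/apache-hive-2.3.3-bin" "$base/hive"
first_target=$(readlink "$base/hive")

# Simulated upgrade: -f replaces the existing link, and -n stops ln from
# following it into the old directory and creating the new link there.
ln -sfn "$base/apache-hive-9.9.9-bin" "$base/hive"
second_target=$(readlink "$base/hive")

echo "before: $first_target"
echo "after:  $second_target"
```

Because the link is replaced in place, anything that refers to the link path keeps working after the link is pointed at a different release.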

Installing Hive from vendors

Many companies, such as Cloudera and Hortonworks, have packaged the Hadoop ecosystem and management tools into an easily manageable enterprise distribution. Each company takes a slightly different strategy, but the common goal of all of these packages is to make the Hadoop ecosystem easier to use and more stable for enterprise usage. For example, we can easily install Hive with the Hadoop management tools, such as Cloudera Manager (https://www.cloudera.com/products/product-components/cloudera-manager.html) or Ambari (https://ambari.apache.org/), which are packaged in vendor distributions. Once the management tool is installed and started, we can add the Hive service to the Hadoop cluster with the following steps:

  1. Log in to Cloudera Manager/Ambari and click the Add a Service option to enter the Add Service Wizard.
  2. Choose the service to install...

Using Hive in the cloud

All major cloud service providers, such as Amazon, Microsoft, and Google, now offer mature Hadoop and Hive as services in the cloud. Using the cloud version of Hive is very convenient, since it requires almost no installation and setup. Amazon EMR (http://aws.amazon.com/elasticmapreduce/) is the earliest Hadoop service in the cloud. However, it is not a pure open source version since it is customized to run only on Amazon Web Services (AWS). Hadoop enterprise service and distribution providers, such as Cloudera and Hortonworks, also provide tools to easily deploy their own distributions on different public or private clouds. Cloudera Director (http://www.cloudera.com/content/cloudera/en/products-and-services/director.html) and Cloudbreak (https://hortonworks.com/open-source/cloudbreak/) open up Hadoop deployments in the cloud through a simple, self...

Using the Hive command

Hive first started with hiveserver1. However, this version of the Hive server was not very stable. It sometimes suspended or blocked the client's connection quietly. Since v0.11.0, Hive has included a new Thrift server called hiveserver2 to replace hiveserver1. hiveserver2 is an enhanced server designed for multiple client concurrency and improved authentication. With hiveserver2, beeline is recommended as the main Hive command-line interface instead of the hive command. The primary difference between the two versions of servers is how the clients connect to them. hive is an Apache-Thrift-based client, and beeline is a JDBC client. The hive command directly connects to the Hive drivers, so we need to install the Hive library on the client. However, beeline connects to hiveserver2 through JDBC connections without installing Hive libraries on the client. That means...
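
As a rough illustration of the two connection styles, the sketch below runs the same statement through beeline and through the hive command. It assumes hiveserver2 is listening on localhost at port 10000 (its default port). The helper name hive_jdbc_url and the user name hiveuser are hypothetical, and each client is only invoked if it is found on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: build a hiveserver2 JDBC URL from host, port, database.
hive_jdbc_url() {
    echo "jdbc:hive2://$1:$2/$3"
}

# Assumed connection values, for illustration only.
url=$(hive_jdbc_url localhost 10000 default)
echo "$url"

# beeline: JDBC client that connects to hiveserver2 through the URL.
if command -v beeline >/dev/null 2>&1; then
    beeline -u "$url" -n hiveuser -e 'SHOW DATABASES;' \
        || echo "beeline could not connect"
fi

# hive: requires the Hive libraries to be installed on this machine.
if command -v hive >/dev/null 2>&1; then
    hive -e 'SHOW DATABASES;' || echo "hive command failed"
fi
```

Both clients accept -e to run a quoted statement and -f to run statements from a file; starting either one without these options opens the interactive mode.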

Using the Hive IDE

Besides the command-line interface, there are other Integrated Development Environment (IDE) tools available to support Hive. One of the best is Oracle SQL Developer, which leverages the powerful functionality of the Oracle IDE and is free to use. Since Oracle SQL Developer supports general JDBC connections, it is quite convenient to switch between Hive and other JDBC-supported databases in the same IDE. Oracle SQL Developer has supported Hive since v4.0.3. Configuring it to work with Hive is quite straightforward:

  1. Download Oracle SQL Developer (http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html).
  2. Download the Hive JDBC drivers (https://www.cloudera.com/downloads/connectors/hive/jdbc.html).
  3. Unzip the driver file to a local directory.
  4. Start Oracle SQL Developer and navigate to Preferences | Database | Third Party JDBC...

Summary

In this chapter, we learned how to set up Hive in different environments. We also looked into a few examples of using Hive commands in both the command-line and the interactive mode for beeline and hive. Since it is quite productive to use an IDE with Hive, we walked through the setup of Oracle SQL Developer for Hive. Now that you've finished this chapter, you should be able to set up your own Hive environment locally and use Hive.

In the next chapter, we will dive into the details of Hive's data definition languages.
