Apache Hive Essentials, Second Edition

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788995092
Edition: 2nd Edition
Languages: Java
Tools: Hive
Concepts: Data Analysis
Author: Dayong Du

Setting Up the Hive Environment

This chapter explains how to install and set up the Hive environment in a cluster and in the cloud. It also covers basic Hive commands and the use of an integrated development environment (IDE) with Hive.

In this chapter, we will cover the following topics:

Installing Hive from Apache
Installing Hive from vendors
Using Hive in the cloud
Using the Hive command
Using the Hive IDE

Installing Hive from Apache

To walk through the Hive installation, we will use Hive version 2.3.3 as an example. The prerequisites for this installation are as follows:

JDK 1.8
Hadoop 2.x.y
Ubuntu 16.04/CentOS 7

Since we focus on Hive in this book, the installation steps for Java and Hadoop are not provided here. For steps on installing them, please refer to https://www.java.com/en/download/help/download_options.xml and http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.
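Before installing Hive, it can help to confirm that the prerequisites above are actually on the `PATH`. The following is a minimal shell sketch, not part of the book's steps; the `check_prereqs` function name is hypothetical. It only checks that commands such as `java` and `hadoop` can be found; the commented-out `java -version` and `hadoop version` calls then print the installed versions so you can compare them against JDK 1.8 and Hadoop 2.x.y yourself.

```shell
# Hypothetical helper (not from the book): print whether each named command
# is on PATH, and return 1 if any of them is missing.
check_prereqs() {
  local missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd -> $(command -v "$cmd")"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check for the JDK and Hadoop, then print their versions.
# if check_prereqs java hadoop; then
#   java -version     # compare against JDK 1.8
#   hadoop version    # compare against Hadoop 2.x.y
# fi
```

The function deliberately stops at "is the command present"; interpreting the version strings is left to the reader, since their exact format differs between JDK vendors and Hadoop distributions.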

The following steps describe how to install Apache Hive in the command-line environment:

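Download Hive from the Apache archive and unpack it:

      $ cd /opt
      # Download the Hive 2.3.3 binary release from the Apache archive
      $ wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
      # The tarball unpacks into /opt/apache-hive-2.3.3-bin
      $ tar -zxvf apache-hive-2.3.3-bin.tar.gz
      # Point the version-independent path /opt/hive at the unpacked release
      $ ln -sfn /opt/apache-hive-2.3.3-bin /opt/hive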

Add the necessary system...
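The `ln -sfn` call in the download step gives Hive a version-independent path, `/opt/hive`, so switching releases can be done by re-pointing one link rather than editing every path that mentions a version number. The following shell sketch (not from the book) shows why the `-n` flag matters when that link is re-pointed. It assumes a temporary directory in place of `/opt`, and the `apache-hive-2.3.4-bin` directory is only a placeholder for a later release.

```shell
# Sketch (not from the book): why the install step uses `ln -sfn`.
# A temporary directory stands in for /opt, so no root access is needed;
# apache-hive-2.3.4-bin is just a placeholder for a later release.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/apache-hive-2.3.3-bin" "$demo_dir/apache-hive-2.3.4-bin"

# Initial install: create the version-independent link "hive".
ln -sfn "$demo_dir/apache-hive-2.3.3-bin" "$demo_dir/hive"

# Upgrade: -f replaces the existing link, and -n stops ln from following
# that link into the old release directory. Without -n, ln would create
# a new link *inside* apache-hive-2.3.3-bin and leave "hive" unchanged.
ln -sfn "$demo_dir/apache-hive-2.3.4-bin" "$demo_dir/hive"

# readlink prints where the link now points.
current_target=$(readlink "$demo_dir/hive")
echo "hive -> $current_target"

# Remove the scratch directory.
rm -rf "$demo_dir"
```

In a real installation, the same two `ln -sfn` calls would run against `/opt` directly, typically with root privileges.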

Installing Hive from vendors

Many companies, such as Cloudera and Hortonworks, package the Hadoop ecosystem and its management tools into easily manageable enterprise distributions. Each vendor takes a slightly different approach, but all of these packages aim to make the Hadoop ecosystem easier to use and more stable for enterprises. For example, we can easily install Hive with the Hadoop management tools bundled in vendor distributions, such as Cloudera Manager (https://www.cloudera.com/products/product-components/cloudera-manager.html) or Ambari (https://ambari.apache.org/). Once the management tool is installed and started, we can add the Hive service to the Hadoop cluster with the following steps:

Log in to Cloudera Manager or Ambari and click the Add a Service option to open the Add Service Wizard
Choose the service to install...

Using Hive in the cloud

All major cloud service providers, such as Amazon, Microsoft, and Google, offer mature Hadoop and Hive services in the cloud. Using the cloud version of Hive is very convenient, since it requires almost no installation or setup. Amazon EMR (http://aws.amazon.com/elasticmapreduce/) was the earliest Hadoop service in the cloud. However, it is not a pure open source version, since it is customized to run only on Amazon Web Services (AWS). Hadoop enterprise service and distribution providers, such as Cloudera and Hortonworks, also provide tools to easily deploy their own distributions on different public or private clouds. Cloudera Director (http://www.cloudera.com/content/cloudera/en/products-and-services/director.html) and Cloudbreak (https://hortonworks.com/open-source/cloudbreak/) open up Hadoop deployments in the cloud through a simple, self...

Using the Hive command

Hive first started with hiveserver1. However, this version of the Hive server was not very stable; it sometimes suspended or silently blocked client connections. Since v0.11.0, Hive has included a new Thrift server called hiveserver2 to replace hiveserver1. hiveserver2 is an enhanced server designed for concurrent access by multiple clients, with improved authentication. Hive also recommends using beeline instead of the hive command as the main Hive command-line interface. The primary difference between the two setups is how clients connect: hive is an Apache Thrift-based client, whereas beeline is a JDBC client. The hive command connects directly to the Hive drivers, so the Hive libraries must be installed on the client. In contrast, beeline connects to hiveserver2 through JDBC, so no Hive libraries need to be installed on the client. That means...
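Because beeline reaches hiveserver2 through JDBC, a beeline session starts from a JDBC URL of the form `jdbc:hive2://<host>:<port>/<database>`. The sketch below is not from the book: the `hive2_jdbc_url` function is hypothetical, and its defaults (port 10000 and the `default` database) assume an unmodified hiveserver2 configuration. The commented usage lines contrast the two clients; `hive_user` is a placeholder user name.

```shell
# Hypothetical helper (not from the book): build a hiveserver2 JDBC URL.
# Defaults assume hiveserver2's usual port (10000) and the "default" database.
hive2_jdbc_url() {
  local host="${1:-localhost}"
  local port="${2:-10000}"
  local db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Usage (requires a running hiveserver2; hive_user is a placeholder):
#   beeline -u "$(hive2_jdbc_url localhost)" -n hive_user -e "SHOW DATABASES;"
# The older hive CLI takes no JDBC URL and needs the Hive libraries locally:
#   hive -e "SHOW DATABASES;"
```

Keeping host, port, and database as separate arguments makes it easy to point the same command at a remote hiveserver2 without retyping the URL scheme.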

Using the Hive IDE

Besides the command-line interface, several integrated development environment (IDE) tools support Hive. One of the best is Oracle SQL Developer, which leverages the powerful functionality of the Oracle IDE and is completely free to use. Since Oracle SQL Developer supports general JDBC connections, it is quite convenient to switch between Hive and other JDBC-supported databases in the same IDE. Oracle SQL Developer has supported Hive since v4.0.3. Configuring it to work with Hive is quite straightforward:

Download Oracle SQL Developer (http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html).
Download the Hive JDBC drivers (https://www.cloudera.com/downloads/connectors/hive/jdbc.html).
Unzip the driver file to a local directory.
Start Oracle SQL Developer and navigate to Preferences | Database | Third Party JDBC...

Summary

In this chapter, we learned how to set up Hive in different environments. We also looked at a few examples of running Hive commands with both beeline and hive, in command-line and interactive mode. Since an IDE makes working with Hive more productive, we walked through setting up Oracle SQL Developer for Hive. You should now be able to set up your own Hive environment locally and start using Hive.

In the next chapter, we will dive into the details of Hive's data definition language (DDL).



Dayong Du has dedicated his career of more than 10 years to enterprise data and analytics, especially enterprise use cases built on open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner, author, and coach. He has published the first and second editions of Apache Hive Essentials and has coached many people who want to learn and use big data technology. In addition, he is a seasoned blogger, a contributor to and advisor for big data start-ups, and a co-founder of the Toronto big data professional association.

Other recommended products

Related to this chapter

Hadoop 2.x Administration Cookbook

A practical, use-case-driven approach to Hadoop administration, covering a vast array of topics including Hadoop cluster installation, performance tuning, cluster planning, security, and much more. This book covers Hadoop from the perspective of running clusters in critical, large-scale environments with complex data.

Book, May 2017, 348 pages

Apache Hadoop 3 Quick Start Guide

Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics such as MapReduce, YARN and HDFS.

Book, Oct 2018, 220 pages

Modern Big Data Processing with Hadoop

This book presents unique techniques to conquer different Big Data processing and analytics challenges using Hadoop. Practical examples are provided to boost your understanding of Big Data concepts and their implementation. By the end of the book, you will have all the knowledge and skills you need to become a true Big Data expert.

Book, Mar 2018, 394 pages

Mastering Hadoop 3

This is a comprehensive guide to understanding advanced concepts of the Hadoop ecosystem. You will learn how Hadoop works internally and build solutions to some real-world use cases. Finally, you will have a solid understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable Big Data pipeline.

Book, Feb 2019, 544 pages

Big Data Analytics with Hadoop 3

Apache Hadoop is the most popular platform for big data processing to build powerful analytics solutions. This book shows you how to do just that, with the help of practical examples. By the end of this book, you will be well-versed in the analytical capabilities of the Hadoop ecosystem, using Apache Spark and Apache Flink to perform big data analytics.

Book, May 2018, 482 pages

Data Lake for Enterprises

The term 'Data Lake' has recently become prominent in the big data industry. Data scientists can use a data lake to derive meaningful insights that businesses can use to redefine or transform the way they operate. Lambda architecture is also emerging as a prominent pattern in the big data landscape, as it helps derive useful information not only from historical data but also by correlating it with real-time data, enabling businesses to make critical decisions. This book brings these two important aspects together: data lake and lambda architecture.

Book, May 2017, 596 pages

Teradata Cookbook

Teradata, unlike other relational database management systems, is mainly popular because of its faster analytical processing and storage capabilities. This book unveils all the functionality offered by Teradata's analytical platform with the help of practical recipes. From installation to querying, indexing to loading utilities for analytical processing and data administration tasks, this book's solution-based approach will help you resolve problems you encounter while performing day-to-day activities with Teradata. By the end of the book, you'll be equipped with all the knowledge you need to be an expert in Teradata development and administration.

Book, Feb 2018, 454 pages

Apache Spark Quick Start Guide

Apache Spark is a flexible in-memory framework that allows processing of both batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to quickly get started with Apache Spark 2.0 and write efficient big data applications for a variety of use cases.

Book, Jan 2019, 154 pages

HBase High Performance Cookbook

Book, Jan 2017, 350 pages

MySQL 8 for Big Data

MySQL is one of the most popular relational databases in the world today, and has become a popular choice of tool to handle vast amounts of structured data - that is, structured Big Data. This book will demonstrate how you can dabble with large amounts of data using MySQL 8. It also highlights topics such as integrating MySQL 8 and a Big Data solution like Apache Hadoop using different tools like Apache Sqoop and MySQL Applier. With practical examples and use-cases, you will get a better clarity on how you can leverage the offerings of MySQL 8 to build a robust Big Data solution.

Book, Oct 2017, 296 pages

Hands-On Data Science with SQL Server 2017

Learn how to utilize Microsoft SQL Server with NoSQL concepts for data science challenges. This book will help enhance your knowledge beyond data querying & processing tasks by implementing a data science pipeline. We will implement data science tasks and show how to use them on a day-to-day basis for efficient smart predictive models.

Book, Nov 2018, 506 pages
