You're reading from Apache Hive Essentials, Second Edition

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788995092
Edition: 2nd Edition
Author: Dayong Du

Dayong Du has spent more than 10 years of his career on enterprise data and analytics, focusing on enterprise use cases built on open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner, author, and coach. He has published the first and second editions of Apache Hive Essentials and has coached many people interested in learning and using big data technology. In addition, he is a seasoned blogger and contributor, an advisor for big data start-ups, and a co-founder of the Toronto big data professional association.

Hivemall

Apache Hivemall (https://hivemall.incubator.apache.org/) is a collection of Hive UDFs for machine learning. It provides implementations of ML algorithms for classification, regression, and recommendation, along with loss functions and feature engineering utilities, all exposed as UDFs. This lets end users apply machine learning algorithms to large volumes of training data using SQL alone. Perform the following steps to set it up:

  1. Download Hivemall from https://hivemall.incubator.apache.org/download.html and put it into HDFS:
      $ hdfs dfs -mkdir -p /apps/hivemall
      $ hdfs dfs -put hivemall-all-xxx.jar /apps/hivemall
  2. Create permanent functions using the define-all-as-permanent.hive script (https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all-as-permanent.hive):
      > CREATE DATABASE IF NOT EXISTS hivemall; -- create a db for the UDFs
      > USE hivemall...
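The two setup steps can be scripted end to end. The following is a minimal Bash sketch, not the book's or the Hivemall project's own tooling. It assumes the `hdfs` and `hive` CLIs are on the PATH, keeps the `hivemall-all-xxx.jar` version placeholder, and assumes that `define-all-as-permanent.hive` reads the jar location from a Hive variable named `hivemall_jar`; check the header of the script you download and adjust the variable name if it differs. The names `HIVEMALL_DIR`, `DDL_SCRIPT`, and `DRY_RUN` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of the Hivemall setup steps. Assumptions (adjust for your cluster):
#  - hivemall-all-xxx.jar keeps the book's version placeholder (hypothetical name);
#  - define-all-as-permanent.hive is in the working directory and reads the jar
#    location from a Hive variable named hivemall_jar (check your copy).
set -euo pipefail

HIVEMALL_DIR="${HIVEMALL_DIR:-/apps/hivemall}"   # must be an absolute HDFS path
HIVEMALL_JAR="${HIVEMALL_JAR:-hivemall-all-xxx.jar}"
DDL_SCRIPT="${DDL_SCRIPT:-define-all-as-permanent.hive}"

# Print each command; execute it only when DRY_RUN=0 (dry run is the default).
run() {
  echo "$*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  fi
}

# Step 1: create the HDFS directory and upload the jar (-f overwrites an older copy).
upload_jar() {
  run hdfs dfs -mkdir -p "$HIVEMALL_DIR"
  run hdfs dfs -put -f "$HIVEMALL_JAR" "$HIVEMALL_DIR/"
}

# Step 2: create the hivemall database, then run the DDL script inside it,
# passing the jar's HDFS URI (hdfs:/// resolves to the default NameNode).
register_functions() {
  local jar_uri
  jar_uri="hdfs://${HIVEMALL_DIR}/$(basename "$HIVEMALL_JAR")"
  run hive -e "CREATE DATABASE IF NOT EXISTS hivemall;"
  run hive --database hivemall --hivevar "hivemall_jar=${jar_uri}" -f "$DDL_SCRIPT"
}

main() {
  upload_jar
  register_functions
}

# Run main only when the file is executed directly, not when it is sourced.
if [[ "${BASH_SOURCE[0]:-$0}" == "$0" ]]; then
  main
fi
```

Saved under a hypothetical name such as `setup_hivemall.sh`, running `bash setup_hivemall.sh` only previews the commands (the echoed lines omit shell quoting); rerunning with `DRY_RUN=0` executes them. Defaulting to a dry run keeps the sketch safe to try on a machine without Hadoop.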