You're reading from Practical Big Data Analytics

Product type: Book
Published in: Jan 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783554393
Edition: 1st Edition
Author (1)
Nataraj Dasgupta

Nataraj Dasgupta is the vice president of advanced analytics at RxDataScience Inc. Nataraj has been in the IT industry for more than 19 years, and has worked in the technical and analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. At Purdue Pharma, Nataraj led the data science division, where he developed the company's award-winning big data and machine learning platform. Prior to Purdue, at UBS, he held the role of Associate Director, working with high-frequency and algorithmic trading technologies in the foreign exchange trading division of the bank.

Installing Hadoop


There are several ways to install Hadoop. The most common ones are:

  1. Installing Hadoop from the source files available at https://hadoop.apache.org
  2. Installing using open source distributions from commercial vendors such as Cloudera and Hortonworks

In this exercise, we will install the Cloudera Distribution of Apache Hadoop (CDH), an integrated platform that bundles Hadoop with several related Apache products. Cloudera is a popular commercial Hadoop vendor that, in addition to its own release of Hadoop, provides managed services for enterprise-scale Hadoop deployments. In our case, we'll be installing CDH in a VM environment.

Installing Oracle VirtualBox

A VM environment is essentially a copy of an existing operating system that may have preinstalled software. The VM can be delivered in a single file, which allows users to replicate an entire machine by just launching a file instead of reinstalling the OS and configuring it to mimic another system. The VM operates in a self...
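To make the idea of replicating a machine from a single file concrete, here is a minimal Python sketch. It assumes that Oracle VirtualBox is already installed, that its VBoxManage command-line tool is on the PATH, and that the VM is distributed as an appliance file that VirtualBox can import (for example, an .ova file). The helper names are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def find_vboxmanage() -> Optional[str]:
    """Return the full path to the VBoxManage CLI, or None if it is not on PATH."""
    return shutil.which("VBoxManage")


def build_import_command(appliance: str, vboxmanage: str = "VBoxManage") -> List[str]:
    """Build the argument list for `VBoxManage import <appliance>`."""
    return [vboxmanage, "import", appliance]


def import_appliance(appliance: str) -> None:
    """Import a VM appliance file into VirtualBox (assumes VirtualBox is installed)."""
    exe = find_vboxmanage()
    if exe is None:
        raise RuntimeError("VBoxManage not found; install Oracle VirtualBox first")
    if not Path(appliance).is_file():
        raise FileNotFoundError(appliance)
    # check=True raises CalledProcessError if the import fails
    subprocess.run(build_import_command(appliance, exe), check=True)
```

Calling import_appliance with the path to a downloaded appliance file (the actual file name depends on the download) would register the VM with VirtualBox, after which it can be started from the VirtualBox interface.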
