Setting up Hadoop


A fully distributed HBase runs on top of HDFS. In a fully distributed HBase cluster installation, the HBase master daemon (HMaster) typically runs on the same server as the HDFS master node (NameNode), while its slave daemon (HRegionServer) runs on the same server as the HDFS slave node, called the DataNode.

Hadoop MapReduce is not required by HBase, so its daemons do not need to be started. However, we will also cover the setup of MapReduce in this recipe, in case you would like to run MapReduce over HBase. For a small Hadoop cluster, we usually have the MapReduce master daemon (JobTracker) run on the NameNode server, and the MapReduce slave daemons (TaskTracker) run on the DataNode servers.

This recipe describes the setup of Hadoop. We will have one master node (master1) run the NameNode and JobTracker. We will set up three slave nodes (slave1 to slave3), each of which will run a DataNode and a TaskTracker.

Getting ready

You will need four small EC2 instances, which can be obtained by using the following command:

$ ec2-run-instances ami-77287b32 -t m1.small -n 4 -k your-key-pair

All these instances must be set up properly, as described in the previous recipe, Getting ready on Amazon EC2. Besides the NTP and DNS setups, Java must be installed on all servers too.

We will use the hadoop user as the owner of all Hadoop daemons and files. All Hadoop files and data will be stored under /usr/local/hadoop. Add the hadoop user and create a /usr/local/hadoop directory on all the servers, in advance.
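
For example, a minimal sketch of this preparation on each server might look like the following; it assumes a Linux AMI with sudo available, so adjust the commands to match the user setup from the previous recipe:

$ sudo useradd -m -s /bin/bash hadoop
$ sudo mkdir -p /usr/local/hadoop
$ sudo chown -R hadoop:hadoop /usr/local/hadoop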

We will set up one Hadoop client node as well. We will use client1, which we set up in the previous recipe. Therefore, the Java installation, hadoop user, and directory should be prepared on client1 too.

How to do it...

Here are the steps to set up a fully distributed Hadoop cluster:

  1. In order to log in to all nodes of the cluster via SSH, generate the hadoop user's SSH key pair on the master node:

    hadoop@master1$ ssh-keygen -t rsa -N ""
    
    • This command will create a public key for the hadoop user on the master node, at ~/.ssh/id_rsa.pub.

  2. On all slave and client nodes, add the hadoop user's public key to allow SSH login from the master node:

    hadoop@slave1$ mkdir ~/.ssh
    hadoop@slave1$ chmod 700 ~/.ssh
    hadoop@slave1$ cat >> ~/.ssh/authorized_keys
    
  3. Copy the hadoop user's public key you generated in step 1 and paste it into ~/.ssh/authorized_keys. Then, change its permissions as follows:

    hadoop@slave1$ chmod 600 ~/.ssh/authorized_keys
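
    • Alternatively, assuming password login is still enabled on the slaves, you can append the key in a single step from the master node, for example:

    hadoop@master1$ cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'cat >> ~/.ssh/authorized_keys'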
    
  4. Get the latest stable, HBase-supported Hadoop release from Hadoop's official site, http://www.apache.org/dyn/closer.cgi/hadoop/common/. At the time of writing, the latest HBase-supported, stable Hadoop release was 1.0.2. Download the tarball and decompress it into our Hadoop root directory (/usr/local/hadoop), then add a symbolic link and an environment variable:

    hadoop@master1$ ln -s hadoop-1.0.2 current
    hadoop@master1$ export HADOOP_HOME=/usr/local/hadoop/current
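
    • If you have not downloaded the release yet, the fetch and extraction (run from /usr/local/hadoop, before the ln command above) might look like the following; the archive URL is only an example, so pick a mirror from the page mentioned above:

    hadoop@master1$ cd /usr/local/hadoop
    hadoop@master1$ wget http://archive.apache.org/dist/hadoop/common/hadoop-1.0.2/hadoop-1.0.2.tar.gz
    hadoop@master1$ tar xzf hadoop-1.0.2.tar.gz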
    
  5. Create the following directories on the master node:

    hadoop@master1$ mkdir -p /usr/local/hadoop/var/dfs/name
    hadoop@master1$ mkdir -p /usr/local/hadoop/var/dfs/data
    hadoop@master1$ mkdir -p /usr/local/hadoop/var/dfs/namesecondary
    
  6. Create the MapReduce local directory. You can skip this step if you don't use MapReduce:

    hadoop@master1$ mkdir -p /usr/local/hadoop/var/mapred
    
  7. Set JAVA_HOME in Hadoop's environment settings file (hadoop-env.sh):

    hadoop@master1$ vi $HADOOP_HOME/conf/hadoop-env.sh
    export JAVA_HOME=/usr/local/jdk1.6
    
  8. Add the hadoop.tmp.dir property to core-site.xml:

    hadoop@master1$ vi $HADOOP_HOME/conf/core-site.xml
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/var</value>
    </property>
    
  9. Add the fs.default.name property to core-site.xml:

    hadoop@master1$ vi $HADOOP_HOME/conf/core-site.xml
    <property>
    <name>fs.default.name</name>
    <value>hdfs://master1:8020</value>
    </property>
    
  10. If you need MapReduce, add the mapred.job.tracker property to mapred-site.xml:

    hadoop@master1$ vi $HADOOP_HOME/conf/mapred-site.xml
    <property>
    <name>mapred.job.tracker</name>
    <value>master1:8021</value>
    </property>
    
  11. Add a slave server list to the slaves file:

    hadoop@master1$ vi $HADOOP_HOME/conf/slaves
    slave1
    slave2
    slave3
    
  12. Sync all Hadoop files from the master node to the client and slave nodes. Don't sync ${hadoop.tmp.dir} after the initial installation:

    hadoop@master1$ rsync -avz /usr/local/hadoop/ client1:/usr/local/hadoop/
    hadoop@master1$ for i in 1 2 3
    do rsync -avz /usr/local/hadoop/ slave$i:/usr/local/hadoop/
    sleep 1
    done
    
  13. You need to format the NameNode before starting Hadoop. Do this only for the initial installation:

    hadoop@master1$ $HADOOP_HOME/bin/hadoop namenode -format
    
  14. Start HDFS from the master node:

    hadoop@master1$ $HADOOP_HOME/bin/start-dfs.sh
    
  15. You can access your HDFS by typing the following command:

    hadoop@master1$ $HADOOP_HOME/bin/hadoop fs -ls /
    
    • You can also view the HDFS admin page in your browser. Make sure port 50070 is open. The HDFS admin page can be viewed at http://master1:50070/dfshealth.jsp.
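
    • On EC2, assuming your instances use the default security group, you can open the port with the EC2 API tools used earlier, for example:

    $ ec2-authorize default -P tcp -p 50070 -s your.ip.address/32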

  16. Start MapReduce from the master node, if needed:

    hadoop@master1$ $HADOOP_HOME/bin/start-mapred.sh
    
    • Now you can access the MapReduce admin page in your browser. Make sure port 50030 is open. The MapReduce admin page can be viewed at http://master1:50030/jobtracker.jsp.
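
    • As with the HDFS admin page, on EC2 you can open this port the same way, for example:

    $ ec2-authorize default -P tcp -p 50030 -s your.ip.address/32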

  17. To stop HDFS, execute the following command from the master node:

    hadoop@master1$ $HADOOP_HOME/bin/stop-dfs.sh
    
  18. To stop MapReduce, execute the following command from the master node:

    hadoop@master1$ $HADOOP_HOME/bin/stop-mapred.sh
    

How it works...

To start/stop daemons on remote slaves from the master node, passwordless SSH login for the hadoop user is required; we set this up in steps 1 to 3.

HBase must run on an HDFS that supports a durable sync implementation. If HBase runs on an HDFS that has no durable sync implementation, it may lose data when its slave servers go down. Hadoop versions later than 0.20.205, including Hadoop 1.0.2, which we have chosen, support this feature.
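
For example, with the 1.0.x releases the durable sync support is typically switched on by setting dfs.support.append to true in hdfs-site.xml; treat the following as a sketch and check the documentation of your exact Hadoop and HBase versions, as this property has changed across releases:

hadoop@master1$ vi $HADOOP_HOME/conf/hdfs-site.xml
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>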

HDFS and MapReduce use the local filesystem to store their data. We created the directories required by Hadoop in steps 5 and 6, and set their parent path (hadoop.tmp.dir) in Hadoop's configuration file in step 8.

In steps 9 to 11, we set up Hadoop so that it can find HDFS, the JobTracker, and the slave servers. Before starting Hadoop, all Hadoop directories and settings need to be synced to the slave servers, which we did in step 12. The first time you start Hadoop (HDFS), you need to format the NameNode. Note that you should only do this at the initial installation.

At this point, you can start/stop Hadoop using its start/stop script. Here we started/stopped HDFS and MapReduce separately, in case you don't require MapReduce. You can also use $HADOOP_HOME/bin/start-all.sh and stop-all.sh to start/stop HDFS and MapReduce using one command.
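
As a quick smoke test after everything is up, you can write a file to HDFS and, if MapReduce is running, submit one of the bundled example jobs; the jar name below assumes the layout of the 1.0.2 tarball, so adjust it to your version:

hadoop@master1$ $HADOOP_HOME/bin/hadoop fs -put /etc/hosts /hosts-test
hadoop@master1$ $HADOOP_HOME/bin/hadoop fs -cat /hosts-test
hadoop@master1$ $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-1.0.2.jar pi 2 100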
