Reader small image

You're reading from  Hadoop 2.x Administration Cookbook

Product typeBook
Published inMay 2017
PublisherPackt
ISBN-139781787126732
Edition1st Edition
Tools
Right arrow
Author (1)
Aman Singh
Aman Singh
author image
Aman Singh

Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He has authored Monitoring Hadoop by Packt Publishing
Read more about Aman Singh

Right arrow

Adding nodes to the cluster


Over a period of time, our cluster will grow in data and there will be a need to increase the capacity of the cluster by adding more nodes.

We can add Datanodes to the cluster in the same way that we first configured the Datanode started the Datanode daemon on it. But the important thing to keep in mind is that all nodes can be part of the cluster. It should not be that anyone can just start a Datanode daemon on his laptop and join the cluster, as it will be disastrous. By default, there is nothing preventing any node being a Datanode, as the user has just to untar the Hadoop package and point the file "core-site.xml" to the Namenode and start the Datanode daemon.

Getting ready

For the following steps, we assume that the cluster that is up and running with Datanodes is in a healthy state and we need to add a new Datanode in the cluster. We will login to the Namenode and make changes there.

How to do it...

  1. ssh to Namenode and edit the file hdfs-site.xml to add the following property to it:

    <property>
    <name>dfs.hosts</name>
    <value>/home/hadoop/includes</value>
    <final>true</final>
    </property>
  2. Make sure the file includes is readable by the user hadoop.

  3. Restart the Namenode daemon for the property to take effect:

    $ hadoop-daemons.sh stop namenode
    $ hadoop-daemons.sh start namenode
    
  4. A restart of Namenode is required only when any property is changed in the file. Once the property is in place, Namenode can read the changes to the contents of the includes file by simply refreshing the nodes.

  5. Add the dn1.cluster1.com node to the file excludes:

    $ cat includes
    dn1.cluster1.com
    
  6. The file includes or excludes can contain a list of multiple nodes, one node per line.

  7. After adding the node to the file, we just need to reload the file by entering the following:

    $ hadoop dfsadmin -refreshNodes
    
  8. After some time, the node will be available in the cluster and can be seen:

    $ hdfs dfsadmin -report
    

How it works...

The file /home/hadoop/includes will contain a list of all the Datanodes that are allowed to join a cluster. If the file includes is blank, then all Datanodes are allowed to join the cluster. If there is both an include and exclude file, the list of nodes must be mutually exclusive in both the files. So, to decommission the node dn1.cluster.com from the cluster, it must be removed from the includes file and added to the excludes file.

There's more...

In addition to controlling the nodes as we described, there will be firewall rules in place and separate VLANs for Hadoop clusters to keep the traffic and data isolated.

Previous PageNext Chapter
You have been reading a chapter from
Hadoop 2.x Administration Cookbook
Published in: May 2017Publisher: PacktISBN-13: 9781787126732
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Aman Singh

Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He has authored Monitoring Hadoop by Packt Publishing
Read more about Aman Singh