
You're reading from  Hadoop 2.x Administration Cookbook

Product type: Book
Published in: May 2017
Publisher: Packt
ISBN-13: 9781787126732
Edition: 1st Edition
Author (1)
Aman Singh

Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He is the author of Monitoring Hadoop, published by Packt Publishing.

Distcp usage


In Hadoop, we deal with large volumes of data, so a simple sequential copy is often not the best approach. Imagine copying a 1 TB file from one cluster to another, or to a different path within the same cluster, and the operation timing out when it is 50% complete. In that situation, the copy has to be restarted from the beginning. Distcp (distributed copy) instead runs the copy as a MapReduce job on the cluster.

Getting ready

This recipe shows the steps needed to copy files within a cluster and across clusters. You will need a running cluster with YARN configured to run MapReduce, as discussed in Chapter 1, Hadoop Architecture and Deployment.

For this recipe, no additional configuration is needed to run Distcp; just make sure that HDFS and YARN are up and running.
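As a quick way to confirm that both services respond before submitting a copy job, a check along the following lines could be used. This is only a sketch: the function name `check_cluster` is our own, and it assumes the `hdfs` and `yarn` client commands are on the PATH of the current user and that the user may list the HDFS root directory.

```shell
# Sketch: confirm that HDFS and YARN respond before running Distcp.
# check_cluster is a hypothetical helper name, not part of Hadoop.
check_cluster() {
  # Listing the root directory requires a reachable NameNode.
  if ! hdfs dfs -ls / >/dev/null 2>&1; then
    echo "HDFS is not reachable" >&2
    return 1
  fi
  # Listing nodes requires a reachable ResourceManager.
  if ! yarn node -list >/dev/null 2>&1; then
    echo "YARN is not reachable" >&2
    return 1
  fi
  echo "HDFS and YARN are up"
}
```

Running `check_cluster` on the edge node returns a non-zero status if either command fails. It only shows that the NameNode and ResourceManager answer client requests; it does not verify DataNode or NodeManager health.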

How to do it...

  1. Connect over SSH to the NameNode or an edge node and execute the following command to copy the /projects directory to a new directory, /new:

    $ hadoop distcp /projects /new
    
  2. The preceding command will submit a MapReduce job to the cluster, and once the job finishes we can see the data copied at the destination.

  3. We can...
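The copy from step 1 can also be wrapped in a small helper that makes reruns cheaper. The sketch below is not taken from the recipe: the function name `distcp_copy` and the `DRY_RUN` variable are our own, while `-update` and `-m` are standard Distcp options. With `-update`, Distcp copies only files that are missing or different at the destination, and `-m` caps the number of simultaneous map tasks.

```shell
# Sketch: run (or just print) a Distcp copy that can be rerun after a failure.
# distcp_copy and DRY_RUN are hypothetical names used for illustration.
distcp_copy() {
  src="$1"          # source path or URI
  dst="$2"          # destination path or URI
  maps="${3:-20}"   # maximum number of map tasks; 20 is chosen here as the fallback
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command for review instead of submitting a job.
    echo "hadoop distcp -update -m $maps $src $dst"
  else
    hadoop distcp -update -m "$maps" "$src" "$dst"
  fi
}
```

For example, `DRY_RUN=1; distcp_copy /projects /new` prints the command so that we can review it first. For a copy across clusters, the paths would be full URIs, such as `hdfs://nn1:8020/projects` and `hdfs://nn2:8020/projects`, where `nn1` and `nn2` are placeholder NameNode hosts. Note that `-update` also changes how the source directory is mapped onto an existing destination directory, so it is worth checking the Distcp guide before relying on it.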
