Hadoop Cluster Deployment

  • Choose the hardware and Hadoop distribution that best suits your needs
  • Get more value out of your Hadoop cluster with Hive, Impala, and Sqoop
  • Learn useful tips for performance optimization and security

Book Details

Language: English
Paperback: 126 pages (235 mm x 191 mm)
Release Date: November 2013
ISBN: 1783281715
ISBN 13: 9781783281718
Author(s): Danil Zburivsky
Topics and Technologies: All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Chapter 1: Setting Up Hadoop Cluster – from Hardware to Distribution
Chapter 2: Installing and Configuring Hadoop
Chapter 3: Configuring the Hadoop Ecosystem
Chapter 4: Securing Hadoop Installation
Chapter 5: Monitoring Hadoop Cluster
Chapter 6: Deploying Hadoop to the Cloud
  • Chapter 1: Setting Up Hadoop Cluster – from Hardware to Distribution
    • Choosing Hadoop cluster hardware
      • Choosing the DataNode hardware
      • Low storage density cluster
      • High storage density cluster
      • NameNode and JobTracker hardware configuration
        • The NameNode hardware
        • The JobTracker hardware
      • Gateway and other auxiliary services
      • Network considerations
      • Hadoop hardware summary
    • Hadoop distributions
      • Hadoop versions
      • Choosing Hadoop distribution
      • Cloudera Hadoop distribution
      • Hortonworks Hadoop distribution
      • MapR
    • Choosing OS for the Hadoop cluster
    • Summary
  • Chapter 2: Installing and Configuring Hadoop
    • Configuring OS for Hadoop cluster
      • Choosing and setting up the filesystem
      • Setting up Java Development Kit
      • Other OS settings
      • Setting up the CDH repositories
    • Setting up NameNode
      • JournalNode, ZooKeeper, and Failover Controller
      • Hadoop configuration files
      • NameNode HA configuration
      • JobTracker configuration
        • Configuring the job scheduler
      • DataNode configuration
        • TaskTracker configuration
        • Advanced Hadoop tuning
    • Summary
  • Chapter 3: Configuring the Hadoop Ecosystem
    • Hosting the Hadoop ecosystem
    • Sqoop
      • Installing and configuring Sqoop
      • Sqoop import example
      • Sqoop export example
    • Hive
      • Hive architecture
      • Installing Hive Metastore
      • Installing the Hive client
      • Installing Hive Server
    • Impala
      • Impala architecture
      • Installing Impala state store
      • Installing the Impala server
    • Summary
  • Chapter 4: Securing Hadoop Installation
    • Hadoop security overview
    • HDFS security
    • MapReduce security
    • Hadoop Service Level Authorization
    • Hadoop and Kerberos
      • Kerberos overview
      • Kerberos in Hadoop
        • Configuring Kerberos clients
        • Generating Kerberos principals
        • Enabling Kerberos for HDFS
        • Enabling Kerberos for MapReduce
    • Summary
  • Chapter 5: Monitoring Hadoop Cluster
    • Monitoring strategy overview
    • Hadoop Metrics
      • JMX Metrics
      • Monitoring Hadoop with Nagios
      • Monitoring HDFS
      • NameNode checks
      • JournalNode checks
      • ZooKeeper checks
    • Monitoring MapReduce
      • JobTracker checks
    • Monitoring Hadoop with Ganglia
    • Summary
  • Chapter 6: Deploying Hadoop to the Cloud
    • Amazon Elastic MapReduce
      • Installing the EMR command-line interface
      • Choosing the Hadoop version
      • Launching the EMR cluster
        • Temporary EMR clusters
        • Preparing input and output locations
    • Using Whirr
      • Installing and configuring Whirr
    • Summary

Danil Zburivsky

Danil Zburivsky is a database professional with a focus on open source technologies. Danil started his career as a MySQL database administrator and is currently working as a consultant at Pythian, a global data infrastructure management company. At Pythian, Danil was involved in building a number of Hadoop clusters for customers in the financial, entertainment, and communications sectors. Danil's other interests include writing fun things in Python, robotics, and machine learning. He is also a regular speaker at various industry events.

Errata

3 submitted: last submission 16 Jun 2014

Page no: 41 | Errata type: Code

The location of the local directory is

It should be


Page no: 52 | Errata type: Code

The command should be

service hadoop-0.20-mapreduce-tasktracker start

instead of

service hadoop-0.20-mapreduce-tasktracker

Page no: 24 | Errata type: Technical

The ext4 filesystem example should be:
/dev/sda1 /disk1 ext4 noatime,nodiratime 1 2
/dev/sdb1 /disk2 ext4 noatime,nodiratime 1 2
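
The corrected entries above mount each data disk with the noatime and nodiratime options. As a rough illustration only, the following Python sketch parses fstab-style lines and reports any mount point that lacks those options. The function names (parse_fstab, missing_options) and the sample text are hypothetical and are not taken from the book; on a real host you would read /etc/fstab instead of the inline sample.

```python
# Minimal sketch: check fstab-style entries for the mount options used in the
# corrected example. All names here are hypothetical, not from the book.

FSTAB_SAMPLE = """\
/dev/sda1 /disk1 ext4 noatime,nodiratime 1 2
/dev/sdb1 /disk2 ext4 noatime,nodiratime 1 2
"""

# Options the corrected example sets on every data disk.
REQUIRED_OPTIONS = {"noatime", "nodiratime"}


def parse_fstab(text):
    """Return one dict per fstab entry, skipping blank lines and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry; ignore it in this sketch
        device, mount_point, fs_type, options = fields[:4]
        entries.append(
            {
                "device": device,
                "mount_point": mount_point,
                "fs_type": fs_type,
                "options": set(options.split(",")),
            }
        )
    return entries


def missing_options(entries, required=REQUIRED_OPTIONS):
    """Map each mount point to the required options it does not set."""
    report = {}
    for entry in entries:
        missing = required - entry["options"]
        if missing:
            report[entry["mount_point"]] = sorted(missing)
    return report


if __name__ == "__main__":
    # An empty report means every entry sets both options.
    print(missing_options(parse_fstab(FSTAB_SAMPLE)))
```

An empty result means every parsed entry carries both options; an entry mounted with other options would be listed with the options it is missing.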


What you will learn from this book

• Choose the optimal hardware configuration for your Hadoop cluster
• Decipher the differences between various Hadoop versions and distributions
• Make your cluster crash-proof with NameNode High Availability
• Learn tips and tricks for JobTracker, TaskTracker, and DataNodes
• Discover the most important Hadoop ecosystem projects
• Get more value out of your cluster by using SQL with Hive and real-time query processing with Impala
• Set up a proper permissions model for your cluster
• Secure Hadoop with Kerberos
• Deploy a Hadoop cluster in a cloud environment


In Detail

Big Data is the hottest trend in the IT industry at the moment. Companies are realizing the value of collecting, retaining, and analyzing as much data as possible. They are therefore rushing to implement the next generation of data platforms, and Hadoop is the centerpiece of these platforms.

This practical guide is filled with examples that show you how to successfully build a data platform using Hadoop. Step-by-step instructions explain how to install, configure, and tie all major Hadoop components together. This book will help you avoid common pitfalls, follow best practices, and go beyond the basics when building a Hadoop cluster.

This book walks you through the process of building a Hadoop cluster from the ground up. Using practical examples and command samples, you will be able to get a cluster up and running in no time, and you will also gain a deep understanding of how the various Hadoop components work and interact with each other.

You will learn how to pick the right hardware for different types of Hadoop clusters and how the various Hadoop distributions differ. By the end of this book, you will be able to install and configure several of the most popular Hadoop ecosystem projects, including Hive, Impala, and Sqoop, and you will also get a sneak peek at the pros and cons of using Hadoop in the cloud.

This book is a step-by-step tutorial filled with practical examples that show you how to build and manage a Hadoop cluster along with its intricacies.

Who this book is for

This book is ideal for database administrators, data engineers, and system administrators, and it will act as an invaluable reference if you are planning to use the Hadoop platform in your organization. You are expected to have basic Linux skills, since all the examples in this book use this operating system. It also helps to have access to test hardware or virtual machines so that you can follow the examples in the book.
