HDInsight Essentials


eBook: $17.84 (list price $20.99, save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $34.99 (combined list price $55.98, save 37%; print cover price $34.99)
Free shipping to the UK, US, Europe, and selected countries in Asia.
  • Architect a Hadoop solution with a modular design for data collection, distributed processing, analysis, and reporting
  • Build a multi-node Hadoop cluster on Windows servers
  • Establish a Big Data solution using HDInsight with open source software, and provide useful Excel reports
  • Run Pig scripts and build simple charts using Interactive JS (Azure)

Book Details

Language : English
Paperback : 122 pages [ 235mm x 191mm ]
Release Date : September 2013
ISBN : 1849695369
ISBN 13 : 9781849695367
Author(s) : Rajesh Nadipalli
Topics and Technologies : All Books, Big Data and Business Intelligence, Other

Table of Contents

Preface
Chapter 1: Hadoop and HDInsight in a Heartbeat
Chapter 2: Deploying HDInsight on Premise
Chapter 3: HDInsight Azure Cloud Service
Chapter 4: Administering Your HDInsight Cluster
Chapter 5: Ingesting Data to Your Cluster
Chapter 6: Transforming Data in Cluster
Chapter 7: Analyzing and Reporting Your Data
Chapter 8: Project Planning Tips and Resources
Index
  • Chapter 1: Hadoop and HDInsight in a Heartbeat
    • Big Data – hype or real?
    • Apache Hadoop concepts
      • Core components
      • Hadoop cluster layout
      • The Hadoop ecosystem
        • Data access
        • Data processing
        • The Hadoop data store
        • Management and integration
      • Hadoop distributions
      • HDInsight distribution differentiator
      • End-to-end solution using HDInsight
        • Key phases of a Hadoop project
    • Summary
  • Chapter 2: Deploying HDInsight on Premise
    • HDInsight and Hadoop relationship
    • Deployment options for on-premise
      • Windows HDInsight server
      • Hortonworks Data Platform (HDP for Windows)
      • Supported platforms for on-premise install
    • Single-node install
      • Downloading the software
      • Running the install wizard
      • Validating the install
    • Multinode planning and preparation
      • Setting up the network
      • Setting common time on all nodes
      • Setting up remote scripting
      • Configuring firewall ports
    • Multinode installation
      • Downloading the software
      • Configuring the multinode install
      • Running the installer
      • Validating the install
    • Managing HDInsight services
    • Uninstalling HDInsight
    • Summary
  • Chapter 3: HDInsight Azure Cloud Service
    • HDInsight Service on Azure
      • Considerations for Azure HDInsight Service
    • Provision your cluster
    • HDInsight management dashboard
    • Verify the cluster and run sample jobs
      • Access HDFS
      • Deploy and execute the sample MapReduce job
      • View job results
    • Monitor your cluster
    • Azure storage integration
    • Remove your cluster
      • Delete your cluster
      • Delete your storage
      • Restore your cluster
    • Summary
  • Chapter 5: Ingesting Data to Your Cluster
    • Loading data using Hadoop commands
      • Step 1 – connect to a Hadoop client
      • Step 2 – get your files on local storage
      • Step 3 – upload to HDFS
    • Loading data using Azure Storage Vault (ASV)
      • Storage access keys
      • Storage tools
      • Azure Storage Explorer
        • Registering your storage account
        • Uploading files to your blob storage
    • Loading data using interactive JavaScript
    • Shipping data to Azure
    • Loading data using Sqoop
      • Key benefits
      • Two modes of using Sqoop
      • Using Sqoop to import (SQL to Hadoop)
    • Summary
  • Chapter 6: Transforming Data in Cluster
    • Transformation scenario
      • Scenario
      • Transformation objective
      • File organization
    • MapReduce solution
      • Design
      • Map code
      • Reduce code
      • Driver code
      • Compiling and packaging the code
      • Executing MapReduce
      • Results verification
    • Hive solution
      • Overview of Hive
      • Starting Hive in the HDInsight node
      • Step 1 – table creation
      • Step 2 – table loading
      • Step 3 – summary table creation
      • Step 4 – verifying the summary table
    • Pig solution
      • Pig architecture
      • Pig or Hive?
      • Starting Pig in the HDInsight node
      • Pig Grunt script
        • Code
        • Code explanation
        • Execution
        • Verification
    • Summary
  • Chapter 7: Analyzing and Reporting Your Data
    • Analyzing and reporting using Excel
      • Step 1 – installing the Hive ODBC driver
      • Step 2 – creating Hive ODBC data source
      • Step 3 – importing data to Excel
    • Hive for ad hoc queries
      • Creating reference tables
      • Ad hoc queries
      • Analytic functions in HiveQL
    • Interactive JavaScript for analysis and reporting
    • Other business intelligence tools
    • Summary
  • Chapter 8: Project Planning Tips and Resources
    • Architectural considerations
      • Extensible and modular
      • Metadata-driven solution
      • Integration strategy
      • Security
    • Project planning
      • Proof of Concept
      • Production implementation
      • Reference sites and blogs
    • Summary

Rajesh Nadipalli

Rajesh Nadipalli has over 17 years of IT experience and has held a technical leadership position at Cisco Systems. His key focus areas have been Data Management, Enterprise Architecture, Business Intelligence, Data Warehousing, and Extract Transform Load (ETL). He has delivered scalable data management and BI solutions that empower businesses to make informed decisions. In his current role as a Senior Solutions Architect at Zaloni, Raj evaluates Big Data goals for his clients, recommends a target state architecture, assists with proofs of concept, and prepares clients for a production implementation. In addition, Raj is an instructor for Hadoop for Developers, Hive, Pig, and HBase. His clients include Verizon, American Express, NetApp, Cisco, EMC, and UnitedHealth Group. Raj holds an MBA from NC State University and a BS in EE from the University of Mumbai, India.

What you will learn from this book

  • Explore the characteristics of a Big Data problem
  • Analyze and report your data using PowerPivot, Power View, Excel, and other Microsoft BI tools
  • Explore the architectural considerations for scalability, maintainability, and security
  • Understand how to ingest data into your HDInsight cluster, including community tools and scripts
  • Administer and monitor your HDInsight cluster, including capacity and process management
  • Get to know the Hadoop ecosystem, with its various tools and software grouped by role
  • Get to know the HDInsight differentiator and how it is built on top of Apache Hadoop
  • Transform your data using open source software such as MapReduce, Hive, Pig, and JavaScript

In Detail

We live in an era in which data is generated with every action, and much of it is unstructured: Twitter feeds, Facebook updates, photos, and digital sensor inputs. Current relational databases cannot handle the volume, velocity, and variety of this data. HDInsight gives you the ability to gain the full value of Big Data with a modern, cloud-based data platform that manages data of any size and type, whether structured or unstructured.

This hands-on guide shows you how to store and process Big Data of all types through Microsoft’s modern data platform, which provides simplicity, ease of management, and an open, enterprise-ready Hadoop service running in the cloud. You will then learn how to analyze your Hadoop data with PowerPivot, Power View, Excel, and other Microsoft BI tools. Thanks to integration with the Microsoft data platform, this will give you a solid foundation for building your own HDInsight solution, both on premise and in the cloud.

First, we provide an overview of Hadoop and Microsoft's Big Data strategy, in which HDInsight plays a key role. We then show you how to set up your HDInsight cluster and take you through the four stages of collecting, processing, analyzing, and reporting. For each of these stages, you will see a practical example with working code.

You will then learn core Hadoop concepts such as HDFS and MapReduce. You will also get a closer look at how Microsoft’s HDInsight builds on the Hortonworks Data Platform, which in turn uses Apache Hadoop. You will then be guided through Hadoop commands and programming with open source software, such as Hive and Pig, on HDInsight. Finally, you will learn to analyze and report using PowerPivot, Power View, Excel, and other Microsoft BI tools.

This guide provides step-by-step instructions on how to build a Big Data solution using HDInsight with open source software, produce useful Excel reports, and open up the full value of HDInsight.

Approach

This book is a fast-paced guide full of step-by-step instructions on how to build a multi-node Hadoop cluster on Windows servers.

Who this book is for

If you are a data architect or developer who wants to understand how to transform your data using open source software such as MapReduce, Hive, Pig, and JavaScript, and also leverage the Windows infrastructure, this book is perfect for you. It is also ideal if you are part of a team that is starting or planning a Hadoop implementation and you want to understand the key components of Hadoop and how HDInsight provides added value in administration and reporting.
