HDInsight Essentials

Getting to grips with the fundamentals of HDInsight is amazingly straightforward when you delve into this course. It shows you how to manage even the largest volumes of unstructured data to gain business knowledge.

Rajesh Nadipalli

Book Details

ISBN 13: 9781849695367
Paperback: 122 pages

About This Book

  • Architect a Hadoop solution with a modular design for data collection, distributed processing, analysis, and reporting
  • Build a multi-node Hadoop cluster on Windows servers
  • Establish a Big Data solution using HDInsight with open source software, and provide useful Excel reports
  • Run Pig scripts and build simple charts using Interactive JS (Azure)

Who This Book Is For

If you are a data architect or developer who wants to understand how to transform your data using open source software such as MapReduce, Hive, Pig, and JavaScript, while also leveraging the Windows infrastructure, this book is perfect for you. It is also ideal if you are part of a team that is starting or planning a Hadoop implementation and you want to understand the key components of Hadoop and how HDInsight adds value in administration and reporting.

Table of Contents

Chapter 1: Hadoop and HDInsight in a Heartbeat
  • Big Data – hype or real?
  • Apache Hadoop concepts
  • Summary
Chapter 2: Deploying HDInsight on Premise
  • HDInsight and Hadoop relationship
  • Deployment options for on-premise
  • Single-node install
  • Multinode planning and preparation
  • Multinode installation
  • Managing HDInsight services
  • Uninstalling HDInsight
  • Summary
Chapter 3: HDInsight Azure Cloud Service
  • HDInsight Service on Azure
  • Provision your cluster
  • HDInsight management dashboard
  • Verify the cluster and run sample jobs
  • Monitor your cluster
  • Azure storage integration
  • Remove your cluster
  • Summary
Chapter 4: Administering Your HDInsight Cluster
  • Cluster status
  • Distributed filesystem health
  • MapReduce health
  • Key files
  • Summary
Chapter 5: Ingesting Data to Your Cluster
  • Loading data using Hadoop commands
  • Loading data using Azure Storage Vault (ASV)
  • Loading data using interactive JavaScript
  • Shipping data to Azure
  • Loading data using Sqoop
  • Summary
Chapter 6: Transforming Data in Cluster
  • Transformation scenario
  • MapReduce solution
  • Hive solution
  • Pig solution
  • Summary
Chapter 7: Analyzing and Reporting Your Data
  • Analyzing and reporting using Excel
  • Hive for ad hoc queries
  • Interactive JavaScript for analysis and reporting
  • Other business intelligence tools
  • Summary
Chapter 8: Project Planning Tips and Resources
  • Architectural considerations
  • Project planning
  • Summary

What You Will Learn

  • Explore the characteristics of a Big Data problem
  • Analyse and report your data using PowerPivot, Power View, Excel, and other Microsoft BI tools
  • Explore the architectural considerations for scalability, maintainability, and security
  • Understand the concept of Data Ingestion to your HDInsight cluster including community tools and scripts
  • Administer and monitor your HDInsight cluster including capacity and process management
  • Get to know the Hadoop ecosystem with various tools and software based on their roles
  • Get to know the HDInsight differentiator and how it is built on top of Apache Hadoop
  • Transform your data using open source software such as MapReduce, Hive, Pig and JavaScript

In Detail

We live in an era in which data is generated with every action, and much of it is unstructured: Twitter feeds, Facebook updates, photos, and digital sensor inputs. Current relational databases cannot handle the volume, velocity, and variety of this data. HDInsight gives you the ability to gain the full value of Big Data with a modern, cloud-based data platform that manages data of any size and type, whether structured or unstructured.

This hands-on guide shows you how to store and process Big Data of all types on Microsoft’s modern data platform, which offers simplicity, ease of management, and an open, enterprise-ready Hadoop service running in the cloud. You will also learn how to analyze your Hadoop data with PowerPivot, Power View, Excel, and other Microsoft BI tools through their integration with the Microsoft data platform. Together, these give you a solid foundation for building your own HDInsight solution, both on premise and in the cloud.

First, we provide an overview of Hadoop and Microsoft’s Big Data strategy, in which HDInsight plays a key role. We then show you how to set up your HDInsight cluster and take you through the four stages of collecting, processing, analysing, and reporting. For each of these stages, you will see a practical example with working code.

You will then learn core Hadoop concepts such as HDFS and MapReduce. You will also get a closer look at how Microsoft’s HDInsight builds on the Hortonworks Data Platform, which in turn uses Apache Hadoop. You will then be guided through Hadoop commands and programming with open source software, such as Hive and Pig, on HDInsight. Finally, you will learn to analyze and report using PowerPivot, Power View, Excel, and other Microsoft BI tools.

This guide provides step-by-step instructions for building a Big Data solution using HDInsight with open source software, producing useful Excel reports, and opening up the full value of HDInsight.
