HDInsight Essentials

More Information
  • Explore the characteristics of a Big Data problem
  • Analyse and report your data using PowerPivot, Power View, Excel, and other Microsoft BI tools
  • Explore the architectural considerations for scalability, maintainability, and security
  • Understand how to ingest data into your HDInsight cluster, including with community tools and scripts
  • Administer and monitor your HDInsight cluster including capacity and process management
  • Get to know the Hadoop ecosystem with various tools and software based on their roles
  • Get to know the HDInsight differentiator and how it is built on top of Apache Hadoop
  • Transform your data using open source software such as MapReduce, Hive, Pig and JavaScript

We live in an era in which data is generated with every action, and much of it is unstructured: Twitter feeds, Facebook updates, photos, and digital sensor inputs. Current relational databases cannot handle the volume, velocity, and variety of this data. HDInsight gives you the ability to gain the full value of Big Data with a modern, cloud-based data platform that manages data of any size and type, whether structured or unstructured.

This hands-on guide shows you how to seamlessly store and process Big Data of all types through Microsoft’s modern data platform, which provides simplicity, ease of management, and an open, enterprise-ready Hadoop service, all running in the cloud. You will then learn how to analyze your Hadoop data with PowerPivot, Power View, Excel, and other Microsoft BI tools, thanks to integration with the Microsoft data platform. This will give you a solid foundation to build your own HDInsight solution, both on premises and in the cloud.

First, we will provide an overview of Hadoop and Microsoft's Big Data strategy, in which HDInsight plays a key role. We will then show you how to set up your HDInsight cluster and take you through the four stages of collecting, processing, analysing, and reporting. For each of these stages, you will see a practical example with working code.

You will then learn core Hadoop concepts such as HDFS and MapReduce. You will also get a closer look at how Microsoft’s HDInsight builds on the Hortonworks Data Platform, which is based on Apache Hadoop. You will then be guided through Hadoop commands and programming using open source software, such as Hive and Pig, with HDInsight. Finally, you will learn to analyze and report using PowerPivot, Power View, Excel, and other Microsoft BI tools.

This guide provides step-by-step instructions on how to build a Big Data solution using HDInsight with open source software, produce useful Excel reports, and unlock the full value of HDInsight.

  • Architect a Hadoop solution with a modular design for data collection, distributed processing, analysis, and reporting
  • Build a multi-node Hadoop cluster on Windows servers
  • Establish a Big Data solution using HDInsight with open source software, and provide useful Excel reports
  • Run Pig scripts and build simple charts using Interactive JS (Azure)
Page Count 122
Course Length 3 hours 39 minutes
ISBN 9781849695367
Date Of Publication 22 Sep 2013


Rajesh Nadipalli

Rajesh Nadipalli is currently Director, Professional Services and Support at Zaloni, an award-winning provider of enterprise data lake management solutions that enable global clients to innovate and leverage big data for business impact. Rajesh leads Hadoop-based technical proof-of-concepts, strategy, solution architectures, and post-sales product support for his clients. His clientele includes AIG, NBCU, Verizon, Du, American Express, NetApp, Dell-EMC, United Health Group, and Cisco. In his previous role as the director of product management, he led the product strategy, roadmap, and feature definitions for Zaloni's Hadoop data management platform.

Throughout his 20-plus years in IT, Rajesh has had a passion for data and has held various roles, including big data architect, solutions architect, database administrator (DBA), business intelligence architect, and ETL developer. He believes in using technology as a strategic advantage for his clients by improving productivity, performance, and real-time insight into relevant data.

Rajesh is also the author of HDInsight Essentials, published by Packt Publishing, which takes you through the journey of building a modern data lake architecture using HDInsight, a Hadoop-based service that allows you to successfully manage high-volume, high-velocity data in the Azure cloud.

He is a regular blogger, and his articles are published on the Zaloni blog, Datafloq, and DZone.

He holds an MBA from North Carolina State University and a BS in EE from the University of Mumbai, India.