You're reading from Hands-On Machine Learning with Azure

Product type: Book
Published in: Oct 2018
Publisher: Packt
ISBN-13: 9781789131956
Edition: 1st Edition
Authors (5):
Thomas K Abraham

Dr. Thomas K Abraham is a cloud solution architect (advanced analytics and AI) at Microsoft in the South Central Region of the USA. Since January 2016, he's been assisting organizations in leveraging technologies such as SQL, Spark, Hadoop, NoSQL, BI, and AI on Azure. Prior to that, Thomas spent 10 years at Ecolab, where he designed algorithms for IoT devices and built solutions for anomaly detection. In the oil and gas division, he designed and built customer-facing analytics solutions for multiple super majors. His work was focused on preventing equipment failure by modeling corrosion, scale, and other stresses. He received a PhD in Chemical Engineering from The Ohio State University in 2005. His thesis focused on the use of nonlinear optimization with reaction models.

Parashar Shah

Parashar Shah is a Senior Program Manager in the Azure Machine Learning platform team. Currently, he works on making Azure Machine Learning services the best place to do e2e machine learning for building custom AI solutions using big data. Previously at Microsoft, he has been a Data Scientist and a Data Solutions Architect in various Cloud and AI teams. Prior to joining Microsoft, Parashar worked at Nokia Networks as a Solutions Architect & Product Manager building customer experience analytics solutions for global telcos. He also co-founded a carpooling startup, which helped employees carpool safely. He has 10+ years of global work experience. He is an alum of the Indian Institute of Management, Bangalore and Gujarat University.

Jen Stirrup

Jen Stirrup is a data strategist and technologist, a Microsoft Most Valuable Professional (MVP), a Microsoft Regional Director, a tech community advocate, a public speaker and blogger, a published author, and a keynote speaker. Jen is the founder of a boutique consultancy based in the UK, Data Relish, which focuses on delivering successful business intelligence and artificial intelligence solutions that add real value to customers worldwide. She has featured on the BBC as a guest expert on topics relating to data.

Lauri Lehman

Lauri Lehman is a data scientist who is focused on machine learning tools in Azure. He helps customers to design and implement machine learning solutions in the cloud. He works for the software consultancy company, Zure, based in Helsinki, Finland. For the past 4 years, Lauri has specialized in data and machine learning in Azure. He has worked on many machine learning projects, developing solutions for demand estimation, text analytics, and image recognition, for example. Lauri has previously worked as an academic researcher in theoretical physics, after obtaining his PhD on topological quantum walks. He still likes to follow the progress of modern physics and is eagerly awaiting the era of quantum machine learning!

Anindita Basak

Anindita Basak is a cloud architect with 15+ years of experience, the last 12 of which she has spent working extensively on Azure. She has delivered various real-time implementations on Azure data analytics, and cloud-native and real-time event-driven architecture for Fortune 500 enterprises, ranging from banking, financial services, and insurance (BFSI) to retail sectors. She is also a cloud and DataOps trainer and consultant, and author of cloud AI and DevOps books.

HDInsight

HDInsight is an implementation of Hadoop that runs on the Microsoft Azure platform. HDInsight builds on the Hortonworks Data Platform (HDP), and is completely compatible with Apache Hadoop.

HDInsight can be thought of as Microsoft's Hadoop-as-a-Service (HaaS). You can quickly deploy the system from a portal or through Windows PowerShell scripting, without having to create any physical or virtual machines.

The following are features of HDInsight:

  • You can implement a small or large number of nodes in a cluster
  • You pay only for what you use
  • When your job is complete, you can deprovision the cluster and, of course, stop paying for it
  • You can use Microsoft Azure Storage so that even when the cluster is deprovisioned, you can retain the data
  • The HDInsight service works with input-output technologies from Microsoft and other vendors

As mentioned, the HDInsight...

R with HDInsight

What are the main features of HDInsight? Although it is a proprietary Microsoft service, it is a 100% Apache Hadoop solution in the Microsoft Azure cloud. Azure HDInsight is a service that deploys and provisions Apache Hadoop clusters in the cloud for big data analytics.

HDInsight provides a software framework designed to manage, analyze, and report on big data. You can use HDInsight to perform interactive queries at petabyte scale over structured or unstructured data in any format. You can also build models and connect them to BI tools; HDInsight is aimed at providing big data analytics and insights through Excel and Power BI. As a cloud-based service, it makes these resources available in a simpler...

Getting started with Azure HDInsight and ML services

HDInsight has a number of cluster types, which include Hadoop (Hive), HBase, Storm, Spark, Kafka, Interactive Hive (LLAP), and ML Services (R Server) (with RStudio, R 9.1). The ML Services cluster configuration is established during setup.

Setup and configuration of HDInsight

In this section, we will set up and configure HDInsight by carrying out the following steps:

  1. Ensure that you have an Azure account
  2. Log in to the Azure portal at portal.azure.com
  3. When you are logged into the Azure portal, click on the button to add a new resource
  4. In the search query box, type in HDInsight and you will be given a number of options
  5. Select the option...

HDInsight and data analytics with R

First, we need to get our data into Azure so that HDInsight can see it. We can upload data directly to Azure Storage, or we can use functionality in SQL Server Integration Services (SSIS). SSIS can connect to both Azure Blob Storage and Azure HDInsight, which enables you to create Integration Services packages that transfer data between Azure Blob Storage and an on-premises data source. HDInsight can then process the data.
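For the direct-upload route, a minimal Python sketch is shown here. It is an illustration only, not the SSIS approach and not code from the book: it assumes the `azure-storage-blob` package (version 12 or later) is installed, that the target container already exists, and that you have a storage account connection string. The function names, the `input` prefix, and the file name are hypothetical.

```python
from pathlib import Path, PurePosixPath


def blob_name_for(local_path, prefix="input"):
    """Build the blob name a local file would be stored under (hypothetical layout)."""
    return str(PurePosixPath(prefix) / Path(local_path).name)


def upload_file(connection_string, container, local_path, prefix="input"):
    """Upload one local file to an existing Blob Storage container and return its URL."""
    # Imported lazily so the naming helper above works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(
        container=container,
        blob=blob_name_for(local_path, prefix),
    )
    with open(local_path, "rb") as handle:
        # overwrite=True replaces an existing blob of the same name.
        blob.upload_blob(handle, overwrite=True)
    return blob.url


# Naming only; the upload itself needs real credentials.
print(blob_name_for("flights.csv"))
```

Once the file is in the storage account attached to the cluster, it remains available even after the cluster itself is deprovisioned.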

In order to get the data into HDInsight using SSIS, it's necessary to install the Azure Feature Pack. The Microsoft SSIS Feature Pack for Azure provides SQL Server Integration Services with the capability to connect to many Azure services, such as Azure Blob Storage, Azure Data Lake Store, Azure SQL Data Warehouse, and Azure HDInsight. It is a separate install, and you...

Enriching data for analysis

With big data solutions, the data sometimes needs to be transformed and processed in smaller chunks because of its sheer size. To deal with this problem, Microsoft has introduced functionality designed to help. This section covers those features.

rxDataStep

The rxDataStep function can be used to process data in chunks. It is one of the important data transformation functions in Microsoft ML Services.

The rxDataStep function can be used to create and transform subsets of data. The rxDataStep function processes data one chunk at a time, reading from one data source and writing to another. rxDataStep allows you to modify existing columns or add...
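The chunk-at-a-time idea can be sketched in a few lines of plain Python. This is not the `rxDataStep` API and makes no claim about how it is implemented; `data_step`, `add_delay_flag`, the column names, and the 15-minute threshold are all hypothetical, chosen only to show reading from one source, transforming each chunk, and writing to another.

```python
import csv
import io
from itertools import islice


def data_step(src, dst, transform, rows_per_chunk=2):
    """Copy rows from src to dst one chunk at a time, applying transform to each chunk."""
    reader = csv.DictReader(src)
    writer = None
    chunks = 0
    while True:
        # Only rows_per_chunk rows are held in memory at once.
        chunk = list(islice(reader, rows_per_chunk))
        if not chunk:
            break
        out_rows = transform(chunk)
        if out_rows:
            if writer is None:
                # The header is taken from the first transformed row.
                writer = csv.DictWriter(
                    dst, fieldnames=list(out_rows[0].keys()), lineterminator="\n"
                )
                writer.writeheader()
            writer.writerows(out_rows)
        chunks += 1
    return chunks


def add_delay_flag(chunk):
    """Drop rows with a missing delay and add a derived 'late' column."""
    out = []
    for row in chunk:
        if row["delay"] == "":
            continue
        out.append({**row, "late": int(float(row["delay"]) > 15)})
    return out


source = io.StringIO("flight,delay\nA1,5\nA2,30\nA3,\nA4,16\nA5,0\n")
target = io.StringIO()
n_chunks = data_step(source, target, add_delay_flag, rows_per_chunk=2)
print(n_chunks)
print(target.getvalue())
```

Because the transform only ever sees one chunk, it has to be written so that it does not depend on rows outside that chunk; the same constraint applies to any chunk-wise data step.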

Summary

In this chapter, we have examined the machine learning process for Microsoft ML Services on HDInsight. We have reviewed how to ingest data, how to clean it, how to model it, and how to visualize it.

The final step is to ensure that HDInsight is torn down when you have stopped using it. HDInsight is charged by the minute and it will cost you money to leave it running. It is recommended that you save code and tear everything down when you no longer need it.
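To make the cost of an idle cluster concrete, the arithmetic can be sketched as follows. The node count and hourly rate are hypothetical placeholders, not Azure prices, and the model assumes a single flat per-node rate and ignores storage and other charges.

```python
def cluster_cost(node_count, rate_per_node_hour, minutes_running):
    """Estimate compute cost for a cluster that is billed by the minute."""
    return node_count * rate_per_node_hour * minutes_running / 60


# Hypothetical figures: 4 nodes at 0.50 currency units per node-hour.
job_cost = cluster_cost(4, 0.50, 90)            # a 90-minute job
idle_weekend = cluster_cost(4, 0.50, 48 * 60)   # left running for 48 hours
print(job_cost, idle_weekend)
```

Under these assumptions, forgetting the cluster over a weekend costs far more than the job that justified creating it.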

If you simply want to run code and learn how to use Microsoft ML Services, the previous code samples will also work on Microsoft ML Server in Visual Studio on the DSVM.
