
You're reading from Azure Data Scientist Associate Certification Guide

Product type: Book
Published in: Dec 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800565005
Edition: 1st Edition
Authors (2):
Andreas Botsikas

Andreas Botsikas is an experienced advisor working in the software industry. He has worked in the finance sector, leading highly efficient DevOps teams, and architecting and building high-volume transactional systems. He then traveled the world, building AI-infused solutions with a group of engineers and data scientists. Currently, he works as a trusted advisor for customers onboarding into Azure, de-risking and accelerating their cloud journey. He is a strong engineering professional with a Doctor of Philosophy (Ph.D.) in resource optimization with artificial intelligence from the National Technical University of Athens.

Michael Hlobil

Michael Hlobil is an experienced architect focused on quickly understanding customers' business needs. He has over 25 years of IT experience, spanning both pitfalls and successful projects, and is dedicated to creating solutions based on the Microsoft platform. He has an MBA in Computer Science and Economics (from the Technical University and the University of Vienna) and an MSc (from the ESBA) in Systemic Coaching. Over the last decade, he has worked on advanced analytics projects, including massively parallel systems and machine learning systems. He enjoys working with customers and supporting their journey to the cloud.

Chapter 7: The AzureML Python SDK

In this chapter, you will learn how the AzureML Python Software Development Kit (SDK) is structured and how to work with it, which is key for the DP-100 exam. You will learn how to use the Notebooks experience built into the AzureML Studio web portal, a tool that boosts coding productivity. Using the notebook editor, you will write Python code to better understand how to manage the compute targets, datastores, and datasets registered in the workspace. Finally, you will revisit the Azure CLI from Chapter 2, Deploying Azure Machine Learning Workspace Resources, and use its AzureML extension to perform workspace management actions. This will allow you to script and automate your workspace management activities.

In this chapter, we are going to cover the following main topics:

  • Overview of the Python SDK
  • Working with AzureML notebooks
  • Basic coding with the AzureML...

Technical requirements

You will need access to an Azure subscription. Within that subscription, you will need a resource group named packt-azureml-rg. You will need either the Contributor or the Owner role, assigned through Access control (IAM) at the resource group level. Within that resource group, you should have already deployed an Azure Machine Learning workspace named packt-learning-mlw, as described in Chapter 2, Deploying Azure Machine Learning Workspace Resources.

You will also need a basic understanding of the Python language. The code snippets in this chapter target Python 3.6 or later. You should know the basics of how a Jupyter notebook works, including how variables defined in one cell remain available to other cells that run afterward in the same kernel.

You can find all the notebooks and code snippets for this chapter on GitHub at http://bit.ly/dp100-ch07.

Overview of the Python SDK

The AzureML SDK is a Python library that allows you to interact with the AzureML services. It also provides data science modules that will assist you in your machine learning journey. The AzureML SDK is also available in the R programming language through a Python-to-R interoperability package.

The SDK consists of several packages that group different types of modules you can import into your code base. All the Microsoft-supported modules are placed within packages whose names start with azureml, such as azureml.core and azureml.train.hyperdrive. The following diagram offers a broad overview of the AzureML SDK's most frequently used packages, as well as the key modules that you will see in this book and on the exam:

Figure 7.1 – The AzureML SDK modules and important classes

Note that all the key classes that exist in the azureml.core package can also be imported from the corresponding child module. For example, the Experiment...
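This re-export pattern is ordinary Python packaging: a package's __init__ module imports a class from its child module, so both import paths yield the very same class object. The sketch below checks this with importlib. The helper name is_reexport is hypothetical, and because the AzureML SDK may not be installed where you run it, the demonstration uses the standard library's json package, which re-exports JSONDecoder from json.decoder in the same way.

```python
import importlib


def is_reexport(package, child_module, class_name):
    """Return True if package.class_name is the same object as
    package.child_module.class_name (hypothetical helper)."""
    parent = importlib.import_module(package)
    child = importlib.import_module(package + "." + child_module)
    # Identity check: a re-export is the same object, not a copy.
    return getattr(parent, class_name) is getattr(child, class_name)


if __name__ == "__main__":
    # Standard-library analogue of the azureml.core pattern.
    print(is_reexport("json", "decoder", "JSONDecoder"))
    # With azureml-core installed, you could try, for example:
    # print(is_reexport("azureml.core", "experiment", "Experiment"))
```

In practice, this means "from azureml.core import Experiment" and "from azureml.core.experiment import Experiment" should give you the same class, so you can use whichever import style your code base prefers.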

Working in AzureML notebooks

AzureML Studio integrates with several code editors that allow you to edit notebooks and Python scripts. These editors are powered by the compute instance you provisioned in Chapter 4, Configuring the Workspace. If you have stopped that compute instance to save on costs, navigate to Manage | Compute and start it. From this view, you can open any of the third-party code editors that AzureML Studio integrates with, as shown in the following screenshot:

Figure 7.2 – List of third-party code editor experiences AzureML Studio integrates with

The most widely known open source data science editors are Jupyter Notebook and its newer sibling, JupyterLab. You can open those editing environments by clicking on the respective links shown in the preceding screenshot. This will open a new browser tab, as shown in the following screenshot:

Figure 7.3 – JupyterLab and Jupyter editing experiences provided by the...

Basic coding with the AzureML SDK

The first class you will work with is the AzureML Workspace, a class that gives you access to all the resources within your workspace. To create a reference to your workspace, you will need the following information:

  • Subscription ID: The subscription where the workspace is located. This is a Globally Unique Identifier (GUID, also known as a UUID) that consists of 32 hexadecimal (0-F) digits; for example, ab05ab05-ab05-ab05-ab05-ab05ab05ab05. You can find this ID in the Azure portal in the Properties tab of the subscription you are using.
  • Resource group name: The resource group that contains the AzureML workspace components.
  • Workspace name: The name of the AzureML workspace.
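Because the subscription ID is a GUID written as 32 hexadecimal digits in hyphen-separated groups of 8-4-4-4-12, as in the example above, a quick format check can catch a placeholder such as <Subscription Id> that was never replaced. The sketch below uses the hypothetical helpers is_valid_subscription_id and workspace_resource_id: the first validates the format with a regular expression, and the second combines the three values into the standard Azure Resource Manager ID of a workspace. The check is format-only; it cannot tell you whether the subscription exists or whether you have access to it.

```python
import re

# Canonical GUID layout: 8-4-4-4-12 hexadecimal digits separated by hyphens.
_GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_valid_subscription_id(value):
    """Return True if value is a canonically formatted GUID (format check only)."""
    return _GUID_PATTERN.fullmatch(value) is not None


def workspace_resource_id(subscription_id, resource_group, workspace_name):
    """Build the Azure Resource Manager ID of an AzureML workspace."""
    if not is_valid_subscription_id(subscription_id):
        raise ValueError("Not a valid subscription GUID: " + repr(subscription_id))
    return (
        "/subscriptions/" + subscription_id
        + "/resourceGroups/" + resource_group
        + "/providers/Microsoft.MachineLearningServices/workspaces/" + workspace_name
    )


if __name__ == "__main__":
    print(is_valid_subscription_id("<Subscription Id>"))
    print(workspace_resource_id(
        "ab05ab05-ab05-ab05-ab05-ab05ab05ab05",
        "packt-azureml-rg",
        "packt-learning-mlw",
    ))
```

Using fullmatch rather than match ensures that trailing characters, including a stray newline, cause the check to fail.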

You can store this information in variables by running the following assignments:

subscription_id = '<Subscription Id>'
resource_group = 'packt-azureml-rg'
workspace_name = 'packt-learning-mlw'

The first approach to...
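Once you have these three values, a common way to reuse them across notebooks is a config.json file that the SDK's Workspace.from_config() method can load. The sketch below writes such a file using only the standard library, so it runs without the SDK installed. The helper name write_workspace_config is hypothetical, and the .azureml/config.json location and the three key names reflect our understanding of what Workspace.write_config() produces; verify them against the SDK reference for your version.

```python
import json
import os


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           folder=".azureml"):
    """Write the three workspace identifiers to folder/config.json
    (hypothetical helper) and return the path of the written file."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path
```

For example, write_workspace_config(subscription_id, resource_group, workspace_name) with the variables defined earlier creates .azureml/config.json in the current folder. With the SDK installed, running "from azureml.core import Workspace" followed by "ws = Workspace.from_config()" from that folder should then find the file, although it may prompt you to sign in.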

Working with the AzureML CLI extension

In Chapter 2, Deploying Azure Machine Learning Workspace Resources, you learned how to use the Azure CLI and how to install the azure-cli-ml extension. This extension uses the Python SDK you saw in this chapter to perform various operations. To work with the Azure CLI, you can do one of the following:

  1. Open the cloud shell in the Azure portal, as you did in Chapter 2, Deploying Azure Machine Learning Workspace Resources.
  2. Open a terminal in the compute instance you have been working on in this chapter.
  3. Use the shell escape syntax of Jupyter notebooks: prefixing a line with an exclamation mark (!), also known as bang, executes the rest of the line as a command in the underlying shell.
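In a notebook, a line such as !az --version is handed to the system shell, and IPython's shell assignment syntax, lines = !az --version, captures the command's output as a list of lines. Outside a notebook, a rough plain-Python equivalent uses the subprocess module. The helper below, shell_lines (a hypothetical name), runs a command given as an argument list without invoking a shell and returns its standard output split into lines; it only loosely mimics IPython's behavior.

```python
import subprocess
import sys


def shell_lines(cmd):
    """Run cmd (a list of arguments) and return its standard output as a
    list of lines, loosely mimicking IPython's `lines = !command`."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,   # capture stdout; stderr still goes to the console
        universal_newlines=True,  # decode output to str (works on Python 3.6)
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Use the current Python interpreter so the demo runs anywhere.
    print(shell_lines([sys.executable, "-c", "print('hello from a subprocess')"]))
    # Where the Azure CLI is installed, you could try:
    # print(shell_lines(["az", "extension", "list", "--output", "table"]))
```

Passing a list instead of a single string avoids shell=True, so arguments such as resource group names are not reinterpreted by the shell.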

In this section, you will use a notebook, which lets you store the steps and rerun them if you need them in the future:

  1. The first thing you will need to do is install the azure-cli-ml extension in the Azure CLI of the compute...
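In a notebook cell, installing the extension is typically a one-line bang command such as !az extension add -n azure-cli-ml. If you prefer to keep the CLI steps in plain Python, the sketch below lists them as argument lists and prints them as a dry run by default. The helper names cli_steps and run_steps are hypothetical; the az ml workspace show command with its -w and -g options belongs to the azure-cli-ml extension, and you may need to run az login before it succeeds.

```python
import shlex
import shutil
import subprocess

RESOURCE_GROUP = "packt-azureml-rg"
WORKSPACE_NAME = "packt-learning-mlw"


def cli_steps(workspace_name, resource_group):
    """Return the Azure CLI commands (as argument lists) that install the
    azure-cli-ml extension and then show the workspace details."""
    return [
        ["az", "extension", "add", "-n", "azure-cli-ml"],
        ["az", "ml", "workspace", "show",
         "-w", workspace_name, "-g", resource_group],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    if not dry_run and shutil.which("az") is None:
        raise FileNotFoundError("The az executable was not found on the PATH")
    for cmd in steps:
        # Quote each argument so the printed line can be pasted into a shell.
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(cli_steps(WORKSPACE_NAME, RESOURCE_GROUP))
```

Keeping dry_run=True as the default lets you review the exact commands before calling run_steps(..., dry_run=False) on a machine where the Azure CLI is installed and signed in.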

Summary

In this chapter, you learned how the AzureML Python SDK is structured. You also discovered the AzureML notebook editor, which allows you to write Python code. You then started your coding journey with the SDK by managing the compute targets that are attached to the AzureML workspace. Next, you attached new datastores and got references to existing ones, including the workspace's default datastore. Then, you worked with various file-based and tabular datasets and learned how to reuse them by registering them in the workspace.

Finally, you worked with the AzureML CLI extension, which is a client that utilizes the Python SDK you explored in this chapter.

In the next chapter, you will build on this knowledge and learn how to use the AzureML SDK during the data science experimentation phase. You will also learn how to track metrics in your data science experiments, as well as how to scale your training to larger compute targets, by running scripts in...

Questions

Please answer the following questions to check your knowledge of the topics that were discussed in this chapter:

  1. What is the default minimum number of nodes for an AzureML compute cluster?

    a. 0

    b. 1

    c. Equal to the maximum number of nodes

  2. You upload a CSV file that contains credit card transaction details to the default datastore. Which of the following methods should you use to create a dataset reference?

    a. Dataset.File.from_files()

    b. Dataset.Tabular.from_delimited_files()

    c. Workspace.from_csv_files()

    d. Datastore.from_csv_files()

  3. How can you force the creation of a blob container during the registration process of an Azure blob-based datastore?

    a. Pass the force_create=True parameter to the Datastore.register_azure_blob_container() method.

    b. Pass the create_if_not_exists=True parameter to the Datastore.register_azure_blob_container() method.

    c. Pass the force_create=True parameter to the Datastore.register_container() method.

    d. Pass the create_if_not_exists=True...

Further reading

This section offers a list of useful web resources that will help you augment your knowledge of the AzureML SDK and the various third-party libraries that were used in this chapter:
