
Azure Machine Learning Engineering

By Sina Fakhraee , Balamurugan Balakreshnan , Megan Masanz
About this book
Data scientists working on productionizing machine learning (ML) workloads face a breadth of challenges at every step owing to the countless factors involved in getting ML models deployed and running. This book offers solutions to common issues, detailed explanations of essential concepts, and step-by-step instructions to productionize ML workloads using the Azure Machine Learning service. You’ll see how data scientists and ML engineers working with Microsoft Azure can train and deploy ML models at scale by putting their knowledge to work with this practical guide. Throughout the book, you’ll learn how to train, register, and productionize ML models by making use of the power of the Azure Machine Learning service. You’ll get to grips with scoring models in real time and batch, explaining models to earn business trust, mitigating model bias, and developing solutions using an MLOps framework. By the end of this Azure Machine Learning book, you’ll be ready to build and deploy end-to-end ML solutions into a production system using the Azure Machine Learning service for real-time scenarios.
Publication date:
January 2023
Publisher
Packt
Pages
362
ISBN
9781803239309

 

Introducing the Azure Machine Learning Service

Machine learning (ML), which leverages data to build and train a model to make predictions, is rapidly maturing. Azure Machine Learning (AML) is Microsoft’s cloud service that supports not only model development but also the rest of your data science life cycle. AML is a tool designed to empower data scientists, ML engineers, and citizen data scientists. It provides a framework to train and deploy models, with MLOps capabilities to monitor, retrain, evaluate, and redeploy them, in a collaborative environment backed by years of feedback from Microsoft’s Fortune 500 customers.

In this chapter, we will focus on deploying an AML workspace, the resource that leverages Azure resources to provide an environment to bring together the assets you will leverage when you use AML. We will showcase how to deploy these resources using a graphical user interface (GUI), followed by setting up your AML service via the Azure Command-Line Interface (CLI) ml extension (v2), which allows model training and deployment through the command line. We will then set up a workspace by leveraging Azure Resource Manager (ARM) templates, an approach referred to as an ARM deployment.

During deployment, key resources will be deployed, including AML Studio, a portal for data scientists to manage their workload, often referred to as your workspace; Azure Key Vault for storing sensitive information; Application Insights for logging information; Azure Container Registry to store Docker images; and an Azure storage account to hold data. These resources will be leveraged behind the scenes as you navigate through the Azure Machine Learning service workspace, creating compute resources for writing code in the integrated development environment (IDE) of your choice, including Jupyter Notebook, JupyterLab, and VS Code.

In this chapter, we will cover the following topics:

  • Building your first AMLS workspace
  • Navigating AMLS
  • Creating a compute for writing code
  • Developing within AMLS
  • Connecting AMLS to VS Code
 

Technical requirements

In this section, you will sign up for an Azure account and use the web-based Azure portal to create various resources. As such, you will require internet access and a working web browser.

The following are the prerequisites for the chapter:

 

Building your first AMLS workspace

Within Azure, there are numerous ways to create Azure resources. The most common method is through the Azure portal, a web interface that allows you to create resources through a GUI. To automate the creation of resources, users can leverage the Azure CLI with the ml extension (v2), which provides a familiar terminal experience for scripting deployments. You can also create resources using ARM templates. Both the CLI and ARM templates provide an automatable, repeatable process to create resources in Azure.

In the upcoming subsections, we will first create an AMLS workspace through the web portal. After you have mastered this task, you will also create another workspace via the Azure CLI. Once you understand how the CLI works, you will create an ARM template and use it to deploy a third workspace. After learning about all three deployment methods, you will delete all excess resources before moving on to the next section; leaving excess resources up and running will cost you money, so be careful.

Creating an AMLS workspace through the Azure portal

Using the portal to create an AMLS workspace is the easiest, most straightforward approach. Through the GUI, you create a resource group, a container to hold multiple resources, along with your AMLS workspace and all its components. To create a workspace, follow these steps:

  1. Navigate to https://portal.azure.com and type Azure Machine Learning into the search box as shown in Figure 1.1 and press Enter:

Figure 1.1 – Selecting resource groups


  2. On the top left of the Azure portal, select the + Create option shown in Figure 1.2:
Figure 1.2 – Creating an AML workspace


Selecting the + Create option will bring up the Basics tab as shown here:

Figure 1.3 – Filling in the corresponding fields to create the ML workspace


  3. In the Basics tab shown in Figure 1.3 for creating your AML workspace, populate the following values:
    1. Subscription: The Azure subscription to which you would like to deploy your resource.
    2. Resource group: Click on Create new and enter a name for a resource group. In Azure, a resource group can be thought of as a folder, or container, that holds resources for a particular solution. As we deploy the AMLS workspace, the resources will be deployed into this resource group to ensure we can easily delete the resources after performing this exercise.
    3. Workspace name: The name of the AMLS workspace resource.
    4. Leave the rest of the options at their defaults and click on the Review + create button.
  4. This will cause validation to occur. Once the information has been validated, click on the + Create button to deploy your resources.
  5. It usually takes a few minutes for the workspace to be created. Once the deployment is completed, click on Go to resource in the Azure portal and then click on Launch studio to go to the AMLS workspace.

You are now on the landing page for AMLS as shown in Figure 1.4:

Figure 1.4 – AMLS


Congratulations! You have now successfully built your first AMLS workspace. While you can start by loading in data right now, take the time to walk through the next section to learn how to create it via code.

Creating an AMLS workspace through the Azure CLI

For people who prefer a code-first approach to creating resources, the Azure CLI is the perfect fit. At the time of writing, the ml extension v2 (AML CLI v2) is the most up-to-date AML extension available for the Azure CLI. With the AML CLI v2, assets are defined in YAML files, as we will see in later chapters.

Note

The AML CLI v2 uses commands that follow the format az ml <noun> <verb> <options>.
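
As a minimal sketch of how a YAML definition and the noun-verb command format fit together, the following writes a small workspace definition and prints the command that would consume it. The file name workspace.yml, the display name, and the tag are illustrative assumptions; the workspace and resource group names reuse the example names from this chapter, and the printed command assumes that the resource group already exists and that you are logged in.

```shell
# Write a minimal workspace definition.
# Assumption: the file name and all field values are illustrative only.
cat > workspace.yml <<'EOF'
name: aml-ws
location: eastus2
display_name: AML development workspace
tags:
  purpose: book-exercise
EOF

# Noun = workspace, verb = create; --file points at the YAML definition.
# The command is printed rather than executed so the sketch is safe to run
# before the CLI and the ml extension are set up.
echo "az ml workspace create --file workspace.yml --resource-group aml-dev-rg"
```

Keeping the definition in a file means it can be version-controlled alongside the rest of a project, which is one reason a YAML-based approach suits automation.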

To create an AMLS workspace via the Azure CLI ml extension (v2), follow these steps:

  1. You need to install the Azure CLI from https://docs.microsoft.com/en-us/cli/azure/install-azure-cli.
  2. Find your subscription ID. In the search box of the Azure portal, type Subscriptions to bring up a list of your Azure subscriptions and their IDs. For the subscription that you would like to use, copy the Subscription ID to use with the CLI.

Here’s a view of Subscriptions within the Azure portal:

Figure 1.5 – Azure subscription list


  3. Launch the command-line shell for your OS, for example, Command Prompt (CMD) or Windows PowerShell (Windows PS), and check your version of the Azure CLI by running the following command:
    az version

Note

You will need to have a version of the Azure CLI that is greater than 2.15.0 to leverage the ml extension.

  4. You will need to remove old extensions if they are installed for your CLI to work properly. You can remove the old ml extensions by running the following commands:
    az extension remove -n azure-cli-ml
    az extension remove -n ml
  5. To install the ml extension, run the following command:
    az extension add -n ml -y
  6. Now, connect to your subscription in Azure through the Azure CLI by running the following commands, replacing xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with the Subscription ID you found in Figure 1.5:
    az login
    az account set --subscription xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  7. Create a resource group by running the following command. Please note that aml-dev-rg is an example name for the resource group, just as aml-ws is an example name for an AML workspace:
    az group create --name aml-dev-rg --location eastus2
  8. Create an AML workspace by running the following command, noting that eastus2 is the Azure region in which we will deploy this AML workspace:
    az ml workspace create -n aml-ws -g aml-dev-rg -l eastus2
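
The version requirement from the earlier note (an Azure CLI version greater than 2.15.0) can also be checked in a script rather than by eye. The following is a minimal sketch, not an official check: version_gt is a hypothetical helper written for this example, and the JMESPath query used to read the installed version is an assumption about the JSON that az version prints.

```shell
# Succeeds (exit status 0) when dotted version $1 is strictly greater than $2.
version_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 1                              # equal versions are not greater
    }'
}

required="2.15.0"
if command -v az >/dev/null 2>&1; then
    # Assumption: this query extracts the core CLI version string.
    installed=$(az version --query '"azure-cli"' --output tsv)
    if version_gt "$installed" "$required"; then
        echo "Azure CLI $installed is new enough for the ml extension."
    else
        echo "Azure CLI $installed is too old; a version greater than $required is needed."
    fi
else
    echo "Azure CLI not found; install it before continuing."
fi
```

Comparing the fields numerically avoids the trap of plain string comparison, where 2.9.0 would sort after 2.15.0.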

You have now created an AMLS workspace with the Azure CLI ml extension and through the portal. There’s one additional way to create an AMLS workspace that’s commonly used, ARM templates, which we will take a look at next.

Creating an AMLS workspace with ARM templates

ARM templates can be challenging to write, but they provide you with a way to easily automate and parameterize the creation of Azure resources. In this section, you will first download a simple ARM template that builds an AMLS workspace and then deploy the template using the Azure CLI. To do so, take the following steps:

  1. An ARM template can be downloaded from GitHub and is found here: https://github.com/Azure/azure-quickstart-templates/blob/master/quickstarts/microsoft.machinelearningservices/machine-learning-workspace/azuredeploy.json.

This template creates the following Azure services:

  • Azure Storage Account
  • Azure Key Vault
  • Azure Application Insights
  • Azure Container Registry
  • An AML workspace

The example template has three required parameters:

  • environment, where the resources will be created
  • name, which is the name that we are giving to the AMLS workspace
  • location, the Azure Region the resource will be deployed to
  2. To deploy your template, you have to create a resource group first as follows:
    az group create --name rg_name --location eastus2
  3. Make sure your command prompt is opened to the location to which you downloaded the azuredeploy.json file, and run the following command:
    az deployment group create --name "exampledeployment" --resource-group "rg_name" --template-file "azuredeploy.json" --parameters name="uniquename" environment="dev" location="eastus2"

It will take a few minutes for the workspace to be created.
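
Instead of passing each parameter inline, the three values can be kept in an ARM parameters file. The following is a minimal sketch under stated assumptions: the file name azuredeploy.parameters.json is chosen for this example, and the values simply mirror those in the command above. The deployment command is printed rather than executed.

```shell
# Write an ARM parameters file.
# Assumption: the file name is illustrative; values mirror the inline parameters above.
cat > azuredeploy.parameters.json <<'EOF'
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "name": { "value": "uniquename" },
    "environment": { "value": "dev" },
    "location": { "value": "eastus2" }
  }
}
EOF

# The deployment would then reference the file instead of inline values.
echo 'az deployment group create --name "exampledeployment" --resource-group "rg_name" --template-file "azuredeploy.json" --parameters @azuredeploy.parameters.json'
```

A parameters file per environment (for example, one for dev and one for test) keeps the template itself unchanged between deployments.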

We have covered a lot of information so far, whether creating an AMLS workspace using the portal, the CLI, or now using ARM templates. In the next section, we will show you how to navigate the workspace, often referred to as the studio.
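
Before moving on, recall that excess resources left running will cost you money. The following is a minimal sketch of a cleanup step, assuming the example resource group name rg_name from the ARM deployment; delete_groups and the DRY_RUN variable are hypothetical names introduced for this example. By default, it only prints what it would do, so keep the resource group of whichever workspace you intend to continue with out of the argument list.

```shell
# Delete each resource group passed as an argument.
# With DRY_RUN=1 (the default in this sketch), the command is only printed.
delete_groups() {
    for rg in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: az group delete --name $rg --yes --no-wait"
        else
            # --yes skips the confirmation prompt;
            # --no-wait returns before the deletion has finished.
            az group delete --name "$rg" --yes --no-wait
        fi
    done
}

# Example: preview the removal of the resource group used for the ARM deployment.
delete_groups rg_name
```

Deleting a resource group removes every resource inside it, which is why the sketch defaults to a preview; set DRY_RUN=0 only once the printed list is what you expect.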

 

Navigating AMLS

AMLS provides access to key resources for a data science team to leverage. In this section, you will learn how to navigate AMLS, exploring the key components found within the studio. You will learn briefly about its capabilities, which we will cover in detail in the rest of the chapters.

Open a browser and go to https://portal.azure.com. Log in with your Azure AD credentials. Once logged into the portal, you will see several icons. Select the Resource group icon and click on the Azure Machine Learning resource.

In the Overview page, click on the Launch Studio button as seen in the following screenshot:

Figure 1.6 – Launch studio


Clicking on the icon shown in Figure 1.6 will open AMLS in a new window.

Launching the studio will bring you to the main home page of AMLS. The UI includes functionality to match several personas, including no-code, low-code, and code-based ML. The main page has two sections: the left-hand menu pane and the right-hand workspace pane.

The AMLS workspace home screen is shown in Figure 1.7:

Figure 1.7 – AMLS workspace home screen


Now, let us understand the preceding screenshot in brief:

  • In section 1 of Figure 1.7, the left-hand menu pane is displayed. Clicking on any of the words in this pane will bring up a new right workspace pane, which includes sections 2 and 3 of the screen. We can select any of these keywords to quickly access key resources within our AMLS workspace. We will drill into these key resources as we begin exploring the AMLS workspace.
  • In section 2 of Figure 1.7, quick links are provided to key resources that we will leverage throughout this book, enabling AMLS users to create new items covering the varying personas supported.
  • As we continue to explore our environment and dig into creating assets within the AMLS workspace, both with code-based and low-code options, recent resources will begin to appear in section 3 of Figure 1.7, providing users with the ability to see recently leveraged resources, whether the compute, the code execution, the models created, or the datasets that are leveraged.

The home page provides quick access to the key resources found within your AMLS workspace. In addition to the quick links, scroll down and you can view the Documentation section. In the Documentation section, we see great documentation to get you started in understanding how to best leverage your AML environment.

The Documentation section, a hub for documentation resources, is displayed on the right pane of the AMLS home screen:

Figure 1.8 – Documentation


As shown in Figure 1.8, the AMLS home page provides you with a wealth of documentation resources to get you started. The links include training modules, tutorials, and even blogs regarding how to leverage AMLS.

On the top-right side of the page, there are several options available:

  • Notifications: The bell icon represents notifications, which display the messages that are generated as you leverage your AMLS workspace. These messages will contain information regarding the creation and deletion of resources, as well as information regarding the resources running within your workspace.
Figure 1.9 – Top-right options


  • Settings: The icon next to the bell that appears as a gear showcases settings for your Azure portal. Clicking on the icon provides the ability to set basic settings as shown in Figure 1.10:
Figure 1.10 – Settings for workspace customization


Within the Settings blade, options are available to change the background of the workspace UI with themes. There are light and dark shades available. Then, there is a section for changing the preferred language and formats. Check the Language dropdown for a list of languages – the list of languages will change as new languages are added to the workspace.

  • Help: The question mark icon provides helpful resources, from tutorials to placing support requests. This is where all the Help content is organized:
Figure 1.11 – Help for AMLS workspace support


Links are provided for tutorials on how to use the workspace and how to develop and deploy data science projects. Click on Launch guided tour to use the step-by-step guided tour.

To troubleshoot any issue with a workspace, click on Run workspace diagnostics and follow the instructions. The Help pane also includes the following sections:

  • Support: This is the section that provides links for creating a support ticket for technical issues, subscription core limits, and other Azure-related issues.
  • Resources: This is the section that provides links to the AML documentation, as well as a useful cheat sheet that is hosted on GitHub. A link to Microsoft’s Privacy and Terms is also available in this section.

Clicking on the smiley icon will bring up the Send us feedback section:

Figure 1.12 – Feedback page


Leveraging this section, an AMLS workspace user can provide feedback to the AMLS product team.

In the following screenshot, we can see the workspace selection menu:

Figure 1.13 – Workspace selection menu


When working with multiple workspaces on multiple projects, there may be a need to switch the AMLS workspace between multiple Azure AD directories. This option is available via the selection of the subscription and workspace name as shown in Figure 1.13. Also note that, under the Resource Group section, a link will open a new tab in your browser and bring you directly to your resource group in the Azure portal. This is a nice feature, allowing you to quickly explore the Azure resources that are outside of the AMLS workspace but may be relevant to your workload in Azure. The workspace config file, which holds the key information that lets authorized users connect directly to the AMLS workspace through code, can also be downloaded from the workspace selection menu for use with the Azure Machine Learning SDK for Python (AML SDK v2).
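
The workspace config file is a small JSON document. As a minimal sketch of its typical shape, the following writes one from the shell; the placeholder subscription ID and the resource names are the example values used earlier in this chapter, and the file downloaded from your own workspace should be treated as the authoritative version.

```shell
# Write a workspace config file with the three values that identify a workspace.
# Assumption: all values are the example names from this chapter, not real identifiers.
cat > config.json <<'EOF'
{
  "subscription_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "resource_group": "aml-dev-rg",
  "workspace_name": "aml-ws"
}
EOF

# Show the result so the values can be checked before the file is used.
cat config.json
```

Because the file contains only identifiers and no secrets, authentication still happens separately when code connects to the workspace.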

Next, we will discuss the AMLS left-hand navigation menu shown in Figure 1.7 (1). This navigation menu will allow you to interact with your assets within your AML environment and is divided into three sections:

  • The Author section, which includes Notebooks, Automated ML, and Designer.
  • The Assets section, which includes artifacts that will be created as part of your data science workload and will be explored in detail in upcoming chapters.
  • The Manage section, which includes resources that will be leveraged as part of your data science workload.

Let’s review the sections as follows:

  • Author is the section in which the data scientist selects the tool of choice for development:
    • Notebooks: This is a section within the Author portion of the menu that provides access to your files, as well as an AMLS workspace IDE, which is similar to a Jupyter notebook, but with a few extra features for data scientists to carry out feature engineering and modeling. Inside this IDE, with a notebook that has been created, users can select a version of Python kernel, connecting them to a Conda environment with a specified version of Python.
Figure 1.14 – Author menu items


Notebooks is an option within the Author section providing access to files, samples, file management, terminal access, and, as we will see later in this chapter in the Developing within AMLS section, a built-in IDE:

Figure 1.15 – Notebooks


We will highlight the different features found within the Notebooks selection:

  1. In section 1 of Figure 1.15, clicking on the Files label shows all the user directories within the collaborative AMLS workspace, in addition to files stored within those directories.
  2. In section 2 of Figure 1.15, clicking on the Samples label provides AML tutorials for getting the most out of AMLS.
  3. Additionally, there is the capability to leverage a terminal on your compute resource.

In this section, you can create new files by clicking on the + icon. Note that both files and folders can be uploaded as well as created. This allows you to easily upload data files in addition to code.

Create new file has options to name the file and select what type of file it is, such as a Jupyter notebook or Python. Typically, data scientists will create new Jupyter notebooks, but in addition to the .ipynb extension, the menu for File type includes Jupyter, Python, R, Shell, text, and other, in which you can provide your own file extension.

In the left-hand navigation menu of Figure 1.7, we saw Notebooks, which we briefly reviewed, as well as Automated ML and Designer. We will next provide a high-level overview of the Automated ML section.

  • Automated ML: This can also be selected from the Author section. Automated ML is a no-code tool that provides the ability to leverage data and select an ML model type and compute to accomplish model creation. In future chapters, we will go through this in more detail, but at a high level, this option provides a walk-through to establish a model based on the dataset provided. You will be prompted to pick classification, regression, or time-series forecasting; natural language processing (multi-class or multi-label classification); or computer vision (including multi-class and multi-label classification, object detection, and instance segmentation) based on your data science workload. It’s a guided step-by-step process. There are settings available to stop a job from running past a set duration to ensure that unexpected costs are limited. Automated ML also provides the ability to exclude algorithms. AML will select a variety of algorithms and run them with a dataset to provide the best model available. In addition to the capability to run multiple algorithms to determine the best model based on a given dataset, Automated ML also includes model explainability, providing insight into which features are more or less important in determining the response variable. The time required for this process depends on the dataset, as well as the compute resources allocated to the task. Automated ML uses an ad hoc compute, so when the experiment is submitted to run, it starts the compute and then runs the experiment. Model building runs inside an experiment as a job, which is saved as a snapshot for future analysis. After the best model is built with Automated ML, AMLS provides the ability to deploy it with a single click as a REST API hosted in Azure Container Instances (ACI) for development and test environments. AMLS can also support production workloads with a REST API deployment to Azure Kubernetes Service (AKS). In addition, when leveraging the CLI v2 or the SDK v2, AMLS supports endpoints that streamline the process of model deployment.

Clicking on Automated ML in the left-hand menu tab opens the ability to create a new Automated ML job:

Figure 1.16 – Automated ML screen with options


Now that we have seen the Notebooks and Automated ML sections, we will look at the Designer section for a low-code experience.

  • Designer: This is the section where low-code environments are provided. Data scientists can drag and drop and develop model training and validation. Designer has two sections – to the left is the menu and to the right is the authoring section for development. Once the model is built, an option to deploy it in various forms is provided.

Here is a sample experiment built with Designer:

Figure 1.17 – Designer sample


Designer provides options to model with several types of ML models, such as classification, regression, clustering, recommendation, computer vision, and text analytics.

Now that we have reviewed the sections for authoring a model – Notebooks, Automated ML, and Designer – we will explore the concept of assets in the AMLS Assets navigation section.

  • Assets is a section where all the experiment jobs and their artifacts are stored:
Figure 1.18 – Assets menu items


  • Data: This section will display the registered datasets used within the AMLS workspace under the Data assets tab. Datasets manage the versions created every time a new dataset is registered. Datasets can be created through the UI, SDK, or CLI. When a dataset is created, a data store (the resource hosting the data) is also provided:
    • Data assets: This displays a list of the datasets leveraged within the workspace:
Figure 1.19 – The Datasets display


Click on Data assets to see the list of all datasets used. The UI displaying datasets can be customized by adding and deleting columns in your view. In addition to providing the ability to register datasets through the UI, there is also the ability to archive a dataset by clicking on Archive. The data in a repository may change over time as applications add data.

  • Datastores: Datastores can also be selected within the Data section of the left-hand menu pane. Data stores can be thought of as locations for retrieving data. Examples of data stores include Azure Blob storage, an Azure file share, Azure Data Lake Storage, or an Azure database, including SQL, PostgreSQL, and MySQL. All the security credentials for connecting to a data store are associated with your AMLS workspace and stored in Azure Key Vault. During the AMLS workspace deployment, an Azure Blob storage account was created. This Azure Blob storage account is the default datastore for your AMLS workspace.
  • A registered dataset can be monitored with functionality that is currently in preview, which can be reviewed by clicking on the Dataset monitors (preview) label shown in Figure 1.19.
  • Jobs: The Jobs screen shows all the experiments, which are groups of jobs, and the execution of code within your AMLS workspace:
Figure 1.20 – The Experiments display


You can customize and reset the default view in the UI for jobs by adding or deleting columns, which represent the properties of a given job.

Each experiment will display as blue text under Experiment as in Figure 1.20. Within the Jobs section, we can select multiple experiments and see charts on their performance.

  • Pipelines: A pipeline is a sequence of steps performed within the job of an experiment:
Figure 1.21 – The Pipelines display


Usually, Designer experiments will show the pipeline and provide statuses for the job. As with the UIs for Jobs and Datasets, the UI provides customization when viewing pipelines. You can also display Pipeline endpoints and Pipeline drafts. You can sort or filter the view by Status, Experiment, or Tags, select or clear all filters, and choose how many rows to display.

  • Environments: Setting up a Python environment can be a difficult task, because the value of leveraging open source packages comes with the complexity of managing the versions of those packages. While this problem is not unique to the AMLS workspace, Azure has created a solution for managing these resources – in AMLS, they are called environments. Environments is a section in AMLS that allows users to view and register which packages, and which Docker images, should be leveraged by the compute resources. Microsoft has already created several of these environments, which are considered curated, and users can also create their own custom environments. We will be leveraging custom environments in Chapter 3, Training Machine Learning Models in AMLS, as we run experiment jobs on compute clusters.

The Environments section provides a list of environments leveraged by the AMLS workspace:

Figure 1.22 – Environments


In the Curated environments section, there is a wide variety of environments to select from. This is useful for applications that need specific environments with libraries. The list of environments created is available for selection. Click on each Name to see what is included in the environment. For now, most of the following environments are used for inference purposes.

  • Models: The Models section shows all the models registered and their versions. The UI provides customization of columns as shown in the following screenshot:
Figure 1.23 – The Models display


Models can be registered manually, through the SDK, or through the CLI. You can change how many models are displayed, show only the current version or all versions of a model, and sort, filter, and clear filters on the list.

  • Endpoints: Models can be deployed as REST endpoints. These endpoints pass incoming data to the trained model and return its predicted values in the response. Because they use the REST protocol, these models can easily be consumed by other applications. Clicking on Endpoints on the left-hand navigation menu of AMLS will bring these up.

The Endpoints section displays endpoints for both real-time and batch inferencing:

Figure 1.24 – The Endpoint display


Real-time endpoints are referred to as online endpoints and typically take a single row of data and produce a score output, and they are performant as a REST API. Batch endpoints are for batch-based execution, where we pass large datasets and are then provided with the predicted output. This is usually a long-running process. While CLI v1 and the SDK v1 allow AMLS users to deploy to ACI and Kubernetes, this book will focus on deployments leveraging CLI v2 and SDK v2, which leverage endpoints to deploy to managed online endpoints, Kubernetes online endpoints, and batch inference endpoints.
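To make the single-row request and score response concrete, here is a minimal sketch of how a client might build a call to an online endpoint using only the Python standard library. The scoring URI, the key placeholder, and the {"data": [...]} payload shape are all assumptions made for illustration; the actual payload depends on the scoring script behind your endpoint.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, api_key: str, rows: list) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying rows to score."""
    # Assumed payload shape; a real endpoint defines its own input schema.
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


request = build_scoring_request(
    "https://example-endpoint.example.com/score",  # hypothetical scoring URI
    "<your-endpoint-key>",                         # placeholder, not a real key
    [[5.1, 3.5, 1.4, 0.2]],                        # a single row of feature values
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.loads() the response body.
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before anything is sent to a deployed endpoint.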

  • Manage is the section in which users can manage resources leveraged by the AMLS workspace, including Compute, Data Labeling, and Linked Services:
    • Compute: This is where we manage various compute for developing data science projects. There are four types of compute resources found within the Compute section in AMLS. These four include Compute instances, Compute clusters, Inference clusters, and Attached computes.

The Compute section provides visibility into the compute resources leveraged with an AMLS workspace:

Figure 1.25 – Compute options


A compute can be a single node or include several nodes. A node is a Virtual Machine (VM) instance. A single-node instance can only scale vertically, so it is limited by the Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources of its VM. Compute instances are single nodes. These resources are great for development work. Compute clusters, on the other hand, can be scaled horizontally and can be used for workloads with larger datasets, as the workload can be distributed across the nodes. To take advantage of this scaling, jobs can be run in parallel using the AML SDK to scale training and scoring.

Within the Compute section, as compute resources are created, the available quota for your subscription is displayed, providing visibility into the number of cores that are available for a given subscription. Most Azure VM SKUs are available for compute resources. For GPU SKUs, users can create a support request to increase their vCore quota if capacity is available in the region. When creating compute clusters, the number of nodes leveraged by the compute cluster can be set from 0 to N nodes.

Compute resources in an AMLS workspace incur a cost per node on an hourly basis. On a compute cluster, setting the minimum number of nodes to 0 will shut down the compute resources after an experiment completes, once the Idle seconds before scale down period has elapsed. For a compute instance, there is the option to schedule when to switch the instance on or off to save money. In addition to compute instances and compute clusters, AMLS has the concept of inference clusters. Inference clusters in the Compute section allow you to view or create an AKS cluster. The last type of compute available within the Compute section is under the Attached computes section. This section allows you to attach your own compute resources, including Azure Databricks, Synapse Spark pools, HDInsight, VMs, and others.
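The effect of per-node hourly billing can be sketched with some simple arithmetic. The hourly rate, node count, and usage hours below are hypothetical values chosen for illustration, not Azure prices, and the sketch ignores the short idle period before scale down.

```python
def node_hour_cost(hourly_rate: float, nodes: int, hours_per_day: float, days: int = 30) -> float:
    """Cost of running `nodes` nodes for `hours_per_day` hours on each of `days` days."""
    return hourly_rate * nodes * hours_per_day * days


# Hypothetical: 2 nodes at 0.50 per node-hour over a 30-day month.
always_on = node_hour_cost(0.50, nodes=2, hours_per_day=24)  # minimum nodes kept at 2
scaled = node_hour_cost(0.50, nodes=2, hours_per_day=3)      # minimum nodes 0, 3 hours of jobs a day

print(always_on)           # 720.0
print(scaled)              # 90.0
print(always_on - scaled)  # 630.0
```

The same calculation applies to a compute instance with a shutdown schedule: only the hours the VM is actually running are billed.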

  • Data Labeling: Data Labeling is a newer feature option added to AMLS. This feature is for projects that tag images for custom vision-based modeling. Images are labeled within an AMLS Data Labeling project. Multiple users can label images within one project. To further improve productivity, there is ML-assisted data labeling. Within a labeling project, both text and images can be labeled. For image projects, labeling tasks include Image Classification Multi-class, which involves assigning an image a single class from a set of classes, and Image Classification Multi-label, which applies one or more labels from a set of classes. There is also Object identification, which defines a bounding box for each object found in an image, and finally, Instance Segmentation, which draws a polygon around each object in an image and assigns it a class label. Text projects include Multi-class, Multi-label, and Text Named Entity Recognition options. Multi-class will apply a single label to text, while Multi-label allows you to apply one or more labels to a piece of text. Text Named Entity Recognition allows users to tag one or more entities in a piece of text.

The Data Labeling feature requires a GPU-enabled compute, due to its compute-intensive nature. An option to provide project instructions is available. Every user will be assigned a queue and the user’s progress in the project is also shown on a dashboard for each project.

The following screenshot shows how a sample labeling project is displayed:

Figure 1.26 – Data Labeling


  • Linked Services: This provides you with integration with other Microsoft products, currently including Azure Synapse Analytics so that you can attach Apache Spark pools. Click on the + Add integration button to select from an Azure subscription followed by a Synapse workspace.

Linked Services, as seen in the following screenshot, provides visibility into established connections with other Microsoft products:

Figure 1.27 – Linked Services


Through this linked service, which is currently in public preview, AMLS can leverage an Azure Synapse workspace, bringing the power of an Apache Spark pool into your AMLS environment. A large component of a data science workload includes data preparation, and through Linked Services, data transformation can be delivered leveraging Spark.

With a basic understanding of the AMLS workspace, you can now move on to writing code. Before you do that, however, you need to create a VM that will power your jobs. Compute instances are AMLS VMs specifically for writing code. They come in many shapes and sizes and can be created via the AMLS GUI, the Azure CLI, Python code, or ARM templates. Each user is required to have their own compute instance, as AMLS allows only one user per compute instance.

We will begin by creating a compute instance via the AMLS GUI. Then, we will add a schedule to our compute instance so that it starts up and shuts down automatically; this is an important cost-saving measure. Next, we will create a compute instance by using the Azure CLI. Finally, we will create a compute instance with a schedule enabled with an ARM template. Even though you will create three compute instances, there is no need to delete them, as you only pay for them while they are in use.

Tip

When you’re not using a compute instance, make sure it is shut down. Leaving compute instances up and running incurs an hourly cost.

In this section, we have navigated through AMLS, leveraging the left-hand navigation menu pane. We explored the Author, Assets, and Manage sections and each of the components found within AMLS. Now that we have covered navigating the components of AMLS, let us continue with creating a compute so that you can begin to write code in AMLS.

 

Creating a compute for writing code

In this section, you will create a compute instance to begin your development. Each subsection will demonstrate how to create these resources in your AMLS workspace following different methods.

Creating a compute instance through the AMLS GUI

The most straightforward way to create a compute instance is through the AMLS GUI. Compute instances come in many sizes and you should adjust them to accommodate the size of your data. A good rule of thumb is that you should have 20 times as much RAM as the size of your data in CSV format, or 2 times as much RAM as the size of your data in a pandas DataFrame, the most popular Python data structure for data science. This is because, when you read in a CSV file as a pandas DataFrame, the data can expand by up to a factor of 10.
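The rule of thumb above can be written as a small helper. This is only a sketch of the arithmetic; the function name and the format labels are made up for illustration.

```python
def recommended_ram_gb(data_size_gb: float, data_format: str = "csv") -> float:
    """Apply the rule of thumb: 20x the CSV size, or 2x the in-memory DataFrame size."""
    multipliers = {"csv": 20, "dataframe": 2}
    return float(data_size_gb * multipliers[data_format])


print(recommended_ram_gb(0.5, "csv"))      # 10.0 (a 0.5 GB CSV file)
print(recommended_ram_gb(5, "dataframe"))  # 10.0 (the same data as a 5 GB DataFrame)
```

Both calls give the same answer because a 0.5 GB CSV file that expands tenfold becomes a 5 GB DataFrame, and doubling that gives 10 GB of RAM.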

The compute name must be unique within a given Azure region, so you will need to make sure that the name of your compute resources is unique or the deployment will fail.

Now, let’s create a compute instance – a single VM-type compute that can be used for development. Each compute instance is assigned to a single user in the workspace for them to develop.

To create a compute instance, follow these steps:

  1. Log in to the AMLS workspace.
  2. Click on Compute in the left-hand menu.
  3. Click on New.

A new tab will open to configure our compute instance. The following screenshot showcases the creation of a compute instance:

Figure 1.28 – Selecting the VM type and region


Under Configure required settings, shown in Figure 1.28, let’s execute the following steps:

  1. You need to provide a name for your compute instance. Let’s name the compute instance amldevinstance. Note that the name of the compute instance will need to be unique for a given Azure region. Given that this name will likely already be used, in practice, you can provide a prefix or suffix to your compute name to ensure its uniqueness.
  2. Set Virtual machine type to CPU. GPU can also be selected for high-power deep learning models. Now, set Virtual machine size. The size allocation will display the nodes based on the quota available.
  3. Pick a VM size from the list of available CPUs.
  4. Click on Next: Advanced Settings.
  5. Turn on Enable SSH access if you want to use the compute instance from a remote machine. The Enable virtual network option is available to connect to a private network connected to a corporate network. An option to assign the compute to another user (Assign to another user) is also available. If there is any shell script to provision in the startup, please use the Provision with setup script option:
Figure 1.29 – Configure Settings


Now that we have the basic configurations provided for the compute, we can move to the next section on scheduling a time at which to shut down the instance to save money.

Adding a schedule to a compute instance

In the previous version of the AML service, data scientists had to manually spin up and shut down compute instances. Unsurprisingly, this led users to incur large bills when they forgot to shut them down over weekends and vacations. Microsoft added the ability to automatically start up and shut down compute instances in order to alleviate this problem. We recommend setting the shutdown schedule to just after your normal working hours conclude.

From Figure 1.29, click on the Add Schedule button to bring up the ability to set a start-up or shutdown automatic schedule.

Here’s the Startup and shutdown schedule window for the compute instance:

Figure 1.30 – Scheduling a shutdown for the compute instance


As shown in Figure 1.30, setting a schedule for the automatic shutdown of the compute will save on cost. Once the schedule is set, the compute instance will shut down automatically at the scheduled time.

After setting your schedule, click on Create and wait for the instance to be created. Once the instance has been created, it will automatically start and the compute instance page will be displayed.

Creating a compute instance through the Azure CLI

One major advantage of creating a compute instance with code is the ability to save your configuration file for later use.

Launch your command-line interpreter based on your OS (for example, CMD or Windows PowerShell), connect the Azure CLI to your Azure subscription, and create a compute instance by running the following commands. Note that the name of the compute instance must be unique within an Azure region:

az login
az ml compute create --name computeinstance01 --size STANDARD_D3_V2 --type ComputeInstance --resource-group my-resource-group --workspace-name my-workspace
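As mentioned earlier, compute instances can also be created from Python code. The following is a minimal sketch of the same operation using the SDK v2, assuming the azure-ai-ml and azure-identity packages are installed and that you are already authenticated (for example, through az login). The helper names, the random suffix scheme, and the placeholder IDs are illustrative assumptions, not something taken from this chapter.

```python
import secrets


def unique_compute_name(base: str) -> str:
    """Append a short random suffix, since names must be unique within an Azure region."""
    return f"{base}-{secrets.token_hex(2)}"


def create_compute_instance(subscription_id: str, resource_group: str, workspace: str,
                            name: str, size: str = "STANDARD_D3_V2"):
    """Create a compute instance with the SDK v2 and wait for provisioning to finish."""
    # Imported here so the helper above works even without the Azure SDK installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import ComputeInstance
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)
    instance = ComputeInstance(name=name, size=size)
    # begin_create_or_update returns a poller; result() blocks until it completes.
    return ml_client.compute.begin_create_or_update(instance).result()


name = unique_compute_name("computeinstance01")
print(name)
# Hypothetical call, using the same resource names as the CLI example:
# create_compute_instance("<subscription-id>", "my-resource-group", "my-workspace", name)
```

The call that contacts Azure is left commented out so that the snippet can be read and run without creating a billable resource.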

Just as you created an AMLS workspace through the Azure CLI, you have now used it to create a compute instance. The next section will cover details on how to use ARM templates to create a compute instance.

Creating a compute instance with ARM templates

When creating your AMLS workspace with an ARM template, you can also instantiate compute instances at the same time, specifying their type, size, and schedule. This is an excellent strategy for large organizations that want to tightly control their compute instance configurations. It’s also great for teams who are looking to create multiple compute instances in one step. In order to do so, follow these steps:

  1. The ARM template can be downloaded from GitHub and is found here: https://github.com/Azure/azure-quickstart-templates/blob/master/quickstarts/microsoft.machinelearningservices/machine-learning-compute-create-computeinstance/azuredeploy.json.

The template creates a compute instance for you.

The example template has three required parameters:

  • workspaceName, which is the name of the existing AMLS workspace in which to create the compute instance.
  • computeName, which is the name of the compute instance to create.
  • objectId, which is the object ID of the person to which to assign the compute instance. In your case, it will be yourself.

To get your object ID, you can run the following command:

az ad signed-in-user show
  2. To deploy your template, you must deploy it into a workspace that has already been created in a resource group that already exists. Be sure to replace objectId with the object ID you found using the az ad signed-in-user show command. Make sure your command prompt is in the location at which you downloaded the azuredeploy.json file, and run the following command:
    az deployment group create --name "exampledeployment" --resource-group "aml-dev-rg" --template-file "azuredeploy.json" --parameters workspaceName="aml-ws" computeName="devamlcompute" objectId="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" schedules="{'computeStartStop': [{'action': 'Stop','triggerType': 'Cron','cron': {'startTime': '2022-07-06T23:41:45','timeZone': 'Central Standard Time','expression': '00 20 * * 1,2,3,4,5'}}]}"
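The schedules parameter in the preceding command is easier to read when built up as a data structure. The sketch below reconstructs the same value in Python and breaks the cron expression into its fields; the function names are illustrative, and describe_cron only handles plain numbers and comma-separated lists like the ones used here, not the full cron syntax.

```python
import json


def stop_schedule(cron_expression: str, time_zone: str, start_time: str) -> dict:
    """Build the schedules value for a single automatic Stop action."""
    return {
        "computeStartStop": [
            {
                "action": "Stop",
                "triggerType": "Cron",
                "cron": {
                    "startTime": start_time,
                    "timeZone": time_zone,
                    "expression": cron_expression,
                },
            }
        ]
    }


def describe_cron(expression: str) -> dict:
    """Split a five-field cron expression: minute, hour, day of month, month, day of week."""
    minute, hour, _day_of_month, _month, day_of_week = expression.split()
    return {
        "minute": int(minute),
        "hour": int(hour),
        "days_of_week": [int(day) for day in day_of_week.split(",")],
    }


schedule = stop_schedule("00 20 * * 1,2,3,4,5", "Central Standard Time", "2022-07-06T23:41:45")
print(json.dumps(schedule, indent=2))
print(describe_cron("00 20 * * 1,2,3,4,5"))
# {'minute': 0, 'hour': 20, 'days_of_week': [1, 2, 3, 4, 5]}
```

Read this way, the expression stops the compute instance at 20:00 on days 1 through 5 of the week, that is, Monday to Friday, in the Central Standard Time zone.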

Tip

In order to create compute instances for users other than yourself, use the create on behalf option and specify a user ID.

Now you know how to create compute instances using the GUI, the Azure CLI, and ARM templates. You also know how to schedule startup and shutdown times for your compute instance, and are ready to use it to develop code, which will be the focus of the next section.

 

Developing within AMLS

With your compute instance VM created, you can use it to write code in either R or Python. Specifically, you can code in Jupyter, JupyterLab, Visual Studio Code (VS Code), or a terminal. Both Jupyter and JupyterLab are examples of IDEs for writing Python code. VS Code is Microsoft’s recommended IDE that allows you to script in either R or Python, among many other languages.

In this section, you will begin by opening a Jupyter notebook and using it to connect to your AMLS workspace. Similarly, you will do the same thing with JupyterLab. Finally, you will learn how to use AML notebooks in order to develop code.

Developing Python code with Jupyter Notebook

Perhaps the most common way for data scientists to write code within AMLS is through Jupyter Notebook, the most widely used Python IDE. Jupyter Notebook does, however, come with many limitations; it lacks many of the basic features of a traditional IDE, such as linting and code analysis to flag errors before you try running your code. Still, many data scientists prefer it for its streamlined, easy-to-use interface. In order to open Jupyter and create a notebook, follow these steps:

  1. Go to the AML workspace UI.
  2. Click on Compute in the left-hand navigation menu under the Manage section of the menu.

The following screenshot shows us the list of all computes created:

Figure 1.31 – List of compute instances


  3. From the list, select your compute instance and click on Start (if it has not already started). Once the compute instance has started, a link to Applications will appear.
  4. Click on the JupyterLab or Jupyter link to open the Jupyter notebook for further development.
  5. To access the AMLS workspace notebook in the left-hand menu, go to the Notebooks section.
  6. On the next screen, you should see the folders in the left-hand pane. Each user in an AMLS workspace will have a folder created for them by default. Inside your user folder, you can select a notebook to work on. Options to create folders and new notebooks are also available, as shown in Figure 1.15.

Now that you know how to use Jupyter Notebook within AMLS, we will explore how to leverage an AML notebook.

Developing using an AML notebook

AML notebooks are similar to Jupyter notebooks and provide you with another option for developing code. Whether you choose to develop using Jupyter, JupyterLab, or AML notebooks is largely a matter of personal preference. Please try all three to determine which suits you best. In order to start developing with AML notebooks, follow these steps:

  1. Go to the AMLS workspace UI.
  2. Click on Notebooks on the left-hand navigation menu under the Author section.
  3. Expand the folder and select your notebook or create a new notebook:
Figure 1.32 – File menu options


  4. Click on Create new file.
  5. Name the notebook file amlbookchapter1.ipynb.
  6. Once you click on Create, the page in Figure 1.33 should pop up. The preceding process will create a new notebook for us to start development:
Figure 1.33 – New notebook


In your notebook, you will see the IDE default to the Python 3.10 - SDK V2 environment. Each cell will allow you to execute code on your running compute instance.

There are two main ways to develop code within AMLS: Jupyter and AML notebooks. However, a far more powerful IDE exists, which contains many features to improve your productivity – VS Code. In the next section, you will download VS Code and connect it to AMLS.

 

Connecting AMLS to VS Code

VS Code is an IDE designed to work with Windows, macOS, or Linux. Its Git integration and debugging features make it a natural selection for a code editor. VS Code has an extension for working directly with your AMLS environment to build, train, and deploy ML models. In order to leverage this powerful tool, install and configure VS Code by following these steps:

  1. Download and install VS Code at https://code.visualstudio.com/download.
  2. From within VS Code, click on the EXTENSIONS icon (Ctrl + Shift + x), search for Azure Machine Learning as shown in Figure 1.34, and select Install:
Figure 1.34 – Selecting the VS Code extension


  3. Sign in to your Azure account by opening the command palette (Ctrl + Shift + p) and typing the following:
    >Azure: Sign In

A new browser window will open for you to supply your credentials to enable sign-in.

  4. Set your default workspace, leveraging the command palette. In order to set your default AMLS workspace, you will need to have a folder open in which VS Code can store metadata. This can be done by selecting File from the menu and selecting Open Folder.
  5. Choose your default workspace by leveraging the command palette (Ctrl + Shift + p) and type Azure ML: Set Default Workspace. This command will walk you through selecting your subscription and your workspace:
     >Azure ML: Set Default Workspace
  6. Go to the Azure icon (Shift + Alt + A), and go to MACHINE LEARNING as shown in Figure 1.35:

Figure 1.35 – Azure icon


  7. Inside the Machine Learning section of the Azure icon, right-click as shown in Figure 1.36 and select Connect for our compute instance:

Figure 1.36 – Connect to compute instance


  8. The VS Code extension will be installed on the compute instance you are connecting to and you will be asked whether you trust the authors of all files in the parent folder. Select Yes, I trust the authors.
  9. This will open a new instance of VS Code on your local machine. In this new instance of VS Code, you will see the user directory and notebook previously created in the Developing using an AML notebook section as shown in Figure 1.37:

Figure 1.37 – Opening the notebook in VS Code


  10. Select your Python interpreter from the top-right corner by clicking on Select Kernel. Please select the azureml_py310_sdkv2 kernel:
Figure 1.38 – Select Kernel


  11. After the interpreter has been selected, you can create a print statement as shown in Figure 1.39:
Figure 1.39 – Writing code


  12. To execute this code, press the play button to the left of the cell (Ctrl + Enter). Note that the Python code supplied will print out the current working directory. Your code executes on the compute instance as shown by the directory path printed out.
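The exact code in the screenshot is not reproduced in the text, but a minimal cell with the behavior described, printing the current working directory, might look like this:

```python
import os

# Print the directory the kernel is running in. When connected to a compute
# instance, this is a path on the remote VM rather than on your local machine.
cwd = os.getcwd()
print(cwd)
```

Seeing a path that does not exist on your local machine is a quick way to confirm that the cell ran on the compute instance.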

Saving the notebook within VS Code will save the notebook to your AMLS workspace.

In this section, you have installed VS Code and the AML VS Code extension, and connected to your compute instance in your AMLS workspace to run your code. VS Code provides IntelliSense, the ability to run and debug your code, along with built-in Git integration. Combining these features with integration into your AMLS workspace makes this the ideal choice for development.

 

Summary

In this chapter, you have explored the options for creating an AMLS workspace, how to navigate in AMLS, how to create compute instances for developing code, and how to use your compute instance to develop code. You have dug into creating an AMLS workspace through the Azure portal, through the CLI, as well as through ARM templates. You have navigated through the AMLS workspace components, exploring the Author, Assets, and Manage sections of AMLS. You have explored a variety of IDEs, including Jupyter, AML notebooks, as well as VS Code, Microsoft’s premier IDE, with the compute instance that you created either using the portal, the CLI, or via an ARM template. This foundation gives you everything you need to begin scripting ML solutions.

In the next chapter, you will import data into AMLS and connect to a variety of data sources. You will also learn how to automatically track changes to your data and roll back to earlier versions if necessary.

About the Authors
  • Sina Fakhraee

    Sina Fakhraee, Ph.D., is currently working at Microsoft as an enterprise data scientist and senior cloud solution architect. He has helped customers to successfully migrate to Azure by providing best practices around data and AI architectural design and by helping them implement AI/ML solutions on Azure. Prior to working at Microsoft, Sina worked at Ford Motor Company as a product owner for Ford's AI/ML platform. Sina holds a Ph.D. degree in computer science and engineering from Wayne State University and prior to joining the industry, he taught various undergrad and grad computer science courses part time.

  • Balamurugan Balakreshnan

    Balamurugan Balakreshnan is a principal cloud solution architect at Microsoft Data/AI Architect and Data Science. He has provided leadership on digital transformations with AI and cloud-based digital solutions. He has also provided leadership in terms of ML, the IoT, big data, and advanced analytical solutions.

  • Megan Masanz

    Megan Masanz is a principal cloud solution architect at Microsoft focused on data, AI, and data science, passionately enabling organizations to address business challenges through the establishment of strategies and road maps for the planning, design, and deployment of Azure Cloud-based solutions. Megan is adept at paving the path to data science via computer science given her master's in computer science with a focus on data science.
