You're reading from  Azure Data Scientist Associate Certification Guide

Product type: Book
Published in: Dec 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800565005
Edition: 1st Edition
Authors (2):
Andreas Botsikas

Andreas Botsikas is an experienced advisor working in the software industry. He has worked in the finance sector, leading highly efficient DevOps teams, and architecting and building high-volume transactional systems. He then traveled the world, building AI-infused solutions with a group of engineers and data scientists. Currently, he works as a trusted advisor for customers onboarding into Azure, de-risking and accelerating their cloud journey. He is a strong engineering professional with a Doctor of Philosophy (Ph.D.) in resource optimization with artificial intelligence from the National Technical University of Athens.

Michael Hlobil

Michael Hlobil is an experienced architect focused on quickly understanding customers' business needs, with over 25 years of experience in IT pitfalls and successful projects, and is dedicated to creating solutions based on the Microsoft platform. He has an MBA in Computer Science and Economics (from the Technical University and the University of Vienna) and an MSc (from the ESBA) in Systemic Coaching. Over the last decade, he has worked on advanced analytics projects, including massively parallel systems and machine learning systems. He enjoys working with customers and supporting their journey to the cloud.

Preface

This book helps you acquire practical knowledge about machine learning experimentation on Azure. It covers everything you need to know and understand to become a certified Azure Data Scientist Associate.

The book starts with an introduction to data science, making sure you are familiar with the terminology used throughout the book. You then move into the Azure Machine Learning (AzureML) workspace, your working area for the rest of the book. You will discover the studio interface and manage the various components, like the data stores and the compute clusters.

You will then focus on no-code and low-code experimentation. You will discover the Automated ML wizard, which helps you to locate and deploy optimal models for your dataset. You will also learn how to run end-to-end data science experiments using the designer provided in AzureML studio.

You will then dive deep into code-first data science experimentation. You will explore the AzureML Software Development Kit (SDK) for Python and learn how to create experiments and publish models using code. You will learn how to use powerful compute clusters to scale your machine learning jobs up and out. You will learn how to optimize your model’s hyperparameters using HyperDrive. Then you will learn how to use responsible AI tools to interpret and debug your models. Once you have a trained model, you will learn how to operationalize it for batch or real-time inferences and how to monitor it in production.

With this knowledge, you will have a good understanding of the Azure Machine Learning platform and you will be able to clear the DP-100 exam with flying colors.

Who this book is for

The book targets two audiences: developers who seek to infuse their applications with AI capabilities and data scientists who want to scale their ML experiments in the Azure cloud. Basic knowledge of Python is needed to follow the code samples present in the book. Some experience in training ML models in Python with common frameworks such as scikit-learn will help you understand the content more easily.

What this book covers

Chapter 1, An Overview of Modern Data Science, provides you with the terminology used throughout the book.

Chapter 2, Deploying Azure Machine Learning Workspace Resources, helps you understand the deployment options for an Azure Machine Learning (AzureML) workspace.

Chapter 3, Azure Machine Learning Studio Components, provides an overview of the studio web interface you will be using to conduct your data science experiments.

Chapter 4, Configuring the Workspace, helps you understand how to provision computational resources and connect to data sources that host your datasets.

Chapter 5, Letting the Machines Do the Model Training, guides you through your first Automated Machine Learning (AutoML) experiment and shows you how to deploy the best-trained model as a web endpoint through the studio’s wizards.

Chapter 6, Visual Model Training and Publishing, helps you author a training pipeline through the studio’s designer experience. You will learn how to operationalize the trained model through a batch or a real-time pipeline by promoting the trained pipeline within the designer.

Chapter 7, The AzureML Python SDK, gets you started on the code-first data science experimentation. You will understand how the AzureML Python SDK is structured, and you will learn how to manage AzureML resources like compute clusters with code.

Chapter 8, Experimenting with Python Code, helps you train your first machine learning model with code. It guides you on how to track model metrics and scale out your training efforts to bigger compute clusters.

Chapter 9, Optimizing the ML Model, shows you how to optimize your machine learning model with hyperparameter tuning and helps you discover the best model for your dataset by kicking off an AutoML experiment with code.

Chapter 10, Understanding Model Results, introduces you to the concept of responsible AI and deep dives into the tools that allow you to interpret your models’ predictions, analyze the errors that your models are prone to, and detect potential fairness issues.

Chapter 11, Working with Pipelines, guides you on authoring repeatable processes by defining multi-step pipelines using the AzureML Python SDK.

Chapter 12, Operationalizing Models with Code, helps you register your trained models and operationalize them through real-time web endpoints or batch parallel processing pipelines.

To get the most out of this book

This book tries to provide you with everything you need to learn. The Further reading section of each chapter contains links to pages that will help you dive deeper into topics that are peripheral to the contents of this book. It will help if you have some basic familiarity with the Azure portal and have read some Python code in the past.

In this book, we guide you to use the Notebooks experience available within AzureML studio. If you want to execute the same code on your workstation instead of in the cloud-based experience, you will need a Python environment that can run Jupyter notebooks. The easiest way to run Jupyter notebooks on your workstation is through Visual Studio Code (VS Code), a free cross-platform editor with fantastic Python support. You will also need to install Git on your workstation to clone the book’s GitHub repository.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

If you face any issues executing the code, ensure that you have cloned the latest version from the GitHub repository. If the problem persists, feel free to open a GitHub issue describing the problem you are facing so that we can help you solve it.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Azure-Data-Scientist-Associate-Certification-Guide. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800565005_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You can also change the autogenerated name of the pipeline you are designing. Rename the current pipeline to test-pipeline."

A block of code is set as follows:

from azureml.train.hyperdrive import GridParameterSampling, choice

param_sampling = GridParameterSampling({
    "a": choice(0.01, 0.5),
    "b": choice(10, 100)
})

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

from azureml.core import Workspace
ws = Workspace.from_config()
loans_ds = ws.datasets['loans']
compute_target = ws.compute_targets['cpu-sm-cluster']

Any command-line input or output is written as follows:

az group create --name my-name-rg --location westeurope

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Navigate to the Author | Notebooks section of your AzureML Studio web interface."

Tips or important notes

Run numbers may be different in your executions. Every time you execute the cells, a new run number is created, continuing from the previous number. So, if you execute code that performs one hyperdrive run with 20 child runs, the last child run will be run 21. The next time you execute the same code, the hyperdrive run will start from run 22, and the last child will be run 42. The run numbers referred to in this section are the ones shown in the various figures, and it is normal to observe differences, especially if you had to rerun a couple of cells.
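
The run-number arithmetic in the note above can be sketched in plain Python. This is only an illustration: the helper name hyperdrive_run_numbers is hypothetical and not part of the AzureML SDK, and it assumes, as the note describes, that the parent run takes the next free number and each child run takes one number after it.

```python
def hyperdrive_run_numbers(last_run_number, child_runs):
    """Return (parent_run, last_child_run) for the next execution.

    Hypothetical helper, not part of the AzureML SDK. It assumes the parent
    run takes the next free run number and every child run takes one more.
    """
    parent_run = last_run_number + 1
    last_child_run = parent_run + child_runs
    return parent_run, last_child_run


# First execution in an experiment that has no runs yet.
first_parent, first_last_child = hyperdrive_run_numbers(0, 20)

# Second execution continues from where the first one stopped.
second_parent, second_last_child = hyperdrive_run_numbers(first_last_child, 20)

print(first_parent, first_last_child)    # 1 21
print(second_parent, second_last_child)  # 22 42
```

Any extra cell you rerun in between consumes run numbers as well, which is why your numbers may drift from the ones shown in the figures.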

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Azure Data Scientist Associate Certification Guide, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
