You're reading from Power BI Machine Learning and OpenAI

Product type: Book
Published: May 2023
Publisher: Packt
ISBN-13: 9781837636150
Pages: 308
Edition: 1st Edition
Languages: Python
Concepts: GPT/LLMs
Author: Greg Beaumont

Table of Contents

Preface

Part 1: Data Exploration and Preparation

Chapter 1: Requirements, Data Modeling, and Planning

Chapter 2: Preparing and Ingesting Data with Power Query

Chapter 3: Exploring Data Using Power BI and Creating a Semantic Model

Chapter 4: Model Data for Machine Learning in Power BI

Part 2: Artificial Intelligence and Machine Learning Visuals and Publishing to the Power BI Service

Chapter 5: Discovering Features Using Analytics and AI Visuals

Chapter 6: Discovering New Features Using R and Python Visuals

Chapter 7: Deploying Data Ingestion and Transformation Components to the Power BI Cloud Service

Part 3: Machine Learning in Power BI

Chapter 8: Building Machine Learning Models with Power BI

Chapter 9: Evaluating Trained and Tested ML Models

Chapter 10: Iterating Power BI ML models

Chapter 11: Applying Power BI ML Models

Part 4: Integrating OpenAI with Power BI

Chapter 12: Use Cases for OpenAI

Chapter 13: Using OpenAI and Azure OpenAI in Power BI Dataflows

Chapter 14: Project Review and Looking Forward

Index

Why subscribe?

Other Books You May Enjoy

Iterating Power BI ML models

In Chapter 8, you trained Power BI ML models using all of the features that you had selected for each of the three ML models – that is, Predict Damage ML, Predict Size ML, and Predict Height ML – using data from the FAA Wildlife Strike database. In Chapter 9, you evaluated the test results of the automated training and testing process that is part of Power BI. The test results helped you understand the strengths and weaknesses of the predictive models, along with details about features that contributed to correct predictions.

This chapter will revisit the findings from Chapter 9 and use them to decide whether you need to modify and retrain the ML models to achieve better results through iterative development. The list of features used to train these ML models can be whittled down, the filter criteria can be adjusted, and the results of the new round of training and testing can be compared with those from Chapter 9.
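Whittling down the feature list and comparing rounds can be as simple as keeping the features whose influence clears a cutoff, retraining, and then checking whether the new round's headline metric beats the previous one. The following Python sketch assumes you copy per-feature influence scores and metric values from the training reports by hand. The helper names prune_features and improved, the cutoff, the minimum gain, and all sample numbers are hypothetical, and this is one simple approach rather than the procedure the chapter follows.

```python
# Hypothetical helpers for one round of iterative development.
# Scores and metric values are placeholders copied by hand from training
# reports; nothing here connects to Power BI.


def prune_features(influence: dict[str, float], cutoff: float) -> list[str]:
    """Keep features whose influence score is at or above cutoff, strongest first."""
    kept = [name for name, score in influence.items() if score >= cutoff]
    return sorted(kept, key=lambda name: influence[name], reverse=True)


def improved(baseline: float, iteration: float, min_gain: float = 0.01) -> bool:
    """True if the new round beats the baseline metric by at least min_gain.

    Both values must be the same kind of metric (for example, AUC or
    R squared expressed as fractions) for the comparison to be meaningful.
    """
    return iteration - baseline >= min_gain


# Illustrative numbers only, not results from the FAA Wildlife Strike models
influence = {"feature_a": 0.42, "feature_b": 0.05, "feature_c": 0.27, "feature_d": 0.01}
print(prune_features(influence, cutoff=0.10))  # ['feature_a', 'feature_c']
print(improved(baseline=0.70, iteration=0.74))  # True
```

A fixed minimum gain keeps small, noisy differences between rounds from being read as real improvements; what counts as meaningful still depends on the use case and your stakeholders.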

Technical requirements

The requirements for this chapter are the same as those for the preceding chapters:

- FAA Wildlife Strike data files from either the FAA website or the Packt GitHub site
- A Power BI Pro license
- One of the following Power BI licensing options for access to Power BI dataflows:
  - Power BI Premium
  - Power BI Premium Per User
- One of the following options for getting data into the Power BI cloud service:
  - Microsoft OneDrive (with connectivity to the Power BI cloud service)
  - Microsoft Access and Power BI Gateway
  - Azure Data Lake (with connectivity to the Power BI cloud service)

Considerations for ML model iterations

Numerous books have been written about ML and the reasons ML models perform well or poorly, including books from Packt Publishing. The purpose of this book is to help you learn Power BI so that you can explore the FAA Wildlife Strike data, analyze that data, and then create SaaS ML models. At this point in this book, you are at a crossroads. Do you continue to iterate on these ML models in the SaaS tool? Have you demonstrated enough value to hand the ML model project over to a data science team that will improve the model using Azure ML or other advanced tools? Or do you go back to your stakeholders, report your findings, and ask for guidance on the next steps? The following diagram shows a few options for the next steps you could consider:

Figure 10.1 – Possible next steps for your Power BI ML models

Rather than diving into the technicalities of ML theory, you will focus on a few possible causes of inaccuracy that...

Assessing the Predict Damage binary prediction ML model

The Predict Damage ML model that you built and reviewed in the previous two chapters is designed to predict the likelihood that damage was reported due to wildlife striking an aircraft. A few key metrics from the training report for that binary prediction model can be seen in the following table:

Metric Name	Metric Value	Comments
Area Under the Curve (AUC)	91%	The AUC indicates the performance of an ML model, with 100% being perfect. 50% would be random guessing, while less than 50% indicates predictions worse than random guessing.
Row Count for Training	23,356	The number of rows used to train the ML model.
Row Count for Testing	...

Assessing the Predict Size ML classification model

The Predict Size ML model was an attempt at building an ML classification model to predict whether the wildlife involved in a strike was Small, Medium, or Large. The following table shows some key metrics about the initial version of the ML model:

Metric Name	Metric Value	Comments
AUC	60%	The AUC indicates the performance of an ML model, with 100% being perfect. 60% is better than random guessing, but not very good!
Row Count for Training	11,368	The number of rows used to train the ML model.
Row Count for Testing	2,841	The number of rows used to test the trained ML model.

Figure 10.5 – Key metrics...
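Both tables express AUC as a percentage, where 50% corresponds to random guessing and 100% to a perfect model. One way to build intuition for those numbers is to compute AUC from scratch as the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row. The following Python sketch does that for a binary model such as Predict Damage. The function name auc and the toy scores are hypothetical, the code illustrates the metric rather than describing how Power BI computes it in the training report, and in practice a library function such as scikit-learn's roc_auc_score would be the usual choice.

```python
from itertools import product


def auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of ROC AUC for a binary model.

    labels: 0/1 ground truth for each row (1 = positive, e.g. damage reported)
    scores: the model's predicted probability of the positive class per row
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive row correctly ranked above negative row
        elif p == n:
            wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
print(auc(labels, [0.9, 0.8, 0.3, 0.1]))  # 1.0: every positive outranks every negative
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5: no better than random guessing
print(auc(labels, [0.1, 0.2, 0.8, 0.9]))  # 0.0: consistently worse than random
```

Read this way, an AUC of 60% means the model ranks a random positive above a random negative only slightly more often than a coin flip would. The pairwise loop is quadratic in the number of rows, so it only suits small illustrative samples.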

Assessing the Predict Height ML regression model

The Predict Height ML model is a regression model that's designed to predict the height at which an aircraft was struck by wildlife. The regression ML model predicts a numeric value representing the height above the ground, in feet, at which an impact happened, based on the features in the report. Features such as Speed, Distance, and Phase of Flight were listed as top predictors.

80% of the variation in the testing results is explained by the model. Is 80% good? It depends on the use case and the requirements! If the explained variation (R squared) is 100%, then the ML model will give perfect predictions. 80% could indicate that the predictions are good but that independent and random variables might make 100% impossible to reach. Or maybe a higher value is possible, and the data is either missing important features or contains inaccurate measures.
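R squared compares the squared prediction errors with the total variation of the actual values around their mean: R squared = 1 - SS_res / SS_tot. That is why 100% corresponds to perfect predictions. The following Python sketch shows the formula on a handful of made-up heights. The function name r_squared and the sample values are hypothetical, and the code illustrates the standard definition rather than describing how Power BI computes the value shown in its training report.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_actual = sum(actual) / len(actual)
    # Total variation of the actual values around their mean
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when every actual value is identical")
    # Variation the model leaves unexplained (squared prediction errors)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical strike heights in feet above the ground
heights = [0, 100, 200, 300]
print(r_squared(heights, [0, 100, 200, 300]))    # 1.0: perfect predictions
print(r_squared(heights, [150, 150, 150, 150]))  # 0.0: no better than predicting the mean
print(r_squared(heights, [50, 100, 150, 300]))   # 0.9: 90% of the variation explained
```

Predictions that are worse than simply guessing the average height produce a negative value under this definition.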

In this use case, common sense dictates that explaining 100% of the variation would be impossible. You...

Summary

In this chapter, you reviewed each of the ML models that you have built. You decided to seek guidance on the next steps for the Predict Damage ML model from either a data science team or your stakeholders. For the Predict Size ML model, you found only slight predictive value and will need to seek guidance for your next course of action. The Predict Height ML model improved when you added new filter criteria and whittled down the feature selection, and the results are promising. At this point, you must either work with a data science team or circle back with your stakeholders for guidance on future plans for the model.

In Chapter 11, you will bring in newly added data from the FAA Wildlife Strike database and run it through your Predict Damage ML model to test the results. In doing so, you will learn how to score new data with your ML model whenever data refreshes in Power BI. You will also explore opportunities to find new value by adding Microsoft OpenAI capabilities to...



Greg Beaumont is a data architect at Microsoft, where he enjoys identifying and solving complex problems backed by his experience in data architecture and a passion for innovation. Focusing on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. He strives to be a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.
