Extending Power BI with Python and R - Second Edition
By Luca Zavarella. Published March 2024 by Packt. ISBN-13: 9781837639533. 814 pages.


Using Machine Learning without Premium or Embedded Capacity

Advances in computing power have made data analysis far more capable and efficient. In particular, with the advent of machine learning (ML) models, you can now gain valuable insights and enrich your analysis with little effort. Power BI includes several AI capabilities that integrate seamlessly with ML models, allowing you to act on these insights immediately. Tools built into Power BI Desktop and Power BI dataflows let you leverage ML models created by your data scientists on Azure Machine Learning. In addition, Power BI allows you to harness the power of models trained and deployed through Azure AutoML, and you can easily access the services exposed by Cognitive Services through an easy-to-use graphical interface.

It is important to note that these tools (known as...

Technical requirements

This chapter requires you to have a working internet connection and Power BI Desktop already installed on your machine (version: 2.119.986.0 64-bit, July 2023). You must have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

Interacting with ML in Power BI with dataflows

You can access advanced AI features directly in Power BI Desktop, or through Power BI dataflows, which are easy-to-use self-service tools for transforming big data into insights to be shown in dashboards. As mentioned in the introduction, however, both modes require specific licenses.

These features are accessible from Power BI Desktop, in the Power Query Home ribbon (click Transform data to access Power Query):


Figure 17.1: AI insights in Power BI Desktop

By default, these icons are disabled in Power BI Desktop unless you are signed in with a user belonging to a tenant with one of the following Premium capacity SKUs: EM2, A2, or P1 and above. You can still use these features with a PPU license, but not through Power BI Desktop (unless you have set up Azure AI Services for text analytics); instead, you must create a dataflow directly in the Power BI portal:


Figure 17...

Using AutoML solutions

Writing code from scratch to perform ML requires specific knowledge that a general analyst using Power BI often doesn’t have. That’s why we recommend using AutoML processes from here on out for analysts who don’t have a data science background. Does this mean that anyone can build an accurate ML model without knowing the theory behind the science, just by using AutoML algorithms? Absolutely not! It’s important to keep the following in mind:

IMPORTANT NOTE

An AutoML tool relieves the analyst of the repetitive tasks typical of an ML process (hyperparameter tuning, model selection, and so on). However, the tasks that require specific theoretical knowledge on the part of the analyst (for example, missing-value imputation, dataset-balancing strategies, feature selection, and feature engineering) are often left out of the automated steps. As a result, if the appropriate transformations known only to an expert are not applied to the dataset...

Embedding training code in Power Query

One of the easiest ways to train an ML model is to write the necessary code directly in Power Query, right after you import the dataset on which you want to build the model.

Training a model on a fairly large dataset typically takes a significant amount of time. Because the code is embedded in Power Query, it runs every time the data is refreshed, which can cause a non-negligible delay in getting the data online. Therefore, the following applies:

IMPORTANT NOTE

This solution is recommended if you are confident that the time required to complete the model training is acceptable.
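To make the refresh-delay concern concrete, here is a minimal sketch of training code embedded in a Power Query Python script. In Power BI, the imported table arrives as a pandas DataFrame named `dataset`; here it is stubbed so the snippet is self-contained. The cache path and the toy threshold "model" are illustrative stand-ins for a real training call, not the book's actual code:

```python
# Sketch of training code embedded in a Power Query Python script.
# Caching the trained model on disk means later refreshes skip retraining,
# which is one way to keep the refresh delay acceptable.
import os
import pickle
import tempfile

import pandas as pd

MODEL_PATH = os.path.join(tempfile.gettempdir(), "pbi_cached_model.pkl")  # hypothetical cache location

# Stand-in for the table Power Query would pass in as `dataset`
dataset = pd.DataFrame({
    "Fare": [7.2, 71.3, 8.0, 53.1],
    "Survived": [0, 1, 0, 1],
})

if os.path.exists(MODEL_PATH):
    # On later refreshes, reuse the cached model instead of retraining
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)
else:
    # Toy "training": the cutoff is the midpoint between the class means.
    # A real script would call an ML library here instead.
    class_means = dataset.groupby("Survived")["Fare"].mean()
    model = {"cutoff": float(class_means.mean())}
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(model, f)

# Score the current data; Power Query picks up `dataset` as the script output
dataset["Predicted"] = (dataset["Fare"] > model["cutoff"]).astype(int)
```

The if/else around the cache file is the key design point: the expensive branch runs only on the first refresh, so the per-refresh cost drops to deserialization plus scoring.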

Now let’s look at an example of how to write some training code with PyCaret.

Training and using ML models with PyCaret

Let’s take the Titanic disaster dataset to train an ML model. Specifically, we want to build a model that predicts whether a passenger survives (the Survived column) based on their attributes described...
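As a hedged sketch of what this flow typically looks like with PyCaret's classification module (the DataFrame contents and the model file name are placeholders; the imports sit inside the function so it can be read and defined even without PyCaret installed):

```python
def train_titanic_model(df):
    """Train, compare, and persist a classifier predicting the Survived column.
    Sketch of PyCaret's classification workflow; run it in an environment
    where pycaret is installed and df holds the Titanic data."""
    from pycaret.classification import compare_models, save_model, setup

    # setup() handles preprocessing and the train/test split
    setup(data=df, target="Survived", session_id=123)

    # compare_models() fits candidate algorithms and returns the best one
    best = compare_models()

    # save_model() serializes the whole pipeline (preprocessing + model)
    save_model(best, "titanic_model")
    return best
```

The appeal of this AutoML flow is that model selection and tuning happen inside `compare_models()`, while the caveats from the note above (imputation strategy, balancing, feature engineering) remain the analyst's responsibility.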

Using trained models in Power Query

As you saw in Chapter 5, Importing Unhandled Data Objects, you can pass objects that are the result of complex, time-consuming processing (including an ML model) in a serialized format specific to the language you are using. It is then very easy to deserialize the file and get the model ready to use in Power Query to predict the target variable for new observations. However, it is important to know the dependencies required by the scoring function (which takes the new observations as input and returns the predictions), because they are closely tied to how the model was trained.

For this reason, we recommend the following:

IMPORTANT NOTE

If you need to use a serialized ML model provided by a third party, make sure that whoever developed it also provides you with a working scoring function in order to avoid unnecessary headaches when predicting target values for unknown observations.
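As a minimal sketch of the pattern the note describes, the snippet below stands in a dict of logistic-regression-style coefficients for a real serialized model; the point is that the scoring function must travel with the pickle, because only it knows how the model's parameters map observations to predictions:

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Hypothetical "model": a dict of coefficients. A real third-party model
# would be an object from whatever library it was trained with.
model = {"intercept": -1.0, "coef": {"Age": -0.02, "Fare": 0.01}}

def score(model, observations):
    """Scoring function shipped alongside the model: it encodes how the
    model's parameters turn new observations into predictions."""
    z = model["intercept"] + sum(observations[c] * w for c, w in model["coef"].items())
    out = observations.copy()
    out["Survived_prob"] = 1.0 / (1.0 + np.exp(-z))
    return out

# The model's author serializes it once...
path = os.path.join(tempfile.gettempdir(), "third_party_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...and in Power Query you deserialize it and score the imported table
with open(path, "rb") as f:
    restored = pickle.load(f)

dataset = pd.DataFrame({"Age": [22, 38], "Fare": [7.25, 71.28]})
scored = score(restored, dataset)
```

If the provider gave you only the pickle and not `score()`, you would have to reverse-engineer the training setup to reproduce it, which is exactly the headache the note warns against.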

If you think...

Using trained models in script visuals

As you learned in Chapter 5, Importing Unhandled Data Objects, thanks to object serialization and its string representation, you can import any object into a Python or R visual as a DataFrame of strings. Once this DataFrame is available in the script visual, you can convert it back into the original object by reversing those transformations, that is, by deserializing it. Of course, since this works with any object, it also works with ML models that have already been trained outside of Power BI.

When the appropriately deserialized model is available in the script visual session, new observations can be immediately predicted using the scoring function described in the previous section.
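A minimal sketch of that round trip, assuming the Chapter 5 technique of storing the pickled model as base64 text chunks in a table (the chunk size and the dict "model" here are illustrative):

```python
import base64
import pickle

import pandas as pd

# Stand-in for a model trained outside Power BI
model = {"threshold": 0.5, "weights": [0.2, -0.1]}

# --- Outside Power BI: serialize to base64 text and split it into chunks,
# since text rows are easy to load into the data model
encoded = base64.b64encode(pickle.dumps(model)).decode("ascii")
chunk_size = 16  # illustrative; real limits depend on the data model
chunks = [encoded[i:i + chunk_size] for i in range(0, len(encoded), chunk_size)]
model_df = pd.DataFrame({"chunk_id": range(len(chunks)), "payload": chunks})

# --- Inside the Python script visual: the table arrives as a DataFrame of
# strings; reassemble the chunks in order and deserialize the object
reassembled = model_df.sort_values("chunk_id")["payload"].str.cat()
restored = pickle.loads(base64.b64decode(reassembled))
```

The `chunk_id` column matters: the visual receives rows in no guaranteed order, so sorting before concatenation is what makes the deserialization reliable.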

The first question you might ask is: what is the point of being able to score a dataset within a script visual if the data must already be available in the Power BI data model to be used in the visual? In fact, if the data of the observations...

Calling web services in Power Query

Another way to interact with ML models in Power Query is to call web services. As you may already know, a trained ML model can be used to score many observations in batch mode (the process described earlier). Alternatively, a model can be exposed as a web service so that it can be called via REST APIs. You learned how to work with external APIs in Chapter 11, Calling External APIs to Enrich Your Data.
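As a hedged sketch of such a call (the endpoint URL, key, and payload schema below are hypothetical; copy the real ones from your endpoint's Consume tab in Azure Machine Learning studio), the request you would send from Power Query typically looks like this:

```python
import json
import urllib.request

def build_scoring_request(url, api_key, rows):
    """Package observations as the JSON body of an authenticated POST to
    an online scoring endpoint (the exact schema is endpoint-specific)."""
    body = json.dumps({"input_data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_scoring_request(
    "https://my-endpoint.westeurope.inference.ml.azure.com/score",  # hypothetical URL
    "<endpoint-key>",
    [{"Age": 29, "Fare": 15.5, "Sex": "female"}],
)

# In Power Query you would actually send it and parse the predictions:
# predictions = json.loads(urllib.request.urlopen(req).read())
```

The actual `urlopen` call is left commented out because it requires a live endpoint; everything up to that point is just building the authenticated JSON request.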

IMPORTANT NOTE

Keep in mind that you can’t consume external services via REST API calls from a Python or R visual because internet access is blocked for security reasons. Therefore, you can only consume these services in Power Query.

As an example, in this section, you’ll see how to invoke predictions from a released endpoint via Azure Machine Learning, and how to use the services exposed by the Azure Text Analytics feature of Cognitive Services. You could use some M code in...

Summary

In this chapter, you learned how Power BI interacts with Microsoft AI services out of the box through the Power BI Desktop and dataflow features. You also learned that by using AutoML platforms, you can work around the licensing requirements (a PPU license or Premium capacity) that Power BI imposes for interfacing with Microsoft AI services. You used both an on-premises AutoML solution (PyCaret) and Azure AutoML in the cloud to solve a binary classification problem. You also used Azure AI Language to perform sentiment analysis directly using a Python SDK.

You’ve learned that AI enrichment is mostly done in Power Query (which provides access to the web using the data gateway), although you’ve seen a case where it can be convenient to use a ML model directly in a Python visual.

In the next chapter, you will see how to implement data exploration of your dataset in Power BI.

References

For additional reading, check out the following books and articles:

Test your knowledge

  1. What AutoML solution language is recommended for Power BI analysts to use?
  2. What precautions should you take when incorporating training code into Power Query?
  3. What is the benefit of FLAML’s AutoML feature in terms of model compatibility?
  4. What is the importance of having a working scoring function in Power Query when using a serialized ML model?
  5. How can you solve the delay problem caused by training the model directly in Power Query?
  6. Why is it recommended to train and manage ML models outside of Power BI?
  7. What is the purpose of being able to score a dataset within a script visual?
  8. How can you interact with third-party ML models in Power Query if you do not have the serialized model?
  9. Why can’t you consume external services via REST API calls from a Python or R visual in Power BI?
  10. What is Azure AI Language and what services does it provide?

Learn more on Discord

To join the Discord community...
