You're reading from Extending Power BI with Python and R - Second Edition

Product type: Book
Published in: Mar 2024
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781837639533
Edition: 2nd Edition
Author (1)
Luca Zavarella

Luca Zavarella has a rich background as an Azure Data Scientist Associate and Microsoft MVP, with a Computer Engineering degree from the University of L'Aquila. His decade-plus experience spans the Microsoft Data Platform, starting as a T-SQL developer on SQL Server 2000 and 2005, then mastering the full suite of Microsoft Business Intelligence tools (SSIS, SSAS, SSRS), and advancing into data warehousing. Recently, his focus has shifted to advanced analytics, data science, and AI, contributing to the community as a speaker and blogger, especially on Medium. Currently, he leads the Data & AI division at iCubed, and he also holds an honors degree in classical piano from the "Alfredo Casella" Conservatory in L'Aquila.

R script visuals limitations

R script visuals have some important limitations regarding the data they can handle, both as input and output:

  • An R script visual can handle a data frame with only 150,000 rows. If there are more than 150,000 rows, only the first 150,000 rows are used and a relevant message is displayed on the image.
  • R script visuals have an output size limit of 2 MB.

You must also be careful not to exceed the 5-minute runtime limit for an R script visual in order to avoid a time-out error. Moreover, to avoid performance problems, the resolution of R script visual plots is fixed at 72 DPI.

As you can imagine, some limitations of R script visuals are different depending on whether you run the visual on Power BI Desktop or the Power BI service.

If you think you need to develop reports intended only for Power BI Desktop, without the need to publish them on the service, you can do any of the following:

  • Use any kind of package (CRAN, GitHub...

Summary

In this chapter, you learned about the most popular free R engines in the community. In particular, you learned about the performance advantages introduced by Microsoft in its distribution of R, even if this distribution will be retired in the near future. You also learned how to enhance the standard CRAN R with the oneMKL libraries for multi-threaded calculations.

Taking note of the unique features of Power BI Desktop and the Power BI service, you have learned how to properly choose the engines and how to install them.

You have also learned about the most popular IDE in the R community and how to install it.

In addition, you were introduced to all of the best practices for properly configuring both Power BI Desktop and the Power BI service with R, whether in a development or enterprise environment.

Finally, you learned about some of the limitations of using R with Power BI, knowledge of which is critical to avoid making mistakes in developing and deploying reports.

In the next chapter...

Test your knowledge

Q01. What are the most popular R engines to date and which of them will be phased out?
Q02. What is the most obvious advantage of Microsoft's R distributions?
Q03. Is it possible to introduce the benefits in question 2 for CRAN R as well? If so, can this be done directly from the software installer?
Q04. You decide to install the latest available version of CRAN R, and through this, you develop some transformation steps in Power Query. What do you need to be able to publish your report to Power BI service and allow it to be refreshed regularly?
Q05. Suppose you have installed only the R engine mentioned in Question 4 and you also add a plot made with an R script visual to the same report mentioned in Question 4. What are the problems you’ll encounter on Power BI Desktop? What are the problems you’ll encounter when you publish the report to the Power BI service?
Q06. Is it possible to refresh datasets of a published report using R scripts without having...

Creating an environment for data transformations using pip

Unlike what you have seen with R, where two separate engines with different versions were installed, in the case of Python there is one single installation and only the environments vary.

Here, we will create an environment dedicated to data transformations, containing one of the Python versions made available by Miniconda along with a small number of packages essential for the initial transformations. In general, it is best to avoid installing the latest versions of Python (especially if they were only released a few months ago), because the major packages you rely on often take some time to be updated to support them.
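As a minimal sketch of the advice above, the following snippet applies one possible policy: from a list of candidate versions, pick the newest patch release of the second-newest minor series, then build (without running) a `conda create` command for it. The version list, the environment name `pbi_powerquery_env`, the helper names, and the "one minor series behind" policy are all illustrative assumptions, not values taken from this chapter.

```python
def parse_version(text):
    """Turn '3.10.8' into (3, 10, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_conservative_version(available):
    """Return the newest patch release of the second-newest minor series.

    Falls back to the newest version when only one minor series exists.
    """
    versions = sorted(parse_version(v) for v in available)
    minors = sorted({v[:2] for v in versions})
    target_minor = minors[-2] if len(minors) > 1 else minors[-1]
    best = max(v for v in versions if v[:2] == target_minor)
    return ".".join(str(n) for n in best)


def conda_create_command(env_name, python_version):
    """Build the argument list for `conda create`; nothing is executed."""
    return ["conda", "create", "--name", env_name, f"python={python_version}"]


# Hypothetical list of versions, for illustration only.
candidates = ["3.9.16", "3.10.8", "3.10.9", "3.11.0"]
chosen = pick_conservative_version(candidates)
command = conda_create_command("pbi_powerquery_env", chosen)
print(" ".join(command))
```

Building the command as a list rather than running it keeps the sketch free of side effects; you would type the printed line into Anaconda Prompt yourself.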

First of all, you have to find the most recent version of Python present in the distribution you just installed:

  1. Open Anaconda Prompt from the Start menu as shown previously.
  2. If the prompt has small fonts, just right-click...

Creating an optimized environment for data transformations using conda

As you have already seen in Chapter 2, Configuring R with Power BI, the specific operations of numerical linear algebra are handled through the specialized libraries BLAS and LAPACK. These libraries now define a standard interface for those types of operations. But depending on which implementation of them you choose, there can be significant differences in performance.
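In a Python environment, one way to see which BLAS/LAPACK implementation is in play is to inspect how NumPy was built. The sketch below assumes NumPy may or may not be installed, captures the text printed by `numpy.show_config()`, and searches it for two implementation names used purely as example search terms. The helper names are hypothetical, and the exact layout of the report varies between NumPy versions.

```python
import contextlib
import io


def numpy_build_report():
    """Return NumPy's build configuration as text, or None if NumPy is absent."""
    try:
        import numpy as np
    except ImportError:
        return None
    buffer = io.StringIO()
    # show_config() prints the BLAS/LAPACK details NumPy was built with.
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


def mentions_library(report, name):
    """Case-insensitive check for a library name (e.g. 'mkl') in the report."""
    return report is not None and name.lower() in report.lower()


report = numpy_build_report()
for library in ("mkl", "openblas"):
    print(library, "mentioned:", mentions_library(report, library))
```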

There are many implementations made under different licenses. For example, the standard BLAS and LAPACK libraries that you often find pre-installed on your system (such as those in CRAN R), do not support multi-threading. The Automatically Tuned Linear Algebra Software (ATLAS) implementation achieves good performance and uses the BSD license. Then there is OpenBLAS, another open-source implementation with very good performance due to multi-threading. Finally, we have the Intel oneAPI Math Kernel Library (MKL), optimized for Intel multi-core...

Creating an environment for Python script visuals on the Power BI service

As already mentioned, Python script visuals published on the Power BI service run on a pre-installed Python engine in the cloud, the version of which may change with new releases of the Power BI service itself. Should you need to share a report containing a Python visual with colleagues, you need to be sure that your Python code works correctly on the pre-installed engine.

TIP

We strongly recommend that you also create on your machine an environment with the same version of Python that is used for Python visuals by the Power BI service.

Keep in mind that these limitations would not be there if your reports using Python visuals were not to be shared and you only used them through Power BI Desktop. In this case, it is the engine on your machine that is used for the visuals.
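To make the idea of a matching local environment concrete, here is a small sketch that compares the running interpreter and its installed packages against a set of required versions. The function name and the placeholder requirements are assumptions for illustration; the actual versions must come from Microsoft's documentation for the service and are deliberately not filled in here.

```python
import sys
from importlib import metadata


def environment_mismatches(required_python, required_packages):
    """List differences between this environment and the required versions.

    required_python: (major, minor) tuple.
    required_packages: dict mapping package name to exact version string.
    """
    problems = []
    local_python = tuple(sys.version_info[:2])
    if local_python != tuple(required_python):
        problems.append(f"python: have {local_python}, need {tuple(required_python)}")
    for name, wanted in required_packages.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed, need {wanted}")
            continue
        if installed != wanted:
            problems.append(f"{name}: have {installed}, need {wanted}")
    return problems


# Placeholder requirements: replace with the versions Microsoft documents.
required_python = tuple(sys.version_info[:2])  # stand-in so the demo passes
required_packages = {}                         # e.g. {"some-package": "1.2.3"}
print(environment_mismatches(required_python, required_packages) or "environment matches")
```

Running a check like this inside the environment you point Power BI Desktop at gives an early warning before a report is published.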

To create the new environment, you must check which versions of both Python and the allowed packages are supported...

What to do when the Power BI service upgrades the Python engine

As we did in Chapter 2, Configuring R with Power BI, let’s assume that you have already developed and published reports containing Python visuals using the new environment you created earlier. Suppose that Microsoft decides to upgrade the Python version supported by the Power BI service, and consequently to upgrade the versions of the currently supported packages as well. As you may have already guessed, these updates can cause your code to fail (although this is rare, since backward compatibility is very often guaranteed).
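When such an upgrade is announced, a quick way to see what you need to retest is to compare the old and new version pins. The following sketch does this for two plain dictionaries. The package names and versions are invented for illustration and do not reflect any real Power BI service release.

```python
def requirement_changes(old, new):
    """Compare two {package: version} mappings and report what changed.

    Returns {package: (old_version, new_version)}; None marks a package
    that is absent on that side.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical version pins, invented for illustration.
old_pins = {"pkg_a": "1.0", "pkg_b": "2.1"}
new_pins = {"pkg_a": "1.0", "pkg_b": "2.4", "pkg_c": "0.5"}
for name, (before, after) in requirement_changes(old_pins, new_pins).items():
    print(f"{name}: {before} -> {after}")
```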

TIP

In such circumstances, it is often more convenient to create a new environment on the fly, aligned to the updated requirements from Microsoft, through the Python script you have already used previously. Next, you’ll need to test reports containing Python visuals that were already published on the service on Power BI Desktop, making sure that they reference...

Installing an IDE for Python development

In Chapter 2, Configuring R with Power BI, you installed RStudio to conveniently develop your own R scripts. Did you know that, starting with version 1.4, you can write and run Python code directly in RStudio, making use of advanced tools for viewing instantiated Python objects?

Let’s see how to configure your RStudio installation to also run Python code.

Configuring Python with RStudio

In order to allow RStudio to communicate with the Python world, you need to install a package called reticulate, which contains a comprehensive set of tools for interoperability between Python and R thanks to embedded Python sessions within R sessions. After that, it’s a breeze to configure Python within RStudio. Let’s see how to do it:

  1. Open RStudio and make sure the referenced engine is the latest one, in our case CRAN 4.2.2. As seen in Chapter 2, Configuring R with Power BI, you can set up your R engine in RStudio...

Configuring Power BI Desktop to work with Python

Since you have everything you need installed, you can now configure Power BI Desktop to interact with Python engines and IDEs. This is really a very simple task:

  1. In Power BI Desktop, go to the File menu, click on the Options and settings tab, and then click on Options.
  2. In the Options window, click on the Python scripting link on the left. The contents of the panel on the right will update, giving you the option to select the Python environment to reference and the Python IDE to use for Python visuals.

    In order to select a specific environment, you need to choose Other and then click Browse and supply a reference to your environment folder:

    Figure 3.36: Configuring your Python environment and IDE in Power BI

    Usually, you can find the default environments folder in C:\ProgramData\Miniconda3\envs\ (for all-user installations) or C:\Users\<username>\miniconda3\envs\ (for just your user installation...
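If you are unsure which environments exist on your machine, a short script can list the subfolders of the two default locations mentioned above. This is only a sketch: the helper name is hypothetical, the per-user path is derived from `Path.home()`, and a custom Miniconda installation may keep its environments elsewhere.

```python
from pathlib import Path


def list_environments(envs_root):
    """Return the sorted names of subfolders in an `envs` directory.

    A missing directory yields an empty list instead of an error.
    """
    root = Path(envs_root)
    if not root.is_dir():
        return []
    return sorted(child.name for child in root.iterdir() if child.is_dir())


# The two default locations mentioned above (Windows paths).
candidate_roots = [
    Path(r"C:\ProgramData\Miniconda3\envs"),
    Path.home() / "miniconda3" / "envs",
]
for root in candidate_roots:
    print(root, "->", list_environments(root))
```

Each name printed corresponds to a folder you can supply in the Browse dialog.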

Limitations of Python visuals

Python visuals have some important limitations regarding the data they can handle, both input and output:

  • A Python visual can handle a dataframe of up to 150,000 rows only. If there are more than 150,000 rows, only the first 150,000 rows are used.
  • Python visuals have an output size limit of 2 MB.

You must also be careful not to exceed the 5-minute runtime calculation for a Python visual in order to avoid a time-out error. Moreover, in order not to run into performance problems, the resolution of the Python visual plots is fixed at 72 DPI.
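The limits above can be written down as a few constants and checks. The sketch below is illustrative only: the function names are hypothetical, and treating 2 MB as 2 × 1024 × 1024 bytes is an assumption, since the text does not say whether a binary or decimal megabyte is meant.

```python
MAX_ROWS = 150_000                  # rows passed to the visual
MAX_OUTPUT_BYTES = 2 * 1024 * 1024  # 2 MB output limit (binary MB assumed)
MAX_RUNTIME_SECONDS = 5 * 60        # 5-minute time-out
PLOT_DPI = 72                       # fixed plot resolution


def rows_used(total_rows):
    """Number of rows the visual actually receives (the first 150,000)."""
    return min(total_rows, MAX_ROWS)


def limit_warnings(total_rows, output_bytes, runtime_seconds):
    """Return a human-readable warning for each limit that is exceeded."""
    warnings = []
    if total_rows > MAX_ROWS:
        warnings.append(f"only the first {MAX_ROWS} of {total_rows} rows are used")
    if output_bytes > MAX_OUTPUT_BYTES:
        warnings.append(f"output of {output_bytes} bytes exceeds the 2 MB limit")
    if runtime_seconds > MAX_RUNTIME_SECONDS:
        warnings.append(f"runtime of {runtime_seconds} s exceeds the 5-minute time-out")
    return warnings


def pixels(width_inches, height_inches):
    """Pixel size of a plot rendered at the fixed 72 DPI."""
    return round(width_inches * PLOT_DPI), round(height_inches * PLOT_DPI)


print(rows_used(200_000))
print(limit_warnings(200_000, 1_000_000, 30))
print(pixels(8, 6))
```

The last helper shows why the fixed resolution matters: an 8 × 6 inch figure is rendered at only 576 × 432 pixels.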

As you can imagine, some limitations of Python script visuals are different depending on whether you run the visual on Power BI Desktop or the Power BI service.

If you think you need to develop reports intended only for Power BI Desktop, without the need to publish them on the service, you can do any of the following:

  • Install any kind of package (Conda, PyPI, or custom...

Summary

In this chapter, you learned about the most popular free Python distributions in the community and the best practices for their use.

Using the unique features of Power BI Desktop and the Power BI service, you have learned how to properly create specific Python environments.

You also learned that the most popular IDE in the R community (RStudio) can also run Python code. In addition, you have installed and configured VS Code, which is to date one of the most widely used advanced editors for Python.

You were also introduced to all of the best practices for properly configuring both Power BI Desktop and the Power BI service with Python, whether in a development or enterprise environment.

Finally, you’ve learned some of the limitations of using Python with Power BI, knowledge of which is critical to avoid making mistakes when developing and deploying reports.

In the next chapter, we will show you the most common problems you might run into when using...

Test your knowledge

Keep in mind that many of the questions asked in Chapter 2 about R also apply to Python. It is therefore recommended that you answer those as well, if you have not already done so.

  1. Which Python distributions are most widely used by data scientists?
  2. What are the most commonly used tools for installing packages in Python? What are their most important differences?
  3. How many instances of the engine should be installed to ensure reproducibility of results for Python scripts in Power Query and Python script visuals?
  4. A colleague of yours has prepared a virtual environment dedicated to Python script visuals, containing the latest Python release allowed by Miniconda. Did he follow the best practices stated in this chapter?
  5. Suppose you created a report with a Python script visual that highlights insights from a dataset created by uploading data from an HTML table available on the web. What do you need to allow a refresh of the data once...