Extending Power BI with Python and R - Second Edition


Product type Book
Published in Mar 2024
Publisher Packt
ISBN-13 9781837639533
Pages 814 pages
Edition 2nd Edition
Author: Luca Zavarella

Table of Contents (27 chapters)

  • Preface
  • Where and How to Use R and Python Scripts in Power BI
  • Configuring R with Power BI
  • Configuring Python with Power BI
  • Solving Common Issues When Using Python and R in Power BI
  • Importing Unhandled Data Objects
  • Using Regular Expressions in Power BI
  • Anonymizing and Pseudonymizing Your Data in Power BI
  • Logging Data from Power BI to External Sources
  • Loading Large Datasets Beyond the Available RAM in Power BI
  • Boosting Data Loading Speed in Power BI with Parquet Format
  • Calling External APIs to Enrich Your Data
  • Calculating Columns Using Complex Algorithms: Distances
  • Calculating Columns Using Complex Algorithms: Fuzzy Matching
  • Calculating Columns Using Complex Algorithms: Optimization Problems
  • Adding Statistical Insights: Associations
  • Adding Statistical Insights: Outliers and Missing Values
  • Using Machine Learning without Premium or Embedded Capacity
  • Using SQL Server External Languages for Advanced Analytics and ML Integration in Power BI
  • Exploratory Data Analysis
  • Using the Grammar of Graphics in Python with plotnine
  • Advanced Visualizations
  • Interactive R Custom Visuals
  • Other Books You May Enjoy
  • Index
  • Appendix 1: Answers
  • Appendix 2: Glossary

Technical requirements

This chapter requires a working internet connection and Power BI Desktop installed on your machine (we used version 2.112.603.0 64-bit, December 2022). RStudio was at version 2022.12.0 Build 353. You must also have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

Importing RDS files in R

In this section, you will develop mainly R code. We will provide you with various examples to demonstrate the concepts. First, we will give you an overview of what we are going to do, outlining the steps involved. Then, we will proceed to walk you through the process, explaining each step in detail. If you have little experience with R, you should familiarize yourself with the data structures that R provides by starting with this quickstart: http://bit.ly/r-data-struct-quickstart. Take a look at the References section for more in-depth information.

A brief introduction to Tidyverse

A data scientist using R as an analytical language for data analysis and data science must know the set of packages that goes by the name of Tidyverse (https://www.tidyverse.org). It provides everything needed for data wrangling and data visualization, giving the analyst a consistent approach to the entire ecosystem of packages it provides. In this way, it tries to heal the initial...

Importing PKL files in Python

Let's give you an overview of what you're going to implement using the Python code on GitHub. If you are not familiar with Python, you should familiarize yourself with the basic structures through this tutorial: http://bit.ly/py-data-struct-quickstart. For a more detailed study of how to implement algorithms and data structures in Python, we suggest this free e-book: http://bit.ly/algo-py-ebook.
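As a quick orientation before the chapter's own code, here is a minimal sketch of what serializing a Python object to a PKL file and reading it back looks like with the standard library's pickle module. The object, its field names, and the file location are hypothetical placeholders, not the examples used in the book's GitHub repository.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical object standing in for a result that was expensive to compute.
model_summary = {
    "name": "demo_model",
    "coefficients": [0.5, -1.2, 3.0],
    "r_squared": 0.87,
}

# Hypothetical file location; any writable path works.
pkl_path = Path(tempfile.gettempdir()) / "model_summary.pkl"

# Serialize the object to a PKL file (binary mode is required).
with open(pkl_path, "wb") as f:
    pickle.dump(model_summary, f)

# Deserialize it again, as a later script would do.
with open(pkl_path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_summary)
```

Only unpickle files that come from a source you trust, because loading a pickle can execute arbitrary code.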

A very short introduction to the PyData world

The PyData world is made up of users and developers who are passionate about data analytics and love to use open source data tools. The PyData community also loves to share best practices, new approaches, and emerging technologies for managing, processing, analyzing, and visualizing data. The most important and popular packages used by the Python data management community are as follows:

  • NumPy: This is the main library for scientific computing in Python. It provides a high-performance multidimensional array...

Summary

In this chapter, you got to learn about the Tidyverse approach to R development and how to serialize R objects to files. After that, you learned how to use these serialized files, both in Power Query Editor and in R visuals.

You then approached the same issues using Python. Specifically, you learned which packages are most used by the PyData community, learned how to serialize Python objects to files, and how to use them in Power BI, both in Power Query Editor and in Python visuals.

In the next chapter, you'll have a chance to learn how powerful regular expressions are and what benefits they can bring to your Power BI reports.

References

For additional reading, check out the following books and articles:

Test your knowledge

  • Q01. What could be the reason why you may want to import serialized files into R (.rds) or Python (.pkl)?
  • Q02. Is there a specific format of an R object that needs to be serialized so that it can then be deserialized in Power BI?
  • Q03. Why use an alternate method to inject a serialized object from a Python or R script step in Power Query into a Python or R script visual when it is possible to deserialize the object directly in the visual?
  • Q04. Can you briefly summarize the alternative method for injecting a serialized object from Power Query into a script visual?
  • Q05. Why is it important to provide a relationship between the object name table (used in the slicer) and the table containing the byte string representation of objects and their names?

Answers

  • A01. It may happen that a team of Data Scientists makes use of a lot of computational resources for a long time in order to generate an object using Python or R. For the purpose of being able to reuse the result...

Summary

In this chapter, you were introduced to the basics of how to use regexes. Using the bare minimum, you were able to effectively validate strings representing email addresses and dates in Power BI, using both Python and R.
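To make the validation idea concrete, here is a minimal Python sketch using the standard re module. The two patterns and the helper names are illustrative assumptions, deliberately simple, and not the expressions developed in the chapter. The date check also assumes an ISO-style YYYY-MM-DD layout and adds a calendar check, since a regex alone would accept a date such as 2023-02-30.

```python
import re
from datetime import datetime

# Simplified, illustrative patterns (assumptions, not the chapter's own regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_email(text: str) -> bool:
    """Return True if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(text) is not None


def is_valid_date(text: str) -> bool:
    """Return True if the string has the YYYY-MM-DD shape and is a real date."""
    if DATE_RE.fullmatch(text) is None:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


for value in ["jane.doe@example.com", "jane.doe@example"]:
    print(value, is_valid_email(value))

for value in ["2024-02-29", "2023-02-29", "29/02/2024"]:
    print(value, is_valid_date(value))
```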

You also learned how to extract information from semi-structured log files using regexes and how to import the extracted information into Power BI in a structured way.
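As a rough illustration of that kind of extraction, the sketch below uses named capture groups to turn matching log lines into a list of dictionaries. The log layout, the sample lines, and the field names are hypothetical; the log files handled in the chapter may look quite different.

```python
import re

# Hypothetical log layout: "<date> <time> [LEVEL] message".
LOG_RE = re.compile(
    r"(?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) "
    r"\[(?P<level>[A-Z]+)\] "
    r"(?P<message>.*)"
)

# Made-up sample content; the middle line is intentionally unstructured.
sample_log = """\
2024-03-01 10:15:00 [INFO] Service started
this line does not match the pattern
2024-03-01 10:15:07 [ERROR] Connection refused: db01
"""

# Keep only the lines that match, one dictionary per line.
records = [
    match.groupdict()
    for line in sample_log.splitlines()
    if (match := LOG_RE.fullmatch(line))
]

for record in records:
    print(record)
```

A list of dictionaries like this can be passed to pandas.DataFrame to get a tabular object, which is the kind of structure a Python script step in Power BI works with.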

Finally, you learned how to use regex in Python and R to extract information from seemingly unprocessable free text thanks to the real-world case of notes associated with sales orders.

In the next chapter, you’ll learn how to use some de-identification techniques in Power BI to anonymize or pseudonymize datasets that show sensitive data about individuals in plain text before they are imported into Power BI.

References

For additional reading, please refer to the following books and articles:

Test your knowledge

  1. What are the main purposes of using regexes?
  2. What are the main Python packages that implement regex functionality?
  3. What are the main R packages that implement regex functionality?
  4. Can you briefly summarize the method used to extract useful information from a free-text note?

Learn more on Discord

To join the Discord community for this book, where you can share feedback, ask the author questions, and learn about new releases, follow the link below:

https://discord.gg/MKww5g45EB
