
You're reading from Extending Power BI with Python and R - Second Edition

Product type: Book
Published in: Mar 2024
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781837639533
Edition: 2nd Edition
Author (1)
Luca Zavarella

Luca Zavarella has a rich background as an Azure Data Scientist Associate and Microsoft MVP, with a Computer Engineering degree from the University of L'Aquila. His decade-plus experience spans the Microsoft Data Platform, starting as a T-SQL developer on SQL Server 2000 and 2005, then mastering the full suite of Microsoft Business Intelligence tools (SSIS, SSAS, SSRS), and advancing into data warehousing. Recently, his focus has shifted to advanced analytics, data science, and AI, contributing to the community as a speaker and blogger, especially on Medium. Currently, he leads the Data & AI division at iCubed, and he also holds an honors degree in classical piano from the "Alfredo Casella" Conservatory in L'Aquila.

Technical requirements

This chapter requires a working internet connection and Power BI Desktop already installed on your machine (we used version 2.112.603.0 64-bit, December 2022). RStudio was updated to version 2022.12.0 Build 353. You must have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

Importing RDS files in R

In this section, you will develop mainly R code. We will provide you with various examples to demonstrate the concepts. First, we will give you an overview of what we are going to do, outlining the steps involved. Then, we will proceed to walk you through the process, explaining each step in detail. If you have little experience with R, you should familiarize yourself with the data structures that R provides by starting with this quickstart: http://bit.ly/r-data-struct-quickstart. Take a look at the References section for more in-depth information.

A brief introduction to Tidyverse

A data scientist who uses R for data analysis must know the set of packages that goes by the name of Tidyverse (https://www.tidyverse.org). It provides everything needed for data wrangling and data visualization, giving the analyst a consistent approach across its entire ecosystem of packages. In this way, it tries to heal the initial...

Importing PKL files in Python

Let's give you an overview of what you're going to implement using the Python code on GitHub. If you are not familiar with Python, you should familiarize yourself with the basic structures through this tutorial: http://bit.ly/py-data-struct-quickstart. For a more detailed study of how to implement algorithms and data structures in Python, we suggest this free e-book: http://bit.ly/algo-py-ebook.
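
Before going further, it may help to see what a PKL file is in practice. The following is a minimal sketch of serializing a Python object to a PKL file and reading it back, using only the standard library `pickle` module. It is not the code from the book's GitHub repository: the `result` object, the `demo_result.pkl` filename, and the temporary-directory location are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical object standing in for an expensive-to-compute result
# (for example, a trained model or a preprocessed dataset).
result = {"model_name": "demo", "coefficients": [0.5, 1.25, -3.0]}

# Illustrative location in the system temp directory (an assumption);
# a real project would use its own folder.
pkl_path = Path(tempfile.gettempdir()) / "demo_result.pkl"

# Serialize the object to a PKL file (binary write mode is required).
with open(pkl_path, "wb") as f:
    pickle.dump(result, f)

# Deserialize the object from the PKL file (binary read mode).
with open(pkl_path, "rb") as f:
    restored = pickle.load(f)

# The restored object should be equal to the original one.
print(restored == result)
```

Keep in mind that unpickling can execute arbitrary code, so you should only load PKL files that come from a source you trust.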

A very short introduction to the PyData world

The PyData world is made up of users and developers who are passionate about data analytics and love to use open source data tools. The PyData community also loves to share best practices, new approaches, and emerging technologies for managing, processing, analyzing, and visualizing data. The most important and popular packages used by the Python data management community are as follows:

  • NumPy: This is the main library for scientific computing in Python. It provides a high-performance multidimensional array...

Summary

In this chapter, you learned about the Tidyverse approach to R development and how to serialize R objects to files. After that, you learned how to use these serialized files, both in Power Query Editor and in R visuals.

You then approached the same issues using Python. Specifically, you learned which packages are most used by the PyData community, how to serialize Python objects to files, and how to use them in Power BI, both in Power Query Editor and in Python visuals.

In the next chapter, you'll have a chance to learn how powerful regular expressions are and what benefits they can bring to your Power BI reports.

References

For additional reading, check out the following books and articles:

Test your knowledge

  • Q01. What could be the reason why you may want to import serialized files into R (.rds) or Python (.pkl)?
  • Q02. Is there a specific format of an R object that needs to be serialized so that it can then be deserialized in Power BI?
  • Q03. Why use an alternate method to inject a serialized object from a Python or R script step in Power Query into a Python or R script visual when it is possible to deserialize the object directly in the visual?
  • Q04. Can you briefly summarize the alternative method for injecting a serialized object from Power Query into a script visual?
  • Q05. Why is it important to provide a relationship between the object name table (used in the slicer) and the table containing the byte string representation of objects and their names?

Answers

  • A01. It may happen that a team of Data Scientists makes use of a lot of computational resources for a long time in order to generate an object using Python or R. For the purpose of being able to reuse the result...

Summary

In this chapter, you were introduced to the basics of how to use regexes. Using the bare minimum, you were able to effectively validate strings representing email addresses and dates in Power BI, using both Python and R.
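
As a reminder of what this kind of validation looks like, here is a minimal sketch in Python using the standard library `re` module. The two patterns and the helper names `is_valid_email` and `is_valid_date` are assumptions made for illustration, not the ones used in the chapter. They are deliberately simple: the date pattern only checks the `dd/mm/yyyy` shape, not whether the date exists on the calendar.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not the chapter's own).
# Email: a local part, an @ sign, a domain, and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Date: dd/mm/yyyy, with day 01-31 and month 01-12 (no calendar check).
DATE_RE = re.compile(r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/\d{4}")


def is_valid_email(text: str) -> bool:
    """Return True if the whole string matches the simple email pattern."""
    return EMAIL_RE.fullmatch(text) is not None


def is_valid_date(text: str) -> bool:
    """Return True if the whole string matches the dd/mm/yyyy pattern."""
    return DATE_RE.fullmatch(text) is not None


for candidate in ["user@example.com", "not-an-email"]:
    print(candidate, is_valid_email(candidate))

for candidate in ["31/12/2022", "2022-12-31"]:
    print(candidate, is_valid_date(candidate))
```

Using `fullmatch` rather than `search` ensures that the entire string has to conform to the pattern, which is usually what you want when validating a column value.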

You also learned how to extract information from semi-structured log files using regexes and how to import the extracted information into Power BI in a structured way.

Finally, you learned how to use regex in Python and R to extract information from seemingly unprocessable free text thanks to the real-world case of notes associated with sales orders.

In the next chapter, you’ll learn how to use some de-identification techniques in Power BI to anonymize or pseudonymize datasets that show sensitive data about individuals in plain text before they are imported into Power BI.

References

For additional reading, please refer to the following books and articles:

Test your knowledge

  1. What are the main purposes of using regexes?
  2. What are the main Python packages that implement regex functionality?
  3. What are the main R packages that implement regex functionality?
  4. Can you briefly summarize the method used to extract useful information from a free-text note?

Learn more on Discord

To join the Discord community for this book, where you can share feedback, ask the author questions, and learn about new releases, follow the link below:

https://discord.gg/MKww5g45EB

