Loading Large Datasets Beyond the Available RAM in Power BI

In the previous chapter, you learned how to read from and write to a CSV file using both Python and R. When it comes to reading a file, whether you use Power BI’s standard data import feature or the techniques shown in the previous chapter, the main limitation on the file size is the amount of RAM available on the machine where Power BI Desktop is installed.

In a data enrichment phase, it may be necessary to extract information needed for ongoing analysis from very large files (terabytes in size). In these cases, it is almost always necessary to implement big data solutions to be able to handle such large amounts of data. Very often, however, it is necessary to import files that are slightly larger than the available memory in order to extract aggregate information and then persist it in a small table for reuse during processing. In such cases, you don’t need to worry about sophisticated big data platforms...
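As a quick illustration of the idea (a sketch only, not necessarily the technique used in this chapter), an aggregate can be extracted from a file larger than the available RAM by processing it in chunks; the file and column names below are hypothetical:

import pandas as pd

# Read the large CSV in chunks and combine partial aggregates so that only
# a small summary table ever lives in memory.
totals = {}
for chunk in pd.read_csv("huge_sales.csv", chunksize=1_000_000):
    partial = chunk.groupby("product_id")["amount"].sum()
    for product_id, amount in partial.items():
        totals[product_id] = totals.get(product_id, 0) + amount

# Persist the small aggregated table for reuse during processing.
summary = pd.Series(totals, name="amount").rename_axis("product_id").reset_index()
summary.to_csv("sales_summary.csv", index=False)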

As you've learned in previous chapters, Power BI uses Power Query as its tool for extract, transform, and load (ETL) operations. Power Query is very powerful: it lets you extract data from a wide variety of data sources and transform it with user-friendly options before persisting it into the Power BI data model. However, it can only read information from the outside; its biggest limitation is its inability to write information outside of Power BI. By integrating analytical languages such as Python and R, you can persist information about Power Query loading and transformation processes to external files or systems (a minimal sketch of this pattern follows the topic list below). In this chapter, you will learn the following topics:

  • Logging to CSV files
  • Logging to Excel files
  • Logging to (Azure) SQL Server
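
The sketch below (an assumption-laden example, not the chapter's exact code) shows the general pattern: a "Run Python script" step in Power Query exposes the current query's rows as a pandas DataFrame named dataset, and any side effect performed in the script, such as writing a copy of the rows to an external file, happens outside the Power BI data model. The sample DataFrame only stands in for the one Power BI provides, and the log path is a placeholder.

import pandas as pd

# Inside Power BI, Power Query passes the query's rows to the script as a
# pandas DataFrame named 'dataset'; this sample stands in for it so the
# script can also run on its own.
dataset = pd.DataFrame({"order_id": [1, 2], "status": ["loaded", "loaded"]})

# Side effect: persist a timestamped copy of the rows outside Power BI.
dataset.assign(logged_at=pd.Timestamp.now()).to_csv(
    "power_query_log.csv",  # hypothetical path; use an absolute path in Power BI
    index=False,
)

# Any DataFrame left in scope is offered back to Power Query as a step output,
# so leaving 'dataset' unchanged keeps the original data flowing through the query.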

Technical requirements

This chapter requires a working internet connection and Power BI Desktop already installed on your machine (we used version 2.114.664.0 64-bit, February 2022). You must have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

Logging to CSV files

One of the most widely used formats for logging tabular, structured information to files is comma-separated values (CSV). Because a CSV file is still a flat text file, CSV is the most popular format for exchanging information between heterogeneous applications.

A CSV file represents a rectangular dataset (a matrix) containing numeric or string columns. Each row of the matrix is written as a list of values (one per column), separated by commas, and every row should have the same number of values. Sometimes other value delimiters are used, such as the tab (\t), colon (:), and semicolon (;) characters. The first row may contain the column headers. Rows are usually delimited by a line break, which is a CRLF (Carriage Return Line Feed) sequence (usually written as \r\n) on Windows, or simply an LF (\n) on Unix systems. Thus, an example of the contents of a CSV file might be as follows:

Figure 8.1 – Example of CSV file content

Note that spaces become part of...
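To make these format details concrete, here is a minimal Python sketch (the file path and columns are hypothetical) that appends log rows to a CSV file, chooses the delimiter explicitly, and writes the header row only when the file is created:

import os
import pandas as pd

log_path = "execution_log.csv"  # hypothetical log file

rows = pd.DataFrame({
    "step": ["load_sales"],
    "row_count": [125000],
    "logged_at": [pd.Timestamp.now()],
})

# Append to the log file; the comma is the column delimiter, and the header
# row is written only when the file does not exist yet, so repeated runs
# keep a single header.
rows.to_csv(
    log_path,
    mode="a",
    sep=",",  # use "\t" or ";" for tab- or semicolon-separated files
    index=False,
    header=not os.path.exists(log_path),
)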

Logging to Excel files

As you probably already know, Microsoft Excel is a spreadsheet program available in the Microsoft Office suite. It's one of the most widely used tools in the world for storing and organizing data in a spreadsheet format. It is very popular in organizations because it allows business data to be shared between departments and lets individual users perform their own data analysis directly and quickly, without the help of the IT department.

Early versions of Excel stored information in files in the Excel Sheet (XLS) format. This is a proprietary Microsoft format, based on the Binary Interchange File Format (BIFF), and it was the default format from v7.0 (Excel 95) through v11.0 (Excel 2003). From version 8.0 to 11.0, the XLS format could handle 64K (2^16 = 65,536) rows and 256 (2^8) columns. Starting with version 12.0 (Excel 2007), the default format changed to Excel Open XML Spreadsheet (XLSX). This is based on the Office Open XML format, and it...
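As an illustration only (not necessarily the approach used later in this section), pandas can write an XLSX log sheet through the openpyxl engine; the workbook path, sheet name, and columns below are placeholders:

import pandas as pd

log = pd.DataFrame({
    "step": ["load_sales", "load_customers"],
    "row_count": [125000, 8421],
    "logged_at": [pd.Timestamp.now()] * 2,
})

# Write the log to an XLSX workbook (requires the openpyxl package).
log.to_excel("power_bi_log.xlsx", sheet_name="log", index=False)

# To add or refresh the sheet in an existing workbook instead of overwriting
# the whole file, pandas' ExcelWriter can open it in append mode (pandas 1.3+):
# with pd.ExcelWriter("power_bi_log.xlsx", mode="a", engine="openpyxl",
#                     if_sheet_exists="replace") as writer:
#     log.to_excel(writer, sheet_name="log", index=False)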

Logging to (Azure) SQL Server

In the vast majority of organizations, business information is stored in a Relational Database Management System (RDBMS). If the company has adopted the Microsoft data platform, the quintessential relational database is SQL Server in its on-premises version; otherwise, it is Azure SQL Server (or perhaps Azure Synapse), a Platform as a Service (PaaS), cloud-hosted database.

In general, it is a good idea to centralize all of a company's critical information in a single repository. That's why it can be useful to know how to log information from within a Power BI process to a SQL Server database or an Azure SQL database.

If you already have access to an on-premises instance of SQL Server or to Azure SQL Server, you just need to make sure that the ODBC Driver for SQL Server is installed on your machine, because both Python and R connect to (Azure) SQL Server via an ODBC connection. You can install the driver directly on your machine...
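Since both languages go through ODBC, a minimal Python sketch might use the pyodbc package, as shown below; the server, database, credentials, and table schema are placeholder assumptions to adapt to your environment (the book's own recipes may differ):

import pyodbc

# Connection string for the Microsoft ODBC Driver for SQL Server; for Azure SQL,
# the server name typically looks like 'yourserver.database.windows.net'.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;"
    "DATABASE=LogDB;"
    "UID=your_user;PWD=your_password;"
    "Encrypt=yes;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()

# Insert one log row into a pre-existing (placeholder) log table.
cursor.execute(
    "INSERT INTO dbo.PowerBILog (StepName, RowsLoaded, LoggedAt) "
    "VALUES (?, ?, SYSUTCDATETIME())",
    "load_sales", 125000,
)

conn.commit()
conn.close()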

Summary

In this chapter, you learned how to log information processed in Power Query to CSV files, Excel, on-premises SQL Server, and Azure SQL Server, in both Python and R, using very simple and straightforward commands.

In the next chapter, you will see how to handle very large CSV files that cannot be loaded in Power BI Desktop because your machine does not have enough RAM.
