You're reading from Extending Power BI with Python and R - Second Edition

Product typeBook

Published inMar 2024

Reading LevelIntermediate

PublisherPackt

ISBN-139781837639533

Edition2nd Edition

Languages

Python

Tools

Power BI

Concepts

Business Intelligence

Author (1)

Luca Zavarella

Calling External APIs to Enrich Your Data

In the previous chapter, you saw an example of how to enrich existing data with external information. In that case, the data was exposed via CSV files, but this is not always the case. Very often, the data that is useful for enrichment is exposed through external application programming interfaces (APIs), most often in the form of web service endpoints. Power BI allows you to read data from a web service through a dedicated UI, but most of the time it is unusable. So, you have to resort to writing M code to get it done. Writing M code isn’t too difficult, but it’s not easy. You also have to be careful not to write code that causes refresh problems when you publish the report to the Power BI service. In addition, Power BI does not allow you to parallelize more than one call to the same web service to reduce latency when retrieving data. Using Python or R to retrieve data from a web service solves all of these issues very easily...

Technical requirements

This chapter requires you to have a working internet connection and Power BI Desktop already installed on your machine (we used version 2.116.966.0 64-bit, April 2023). You must have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

What is a web service?

In the course of your work as an analyst, you might sometimes need to use specialized interfaces for data retrieval, such as direct database queries or proprietary protocols. However, this is a rare case because today almost all external data sources are exposed as web services, even within an enterprise. Web services are the most common and popular way to communicate information between heterogeneous information systems, and they offer several advantages. They enable seamless integration by using standard internet protocols, ensuring broad compatibility across platforms and languages. Web services facilitate scalability and flexibility, allowing systems to adapt to changing needs without extensive reconfiguration. They also support secure data transmission through encryption and authentication mechanisms, improving data privacy and integrity. In addition, web services can significantly reduce development time and costs by leveraging reusable components and...

Join our book community on Discord

https://packt.link/EarlyAccessCommunity

Qr code Description automatically generated

Often, the analyses you are asked to perform in Power BI need to be based on fairly large data sources. In the previous chapter, you learned how to work with data that is larger than the RAM available on your machine, and you saw that query execution times are in the order of minutes.Now consider a report that needs to perform several calculations on the same corpus database. Obviously, the performance of the overall execution time of the report is strongly related to the expected number of operations to be performed on the database. Therefore, being able to reduce the execution time of individual queries will allow you to have a report whose dataset is updated as quickly as possible with each refresh.Queries implemented in a Power BI report most often need to retrieve aggregated measures that provide users with the information they need to take specific business actions. If the data is available in relational...

Technical Requirements

This chapter requires you to have a working internet connection and Power BI Desktop already installed on your machine (we used the version 2.114.664.0 64-bit, February 2022). You must have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

From CSV to the Parquet file format

The traditional approach of storing structured data in CSV files has long been the method of choice for many organizations. For example, the very dataset used in Chapter 9, Loading Large Datasets beyond the Available RAM in Power BI, which contains monthly U.S. flight data from 1987 to 2012, consists of many CSV files. However, this approach has several significant limitations that can negatively impact data processing and analysis:

The CSV file format is not optimized for columnar storage and stores data in a row-based format. As a result, CSV files can have slower read and write times, especially for large datasets. This can result in slower query execution times and reduced overall performance, negatively impacting the efficiency of data processing and analysis.
Although CSV files can handle basic data types such as integers and strings, they can struggle when it comes to dealing with more complex data structures such as arrays and nested data types...

Using the Parquet format to speed up a Power BI report

Now that you have learned how to take advantage of the Parquet format in both Python and R, you can apply this knowledge to real-world cases in Power BI. An example might be as follows.Suppose you need to integrate a report commissioned by your boss with data on the maximum flight delays in the U.S. on each day of the past two months for each departure airport. At the beginning of each month, a CSV can be retrieved with information on the above flights for the previous month. To be consistent with the data we have available, assume that this integration was commissioned for you in December 2012 and that you were able to find all data for flights from October 1987 through November 2012.In order to implement a data flow that optimizes both the time it takes to load new flight data and the time it takes to perform analysis on all the data in the history, the workflow to be followed will be as follows:

At first, the Parquet dataset of...

Summary

In this chapter, we discussed the limitations of the native Parquet connector in Power BI. You learned that these limitations can be overcome by using specific packages in both Python and R. We are talking about Dask and Arrow, the latter being available in both APIs for Python and R. In the last part of the chapter, you saw how to use these techniques in a Power BI report to append new data to very large datasets and query them in a very short amount of time.In the next chapter, we'll finally start working with R and Python scripts in Power BI, doing data ingestion and data transformation.

References

Parquet File Performance In Power BI/Power Query (https://blog.crossjoin.co.uk/2021/03/21/parquet-file-performance-in-power-bi-power-query/)
Optimising The Performance Of Combining Data From Multiple Parquet Files In Power Query/Power BI (https://blog.crossjoin.co.uk/2021/03/29/optimising-the-performance-of-combining-data-from-multiple-parquet-files-in-power-query-power-bi/)
Parquet and CSV: Querying & Processing in Power BI (https://www.datalineo.com/post/parquet-and-csv-querying-processing-in-power-bi)

Test your knowledge

Q01. What are the main features of the Parquet format?Q02. What are the benefits and limitations of the native Parquet connector in Power BI?Q03. What’s the difference between predicate pushdown and partition pruning?Q04. How do you explain the differences in performance between Dask and PyArrow?Q05. Is there a specific API to transform referenced data from an Arrow dataset into R?

Answers

A01. Parquet is an open-source columnar storage format designed to optimize file efficiency. It stores data in a columnar format, which allows for efficient compression and encoding of data, resulting in faster query execution times and reduced storage requirements. Parquet can handle complex data structures and uses binary encoding to enhance data security. It is compatible with interactive and serverless technologies like Azure Synapse Analytics and Azure Databricks and Fabric and can be used in Power BI with Python and R. Basically, Parquet is a high-performance storage...

The rest of the chapter is locked

You have been reading a chapter from

Extending Power BI with Python and R - Second Edition

Published in: Mar 2024Publisher: PacktISBN-13: 9781837639533

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Luca Zavarella

Luca Zavarella has a rich background as an Azure Data Scientist Associate and Microsoft MVP, with a Computer Engineering degree from the University of L'Aquila. His decade-plus experience spans the Microsoft Data Platform, starting as a T-SQL developer on SQL Server 2000 and 2005, then mastering the full suite of Microsoft Business Intelligence tools (SSIS, SSAS, SSRS), and advancing into data warehousing. Recently, his focus has shifted to advanced analytics, data science, and AI, contributing to the community as a speaker and blogger, especially on Medium. Currently, he leads the Data & AI division at iCubed, and he also holds an honors degree in classical piano from the "Alfredo Casella" Conservatory in L'Aquila.
Read more about Luca Zavarella

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages