Extending Power BI with Python and R - Second Edition

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837639533
Pages: 814
Edition: 2nd Edition
Author: Luca Zavarella

Table of Contents (27 chapters)

Preface
1. Where and How to Use R and Python Scripts in Power BI
2. Configuring R with Power BI
3. Configuring Python with Power BI
4. Solving Common Issues When Using Python and R in Power BI
5. Importing Unhandled Data Objects
6. Using Regular Expressions in Power BI
7. Anonymizing and Pseudonymizing Your Data in Power BI
8. Logging Data from Power BI to External Sources
9. Loading Large Datasets Beyond the Available RAM in Power BI
10. Boosting Data Loading Speed in Power BI with Parquet Format
11. Calling External APIs to Enrich Your Data
12. Calculating Columns Using Complex Algorithms: Distances
13. Calculating Columns Using Complex Algorithms: Fuzzy Matching
14. Calculating Columns Using Complex Algorithms: Optimization Problems
15. Adding Statistical Insights: Associations
16. Adding Statistical Insights: Outliers and Missing Values
17. Using Machine Learning without Premium or Embedded Capacity
18. Using SQL Server External Languages for Advanced Analytics and ML Integration in Power BI
19. Exploratory Data Analysis
20. Using the Grammar of Graphics in Python with plotnine
21. Advanced Visualizations
22. Interactive R Custom Visuals
Other Books You May Enjoy
Index
Appendix 1: Answers
Appendix 2: Glossary

Calculating Columns Using Complex Algorithms: Distances

The data ingestion phase allows you to gather all the information you need for your analysis from any data source. However, once the various datasets have been imported, it's not uncommon to find that some of the raw information doesn't directly contribute to analytical insights as is. It is therefore essential to refine and enhance the dataset with additional computations that can provide new perspectives and answers to our questions. This often involves creating calculated columns whose measures are more closely aligned with our analytical goals. For example, in the context of our exploration, calculating the distance between two geographic points or the dissimilarity between two strings can transform seemingly abstract or unrelated data into powerful tools for...

Technical requirements

This chapter requires you to have a working internet connection and Power BI Desktop already installed on your machine (we used version 2.114.664.0 64-bit, February 2023). You must have properly configured the R and Python engines and IDEs as outlined in Chapter 2, Configuring R with Power BI, and Chapter 3, Configuring Python with Power BI.

What is a distance?

A distance, in the context of data analysis and pattern recognition, is a quantitative measure that captures the dissimilarity or similarity between objects or points in a given space. It provides a numerical representation of the extent to which two entities are separate or close to each other and allows us to objectively quantify the relationships and differences between data points so that we can systematically compare and analyze them.

The concept of distance is particularly valuable because it provides a common metric for comparing and evaluating different types of data. Whether dealing with numerical attributes, categorical variables, or even complex structures such as images or text, distances can be defined and calculated to quantify the dissimilarity between instances. By using the concept of distance, analysts and data scientists gain insight into the relationships, patterns, and structures inherent in their data.
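To make the idea concrete, the following minimal Python sketch defines two distances on different kinds of data: the Euclidean distance for numeric attributes and a simple count of mismatching positions (a Hamming-style distance) for categorical attributes. The function names and the sample records are hypothetical, chosen only for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two numeric vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mismatch_distance(p, q):
    """Number of positions where two categorical records differ (Hamming-style)."""
    if len(p) != len(q):
        raise ValueError("records must have the same length")
    return sum(a != b for a, b in zip(p, q))


# Hypothetical numeric attributes (two measurements per item)
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0

# Hypothetical categorical attributes (color, size, material)
print(mismatch_distance(["red", "M", "cotton"], ["red", "L", "wool"]))  # 2
```

Both functions return 0 for identical inputs and grow as the inputs become more different, which is the property that lets distances on very different data types be compared and analyzed in the same way.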

The concept of distance finds...

The distance between two geographic locations

It is common for a dataset to contain coordinates, expressed as latitude and longitude, that identify points on the globe. Depending on the purpose of your analysis, you can use these coordinates to calculate measures that best describe the scenario you want to address. For example, if your dataset contains the geographic coordinates of some hotels, it might be useful to calculate the distance from each hotel to the nearest airport, giving visitors an additional piece of useful information.
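As a rough sketch of the hotel-to-airport scenario, the following Python code computes the great-circle distance from each hotel to every airport with the haversine formula and keeps the minimum. It assumes a spherical Earth with a mean radius of 6,371 km; the hotel and airport names and coordinates are hypothetical values for illustration only, and the chapter's own approach relies on the PyGeodesy and geosphere packages rather than this hand-written helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against h slightly exceeding 1 due to rounding
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_airport(hotel, airports):
    """Return (airport_name, distance_km) of the airport closest to the hotel."""
    lat, lon = hotel
    return min(
        ((name, haversine_km(lat, lon, a_lat, a_lon))
         for name, (a_lat, a_lon) in airports.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates, for illustration only
hotels = {"Hotel A": (40.0, -74.0), "Hotel B": (41.0, -87.5)}
airports = {"Airport X": (40.6, -73.8), "Airport Y": (41.9, -87.9)}

for hotel_name, coords in hotels.items():
    airport, km = nearest_airport(coords, airports)
    print(f"{hotel_name}: nearest is {airport} ({km:.1f} km)")
```

This brute-force approach compares every hotel with every airport, which is fine for small tables but grows quickly with the number of rows.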

Some theory first

To understand a phenomenon well, including what it consists of and which technologies have been developed to deal with it, we need to go deeper into the theory behind it. Since we are talking about measuring the distance between two points on the globe, the first thing that comes to mind is to simplify the phenomenon by using a model that approximates reality. So let’s...
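If the Earth is modeled as a sphere (an approximation; its actual shape is closer to an oblate ellipsoid, which is what Vincenty's formula accounts for), the distance between two points is the radius multiplied by the central angle between them. The minimal sketch below computes that angle in two ways, with the spherical law of cosines and with the haversine formula, assuming a mean radius of 6,371 km. It illustrates one numerical difference between them in floating-point arithmetic; it is not the chapter's implementation.

```python
import math

R_KM = 6371.0  # mean Earth radius in km (spherical model assumption)


def law_of_cosines_km(lat1, lon1, lat2, lon2):
    """Central angle via the spherical law of cosines, scaled by the radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlmb))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos
    return R_KM * math.acos(max(-1.0, min(1.0, cos_angle)))


def haversine_km(lat1, lon1, lat2, lon2):
    """Same central angle computed with the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R_KM * math.asin(min(1.0, math.sqrt(h)))


# A quarter of the equator: both formulas agree (about 10,007.5 km)
print(law_of_cosines_km(0, 0, 0, 90), haversine_km(0, 0, 0, 90))

# Two points about 1 mm apart: the cosine rounds to exactly 1.0,
# so the law of cosines returns 0, while the haversine keeps the small value
tiny = 1e-8  # degrees of latitude
print(law_of_cosines_km(0, 0, tiny, 0), haversine_km(0, 0, tiny, 0))
```

For very short distances the cosine of the central angle is indistinguishable from 1 in double precision, whereas the haversine formula works with the sine of half the angle and preserves the small difference.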

The distance between two strings

When considering the concept of distance, our first thoughts often focus on measuring the physical space between two points in a well-defined environment. Whether it’s solving problems in plane geometry or navigating the three-dimensional world we inhabit, distance plays a crucial role. However, it is important to recognize that the concept of distance extends beyond physical dimensions.

Some theory first

As you may recall from the introductory part of this chapter, distance is immensely important for describing events and relationships in numerous domains. One such domain, which may surprise you, is the space of text strings: the set of all conceivable values or arrangements that a string of text can take. That’s right – it is perfectly possible to construct...
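Such a space of strings can be equipped with distances much like a geometric space. As an illustration, the following Python sketch implements two of the string distances covered in this chapter from scratch: Hamming (defined only for strings of equal length) and Levenshtein (the edit distance). These are plain reference implementations written for clarity, not the TextDistance or stringdist code used in the chapter.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is only defined for strings of equal length")
    return sum(a != b for a, b in zip(s, t))


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, or substitutions
    needed to turn s into t (dynamic programming, keeping two rows in memory)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


print(hamming("karolin", "kathrin"))     # 3
print(levenshtein("kitten", "sitting"))  # 3

# A one-character shift makes every position differ for Hamming,
# while Levenshtein only counts one deletion and one insertion
print(hamming("flaw", "lawn"))      # 4
print(levenshtein("flaw", "lawn"))  # 2
```

The last two lines show why Hamming distance is rarely used for general string comparison: it only counts position-by-position substitutions, so it cannot account for inserted or deleted characters.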

Summary

In this chapter, we ventured into the fascinating realm of distances and their many applications. We began by exploring the calculation of geographic distances, introducing the law of cosines, the law of haversines, and Vincenty’s formula. Using the PyGeodesy package in Python and the geosphere library in R, we computed accurate distances between geographic locations.

We then moved on to string distances. We covered the Hamming, Levenshtein, Jaro-Winkler, and Jaccard distances, each offering a different perspective on the dissimilarity or similarity between strings. Python’s TextDistance package and R’s stringdist library provided the tools we needed to compute these string distances.
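For reference, the following Python sketch implements the Jaro-Winkler similarity (with the commonly used scaling factor p = 0.1 and a common prefix capped at 4 characters) and a Jaccard distance computed on sets of character bigrams. These are illustrative from-scratch implementations; the bigram size and the parameter defaults are assumptions and may differ from the defaults of TextDistance or stringdist.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    len_s, len_t = len(s), len(t)
    if len_s == 0 or len_t == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other
    window = max(max(len_s, len_t) // 2 - 1, 0)
    s_matched, t_matched = [False] * len_s, [False] * len_t
    matches = 0
    for i, a in enumerate(s):
        for j in range(max(0, i - window), min(len_t, i + window + 1)):
            if not t_matched[j] and t[j] == a:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters appearing in a different order count as half-transpositions
    s_chars = [a for a, m in zip(s, s_matched) if m]
    t_chars = [b for b, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_chars, t_chars)) / 2
    return (matches / len_s + matches / len_t + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    sim = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def jaccard_distance(s, t, q=2):
    """1 minus the Jaccard index of the sets of character q-grams."""
    grams_s = {s[i:i + q] for i in range(len(s) - q + 1)}
    grams_t = {t[i:i + q] for i in range(len(t) - q + 1)}
    union = grams_s | grams_t
    if not union:
        return 0.0
    return 1 - len(grams_s & grams_t) / len(union)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(jaccard_distance("night", "nacht"))  # only "ht" is shared out of 7 bigrams
```

Jaro-Winkler returns a similarity, so 1 minus its value can be used as a distance; the Jaccard function already returns a distance.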

Along the way, we encountered a significant computational hurdle: the quadratic complexity of the distance algorithms we implemented. With the...
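The quadratic growth is easy to quantify, assuming every item is compared with every other one: matching each element of one list against each element of another takes n × m distance evaluations, and comparing all unique pairs within a single list takes n(n − 1)/2. The sketch below, with hypothetical list sizes, counts those pairs directly.

```python
from itertools import combinations, product


def count_pairwise_comparisons(n):
    """Unique unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical inputs: cross-matching two lists compares every element of one
# with every element of the other
left = [f"name_{i}" for i in range(300)]
right = [f"name_{i}" for i in range(200)]
cross_pairs = sum(1 for _ in product(left, right))
print(cross_pairs)  # 60000 = 300 * 200

# Comparing a list against itself (unique pairs only)
self_pairs = sum(1 for _ in combinations(left, 2))
print(self_pairs, count_pairwise_comparisons(len(left)))  # 44850 44850

# Doubling the input size roughly quadruples the work
for n in (1_000, 2_000, 4_000):
    print(n, count_pairwise_comparisons(n))
```

Because each doubling of the input roughly quadruples the number of distance evaluations, the cost of an otherwise cheap distance function can quickly dominate the refresh time of a dataset.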

References

Test your knowledge

  1. Why is the concept of distance particularly valuable?
  2. What was one of the most practical benefits introduced by the definition of the Haversine function?
  3. What is the assumption that makes Vincenty’s formula for calculating the distance between two geographic locations so much more accurate than others?
  4. What libraries are used in Python and R to compute distances between geographic points?
  5. If Hamming distance is very powerful, why is it not often used in common string comparison problems?
  6. When is it recommended to use the Damerau-Levenshtein distance?
  7. When is it recommended to use the Jaro-Winkler distance?
  8. When is it recommended to use the Jaccard distance?
  9. What libraries are used in Python and R to compute distances between strings?

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask the author questions, and learn about new releases...
