Reader small image

You're reading from  Power BI Machine Learning and OpenAI

Product typeBook
Published inMay 2023
Reading LevelIntermediate
PublisherPackt
ISBN-139781837636150
Edition1st Edition
Languages
Concepts
Right arrow
Author (1)
Greg Beaumont
Greg Beaumont
author image
Greg Beaumont

Greg Beaumont is a data architect at Microsoft, where he enjoys identifying and solving complex problems backed by his experience in data architecture and a passion for innovation. Focusing on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. He strives to be a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.
Read more about Greg Beaumont

Right arrow

Preparing and Ingesting Data with Power Query

In Chapter 1 of this book, you kicked off a project to design a solution that will help track and predict height and outcomes related to aircraft striking wildlife. You gathered requirements from the project stakeholders, took a deep dive into the FAA Wildlife Strike data, mapped the requirements to the available data, and put together a preliminary data model design, which will be the foundation of your reports and predictive analytics using Power BI ML models.

Creating tables of data that will be used for ML requires a clear understanding of the FAA Wildlife Strike data and an architecture that allows you to discover features in the data. In this chapter, you will embark upon a journey to prepare queries for the data that you explored in Chapter 1, model that data for Power BI using your preliminary data model as a guide, and create curated queries, which will be the basis of both datasets and ML training datasets in Power BI.

...

Technical requirements

This chapter builds on the work that was begun in Chapter 1. All of the data can be found at https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-02.

For this chapter, you will need the following:

If you’d prefer to follow along using the finished version of the content from this chapter rather than building it all step by step, you can download the completed PBIT version of the file at the Packt GitHub site folder for Chapter 2: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-02.

Preparing the primary table of data

You have decided to start the process of building out the design for your dataset by modeling the primary table of data from the FAA Wildlife Strike database. You start by opening the Chapter 1.pbix Power BI Desktop file that was created in Chapter 1. You can also download a clean copy of the file from the Packt GitHub site for this book at this link as a PBIT file, which can be populated as a PBIX using the data downloaded in Chapter 1: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.

Open Power Query within Power BI, and you will see the four tables of data that constitute the raw data from the FAA:

  • STRIKE_REPORTS
  • Aircraft Type
  • Engine Codes
  • Engine Position

Review the preliminary data model that you created in Chapter 1. For the first step, you will organize the existing queries of the raw data into a folder, which can be referenced as you create...

Building a curated table of the primary STRIKE_REPORTS data

You can now start building out a curated version of the STRIKE_REPORTS metadata and query logic in Power Query. You will follow these steps:

  1. Reference the raw table to create a new query.
  2. Keep only the columns that you need.
  3. Make data type changes.
  4. Make column name changes.

Let’s begin.

Referencing the raw table to create a new query

You will put your new query into a new group in Power Query called Curated Reporting Queries:

  1. Create a new group in Power Query called Curated Reporting Queries using the same methods by which you created the Raw Data group.
  2. Right-click on the STRIKE_REPORTS query in the Raw Data group and select the option for Reference. Your new query will reference the unaltered source query from the wildlife.accdb source. This way, you can make changes to the metadata and query logic while still having an unaltered view of the source in Power Query.
  3. Right...

Building curated versions of the Aircraft Type, Engine Codes, and Engine Position queries

Next, you will create queries for new versions of the reference (dimension) tables in the Curated Reporting Queries group of Power Query. Before starting these tasks, you also consider the Date table. The Date table will be an essential part of the Power BI dataset but is an architectural component that does not exist within the dataset. Therefore, it will be added to the dataset layer in a downstream group later in this chapter. You have moved on to another phase of the effort:

Figure 2.13 – Moving on to the reference table queries for analytic data

Figure 2.13 – Moving on to the reference table queries for analytic data

For each of the three tables, Aircraft Type, Engine Codes, and Engine Position, in the Raw Data group, right-click it, select Reference, move the resulting queries to the Curated Reporting Queries group, and then rename it to include Info at the end of its name so that they have unique names. Your Power Query queries...

Building a curated query to populate a Date table

Date aggregations are an important component of both Power BI and business intelligence tools as a whole. Understanding and recalculating trends at the level of weeks, months, quarters, and years adds robust analytical capabilities. Date tables can even be used to slice and dice data by weekends, holidays, fiscal calendars, and more. Power BI even has the capability to specify a table as a Date table to enable special time intelligence capabilities.

A .csv file of Date data is available at the Packt GitHub site link: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-02. Follow these steps to bring it into Power Query:

  1. Copy https://raw.githubusercontent.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/main/Chapter-02/date.csv.
  2. In Power Query, select New Source and then Web. Paste in the URL and hit OK.
  3. When the preview...

Summary

In this chapter, you created queries coming from the FAA Wildlife Strike data, which will be used as the basis for both Power BI datasets and ML training datasets. Throughout the chapter, you removed unnecessary columns, cleaned up the column names, and formatted the queries so that they can be used as tables of data.

In the next chapter, you will explore the data in these queries and create a semantic model in a Power BI dataset, which relates all of the data together for the purpose of creating a Power BI report – this will kickstart analytics on the FAA Wildlife Strike data.

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Power BI Machine Learning and OpenAI
Published in: May 2023Publisher: PacktISBN-13: 9781837636150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Greg Beaumont

Greg Beaumont is a data architect at Microsoft, where he enjoys identifying and solving complex problems backed by his experience in data architecture and a passion for innovation. Focusing on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. He strives to be a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.
Read more about Greg Beaumont