Reader small image

You're reading from  Managing Data Integrity for Finance

Product typeBook
Published inJan 2024
PublisherPackt
ISBN-139781837630141
Edition1st Edition
Right arrow
Author (1)
Jane Sarah Lat
Jane Sarah Lat
author image
Jane Sarah Lat

Jane Sarah Lat is a finance consultant with over 14 years of experience in financial management and analysis for multiple blue-chip multinational organizations. In addition to being a Certified Management Accountant (CMA U.S.) and having a Graduate Diploma in Chartered Accounting (GradDipCA), she also holds various technical certifications, including Microsoft Certified Data Analyst Associate and Advanced Proficiency in KNIME Analytics Platform. Over the past few years, she has been sharing her experience and expertise at international conferences to discuss practical strategies on finance, data analysis, and management accounting. She is also president of the Institute of Management Accountants (IMA) Australia and New Zealand chapter.
Read more about Jane Sarah Lat

Right arrow

Using Business Intelligence Tools to Fix Data Integrity Issues

In the previous chapter, we discussed the data integrity features available in the most common business intelligence (BI) tools. We also covered how to get started with using them. In this chapter, we will delve into how to manage data integrity issues when using Microsoft Power BI, Tableau from Salesforce, and Alteryx. We will build on top of what we’ve learned already and explore various features to ensure the integrity of financial data.

That said, these are the topics that we will explore in this chapter:

  • Managing data integrity issues with BI tools
  • Data profiling features
  • Data cleansing methods
  • Managing relationships in data models
  • Dealing with large financial datasets using data validation

By the end of this chapter, you will have gained a deeper understanding of what these BI tools are capable of, particularly in terms of fixing data integrity issues.

Technical requirements

This chapter is a continuation of our discussion from Chapter 4, Understanding the Data Integrity Management Capabilities of Business Intelligence Tools. Thus, we will be using the software we used in the previous chapter:

  • Power BI Desktop, April 2023 or later (no licenses required)
  • Tableau Desktop, 2023.1 or later (free trial version)
  • Tableau Prep Builder, 2023.1.0 or later (free trial version)
  • Tableau Cloud (free trial version)
  • Alteryx Designer, 2022.3 version or later (free trial version)

Our sample datasets are saved in the official GitHub repo: https://github.com/PacktPublishing/Managing-Data-Integrity-for-Finance/tree/main/ch05.

We will be using the datasets created about a fictional company in our hands-on exercises throughout this chapter and Chapter 6, Implementing Best Practices When Using Business Intelligence Tools. Save the files on your computer, and then work on the hands-on solutions as you read along.

Note...

Managing data integrity issues with BI tools

BI tools have a powerful suite of capabilities, one of which is the ability to extract, transform, and load (ETL) data from multiple sources. We will discuss first the ETL capabilities of Microsoft Power BI Desktop in reviewing the data type format used for an Excel file prior to loading the information for further analysis, and then we will cover how to do this in Tableau Desktop. Let’s dive into the topics!

First, we load the data into Power BI by clicking on Get data under the Home tab and selecting Excel workbook, as highlighted in Figure 5.1:

Figure 5.1 – Getting data from Excel workbook

Figure 5.1 – Getting data from Excel workbook

We can see from this figure that from Get data, we can load various common data sources, such as an Excel workbook, Power BI datasets, SQL Server, or even data from the web. You can also click on More… to extract the data from other sources, such as a PDF file, a SharePoint folder, and online services...

Data profiling features

Microsoft Power BI has data profiling features that offer simple ways to examine and analyze the data in the Power Query Editor. These are found under the View tab on the ribbon, as shown in Figure 5.22:

Figure 5.22 – Data profiling tools in the View tab

Figure 5.22 – Data profiling tools in the View tab

In the next subsections, we will cover the data profiling features of column quality, column distribution, and column profile.

Column quality

This feature indicates what the column quality of the data is in five categories as shown in Figure 5.23:

Figure 5.23 – Column quality indicators

Figure 5.23 – Column quality indicators

The colors reflect the quality of the data in the columns and make it easier to analyze and examine.

Figure 5.24 shows where the column quality is found in the Power Query Editor:

Figure 5.24 – Column quality

Figure 5.24 – Column quality

We can see from the indicators that the first five columns are valid, while the last three are empty...

Data cleansing methods

As we mentioned in the previous chapter, in the Exploring common data quality management capabilities of BI tools section, data cleansing is a key step in improving the integrity of the data. In this section, we will continue our hands-on examples to cleanse the dataset using Power BI Desktop.

Removing empty cells

If we examine the column statistics in Figure 5.34, our total transaction count is 1,464, no errors have been found, and there are 335 empty cells. If we scroll to the bottom of the table, we can find that rows 1,130 up to 1,464 have null values or empty cells. We can remove these by going to the Home tab, clicking on Remove Rows, and selecting Remove Blank Rows, as shown in Figure 5.35:

Figure 5.35 – Removing empty cells

Figure 5.35 – Removing empty cells

If we scroll down the data table after this step, we can see that the table now ends on row 1,129 and the empty cells have been removed, as illustrated in Figure 5.36:

Figure 5.36 – After removing empty cells
...

Managing relationships in data models

Ensuring that the relationships in our data models and tables are well defined enhances the reliability and accuracy of the reports we generate. When data is loaded in Power BI Desktop, it tries to automatically identify the relationships in the table similar to detecting the data types when loading data to Power Query. We can manage the relationships by going to Model view, as we will discuss in this section.

Continuing with our hands-on example, let’s close the Power Query Editor by selecting Close & Apply. This will lead us to the report canvas. Next, click on the Model view icon, as seen on the left-side panel in Figure 5.50, to show the data model:

Figure 5.50 – Model view to view and edit the relationships

Figure 5.50 – Model view to view and edit the relationships

When we do this, we see the relationships that Power BI has automatically created. If we click on the relationship lines, as shown in Figure 5.51, we can see that Product_ID under 2020...

Dealing with large financial datasets using data validation

When dealing with large financial datasets, the tendency is to allow a certain percentage of incorrectness or inaccuracy due to the effort needed to clean the data. However, outliers in the data will affect analysis and report generation, especially if these outliers and errors aren’t cleaned due to time-saving methods in the overall process. That said, guidelines should be created on what the thresholds are in advance for each column and set of records. These guidelines then need to be converted into automated processes available in the BI tool.

An example would be a guideline where column values cannot be negative, cannot exceed a certain threshold, or should be a particular set of values. This guideline would then be converted into a rule that can then be used to automatically detect data issues. Once the incorrect records have been tagged accordingly, these records can be analyzed and corrected manually. In some...

Summary

In this chapter, we discussed how to manage data integrity issues with BI tools. At the beginning, we learned how to ensure we have consistent data type formats in our working files. Then, we covered data profiling features such as column quality, column distribution, and column profiling. After that, we worked out how to cleanse the data. In addition to this, we learned how to identify data outliers as well as how to manage relationships in data models. Lastly, we went through how to deal with large financial datasets using data validation. We have explored really powerful techniques and concepts in this chapter. You should be feeling confident about what you have learned, knowing that you can use these techniques whenever you need to clean the data prior to analysis.

The topics in the next chapter will be pretty exciting! We will continue our journey with these BI tools and cover how to implement best practices!

Further reading

For additional information on the topics covered in this chapter, you may find the following resources helpful:

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Managing Data Integrity for Finance
Published in: Jan 2024Publisher: PacktISBN-13: 9781837630141
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Jane Sarah Lat

Jane Sarah Lat is a finance consultant with over 14 years of experience in financial management and analysis for multiple blue-chip multinational organizations. In addition to being a Certified Management Accountant (CMA U.S.) and having a Graduate Diploma in Chartered Accounting (GradDipCA), she also holds various technical certifications, including Microsoft Certified Data Analyst Associate and Advanced Proficiency in KNIME Analytics Platform. Over the past few years, she has been sharing her experience and expertise at international conferences to discuss practical strategies on finance, data analysis, and management accounting. She is also president of the Institute of Management Accountants (IMA) Australia and New Zealand chapter.
Read more about Jane Sarah Lat