Reader small image

You're reading from  Data Modeling with Microsoft Excel

Product typeBook
Published inNov 2023
PublisherPackt
ISBN-139781803240282
Edition1st Edition
Right arrow
Author (1)
Bernard Obeng Boateng
Bernard Obeng Boateng
author image
Bernard Obeng Boateng

Bernard Obeng Boateng is a Microsoft Excel MVP, Microsoft Certified Trainer, with over 10 years work experience in Banking, Insurance and Business Development. He is founder of Finex Skills Hub an approved Training Provider of the Financial Modeling Institute, Canada. Finex Skills Hub runs the Finex Project in Ghana, a pro bono student training outreach program for students in Data Analytics and Financial Modeling. Bernard also provides consultancy services for SMEs (start-ups and existing) in Financial Management, Business Planning and Research. He has an active audience online with about 17,000 followers on his LinkedIn Account. where he shares tips and tricks on Microsoft Excel and other Office Apps.
Read more about Bernard Obeng Boateng

Right arrow

Data Structuring for Data Models – What’s the best way to layout your data?

How do you structure your data for the data model? Getting your data in the right shape and form is important in the data modeling process. In this chapter, we will learn about the three golden rules for data structuring. These rules make it easier to model your data and create your calculations and analysis in Microsoft Excel. You will understand the concept of data normalization and the advantages it has over flat tables for analysis. The chapter will round off with an introduction to Power Query, Microsoft Excel’s data transformation tool. When you encounter data layouts that do not conform to the best practices described in the chapter, Power Query plays a critical role in shaping and transforming the data.

The following topics will be covered in this chapter:

  • Data structuring – understanding the three golden rules
  • Denormalized and normalized data
  • Understanding...

Data structuring – understanding the three golden rules

To get the best out of Power Pivot, we need to lay out our dataset in a way that will improve performance and help us get the right insights from our datasets. The following figure summarizes the ideal layout for a single dataset or table:

Figure 2.1 – The three golden rules for laying out your data in a single dataset

Figure 2.1 – The three golden rules for laying out your data in a single dataset

Let’s look at each of these rules in detail:

  • Rule 1: Each column should have a single data type. As we learned in Chapter 1, Getting Started with Data Modeling, Power Pivot manages data using a columnar structure. It is therefore important to commit one data type to each column of your dataset.

    For example, if you have data for a payroll containing the names of staff members, one column should be dedicated to the names of staff with the same format and no other data type. Apart from this, there shouldn’t be any column in your dataset that contains...

Denormalized and normalized data

Denormalized data combines redundant data into one table while normalized data stores related data in separate tables. Each type has its pros and cons when analyzing data. However, for data modeling, normalized data is ideal. We will go into the process of converting denormalized data into normalized data in the next section.

The following table lists the important differences between normalization and denormalization:

Understanding table relationships

In a data model, relationships refer to the connections between different entities or tables within a database. There are three types of relationships in a data model: one-to-many, many-to-many, and one-to-one. Let’s discuss each type in detail along with some examples.

One-to-many relationship

In a one-to-many relationship, one record in a table is associated with multiple records in another table. This is the most common type of relationship in a database.

For example, a customer may place many orders, but each order is associated with only one customer. In this case, the customer table is on the one side of the relationship, and the order table is on many sides. The relationship is established by creating a foreign key in the order table that refers to the primary key in the customer table.

Another example is the relationship between a department table and an employee table. A department may have many employees, but each employee...

Understanding dimension and fact tables

In a relational database, a fact table is a table that stores quantitative information or facts about a business process or activity, such as sales, inventory, or customer transactions. A fact table typically contains numerical values and foreign keys to link to dimension tables.

On the other hand, a dimension table is a table that stores descriptive information about the objects, events, or entities in a business process or activity. Dimension tables are typically used to provide context and structure to the data in a fact table.

To understand this better, let’s consider an example of a retail business that sells products through its online store. In this example, we can identify the following fact and dimension tables.

Fact table – sales

The sales fact table would contain quantitative information about sales transactions, such as sales revenue, quantity sold, and price. The sales fact table would typically contain...

Understanding the role of primary keys and foreign keys

Keys are very important in a relational database because they help create a relationship between two tables and ensure that each record in a table can be identified in a unique way. When we connect the related tables, the version of the connecting columns in the dimension tables is called the primary key while the version in the fact table is called the foreign key.

In a relational database, the primary key helps us identify a single row in a table. All primary keys must be unique in the columns where they sit. Foreign keys help connect our dimension and fact tables and, more importantly, make our fact tables accessible in our calculations. Foreign keys are typically not unique.

Going back to our mock view of our relational database, we can add primary keys and foreign keys, as shown here:

Figure 2.8: Assigning primary keys and foreign keys

Figure 2.8: Assigning primary keys and foreign keys

In our data model, these primary keys will be linked...

Summary

This chapter has introduced you to best practices in structuring your data for data modeling. In the real world, the data produced from most relational database management systems is already normalized. However, there will be instances where you would need to convert denormalized data into normalized data to improve calculations and reduce redundancy. After reading this chapter, you are now able to appreciate the steps you need to take to achieve this. The chapter also covered the key differences between primary keys and foreign keys and the role they play in our data model.

In the next chapter, we will learn about Power Query and how we can use this tool to shape and transform data for our data model.

Key terms in this chapter

  • Data redundancy
  • Normalization
  • Denormalization
  • Primary keys
  • Foreign keys

Questions for discussion

  • What are the key considerations for laying out data properly for the data model?
  • When is it best to consider performing data normalization?
  • List the key advantages normalized data has over denormalized data.
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Data Modeling with Microsoft Excel
Published in: Nov 2023Publisher: PacktISBN-13: 9781803240282
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Author (1)

author image
Bernard Obeng Boateng

Bernard Obeng Boateng is a Microsoft Excel MVP, Microsoft Certified Trainer, with over 10 years work experience in Banking, Insurance and Business Development. He is founder of Finex Skills Hub an approved Training Provider of the Financial Modeling Institute, Canada. Finex Skills Hub runs the Finex Project in Ghana, a pro bono student training outreach program for students in Data Analytics and Financial Modeling. Bernard also provides consultancy services for SMEs (start-ups and existing) in Financial Management, Business Planning and Research. He has an active audience online with about 17,000 followers on his LinkedIn Account. where he shares tips and tricks on Microsoft Excel and other Office Apps.
Read more about Bernard Obeng Boateng

Criteria

Normalized Data

Denormalized Data

Definition

Data is organized in such a way that each piece or dimension of the data is stored in only one place in separate tables.

Data is organized in such a way that multiple pieces of information are stored together in one place.

Duplication

There is minimal duplication...