
You're reading from Fundamentals of Analytics Engineering

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837636457
Edition: 1st Edition
Authors (7):
Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing training in dbt and Power BI, sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has worked with many different clients, ranging from retail and healthcare to ridesharing and trading companies. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. Currently, she focuses on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them overcome challenges when working with data at scale.


Data Modeling

Data modeling is a proactive design process that establishes relationships between data within information systems. A well-designed data model can lower costs, contribute to data democratization, and adapt easily to changing needs.

This chapter explores the various data modeling techniques and what to consider when building a data model. In particular, it focuses on the following:

  • The importance of data models
  • Designing your data model
  • Data modeling techniques
  • Choosing a data model

The importance of data models

The primary objective of an effective data model is to translate business requirements into data structures that facilitate the understanding of the current status, identify new opportunities, and drive initiatives based on insights gained from data analysis.

For analytics engineers, it is crucial to build data systems with long-term planning in mind, ensuring seamless integration between business requirements and the data procured in the system. As a result, data modeling is one of the most vital tools to prevent future issues and maintain optimal system performance as complexity increases.

In traditional data warehouses, data models were essential to manage storage and computational costs. However, as seen in Chapter 4, the shift to cloud computing presents another set of challenges.

With the increasing adoption of cloud computing, many of the problems data modeling solves can now be overcome by harnessing the increased computational power of...

Designing your data model

The design process of a data model progresses from generic to specific. The first step is the conceptual diagram, which provides a high-level, generic view of how the data will be stored and connected within the information system and its relationship with business activities. It is usually technology-agnostic. With this initial diagram, the modeler seeks informed feedback from business specialists, data owners, and end users to ensure that the model accurately represents the business and that the main data relationships are represented.

Figure 5.2 shows an example of a conceptual diagram for a simple management system. This diagram contains the basic building blocks of the data model to start the modeling process. We can see how the elements are related. For example, one order item represents a product and is part of an order; the customer places an order, and the model generates an invoice with invoice items; each invoice item represents a product:

...
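The relationships named in the conceptual diagram description above can be sketched in code to make them concrete. This is only an illustration, not the book's figure: the attribute names (`name`, `quantity`), the link from an invoice back to its order, and the helper `generate_invoice` are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str  # hypothetical attribute


@dataclass
class Customer:
    name: str  # hypothetical attribute


@dataclass
class OrderItem:
    product: Product  # one order item represents a product
    quantity: int = 1  # hypothetical attribute


@dataclass
class Order:
    customer: Customer  # the customer places an order
    items: List[OrderItem] = field(default_factory=list)


@dataclass
class InvoiceItem:
    product: Product  # each invoice item represents a product


@dataclass
class Invoice:
    order: Order  # assumed link between an invoice and its order
    items: List[InvoiceItem] = field(default_factory=list)


def generate_invoice(order: Order) -> Invoice:
    """Build an invoice with one invoice item per order item (assumed rule)."""
    invoice_items = [InvoiceItem(product=item.product) for item in order.items]
    return Invoice(order=order, items=invoice_items)
```

At the conceptual stage, a sketch like this is mainly a way to check with business specialists that the entities and their connections are right; keys, data types, and storage details come later in the design process.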

Data modeling techniques

During the 1990s, as data warehouses rapidly integrated into numerous enterprises and businesses, Bill Inmon, Ralph Kimball, and Daniel Linstedt developed methodologies with architectures for constructing data warehouses.

Each of these methods has its own associated data model. While other methods or possible combinations exist, the three main forms discussed in the following subsections are the most common.

Bill Inmon and relational modeling

In 1992, William (Bill) Inmon published Building the Data Warehouse (https://www.wiley.com/en-us/Building+the+Data+Warehouse,+4th+Edition-p-9780764599446). For this work, he is recognized as one of the fathers of the data warehouse.

Inmon’s methodology for data warehouse architecture highlights the importance of a unified data storage system structured according to the third normal form (3NF), which will be described in depth in the next section. He asserts that robust relational modeling contributes...

Choosing a data model

Modelers should be familiar with these modeling techniques and apply them when appropriate. In practice, most systems have mixed data models, and finding a pure data model that adheres to all the recommended characteristics explored in this chapter is nearly impossible.

As a general guideline, normalization techniques are suitable for transactional systems focused on capturing events, sometimes involving millions of rows in a short period. The strength of the 3NF is that the updates and inserts of these transactions impact the database in only one place.
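To illustrate the "only one place" property, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the sample values are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer attributes are stored once, keyed by customer_id.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    -- Orders reference the customer instead of repeating the email.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'old@example.com');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

# The email lives in exactly one row, so a single UPDATE is enough.
cursor = conn.execute(
    "UPDATE customers SET email = 'new@example.com' WHERE customer_id = 1"
)
rows_changed = cursor.rowcount

# Every order sees the new value through the join.
emails = conn.execute("""
    SELECT o.order_id, c.email
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
```

Had the email been copied onto every order row, the same change would have touched one row per order, with the risk of leaving some rows out of date.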

The dimensional model, or star schema, is more suitable for analytical purposes, such as drilling down on data, generating reports, and performing aggregations or calculations. This modeling technique is integrated into some business intelligence tools, making it easier for users to understand the data, query it directly, and create reports or dashboards.
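As a rough sketch of why this shape suits aggregation, the example below builds a tiny star schema with Python's built-in `sqlite3` module and sums sales by category and year. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and rows are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        category TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year INTEGER NOT NULL
    );
    -- The fact table holds measures plus a key to each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date VALUES (20230101, 2023), (20240101, 2024);
    INSERT INTO fact_sales VALUES
        (1, 20230101, 10.0),
        (1, 20240101, 15.0),
        (2, 20240101, 30.0),
        (2, 20240101, 5.0);
""")

# A typical analytical query: join the fact to its dimensions, then aggregate.
sales_by_category_year = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

Drilling down or rolling up then amounts to adding or removing dimension columns in the `GROUP BY` clause, while the fact table stays unchanged.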

Highly complex systems with multiple sources...

Summary

In this chapter, we have covered the benefits of data modeling, some of the best practices to build a data model, and the principal methodologies used to model a data warehouse. After reviewing their main characteristics, we covered how to choose the best model to fulfill the needs of the business according to the use case.

In the next chapter, we will talk about data transformation and how the evolution of tools has drastically changed the way it is carried out.

