You're reading from Fundamentals of Analytics Engineering

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837636457
Edition: 1st Edition
Authors (7):
Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing training in dbt and Power BI, sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has worked with many different clients across retail, healthcare, ridesharing, and trading. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she developed distinct data products, such as dashboards and machine learning implementations. Currently, she focuses on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them overcome challenges when working with data at scale.


Transforming Data

In the fast-paced world of data-driven decision-making, the ability to transform raw data into valuable insights is a critical competitive advantage. This is part of the foundation of analytics work: enabling organizations to extract meaningful information from vast and complex datasets.

In Chapter 5, Data Modeling, we learned about the benefits and techniques of data modeling. A well-designed data model guides the transformation process that turns raw data into ready-to-use data.

This chapter covers the following topics:

  • Transforming data – the foundation of analytics work
  • Design choices
  • Tools that facilitate data transformations

Transforming data – the foundation of analytics work

Raw data is of little value until it is transformed into insights. In this section, we look into why transforming data is such a crucial step within the data value chain.

A key step in the data value chain

The digitalization of businesses, rapid advancements in technologies such as the Internet of Things (IoT), and the proliferation of smart devices, social media, and advertising platforms have generated abundant data that companies can leverage to understand their customers better, optimize their operations, and enhance their products and services. These phenomena, together known as big data, have revolutionized how organizations operate, make decisions, and generate revenue.

However, collecting data is not enough to create a competitive edge in the market. The zettabytes of data that organizations collect daily must be transformed into actionable insight to be of any value. Quantity does not equal quality...

Design choices

To implement data transformations that are robust and scalable, we must make some conscious design choices. Agreeing on how and where you will transform your data will allow your team to collaborate more effectively and coherently across pipelines.

Where to apply transformations

As seen in Chapter 2, The Modern Data Stack, a high-level architecture of the data stack resembles the following:

Figure 6.2 – High-level architecture example of a data stack (see Chapter 2, The Modern Data Stack)

At each of these steps, you might consider applying transformations. For instance, as seen in Chapter 3, Data Ingestion, transformations performed during ingestion focus on shaping data into a format and structure that are compatible with the destination system, such as a relational database. This mainly involves parsing and translating source data. Sometimes, however, one might consider cleaning, aggregation, and enrichment during ingestion...
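
As a minimal sketch of the parsing and translating step described above, the following Python snippet turns one raw JSON record into a flat, typed row that a relational table could accept. The field names (id, customer.email, amount, ts), the target column names, and the parse_event helper are all hypothetical and chosen for illustration; an actual ingestion tool would handle this mapping in its own way.

```python
import json
from datetime import datetime, timezone


def parse_event(raw_line: str) -> dict:
    """Parse one raw JSON line and translate it into a flat, typed row.

    Hypothetical helper: the source and target field names are assumptions.
    """
    record = json.loads(raw_line)
    return {
        # Cast the string identifier to an integer column
        "order_id": int(record["id"]),
        # Flatten the nested customer object into a single column
        "customer_email": record["customer"]["email"],
        # Cast the string amount to a numeric column
        "amount_eur": round(float(record["amount"]), 2),
        # Translate the Unix epoch into an ISO 8601 UTC timestamp
        "ordered_at": datetime.fromtimestamp(
            record["ts"], tz=timezone.utc
        ).isoformat(),
    }


raw = (
    '{"id": "42", "customer": {"email": "ada@example.com"}, '
    '"amount": "19.990", "ts": 1700000000}'
)
row = parse_event(raw)
print(row)
```

Note that the sketch only reshapes and casts values. It deliberately leaves cleaning, aggregation, and enrichment out of the ingestion step.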

Data transformation best practices

As seen in previous chapters, analytics engineering embraces software engineering best practices to model, transform, test, deploy, and document data in a reusable way.

When it comes to writing transformation pipelines, SQL is the industry standard. Still, you might also want to use other languages, such as Python or Scala, depending on the tools you use for transformation.

The barrier to entry for writing SQL is low. Thanks to its declarative nature, SQL is easy to read: a query describes the result you want rather than the steps to compute it. Most data specialists know how to write SQL, making it easier for organizations to hire talent who can work with SQL pipelines, an important factor in democratizing transformation capabilities.
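
To illustrate what declarative means in practice, here is a small sketch that runs a SELECT statement against an in-memory SQLite database using Python's standard sqlite3 module. The raw_orders table, its columns, and the sample rows are invented for this example, and SQLite simply stands in for whichever warehouse you use.

```python
import sqlite3

# In-memory database standing in for a data warehouse (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, customer TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "ada", 20.0, "paid"),
        (2, "ada", 5.0, "cancelled"),
        (3, "bob", 12.5, "paid"),
        (4, "ada", 10.0, "paid"),
    ],
)

# The query states WHAT we want (paid revenue per customer),
# not HOW to loop over rows to compute it.
query = """
SELECT customer,
       COUNT(*)    AS paid_orders,
       SUM(amount) AS revenue
FROM raw_orders
WHERE status = 'paid'
GROUP BY customer
ORDER BY customer
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('ada', 2, 30.0), ('bob', 1, 12.5)]
```

The filtering, grouping, and aggregation all read close to plain English, which is a large part of why SQL is accessible to a wide range of data specialists.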

In this section, we will tackle SQL best practices for your transformation pipelines. We will also mention language specific to dbt and Databricks. In dbt, the SQL files in which developers write SELECT statements are called models. In Databricks, code is organized within...

Tools that facilitate data transformations

As data volumes, complexity, and usage continue to grow, tools that facilitate data transformations have emerged to simplify and streamline this process. During the early 2020s, dbt became one of the most prominent and widely used tools for data transformation.

These tools have revolutionized the way data engineers and analysts work, making it easier and more efficient to transform and analyze data. Previously, data transformations consisted of a series of files containing SQL queries, requiring extensive coding and scripting to ensure they ran in the desired order.
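
To give a sense of the scripting that used to be needed to run SQL files in the desired order, the sketch below derives an execution order from a hand-maintained dependency map using Python's standard graphlib module (Python 3.9+). The file names and their dependencies are hypothetical, and this is only one possible way such ordering could be scripted by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical SQL files, each mapped to the files it depends on
dependencies = {
    "stg_orders.sql": set(),
    "stg_customers.sql": set(),
    "orders_enriched.sql": {"stg_orders.sql", "stg_customers.sql"},
    "revenue_report.sql": {"orders_enriched.sql"},
}

# static_order() yields every file after all of its dependencies
run_order = list(TopologicalSorter(dependencies).static_order())

for position, filename in enumerate(run_order, start=1):
    print(f"{position}. run {filename}")
```

Keeping such a map correct by hand becomes error-prone as the number of files grows, which is part of the burden that dedicated transformation tools take over.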

Now, most tools can transform data, sometimes with low or even no code. However, the fact that you can doesn’t mean that you should. Most of the time, the answer to whether you should transform data in a tool is “it depends.” The following section explores what to consider when making that decision.

Types of transformation tools

As seen in the Where to...

Summary

In this chapter, we have discussed the importance of data transformation pipelines as a crucial step in the data value chain. We have delved into the practical topics and best practices that developers should consider when designing their transformation pipelines.

Finally, we explored the families of tools that facilitate transformation and how to break down your needs to identify the right tools for your organization in an evolving industry.

In the next chapter, we will delve into the next step of the analytics workflow: Serving data.
