
You're reading from Fundamentals of Analytics Engineering

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837636457
Edition: 1st Edition
Authors (7):
Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys giving training in dbt and Power BI and sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline and then switching to Cloud Engineering at a consultancy, working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has seen many different clients, ranging from retail and healthcare to ridesharing and trading companies. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels at using various programming languages and tools to deliver valuable insights. Experienced in diverse industries such as energy, transport, and fintech, Ricardo is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, best practices, task automation, and improved data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. She currently focuses on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them overcome the challenges of working with data at scale.


Automating Workflows

In the previous chapter, you got acquainted with the challenges surrounding ownership and accountability in a development team, adherence to coding standards, and task tracking. We looked at how to do a code review and covered some best practices for writing documentation that makes it easy to onboard new colleagues. We briefly mentioned Continuous Integration/Continuous Deployment (CI/CD) as a way to run functionality tests and formatting checks on your code and to deploy changes into production in an automated fashion. In this chapter, you will learn about data orchestration and automating workflows, and we will give you all the details you need to successfully build a CI/CD pipeline as an analytics engineer.

By the end of this chapter, you will have gained a comprehensive understanding of several key concepts and practices. Firstly, you will understand the critical role of CI/CD within the realm of DataOps. Additionally, you will have a clear...

Introducing DataOps

DataOps is a relatively new term derived from the more prominent term DevOps, which has been around for quite some time. DevOps hails from software engineering and refers to practices that aim to automate and streamline the software delivery process, making it faster, more reliable, and more efficient.

DevOps and DataOps are related. DevOps is a set of practices that emphasizes collaboration and communication between development and operations teams. The goal of DevOps is to streamline the software development and deployment process, enabling teams to release software faster and more reliably.

The key elements of DevOps include continuous integration (CI), continuous delivery, and continuous deployment (CD). CI involves regularly merging code changes into a central repository and testing this code, while CD involves automating the deployment of code changes.

With the increasing importance of data in today’s world, a new term has...

Orchestrating data pipelines

Data orchestration refers to the automated and coordinated management of data workflows across various stages of the analytics life cycle. This process involves integrating, cleansing, transforming, and moving data from diverse sources to a data warehouse, where it can be readily accessed and analyzed for insights.

We have already covered how dbt empowers data engineers and data analysts to move away from repetitive stored procedures and employ version control and testing found in traditional software engineering to help run data workflows. dbt is largely used for the transformation part of the analytics workflow.

On the other hand, the goal of data orchestration in analytics engineering is to streamline the flow of data through different tools and processes, ensuring that it is accurate, timely, and in the right format for analysis. This involves scheduling dbt jobs in the correct sequence – for example, managing dependencies between tasks...
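Managing dependencies between tasks boils down to running them in an order where every task starts only after everything it depends on has finished, which is a topological sort of a directed acyclic graph (DAG). The sketch below shows that idea with Python's standard-library graphlib module; the task names and the PIPELINE dictionary are hypothetical and stand in for whatever an orchestrator would actually schedule, such as ingestion jobs, dbt runs, and a dashboard refresh.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "dbt_build_staging": {"extract_orders", "extract_customers"},
    "dbt_build_marts": {"dbt_build_staging"},
    "refresh_dashboard": {"dbt_build_marts"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task runs after all of its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle,
    # which is why orchestrators require pipelines to be acyclic.
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```

The two extract tasks have no dependency on each other, so a real orchestrator could run them in parallel; a plain topological order like this one only guarantees that no task starts before its upstream tasks.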

Continuous integration

CI is part of CI/CD, a practice that hails from software engineering and DevOps. CI lets developers test their code automatically before it is merged into production.

Integration

The term integration in CI refers to the process of combining code changes from multiple contributors into a shared unified code base. The key principle behind CI is to automate the process of merging code changes regularly and consistently, thereby reducing the risk of merge conflicts that may arise when integrating changes made by different team members.

Merge conflicts arise when two branches have changed the same lines of code in different ways since they diverged, so the version control system cannot decide on its own which version to keep. When you build on a separate feature branch, your code will probably have diverged from what is in the main branch.
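To make the idea of a merge conflict concrete, here is a deliberately simplified three-way merge in Python. It compares the common ancestor (base) with both branches line by line, and it assumes all three versions have the same number of lines; real tools such as Git also align inserted and deleted lines with a diff algorithm. The function name merge_lines and the Git-style conflict markers are illustrative choices, not a real library API.

```python
from typing import List, Tuple


def merge_lines(base: List[str], ours: List[str],
                theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Simplified three-way merge; returns (merged lines, conflicting line indexes).

    Assumes base, ours, and theirs have equal length (no inserted/deleted lines).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # unchanged, or both branches made the identical change
            merged.append(o)
        elif o == b:    # only 'theirs' changed this line: take their version
            merged.append(t)
        elif t == b:    # only 'ours' changed this line: keep our version
            merged.append(o)
        else:           # both changed the same line differently: a conflict
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    base = ["select id,", "amount", "from orders"]
    ours = ["select id,", "amount_eur", "from orders"]        # feature branch
    theirs = ["select id,", "amount_usd", "from stg_orders"]  # main branch
    merged, conflicts = merge_lines(base, ours, theirs)
    print("conflicting lines:", conflicts)
    print("\n".join(merged))
```

In the example, the change to the FROM clause merges cleanly because only one branch touched it, while the column rename is flagged because both branches edited that same line. Merging frequently keeps branches close to main and makes such overlaps less likely, which is the motivation behind CI.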

Before code is merged into production, which could be the main or master branch in your version control...
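A CI job typically chains the kinds of checks discussed in this chapter, cheapest first, and fails as soon as one of them breaks. The sketch below assumes SQLFluff as the linter, models stored under ./models, and a directory holding the manifest.json of the last production run; it uses dbt's state selection (state:modified+) with --defer and --state, the mechanism behind Slim CI. The helper names ci_steps and run_ci are hypothetical, and the runner is injectable so the control flow can be tried without dbt installed.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# A runner executes one command and returns its exit code (0 means success).
Runner = Callable[[Sequence[str]], int]


def ci_steps(prod_state_dir: str) -> List[List[str]]:
    """Return the commands a CI job might run, cheapest checks first.

    Assumptions: SQLFluff is the linter, models live in ./models, and
    prod_state_dir contains the manifest.json of the last production run.
    """
    return [
        # 1. Formatting/style check on the SQL models.
        ["sqlfluff", "lint", "models"],
        # 2. Compilation: surfaces Jinja and compilation errors before anything runs.
        ["dbt", "compile"],
        # 3. Slim CI: build and test only modified models and their children,
        #    deferring references to unchanged models to production.
        ["dbt", "build", "--select", "state:modified+",
         "--defer", "--state", prod_state_dir],
    ]


def run_ci(steps: List[List[str]], runner: Optional[Runner] = None) -> bool:
    """Run the steps in order and stop at the first failing one."""
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    for cmd in steps:
        if runner(cmd) != 0:
            print("CI failed at:", " ".join(cmd))
            return False
    return True
```

Ordering the steps from cheapest to most expensive means a formatting mistake is reported in seconds, before any warehouse compute is spent on building models.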

Continuous deployment

In analytics engineering, CD refers to the automated process of deploying all changes in data models, scripts, and configurations to the production environment once they have been tested. This practice ensures that new features, bug fixes, and updates are swiftly and reliably deployed into production, usually on the condition that the CI pipeline has completed successfully. In essence, a deployment job deploys the modified models into production. In practice, this usually comes down to running a dbt build or dbt run command, just like any other dbt job. The difference is mainly one of naming and purpose: a deployment job is a dedicated job that is only ever triggered after a CI job has succeeded.

Let’s break it down:

  • Deployment: The deployment aspect of CI/CD refers to deploying the changes in your dbt models, YAML configuration files, and anything else you have committed to your development feature branch. Essentially, by deployment we mean running your dbt models in a...
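As described above, a deployment job is essentially a regular dbt build that is only triggered once CI has passed. The minimal Python sketch below captures that gate. It assumes your profiles.yml defines a target named prod (dbt's --target flag selects it); the deploy function and its injectable runner are hypothetical, chosen so the logic can be exercised without a warehouse.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner executes one command and returns its exit code (0 means success).
Runner = Callable[[Sequence[str]], int]


def deploy(ci_passed: bool, target: str = "prod",
           runner: Optional[Runner] = None) -> bool:
    """Hypothetical deployment job: only runs if the CI job succeeded."""
    if not ci_passed:
        print("Skipping deployment: CI did not pass.")
        return False
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    # 'prod' is an assumed target name from profiles.yml.
    return runner(["dbt", "build", "--target", target]) == 0
```

In a hosted CI system, the same gate is usually expressed declaratively, for example by making the deployment job depend on the CI job, rather than in application code.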

Continuous delivery

Continuous delivery also involves testing any code change, whether it’s a new feature, bug fix, or experiment, and preparing it for release to a production environment. However, the actual deployment to production in continuous delivery is a manual step, requiring a human decision to make the final push. The goal of continuous delivery is to ensure that software can be reliably released at any time, creating a stable, reliable, and repeatable process for releasing software. The benefits include having a deployable product after every change and more reliable software delivery. This approach also eases the pressure on the decision to release as the product is always ready for deployment.
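The only behavioral difference between continuous delivery and continuous deployment is who makes the final release decision. The small Python sketch below encodes that rule: in both modes a failing change is never released; under continuous deployment every passing change goes out automatically, while under continuous delivery a passing change waits for explicit human approval. The Mode enum and should_release function are illustrative names, not part of any CI tool.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    CONTINUOUS_DELIVERY = "delivery"
    CONTINUOUS_DEPLOYMENT = "deployment"


def should_release(ci_passed: bool, mode: Mode,
                   approved: Optional[bool] = None) -> bool:
    """Decide whether a tested change goes to production."""
    if not ci_passed:
        return False          # neither practice releases a change that failed CI
    if mode is Mode.CONTINUOUS_DEPLOYMENT:
        return True           # automatic: every passing change is deployed
    return approved is True   # delivery: always releasable, but a human decides
```

Under continuous delivery, a change with approved left as None simply stays ready for release, which matches the idea that the product is always in a deployable state.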

First, let’s understand how continuous delivery and CD are different.

Continuous delivery versus continuous deployment

Continuous deployment (CD) and continuous delivery, while closely related in the realm of software development, particularly in automated...

Summary

In this chapter, we saw how integrating and deploying your code can be automated. We also discussed how analytics engineering leverages existing software engineering practices, using CI/CD for automated testing and deployment to ensure code quality and prompt delivery. Checking your code for formatting issues and compilation errors, and unit testing your functions and macros, are essential steps before deploying to production.

In this chapter, we introduced the paradigm of DataOps, its ideas and principles, and how it is strongly influenced by practices from software engineering. Then, we introduced the concept of CI, discussed the necessity of testing your code for formatting issues, compilation and runtime errors, and covered unit testing your functions and macros. Next, we talked about CD, the concept of state comparison and idempotency, and enabling Slim CI/CD runs in dbt Cloud using state deferral to reduce costs and time. Lastly, we briefly...
