You're reading from Data Engineering with dbt

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781803246284
Edition: 1st Edition
Author: Roberto Zagni

Roberto Zagni is a senior leader with extensive hands-on experience in data architecture, software development, and agile methodologies. Roberto is an electronic engineer by training, with a special interest in bringing software engineering best practices to cloud data platforms and in growing great teams that enjoy what they do. He has been helping companies make better use of their data and, more recently, transition to cloud-based data automation with an agile mindset and proper software engineering tools and processes, aka DataOps. Roberto also coaches data teams hands-on in practical data architecture and the use of patterns, testing, version control, and agile collaboration. Since 2019, his go-to tools have been dbt, dbt Cloud, and Snowflake or BigQuery.

Delivering Reliability in Your Data

In the previous chapter, we learned how to ensure that your code behaves consistently and how to use macros and external libraries in the code.

In this chapter, you will learn how to ensure the reliability of your data by adding tests that verify your expectations and check the results of your transformations.

Finally, you will learn which tests make the most sense in each layer, and we will add many tests and macros to our sample project.

This chapter covers the following topics:

  • Testing to provide reliability
  • Testing the right things in the right places

Testing to provide reliability

If you take away only one thing from this section, please let it be this: “Tests are going to be your sleep-well-at-night superpower.”

Tests ensure that things go as you wish them to go and alert you when that’s not true.

They are your army, protecting your work and time from the evil influences of broken data sources and the inevitable code bugs (yours and your colleagues').

That’s real power.

A lack of testing, compounded by development without rigor (people writing SQL queries often lack a software engineering background), makes firefighting the most common activity in too many data teams, and the idea of changing a heavily used part of the data pipelines (think customers or orders) can make a whole team sweat. Ever been there?

We want you to be free of these nightmares. In the previous chapters, we provided sound guidelines on how to organize your code and make it easy to read and maintain. In...

Testing the right things in the right places

In the previous sections, we saw how easy it is to add tests in dbt; most of the time, you just add a tests: property with a few test names under a table or column in a YAML file.
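To make the semantics of those test names concrete, here is a minimal Python sketch of what two common generic tests, not_null and unique, check. It is an illustration only, not dbt's implementation: in dbt, each test compiles to a SQL query that selects the failing records, and the test passes when that query returns no rows. The helper names (run_column_tests, GENERIC_TESTS), the "test_column" naming of results, and the in-memory list of dicts standing in for a table are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable

# Hypothetical stand-in for a table: each row is a dict of column -> value.
Row = dict[str, Any]


def not_null(rows: list[Row], column: str) -> list[Any]:
    """Failures are the rows where `column` is missing or None (SQL NULL)."""
    return [row for row in rows if row.get(column) is None]


def unique(rows: list[Row], column: str) -> list[Any]:
    """Failures are the non-null values of `column` that occur more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical registry: test names as they might be listed under `tests:`.
GENERIC_TESTS: dict[str, Callable[[list[Row], str], list[Any]]] = {
    "not_null": not_null,
    "unique": unique,
}


def run_column_tests(rows: list[Row], column_tests: dict[str, list[str]]) -> dict[str, int]:
    """Run each declared test and return the number of failures per test.

    A test passes when its failure count is 0, mirroring how a dbt test
    passes when its query returns no rows.
    """
    results: dict[str, int] = {}
    for column, test_names in column_tests.items():
        for name in test_names:
            failures = GENERIC_TESTS[name](rows, column)
            results[f"{name}_{column}"] = len(failures)
    return results


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer_id": 10},
        {"order_id": 2, "customer_id": None},
        {"order_id": 2, "customer_id": 11},
    ]
    # Roughly what declaring tests on two columns in a YAML file expresses.
    declared = {"order_id": ["not_null", "unique"], "customer_id": ["not_null"]}
    for test, n_failures in run_column_tests(orders, declared).items():
        print(test, "PASS" if n_failures == 0 else f"FAIL ({n_failures})")
```

Running it reports that unique on order_id and not_null on customer_id fail, while not_null on order_id passes. Returning failure counts instead of booleans keeps the same pass-when-zero rule and also tells you how many records need attention.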

Adding tests is so simple that it is easy to go overboard and start testing everything, everywhere. Please refrain from doing that, and remember that tests are code and therefore bring their fair share of errors (wrong application or configuration of generic tests from libraries, and coding errors in our own singular tests), maintenance (false positives, errors with no business value that can just be ignored, and constant test refactoring), and consequences (ignoring important tests out of fatigue from constant test failures).

Tests also take time and money to execute, so running useless tests wastes money, as well as developers' time when they run tests in their development environments. It’s also not...

Summary

In this chapter, you saw multiple ways to test that incoming and resulting data conforms to your expectations, how to add more tests, and how to unit-test macros or your own code.

We have discussed what tests to apply in what layer of your projects, and we have applied such tests in the sample project.

In the next chapter, you will learn how to use dbt to work better as part of a team and automate the deployment and execution of your platform.
