You're reading from Data Engineering with dbt

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781803246284
Edition: 1st Edition
Author (1)
Roberto Zagni

Roberto Zagni is a senior leader with extensive hands-on experience in data architecture, software development, and agile methodologies. Roberto is an Electronic Engineer by training with a special interest in bringing software engineering best practices to cloud data platforms and growing great teams that enjoy what they do. He has been helping companies to make better use of their data, and now to transition to cloud-based Data Automation with an agile mindset and proper software engineering tools and processes, aka DataOps. Roberto also coaches data teams hands-on in practical data architecture and the use of patterns, testing, version control, and agile collaboration. Since 2019, his go-to tools have been dbt, dbt Cloud, and Snowflake or BigQuery.

Setting Up Your dbt Cloud Development Environment

In this chapter, we will start to work hands-on with dbt.

We will start by setting up a development environment, first creating a free GitHub account with a repository for your dbt project, then creating a free dbt Cloud account to host your first project. We will connect the dbt project to the Snowflake account that we created in the first chapter.

Once we have dbt Cloud up and running, we will have a look at the differences between dbt Cloud and the open-source version, dbt Core. In the rest of the book, we will use dbt Cloud as it requires no installation and offers the same experience independent of your operating system, plus a host of extra services and functionalities.

In this chapter, you will start to learn about the data engineering workflow when working with dbt and why Version Control (VC) is important.

We will close the chapter by experimenting with some SQL we saw in the first chapter and then by looking at...

Technical requirements

This chapter assumes only basic SQL knowledge, which you can get from Chapter 1 if you are new to it.

You will only need an email address to create a free GitHub account and a free dbt Cloud account.

Setting up your GitHub account

GitHub is an online service that offers free and paid access to VC based on the git Version Control System (VCS).

In this section, we will briefly introduce what a VCS is and why we need one, then guide you through creating an account on GitHub and, finally, setting up a repository to hold your first dbt project.

Introducing Version Control

Modern code development is based on VC, which is the ability to store source code on a central server so that multiple developers can retrieve the code, work on it, and send back a newer version to be stored. VC allows multiple people to collaborate on the same code base.

The main functionality of a VCS is to store all the different versions of the code in the order they are produced and to let you go back and forth to any version of any file at any point in time. VC also makes it easy to identify the changes between one version and the next.
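To make the idea of identifying the changes between two versions concrete, here is a minimal sketch in Python that uses the standard library difflib module. It is only an illustration of what a diff is, not how git stores or compares versions; the file name customers.sql and both versions of its content are hypothetical.

```python
import difflib

# Two hypothetical versions of the same SQL file, as a VCS might store them.
version_1 = """select
    id,
    name
from customers
""".splitlines(keepends=True)

version_2 = """select
    id,
    name,
    country
from customers
where id is not null
""".splitlines(keepends=True)


def changes_between(old, new, name="customers.sql"):
    """Return the unified diff between two versions of a file as one string."""
    diff = difflib.unified_diff(
        old,
        new,
        fromfile=f"{name} (version 1)",
        tofile=f"{name} (version 2)",
    )
    return "".join(diff)


diff_text = changes_between(version_1, version_2)
print(diff_text)
```

Lines starting with a minus sign exist only in the older version, while lines starting with a plus sign exist only in the newer one. A VCS shows you the same kind of information when you compare two versions of a file.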

A key functionality...

Setting up your dbt Cloud account

In this section, we will guide you through how to create a dbt Cloud account and then create your first dbt project using the repository you just created on GitHub.

While creating the first dbt project, we will also configure the integration between dbt Cloud and GitHub, so the process will have a few more steps that only need to be carried out once.

Signing up for a dbt Cloud account

You can easily sign up for a free dbt Cloud trial account, and after the initial trial period of 15 days, you will be able to choose between a paid Team plan and the forever free Developer plan.

Let’s get started:

  1. Go to the dbt home page at https://www.getdbt.com/ and click one of the two Start free buttons.

Figure 2.10: dbt home page

  2. Fill in your details in the form you land on and then click the Create my account button.

Figure 2.11: dbt subscription form

  3. You will...

Comparing dbt Core and dbt Cloud workflows

There are two dbt versions that you can decide to use:

  • dbt Core: This is open source software created by dbt Labs, developed in Python, that you can freely download and use locally from the command line on many operating systems, such as Windows, Mac, and Linux. It provides all the core functionalities of dbt and can also be used for commercial projects.
  • dbt Cloud: This is commercial software created by dbt Labs that offers a Software-as-a-Service (SaaS) experience. It includes all the functionalities of dbt Core, wrapped in a web interface that makes them easier to use, and adds many features that are useful when running a real-life data project.

The dbt Core open source product offers all the features needed to run a full dbt project, with no differences or limitations in the core functionalities with respect to the dbt Cloud product, to the point that you can execute the same project under any of...

Experimenting with SQL in dbt Cloud

Now that you have dbt Cloud set up and connected to your Snowflake data platform, you can use the dbt IDE to issue SQL commands and see the results. The Preview feature allows you to easily test models while you are developing them, but it can also be used to just run any SQL on your DB.

In this section, we are going to run some of the examples we saw in the previous chapter so that you can get used to the dbt interface and experiment with SQL if you are not familiar with it.

Exploring the dbt Cloud IDE

Let’s start by looking at the dbt Cloud IDE. The default layout of the IDE is vertically divided in two; on the left, you have the VC area and below it a folder structure that is used to organize the files of your dbt project, while on the right, there are the main editing and result areas.

The main area of the screen is again divided in two; at the top, there is the editor, where you edit the source code of your models, while in...

Introducing the source and ref dbt functions

You have seen that in the dbt Cloud IDE, you can write any SQL and use the Preview button to execute it on the DB configured for your project. This is handy when you are exploring a dataset or perfecting a query, but it is just the tip of the iceberg.

In this section, we will look at the dbt default project and you will learn about the source and ref functions that are at the real core of how dbt works.
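As a rough intuition for ref before looking at the project: a model refers to another model by name, and dbt turns that name into the actual database object while recording that one model depends on the other. The Python sketch below imitates only that idea with a regular expression. It is a toy under stated assumptions, not dbt's implementation: the compile_model function, the MY_DB and MY_SCHEMA names, and the one-line model query are all hypothetical.

```python
import re

# Hypothetical target database and schema; in a real project these come from
# the connection and environment settings, not from the model code.
DATABASE = "MY_DB"
SCHEMA = "MY_SCHEMA"

# Matches {{ ref('model_name') }} written with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def compile_model(sql):
    """Replace each ref() call with a qualified name and collect dependencies."""
    dependencies = []

    def replace(match):
        model_name = match.group(1)
        dependencies.append(model_name)
        return f"{DATABASE}.{SCHEMA}.{model_name}"

    compiled_sql = REF_PATTERN.sub(replace, sql)
    return compiled_sql, dependencies


# An illustrative model that selects from another model by name.
model_sql = "select * from {{ ref('my_first_dbt_model') }} where id = 1"
compiled_sql, dependencies = compile_model(model_sql)
print(compiled_sql)
print(dependencies)
```

The point of the sketch is the separation of concerns: the model code names what it depends on, and the tool decides where that object lives and in which order things must be built.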

Exploring the dbt default model

Let’s list what the dbt default project contains:

  • README.md: This is a text file with some instructions and pointers to the dbt documentation
  • dbt_project.yml: The main configuration file
  • .gitignore: A git-specific file that lists resources to exclude from VC, such as the dbt_packages, target, and logs folders
  • Inside the models/example folder, we have two models and a config file:
    • my_first_dbt_model.sql: As the name suggests, this is the first model, which is just made up of...

Summary

Congratulations!

In this chapter, you have completed the setup of a world-class environment completely in the cloud and you have started to use dbt Cloud to edit and deploy your project to your DB.

You opened a GitHub account and created your first repository, then created your dbt Cloud account. Then, we walked through creating your first dbt project by connecting dbt Cloud to the GitHub account you just created and to the Snowflake account you created in the first chapter.

We then compared the dbt Core and dbt Cloud workflows and started to learn how to use the dbt Cloud IDE by trying out some SQL code from the first chapter.

In the last part of this chapter, you explored the default dbt project, learning about the ref and source functions, and how to run, test, and edit the models of a dbt project.

In the next chapter, Data Modeling for Data Engineering, you will learn the basics of data modeling and how to represent data models.

Further reading

If you want to learn more about git and how it works, the best place to start is the git website at https://git-scm.com/. You can even download a full book on git from there.

In this chapter, we have used the dbt default project to get you acquainted with both the dbt Cloud IDE and how dbt works, but there is much more to say about it, so you may be interested in the following sources:
