Chapter 2: Introducing ETL
When I first started in the data warehousing business, something like 20 years ago, I was asked in an interview to define ETL. Being at my first job interview, I had no clue what the interviewer meant by ETL. Luckily, the interviewer kind of liked me and hired me anyway. He told me that I would know all about ETL quite soon. Having been in the data warehousing business for many years, and more recently working as a data engineer, I can say that ETL is what has kept me busy most of the time since then.
ETL stands for Extract, Transform, and Load. It is a data movement technique that has been used in various forms since the inception of the first enterprise data warehouses.
Microsoft formalized the ETL concept near the end of the 1990s with a tool called Data Transformation Services (DTS). This ETL tool, aimed at helping database administrators load data into and out of SQL Server, used SQL and ActiveX to move and transform data on-premises.
Microsoft brought its ETL tool to the cloud with...
Creating a SQL Azure database
To do ETL, we need a source to query data from and a target (often called a sink) in which to land the transformed data. Our first recipe will create a simple SQL Azure database that will be used as both the source and sink in all the recipes of this chapter.
Getting ready
In Chapter 1, Getting Started with Azure and SSIS 2019, you were shown how to install SQL Server on-premises, SQL Server Management Studio (SSMS), and Visual Studio 2019 with the SSIS extension. This chapter will show you how to set up SQL Server in Azure, and the recipes will use this version of SQL Server.
How to do it…
In a browser, go to the Microsoft Azure portal using the following URL: https://portal.azure.com/#home.
On the main portal page, we'll create a resource group to group all the Azure services together, which makes them much easier to find and manage later; a scripted alternative for creating the database itself is sketched after these steps:
- From the main portal page, click on the + sign to create a new resource, as shown in the...
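If you prefer scripting to clicking through the portal, the database itself can also be created with T-SQL once a logical server exists. The following is only a sketch: it assumes the server and its firewall rules are already in place, it must run while connected to that server's master database, and unlike the portal flow it creates an empty database rather than one preloaded with the AdventureWorksLT sample (the sample itself is chosen in the portal when creating the database). The EDITION and SERVICE_OBJECTIVE values are assumptions; pick the tier that fits your needs.

```sql
-- Sketch: run this while connected to the logical server's master database.
-- Creates an empty database; the AdventureWorksLT sample data is selected
-- through the portal flow described above, not through this statement.
CREATE DATABASE AdventureWorksLT
(
    EDITION = 'Basic',           -- assumed tier; adjust as needed
    SERVICE_OBJECTIVE = 'Basic'
);
```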
Connecting SQL Management Studio to Azure SQL
This recipe will show you how to connect to our Azure SQL database from your computer.
Getting ready
This recipe assumes that you have installed SSMS.
How to do it…
This is a very short and simple recipe! Let's begin by opening SSMS:
- We're going to connect to the Azure database that we created in the previous recipe, Creating a SQL Azure database. Log into your Azure subscription and navigate to the ETLInAzureCookBook resource group.
- Click on the AdventureWorksLT SQL database to open its blade. Copy the server name found at the top right as shown in the following screenshot:
Figure 2.14 – Get the server name from the Azure subscription
- Using SSMS, log in to the server using the username and password that we used in Step 7 of the previous recipe, Creating a SQL Azure database. Once connected, the Object Explorer shows the database and its related objects. Expanding the Tables folder lists...
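Once connected, a quick sanity check can confirm that you've reached the right server and database. This is just an illustrative query; it assumes nothing beyond the AdventureWorksLT sample database being in place:

```sql
-- Confirm which server, database, and login the connection is using.
SELECT @@SERVERNAME  AS ServerName,
       DB_NAME()     AS DatabaseName,
       SUSER_SNAME() AS LoginName;

-- List the sample tables in the SalesLT schema.
SELECT s.name AS SchemaName, t.name AS TableName
FROM sys.tables AS t
INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = N'SalesLT'
ORDER BY t.name;
```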
Creating a simple ETL package
This recipe will show you how to build a simple ETL package that does the following:
- Extract data from the AdventureWorksLT SalesLT.Customer table.
- Transform the data: concatenate all customer-name-related information into a single column called FullName.
- Load the data into a new table called SalesLT.CustomerFullName.
The table we create will be used in the final recipe in this chapter, Loading data before its transformation, to explain ELT.
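To make the transform step concrete before we build it in SSIS, here is a T-SQL sketch of what the package amounts to. It assumes the standard AdventureWorksLT name columns (Title, FirstName, MiddleName, LastName, Suffix); the exact expression the package uses may differ, and CONCAT_WS is used here simply because it skips NULL name parts:

```sql
-- Conceptual equivalent of the package: extract the name columns,
-- derive FullName, and load the result into a new table.
SELECT CustomerID,
       CONCAT_WS(N' ', Title, FirstName, MiddleName, LastName, Suffix) AS FullName
INTO SalesLT.CustomerFullName
FROM SalesLT.Customer;
```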
Getting ready
This recipe assumes that you have installed SSMS, Visual Studio 2019, and SSIS.
How to do it…
Let's begin by opening Visual Studio 2019:
- Select Create a new project from the main screen.
- Select Integration Services Project from the screen that appears, as shown in the following screenshot:
Figure 2.16 – Visual Studio – creating a new Integration Services Project
- As shown in the following screenshot, name the project AdventureWorksETL...
Loading data before its transformation
ELT is very similar to ETL, but with a crucial difference: the order of the transform and load steps is inverted. This is very useful with big data in the cloud, or when we do not have an ETL tool on-premises. This recipe will be much simpler than the previous one, as we'll implement ELT using a database, so no tools are involved here except for calling the ELT task.
It also relies on the previous recipe, Creating a simple ETL package, since we're going to use the SalesLT.CustomerFullName table data to implement the ELT pattern.
There are essentially two parts to this recipe:
- Extract and load data into our data lake. Here, we don't have a real data lake; we're using AdventureWorksLT on Azure to mimic the data lake concept.
- Transform the data inside the database using a simple SQL script. We're going to add the FullName column to the SalesLT.Customer table and update it with a script like the sketch that follows.
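As a sketch of that transform step, assuming the SalesLT.CustomerFullName table created in the previous recipe carries CustomerID and FullName columns, the script amounts to the following:

```sql
-- Add the target column, then fill it in place from the loaded data.
-- The column length is an assumption; size it to your data.
ALTER TABLE SalesLT.Customer ADD FullName nvarchar(300) NULL;
GO

UPDATE c
SET c.FullName = cf.FullName
FROM SalesLT.Customer AS c
INNER JOIN SalesLT.CustomerFullName AS cf
    ON cf.CustomerID = c.CustomerID;
```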