You're reading from Hands-On Data Warehousing with Azure Data Factory

Product type: Book
Published in: May 2018
Publisher: Packt
ISBN-13: 9781789137620
Edition: 1st
Authors (3):
Christian Cote

Christian Cote is an IT professional with more than 15 years of experience working in a data warehouse, Big Data, and business intelligence projects. Christian developed expertise in data warehousing and data lakes over the years and designed many ETL/BI processes using a range of tools on multiple platforms. He's been presenting at several conferences and code camps. He currently co-leads the SQL Server PASS chapter. He is also a Microsoft Data Platform Most Valuable Professional (MVP).

Michelle Gutzait

Michelle Gutzait has been in IT for 30 years as a developer, business analyst, and database

Giuseppe Ciaburro

Giuseppe Ciaburro holds a PhD and two master's degrees. He works at the Built Environment Control Laboratory - Università degli Studi della Campania "Luigi Vanvitelli". He has over 25 years of work experience in programming, first in the field of combustion and then in acoustics and noise control. His core programming knowledge is in MATLAB, Python and R. As an expert in AI applications to acoustics and noise control problems, Giuseppe has wide experience in researching and teaching. He has several publications to his credit: monographs, scientific journals, and thematic conferences. He was recently included in the world's top 2% scientists list by Stanford University (2022).


Chapter 2. Getting Started with Our First Data Factory

OK, enough talk, let's get our hands dirty. In this chapter, we will cover the basics of the data factory and show what the components discussed in the previous chapter actually do.

Here is what's needed for this chapter, as well as for all the others in this book:

  • An active Azure subscription: if you don't have one, you can sign up for a free trial at https://azure.microsoft.com/en-in/free/.
  • Once we have an Azure subscription, we will create the following Azure components:
    • A resource group that will group everything we're going to create together
    • An ADF with a copy activity
    • An Azure Blob storage
    • A SQL Azure database

Resource group


To create a resource group, we need to log on to the Azure portal by typing the following URL in a browser: https://portal.azure.com.

From the resource panel, click on the + New icon and enter resource group in the search box, as shown in the following screenshot:

Clicking on the search results opens the Resource group blade. We simply need to click Create at the bottom of the screen.

As shown in the following screenshot, we'll fill the textboxes with the following properties:

  • Resource group name: ADFV2Book.
  • Subscription: Should be filled with the active subscription used to create the resource group.
  • Resource group location: We can choose any location we want. The location matters little for the resource group itself, but for compliance reasons, we should use the region where all our other Azure artifacts will be created.

Once we click on Create at the bottom of the blade, we get the Resource group created notification, as shown in the following screenshot:

We can pin it to the dashboard if needed...
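
For readers who prefer scripting these portal steps, here is a minimal sketch of the same resource group creation using the Azure SDK for Python. The packages (azure-identity, azure-mgmt-resource), the eastus region, and the placeholder subscription ID are illustrative assumptions, not part of the book's walkthrough:

```python
# Minimal sketch: create the ADFV2Book resource group with the Azure SDK for Python.
# Assumes `pip install azure-identity azure-mgmt-resource` and an authenticated
# session (for example, after running `az login`).
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"  # placeholder: use your own subscription
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# create_or_update is idempotent: it creates the group, or updates it if it exists.
group = client.resource_groups.create_or_update(
    "ADFV2Book",
    {"location": "eastus"},  # assumed region; pick the one you'll use everywhere
)
print(group.name, group.location)
```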

Azure Data Factory


Now, we'll create the factory. The goal of the exercise is to copy data from a SQL Server table and load it into Azure Blob storage.

Log in to the Azure portal (https://portal.azure.com). From the resource panel, click on the + New icon, then click on Data + Analytics and select Data Factory, as shown in the following screenshot:

The New data factory blade opens. As shown in the following screenshot, fill the textboxes with the following values:

  • Name: The name of the factory is registered in DNS, so it must be globally unique. To make it unique, we can prefix it with our initials. There are naming rules for data factories, which can be found at https://docs.microsoft.com/en-us/azure/data-factory/naming-rules.
  • Subscription: Should be filled with the active subscription used to create the data factory.
  • Resource Group: We're using the resource group created earlier in this chapter.
  • Version: Since this book covers V2, we'll use V2 of the data factory. At...
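
As a companion to the portal steps above, here is a hedged sketch that creates a V2 factory with the Azure SDK for Python. The azure-mgmt-datafactory package and the factory name ccADFV2Book are assumptions; substitute your own unique name:

```python
# Minimal sketch: create a Data Factory (V2) in the ADFV2Book resource group.
# Assumes `pip install azure-identity azure-mgmt-datafactory`.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"  # placeholder
client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# The factory name is part of a global namespace, hence the initials prefix.
factory = client.factories.create_or_update(
    resource_group_name="ADFV2Book",
    factory_name="ccADFV2Book",          # hypothetical; choose a unique name
    factory=Factory(location="eastus"),  # same region as the resource group
)
print(factory.name, factory.provisioning_state)
```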

Azure Blob storage


At the heart of the Azure ecosystem is the blob storage service. This service, as its name indicates, provides file storage. There are several types of blobs, and they are not all created equal. The next few sections describe the different blob types and their usage.

Blob containers

Blob containers are the first level of the blob storage hierarchy. Every blob must reside in a container, which acts as its root, much like a folder in a standard operating system.
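
To make the folder analogy concrete, here is a small sketch that creates a container programmatically; the azure-storage-blob package, the connection string placeholder, and the container name are assumptions for illustration:

```python
# Minimal sketch: create a blob container, the root of the blob hierarchy.
# Assumes `pip install azure-storage-blob`.
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-connection-string>"  # placeholder
service = BlobServiceClient.from_connection_string(conn_str)

# Like creating a top-level folder; blobs can then be placed inside it.
container = service.create_container("mycontainer")  # hypothetical name
print(container.container_name)
```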

Types of blobs


There are several types of blobs, each suited to a different usage. The next sections briefly describe the various blob types. For more information, please see the following link: https://docs.microsoft.com/en-us/azure/storage/.

Block blobs

Block blobs are used in most Azure data transfers and are optimized for storing application files (CSV, ZIP, and so on). Note that a general-purpose storage account also exposes separate table and queue services: tables serve NoSQL-style applications, while queues support messaging between services such as Azure ML (short for Azure Machine Learning) pipelines. Throughout the examples in this book, we'll use block blobs as storage for some of our data transfers.
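
For illustration, here is a sketch of a typical block blob transfer using the azure-storage-blob package (an assumption; the book itself drives these transfers through the data factory). The file and blob names are hypothetical:

```python
# Minimal sketch: upload a local CSV file as a block blob.
# Assumes `pip install azure-storage-blob`.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
blob = service.get_blob_client(container="mycontainer", blob="data/sample.csv")

# upload_blob writes a block blob by default; overwrite=True replaces any old copy.
with open("sample.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True)
```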

Page blobs

Page blobs are optimized for random read and write access to large files. Azure Virtual Machines (VMs) use this type of blob to store their disk images (VHDs).

Replication of storage

Replication determines how blobs are copied across hardware to ensure the safety of their contents in case of hardware failure. When we create a storage account, one of the options we have to select is the replication type:

  • LRS (short for Locally Redundant Storage): This storage replicates...
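
To show where the replication choice surfaces outside the portal, here is a sketch that creates a storage account programmatically; the azure-mgmt-storage package, the eastus region, and the account name adfv2book are assumptions. Note that the SKU name carries the replication type:

```python
# Minimal sketch: create a storage account with locally redundant storage (LRS).
# Assumes `pip install azure-identity azure-mgmt-storage`.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

client = StorageManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

params = StorageAccountCreateParameters(
    location="eastus",
    kind="StorageV2",
    sku=Sku(name="Standard_LRS"),  # the replication type lives in the SKU name
)
poller = client.storage_accounts.begin_create("ADFV2Book", "adfv2book", params)
account = poller.result()  # account creation is a long-running operation
print(account.name, account.sku.name)
```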

SQL Azure database


We'll now set up the database our factory will copy data from. The Wide World Importers sample database is available at: https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0.

A BACPAC is a file that contains database structures and data, similar to a database backup. The difference is that a BACPAC is a snapshot of the database at a specific point in time, whereas a backup offers more: with log backups, the database can be restored to within the last few seconds. A backup can also be incremental, containing only the data and structures changed since the last backup, while a BACPAC always contains all the data.

The version we're using is the standard one, as shown in the following screenshot:

We'll now upload the BACPAC to the storage that we created in the previous section:

  1. Open Microsoft Azure Storage Explorer and right-click on the adfv2book storage account to create a container called database-bacpac, as shown in the following screenshot:

  2. Now, click Upload...
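
If you'd rather script this upload than click through Storage Explorer, here is a sketch using the azure-storage-blob package; the connection string placeholder and the BACPAC file name are assumptions based on the standard sample download:

```python
# Minimal sketch: create the database-bacpac container and upload the BACPAC.
# Assumes `pip install azure-storage-blob`.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
container = service.create_container("database-bacpac")  # errors if it already exists

# Assumed file name for the standard Wide World Importers BACPAC download.
with open("WideWorldImporters-Standard.bacpac", "rb") as data:
    container.upload_blob(name="WideWorldImporters-Standard.bacpac", data=data)
print("BACPAC uploaded")
```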

Summary


In this chapter, we created our factory with a pipeline that copies data from an Azure SQL database to an Azure storage account. The following chapters will dig deeper into the various services available in Azure, as well as how we can integrate an existing SSIS solution into the factory.
