You're reading from Hands-On Data Warehousing with Azure Data Factory

Product type: Book
Published in: May 2018
Publisher: Packt
ISBN-13: 9781789137620
Edition: 1st Edition
Authors (3):

Christian Cote

Christian Cote is an IT professional with more than 15 years of experience working on data warehouse, Big Data, and business intelligence projects. Christian has developed expertise in data warehousing and data lakes over the years and has designed many ETL/BI processes using a range of tools on multiple platforms. He has presented at several conferences and code camps. He currently co-leads a SQL Server PASS chapter. He is also a Microsoft Data Platform Most Valuable Professional (MVP).

Michelle Gutzait

Michelle Gutzait has been in IT for 30 years as a developer, business analyst, and database...

Giuseppe Ciaburro

Giuseppe Ciaburro holds a PhD and two master's degrees. He works at the Built Environment Control Laboratory, Università degli Studi della Campania "Luigi Vanvitelli". He has over 25 years of work experience in programming, first in the field of combustion and later in acoustics and noise control. His core programming knowledge is in MATLAB, Python, and R. As an expert in AI applications to acoustics and noise-control problems, Giuseppe has wide experience in research and teaching. He has several publications to his credit: monographs, scientific journals, and thematic conferences. He was recently included in the world's top 2% scientists list by Stanford University (2022).

Chapter 4. Azure Data Lake

One of the biggest problems that mid-sized and enterprise organizations face is that data resides everywhere. Over the years, data has usually been accumulated by different systems and by third-party or in-house-developed applications. Many vendors have required that their database servers be segregated in order to guarantee the performance, security, and manageability of their systems. In addition, third-party vendors did not, or do not, want to take responsibility for their systems in a shared environment.

Organizations are starting to realize, or are already in the process of realizing, that consolidation is a must, both from a cost perspective and for easier manageability. However, in many cases, the original vendors or developers are no longer to be found, which makes it very hard to decide whether to upgrade and/or migrate to the cloud. What can complicate things even further is the fact that shared or centralized data may be replicated everywhere, and there may not even be one...

Creating and configuring Data Lake Store


We will first create and configure the Data Lake Store:

  1. Open the Azure portal. If you are just starting, you will not see any resources configured under the All resources and ALL SUBSCRIPTIONS section.
  2. On the top left, click on Create a resource, then enter the words Data Lake in Search the Marketplace.
  3. Select Data Lake Store from the list (the third option in the image) if you have no Data Lake Stores yet; the following screen will open up.
  4. Select Create.
  5. Enter the details of the Data Lake Store. Note that the name has to be all lowercase, with no special characters; you will get a message as you type if you enter an invalid character. In this case, we are not using any encryption, for simplicity. Note that the default is encryption enabled. For more information about the encryption options, see Encryption of data in Azure Data Lake Store (https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-portal).
  6. Select Create.
  7. Once the Data...

Creating a Data Lake Analytics resource


In order to be able to run a U-SQL task or job, we need to create the Data Lake Analytics resource. In the Azure dashboard, click on New to create a new resource and look for the Data Lake Analytics resource in the new window:

Press Enter, and in the new window, click on Create:

Data Lake Analytics blade

Enter the name of the new resource (note that the resource name should contain only lowercase letters and numbers) and the rest of the information:

Click on the Data Lake Store section and choose the Data Lake Store we created previously:

Then click on Create:

Find the new resource to ensure it was created:

All resources blade

We have created the Data Lake Analytics resource, and now we can run U-SQL to manipulate or summarize data. We can run U-SQL either directly from the Data Lake Analytics resource, via a job, or from Data Factory in a pipeline.
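To give a feel for the language before we walk through the tooling, here is a minimal U-SQL sketch of the manipulate-and-output pattern. The file paths and column names (/input/SalesOrders.csv, OrderId, Customer, Amount) are hypothetical placeholders for illustration, not objects created earlier in this chapter:

    // Read a CSV file from the default Data Lake Store account.
    // The file path and columns are placeholders, not real objects.
    @orders =
        EXTRACT OrderId int,
                Customer string,
                Amount decimal
        FROM "/input/SalesOrders.csv"
        USING Extractors.Csv(skipFirstNRows: 1);

    // Keep only the larger orders.
    @bigOrders =
        SELECT OrderId, Customer, Amount
        FROM @orders
        WHERE Amount > 1000;

    // Write the result back to the store as a new CSV file.
    OUTPUT @bigOrders
    TO "/output/BigOrders.csv"
    USING Outputters.Csv(outputHeader: true);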

The next two sections will show you how to do the following:

  • Run U-SQL via a job in Data Lake Analytics...

Using the Data Factory to manipulate data in the Data Lake


In the previous section, we created the Data Lake Analytics resource for the U-SQL task:

  • Even though it is possible, it is not at all straightforward to run U-SQL that connects directly to a SQL database; it involves tweaking firewalls and permissions. This is why we do not cover that approach in the next section, which describes how to run a U-SQL job directly from the Data Lake Analytics resource.
  • It is much simpler to copy data from a SQL Server database to a file on Azure Blob Storage via the Azure Data Factory.
  • In this section, we show how to do this, and then how to manipulate the copied data with U-SQL using the Azure Data Factory.

We will now create a pipeline in Azure Data Factory that will do the following:

  • Task 1: Import data from SQL Server (from a view) into a file on blob storage
  • Task 2: Use U-SQL to export summary data to a file on blob storage

Task 1 – copy/import data from SQL Server to a blob storage file using Data Factory

Let's create...

Run U-SQL from a job in Data Lake Analytics


In this section, we will learn how to create a Data Lake Analytics job that will debug and run a U-SQL script. This job will summarize data from the file created by Task 1 in the preceding Data Factory pipeline (the task that imports SQL Server data into a blob file). The summary data will be copied to a new file on the blob storage.

With U-SQL, we can join different blob files and manipulate or summarize the data. We can also import data from different data sources. However, in this section, we will only provide a very basic U-SQL script as an example.
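As a sketch of what such a summarization script could look like: the storage account (myblobstore), container (data), file names, and columns below are assumptions for illustration, not the exact names used in the pipeline:

    // Read the file that the Data Factory copy task produced on blob storage.
    // The wasb:// path, container, account, and columns are placeholders.
    @orders =
        EXTRACT Customer string,
                Amount decimal
        FROM "wasb://data@myblobstore.blob.core.windows.net/orders.csv"
        USING Extractors.Csv(skipFirstNRows: 1);

    // Summarize: total order amount per customer.
    @summary =
        SELECT Customer,
               SUM(Amount) AS TotalAmount
        FROM @orders
        GROUP BY Customer;

    // Write the summary to a new file in the same blob container.
    OUTPUT @summary
    TO "wasb://data@myblobstore.blob.core.windows.net/orders-summary.csv"
    USING Outputters.Csv(outputHeader: true);

Note that the blob storage account must first be registered as a data source of the Data Lake Analytics account, which is exactly what the following steps do.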

Let's get started...

First, open the Data Lake Analytics resource from the dashboard. We need to add the Blob Storage account here; open Data sources:

Click on Add data source:

Fill in the details:

You should see the added blob storage in the list:

You can explore the containers and files in the blob storage from Data Lake Analytics | Data explorer:

Click on Data explorer:

In order to get the path...

Summary


In this chapter, we saw the components of Azure Data Lake and a basic implementation of those components.
