Chapter 5: Control Flow Transformation and the Copy Data Activity in Azure Data Factory

In this chapter, we'll look at the transformation activities available in Azure Data Factory control flows. Transformation activities allow us to perform data transformations within the pipeline before the data is loaded into the destination.

In this chapter, we'll cover the following recipes:

  • Implementing HDInsight Hive and Pig activities
  • Implementing an Azure Functions activity
  • Implementing a Data Lake Analytics U-SQL activity
  • Copying data from Azure Data Lake Gen2 to an Azure Synapse SQL pool using the copy activity
  • Copying data from Azure Data Lake Gen2 to Azure Cosmos DB using the copy activity

Technical requirements

For this chapter, the following are required:

  • A Microsoft Azure subscription
  • PowerShell 7
  • Microsoft Azure PowerShell

Implementing HDInsight Hive and Pig activities

Azure HDInsight is a managed cloud service that lets you create big data clusters running Apache Hadoop, Spark, and Kafka to process big data. We can also scale the clusters up or down as required.

Apache Hive, built on top of Apache Hadoop, facilitates querying big data on Hadoop clusters using a SQL-like syntax (HiveQL). Using Hive, we can read files stored in the Hadoop Distributed File System (HDFS) as external tables. We can then apply transformations to the tables and write the data back to HDFS as files.

Apache Pig, also built on top of Apache Hadoop, provides a scripting language (Pig Latin) for performing Extract, Transform, and Load (ETL) operations on big data. Using Pig, we can read, transform, and write data stored in HDFS.

In this recipe, we'll use Azure Data Factory, HDInsight Hive, and Pig activities to read data from Azure Blob storage, aggregate the data, and write it back to Azure Blob storage.
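
The recipe's pipeline is built in the steps that follow. Purely as a hedged illustration of the kind of Hive aggregation involved, the following PowerShell sketch submits a HiveQL query directly to an existing HDInsight cluster using the Az.HDInsight module; the cluster name, credentials, and the orders table are placeholder assumptions, not objects created in this recipe:

    # Hypothetical cluster name and credentials - replace with your own values
    $clusterName = "packtadehdinsight"
    $clusterCred = Get-Credential -Message "HDInsight cluster (HTTP) login"

    # HiveQL aggregation over an assumed external table named orders
    $hiveQuery = "SELECT country, SUM(amount) AS totalsales FROM orders GROUP BY country;"

    # Define the Hive job, submit it to the cluster, and wait for it to finish
    $hiveJob = New-AzHDInsightHiveJobDefinition -Query $hiveQuery
    $job = Start-AzHDInsightJob -ClusterName $clusterName -JobDefinition $hiveJob -HttpCredential $clusterCred
    Wait-AzHDInsightJob -ClusterName $clusterName -JobId $job.JobId -HttpCredential $clusterCred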

Getting ready

...

Implementing an Azure Functions activity

Azure Functions is a serverless compute service that lets us run code without provisioning or managing virtual machines or containers. In this recipe, we'll implement an Azure Functions activity to run an Azure function that resumes an Azure Synapse SQL database.
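
The Azure function itself is created in the steps that follow. As a rough sketch of what such a function's body could do, the following PowerShell resumes a paused Azure Synapse SQL database (a dedicated SQL pool, formerly SQL Data Warehouse) using the Az.Sql module; the resource names are placeholder assumptions, not the ones used in this recipe:

    # Hypothetical resource names - replace with your own values
    $resourceGroupName = "packtade"
    $serverName        = "packtadesqlserver"
    $databaseName      = "packtadesqlpool"

    # Resume the Synapse SQL database only if it is currently paused
    $pool = Get-AzSqlDatabase -ResourceGroupName $resourceGroupName -ServerName $serverName -DatabaseName $databaseName
    if ($pool.Status -eq "Paused") {
        Resume-AzSqlDatabase -ResourceGroupName $resourceGroupName -ServerName $serverName -DatabaseName $databaseName
    }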

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the Connect-AzAccount command to log in to your Azure account from PowerShell.
  3. You will need an existing Data Factory account. If you don't have one, create one by executing the ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1 PowerShell script.

How to do it…

Let's start by creating an Azure function to resume an Azure Synapse SQL database:

  1. In the Azure portal, type functions in the Search box and select Function App from the search results:

    Figure 5.13 –...

Implementing a Data Lake Analytics U-SQL activity

Azure Data Lake Analytics is an on-demand analytics service that allows you to process data using U-SQL, R, and Python without provisioning any infrastructure. All we need to do is upload the data to the Data Lake store, provision a Data Lake Analytics account, and run U-SQL jobs to process the data.

In this recipe, we'll implement a Data Lake Analytics U-SQL activity to calculate total sales by country from the orders data stored in the Data Lake store.
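
The Data Factory pipeline for this recipe is built in the steps that follow. As a hedged sketch of what the same job could look like when submitted from PowerShell instead, the following uses the Az.DataLakeAnalytics module to run a small U-SQL script that aggregates sales by country; the account name, file paths, and column names are assumptions for illustration only:

    # Hypothetical Data Lake Analytics account name - replace with your own value
    $adlaAccountName = "packtadeadla"

    # U-SQL script (assumed schema and paths): read the orders file,
    # aggregate sales by country, and write the result back to the store
    $usqlScript = '
        @orders =
            EXTRACT Country string, Amount decimal
            FROM "/orders/orders.csv"
            USING Extractors.Csv(skipFirstNRows:1);

        @totals =
            SELECT Country, SUM(Amount) AS TotalSales
            FROM @orders
            GROUP BY Country;

        OUTPUT @totals
        TO "/output/totalsalesbycountry.csv"
        USING Outputters.Csv(outputHeader:true);
    '

    # Submit the job and wait for it to complete
    $job = Submit-AzDataLakeAnalyticsJob -Account $adlaAccountName -Name "TotalSalesByCountry" -Script $usqlScript -DegreeOfParallelism 1
    Wait-AzDataLakeAnalyticsJob -Account $adlaAccountName -JobId $job.JobId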

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the Connect-AzAccount command to log in to your Azure account from PowerShell.
  3. You will need an existing Data Factory account. If you don't have one, create one by executing the ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1 PowerShell script.

How to do it…

Let's...

Copying data from Azure Data Lake Gen2 to an Azure Synapse SQL pool using the copy activity

The copy activity, as the name suggests, is used to copy data quickly from a source to a destination. In this recipe, we'll learn how to use the copy activity to copy data from Azure Data Lake Gen2 to an Azure Synapse SQL pool.
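
Once the pipeline has been created, it can be triggered and monitored from PowerShell as well as from the Data Factory monitoring UI. The following is a minimal sketch using the Az.DataFactory module; the resource group, data factory, and pipeline names are placeholder assumptions, not the ones created in this recipe:

    # Hypothetical names - replace with your own values
    $resourceGroupName = "packtade"
    $dataFactoryName   = "packtadedatafactory"
    $pipelineName      = "CopyOrdersToSynapse"

    # Trigger the pipeline and poll the run until it finishes
    $runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -PipelineName $pipelineName
    do {
        Start-Sleep -Seconds 30
        $run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -PipelineRunId $runId
        Write-Output "Pipeline run status: $($run.Status)"
    } while ($run.Status -in "Queued", "InProgress")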

Getting ready

Before you start, do the following:

  1. Log in to Azure from PowerShell. To do this, execute the Connect-AzAccount command and follow the instructions to log in to Azure.
  2. Open https://portal.azure.com and log in using your Azure credentials.

How to do it…

Follow these steps to perform the activity:

  1. The first step is to create a new Azure Data Lake Gen2 storage account and upload the data. To create the storage account and upload the data, execute the following PowerShell command:
    .\ADE\azure-data-engineering-cookbook\Chapter04\1_UploadOrderstoDataLake.ps1 -resourcegroupname packtade -storageaccountname packtdatalakestore...

Copying data from Azure Data Lake Gen2 to Azure Cosmos DB using the copy activity

In this recipe, we'll copy data from Azure Data Lake Gen2 to an Azure Cosmos DB SQL API database. Azure Cosmos DB is a managed NoSQL database service that offers multiple APIs for storing data, such as SQL (formerly DocumentDB), MongoDB, Gremlin (graph), Cassandra, and Table.
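
The Cosmos DB target for this recipe is created in the steps that follow. As a hedged alternative, a SQL API account, database, and container can also be provisioned from PowerShell with the Az.CosmosDB module; the names, partition key, and throughput below are assumptions for illustration, not the values used in this recipe:

    # Hypothetical names - replace with your own values
    $resourceGroupName = "packtade"
    $accountName       = "packtadecosmosdb"
    $location          = "East US"

    # Create a Cosmos DB account with the SQL (Core) API,
    # then a database and a container partitioned by an assumed /country key
    New-AzCosmosDBAccount -ResourceGroupName $resourceGroupName -Name $accountName -Location $location -ApiKind Sql
    New-AzCosmosDBSqlDatabase -ResourceGroupName $resourceGroupName -AccountName $accountName -Name "orders"
    New-AzCosmosDBSqlContainer -ResourceGroupName $resourceGroupName -AccountName $accountName -DatabaseName "orders" -Name "orderdetails" -PartitionKeyKind Hash -PartitionKeyPath "/country" -Throughput 400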

Getting ready

Before you start, do the following:

  1. Log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:
    Connect-AzAccount
  2. Open https://portal.azure.com and log in using your Azure credentials.
  3. Follow step 1 of the Copying data from Azure Data Lake Gen2 to an Azure Synapse SQL pool using the copy activity recipe to create and upload files to Azure Data Lake Storage Gen2.

To copy data from Azure Data Lake Storage Gen2 to a Cosmos DB SQL API, we'll do the following:

  1. Create and upload data to the Azure Data Lake Storage...