
You're reading from Azure for Architects - Second Edition

Product type: Book
Published in: Jan 2019
Publisher: Packt
ISBN-13: 9781789614503
Edition: 2nd Edition
Author (1)
Ritesh Modi

Ritesh Modi is a technologist with more than 18 years of experience. He holds a master's degree in science in AI/ML from LJMU. He has been recognized as a Microsoft Regional Director for his contributions to building tech communities, products, and services. He has published more than 10 tech books in the past and is a cloud architect, speaker, and leader who is popular for his contributions to data centers, Azure, Kubernetes, blockchain, cognitive services, DevOps, AI, and automation.

Azure Big Data Solutions Using Azure Data Lake Storage and Data Factory

Big data has gained significant traction in the last few years. Specialized tools, software, and storage are required to handle it. These tools, platforms, and storage options were not available as services a few years ago. However, with cloud technology, Azure now provides numerous tools, platforms, and resources to create big data solutions easily.

The following topics will be covered in this chapter:

  • Data integration
  • Extract-Transform-Load (ETL)
  • Data Factory
  • Data Lake Storage
  • Migrating data from Azure Storage to Data Lake Storage

Data integration

We are all well aware of how integration patterns are used for applications: applications composed of multiple services are integrated using a variety of patterns. However, there is another paradigm that many organizations require, known as data integration. The need for it has grown especially over the last decade, as the generation and availability of data have become incredibly high. The velocity, variety, and volume of data being generated have increased drastically, and there is data almost everywhere.

Every organization has many different types of applications, and they all generate data in their own proprietary format. Often, data is also purchased from the marketplace. Even during mergers and amalgamations of organizations, data needs to be migrated and combined.

Data integration refers to the process of bringing data from multiple sources...

ETL

A very popular process known as ETL helps in building a target data source to house data that is consumable by applications. Generally, the data is in a raw format, and to make it consumable, the data should go through the following three distinct phases:

  • Extract: During this phase, data is extracted from multiple places. There could be multiple sources, and each of them needs to be connected to in order to retrieve the data. The extract phase typically uses data connectors that hold connection information for a data source. It might also use temporary storage to stage data pulled from the source for faster retrieval. This phase is responsible for the ingestion of data.
  • Transform: The data that is available after the extract phase might not be consumable directly by applications. This could be for a variety of reasons. The data might have irregularities...
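The three phases can be sketched with plain Python, no Azure services required. This is a minimal, illustrative pipeline over an in-memory CSV feed; the data, field names, and target store are all hypothetical stand-ins for a real source and warehouse:

```python
import csv
import io

# Hypothetical raw feed: in practice this would come from a database,
# an API, or files landed in blob storage.
RAW_CSV = """id,name,amount
1, Alice ,100
2,Bob,
3, Carol ,250
"""

def extract(source: str) -> list:
    """Extract phase: pull raw rows from the source via a connector."""
    return list(csv.DictReader(io.StringIO(source)))

def transform(rows: list) -> list:
    """Transform phase: fix irregularities -- trim whitespace,
    default missing amounts to zero, and convert types."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"].strip(),
            "amount": int(row["amount"] or 0),
        })
    return cleaned

def load(rows: list, target: list) -> None:
    """Load phase: write the consumable rows to the target store."""
    target.extend(rows)

warehouse = []
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse[1])  # {'id': 2, 'name': 'Bob', 'amount': 0}
```

The point of the sketch is the separation of concerns: each phase has a single job, so sources, cleansing rules, and targets can be swapped independently.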

A primer on Data Factory

Data Factory is a fully managed, highly available, highly scalable, and easy-to-use tool for creating integration solutions and implementing ETL phases. Data Factory helps create new pipelines in a drag-and-drop fashion using a user interface, without writing any code; however, it still provides the option to write code in your preferred language.

There are a few important concepts to learn about before using the Data Factory service, which we will be looking into in the following sections:

  • Activities: Activities are individual tasks that enable the execution and processing of logic within a Data Factory pipeline. There are multiple types of activities. There are activities related to data movement, data transformation, and control activities. Each activity has a policy through which it can decide the retry mechanism and retry interval.
  • Pipelines: Pipelines...
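The relationship between activities, their retry policy, and pipelines can be modelled with a short, framework-free sketch. The class names and policy fields below are illustrative only; they are not the Data Factory API, just a toy model of the concepts described above:

```python
import time

class Activity:
    """One unit of work, with a simple retry policy (retry count and
    retry interval) mirroring the activity policy described above."""
    def __init__(self, name, run, retries=2, retry_interval=0.0):
        self.name = name
        self.run = run                    # callable holding the task logic
        self.retries = retries
        self.retry_interval = retry_interval

    def execute(self):
        last_error = None
        for _attempt in range(self.retries + 1):
            try:
                return self.run()
            except Exception as err:      # transient failure: wait, then retry
                last_error = err
                time.sleep(self.retry_interval)
        raise RuntimeError(
            f"{self.name} failed after {self.retries + 1} attempts"
        ) from last_error

class Pipeline:
    """A pipeline groups activities and runs them in order."""
    def __init__(self, activities):
        self.activities = activities

    def run(self):
        return [activity.execute() for activity in self.activities]

# A flaky task that fails once, then succeeds -- the retry policy absorbs it.
attempts = {"copy": 0}
def flaky_copy():
    attempts["copy"] += 1
    if attempts["copy"] < 2:
        raise IOError("transient source error")
    return "copied"

pipeline = Pipeline([Activity("CopyBlob", flaky_copy, retries=2)])
print(pipeline.run())  # ['copied'] -- succeeded on the retry
```

In the real service, the equivalent knobs live on each activity's policy (retry count and interval), and the pipeline is the unit you trigger and monitor.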

A primer on Data Lake Storage

Azure Data Lake Storage provides storage for big data solutions. It is designed especially for storing the large volumes of data that big data solutions typically need. It is an Azure-provided managed service, so it is completely managed by Azure; customers need only bring their data and store it in a Data Lake.

There are two versions: version 1 (Gen1) and the current version, version 2 (Gen2). Gen2 has all the functionality of Gen1, with the difference that it is built on top of Azure Blob Storage.

As Azure Blob Storage is highly available, can be replicated multiple times, is disaster ready, and is low in cost, these benefits are transferred to Gen2 Data Lake. Data Lake can store any kind of data, including relational, non-relational, filesystem-based, and hierarchical data.
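Because Gen2 sits on Blob Storage, it exposes a dedicated endpoint (`dfs.core.windows.net`) that Gen2-aware tools such as HDInsight, Databricks, and the Hadoop ecosystem address via `abfss://` URIs. A small helper shows the URI shape; the account, filesystem (container), and path names here are hypothetical:

```python
def abfss_uri(account: str, filesystem: str, path: str) -> str:
    """Build an ABFS URI of the form
    abfss://<filesystem>@<account>.dfs.core.windows.net/<path>,
    which Gen2-aware analytics tools use to address lake data."""
    return (
        f"abfss://{filesystem}@{account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )

# Example: a CSV landed in the 'raw' filesystem of account 'mydatalake'.
print(abfss_uri("mydatalake", "raw", "/sales/2019/jan.csv"))
# abfss://raw@mydatalake.dfs.core.windows.net/sales/2019/jan.csv
```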

Creating a Data Lake Gen2 instance is as simple as creating...

Migrating data from Azure Storage to Data Lake Gen2 Storage

In this section, we will be migrating data from Azure Blob Storage to another Azure container of the same Azure Blob Storage instance, and we will also migrate data to an Azure Gen2 Data Lake instance using an Azure Data Factory pipeline. The following are the steps for creating such an end-to-end solution.

Preparing the source storage account

Before we can create Azure Data Factory pipelines and use them for migration, we need to create a new storage account, consisting of a couple of containers, and upload the data files. In the real world, these files and the storage connection would already be prepared.

...

Summary

This was another chapter on handling big data. It dealt with the Azure Data Factory service, which provides ETL services on Azure. As a PaaS offering, it delivers high scalability, high availability, and easy-to-configure pipelines, and its integration with Azure DevOps and GitHub is seamless. We also saw the features and benefits of using Azure Data Lake Gen2 storage for storing any kind of big data. It is a cost-effective, highly scalable, hierarchical data store for handling big data, with compatibility with Azure HDInsight, Databricks, and the Hadoop ecosystem.
