Chapter 13: Options for Data Integration
In the previous chapter, we looked at how to architect database solutions that are scalable and secure. This chapter will look at several options available to architects when designing solutions that must work with large datasets for analysis and reporting.
Big data is an industry term for working with terabytes (TB) or even petabytes (PB) of data to create analytical dashboards and gain insights. Specialist tools are often required to perform this kind of processing, and building and running them in your own data center would be expensive.
Azure provides some of the world's most popular data tools for loading, transforming, and analyzing data. We will examine what a data pipeline looks like and then delve deeper into some of those tools.
Specifically, this chapter will cover the following topics:
- Understanding data flows
- Comparing integration tools
- Exploring data analytics
Understanding data flows
Many organizations gather massive amounts of data, in many different forms and from various systems, and continue to amass more every day. This data can bring great value to a company.
One example may be an e-commerce company that collects sales and marketing data from its day-to-day operations. By analyzing this data, the company could identify customer behavior patterns and measure the relative success of different advertising campaigns. That insight could then be used to develop the company website to create a better customer journey, or to identify the strongest-performing marketing activities so that these can be honed while less effective ones are dropped.
Scientific organizations also make use of data to create better treatments, drugs, and methodologies.
Manufacturers can use data from internet of things (IoT) devices and sensors to optimize supply chains, increase operational efficiencies, or identify risks in products or processes.
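The e-commerce example above can be sketched in miniature. The following toy Python snippet (not a real analytics pipeline; the campaign names and figures are invented for illustration) aggregates per-campaign spend and attributed revenue, then ranks campaigns by return on ad spend, which is the kind of "strongest-performing activity" question described earlier:

```python
from collections import defaultdict

# Hypothetical daily export rows: (campaign, ad_spend, attributed_revenue).
# All names and numbers here are illustrative, not real data.
rows = [
    ("spring_sale", 500.0, 2100.0),
    ("spring_sale", 450.0, 1900.0),
    ("new_banner", 300.0, 250.0),
    ("new_banner", 320.0, 310.0),
]

# Accumulate total spend and revenue per campaign.
totals = defaultdict(lambda: [0.0, 0.0])
for campaign, spend, revenue in rows:
    totals[campaign][0] += spend
    totals[campaign][1] += revenue

# Rank campaigns by return on ad spend (revenue per unit of spend).
roas = {c: rev / spend for c, (spend, rev) in totals.items()}
for campaign in sorted(roas, key=roas.get, reverse=True):
    print(f"{campaign}: ROAS {roas[campaign]:.2f}")
```

At real scale, this aggregation would run over TBs of data in a distributed engine rather than in memory, but the shape of the analysis is the same.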
Data sources include sales...
Comparing integration tools
One of the greatest benefits of using cloud services such as Azure is the ability to create the resources you need without investing large amounts of capital up front. The tools you can choose from cover end-to-end processes and can be scaled in and out as needed.
One of the first decisions you may need to consider is where to store raw data initially. Except in the case of streaming analytics, where you continually ingest data from a source such as an IoT device (for example, a temperature sensor), you need a place to store your data files and retrieve them from.
Azure storage accounts provide storage capabilities in the form of file storage or Blob storage; however, a specific type of account called an Azure Data Lake Storage Gen2 (ADLS Gen2) account might be better suited to data analytics.
ADLS Gen2
ADLS Gen2 is an optional configuration feature of a standard storage account. One of the key differences is that it supports filesystem...
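The practical consequence of a hierarchical namespace can be sketched without any Azure SDK at all. In a flat blob namespace, "directories" are only name prefixes, so renaming a folder means rewriting every blob under it; with a true filesystem hierarchy (as ADLS Gen2 provides), the same rename is a single metadata operation. The following toy simulation, which uses plain dictionaries rather than real storage APIs, illustrates the difference:

```python
# Toy illustration (NOT the Azure SDK): a flat blob namespace is just a
# mapping of full names to data, so "folders" are only name prefixes.
flat_store = {
    "raw/2023/01/sales.csv": b"...",
    "raw/2023/01/ads.csv": b"...",
}

def rename_prefix(store, old, new):
    """Rename a 'folder' in a flat namespace: O(n) in blobs under the prefix."""
    ops = 0
    for name in list(store):
        if name.startswith(old):
            store[new + name[len(old):]] = store.pop(name)
            ops += 1
    return ops

# With a hierarchical namespace, a directory is a real node in a tree,
# so a rename is one operation regardless of how many files it contains.
hierarchical_store = {
    "raw": {"2023": {"01": {"sales.csv": b"...", "ads.csv": b"..."}}},
}

def rename_dir(store, old, new):
    store[new] = store.pop(old)
    return 1  # a single metadata operation

flat_ops = rename_prefix(flat_store, "raw/", "staged/")
tree_ops = rename_dir(hierarchical_store, "raw", "staged")
print(flat_ops, tree_ops)  # → 2 1
```

Analytics workloads frequently move and reorganize large directory trees between pipeline stages, which is one reason the hierarchical namespace matters for this scenario.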
Exploring data analytics
Once data has been ingested, transformed, and aggregated, the next step will be to analyze and explore it. There are many tools available on the market to achieve this, and one of the most popular is Databricks.
Databricks uses the Apache Spark engine, which is well suited to dealing with massive amounts of data due to its internal architecture. Whereas a traditional database server would typically run workloads on a single machine, Databricks uses Spark clusters built from multiple nodes. Data analytics processes are then distributed across those nodes and processed in parallel, as shown in the following diagram:
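The partition-and-combine pattern that a Spark cluster applies across nodes can be mimicked in miniature on a single machine. The following sketch is an analogy only, not real Spark code: it splits a dataset into partitions, aggregates each partition concurrently (standing in for worker nodes), and then combines the partial results (standing in for the driver):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    # Each "worker node" aggregates its own partition independently.
    return sum(partition)

data = list(range(1_000))
n_partitions = 4

# Split the dataset into roughly equal partitions.
partitions = [data[i::n_partitions] for i in range(n_partitions)]

# Process the partitions in parallel, as a cluster would across nodes.
with ThreadPoolExecutor(max_workers=n_partitions) as pool:
    partials = list(pool.map(partial_sum, partitions))

# The "driver" combines the partial results into the final answer.
total = sum(partials)
print(total)  # → 499500, the same as sum(data)
```

In a real cluster, each partition would live on a different node and the data would never need to fit on one machine, which is what makes the model scale to the TB and PB workloads described earlier.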
Figure 13.6 – Example Spark cluster architecture
Azure Databricks is a managed Databricks service that provides excellent flexibility for creating and using Spark clusters as and when needed.
Azure Databricks
Azure Databricks provides workspaces that multiple users can use to build and run analytics jobs collaboratively. A Databricks workspace contains...
Summary
This chapter looked at a growing capability in the cloud, and in Azure in particular: data integration and analytics.
Azure provides a range of tools for creating end-to-end data pipelines for storing, ingesting, transforming, aggregating, and analyzing data. So, we started the chapter with a high-level view of what a typical pipeline might look like.
We looked at how to configure Azure Storage to use ADLS Gen2, what extra capabilities this gives you, and how Azure Data Factory can create automated and secure pipelines for data loading and transformation.
Finally, we looked at the two primary tools for exploring and analyzing data with Azure: Azure Databricks and Azure Synapse Analytics.
After reading this chapter, you should have a better understanding of the different components that comprise a data analytics solution, including the strengths of each service and where one might be a better choice over another.
In the next chapter, we conclude Part 4, Applications...
Exam scenario
MegaCorp Inc. is building a new data analytics capability to help it understand the effectiveness of its marketing campaigns and how they relate to product sales.
Marketing campaign data is exported daily and stored as flat CSV files. Sales data is exported overnight from the sales database into a normalized data warehouse database.
The management team would like data to be automatically imported and aggregated, and then modeled. It is expected that large amounts of data will be processed, and this needs to be performed relatively quickly. The data analytics teams are seasoned developers who are currently using the latest version of Spark.
Design an end-to-end solution that can accommodate the management team's requirements.