Exam Ref AZ-304 Microsoft Azure Architect Design Certification and Beyond

Product type Book
Published in Jul 2021
Publisher Packt
ISBN-13 9781800566934
Pages 520 pages
Edition 1st Edition
Author: Brett Hargreaves

Chapter 13: Options for Data Integration

In the previous chapter, we looked at how to architect database solutions that are scalable and secure. This chapter will look at several options available for architects when designing solutions that must work with large datasets for analysis and reporting.

Big data is an industry term for working with terabytes (TB), or even petabytes (PB), of data to create analytical dashboards and gain insights. Specialist tools are often required to perform this kind of processing, and it would be expensive to build them in your own data center.

Azure provides some of the world's most popular data tools for loading, transforming, and analyzing data. We will examine what a data pipeline looks like and then delve deeper into some of those tools.

Specifically, this chapter will cover the following topics:

  • Understanding data flows
  • Comparing integration tools
  • Exploring data analytics

Technical requirements

This chapter will use the Azure portal (https://portal.azure.com) for examples.

Understanding data flows

Many organizations gather massive amounts of data and continue to amass data in many different forms from various systems. This data can be used to bring great value to a company.

One example may be an e-commerce company that collects sales and marketing data from its day-to-day operations. By analyzing the data, customer patterns could be ascertained, as well as the relative success of different advertising campaigns. This information could then be used to develop the company website to create a better customer journey or to identify the strongest performing marketing activities so that these can be honed while less effective ones are dropped.

Scientific organizations also make use of data to create better treatments, drugs, and methodologies.

Manufacturers can use data from internet of things (IoT) devices and sensors to optimize supply chains, increase operational efficiencies, or identify risks in products or processes.

Data sources include sales...

Comparing integration tools

One of the greatest benefits of using cloud services such as Azure is that you can create the resources you need without investing large amounts of capital. The tools you can choose from cover end-to-end processes and can be scaled in and out as needed.

One of the first decisions you may need to make is where to initially store raw data. Except in the case of streaming analytics, whereby you continually ingest data from a source such as an IoT device (for example, a temperature sensor), you need somewhere to store your data files and retrieve them from.

Azure storage accounts provide storage capabilities in the form of file storage or Blob storage; however, a specific type of account called an Azure Data Lake Storage Gen2 (ADLS Gen2) account might be better suited to data analytics.

ADLS Gen2

ADLS is an optional configuration feature of a standard storage account. One of the key differences is that it supports filesystem...

Exploring data analytics

Once data has been ingested, transformed, and aggregated, the next step will be to analyze and explore it. There are many tools available on the market to achieve this, and one of the most popular is Databricks.

Databricks uses the Apache Spark engine, whose internal architecture is well suited to dealing with massive amounts of data. Whereas a traditional database server would typically run workloads on a single machine, Databricks uses Spark clusters built from multiple nodes. Data analytics processes are then distributed between those nodes and processed in parallel, as shown in the following diagram:

Figure 13.6 – Example Spark cluster architecture
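
The idea of distributing work between nodes can be illustrated with a minimal Python sketch. This is not Spark code and does not use any Databricks API; it assumes that threads from the standard library can stand in for cluster nodes, and the function names (split_into_partitions, partial_sum, distributed_mean) and the sample data are hypothetical. The data is split into partitions, each partition is processed independently, and the partial results are combined at the end:

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Split rows into roughly equal partitions, one per (hypothetical) worker node."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum(partition):
    """Work done independently on one partition: a local total and a row count."""
    return sum(partition), len(partition)


def distributed_mean(rows, num_workers=4):
    """Compute a mean by combining per-partition partial results."""
    partitions = split_into_partitions(rows, num_workers)
    # Each partition is handed to its own worker; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Combine step: merge the partial totals and counts into one result.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


readings = list(range(1, 101))  # made-up sample data
mean = distributed_mean(readings)
```

Because each partition only needs a local total and count, no worker has to see the whole dataset, which is the property that lets this style of processing spread across many nodes.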

Azure Databricks is a managed Databricks service that provides excellent flexibility for creating and using Spark clusters as and when needed.

Azure Databricks

Azure Databricks provides workspaces that multiple users can use to build and run analytics jobs collaboratively. A Databricks workspace contains...

Summary

This chapter looked at a growing capability in the cloud, especially in Azure: data integration and analytics.

Azure provides a range of tools for creating end-to-end data pipelines for storing, ingesting, transforming, aggregating, and analyzing data. So, we started the chapter with a high-level view of what a typical pipeline might look like.

We looked at how to configure Azure Storage to use ADLS Gen2, what extra capabilities this gives you, and how Azure Data Factory can create automated and secure pipelines for data loading and transformation.

Finally, we looked at the two primary tools for exploring and analyzing data with Azure: Azure Databricks and Azure Synapse Analytics.

After reading this chapter, you should have a better understanding of the different components that comprise a data analytics solution, including the strengths of each service and where one might be a better choice over another.

In the next chapter, we conclude Part 4, Applications...

Exam scenario

MegaCorp Inc. is building a new data analytics capability to help understand its marketing campaigns' effectiveness and how they relate to product sales.

Marketing campaign data is exported daily and stored as flat CSV files. Sales data is exported overnight from the sales database into a normalized data warehouse database.

The management team would like data to be automatically imported and aggregated, and then modeled. It is expected that large amounts of data will be processed, and this needs to be performed relatively quickly. The data analytics teams are seasoned developers who are currently using the latest version of Spark.

Design an end-to-end solution that can accommodate the management team's requirements.
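
To make the "imported and aggregated" requirement concrete, the following minimal Python sketch shows one possible aggregation step on a tiny scale. It is not the book's answer and does not use any Azure service. The column names (date, campaign, spend, revenue), the assumption that each sales row can be attributed to a campaign, and all sample values are hypothetical:

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for the daily marketing campaign CSV export.
CAMPAIGN_CSV = """date,campaign,spend
2021-07-01,email,100.0
2021-07-01,social,250.0
2021-07-02,email,120.0
"""

# Made-up rows standing in for the overnight sales export.
# Attributing a sale to a campaign is an assumption of this sketch.
SALES_ROWS = [
    {"date": "2021-07-01", "campaign": "email", "revenue": 400.0},
    {"date": "2021-07-01", "campaign": "social", "revenue": 500.0},
    {"date": "2021-07-02", "campaign": "email", "revenue": 300.0},
]


def load_campaign_spend(csv_text):
    """Import step: parse the CSV text and total the spend per campaign."""
    spend = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        spend[row["campaign"]] += float(row["spend"])
    return dict(spend)


def aggregate_revenue(sales_rows):
    """Aggregate step: total the revenue per campaign."""
    revenue = defaultdict(float)
    for row in sales_rows:
        revenue[row["campaign"]] += row["revenue"]
    return dict(revenue)


def revenue_per_unit_spend(spend, revenue):
    """Model step: revenue earned per unit of spend for each campaign."""
    return {
        name: revenue.get(name, 0.0) / total
        for name, total in spend.items()
        if total > 0
    }


spend = load_campaign_spend(CAMPAIGN_CSV)
revenue = aggregate_revenue(SALES_ROWS)
effectiveness = revenue_per_unit_spend(spend, revenue)
```

At the data volumes the scenario describes, the same import, aggregate, and model steps would run on the managed services covered in this chapter rather than in a single Python process.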
