Azure Data Engineer Associate Certification Guide


Product type Book
Published in Feb 2022
Publisher Packt
ISBN-13 9781801816069
Pages 574 pages
Edition 1st Edition
Author (1): Newton Alex

Table of Contents (23) Chapters

Preface
1. Part 1: Azure Basics
2. Chapter 1: Introducing Azure Basics
3. Part 2: Data Storage
4. Chapter 2: Designing a Data Storage Structure
5. Chapter 3: Designing a Partition Strategy
6. Chapter 4: Designing the Serving Layer
7. Chapter 5: Implementing Physical Data Storage Structures
8. Chapter 6: Implementing Logical Data Structures
9. Chapter 7: Implementing the Serving Layer
10. Part 3: Design and Develop Data Processing (25-30%)
11. Chapter 8: Ingesting and Transforming Data
12. Chapter 9: Designing and Developing a Batch Processing Solution
13. Chapter 10: Designing and Developing a Stream Processing Solution
14. Chapter 11: Managing Batches and Pipelines
15. Part 4: Design and Implement Data Security (10-15%)
16. Chapter 12: Designing Security for Data Policies and Standards
17. Part 5: Monitor and Optimize Data Storage and Data Processing (10-15%)
18. Chapter 13: Monitoring Data Storage and Data Processing
19. Chapter 14: Optimizing and Troubleshooting Data Storage and Data Processing
20. Part 6: Practice Exercises
21. Chapter 15: Sample Questions with Solutions
22. Other Books You May Enjoy

What this book covers

The chapters in this book are designed around the skill sets listed by Microsoft for the coursework:

Exam DP-203: Data Engineering on Microsoft Azure – Skills Measured

https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4MbYT

Chapter 1, Introducing Azure Basics, introduces the audience to Azure and explains its general capabilities. This is a refresher chapter designed to renew our understanding of some of the core Azure concepts, including VMs, data storage, compute options, the Azure portal, accounts, and subscriptions. We will be building on top of these technologies in future chapters.

Chapter 2, Designing a Data Storage Structure, focuses on the various storage solutions available in Azure. We will cover topics such as Azure Data Lake Storage, Blob storage, and SQL- and NoSQL-based storage. We will also get into the details of when to choose which storage option and how to optimize storage using techniques such as data pruning, data distribution, and data archiving.

Chapter 3, Designing a Partition Strategy, explores the different partition strategies available. We will focus on how to efficiently split and store the data for different types of workloads and will see some recommendations on when and how to partition the data for different use cases, including analytics and batch processing.

Chapter 4, Designing the Serving Layer, is dedicated to the design of the different types of schemas, such as the Star and Snowflake schemas. We will focus on designing slowly-changing dimensions, building a dimensional hierarchy, temporal solutions, and other such advanced topics. We will also focus on sharing data between the different compute technologies, including Azure Databricks and Azure Synapse, using metastores.

Chapter 5, Implementing Physical Data Storage Structures, focuses on the implementation of lower-level aspects of data storage, including compression, sharding, data distribution, indexing, data redundancy, archiving, storage tiers, and replication, with the help of examples.

Chapter 6, Implementing Logical Data Structures, focuses on the implementation of temporal data structures and slowly-changing dimensions using Azure Data Factory (ADF), and on building folder structures for analytics, streaming, and other data to improve query performance and assist with data pruning.

Chapter 7, Implementing the Serving Layer, focuses on implementing a relational star schema, storing files in different formats, such as Parquet and ORC, and building and using a metastore between Synapse and Azure Databricks.

Chapter 8, Ingesting and Transforming Data, introduces the various Azure data processing technologies, including Synapse Analytics, ADF, Azure Databricks, and Stream Analytics. We will focus on the various data transformations that can be performed using T-SQL, Spark, and ADF. We will also look into aspects of data pipelines, such as cleansing the data, parsing data, encoding and decoding data, normalizing and denormalizing values, error handling, and basic data exploration techniques.

Chapter 9, Designing and Developing a Batch Processing Solution, focuses on building an end-to-end batch processing system. We will cover techniques for handling incremental data, slowly-changing dimensions, missing data, late-arriving data, duplicate data, and more. We will also cover security and compliance aspects, along with techniques to debug issues in data pipelines.

Chapter 10, Designing and Developing a Stream Processing Solution, is dedicated to stream processing. We will build end-to-end streaming systems using Stream Analytics, Event Hubs, and Azure Databricks. We will explore the various windowed aggregation options available and learn how to handle schema drift, time series data, partitions, checkpointing, and replaying data. We will also cover techniques for handling interruptions, scaling resources, and handling errors.

Chapter 11, Managing Batches and Pipelines, is dedicated to managing and debugging batch and streaming pipelines. We will look into techniques for configuring and triggering jobs and for debugging failed jobs. We will dive deeper into the features available in ADF and Synapse pipelines for scheduling pipelines. We will also look at implementing version control in ADF.

Chapter 12, Designing Security for Data Policies and Standards, focuses on how to design and implement data encryption (both at rest and in transit), data auditing, data masking, data retention, and data purging. We will also learn about the RBAC features of ADLS Gen2 storage and explore row- and column-level security in Azure SQL and Synapse Analytics. We will deep dive into techniques for handling managed identities, keys, secrets, and resource tokens, and learn how to handle sensitive information.

Chapter 13, Monitoring Data Storage and Data Processing, focuses on logging, configuring monitoring services, measuring performance, integrating with CI/CD systems, custom logging and monitoring options, querying using Kusto, and finally, tips on debugging Spark jobs.

Chapter 14, Optimizing and Troubleshooting Data Storage and Data Processing, focuses on tuning and debugging Spark and Synapse queries. We will dive deeper into query-level debugging, including how to handle shuffles, UDFs, data skew, indexing, and cache management. We will also spend some time troubleshooting Spark and Synapse pipelines.

Chapter 15, Sample Questions with Solutions, is where we put everything we have learned into practice. We will explore a bunch of real-world problems and learn how to use the information we learned in this book to answer the certification questions. This will help you prepare for both the exam and real-world problems.

Note

All the information provided in this book is based on public Azure documents. The author is not associated with the Azure Certification team and has no access to any of the Azure Certification questions other than those made publicly available by Microsoft.
