You're reading from Azure Databricks Cookbook: Accelerate and scale real-time analytics solutions using the Apache Spark-based analytics service

Product type Paperback

Published in Sep 2021

Last Updated in Feb 2025

Publisher Packt

ISBN-13 9781789809718

Length 452 pages

Edition 1st Edition

Authors (2):

Raj

Jaiswal

How joins work in Spark

In this recipe, you will learn how the Spark optimizer executes joins using different join algorithms, such as sort-merge and broadcast hash joins. You will learn how to identify which algorithm was used by looking at the DAG that Spark generates. You will also learn how to add hints to your queries to influence the optimizer to use a specific join algorithm.
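To make the difference between the two algorithms concrete, here is a minimal, single-machine sketch in plain Python. It is not Spark's implementation, and the `cust_id`, `name`, and `order_id` columns are hypothetical sample data rather than the recipe's dataset. In Spark, a broadcast hash join copies the smaller table to every executor and probes a hash table built from it, while a sort-merge join shuffles both tables by the join key, sorts each partition, and merges the sorted rows.

```python
from collections import defaultdict


def broadcast_hash_join(small, large, key):
    """Build a hash table on the small side, then probe it with each large-side row."""
    table = defaultdict(list)
    for row in small:
        table[row[key]].append(row)
    # Each large-side row looks up its matches in constant time.
    return [{**s, **l} for l in large for s in table.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Sort both sides on the join key, then walk them with two cursors."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left-side row with that key against the whole run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical sample rows, for illustration only.
customers = [{"cust_id": 2, "name": "B"}, {"cust_id": 1, "name": "A"}]
orders = [
    {"cust_id": 1, "order_id": 10},
    {"cust_id": 2, "order_id": 11},
    {"cust_id": 1, "order_id": 12},
    {"cust_id": 3, "order_id": 13},
]

bhj = broadcast_hash_join(customers, orders, "cust_id")
smj = sort_merge_join(customers, orders, "cust_id")
```

Both functions return the same inner-join rows, possibly in a different order. The hash variant avoids sorting but needs the small side to fit in memory, which is why Spark reserves it for tables small enough to broadcast.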

Getting ready

To follow along with this recipe, run the cells in the 3-5.Joins notebook, which you can find in your local cloned repository, in the Chapter03 folder (https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter03).

In the rawdata filesystem of your ADLS Gen2 account, create two folders called Customer and Orders. Then upload the csvFiles folders, which can be found in the Common/Customer and Common/Orders folders in your local cloned repository, into the corresponding folder:

Figure...