Reader small image

You're reading from  Learn T-SQL Querying - Second Edition

Product typeBook
Published inFeb 2024
PublisherPackt
ISBN-139781837638994
Edition2nd Edition
Right arrow
Authors (2):
Pedro Lopes
Pedro Lopes
author image
Pedro Lopes

Pedro Lopes is a Program Manager in the Database Systems group, based in Redmond, WA, USA. He has over 19 years of industry experience and has been with Microsoft for 9 years. He is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Relational Engine. He has extensive experience with query performance troubleshooting and is a regular speaker at numerous conferences such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He blogs about SQL on the SQL Server Team blog. He has authored several tools in the Tiger toolbox on GitHub: AdaptiveIndexDefrag maintenance solution, BPCheck, and usp_WhatsUp.
Read more about Pedro Lopes

Pam Lahoud
Pam Lahoud
author image
Pam Lahoud

Pam Lahoud is a Program Manager in the Database Systems group, based in Redmond, WA, USA. She has been with Microsoft for 13 years and is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Storage Engine area. She is passionate about SQL Server performance and has focused on performance tuning and optimization, particularly from the developer's perspective, throughout her career. She is a SQL Server 2008 Microsoft Certified Master (MCM) with over 20 years of experience working with SQL Server.
Read more about Pam Lahoud

View More author details
Right arrow

Mechanics of the Query Optimizer

The next step in our journey to writing efficient T-SQL queries is understanding how the SQL Database Engine optimizes a query by exploring T-SQL query optimization internals and architecture, starting with the infamous cardinality estimation process and its building blocks. From there, understand how the Query Optimizer uses that information to produce a just-in-time good-enough execution plan. This chapter will be referenced throughout all chapters, as we bridge architectural topics to real-world uses.

In this chapter, we’re going to cover the following main topics:

  • Introducing the Cardinality Estimator (CE)
  • Understanding the query optimization workflow

Technical requirements

The examples that will be used in this chapter are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on any version of SQL Server, 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database you can use for testing at https://aka.ms/freedb.

You will need the sample AdventureWorks2016_EXT database (referred to as AdventureWorks), which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks.

The code samples for this chapter can also be found on GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition/tree/main/ch2.

Introducing the Cardinality Estimator

Before we get started, it’s important to have a common frame of reference for a few terms that will be referenced throughout this book:

  • Cardinality: Cardinality in a database is defined as the number of records, also called tuples, in each table or view.
  • Density: This term represents the average number of duplicate values in each column or column set – in other words, the average distribution of unique values in the data. It’s defined as 1 divided by the number of distinct values.
  • Frequency: This term represents the average number of occurrences of a given value in a column or column set. It’s defined as the number of rows times the density.
  • Selectivity: This term represents the fraction of the row count that satisfies a given predicate, between zero and one. This is calculated as the predicate cardinality (Pc) divided by the table cardinality (Tc) multiplied by 100: (Pc ÷ Tc) × 100. As...

Understanding the query optimization workflow

Now, it’s time to take a deeper look at how the SQL Database Engine creates optimized query execution plans. As referenced in Chapter 1, Understanding Query Processing, this is the second phase of query processing and for the most part, only Data Manipulation Language (DML) statements undergo query optimization. The query optimization process is defined by the following cumulative stages:

  • Trivial Plan
  • Exploration, which, in turn, includes three phases:
    • Transaction Processing
    • Quick Plan
    • Full Optimization

In the Exploration stage, what differentiates between the several phases is the increasing sets of rules that apply to each one as the search for a good-enough query plan progresses. Users can learn about the optimization level of a given query execution plan by looking at the properties of that plan. The following sections include sample execution plans to illustrate the concepts covered here. Query execution plans...

Knobs for query optimization

As advanced as the query optimization process is, inefficient plans are still a possibility, which is why a database developer can use hints in the T-SQL statement and guide the Query Optimizer toward producing an intended plan. There are several classes of thoroughly documented query hints that affect query optimization, and it is important to call out a few that can be useful when troubleshooting a query performance issue, some of which we will use in upcoming chapters.

Note

Keep in mind that hints force certain behaviors with T-SQL statement optimization and execution. Microsoft recommends that hints are thoroughly tested and only used as a last resort. Hinted statements must be reviewed with every upgrade to a major version to determine if they are still needed since new versions may change behavior, rendering the hint unnecessary or even harmful.

Let’s look at some of the available hints for the Query Optimizer:

  • FORCE ORDER...

Summary

In this chapter, we explored the internals of the SQL Database Engine’s query optimization process and defined many important concepts that any database professional writing T-SQL queries will keep coming back to, especially when troubleshooting query performance issues. The CE is a fundamental part of the SQL Database Engine’s Query Optimizer: knowing how it uses statistics and the importance of keeping updated and relevant statistics for the overall query optimization process empowers database professionals to write good queries – queries that both drive and leverage good database schema designs. But also, understanding the main estimation model assumptions allows us to account for these when writing queries and avoid pitfalls that hurt query performance. We will see these pitfalls in much more detail in Chapter 5, Writing Elegant T-SQL Queries, and Chapter 6, Discovering T-SQL Anti-Patterns in Depth.

If, at the end of the optimization process, we...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Learn T-SQL Querying - Second Edition
Published in: Feb 2024Publisher: PacktISBN-13: 9781837638994
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Pedro Lopes

Pedro Lopes is a Program Manager in the Database Systems group, based in Redmond, WA, USA. He has over 19 years of industry experience and has been with Microsoft for 9 years. He is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Relational Engine. He has extensive experience with query performance troubleshooting and is a regular speaker at numerous conferences such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He blogs about SQL on the SQL Server Team blog. He has authored several tools in the Tiger toolbox on GitHub: AdaptiveIndexDefrag maintenance solution, BPCheck, and usp_WhatsUp.
Read more about Pedro Lopes

author image
Pam Lahoud

Pam Lahoud is a Program Manager in the Database Systems group, based in Redmond, WA, USA. She has been with Microsoft for 13 years and is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Storage Engine area. She is passionate about SQL Server performance and has focused on performance tuning and optimization, particularly from the developer's perspective, throughout her career. She is a SQL Server 2008 Microsoft Certified Master (MCM) with over 20 years of experience working with SQL Server.
Read more about Pam Lahoud