Reader small image

You're reading from  SQL Server Query Tuning and Optimization

Product typeBook
Published inAug 2022
PublisherPackt
ISBN-139781803242620
Edition1st Edition
Concepts
Right arrow
Author (1)
Benjamin Nevarez
Benjamin Nevarez
author image
Benjamin Nevarez

Benjamin Nevarez is a database professional based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of several books including “High Performance SQL Server'', “SQL Server 2017 on Linux”, “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and has also co-authored other books such as “SQL Server 2012 Internals”. Benjamin has also been a speaker at many SQL Server conferences around the world
Read more about Benjamin Nevarez

Right arrow

Chapter 3: The Query Optimizer

In this chapter, we will cover how the SQL Server Query Optimizer works and introduce the steps it performs in the background. This covers everything, from the time a query is submitted to SQL Server until an execution plan is generated and is ready to be executed. This includes steps such as parsing, binding, simplification, trivial plan optimization, and full optimization. Important components and mechanisms that are part of the Query Optimizer architecture, such as transformation rules and the Memo structure, are also introduced.

The purpose of the Query Optimizer is to provide an optimum execution plan, or at least a good enough execution plan, and to do so, it generates many possible alternatives through the use of transformation rules. These alternative plans are stored for the duration of the optimization process in a structure called the Memo. Unfortunately, a drawback of cost-based optimization is the cost of optimization itself. Given that...

Query optimization research

Query optimization research dates back to the early 1970s. One of the earliest works describing a cost-based query optimizer was Access Path Selection in a Relational Database Management System, published in 1979 by Pat Selinger et al, to describe the query optimizer for an experimental database management system developed in 1975 at what is now the IBM Almaden Research Center. This database management system, called System R, advanced the field of query optimization by introducing the use of cost-based query optimization, the use of statistics, an efficient method of determining join orders, and the addition of CPU cost to the optimizer’s cost estimation formulae.

Yet, despite being an enormous influence in the field of query optimization research, it suffered a major drawback: its framework could not be easily extended to include additional transformations. This led to the development of more extensible optimization architectures that facilitated...

Introduction to query processing

The query optimization and execution process was introduced in Chapter 1, An Introduction to Query Tuning and Optimization, and will be explained in more detail throughout this chapter. However, before we get started, we’ll briefly explore the inner workings of the query optimization process, which extends both before and after the Query Optimizer itself. The diagram in Figure 3.1 shows the major phases of the query processing process, and each phase will be explained in more detail in the remaining sections of this chapter.

Figure 3.1 – The query-processing process

Parsing and binding are the first operations performed when a query is submitted to a SQL Server database engine. Parsing and binding produce a tree representation of the query, which is then sent to the Query Optimizer to perform the optimization process. At the beginning of this optimization process, this logical tree will be simplified, and the Query...

The sys.dm_exec_query_optimizer_info DMV

You can use the sys.dm_exec_query_optimizer_info DMV to gain additional insight into the work being performed by the Query Optimizer. This DMV, which is only partially documented, provides information regarding the optimizations performed on the SQL Server instance. Although this DMV contains cumulative statistics recorded since the given SQL Server instance was started, it can also be used to get optimization information for a specific query or workload, as we’ll see in a moment.

As mentioned, you can use this DMV to obtain statistics regarding the operation of the Query Optimizer, such as how queries have been optimized and how many of them have been optimized since the instance started. This DMV returns three columns:

  • Counter: The name of the optimizer event
  • Occurrence: The number of occurrences of the optimization event for this counter
  • Value: The average value per event occurrence

Thirty-eight counters were...

Parsing and binding

Parsing and binding are the first operations that SQL Server executes when you submit a query to the database engine and are performed by a component called the Algebrizer. Parsing first makes sure that the Transact-SQL (T-SQL) query has a valid syntax and then uses the query information to build a tree of relational operators. By that, I mean the parser translates the SQL query into an algebra tree representation of logical operators, which is called a parse tree. Parsing only checks for valid T-SQL syntax, not for valid table or column names, which are verified in the next phase: binding.

Parsing is similar to the Parse functionality available in Management Studio (by clicking the Parse button on the default toolbar) or the SET PARSEONLY statement. For example, the following query will successfully parse on the AdventureWorks2019 database, even when the listed columns and table do not exist in the said database:

SELECT lname, fname FROM authors

However...

Simplification

Query rewrites or, more exactly, tree rewrites are performed at this stage to reduce the query tree into a simpler form to make the optimization process easier. Some of these simplifications include the following:

  • Subqueries are converted into joins, but because a subquery does not always translate directly to an inner join, outer join and group-by operations may be added as necessary.
  • Redundant inner and outer joins may be removed. A typical example is the Foreign Key Join elimination, which occurs when the SQL Server can detect that some joins may not be needed, as foreign key constraints are available and only columns of the referencing table are requested. An example of Foreign Key Join elimination is shown later.
  • Filters in WHERE clauses are pushed down in the query tree to enable early data filtering as well as potentially better matching of indexes and computed columns later in the optimization process (this simplification is known as predicate...

Trivial plan optimization

The optimization process may be expensive to initialize and run for simple queries that don’t require any cost estimation. To avoid this expensive operation for simple queries, SQL Server uses trivial plan optimization. In short, if there’s only one way, or one obvious best way, to execute the query, depending on the query definition and available metadata, a lot of work can be avoided. For example, the following AdventureWorks2019 query will produce a trivial plan:

SELECT * FROM Sales.SalesOrderDetail
WHERE SalesOrderID = 43659

The execution plan will show whether a trivial plan optimization was performed; the Optimization Level entry in the Properties window of a graphical plan will show TRIVIAL. In the same way, an XML plan will show the StatementOptmLevel attribute as TRIVIAL, as you can see in the next XML fragment:

<StmtSimple StatementCompId=”1” StatementEstRows=”12” StatementId=”1”...

Joins

Join ordering is one of the most complex problems in query optimization and one that has been the subject of extensive research since the 1970s. It refers to the process of calculating the optimal join order (that is, the order in which the necessary tables are joined) when executing a query. Because the order of joins is a key factor in controlling the amount of data flowing between each operator in an execution plan, it’s a factor to which the Query Optimizer needs to pay close attention. As suggested earlier, join order is directly related to the size of the search space because the number of possible plans for a query grows very rapidly, depending on the number of tables joined.

A join operation combines records from two tables based on some common information, and the predicate that defines which columns are used to join the tables is called a join predicate. A join works with only two tables at a time, so a query requesting data from n tables must be executed...

Transformation rules

As we’ve seen earlier, the SQL Server Query Optimizer uses transformation rules to explore the search space—that is, to explore the set of possible execution plans for a specific query. Transformation rules are based on relational algebra, taking a relational operator tree and generating equivalent alternatives, in the form of equivalent relational operator trees. At the most fundamental level, a query consists of logical expressions, and applying these transformation rules will generate equivalent logical and physical alternatives, which are stored in memory (in a structure called the Memo) for the entire duration of the optimization process. As explained in this chapter, the Query Optimizer uses up to three optimization stages, and different transformation rules are applied in each stage.

Each transformation rule has a pattern and a substitute. The pattern is the expression to be analyzed and matched, and the substitute is the equivalent expression...

The Memo

The Memo structure was originally defined in The Volcano Optimizer Generator by Goetz Graefe and William McKenna in 1993. In the same way, the SQL Server Query Optimizer is based on the Cascades Framework, which was, in fact, a descendent of the Volcano Optimizer.

The Memo is a search data structure used to store the alternatives generated and analyzed by the Query Optimizer. These alternatives can be logical or physical operators and are organized into groups of equivalent alternatives, such that each alternative in the same group produces the same results. Alternatives in the same group also share the same logical properties, and in the same way that operators can reference other operators on a relational tree, groups can also reference other groups in the Memo structure.

A new Memo structure is created for each optimization. The Query Optimizer first copies the original query tree’s logical expressions into the Memo structure, placing each operator from the...

Statistics

To estimate the cost of an execution plan, the query optimizer needs to know, as precisely as possible, the number of records returned by a given query, and to help with this cardinality estimation, SQL Server uses and maintains optimizer statistics. Statistics contain information describing the distribution of values in one or more columns of a table, and will be explained in greater detail in Chapter 6, Understanding Statistics.

As covered in Chapter 1, An Introduction to Query Tuning and Optimization, you can use the OptimizerStatsUsage plan property to identify the statistics used by the Query Optimizer to produce an execution plan. The OptimizerStatsUsage property is available only starting with SQL Server 2017 and SQL Server 2016 SP2. Alternatively, you can use the undocumented 9292 and 9204 trace flags to show similar information about the statistics loaded during the optimization process. Although this method works with all the supported and some older versions...

Full optimization

As shown in the processing steps in Figure 3.15, if a query does not qualify for a trivial plan, SQL Server will run the cost-based optimization process, which uses transformation rules to generate alternative plans, stores these alternatives in the Memo structure, and uses cost estimation to select the best plan. This optimization process can be executed in up to three stages or phases, with different transformation rules being applied at each stage.

Because some queries may have a huge number of possible query plans, it’s sometimes not feasible to explore their entire search space—query optimization would take too long. So, in addition to applying transformation rules, several heuristics are used by the Query Optimizer to control the search strategy and to limit the number of alternatives generated to quickly find a good plan. The Query Optimizer needs to balance the optimization time and the quality of the selected plan. For example, as explained...

Summary

This chapter showed how the Query Optimizer works, explaining how your query goes from a SQL statement submitted to SQL Server, all the way to the selected execution plan, including operations such as parsing, binding, simplification, trivial plan, and the full optimization stages. Important concepts that are part of the Query Optimizer architecture, such as transformation rules and the Memo structure, were also introduced.

The Query Optimizer generates a solution space and selects the best possible execution plan from it based on the plan cost. Transformation rules are used to generate these alternatives, which are stored in a memory structure called the Memo. Instead of exploring the search space exhaustively, heuristics are introduced to limit the number of possible solutions. Finally, the alternatives stored in the Memo are costed, and the best solution is returned for execution.

This chapter also showed that understanding how the Query Optimizer works can give you...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
SQL Server Query Tuning and Optimization
Published in: Aug 2022Publisher: PacktISBN-13: 9781803242620
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Benjamin Nevarez

Benjamin Nevarez is a database professional based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of several books including “High Performance SQL Server'', “SQL Server 2017 on Linux”, “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and has also co-authored other books such as “SQL Server 2012 Internals”. Benjamin has also been a speaker at many SQL Server conferences around the world
Read more about Benjamin Nevarez