Logical statement processing flow
When writing T-SQL, it is important to be familiar with the order in which the SQL Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues arising from poorly written queries, as well as helping you understand cases of unintended results. The following steps outline a summarized view of the method that the Database Engine follows to process a T-SQL statement:
- Process all the source and target objects stated in the
FROMclause (tables, views, and TVFs), together with the intended logical operation (JOINandAPPLY) to perform on those objects. - Apply whatever pre-filters are defined in the
WHEREclause to reduce the number of incoming rows from those objects. - Apply any aggregation defined in the
GROUP BYor aggregate functions (for example, aMINorMAXfunction). - Apply filters that can only be applied on the aggregations as defined in the
HAVINGclause. - Compute the logic for windowing functions such as
ROW_NUMBER,RANK,NTILE,LAG, andLEAD. - Keep only the required columns for the output as specified in the
SELECTclause, and if aUNIONclause is present, combine the row sets. - Remove duplicates from the row set if a
DISTINCTclause exists. - Order the resulting row set as specified by the
ORDERBYclause. - Account for any limits stated in the
TOPclause.
It becomes clearer now that properly defining how tables are joined (the logical join type) is important to any scalable T-SQL query, namely by carefully planning on which columns the tables are joined. For example, in an inner join, these join arguments are the first level of data filtering that can be enforced, because only the rows that represent the intersection of two tables are eligible for subsequent operations.
Then it also makes sense to filter out rows from the result set using a WHERE clause, rather than applying any post-filtering conditions that apply to sub-groupings using a HAVING clause. Consider these two example queries:
SELECT p.ProductNumber, AVG(sod.UnitPrice) FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID GROUP BY p.ProductNumber HAVING p.ProductNumber LIKE 'L%'; SELECT p.ProductNumber, AVG(sod.UnitPrice) FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID WHERE p.ProductNumber LIKE 'L%' GROUP BY p.ProductNumber;
While these two queries are logically equivalent, the second one is more efficient because the rows that do not have a ProductNumber starting with L will be filtered out of the results before the aggregation is calculated. This is because the SQL Database Engine evaluates a WHERE clause before a HAVING clause and can limit the row count earlier in the execution phase, translating into reduced I/O and memory requirements, and also reduced CPU usage when applying the post-filter to the group.
The following diagram summarizes the logical statement-processing flow for the building blocks discussed previously in this chapter:
Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query
Now that we understand the order in which the SQL Database Engine processes queries, let’s explore the essentials of query compilation.