Writing Elegant T-SQL Queries

At this point, we should have a good understanding of how to build a T-SQL query, and of the building blocks of writing T-SQL code: query optimization fundamentals, reading and interpreting query plans, and best practices around indexing and writing efficient T-SQL. But how do we build an elegant T-SQL query, one that not only gets the job done but does so efficiently?

There are a few guidelines to keep in mind when writing T-SQL queries to ensure that they perform and scale well, and to steer clear of common pitfalls that can make a query perform poorly and that even experienced T-SQL developers encounter.

In this chapter, we will examine some common T-SQL patterns and anti-patterns, specifically those that should be easily identified just by looking at the T-SQL code. We’re going to cover the following main topics:

  • Best practices for T-SQL querying
  • The perils of SELECT *
  • Functions in our predicate...

Technical requirements

The examples used in this chapter are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on any version of SQL Server from 2012 onward. SQL Server Developer Edition is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database you can use for testing at https://aka.ms/freedb.

You will need the sample databases AdventureWorks2016_EXT (referred to as AdventureWorks) and AdventureWorksDW2016_EXT (referred to as AdventureWorksDW), which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks. Code samples for this chapter can also be found on GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition/tree/main/ch5.

Best practices for T-SQL querying

There are a number of best practices for writing good T-SQL that don't constitute a pattern or anti-pattern (which we will discuss later in this chapter) but are important enough to observe when we want to write good queries. This section covers those practices.

Referencing objects

Always reference objects by their two-part name (<schema>.<name>) in T-SQL code because not doing so has some performance implications.

Using two-part object names prevents name resolution delays during query compilation. For example, if the default schema for a user connecting to the SQL Database Engine is HumanResources, and that user attempts to execute the stored procedure dbo.uspGetEmployeeManagers (for which they also have permissions) but references it simply as uspGetEmployeeManagers, the SQL Database Engine first searches the HumanResources schema for that stored procedure before searching other schemas, delaying name resolution and therefore execution...
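As a minimal illustration of the recommendation (assuming the standard AdventureWorks signature, where uspGetEmployeeManagers takes a @BusinessEntityID parameter), the second call below avoids the schema search entirely:

-- May trigger a search through the caller's default schema first
EXECUTE uspGetEmployeeManagers @BusinessEntityID = 1;

-- Two-part name: resolved directly, no schema search needed
EXECUTE dbo.uspGetEmployeeManagers @BusinessEntityID = 1;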

The perils of SELECT *

SELECT * should be avoided in stored procedures, views, and Multi-Statement Table-Valued Functions (MSTVFs) because our T-SQL code might break if there are any changes to the underlying schema. For example, applications that reference SELECT * may rely on the ordinal position rather than column names and may encounter errors if the underlying table definition is changed. Instead, fully qualify the names of columns that are relevant to our result set.
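As a minimal contrast (our own sketch against the AdventureWorks Production.Product table, not the book's example):

-- Fragile: the result shape silently changes if columns are added,
-- dropped, or reordered in the underlying table
SELECT *
FROM Production.Product;

-- Resilient: only the columns the application actually needs
SELECT ProductID, Name, ListPrice
FROM Production.Product;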

This also has important performance implications. Some application patterns may rely on reading an entire dataset and applying filters in the client layer only. For example, imagine a web application where a sales supervisor can see a report of orders registered for a given month, with details per product. The application connects to the AdventureWorks sample database and runs a query:

Dim myConnection As New SqlConnection("Our Connection String")
Dim cmd As New SqlCommand
Dim reader As SqlDataReader...

Functions in our predicate

Search predicates should only use deterministic function calls. Calls to non-deterministic functions that take columns as parameters prevent the SQL Database Engine from using the selectivity of those columns, because the result of the function is unknown at compile time. As a result, they cause unnecessary scans.

Keep in mind what was discussed in previous chapters: the Query Optimizer uses statistics and some internal transformation rules and heuristics at compile time to determine a good enough plan to execute a query, and the WHERE clause is one of the first clauses to be evaluated during logical query processing. The Query Optimizer depends on the estimated cost of resolving the search predicates to choose whether to do seeks or scans over indexes.

The following example shows a query executed in the AdventureWorks sample database that uses non-deterministic function calls in the search predicate:

SELECT SalesOrderID, OrderDate
FROM Sales...
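The full example is truncated above, but the general contrast it illustrates can be sketched as follows (our own illustration against Sales.SalesOrderHeader, not the book's exact query; the rewrite is equivalent in spirit rather than to the exact day boundary):

-- Function wrapped around the column: selectivity cannot be estimated,
-- so the engine evaluates the predicate against every row
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE DATEDIFF(day, OrderDate, GETDATE()) <= 30;

-- Rewritten so the column stands alone: an index on OrderDate can be used
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate >= DATEADD(day, -30, GETDATE());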

Deconstructing table-valued functions

A User-Defined Function (UDF) is like a stored procedure in that it is a block of T-SQL statements saved as an object, but it differs in that it does not generate a result set; it returns a value of a specified type. A scalar UDF is a function that returns a single value; a Table-Valued Function (TVF) is a function that returns a table.

There are two types of TVFs in the SQL Database Engine:

  • Multi-statement TVFs (MSTVFs): MSTVFs declare a return table type, populate the table, then return the table at the end of the function
  • Inline TVFs: You can think of an inline TVF as a view that takes parameters: the body of the function is a single query, and the return value is the result of that query

The following is an example of an MSTVF that we can create in the AdventureWorks sample database:

CREATE OR ALTER FUNCTION dbo.ufn_FindReports (@InEmpID INTEGER)
RETURNS @retFindReports TABLE
(
    EmployeeID int...
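The MSTVF example is truncated above, but for contrast, the following is a minimal inline TVF sketch (a hypothetical function of our own, not an inline equivalent of ufn_FindReports):

CREATE OR ALTER FUNCTION dbo.ufn_EmployeesByJobTitle (@JobTitle nvarchar(50))
RETURNS TABLE
AS
RETURN
(
    -- The entire body is a single query, so the optimizer can expand it
    -- into the calling statement, much like a view
    SELECT e.BusinessEntityID, p.FirstName, p.LastName, e.JobTitle
    FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
    WHERE e.JobTitle = @JobTitle
);

It can then be queried like a parameterized view:

SELECT BusinessEntityID, FirstName, LastName
FROM dbo.ufn_EmployeesByJobTitle(N'Sales Representative');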

Complex expressions

Search predicates should not use complex expressions. Much like the non-deterministic function calls we discussed in the Functions in our predicate section, complex expressions can also cause unnecessary scans.

As was discussed in previous chapters, the Query Optimizer uses statistics, internal transformation rules, and heuristics at compile time to determine a good enough plan to execute a query. This includes the ability to fold expressions, which is the process of simplifying constant expressions at compile time. For example, a predicate such as WHERE Column = 320 * 200 * 32 is computed at compile time to its arithmetic result, and internally the predicate is evaluated as WHERE Column = 2048000. But unlike constants, calculations that involve column values, parameters, non-deterministic functions, or variables are only evaluated at runtime; this is another example of how the Query Optimizer can't accurately estimate row counts beforehand, resulting...
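To make the distinction concrete, here is a sketch of our own (the column choice is arbitrary):

-- Folded at compile time: the optimizer simply sees WHERE OrderQty = 8
SELECT SalesOrderID, OrderQty
FROM Sales.SalesOrderDetail
WHERE OrderQty = 2 * 2 * 2;

-- Not foldable: the expression involves a column and a variable, so it
-- is evaluated per row at runtime and the row count estimate suffers
DECLARE @factor int = 2;
SELECT SalesOrderID, OrderQty
FROM Sales.SalesOrderDetail
WHERE OrderQty * @factor = 16;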

Optimizing OR logic

A common query pattern involves the need to express several conditions of which at least one must be true to filter the result set, usually with OR logic. Expressing these OR conditions can have serious performance drawbacks and can often be replaced with other constructs that provide better scalability and performance.

The following example shows a query executed in the AdventureWorks sample database that uses an OR condition in the search predicate. The query lists all rows for a specific product, or where the price is set at a predetermined value:

SELECT ProductID, [UnitPrice], [UnitPriceDiscount],
      [UnitPrice] * (1 - [UnitPriceDiscount]) AS FinalUnitPrice,
      [UnitPriceDiscount] + 0.10 AS NewUnitPriceDiscount,
      [UnitPrice] * (1 - 0.30) AS NewFinalUnitPrice
FROM Sales.SalesOrderDetail
WHERE ProductID = 770
     OR UnitPrice = 3399...
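The example above is truncated, but one common alternative construct is to split the OR into separate queries combined with UNION, so each branch can use its own index (a sketch of the general technique, not necessarily the rewrite the book develops):

-- Note: UNION removes duplicate rows; if exact duplicates must be
-- preserved, use UNION ALL and exclude the overlap in the second branch
SELECT ProductID, UnitPrice, UnitPriceDiscount
FROM Sales.SalesOrderDetail
WHERE ProductID = 770
UNION
SELECT ProductID, UnitPrice, UnitPriceDiscount
FROM Sales.SalesOrderDetail
WHERE UnitPrice = 3399;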

NULL means unknown

In the context of a database, if a column is set to NULL, it effectively means that the value is unknown. If we compare any other value with NULL, the result of that comparison is also unknown. In other words, a value can never be equal to NULL, as NULL is the absence of a value. This means the expression ColumnValue = NULL will never evaluate to true or false; even if ColumnValue is in fact NULL, it will always evaluate to unknown. To detect if a column value is NULL, we must use the special expressions IS NULL or IS NOT NULL rather than = or <>.
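A minimal illustration using the Person.Person table (with the default ANSI_NULLS ON behavior):

-- Returns no rows, even though many rows have a NULL MiddleName:
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE MiddleName = NULL;

-- Correctly returns the rows where MiddleName is NULL:
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE MiddleName IS NULL;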

Note

This handling of NULL is not unique to the SQL Database Engine; it is based on the ANSI standard handling of NULL values.

Having NULL values in our database is not an anti-pattern in and of itself, but when we assign a meaning to the value NULL in our application, we may face some challenges when it comes to writing performant T-SQL due to the need for special handling of NULL comparisons.

Let’...

Fuzzy string matching

When searching for strings in the SQL Database Engine using =, the strings must match exactly for the expression to evaluate to true. If we want to match only part of the string, however, we must use a LIKE operator with wildcards. If we want to search for a pattern anywhere within a string, we need both leading and trailing wildcards. The problem with this is that it prevents us from being able to use an index or accurately estimate cardinality. An index with a string key is sorted starting with the first character of the string, but if we are searching for a pattern that may appear in the middle of the string, the SQL Database Engine must scan every value and search for the matching pattern in each string in the column. A LIKE operator with a leading wildcard (LIKE '%value' or LIKE '%value%') almost always causes a scan operation.

Consider an example from the AdventureWorks database where we want to find all the Flat Washers in the Product table. We know they all...
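The walkthrough is cut off above, but the core contrast looks like this (our own sketch against Production.Product, assuming the washer product names begin with Flat Washer):

-- Leading wildcard: the engine must scan and test every Name value
SELECT ProductID, Name
FROM Production.Product
WHERE Name LIKE N'%Flat Washer%';

-- No leading wildcard: an index on Name can be used for a seek
SELECT ProductID, Name
FROM Production.Product
WHERE Name LIKE N'Flat Washer%';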

Inequality logic

Inequality logic involves negative comparisons such as !=, <>, NOT IN, and NOT LIKE. This type of predicate can be costly because it often requires evaluating each row, which translates to scan operations. Consider the following queries, 1 and 2, from the AdventureWorks database:

-- Query 1
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE PersonType NOT IN ('EM','SP','IN','VC','GC');

-- Query 2
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE PersonType = 'SC';

These queries are logically equivalent, since 'SC' is the only PersonType that is not listed in the first query. Out of the box, the execution plans look like this:

Figure 5.38 – Execution plan for Query 1 and Query 2

At this point, they appear to have the same estimated cost, but notice that both are doing a Clustered Index Scan and there is a missing index...

EXECUTE versus sp_executesql

There are times when an application must build a T-SQL statement dynamically before executing it on the server. To execute a dynamically created T-SQL statement, we can use either the EXECUTE command or the sp_executesql stored procedure. The sp_executesql procedure is the preferred method for executing dynamic T-SQL because it allows us to add parameter markers, which increases the likelihood that the SQL Database Engine will be able to reuse the plan and avoid costly query compilations.

Here’s an example script from the AdventureWorks database that builds a dynamic T-SQL statement and executes it via the EXECUTE command:

DECLARE @sql nvarchar(MAX), @JobTitle nvarchar(50) = N'Sales Representative';
SET @sql = 'SELECT e.BusinessEntityID, p.FirstName, p.LastName
FROM HumanResources.Employee e
INNER JOIN Person.Person p ON p.BusinessEntityID = e.BusinessEntityID
WHERE e.JobTitle = N''' + @JobTitle + '...
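The script is truncated above, but the parameterized alternative with sp_executesql follows this standard pattern (a sketch of our own, not necessarily the book's exact code):

DECLARE @sql nvarchar(MAX), @JobTitle nvarchar(50) = N'Sales Representative';
SET @sql = N'SELECT e.BusinessEntityID, p.FirstName, p.LastName
FROM HumanResources.Employee e
INNER JOIN Person.Person p ON p.BusinessEntityID = e.BusinessEntityID
WHERE e.JobTitle = @JobTitle;';
-- The parameter marker @JobTitle keeps the query text stable,
-- so the plan can be reused across different job title values
EXECUTE sp_executesql @sql, N'@JobTitle nvarchar(50)', @JobTitle = @JobTitle;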

Composable logic

Composable logic is what some developers use to make a single T-SQL statement do more than one thing, allowing the same code to be reused for multiple tasks. When writing procedural code, reusability is desirable because it makes the code more concise and maintainable. It allows developers to create libraries of modules that can be reused in other areas of the application, or even in other applications altogether. In T-SQL, however, there can be a hefty performance penalty for writing generic reusable code.

For the SQL Database Engine to execute a query in the most efficient way, it needs to estimate the cost of the query and choose operators that will return the results in the cheapest way possible. This is all done at compile-time based on how the query is written. With composable logic, however, the true cost of the query cannot be known until runtime because it is based on variables that change whenever the query is run. This type of generic code causes the...
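To make the pattern concrete, a typical shape of such generic code is the "catch-all" search procedure sketched below (a hypothetical example of our own, not the book's):

CREATE OR ALTER PROCEDURE dbo.uspSearchOrderDetails -- hypothetical name
    @ProductID int = NULL,
    @SpecialOfferID int = NULL
AS
-- Each parameter is optional; the single predicate tries to handle
-- every combination of inputs, so the optimizer must compile one plan
-- that works for all of them
SELECT SalesOrderID, ProductID, SpecialOfferID, OrderQty
FROM Sales.SalesOrderDetail
WHERE (@ProductID IS NULL OR ProductID = @ProductID)
  AND (@SpecialOfferID IS NULL OR SpecialOfferID = @SpecialOfferID);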

Summary

In this chapter, we reviewed a few T-SQL anti-patterns, such as SELECT * syntax, OR logic, and functions in our predicates, that are relatively easy to find simply by looking at our T-SQL code and how it is written. The scenarios covered in this chapter are some of the most common examples of patterns that prevent our T-SQL queries from scaling well and maintaining the expected level of performance throughout the lifetime of the application. All are easy to detect, and most have easy workarounds. Therefore, when writing queries, try to avoid these anti-patterns by leveraging some of the techniques we outlined here.

In the next chapter, we will investigate some T-SQL anti-patterns that are a bit more difficult to identify as they require some additional research beyond simply reading the code.
