Reader small image

You're reading from  Learn T-SQL Querying - Second Edition

Product typeBook
Published inFeb 2024
PublisherPackt
ISBN-139781837638994
Edition2nd Edition
Right arrow
Authors (2):
Pedro Lopes
Pedro Lopes
author image
Pedro Lopes

Pedro Lopes is a Program Manager in the Database Systems group, based in Redmond, WA, USA. He has over 19 years of industry experience and has been with Microsoft for 9 years. He is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Relational Engine. He has extensive experience with query performance troubleshooting and is a regular speaker at numerous conferences such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He blogs about SQL on the SQL Server Team blog. He has authored several tools in the Tiger toolbox on GitHub: AdaptiveIndexDefrag maintenance solution, BPCheck, and usp_WhatsUp.
Read more about Pedro Lopes

Pam Lahoud
Pam Lahoud
author image
Pam Lahoud

Pam Lahoud is a Program Manager in the Database Systems group, based in Redmond, WA, USA. She has been with Microsoft for 13 years and is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Storage Engine area. She is passionate about SQL Server performance and has focused on performance tuning and optimization, particularly from the developer's perspective, throughout her career. She is a SQL Server 2008 Microsoft Certified Master (MCM) with over 20 years of experience working with SQL Server.
Read more about Pam Lahoud

View More author details
Right arrow

Indexing for T-SQL Performance

In the previous chapter, we explored execution plans and the various operators that the SQL Database Engine uses to retrieve the data requested by a query. While the Query Optimizer does most of the heavy lifting when choosing the best way to retrieve the data required to satisfy the query, it can only do so efficiently if the proper indexes are in place.

An index is a structure in the database that speeds up access to data by organizing it in a specific way based on the type of index. The data structure that works best for your application will depend on many factors, including the type of data being stored, the volatility of the data, and the data access patterns that will be used to retrieve the data. The SQL Database Engine offers a few different index types, such as rowstore, columnstore, XML, and others. Rowstore indexes are the most common indexes in the SQL Database Engine and are what most people think of when considering indexes for query...

Technical requirements

The examples used in this chapter are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on any version of SQL Server, 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database you can use for testing at https://aka.ms/freedb.

You will need the AdventureWorks2016_EXT (referred to as AdventureWorks) and AdventureWorksDW2016_EXT (referred to as AdventureWorksDW) sample databases, which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks. The code samples for this chapter can also be found on GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition/tree/main/ch4.

Understanding predicate SARGability

A predicate is a filter that can be used to determine the set of conditions to apply to a query to trim the result set. As we have discussed in previous chapters, these are typically applicable to the following clauses:

  • JOIN clauses, which filter the rows matching the type of join
  • WHERE clauses, which filter source rows from a table or an index
  • HAVING clauses, which filter the results

Most queries will make use of predicates, usually through a WHERE clause. When a predicate is serviceable by an index, it is said the predicate is SARGable, which is an acronym for Search ARGument-able. Having SARGable predicates should be a goal for our T-SQL queries because it can reduce the number of rows that need to be processed by a plan earlier in the execution – that is, when the data is being read by the SQL Database Engine. The implementation of this early row count reduction is called predicate pushdown; it is the action of using...

Data access using indexes

Now that we have discussed how the Query Optimizer uses indexes to facilitate predicate pushdown and make queries more efficient, let’s explore how indexes are structured and why they are so important for query performance.

Before we begin discussing the structure of indexes, it’s worth understanding how data is stored and accessed in the SQL Database Engine. Data is stored on 8 KB pages. An object such as a table or an index is essentially a collection of pages, along with metadata that maps out the structure of the object. The SQL Database Engine uses a special metadata page called an Index Allocation Map (IAM) page to locate the pages in an object. IAM pages contain a list of all the pages in a database file that belong to an object. Each object will have at least one IAM page but depending on the size of the object and the file structure of the database, there may be more than one IAM page, forming a chain.

Tables that do not have a...

Indexing strategy using rowstore indexes

Now that we’ve covered the basics of how rowstore indexes are structured and how they are used to access data, let’s move on to where and when they should be used, along with some best practices for efficient index design.

The goal of an indexing strategy is to minimize the amount of I/O required to satisfy the queries being generated against the database. This translates into a few simple guidelines:

  • Keep indexes as small as possible. The more rows that fit on a page, the fewer page reads that are required to access the data.
  • Avoid lookups – they add unnecessary I/O and can sometimes lead to suboptimal query plans.
  • Choose index keys that support query predicates so that indexes can be used for seeks rather than scans.
  • When creating indexes with multiple key columns, columns used for equality comparisons should be first, followed by columns used for inequality comparisons. The leading column should...

Index maintenance

While index maintenance is more of a database administration topic than a developer topic, it’s worth discussing the importance of index maintenance. As we discussed in the section on index structure, over time, INSERT, UPDATE, and DELETE operations can cause an index to become fragmented. Once the data is in memory, fragmentation doesn’t cause a noticeable performance issue, so the main concern is I/O. The SQL Database Engine has a few I/O optimizations, such as the readahead mechanism that’s used when scanning an index, that rely on the data being stored contiguously. When the data is fragmented, I/O may not be as efficient.

Another side effect of fragmentation is lower page density. A page is the smallest unit of I/O in the SQL Database Engine, so an index that contains a lot of partially empty pages will generate a lot more I/O than necessary. If the pages are full, it will take fewer of them to store the same amount of data. This is a...

Summary

This chapter covered a lot of ground, so let’s review the overall indexing strategy guidance:

  1. Clustered index data access is generally more efficient than heaps and every table in the database should have a clustered index, except for short-lived tables such as staging tables.
  2. Create clustered indexes first based on the data structure. These should generally be primary keys unless there’s a specific reason to cluster a different column or columns (for example, surrogate versus natural keys).
  3. Create non-clustered indexes on all foreign key columns.
  4. Once you begin writing queries, create additional non-clustered indexes to support the application queries, or add additional columns to existing foreign key indexes.
  5. Create covering indexes where practical, balancing overhead with performance.
  6. Do not over-index heavily updated tables; balance the cost of index maintenance with the benefit to queries. Just because the SQL Database Engine allows...
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Learn T-SQL Querying - Second Edition
Published in: Feb 2024Publisher: PacktISBN-13: 9781837638994
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Pedro Lopes

Pedro Lopes is a Program Manager in the Database Systems group, based in Redmond, WA, USA. He has over 19 years of industry experience and has been with Microsoft for 9 years. He is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Relational Engine. He has extensive experience with query performance troubleshooting and is a regular speaker at numerous conferences such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He blogs about SQL on the SQL Server Team blog. He has authored several tools in the Tiger toolbox on GitHub: AdaptiveIndexDefrag maintenance solution, BPCheck, and usp_WhatsUp.
Read more about Pedro Lopes

author image
Pam Lahoud

Pam Lahoud is a Program Manager in the Database Systems group, based in Redmond, WA, USA. She has been with Microsoft for 13 years and is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Storage Engine area. She is passionate about SQL Server performance and has focused on performance tuning and optimization, particularly from the developer's perspective, throughout her career. She is a SQL Server 2008 Microsoft Certified Master (MCM) with over 20 years of experience working with SQL Server.
Read more about Pam Lahoud