Home Data SQL Query Design Patterns and Best Practices

SQL Query Design Patterns and Best Practices

By Steve Hughes , Dennis Neer , Dr. Ram Babu Singh and 3 more
books-svg-icon Book
eBook $31.99 $21.99
Print $39.99 $23.98
Subscription $15.99 $10 p/m for three months
$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
eBook $31.99 $21.99
Print $39.99 $23.98
Subscription $15.99 $10 p/m for three months
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
  1. Free Chapter
    Chapter 1: Reducing Rows and Columns in Your Result Sets
About this book
SQL has been the de facto standard when interacting with databases for decades and shows no signs of going away. Through the years, report developers or data wranglers have had to learn SQL on the fly to meet the business needs, so if you are someone who needs to write queries, SQL Query Design and Pattern Best Practices is for you. This book will guide you through making efficient SQL queries by reducing set sizes for effective results. You’ll learn how to format your results to make them easier to consume at their destination. From there, the book will take you through solving complex business problems using more advanced techniques, such as common table expressions and window functions, and advance to uncovering issues resulting from security in the underlying dataset. Armed with this knowledge, you’ll have a foundation for building queries and be ready to shift focus to using tools, such as query plans and indexes, to optimize those queries. The book will go over the modern data estate, which includes data lakes and JSON data, and wrap up with a brief on how to use Jupyter notebooks in your SQL journey. By the end of this SQL book, you’ll be able to make efficient SQL queries that will improve your report writing and the overall SQL experience.
Publication date:
March 2023
Publisher
Packt
Pages
270
ISBN
9781837633289

 

Reducing Rows and Columns in Your Result Sets

Today the sources of data that a data analyst has access to have grown to the point that the amount of data that is available to you is unlimited. The challenge that a data analyst faces today is to determine how to generate a result set that is manageable and has the information that ensures that it will meet the needs of the analyst for their reports and analysis. If there is too much data, the result set will become unmanageable and unusable due to information overload; too little data and the data will have gaps, and the end user will lose trust in the data.

In this chapter, we will review how you determine how much data and what data you should keep in your result set and how to filter that data efficiently. We will also review how to determine which columns you should keep and how you can efficiently select the correct columns. The chapter will then wrap up with a short discussion on how these activities will impact future data aggregations.

By the end of this chapter, you will understand how to identify the data and columns that you need and the most efficient method for getting the data into a usable result set that can easily be recreated.

In this chapter, we will cover the following main topics:

  • Identifying data to be removed from the dataset
  • Understanding the value of creating views versus removing data
  • Exploring the impact of row and column reductions on aggregations
 

Technical requirements

To work with the examples and illustrations in this chapter, you will need to have SQL Server Management Studio installed. We will be working with the WideWorldImportersDW database on SQL Server. Please refer to the Appendix for tool installation and database restoration guidance.

You will find the code from this chapter on GitHub: https://github.com/PacktPublishing/SQL-Query-Design-Best-Practices/tree/main/Chapter01

 

Identifying data to remove from the dataset

With the amount of data that is freely available today, databases are getting larger and larger, and that makes it a challenge for data analysts to analyze the data properly. A challenge that data analysts face is determining what data is required to be able to produce a dataset that provides the information that is relevant for analysis. In this chapter, you will learn how to reduce the amount of data and columns that are contained in a dataset without affecting the result set.

To do this, you will need to first understand what data is required through interviews with the people requesting the result set. The interview process will help you to understand what the person requesting the result set wants to accomplish and give you an idea of where to find the data and what database and table contain the information that is required. With this knowledge, you will need to perform some initial analysis of the data in the database tables to determine how much and what columns of data are needed. This is often done through simple queries that perform row counts and table descriptions. The following are examples of the type of queries that may be used.

The following is a query to get an idea of the data in a table:

SELECT TOP (1000) *
  FROM database.schema.table;

This type of query will give you an idea of what data is available in a particular table by showing you up to the first 1,000 rows in the table, and if the table has fewer than 1,000 rows, it will show you all of the rows in the table.

The following query will show you all of the columns and their data types in a particular schema:

SELECT Table_name as [Table] ,
       Column_name as [Column] ,
       Table_catalog as [Database],
       table_schema as [Schema]
FROM   information_schema.columns
WHERE  table_schema = 'Fact'
ORDER BY Table_name, Column_name;

This type of query will read the system tables in the database to return the names of all of the Column instances that each table in the schema contains. The table that we are interested in is the information_schema.columns table. With this information, you can determine what columns are available for you to use.

With this information, let’s look at an example for solving the following sample request that was determined by interviewing a user. For the examples in this chapter, we will assume that the interview has resulted in the following analysis:

We want to be able to analyze the number of orders that resort in a back-order item being created by each year and month and how many customers were impacted.

How do we go about this? Let us check in the following sections.

Reducing the amount of data

We start by determining which tables seem to contain the data that is required as just described:

SELECT Table_name as [Table] ,
       Column_name as [Column] ,
       Table_catalog as [Database],
       Table_schema as [Schema]
FROM information_schema.columns
Where Table_schema = 'fact'
AND Table_name = 'Order'
ORDER BY Table_name, Column_name;

Figure 1.1 shows the results of the query:

Figure 1.1 – Results of the query to show all columns in a table

Figure 1.1 – Results of the query to show all columns in a table

Based on the results, the Fact.Order table is a good candidate to start with, so let’s run the following query:

SELECT TOP (1000) *
  FROM [WideWorldImportersDW].[Fact].[Order];

Figure 1.2 shows the results of this query:

Figure 1.2 – Sample data and columns from the Fact.Order query

Figure 1.2 – Sample data and columns from the Fact.Order query

This query shows us that there are 19 columns, and of those columns, we are only interested in Customer Key, Stock Item Key, Order Date Key, and WWI Backorder ID. So this is, in fact, the table that we should be using. Now that we have identified the table that we want to use, let’s run the following query to see how much data is in the table:

SELECT count(*)
FROM [WideWorldImportersDW].[Fact].[Order]

The results show that there are 231,412 rows of data in the table, so how do we reduce the amount of data that will be required in the result set? The answer is that we do some more analysis; for example, instead of keeping all the columns in the table, we will only include the columns in the query that are needed, as identified earlier. We also know that we are only interested in orders with a back-ordered item. So, let’s run this query and see how many records remain:

SELECT count(*)
FROM [WideWorldImportersDW].[Fact].[Order]
WHERE [WWI Backorder ID] IS NOT NULL;

The result from this query shows that we have reduced the size to 20,593 records; this is a much more manageable size to work with. The query that will get us the initial result set is as follows:

SELECT [Order Date Key] as "Order Date",
    [Order Key] as "Order",
 [stock item key] as "Stock Item",
 [Customer Key] as "Customer",
 [WWI Order ID] as "WWI Order",
 [WWI Backorder ID] as "WWI Backorder"
FROM [WideWorldImportersDW].[Fact].[Order]
WHERE [WWI Backorder ID] IS NOT NULL;

To explain what you have just done with this query, let’s break down each step.

The names after SELECT are the columns that you want the query to return in the result set. In this case, you are returning Order Date Key, Order Key, Stock item key, Customer Key, WWI Order ID, and WWI Backorder ID.

With these columns, you will have a date field in which to analyze the data by month, Order Key allows you to see how many distinct orders are impacted, stock item key tells you which items in the order have been back-ordered, Customer Key tells you which customer has been impacted by the backorders, and WWI Order ID lets you determine how many orders have been impacted. WWI Backorder ID is included when you want to see how many backorders are in the system.

FROM tells the query where to get the data from, in this case, from the WWI database using the Fact Order table.

The most important part is the WHERE clause; this is the part of the code that reduces the size of the result set to a manageable size. After all, you are not interested in all the orders in the table, only the orders that have an item that is on backorder. Figure 1.3 shows what the result set will look like:

Figure 1.3 – Sample result set

Figure 1.3 – Sample result set

Since you are interested in data by months, you will want to modify the Order Date Key column for the year and a column for the month as follows:

SELECT Year([Order Date Key]) as "Order Year",
       Month([Order Date Key]) as "Order Month",
  [Order Key] as "Order",
  [stock item key] as "Stock Item",
  [Customer Key] as "Customer",
  [WWI Order ID] as "WWI Order",
  [WWI Backorder ID] as "WWI Backorder"
FROM [WideWorldImportersDW].[Fact].[Order]
WHERE [WWI Backorder ID] IS NOT NULL;

You now have this result set you can see in Figure 1.4, and you are ready to answer the question that came from the interview with the user:

Figure 1.4 – Result set totaling by month

Figure 1.4 – Result set totaling by month

Now that you have learned how to get the data that you require for analysis, we will discuss the impact this filtering of data has on the aggregations that you may want to do in the analysis.

 

Understanding the value of creating views versus removing data

You have now learned how to create a query to get a result set that you can use for analysis and answer questions for a user. The next challenge is how you make this reusable so that you do not have to recreate the query every time you need the same data for other analyses. The reason for the challenge is that as the query gets more complex, the more likely the query is to be incorrectly typed. The solution to this challenge is to create a view. A view is a way to save the query as a logical table so that anybody with access to the database can run the query, and if you move on to another opportunity, the next person can recreate the result set with very little effort.

So, how do you create a view? It is as simple as adding the following line to the beginning of the SELECT query:

Create View 'name of the view' AS

Here is how the query that we created earlier would look to create a view of the data by adding the following line to the beginning of the SELECT query:

CREATE VIEW v_Backorders as
SELECT Year([Order Date Key]) as "Order Year",
       Month([Order Date Key]) as "Order Month",
  [Order Key] as "Order",
  [stock item key] as "Stock Item",
  [Customer Key] as "Customer",
  [WWI Order ID] as "WWI Order",
  [WWI Backorder ID] as "WWI Backorder"
FROM [WideWorldImportersDW].[Fact].[Order]
WHERE [WWI Backorder ID] IS NOT NULL;

Now you can run the analysis query as the following:

SELECT [Order Year],
       [Order Month],
   [Order],
   [Stock Item],
   [Customer],
   [WWI Order],
   [WWI Backorder]
FROM [dbo].[v_Backorders];

In Figure 1.5, you will notice that the following results are the same as you saw in the preceding result, and you do not have to include the filters because they are already included in the view:

Figure 1.5 – Result set using a view

Figure 1.5 – Result set using a view

This can save you the time of having to create the query in the future once the initial query has been created, and you can be assured that the data is correct. Most things that you can do in a query can also be done in a view, and you can use the view as though it is a table and just select columns from the view as you would in the table.

Now let’s look at how this filtering impacts any aggregations that you may plan to do with the result set.

 

Exploring the impact of row and column reductions on aggregations

Now you know how to reduce the number of rows and specify the columns that you need in your result set, let’s talk about what the impact will be on any aggregations that you may be interested in.

First of all, based on this result set, you can view the number of backorders by any combination of columns. For example, to see the number of backorders based on year and month, you could use the following query:

SELECT Year([Order Date Key]) as "Order Year",
       Month([Order Date Key]) as "Order Month",
  COUNT([WWI Backorder ID]) as "Number of backorders",
  COUNT(distinct [Customer Key]) as "Impacted Customers",
  COUNT([Order Key]) as "Number of orders"
FROM [WideWorldImportersDW].[Fact].[Order]
WHERE [WWI Backorder ID] IS NOT NULL
GROUP BY Year([Order Date Key]),
         Month([Order Date Key])
ORDER BY Year([Order Date Key]),
         Month([Order Date Key]);

You could also run the following query using the view that you created, and you will get the same results:

SELECT [Order Year],
       [Order Month],
   COUNT([WWI Backorder]) as "number of backorders",
   COUNT([customer]) as "number of impacted customers",
   COUNT([Order]) as "number of orders"
FROM v_backorders
GROUP BY [Order Year],
     [Order Month];

Figure 1.6 shows a subset of the results from the query:

Figure 1.6 – Subset of results from the earlier queries

Figure 1.6 – Subset of results from the earlier queries

On closer investigation, you will notice that the values in the number of backorders and number of orders columns are the same. Why is this? The answer is in the filter; it only returns records that have an associated backorder, so you will not get the actual number of orders that have been placed, and any calculations will not be accurate. So, to get an accurate number of orders, you will need to get all the orders in the table, not just the orders associated with a backorder. You will also see that the order of the result set is different for the query that uses the view and the query that does not use the view. This is due to the query that uses the view being stored in a memory heap from the view and accessed on demand, whereas the query that does not use the view is stored on the disk in the primary key order of the table.

This is the impact of using a filter; to get around this, you can add a subquery to remove the filter. So here is how you can update your query to get that additional information:

SELECT Year([Order Date Key]) as [Order Year],
       Month([Order Date Key]) as [Order Month],
   COUNT(distinct [Customer Key]) as [Impacted Customers],
   COUNT(distinct [Stock Item Key]) as [Backorder Items],
   COUNT([WWI Backorder ID]) as [Number of backorders],
   fo.orders,
       fo.Customers
FROM [WideWorldImportersDW].[Fact].[Order] o,
     (select  Year([Order Date Key]) as [Order Year],
              Month([Order Date Key]) as [Order Month],
              COUNT (distinct [Order Key]) as [orders],
              COUNT (distinct [customer Key]) as [Customers]
      FROM [WideWorldImportersDW].[Fact].[Order]
  GROUP BY Year([Order Date Key]),
               Month([Order Date Key])) as fo
WHERE [WWI Backorder ID] IS NOT NULL
AND year(o.[Order Date Key]) = fo.[Order Year]
AND month(o.[Order Date Key]) = fo.[Order Month]
GROUP BY Year([Order Date Key]),
         Month([Order Date Key]),
    fo.orders,
    fo.Customers
ORDER BY Year([Order Date Key]),
         Month([Order Date Key]);

This will give you the following result set, as seen in Figure 1.7. Notice you now see values in the orders, Impacted Customers, Number of backorders, and Customers columns for each Order Month:

Figure 1.7 – Results of using filters in a query

Figure 1.7 – Results of using filters in a query

This can be done for any number of columns as long as the subquery is grouped by the same fields as the main query. To simplify this query, you could very easily create a view of all the orders and then use the views to get the same results with less query development.

Here is a sample of the query to create the order view that you can use in future queries:

CREATE VIEW [dbo].[v_orders] AS
SELECT Year([Order Date Key]) as [Order Year],
      MONTH([Order Date Key]) as [Order Month],
      COUNT(distinct [Order Key]) as [orders],
      COUNT(distinct [customer Key]) as [Customers]
FROM [WideWorldImportersDW].[Fact].[Order]
GROUP BY Year([Order Date Key]),
         Month([Order Date Key])

Here is a sample of the query that uses the two views (v_Backorders and v_orders) that have been created in this chapter:

  SELECT o.[Order Year],
         o.[Order Month],
         o.Customers,
         o.orders,
     COUNT(b.[WWI Backorder]) as [total backorders],
     COUNT(distinct b.[customer]) as [impacted customers]
  FROM [WideWorldImportersDW].[dbo].[v_Backorders] b,
  [WideWorldImportersDW].[dbo].[v_orders] o
  WHERE b.[Order Year] = o.[Order Year]
  AND b.[Order Month] = o.[Order Month]
  GROUP BY o.[Order Year],
           o.[Order Month],
  o.Customers,
  o.orders
  ORDER BY o.[Order Year],
           o.[Order Month];

The following is the result of the query:

   Figure 1.8 – Sample results of using multiple views in a query

Figure 1.8 – Sample results of using multiple views in a query

So as you can see, it is easy to reduce the amount of data that you bring into your result, but sometimes, you will need to include other data that may have been filtered out. This is where views come in handy and allow you to include data that may have been filtered in previous activities. You will need to use caution when deciding between using a view over a subquery due to the performance implications. In this example, the data size is small enough that the performance implications are negligible.

As you can see from this section, there are multiple ways in which you will be able to create result sets and get the same results.

 

Summary

In this chapter, we discussed how to determine what data you need in your result set to meet your analysis needs.

We started with some simple queries to identify what table we needed to get the data we needed, and then we queried the table to get a sample of data that was contained in it. Next, we created a query that would create a result set that met the needs of the request and showed how it could be turned into a view that makes the query easily reusable. We then wrapped up the chapter by showing how there may be a need to get data that is not available because it was filtered out.

In the next chapter, we will expand upon this knowledge to look at how to efficiently aggregate the data so that your queries will run more efficiently.

About the Authors
  • Steve Hughes

    Steve Hughes is a senior director of data and analytics at 3Cloud. In this role, he focuses on growing the teams’ skills and capabilities to deliver data projects with Azure. In his 20 years of experience in technology, much of the time was spent on creating business intelligence (BI) solutions and helping customers implement BI and cloud solutions. He is passionate about helping customers understand that data is valuable and profitable. Steve has recently been diagnosed with ALS but continues to work and share with others what he has learned. Steve is also the founder of Data on Wheels where he blogs with one of his daughters on topics such as data, analytics, and work enablement

    Browse publications by this author
  • Dennis Neer

    Dennis Neer is a senior architect of data and analytics at 3Cloud. In this role, he focuses on working with clients to design Azure solutions for their data and analytic needs to improve their business decision-making process. This includes using tools such as SQL Server databases, Synapse, data lakes, and Power BI. In his 30 years of experience in technology, much of the time was spent on designing and building database and visualization solutions. He is passionate about helping businesses to understand data and use it to their advantage during their decision-making process

    Browse publications by this author
  • Dr. Ram Babu Singh

    Dr. Ram Babu Singh is a Microsoft Certified Professional with a Ph.D. in computer science. He is a lead architect of data and analytics at 3Cloud, using complex data analytics and data science solutions. In his 20 years of career, over a decade was spent in leadership positions providing data science and data platform solutions to the world’s top IT consulting companies. He has been published in international journals and has a patent in his name

    Browse publications by this author
  • Shabbir H. Mala

    Shabbir H. Mala is a director of data and analytics at 3Cloud. He has over 23 years of experience in thought leadership and consulting, developing complex data solutions, business systems, and processes using Azure Data Services and Power BI. He currently manages a team of over 40 principals and senior architects focusing on business growth, successful project delivery, and client experience. He has done talks at Power BI conferences as well as user groups. He has been awarded Microsoft FastTrack Solution Architect in 2021 and 2022. He is currently living in Chicago, married, and has three beautiful kids

    Browse publications by this author
  • Leslie Andrews

    Andrews is a lead data architect of data and analytics at 3Cloud. Working in the IT sector for over 20 years in local government, electrical supply, law enforcement, and healthcare, she has broad experience to draw from to assist clients in making data more accessible for end users. Leslie’s expertise includes SQL development, ETL, data warehousing, data modeling, and analytics. Leslie currently holds several Microsoft certifications related to Azure technologies for data engineering. She approaches each day as an opportunity to use data and technology to make it easier for others to do their daily work. Leslie has been doing public speaking since 2015, was an IDERA ACE in 2019, and is a supporter of and contributor to the SQL community

    Browse publications by this author
  • Chi Zhang

    Chi Zhang is a lead data architect at 3Cloud. After completing her master’s degree at Carnegie Mellon University, she worked in data consulting for the past 5 years. She has helped clients from various industries to deliver customized data solutions within Azure. Focusing on ETL, data warehousing, data modeling, and BI reporting, Chi solves data problems for clients and builds data solutions that provide business users with better operational insights and a smoother experience. Recently, she has continued to grow her footprint in the Azure community: earning certifications in Azure data engineering and data science, giving her first public tech talk, co-authoring a technical book, and becoming an active contributor to the SQL community

    Browse publications by this author
Latest Reviews (1 reviews total)
Un libro interessante ed utile, lo consiglio anche a chi ha esperienza, per un ripasso ed un'integrazione