Reader small image

You're reading from  SQL Query Design Patterns and Best Practices

Product typeBook
Published inMar 2023
Reading LevelIntermediate
PublisherPackt
ISBN-139781837633289
Edition1st Edition
Languages
Right arrow
Authors (6):
Steve Hughes
Steve Hughes
author image
Steve Hughes

Steve Hughes is a senior director of data and analytics at 3Cloud. In this role, he focuses on growing the teams' skills and capabilities to deliver data projects with Azure. In his 20 years of experience in technology, much of the time was spent on creating business intelligence (BI) solutions and helping customers implement BI and cloud solutions. He is passionate about helping customers understand that data is valuable and profitable. Steve has recently been diagnosed with ALS but continues to work and share with others what he has learned. Steve is also the founder of Data on Wheels where he blogs with one of his daughters on topics such as data, analytics, and work enablement
Read more about Steve Hughes

Dennis Neer
Dennis Neer
author image
Dennis Neer

Dennis Neer is a senior architect of data and analytics at 3Cloud. In this role, he focuses on working with clients to design Azure solutions for their data and analytic needs to improve their business decision-making process. This includes using tools such as SQL Server databases, Synapse, data lakes, and Power BI. In his 30 years of experience in technology, much of the time was spent on designing and building database and visualization solutions. He is passionate about helping businesses to understand data and use it to their advantage during their decision-making process
Read more about Dennis Neer

Dr. Ram Babu Singh
Dr. Ram Babu Singh
author image
Dr. Ram Babu Singh

Dr. Ram Babu Singh is a Microsoft Certified Professional with a Ph.D. in computer science. He is a lead architect of data and analytics at 3Cloud, using complex data analytics and data science solutions. In his 20 years of career, over a decade was spent in leadership positions providing data science and data platform solutions to the world's top IT consulting companies. He has been published in international journals and has a patent in his name
Read more about Dr. Ram Babu Singh

Shabbir H. Mala
Shabbir H. Mala
author image
Shabbir H. Mala

Shabbir H. Mala is a director of data and analytics at 3Cloud. He has over 23 years of experience in thought leadership and consulting, developing complex data solutions, business systems, and processes using Azure Data Services and Power BI. He currently manages a team of over 40 principals and senior architects focusing on business growth, successful project delivery, and client experience. He has done talks at Power BI conferences as well as user groups. He has been awarded Microsoft FastTrack Solution Architect in 2021 and 2022. He is currently living in Chicago, married, and has three beautiful kids
Read more about Shabbir H. Mala

Leslie Andrews
Leslie Andrews
author image
Leslie Andrews

Andrews is a lead data architect of data and analytics at 3Cloud. Working in the IT sector for over 20 years in local government, electrical supply, law enforcement, and healthcare, she has broad experience to draw from to assist clients in making data more accessible for end users. Leslie's expertise includes SQL development, ETL, data warehousing, data modeling, and analytics. Leslie currently holds several Microsoft certifications related to Azure technologies for data engineering. She approaches each day as an opportunity to use data and technology to make it easier for others to do their daily work. Leslie has been doing public speaking since 2015, was an IDERA ACE in 2019, and is a supporter of and contributor to the SQL community
Read more about Leslie Andrews

Chi Zhang
Chi Zhang
author image
Chi Zhang

Chi Zhang is a lead data architect at 3Cloud. After completing her master's degree at Carnegie Mellon University, she worked in data consulting for the past 5 years. She has helped clients from various industries to deliver customized data solutions within Azure. Focusing on ETL, data warehousing, data modeling, and BI reporting, Chi solves data problems for clients and builds data solutions that provide business users with better operational insights and a smoother experience. Recently, she has continued to grow her footprint in the Azure community: earning certifications in Azure data engineering and data science, giving her first public tech talk, co-authoring a technical book, and becoming an active contributor to the SQL community
Read more about Chi Zhang

View More author details
Right arrow

Efficiently Aggregating Data

In the previous chapter, you learned how to reduce the number of rows and columns contained in your dataset so that you have the data required to do the analysis that you are interested in. The next step that you will want to do is to aggregate your data so that you have summarized information making it easier to show the information in an easy-to-consume dataset.

In this chapter, you will learn the what, when, and how of data aggregation in SQL so that the dataset can be used by most end users to do successful analytics. We will start with what data should be aggregated, followed by a discussion on when the data should be aggregated and the different methods of aggregating the data. The chapter will wrap up with a short discussion on how to develop aggregations that are efficient and have minimal impact on your SQL server resources.

In this chapter, we will cover the following main topics:

  • Identifying data to be aggregated
  • Determining...

Technical requirements

To work with the examples and illustrations in this chapter, you will need to have SQL Server Management Studio installed. We will be working with the WideWorldImportersDW database on SQL Server. Please refer to the Appendix for tool installation and database restoration guidance.

You will find the code from this chapter here on GitHub: https://github.com/PacktPublishing/SQL-Query-Design-Best-Practices/tree/main/Chapter02

Identifying data to be aggregated

To identify the data that you are going to aggregate in the initial dataset that you have created, it is important to understand the story that you are trying to tell about the data. Some items will need to be counted, some will need to be summarized, and some will need to be aggregated by summarizing and counting. For example, if you are interested in doing an analysis of the data showing how many times an item has been ordered, then this would be an example of when you would aggregate the data by performing counts. If you are looking at data to see how many sales have occurred or the profits that have been made, this would be an example of summarizing the data. In addition, you may be interested in the average profit for a time period or the earliest and latest that an order has been delivered.

With these aggregations in place, the size of the dataset will be reduced, and you will be able to perform several other calculations based on these aggregations...

Determining when data should be aggregated

Now that we know what data we want to aggregate, when should this aggregation be performed? The aggregation should occur once you have identified the level of granularity you require. So, what does the level of granularity mean? It refers to the level of detail that an aggregation is completed to; for example, you want to know your profits at a daily level or monthly level. Other examples include aggregating to the day, month, year, store location, product, and so on.

Going back to Figure 2.1, we have the aggregations based on the invoice date, but we really want to know the totals based on the year, so you would then want to perform the aggregation based on the created dataset. Refer to the following query for how to do this:

SELECT YEAR([Invoice Date Key]) as Year
      ,SUM([Quantity]) as "# of items sold"
      ,SUM([Profit]) as profit
    ...

Improving performance when aggregating data

Developing SQL queries to aggregate data is a relatively simple process if you understand the granularity that you want to achieve. But there are times that you will need to rework your SQL to enable it to perform more efficiently; this mostly happens when there are many columns that are part of many aggregations. For example, if the result set contains aggregations that are part of another aggregation, you would want to develop the SQL query containing a subquery that creates the initial aggregations and then performs the final aggregation. An alternative would be to create multiple queries to aggregate the data appropriately for each aggregation and then use a MERGE function to create a single dataset to be able to perform your analysis. Here is a sample SQL query that uses subqueries to create an aggregation from two different subjects:

SELECT YEAR([Invoice Date Key]) as [Invoice Year]
      ,MONTH([Invoice...

Summary

In this chapter, we discussed when and how you aggregate which data that will be collected in your dataset.

We started by discussing what data you should be aggregating to make your data easier to analyze once the dataset is created. We followed that up with a brief discussion of when you should aggregate the data. Once we learned what data to aggregate and when it should be aggregated, we discussed the various functions that can be used to aggregate the data. We then wrapped up the chapter by discussing how a SQL query could be broken down into multiple SQL queries to make them easier to troubleshoot and maintain. Then we showed how the queries could be combined using a subquery or a JOIN function to combine the results of multiple queries into a single result set.

You now have the necessary skills to collect data and aggregate it into a result set that will be easy to analyze. In the next chapter, we will continue your SQL journey by learning how to format the data...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
SQL Query Design Patterns and Best Practices
Published in: Mar 2023Publisher: PacktISBN-13: 9781837633289
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (6)

author image
Steve Hughes

Steve Hughes is a senior director of data and analytics at 3Cloud. In this role, he focuses on growing the teams' skills and capabilities to deliver data projects with Azure. In his 20 years of experience in technology, much of the time was spent on creating business intelligence (BI) solutions and helping customers implement BI and cloud solutions. He is passionate about helping customers understand that data is valuable and profitable. Steve has recently been diagnosed with ALS but continues to work and share with others what he has learned. Steve is also the founder of Data on Wheels where he blogs with one of his daughters on topics such as data, analytics, and work enablement
Read more about Steve Hughes

author image
Dennis Neer

Dennis Neer is a senior architect of data and analytics at 3Cloud. In this role, he focuses on working with clients to design Azure solutions for their data and analytic needs to improve their business decision-making process. This includes using tools such as SQL Server databases, Synapse, data lakes, and Power BI. In his 30 years of experience in technology, much of the time was spent on designing and building database and visualization solutions. He is passionate about helping businesses to understand data and use it to their advantage during their decision-making process
Read more about Dennis Neer

author image
Dr. Ram Babu Singh

Dr. Ram Babu Singh is a Microsoft Certified Professional with a Ph.D. in computer science. He is a lead architect of data and analytics at 3Cloud, using complex data analytics and data science solutions. In his 20 years of career, over a decade was spent in leadership positions providing data science and data platform solutions to the world's top IT consulting companies. He has been published in international journals and has a patent in his name
Read more about Dr. Ram Babu Singh

author image
Shabbir H. Mala

Shabbir H. Mala is a director of data and analytics at 3Cloud. He has over 23 years of experience in thought leadership and consulting, developing complex data solutions, business systems, and processes using Azure Data Services and Power BI. He currently manages a team of over 40 principals and senior architects focusing on business growth, successful project delivery, and client experience. He has done talks at Power BI conferences as well as user groups. He has been awarded Microsoft FastTrack Solution Architect in 2021 and 2022. He is currently living in Chicago, married, and has three beautiful kids
Read more about Shabbir H. Mala

author image
Leslie Andrews

Andrews is a lead data architect of data and analytics at 3Cloud. Working in the IT sector for over 20 years in local government, electrical supply, law enforcement, and healthcare, she has broad experience to draw from to assist clients in making data more accessible for end users. Leslie's expertise includes SQL development, ETL, data warehousing, data modeling, and analytics. Leslie currently holds several Microsoft certifications related to Azure technologies for data engineering. She approaches each day as an opportunity to use data and technology to make it easier for others to do their daily work. Leslie has been doing public speaking since 2015, was an IDERA ACE in 2019, and is a supporter of and contributor to the SQL community
Read more about Leslie Andrews

author image
Chi Zhang

Chi Zhang is a lead data architect at 3Cloud. After completing her master's degree at Carnegie Mellon University, she worked in data consulting for the past 5 years. She has helped clients from various industries to deliver customized data solutions within Azure. Focusing on ETL, data warehousing, data modeling, and BI reporting, Chi solves data problems for clients and builds data solutions that provide business users with better operational insights and a smoother experience. Recently, she has continued to grow her footprint in the Azure community: earning certifications in Azure data engineering and data science, giving her first public tech talk, co-authoring a technical book, and becoming an active contributor to the SQL community
Read more about Chi Zhang