Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
The Applied SQL Data Analytics Workshop - Second Edition

You're reading from  The Applied SQL Data Analytics Workshop - Second Edition

Product type Book
Published in Feb 2020
Publisher Packt
ISBN-13 9781800203679
Pages 484 pages
Edition 2nd Edition
Languages
Authors (3):
Matt Goldwasser Matt Goldwasser
Profile icon Matt Goldwasser
Upom Malik Upom Malik
Profile icon Upom Malik
Benjamin Johnston Benjamin Johnston
Profile icon Benjamin Johnston
View More author details

6. Performant SQL

Overview

By the end of this chapter, we will be able to optimize database use to allow queries to be executed with fewer resources. First, we will look at how a database engine performs basic queries by developing our understanding of the sequential scan. Next, we will look at optimizing SELECT queries by creating indexes on our tables that improve performance. We will also examine the benefits of using joins in place of other functionality. We will explore advanced functionality by creating custom functions for special computations and examine how to apply custom constraints on our database by leveraging triggers. Finally, we will learn about tools and techniques for terminating inefficient queries that are consuming our database resources.

Introduction

In the previous chapter, we developed the skills necessary to effectively analyze data within a SQL database, and in this chapter, we will turn our attention to the efficiency of this analysis, investigating how we can increase the performance of our SQL queries. Efficiency and performance are key components of data analytics since without considering these factors, physical constraints such as time and processing power can significantly affect the outcome of an analysis. To elaborate on these limitations, we can consider two separate scenarios.

Say that we are performing post hoc analysis (analysis after the fact or event). In this first scenario, we have completed a study and have collected a large dataset of individual observations of a variety of different factors or features. One such example is described within our dealership sales database, which analyzes the sales data for each customer.

With the data collection process, we want to analyze the data for patterns...

Database Scanning Methods

SQL-compliant databases provide a number of different methods for scanning, searching, and selecting data. The right scan method to use is very much dependent on the use case and the state of the database at the time of scanning. How many records are in the database? Which fields are we interested in? How many records do we expect to be returned? How often do we need to execute the query? These are just some of the questions that we may want to ask when selecting the most appropriate scanning method.

Throughout this section, we will describe some of the search methods available, how they are used within SQL to execute scans, and a number of scenarios where they should/should not be used.

Query Planning

Before investigating the different methods of executing queries or scanning a database for information, it is useful to understand how the SQL server makes various decisions about the types of queries to be used. SQL-compliant databases possess a powerful...

Performant Joins

The JOIN functionality in SQL-compliant databases provides a very powerful and efficient method of combining data from different sources, without the need for complicated looping structures or a series of individual SQL statements. We covered joins and join theory in detail in Chapter 2, SQL for Data Preparation.

As suggested by the name of the command, a join takes information from two or more tables and uses the contents of the records within each table to combine the two sets of information. Because we are combining this information without the use of looping structures, this can be done very efficiently. In this section, we will consider the use of joins as a more performant alternative to looping structures. The following is the Customer Information table:

Figure 6.31: Customer information

Figure 6.31: Customer information

The following table shows the Order Information table:

Figure 6.32: Order information

Figure 6.32: Order information

So, with this information...

Functions and Triggers

So far in this chapter, we have discovered how to quantify query performance via the query planner, as well as the benefits of using joins to collate and extract information from multiple database tables. In this section, we will construct reusable queries and statements via functions, as well as automatic function execution via trigger callbacks. The combination of these two SQL features can be used to not only run queries or re-index tables as data is added to/updated in/removed from the database, but also to run hypothesis tests and track their results throughout the life of the database.

Function Definitions

As in almost all other programming or scripting languages, functions in SQL are contained sections of code, which provides a lot of benefits, such as efficient code reuse and simplified troubleshooting processes. We can use functions to repeat/modify statements or queries without re-entering the statement each time or searching for its use throughout...

Summary

In this chapter, we have covered a wide variety of topics designed to help us understand and improve the performance of our SQL queries. The chapter began with a thorough discussion of the query planner, (including the EXPLAIN and ANALYZE statements) as well as various indexing methods. We discussed a number of different compromises and considerations that can be made to reduce the time needed to execute queries.

We considered a number of scenarios where indexing methods would be of benefit and others where the query planner may disregard the index, thus reducing the efficiency of the query. We then moved on to the use of joins to efficiently combine information from a number of different tables and ended with an in-depth look at functions and automatic function calls through the use of triggers. We have also covered the /df and /sf commands and learned how to kill long-running queries.

In the next chapter, we will combine all of the topics we have covered thus far in...

lock icon The rest of the chapter is locked
You have been reading a chapter from
The Applied SQL Data Analytics Workshop - Second Edition
Published in: Feb 2020 Publisher: Packt ISBN-13: 9781800203679
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}