You're reading from SQL for Data Analytics

Product type Book

Published in Aug 2019

Publisher Packt

ISBN-13 9781789807356

Pages 386 pages

Edition 1st Edition

Languages

SQL

Concepts

Data Analysis

Authors (3):

Upom Malik

Matt Goldwasser

Benjamin Johnston

View More author details

Table of Contents (11) Chapters

Preface

About the Book

1. Understanding and Describing Data

2. The Basics of SQL for Analytics

3. SQL for Data Preparation

4. Aggregate Functions for Data Analysis

5. Window Functions for Data Analysis

6. Importing and Exporting Data

7. Analytics Using Complex Data Types

8. Performant SQL

9. Using SQL to Uncover the Truth – a Case Study

Appendix

8. Performant SQL

Learning Objectives

By the end of this chapter, you will be able to:

Optimize database use to allow more queries to be executed with fewer resources
Implement index and sequential scans and understand when to most effectively use them
Interpret the output of EXPLAIN ANALYZE
Understand the benefits of using joins in place of other functionality
Identify bottlenecks in queries
Implement triggers in response to specific events
Create and use functions to create more sophisticated and efficient queries
Identify long-running queries and terminate them

In this chapter, we will improve the performance of some of our previous SQL queries. Now that we have a good understanding of the basics, we will build upon this foundation by making our queries more resource and time efficient. As we begin to work with larger datasets, these efficiencies become even more important, with each computational step taking longer to compute...

Introduction

In the previous chapter, we developed the skills necessary to effectively analyze data within a SQL database, and in this chapter, we will turn our attention to the efficiency of this analysis, investigating how we can increase the performance of our SQL queries. Efficiency and performance are key components of data analytics, since without considering these factors, physical constraints such as time and processing power can significantly affect the outcome of an analysis. To elaborate on these limitations, we can consider two separate scenarios.

Let's say that we are performing post-hoc analysis (analysis after the fact or event). In this first scenario, we have completed a study and have collected a large dataset of individual observations of a variety of different factors or features. One such example is that described within our dealership sales database – analyzing the sales data for each customer. With the data collection process, we want to analyze...

Database Scanning Methods

SQL-compliant databases provide a number of different methods for scanning, searching, and selecting data. The right scan method to use is very much dependent on the use case and the state of the database at the time of scanning. How many records are in the database? Which fields are we interested in? How many records do we expect to be returned? How often do we need to execute the query? These are just some of the questions that we may want to ask when selecting the most appropriate scanning method. Throughout this section, we will describe some of the search methods available, how they are used within SQL to execute scans, and a number of scenarios where they should/should not be used.

Query Planning

Before investigating the different methods of executing queries or scanning a database for information, it is useful to understand how the SQL server makes various decisions about the types of queries to be used. SQL-compliant databases possess a powerful...

Performant Joins

The JOIN functionality in SQL-compliant databases provides a very powerful and efficient method of combining data from different sources, without the need for complicated looping structures or a series of individual SQL statements. We covered joins and join theory in detail in Chapter 3, SQL for Data Preparation. As suggested by the name of the command, a join takes information from two or more tables and uses the contents of the records within each table to combine the two sets of information. Because we are combining this information without the use of looping structures, this can be done very efficiently. In this section, we will consider the use of joins as a more performant alternative to looping structures. The following is the Customer Information table:

Figure 8.33: Customer informati on

The following table shows the Order Information table:

Figure 8.34: Order information

So, with this information, we may want...

Functions and Triggers

So far in this chapter, we have discovered how to quantify query performance via the query planner, as well as the benefits of using joins to collate and extract information from multiple database tables. In this section, we will construct reusable queries and statements via functions, as well as automatic function execution via trigger callbacks. The combination of these two SQL features can be used to not only run queries or re-index tables as data is added to/updated in/removed from the database, but also to run hypothesis tests and track the results of the tests throughout the life of the database.

Function Definitions

As in almost all other programming or scripting languages, functions in SQL are contained sections of code, which provides a lot of benefits, such as efficient code reuse and simplified troubleshooting processes. We can use functions to repeat/modify statements or queries without re-entering the statement each time or searching...

Summary

In this chapter, we have covered a wide variety of topics all designed to help us understand and improve the performance of our SQL queries. The chapter began with a thorough discussion of the query planner, including the EXPLAIN and ANALYZE statements, as well as various indexing methods. We discussed a number of different compromises and considerations that can be made to reduce the time taken to execute queries. We considered a number of scenarios where indexing methods would be of benefit and others where the query planner may disregard the index, thus reducing the efficiency of the query. We then moved on to the use of joins to efficiently combine information from a number of different tables and ended with an in-depth look at functions and automatic function calls through the use of triggers.

In the next chapter, we will combine all of the topics we have covered thus far in a final case study, applying our SQL knowledge and the scientific method in general, as we solve...