You're reading from Learn PostgreSQL - Second Edition

Product typeBook

Published inOct 2023

PublisherPackt

ISBN-139781837635641

Edition2nd Edition

Concepts

Data Analysis

Authors (2):

Luca Ferrari

Enrico Pirozzi

View More author details

Query Tuning, Indexes, and Performance Optimization

Performance tuning is one of the most complex tasks in the daily job of a database administrator (DBA). SQL is a declarative language, and therefore it does not define how to access the underlying data – that responsibility is left to the database engine. PostgreSQL, therefore, must select, for every statement, the best available access to the data.

A particular component, the planner, is responsible for deciding on the best out of all the available paths to the underlying data, while another component, the optimizer, is responsible for executing the statement with such a particular access plan.

The aim of this chapter is to teach you how PostgreSQL executes a query, how the planner computes the best execution plan, and how you can help in improving the performance by means of indexes.

You will learn about the following topics in this chapter:

Execution of a statement
Indexes
The EXPLAIN...

Technical requirements

You need to know the following:

How to execute queries against the database
How to execute data description language (DDL) statements

The chapter examples can be run on the chapter_13 image that you can find in the book’s GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition. For installation and usage of the Docker images available for this book, please refer to the instructions provided in Chapter 1, Introduction to PostgreSQL.

Execution of a statement

SQL is a declarative language: you ask the database to execute something on the data it contains, but you do not specify how the database is supposed to complete the SQL statement. For instance, when you ask to get back some data, you execute a SELECT statement, but you only provide the clauses that specify which subset of data you need, not how the database is supposed to pull the data from its persistent storage. You have to trust the database – in particular, PostgreSQL – to be able to do its job and get you the fastest path to the data, always, under any circumstance of workload. The good news is that PostgreSQL is really good at doing this and is able to understand (and to some extent, interpret) your SQL statements and its current workload to provide you with access to the data in the fastest way.

However, finding the fastest path to the data often requires an equilibrium between searching for the absolute fastest path and the time...

Indexes

An index is a data structure that allows faster access to the underlying table so that specific tuples can be found quickly. Here, “quickly” means faster than scanning the whole underlying table and analyzing every single tuple.

PostgreSQL supports different types of indexes, and not all types are optimal for every scenario and workload. In the following sections, you will discover the main types of indexes that PostgreSQL provides, but in any case, you can extend PostgreSQL with your own indexes or indexes provided by extensions.

An index in PostgreSQL can be built on a single column or multiple columns at once; PostgreSQL supports indexes with up to 32 columns.

An index can cover all the data in the underlying table, or can index specific values only – in that case, the index is known as “partial.” For example, you can decide to index only those values of certain columns that you are going to use the most.

An index can also...

The EXPLAIN statement

EXPLAIN is the statement that allows you to see how PostgreSQL is going to execute a specific query. You have to pass the statement you want to analyze to EXPLAIN, and the execution plan will be shown.

There are a few important things to know before using EXPLAIN:

It will only show the best plan, which is the one with the lowest cost among all the evaluated plans.
It will not execute the statement you are asking the plan for, at least unless you explicitly ask for its execution. Therefore, the EXPLAIN execution is fast and pretty much constant each time.
It will present you with all the execution nodes that the executor will use to provide you with the dataset.

Let’s see an example of EXPLAIN in action to better understand. Imagine we need to understand the execution plan of the SELECT * FROM categories statement. In this case, you need to prefix the statement with the EXPLAIN command, as follows:

forumdb=> EXPLAIN...

Examples of query tuning

In the previous section, you learned how EXPLAIN can show the plan PostgreSQL will use to access the underlying data; it is now time to use EXPLAIN to tune some slow queries and improve performance.

This section will show you some basic concepts of the day-to-day usage of EXPLAIN as a powerful tool to determine where and how to instrument PostgreSQL in accessing data faster. Of course, query tuning is a very complex subject and often requires repeated trial-based optimization, so the aim of this section is not to provide you with in-depth knowledge about query tuning but rather a basic understanding of how to improve your own database and queries.

Sometimes, tuning a query involves simply rewriting it in a way that is more comfortable for – or better, more comprehensible to – PostgreSQL, but most often, query tuning means using an appropriate index to speed up access to the underlying data.

One important thing to take into account...

ANALYZE and how to update statistics

PostgreSQL exploits a statistical approach to evaluate different execution plans. This means that PostgreSQL does not know how many tuples there are in a table, but has a good approximation that allows the planner to compute the cost of the execution plan.

Statistics are not only related to the quantity (how many tuples) but also to the quality of the underlying data – for example, how many distinct values there are, which values are more frequent in a column, and so on. Thanks to the combination of all of this data, PostgreSQL is able to make a performant decision.

There are times, however, when the quality of the statistical data is not good enough for PostgreSQL to choose the best plan, a problem commonly known as “out-of-date statistics.” In fact, statistics are not updated in real time; rather, PostgreSQL keeps track of what is ongoing in every table in every database and summarizes the number of new tuples, updated...

Auto-explain

Auto-explain is an extension that helps the DBA get an idea of slow queries and their execution plan. Essentially, auto-explain triggers when a running query is slower than a specified threshold, and then dumps the execution plan of the query in the PostgreSQL logs (refer to Chapter 14, Logging and Auditing, for more detail).

Note: the Docker image for this chapter comes with auto-explain and log machinery pre-configured.

In this way, the DBA can get an insight into slow queries and their execution plans without having to re-execute these queries. Thanks to this, the DBA can inspect the execution plans and decide if and where to apply indexes or perform a deeper analysis.

The auto-explain module is configured via a set of auto_explain parameter options that can be inserted in the PostgreSQL configuration (the postgresql.conf file), but you need to remember that in order to activate the module, you need to restart the cluster.

The auto-explain...

Summary

PostgreSQL has a very complex cost-based query planner and optimizer that does its best to provide the fastest access to the underlying data.

Thanks to the EXPLAIN command, database administrators can monitor queries to track down the costs and the time taken for execution, and decide on how to improve them in order to get faster results. Usually, the creation of indexes is the less intrusive choice in query tuning, and PostgreSQL has a very rich and expressive index interface that allows the creation of single-column, multi-column,and partial indexes of different types and technologies. When indexes do not suffice, query rewriting could be a possible solution to perform query tuning.

Costs used by the planner are based on statistical data that has to be kept, as much as possible, up to date. While the auto-analyze daemon aims to do this, the DBA can always rely on the manual ANALYZE command to update the statistics.

Understanding a query plan, knowing which nodes...

Verify your knowledge

How can you inspect the plan of a query?
The special command EXPLAIN allows you to inspect how PostgreSQL is going to execute a given query, showing a “node” for each execution step. See the The EXPLAIN statement section for more details.

What is the difference between EXPLAIN and EXPLAIN EXPLAIN’?
The EXPLAIN command will not execute the query, computing only the access plan; on the other hand, the EXPLAIN ANALYZE command will execute the query and print the query plan in the output. See the EXPLAIN ANALYZE section for more details.

How does PostgreSQL keep the statistics up to date?
The statistics are updated every time a manual ANALYZE command is executed or the auto-vacuum (auto-analyze) daemon runs against a table. See the ANALYZE and how to update statistics section for more details.

How does PostgreSQL choose to use a specific access method (e.g., an index...

References

PostgreSQL official documentation about CREATE INDEX: https://www.postgresql.org/docs/current/sql-createindex.html
PostgreSQL official documentation about pg_stats: https://www.postgresql.org/docs/current/view-pg-stats.html
PostgreSQL official documentation about EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
PostgreSQL official documentation about ANALYZE: https://www.postgresql.org/docs/current/sql-analyze.html
Auto-explain official documentation: https://www.postgresql.org/docs/current/auto-explain.html

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://discord.gg/jYWCjF6Tku

The rest of the chapter is locked

You have been reading a chapter from

Learn PostgreSQL - Second Edition

Published in: Oct 2023Publisher: PacktISBN-13: 9781837635641

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (2)

Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages