Reader small image

You're reading from  Learn PostgreSQL - Second Edition

Product typeBook
Published inOct 2023
PublisherPackt
ISBN-139781837635641
Edition2nd Edition
Right arrow
Authors (2):
Luca Ferrari
Luca Ferrari
author image
Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

Enrico Pirozzi
Enrico Pirozzi
author image
Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi

View More author details
Right arrow

Query Tuning, Indexes, and Performance Optimization

Performance tuning is one of the most complex tasks in the daily job of a database administrator (DBA). SQL is a declarative language, and therefore it does not define how to access the underlying data – that responsibility is left to the database engine. PostgreSQL, therefore, must select, for every statement, the best available access to the data.

A particular component, the planner, is responsible for deciding on the best out of all the available paths to the underlying data, while another component, the optimizer, is responsible for executing the statement with such a particular access plan.

The aim of this chapter is to teach you how PostgreSQL executes a query, how the planner computes the best execution plan, and how you can help in improving the performance by means of indexes.

You will learn about the following topics in this chapter:

  • Execution of a statement
  • Indexes
  • The EXPLAIN...

Technical requirements

You need to know the following:

  • How to execute queries against the database
  • How to execute data description language (DDL) statements

The chapter examples can be run on the chapter_13 image that you can find in the book’s GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition. For installation and usage of the Docker images available for this book, please refer to the instructions provided in Chapter 1, Introduction to PostgreSQL.

Execution of a statement

SQL is a declarative language: you ask the database to execute something on the data it contains, but you do not specify how the database is supposed to complete the SQL statement. For instance, when you ask to get back some data, you execute a SELECT statement, but you only provide the clauses that specify which subset of data you need, not how the database is supposed to pull the data from its persistent storage. You have to trust the database – in particular, PostgreSQL – to be able to do its job and get you the fastest path to the data, always, under any circumstance of workload. The good news is that PostgreSQL is really good at doing this and is able to understand (and to some extent, interpret) your SQL statements and its current workload to provide you with access to the data in the fastest way.

However, finding the fastest path to the data often requires an equilibrium between searching for the absolute fastest path and the time...

Indexes

An index is a data structure that allows faster access to the underlying table so that specific tuples can be found quickly. Here, “quickly” means faster than scanning the whole underlying table and analyzing every single tuple.

PostgreSQL supports different types of indexes, and not all types are optimal for every scenario and workload. In the following sections, you will discover the main types of indexes that PostgreSQL provides, but in any case, you can extend PostgreSQL with your own indexes or indexes provided by extensions.

An index in PostgreSQL can be built on a single column or multiple columns at once; PostgreSQL supports indexes with up to 32 columns.

An index can cover all the data in the underlying table, or can index specific values only – in that case, the index is known as “partial.” For example, you can decide to index only those values of certain columns that you are going to use the most.

An index can also...

The EXPLAIN statement

EXPLAIN is the statement that allows you to see how PostgreSQL is going to execute a specific query. You have to pass the statement you want to analyze to EXPLAIN, and the execution plan will be shown.

There are a few important things to know before using EXPLAIN:

  • It will only show the best plan, which is the one with the lowest cost among all the evaluated plans.
  • It will not execute the statement you are asking the plan for, at least unless you explicitly ask for its execution. Therefore, the EXPLAIN execution is fast and pretty much constant each time.
  • It will present you with all the execution nodes that the executor will use to provide you with the dataset.

Let’s see an example of EXPLAIN in action to better understand. Imagine we need to understand the execution plan of the SELECT * FROM categories statement. In this case, you need to prefix the statement with the EXPLAIN command, as follows:

forumdb=> EXPLAIN...

Examples of query tuning

In the previous section, you learned how EXPLAIN can show the plan PostgreSQL will use to access the underlying data; it is now time to use EXPLAIN to tune some slow queries and improve performance.

This section will show you some basic concepts of the day-to-day usage of EXPLAIN as a powerful tool to determine where and how to instrument PostgreSQL in accessing data faster. Of course, query tuning is a very complex subject and often requires repeated trial-based optimization, so the aim of this section is not to provide you with in-depth knowledge about query tuning but rather a basic understanding of how to improve your own database and queries.

Sometimes, tuning a query involves simply rewriting it in a way that is more comfortable for – or better, more comprehensible to – PostgreSQL, but most often, query tuning means using an appropriate index to speed up access to the underlying data.

One important thing to take into account...

ANALYZE and how to update statistics

PostgreSQL exploits a statistical approach to evaluate different execution plans. This means that PostgreSQL does not know how many tuples there are in a table, but has a good approximation that allows the planner to compute the cost of the execution plan.

Statistics are not only related to the quantity (how many tuples) but also to the quality of the underlying data – for example, how many distinct values there are, which values are more frequent in a column, and so on. Thanks to the combination of all of this data, PostgreSQL is able to make a performant decision.

There are times, however, when the quality of the statistical data is not good enough for PostgreSQL to choose the best plan, a problem commonly known as “out-of-date statistics.” In fact, statistics are not updated in real time; rather, PostgreSQL keeps track of what is ongoing in every table in every database and summarizes the number of new tuples, updated...

Auto-explain

Auto-explain is an extension that helps the DBA get an idea of slow queries and their execution plan. Essentially, auto-explain triggers when a running query is slower than a specified threshold, and then dumps the execution plan of the query in the PostgreSQL logs (refer to Chapter 14, Logging and Auditing, for more detail).

Note: the Docker image for this chapter comes with auto-explain and log machinery pre-configured.

In this way, the DBA can get an insight into slow queries and their execution plans without having to re-execute these queries. Thanks to this, the DBA can inspect the execution plans and decide if and where to apply indexes or perform a deeper analysis.

The auto-explain module is configured via a set of auto_explain parameter options that can be inserted in the PostgreSQL configuration (the postgresql.conf file), but you need to remember that in order to activate the module, you need to restart the cluster.

The auto-explain...

Summary

PostgreSQL has a very complex cost-based query planner and optimizer that does its best to provide the fastest access to the underlying data.

Thanks to the EXPLAIN command, database administrators can monitor queries to track down the costs and the time taken for execution, and decide on how to improve them in order to get faster results. Usually, the creation of indexes is the less intrusive choice in query tuning, and PostgreSQL has a very rich and expressive index interface that allows the creation of single-column, multi-column,and partial indexes of different types and technologies. When indexes do not suffice, query rewriting could be a possible solution to perform query tuning.

Costs used by the planner are based on statistical data that has to be kept, as much as possible, up to date. While the auto-analyze daemon aims to do this, the DBA can always rely on the manual ANALYZE command to update the statistics.

Understanding a query plan, knowing which nodes...

Verify your knowledge

  • How can you inspect the plan of a query?

    The special command EXPLAIN allows you to inspect how PostgreSQL is going to execute a given query, showing a “node” for each execution step. See the The EXPLAIN statement section for more details.

  • What is the difference between EXPLAIN and EXPLAIN EXPLAIN’?

    The EXPLAIN command will not execute the query, computing only the access plan; on the other hand, the EXPLAIN ANALYZE command will execute the query and print the query plan in the output. See the EXPLAIN ANALYZE section for more details.

  • How does PostgreSQL keep the statistics up to date?

    The statistics are updated every time a manual ANALYZE command is executed or the auto-vacuum (auto-analyze) daemon runs against a table. See the ANALYZE and how to update statistics section for more details.

  • How does PostgreSQL choose to use a specific access method (e.g., an index...

References

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://discord.gg/jYWCjF6Tku

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Learn PostgreSQL - Second Edition
Published in: Oct 2023Publisher: PacktISBN-13: 9781837635641
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

author image
Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi