Reader small image

You're reading from  Learn PostgreSQL

Product typeBook
Published inOct 2020
Reading LevelIntermediate
PublisherPackt
ISBN-139781838985288
Edition1st Edition
Languages
Concepts
Right arrow
Authors (2):
Luca Ferrari
Luca Ferrari
author image
Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

Enrico Pirozzi
Enrico Pirozzi
author image
Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi

View More author details
Right arrow
Indexes and Performance Optimization

Performance tuning is one of the most complex tasks in the daily job of a database administrator. SQL is a declarative language, and therefore it does not define how to access the underlying data – that responsibility is left to the database engine. PostgreSQL, therefore, must select, for every statement, the best available access to the data.

A particular component, the planner, is responsible for deciding on the best among all the available paths to the underlying data and another component, the optimizer, is responsible for executing the statement with such a particular access plan.

The aim of this chapter is to teach you how PostgreSQL executes a query, how the planner computes the best execution plan, and how you can help in improving the performance by means of indexes.

You will learn about the following topics in this chapter...

Technical requirements

You need to know the following:

  • How to execute queries against the database
  • How to execute data description language (DDL) statements

The code for this chapter can be found in the following GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL.

Execution of a statement

SQL is a declarative language: you ask the database to execute something on the data it contains, but you do not specify how the database is supposed to complete the SQL statement. For instance, when you ask to get back some data, you execute a SELECT statement, but you only specify the clauses that specify which subset of data you need, not how the database is supposed to pull the data from its persistent storage. You have to trust the database – in particular, PostgreSQL – to be able to do its job and get you the fastest path to the data, always, under any circumstance of workload. The good news is that PostgreSQL is really good at doing this and is able to understand (and to some extent, interpret) your SQL statements and its current workload to provide you with access to the data in the fastest way.

However, finding the fastest path to the data often requires an equilibrium between searching for the absolute fastest path and the time spent in...

Indexes

An index is a data structure that allows faster access to the underlying table so that specific tuples can be found quickly. Here, "quickly" means faster than scanning the whole underlying table and analyzing every single tuple.

PostgreSQL supports different types of indexes, and not all types are optimal for every scenario and workload. In the following sections, you will discover the main types of indexes that PostgreSQL provides, but in any case, you can extend PostgreSQL with your own indexes or indexes provided by extensions.

An index in PostgreSQL can be built on a single column or multiple columns at once; PostgreSQL supports indexes with up to 32 columns.

An index can cover all the data in the underlying table, or can index specific values only – in that case, the index is known as "partial." For example, you can decide to index only those values of certain columns that you are going to use the most.

An index can also be unique, meaning that...

The EXPLAIN statement

EXPLAIN is the statement that allows you to see how PostgreSQL is going to execute a specific query. You have to pass the statement you want to analyze to EXPLAIN, and the execution plan will be shown.

There are a few important things to know before using EXPLAIN:

  • It will only show the best plan, which is the one with the lowest cost among all the evaluated plans.
  • It will not execute the statement you are asking the plan for, therefore the EXPLAIN execution is fast and pretty much constant each time.
  • It will present you with all the execution nodes that the executor will use to provide you with the dataset.

Let's see an example of EXPLAIN in action to better understand. Imagine we need to understand the execution plan of the SELECT * FROM categories statement. In this case, you need to prefix the statement with the EXPLAIN command, as follows:



forumdb=> EXPLAIN SELECT * FROM categories;
QUERY PLAN
--------...

An example of query tuning

In the previous section, you have learned how to use EXPLAIN to understand how PostgreSQL is going to execute a query; it is now time to use EXPLAIN in action to tune some slow queries and improve performance.

This section will show you some basic concepts of the day-to-day usage of EXPLAIN as a powerful tool to determine where and how to instrument PostgreSQL in doing faster data access. Of course, query tuning is a very complex subject and often requires repeated trial-based optimization, so the aim of this section is not to provide you with true knowledge about query tuning but rather a basic understanding of how to improve your own database and queries.

Sometimes, tuning a query involves simply rewriting it a way that is more comfortable – or better, more comprehensible –to PostgreSQL, but most often, that means using an appropriate index to speed up access to the underlying data.

Let's start with a simple example: we want to extract all...

ANALYZE and how to update statistics

PostgreSQL exploits a statistical approach to evaluate different execution plans. This means that PostgreSQL does not know how many tuples there are in a table, but has a good approximation that allows the planner to compute the cost of the execution plan.

Statistics are not only related to the quantity (how many tuples) but also to the quality of the underlying data – for example, how many distinct values, which values are more frequent in a column, and so on. Thanks to the combination of all of this data, PostgreSQL is able to make a good decision.

There are times, however, when the quality of the statistical data is not good enough for PostgreSQL to choose the best plan, a problem commonly known as "out-of-date statistics." In fact, statistics are not updated in real time; rather, PostgreSQL keeps track of what is ongoing in every table in every database and summarizes the number of new tuples, updated ones, and deleted ones, as...

Auto-explain

Auto-explain is an extension that helps the database administrator get an idea of slow queries and their execution plan. Essentially, auto-explain triggers when a running query is slower than a specified threshold, and then dumps in the PostgreSQL logs (refer to Chapter 14, Logging and Auditing) the execution plan of the query.

In this way, the database administrator can get an insight into slow queries and their execution plan without having to re-execute these queries. Thanks to this, the database administrator can inspect the execution plans and decide if and where to apply indexes or perform a deeper analysis.

The auto-explain module is configured via a set of auto_explain parameter options that can be inserted in the PostgreSQL configuration (the postgresql.conf file), but you need to remember that in order to activate the module, you need to restart the cluster.

The auto-explain module can do pretty much the same things that a manual EXPLAIN command can do, including...

Summary

PostgreSQL provides very rich features for creating and managing indexes, both single column-based or multi-column-based, as well as multiple types of indexes that can be built.

Thanks to the EXPLAIN command, a database administrator can inspect a slow query and see how the optimizer has thought about what the best access to the underlying data is, and thanks to an understanding of how PostgreSQL works, the administrator can decide which indexes to create in order to tune the performances.

PostgreSQL also provides a rich set of statistics that is used to both extract the quality and the quantity of data within every table, therefore being able to generate an execution plan, and to monitor which indexes are used and when. Auto-explain is another useful module that can be used to silently monitor slow queries and execution plans and see how the cluster is performing without any need to manually execute every suspect statement.

It is important to emphasize that performance tuning...

References

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Learn PostgreSQL
Published in: Oct 2020Publisher: PacktISBN-13: 9781838985288
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

author image
Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi