Optimizing Queries for Good Performance

In Chapter 5, Log Files and System Statistics, you learned how to read system statistics and how to make use of what PostgreSQL provides. Armed with this knowledge, we now turn to good query performance, something everybody is looking for, so it is worth addressing the topic in depth.

In this chapter, you will learn about the following topics:

  • Learning what the optimizer does
  • Understanding execution plans
  • Understanding and fixing joins
  • Enabling and disabling optimizer settings
  • Partitioning data
  • Adjusting parameters for good query performance
  • Making use of parallel queries
  • Introducing just-in-time (JIT) compilation

By the end of this chapter, we will be able to write better and faster queries. If the queries still aren’t very good, we should be able to understand why this is the case. We will also be able to use the new techniques...

Learning what the optimizer does

Before even attempting to think about query performance, it makes sense to familiarize yourself with what the query optimizer does. A deeper understanding of what is going on under the hood helps you see what the database is really up to.

A practical example – how the query optimizer handles a sample query

To demonstrate how the optimizer works, I have compiled an example. It is something that I have used over the years for PostgreSQL training. Let’s assume that there are three tables, as follows:

CREATE TABLE a (aid int);         -- 100 million rows
CREATE TABLE b (bid int);         -- 200 million rows
CREATE TABLE c (cid int);         -- 300 million rows

Let’s further assume that those tables contain millions, or maybe hundreds of millions,...
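To make the discussion more concrete, here is the kind of query the optimizer has to plan for such tables. The join below is an illustrative sketch and not taken from the book's own listing:

-- Illustrative sketch: a three-way join over the sample tables.
-- The planner has to choose a join order (for example, join a and b first,
-- or b and c first) and a join method (nested loop, hash join, merge join)
-- for every step, based on estimated row counts and costs.
EXPLAIN
SELECT *
FROM   a
       JOIN b ON aid = bid
       JOIN c ON bid = cid;

With tables of this size, picking the wrong join order or method can easily turn a fast query into one that runs for hours, which is exactly why the optimizer's decisions matter so much.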

Understanding execution plans

Now that we’ve dug into some important optimizations that are implemented in PostgreSQL, let’s proceed to take a closer look at execution plans. You have already seen some execution plans in this book. However, in order to make full use of plans, it is important to develop a systematic approach when it comes to reading this information.

Approaching plans systematically

The first thing you have to know is that an EXPLAIN clause can do quite a lot for you, and I highly recommend making full use of these features.

As many of you may already know, an EXPLAIN ANALYZE clause will execute the query and return the plan, including real runtime information. Here is an example:

test=# EXPLAIN ANALYZE SELECT *
FROM
(
    SELECT *
    FROM b
    LIMIT 1000000
) AS b
ORDER BY cos(bid);
                ...
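Beyond ANALYZE, the EXPLAIN command accepts several further options that are worth knowing. The following calls use standard PostgreSQL syntax and are shown here as a quick illustration rather than as part of the book's example:

-- BUFFERS adds shared-buffer hit/read counters, VERBOSE shows the output
-- columns of every node, and FORMAT JSON returns the plan in a
-- machine-readable form that tools can parse.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM b LIMIT 1000000;

EXPLAIN (FORMAT JSON)
SELECT * FROM b LIMIT 1000000;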

Understanding and fixing joins

Joins are important; everybody needs them on a regular basis. Consequently, joins are also highly relevant to maintaining or achieving good performance. To ensure that you can write good joins, we will take a closer look at joining in this section.

Getting joins right

Before we dive into optimizing joins, it is important to take a look at some of the most common problems that arise with joins and which of them should sound alarm bells for you.

Here is an example of a simple table structure to demonstrate how joins work (make sure the table we used before is deleted):

test=# CREATE TABLE a (aid int);
CREATE TABLE
test=# CREATE TABLE b (bid int);
CREATE TABLE
test=# INSERT INTO a VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO b VALUES (2), (3), (4);
INSERT 0 3

Two tables containing a couple of rows have been created.

The following example shows a simple outer join:

test=# SELECT * FROM a LEFT JOIN b ON (aid = bid);
 aid | bid
-----+-----
 ...
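A classic source of confusion with outer joins, and one that should indeed sound alarm bells, is the difference between filtering in the ON clause and filtering in the WHERE clause. The following sketch is not part of the book's listing, but it uses the two tables created above:

-- Filter inside ON: every row of "a" survives; rows without a match
-- simply get NULLs on the right-hand side.
SELECT * FROM a LEFT JOIN b ON aid = bid AND bid = 2;
-- returns (1, NULL), (2, 2), (3, NULL)

-- Filter in WHERE: applied after the join, so the NULL-padded rows
-- are thrown away again and the outer join degenerates to an inner join.
SELECT * FROM a LEFT JOIN b ON aid = bid WHERE bid = 2;
-- returns (2, 2)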

Enabling and disabling optimizer settings

So far, the most important optimizations that are performed by the planner have been discussed in detail. PostgreSQL has improved a lot over the years. Still, things can go south from time to time, and users have to convince the planner to do the right thing.

To modify plans, PostgreSQL offers a couple of runtime variables that will have a significant impact on planning. The idea is to give the end user a chance to make certain types of nodes in the plan more expensive than others. What does that mean in practice? Here is a simple plan:

test=# explain SELECT *
 FROM generate_series(1, 100) AS a,
      generate_series(1, 100) AS b
 WHERE a = b;
                          QUERY PLAN
----------------------------------------------------------
 Hash Join (cost=2.25..4.63 rows=100 width=8)
  ...
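As a quick illustration of what these runtime variables look like in practice (a sketch, not the book's own listing), a particular node type can be discouraged for the current session and the plan inspected again:

-- enable_hashjoin does not forbid hash joins outright; it makes them
-- prohibitively expensive, so the planner usually falls back to another
-- join method such as a merge join or a nested loop.
SET enable_hashjoin TO off;
EXPLAIN
SELECT *
FROM  generate_series(1, 100) AS a,
      generate_series(1, 100) AS b
WHERE a = b;

-- Restore the default afterwards so that other queries are not affected.
RESET enable_hashjoin;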

Partitioning data

Given a default block size of 8 KB, PostgreSQL can store up to 32 TB of data inside a single table. If you compile PostgreSQL with 32 KB blocks, you can even put up to 128 TB into a single table. However, tables of this size aren’t necessarily convenient to handle anymore, and it can make sense to partition them to make processing easier and, in some cases, a bit faster. Starting with version 10.0, PostgreSQL has offered improved, declarative partitioning, which makes handling partitioned data significantly easier for end users.

In this chapter, both the old means of partitioning and the new features available as of PostgreSQL 13.0 will be covered. Partitioning is still being actively improved in all areas, so you can expect more and better partitioning features in future versions of PostgreSQL. The first thing you will learn is how to work with classical PostgreSQL inheritance.

Creating inherited tables

First, we will take a closer look at the outdated...
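The excerpt breaks off here, but as a minimal sketch of what classical inheritance-based partitioning looks like (the table names below are made up for illustration and are not from the book):

-- Parent table: children created with INHERITS get all of its columns.
CREATE TABLE t_log (id serial, t_date date, payload text);

-- One child per year; the CHECK constraints document which slice of the
-- data each child holds and allow the planner to skip irrelevant children
-- when constraint_exclusion is in effect.
CREATE TABLE t_log_2023 (CHECK (t_date >= '2023-01-01' AND t_date < '2024-01-01'))
    INHERITS (t_log);
CREATE TABLE t_log_2024 (CHECK (t_date >= '2024-01-01' AND t_date < '2025-01-01'))
    INHERITS (t_log);

-- Querying the parent automatically scans all of its children as well.
SELECT count(*) FROM t_log;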

Adjusting parameters for good query performance

Writing good queries is the first step to achieving good performance. Without a good query, you will most likely suffer from bad performance. Therefore, writing good and intelligent code will give you the greatest edge possible. Once your queries have been optimized from a logical and semantic point of view, good memory settings can provide you with a nice final increase in speed.

In this section, we will learn about what more memory can do for you and how PostgreSQL can use it for your benefit. Again, this section assumes that we are using single-core queries to make the plans more readable. To ensure that there is always just one core at work, use the following command:

test=# SET max_parallel_workers_per_gather TO 0;
SET

Here is a simple example demonstrating what memory parameters can do for you:

test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name)
   SELECT '...
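The listing is cut off above; as a rough sketch (assuming t_test has been populated with a few hundred thousand rows), the effect of work_mem on a sort can be observed like this:

-- With a tiny work_mem, the sort typically spills to disk
-- ("Sort Method: external merge" in the plan output).
SET work_mem TO '64kB';
EXPLAIN (ANALYZE) SELECT * FROM t_test ORDER BY name;

-- With more memory, the same sort usually runs entirely in RAM
-- ("Sort Method: quicksort"), which is noticeably faster.
SET work_mem TO '256MB';
EXPLAIN (ANALYZE) SELECT * FROM t_test ORDER BY name;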

Making use of parallel queries

From version 9.6, PostgreSQL has supported parallel queries. This support for parallelism has been improved gradually over time, and version 11 has added even more functionality to this important feature. In this section, we will take a look at how parallelism works and what can be done to speed things up.

Before digging into the details, it is necessary to create some sample data as follows:

test=# CREATE TABLE t_parallel AS
 SELECT * FROM generate_series(1, 25000000) AS id;
SELECT 25000000

After loading the initial data, we can run our first parallel query. A simple count will show what a parallel query looks like in general:

test=# explain SELECT count(*) FROM t_parallel;
                              QUERY PLAN
----------------------------------------------------------
 Finalize Aggregate...
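The number of workers PostgreSQL is willing to use can be influenced per session and per table. The following settings are standard PostgreSQL knobs, shown here as an illustrative sketch rather than as the book's own listing:

-- Allow up to four workers per Gather node in this session.
SET max_parallel_workers_per_gather TO 4;

-- Ask the planner to consider more workers for this particular table
-- (still capped by max_parallel_workers_per_gather and max_parallel_workers).
ALTER TABLE t_parallel SET (parallel_workers = 8);

-- Re-run the plan to see how many workers are actually planned.
EXPLAIN SELECT count(*) FROM t_parallel;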

Introducing JIT compilation

JIT compilation has been one of the hot topics in PostgreSQL 11. It has been a major undertaking, and the first results look promising. However, let’s start with the fundamentals: what is JIT compilation all about? When you run a query, PostgreSQL has to figure out a lot of stuff at runtime. When PostgreSQL itself is compiled, it doesn’t know which kind of query you will run next, so it has to be prepared for all kinds of scenarios.

The core is generic, meaning that it can do all kinds of stuff. However, when you are in a query, you just want to execute the current query as fast as possible – not some other random stuff. The point is, at runtime, you know a lot more about what you have to do than at compile time (that is, when PostgreSQL is compiled). That is exactly the point: when JIT compilation is enabled, PostgreSQL will check your query, and if it happens to be time-consuming enough, highly optimized code for your query will...
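The relevant knobs are standard PostgreSQL settings. As a brief sketch (not the book's own listing), JIT can be switched on and its cost thresholds inspected like this:

-- jit has to be enabled and the server built with JIT support (LLVM).
SET jit TO on;

-- JIT only kicks in for sufficiently expensive queries; these thresholds
-- control when compilation, inlining, and optimization are attempted.
SHOW jit_above_cost;
SHOW jit_inline_above_cost;
SHOW jit_optimize_above_cost;

-- EXPLAIN (ANALYZE) prints a "JIT:" section when compilation was used.
EXPLAIN (ANALYZE) SELECT sum(id) FROM t_parallel;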

Summary

In this chapter, a number of query optimizations were discussed. You learned about the optimizer and about various internal optimizations, such as constant folding, view inlining, joins, and much more. All of these optimizations contribute to good performance and help speed things up considerably.

With this introduction to optimization covered, the next chapter, Chapter 7, Writing Stored Procedures, turns to stored procedures. You will learn about all the options PostgreSQL offers for handling user-defined code.
