Reader small image

You're reading from  Mastering PostgreSQL 15 - Fifth Edition

Product typeBook
Published inJan 2023
PublisherPackt
ISBN-139781803248349
Edition5th Edition
Right arrow
Author (1)
Hans-Jürgen Schönig
Hans-Jürgen Schönig
author image
Hans-Jürgen Schönig

Hans-Jürgen Schönig has 20 years' experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called CYBERTEC PostgreSQL International GmbH. It has successfully served countless customers around the globe. Before founding CYBERTEC PostgreSQL International GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.
Read more about Hans-Jürgen Schönig

Right arrow

Making Use of Indexes

In Chapter 2, Understanding Transactions and Locking, you learned about concurrency and locking. In this chapter, it is time to attack indexing head-on. The importance of this topic cannot be stressed enough—indexing is (and will most likely remain) one of the most important topics in the work of every database engineer.

Having over 20 years of professional, full-time PostgreSQL consulting and PostgreSQL 24/7 support experience (www.cybertec-postgresql.com), I can say one thing for sure—bad indexing is the main source of bad performance. Of course, it is important to adjust memory parameters and all that. However, this is all in vain if indexes are not used properly. There is simply no replacement for a missing index. To make my point clear, there is no way to achieve good performance without proper indexing, so always make it a point to check the indexing if performance is bad. My personal recommendation is that if performance is bad, always...

Understanding simple queries and the cost model

In this section, we will get started with indexes. To understand how things work, some test data is needed. The following code snippet shows how data can easily be created:

test=# DROP TABLE IF EXISTS t_test;
 DROP TABLE
test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name) SELECT 'hans'
   FROM generate_series(1, 2000000);
INSERT 0 2000000
test=# INSERT INTO t_test (name) SELECT 'paul'
   FROM generate_series(1, 2000000);
INSERT 0 2000000

In the first line, a simple table is created. Two columns are used; the first is an auto-increment column that continually creates numbers, and the second is a column that will be filled with static values.

The generate_series function will generate numbers from 1 to 2 million, so in this example, 2 million static values for hans and 2 million static values for paul are created.

All in all, 4 million...

Improving speed using clustered tables

In this section, you will learn about the power of correlation and clustered tables. What is this about? Imagine that you want to read a whole area of data. This might be a certain time range, a block, some IDs, and so on.

The runtime of these queries will vary depending on the amount of data and the physical arrangement of data on the disk, so even if you are running queries that return the same number of rows, two systems might not provide the answer within the same time span, as the physical disk layout might make a difference.

Here is an example:

test=# EXPLAIN (analyze true, buffers true, timing true)
    SELECT *
    FROM  t_test
    WHERE id < 10000;
                           QUERY PLAN
-----------------------------------...

Understanding additional B-tree features

In PostgreSQL, indexing is a large field and covers many aspects of database work. As I have outlined in this book already, indexing is the key to performance. There is no good performance without proper indexing. Therefore, it is worth inspecting the indexing-related features that we will cover in the following subsections in detail.

Combined indexes

In my job as a professional PostgreSQL support vendor, I am often asked about the difference between combined and individual indexes. In this section, I will try to shed some light on this question.

The general rule is that if a single index can answer your question, it is usually the best choice. However, you cannot index all of the possible combinations of fields people are filtering with. What you can do instead is use the properties of combined indexes to achieve as much as possible.

Let’s suppose we have a table containing three columns: postal_code, last_name, and first_name...

Introducing operator classes

So far, the goal has been to figure out what to index and whether to indiscriminately apply an index on a column or a group of columns. There is one assumption, however, that we have silently accepted to make this work. Up until now, we have worked under the assumption that the order in which the data has to be sorted is a somewhat fixed constant. In reality, this assumption might not hold. Sure, numbers will always be in the same order, but other kinds of data will most likely not have a predefined, fixed sort order.

To prove my point, I have compiled a real-world example. Take a look at the following two records:

1118 09 08 78
2345 01 05 77

My question now is whether these two rows are ordered properly. They might be because one comes before another. However, this is wrong because these two rows have some hidden semantics. What you are looking at here are two Austrian social security numbers. 09 08 78 actually refers to August 9 1978, and 01...

Understanding PostgreSQL index types

So far, only binary trees have been discussed. However, in many cases, B-trees are just not enough. Why is that? As we’ve already discussed in this chapter, B-trees are basically based on sorting. The <, <=, =, >=, and > operators can be handled using B-trees. The trouble is that not every data type can be sorted in a useful way. Just imagine a polygon. How would you sort these objects in a useful way? Sure, you can sort by the area covered, its length, and so on, but doing this won’t allow you to find them using a geometric search.

The solution to this problem is to provide more than just one index type. Each index will serve a special purpose and do exactly what is needed. The following six index types are available (as of PostgreSQL 10.0):

test=# SELECT * FROM pg_am;
  oid | amname |       amhandler      | amtype
------+--------+-------...

Achieving better answers with fuzzy searching

Performing precise searching is not the only thing expected by users these days. Modern websites have taught users to always expect a result, regardless of their input. If you search on Google, there will always be an answer, even if the user input is wrong, full of typos, or simply pointless. People expect good results, regardless of the input data.

Taking advantage of pg_trgm

To do fuzzy searching with PostgreSQL, you can add the pg_trgm extension. To activate this extension, just run the following command:

test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION

The pg_trgm extension is pretty powerful, and to show you what it is capable of, I have compiled some sample data consisting of 2,354 names of villages and cities here in Austria.

Our sample data can be stored in a simple table:

test=# CREATE TABLE t_location (name text);
CREATE TABLE

My company website has all the data available, and PostgreSQL allows you to load...

Understanding full-text searches

If you are looking up names or looking for simple strings, you are usually querying the entire content of a field. With a full-text search, this is different. The purpose of the full-text search is to look for words or groups of words that can be found in a text. Therefore, a full-text search is more of a contains operation, as you are basically never looking for an exact string.

In PostgreSQL, a full-text search can be done using GIN indexes. The idea is to dissect a text, extract valuable lexeme (preprocessed tokens of words) strings, and index those elements rather than the underlying text. To make your search even more successful, those words are preprocessed.

Here is an example:

test=# SELECT to_tsvector('english','A car,
    I want a car. I would not even mind having many cars');
                    ...

Summary

This chapter was all about indexes. We learned about when PostgreSQL will decide on an index and which types of indexes exist. On top of just using indexes, it is also possible to implement your own strategies to speed up your applications with custom operators and indexing strategies.

For those of you who really want to push its capabilities to the limit, PostgreSQL offers custom access methods.

In Chapter 4, Handling Advanced SQL, we will talk about advanced SQL. Many people are not aware of what SQL is really capable of, and therefore, I am going to show you some efficient, more advanced SQL techniques.

Questions

  1. Do indexes always improve performance?
  2. Does an index use a lot of space?
  3. How can I find missing indexes?
  4. Can indexes be built in parallel?
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Mastering PostgreSQL 15 - Fifth Edition
Published in: Jan 2023Publisher: PacktISBN-13: 9781803248349
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Hans-Jürgen Schönig

Hans-Jürgen Schönig has 20 years' experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called CYBERTEC PostgreSQL International GmbH. It has successfully served countless customers around the globe. Before founding CYBERTEC PostgreSQL International GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.
Read more about Hans-Jürgen Schönig