Mastering PostgreSQL 11 - Second Edition

4.8 (11 reviews total)
By Hans-Jürgen Schönig
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. PostgreSQL Overview

About this book

This second edition of Mastering PostgreSQL 11 helps you build dynamic database solutions for enterprise applications using the latest release of PostgreSQL, which enables database analysts to design both the physical and technical aspects of the system architecture with ease.

This book begins with an introduction to the newly released features in PostgreSQL 11 to help you build efficient and fault-tolerant PostgreSQL applications. You’ll examine all of the advanced aspects of PostgreSQL in detail, including logical replication, database clusters, performance tuning, monitoring, and user management. You will also work with the PostgreSQL optimizer, configuring PostgreSQL for high speed, and see how to move from Oracle to PostgreSQL. As you progress through the chapters, you will cover transactions, locking, indexes, and optimizing queries to improve performance. Additionally, you’ll learn to manage network security and explore backups and replications, while understanding the useful extensions of PostgreSQL so that you can optimize the speed and performance of large databases.

By the end of this book, you will be able to use your database to its utmost capacity by implementing advanced administrative tasks with ease.

Publication date:
October 2018
Publisher
Packt
Pages
446
ISBN
9781789537819

 

Chapter 1. PostgreSQL Overview

It has been a while since I embarked on writing another book on PostgreSQL. I have come a long way and I am proud to have made it to the third release of Mastering PostgreSQL, which is now covering all the cool features present PostgreSQL 11.

PostgreSQL is one of the world's most advanced open source database systems, and it has many features that are widely used by developers and system administrators alike. To begin with PostgreSQL 11, many new features have been added to PostgreSQL which contribute greatly to the success of this exceptional open source product.

This book will cover and discuss many of these cool features in great detail. 

In this chapter, you will be introduced to PostgreSQL and the cool new features available in PostgreSQL 11 and beyond. All of the relevant new functionalities will be covered in detail. Given the sheer number of changes made to the code and given the size of the PostgreSQL project, this list of features is, of course, far from complete, so I have tried to focus on the most important aspects, relevant to most people.

The features outlined in this chapter will be split into the following categories:

  • What is new in PostgreSQL 11?
  • SQL and developer related
  • Backup, recovery, and replication
  • Performance-related topics

 

 

What is new in PostgreSQL 11.0?


PostgreSQL 11 was released in the fall of 2018 and provides users with a couple of modern features. These are useful to professionals and beginners alike. PostgreSQL 11 is the second major release following the new numbering scheme introduced by the PostgreSQL community. The next major release of PostgreSQL after version 11 will be 12. The service releases will be called PostgreSQL 11.1, 11.2, 11.3, and so on. Compared to the pre-10 world, this is a major change, which should be pointed out.

Which version should you use? The recommendation is to always use the most recent release. There is no point in getting started with, say, PostgreSQL 9.6 or so anymore. If you are new to PostgreSQL, begin with version 11. There is no such thing known as bugs in PostgreSQL—the community will always provide you with working code so there is no need to be afraid of PostgreSQL 10 or PostgreSQL 11. It just works.

Understanding the new database administration functions

PostgreSQL 11 has many new features that can help the administrator reduce work and run the system more reliably and in a more robust way.

One of the features that is supposed to help people to run even more efficient databases is the ability to configure the size of database instances, commonly known as WAL-segments.

Using configurable WAL-segment sizes

Since PostgreSQL was introduced 20 years ago, the size of a single WAL file has always been 16 MB. In the beginning, it was even a compiled in limit, which was later changed to a compile-time option. Starting with PostgreSQL 11 the size of those WAL segments can be changed at instance creation, which gives administrators an additional knob to configure and optimize PostgreSQL. Here is how it works. The following example shows how to configure the WAL-segment size when running initdb: 

initdb -D /pgdata --wal-segsize=32

The initdb command is the tool that is called to create a database instance. It is usually the call you see, if hidden by some operating system scripts provided by your favorite Linux distribution, Windows, or whatever you like to use. However, initdb now has an option to pass the desired WAL-segment size directly to the program.

 

As I have already mentioned, the default size is 16 MB; hence, in most cases, it makes sense to use larger segments to improve performance. There is no point in using smaller ones unless you are running a really, really small database instance on an embedded system.

What is the real performance impact going to be? As always, this really depends on what you are doing. If you are a running a database system facing 99% reads, the impact of larger WAL-segments will be zero. Yes, you heard it right—ZERO. If you are facing writes while your system is 95% idle and not under severe load, the impact will still be zero or close to zero. You will only be able to witness gains if you are running a heavy, write-intense workload. Then, and only then, might a change be worth it. If you are only running a couple of online forms visited by an occasional customer—why bother? The new feature will only show its strength when there are many changes leading to a lot of WAL.

Larger queryid in pg_stat_statements

If you really want to dig into PostgreSQL performance, pg_stat_statements is the tool to look at. Personally, I consider it to be the gold standard, that is, if you really want to figure out what is going on in the system. The pg_stat_statements command is loaded via shared_preload_libraries as soon as PostgreSQL starts and aggregates statistics about queries running in your server. It will instantly show you if something goes wrong. 

The pg_stat_statements command provides a field called queryid, which used to be a 32-bit identifier up to now. In some cases, this has led to problems because it is possible that keys collide in certain cases. Magnus Hagander calculated in one of his papers that after running 3 billion different queries, around 50,000 collisions could be expected. By introducing a 64-bitqueryid, this number is expected to drop to around 0.25 conflicts after 3 billion different types of queries, which is a substantial improvement.

Keep in mind that you might have to update your scripts if you are moving to PostgreSQL 11 and also if you are using pg_stat_statements to track down performance problems.

Improved indexing and better optimization

PostgreSQL 11 offers more than just a couple of improved functions to handle administration. There is also improved functionality around indexes. One of the most important features is related to indexes and statistics.

Expression index statistics

If you are running a simple query, PostgreSQL will optimize it by looking at internal statistics. Here is an example:

SELECT * FROM person WHERE gender = 'female';

In this case, PostgreSQL will consult the internal stats and estimate the number of girls in the table. If the number is low, PostgreSQL will consider an index. If the majority are female, PostgreSQL will consider a sequential scan instead of an index. Statistics are available per column. In addition, it is also possible to keep track of cross-column statistics, which have been introduced in PostgreSQL 10 (check out the CREATE STATISTICS command to find out more). The good news is that PostgreSQL will also keep track of statistics for functional indexes:

CREATE INDEX idx_cos ON t_data (cos(data));

What has not been possible so far is to use more sophisticated statistics on functional indexes.

Consider the following example of an index covering various columns:

CREATE INDEX coord_idx ON measured (x, y, (z + t)); 
ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000;

In this case, there is an index on two columns, which also provides a virtual third column represented by the expression. The new feature allows us to create more statistics explicitly for the third column, which would otherwise be suboptimally covered. In my example, we will tell PostgreSQL explicitly that we want the third column to have 1,000 entries in the system statistics. It will allow the optimizer to come up with better estimates and therefore potentially create better plans. It will be a highly valuable contribution to the efficiency of some specialized applications.

INCLUDE indexes or covering indexes

Many other database systems have long provided a feature known as covering indexes. What does this mean? Consider the following example, which simply selects two columns from a table:

SELECT id, name FROM person WHERE id = 10;

Suppose we have an index on id only. In this case, PostgreSQL will look up the index and do a lookup in the table to fetch those additional fields. This is generally known as an index scan. It consists of checking the index and the underlying table to compose a row. The solution here used to be to create an index consisting of two columns. The idea is to allow PostgreSQL to perform an index-only scan instead of an index scan. If an index has all the columns needed, there is no need to do additional lookups in the table (for most cases).

Note

Only select the columns you really need—otherwise you might trigger pointless table lookups. The following type of query is therefore generally assumed to be quite bad for performance: SELECT * FROM person WHERE id  = 10;

The problem here would be if you need a primary key constraint on the id and still want to end up triggering an index-only scan when reading an additional column. This is where the new feature steps in to save the day:

CREATE UNIQUE INDEX some_name ON person USING btree (id) INCLUDE (name);

PostgreSQL will ensure that the id is unique but will still store additional fields in the index to trigger an index-only scan if asked for both columns. In a high-volume OLTP environment, this will increase performance dramatically. Of course, it is always hard to provide you with hard numbers because every table and every type of query is different. However, the gain can be absolutely substantial. PostgreSQL 11 will give us even more options to trigger even more index-only scans.

Parallel index creation

When an index is built in PostgreSQL, the database traditionally used one core to do the job. In many cases, this was not an issue. However, PostgreSQL is used for ever-growing systems and therefore index creation starts to be an issue in many cases. At the moment the community is trying to improve sorting as well. The first step is therefore to allow for the parallel creation of btrees, which has made it into PostgreSQL 11. Future versions of PostgreSQL will also allow provide parallel sorts for normal operations, which is unfortunately not yet supported by PostgreSQL 11.

Parallel index creation can speed up indexing dramatically and we are eager to see future improvements in this area (maybe support for other index types, and so on).

 

 

Better cache management

PostgreSQL 11 will also provide you with better ways to manage the I/O cache (the shared buffers). The pg_prewarm command is especially noteworthy.

Improving pg_prewarm

The pg_prewarm command allows you to restore the content of the PostgreSQL I/O cache after a restart. It has already been around for quite some time and is widely used by the PostgreSQL user base. In PostgreSQL 11, pg_prewarm has been extended and allows for automatic dumping of the buffer list in regular intervals.

It is also possible to automatically preload the old cache contents so that users will have better database performance after a restart. In particular, systems with a lot of RAM can benefit from these new improvements.

Enhancing windowing functions

Windowing functions and analytics are a cornerstone of any modern SQL implementation and are therefore widely used by professionals. PostgreSQL has provided support for windowing functions for quite some time now. However, there were still some small features proposed by the SQL standard missing. PostgreSQL 11 now fully supports what SQL: 2011 proposes.

The following features have been added:

  • Range between:
    • Previously just ROWS
    • Now handles values
  • Exclusion clauses:
    • Exclude current row
    • Exclude ties

To demonstrate how the new features work, I have decided to include an example. The code contains two windowing functions. They are explained as follows:

  • The first one uses what is already available in PostgreSQL 10 and previously.
  • The second array_agg excludes the current row, which is a new feature provided by PostgreSQL 11.

The following code generates five rows and contains two windowing functions:

test=# SELECT *,
         array_agg(x) OVER (ORDER BY x ROWS BETWEEN 
                        1 PRECEDING AND 1 FOLLOWING),
         array_agg(x) OVER (ORDER BY x ROWS BETWEEN 
                        1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW)
     FROM generate_series(1, 5) AS x;
  x | array_agg | array_agg
 ---+-----------+-----------
 1  | {1,2}     | {2}
 2  | {1,2,3}   | {1,3}
 3  | {2,3,4}   | {2,4}
 4  | {3,4,5}   | {3,5}
 5  | {4,5}     | {4}
 (5 rows)

Excluding the current row is a pretty common requirement and should therefore not be underestimated. 

Introducing just-in-time compilation

Just-in-time (JIT) compilation is really one of the highlights of PostgreSQL 11. A lot of infrastructure has been added to support even more JIT compilation in the future, and PostgreSQL 11 is the first release that makes full use of this modern technique. Before we dig into details, what is JIT compilation all about? When running a query, a lot of stuff is actually only known at runtime and not at compile time (when PostgreSQL is compiled). Therefore, a traditional compiler is always at a disadvantage because it does not know what will happen at runtime. A JIT compiler already knows a lot more and can react accordingly. 

Starting with PostgreSQL 11, you can make use of JIT compilation, which is especially useful for big queries. We will dig into the finer details in the later chapters of this book.

Enhanced partitioning

PostgreSQL 10 introduced the first version of partitioning in PostgreSQL. Of course, we used to have inheritance and all that before. However, PostgreSQL 10 was really the first version that provided a modern version of doing things. PostgreSQL 11 will add some new functionality to this already powerful feature by introducing a couple of new highlights, such as the ability to create a default partition if none of the existing partitions match.

Here is how it works:

postgres=# CREATE TABLE default_part PARTITION OF some_table DEFAULT; 
CREATE TABLE

In this case, all the rows that simply don't match anywhere will end up in the default partition.

But there is more. In PostgreSQL, a row could not (easily) be moved from one partition to the other. Suppose you had one partition per country. If a person moved from, say, France to Estonia, you would not do that with a single UPDATE statement. You had to delete the old row and insert a new one. In PostgreSQL 11, this problem has been solved. Rows can now be moved from one partition to some other place in a totally transparent way.

PostgreSQL had many more shortcomings. In the old version, all partitions had to be indexed separately. There was no way to create a single index for all partitions. In PostgreSQL 11, an index added to the parent table will automatically make sure that all child tables are indexed too. This is really beneficial as it becomes less likely that indexes will simply be forgotten. Also, in PostgreSQL 11, you can actually add a global unique index. A partitioned table can therefore enforce a unique constraint.

Up to PostgreSQL 10, we had range partitioning and list partitioning. PostgreSQL 11 adds the ability to do hash partitioning. Here is an example showing how hash partitioning works:

test=# CREATE TABLE tab(i int, t text) PARTITION BY HASH (i); 
CREATE TABLE 
test=# CREATE table tab_1 PARTITION OF tab 
                  FOR VALUES WITH (MODULUS 4, REMAINDER 0); 
CREATE TABLE

But there is not just more functionality. There is also a lot of new stuff to improve performance. Partition pruning is now a lot faster and PostgreSQL has the ability to consider partition-wise joins, as well as partition-wise aggregates, which is exactly what is needed for analytics and data warehousing. 

Adding support for stored procedures

PostgreSQL has always supported functions, which were often referred to as stored procedures. However, there is a distinction between a stored procedure and a function. As pointed out previously, up to PostgreSQL 10, we only had functions and no procedures.

The point is that a function is part of a larger structure, a transaction. A procedure can contain more than just one transaction. Therefore, it cannot be called by a larger transaction but is rather a standalone thing.

Here is the syntax of CREATE PROCEDURE:

test=# \h CREATE PROCEDURE
Command: CREATE PROCEDURE
Description: define a new procedure
Syntax:
CREATE [ OR REPLACE ] PROCEDURE
    name ( [ [ argmode ] [ argname ] 
    argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
  { LANGUAGE lang_name
    | TRANSFORM { FOR TYPE type_name } [, ... ]
    | [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
    | SET configuration_parameter { TO value | = value | FROM CURRENT }
    | AS 'definition'
    | AS 'obj_file', 'link_symbol'
} …

The following procedure shows how two transactions can be executed within the very same procedure:

test=# CREATE PROCEDURE test_proc()
       LANGUAGE plpgsql
AS $$
  BEGIN
    CREATE TABLE a (aid int);
    CREATE TABLE b (bid int);
    COMMIT;
   CREATE TABLE c (cid int);
    ROLLBACK;
  END;
$$;
CREATE PROCEDURE

Note that the first two statements have been committed while the second transaction has been aborted. You will see later in this example what the effect of this change is.

To run the procedure, we can use CALL:

test=# CALL test_proc();
CALL

The first two tables were committed—the third table has not been created because of the rollback inside the procedure:

test=# \d
List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | a    | table | hs
 public | b    | table | hs
(2 rows)

Procedures are an important step toward a complete and fully featured database system.

Improving ALTER TABLE

The ALTER TABLE command can be used to change the definition of a table. In PostgreSQL 11, the behavior of ALTER TABLE ... ADD COLUMN has been improved substantially. Let us take a look at the details. The following examples shows how columns can be added to a table and how PostgreSQL will handle those new columns:

ALTER TABLE x ADD COLUMN y int;
ALTER TABLE x ADD COLUMN z int DEFAULT 57;

The first command in the listing has always been fast, the reason being that in PostgreSQL, the default value of a column is NULL. So what PostgreSQL does is it adds a column to the system catalog without actually touching storage. The column will be added to the end of the table so if the row is too short on disk, we know that it will be NULL anyway. In other words, even if you add a column to a 10 TB table, the operation will be really fast because the system does not have to change rows on disk.

The situation used to be quite different in the second case. DEFAULT 57 actually does add real data to the row and in PostgreSQL 10 and older, this meant that the database had to rewrite the entire table to add this new default value. If you have a small table, it is not a big deal. However, if your table contains billions of rows, you cannot just lock it up and rewrite it—in a professional online transaction processing (OLTP) system, downtime is out of the question.

Starting with PostgreSQL 11, it is possible to add immutable default values to a table without rewriting the entire table, which greatly reduces the burden of a changing data structure.

 

 

 

Summary


In PostgreSQL 11, a lot of functionalities have been added that allow people to run even more professional applications even faster and more efficiently. Many areas of the database server have been improved and many new professional features have been added. In the future, even more improvements will be made. Of course, the changes listed in this chapter are by far not complete because many small changes were made. In the next chapter, you will learn about indexing and the PostgreSQL cost model, which is highly important if you want to maintain good performance.

 

Questions


What is the most important feature of PostgreSQL 11?

Actually, this is hard to tell. It really depends on how you are using the database and which features are most important to your application. Everybody has a personal favorite, however. In my case, it is the parallel index creation, which will be really important to customers running giant databases. Still, it is up to you to decide what you like most and what you don't like.

Does PostgreSQL 11 work on my platform?

PostgreSQL 11 works on all common platforms, including but not limited to Linux, Windows, Solaris, AIX, and macOS X. The community tries to cover as many platforms as possible so as not to exclude people from using PostgreSQL. For most common systems, PostgreSQL will even be prepackaged.

Did the license model change?

No, nothing has changed and most likely nothing ever will.

When can we expect PostgreSQL 12?

Usually, a major release can be expected once a year. So the next major release of PostgreSQL 12 will be in the fall of 2019.

About the Author

  • Hans-Jürgen Schönig

    Hans-Jürgen Schönig has 18 years' experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called Cybertec Schönig and Schönig GmbH. It has successfully served countless customers around the globe. Before founding Cybertec Schönig and Schönig GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.

    Browse publications by this author

Latest Reviews

(11 reviews total)
The buy is very easy.
A lot of the topics I am interested in are well represented, compared to the competition.
Os livro é muito bom e a compra foi rápida, sem nenhum problema.

Recommended For You

Book Title
Unlock this full book FREE 10 day trial
Start Free Trial