
You're reading from  Mastering PostgreSQL 15 - Fifth Edition

Product type: Book
Published in: Jan 2023
Publisher: Packt
ISBN-13: 9781803248349
Edition: 5th Edition
Author (1)
Hans-Jürgen Schönig

Hans-Jürgen Schönig has 20 years' experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called CYBERTEC PostgreSQL International GmbH. It has successfully served countless customers around the globe. Before founding CYBERTEC PostgreSQL International GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.


Making Sense of Backups and Replication

In Chapter 9, Handling Backup and Recovery, we learned a lot about backup and recovery, which is essential for administration. So far, only logical backups have been covered – that will be remedied in this chapter.

This chapter is all about PostgreSQL’s transaction log and what we can do with it to improve our setup and make things more secure.

In this chapter, we will cover the following topics:

  • Understanding the transaction log
  • Transaction log archiving and recovery
  • Setting up asynchronous replication
  • Upgrading to synchronous replication
  • Making use of replication slots
  • Making use of the CREATE PUBLICATION and CREATE SUBSCRIPTION commands
  • Setting up an HA cluster using Patroni

By the end of this chapter, you will be able to set up transaction log archiving and replication. In addition, you will understand the basic concepts of high availability (HA) and be able to apply them in...

Understanding the transaction log

Every modern database system provides functionality to make sure that a system can survive a crash if something goes wrong or somebody pulls the plug. This is true for filesystems and database systems alike.

PostgreSQL provides the means to ensure that a crash cannot harm the integrity or the content of your data. It guarantees that if the power fails, the system will always be able to come back up and resume its work.

This kind of safety is provided by the Write-Ahead Log (WAL), historically also known as the xlog. The idea is not to write to a data file directly but to write every change to the log first. Why is this important? Imagine that we are writing some data, as follows:

INSERT INTO data ... VALUES ('12345678');

Let’s assume that this data was written directly to the data file. If the operation failed midway, the data file would be corrupted: it might contain half-written rows, columns without index pointers, missing...
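To see the WAL at work, we can ask a running server for its current write position. The following functions are part of the standard PostgreSQL catalog (shown here as an illustrative psql session, without output):

```
test=# SELECT pg_current_wal_lsn();
test=# SELECT pg_walfile_name(pg_current_wal_lsn());
```

The first call returns the current write location inside the WAL as a log sequence number (LSN); the second maps that LSN to the name of the WAL segment file that holds it.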

Transaction log archiving and recovery

After our brief introduction to the transaction log in general, it is time to focus on the process of transaction log archiving. As we have already seen, the transaction log contains a sequence of binary changes that are made to the storage system. So, why not use it to replicate database instances and do a lot of other cool stuff, such as archiving?

Configuring for archiving

The first thing we want to do in this chapter is to create a configuration to perform standard Point-in-Time Recovery (PITR). There are a couple of advantages of using PITR over ordinary dumps:

  • We will lose less data because we can restore to an arbitrary point in time, not just to the fixed point at which the backup was taken.
  • Restoring will be faster because indexes don’t have to be created from scratch. They are just copied over and are ready to use.

Configuration for PITR is easy. Just a handful of changes have to be made in the postgresql.conf file, as shown...
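As a minimal sketch, the relevant settings look as follows; the archive directory (/archive) is an illustrative placeholder and must exist and be writable by the postgres user:

```
wal_level = replica                    # keep enough WAL information for recovery
archive_mode = on                      # requires a server restart
archive_command = 'cp %p /archive/%f'  # %p = path to the WAL file, %f = its file name
```

PostgreSQL runs archive_command once for every completed 16 MB WAL segment; the command must return 0 only if the file has been archived successfully.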

Setting up asynchronous replication

The idea behind streaming replication is simple. After an initial base backup, the secondary server can connect to the master, fetch the transaction log in real time, and apply it. Transaction log replay is no longer a single operation but rather a continuous process that is supposed to keep running for as long as the cluster exists.

Performing a basic setup

In this section, we will learn how to set up asynchronous replication quickly and easily. The goal is to set up a system that consists of two nodes.

Basically, most of the work has already been done for WAL archiving. However, to make things easy to understand, we will look at the entire process of setting up streaming, because we cannot assume that WAL shipping is already set up as needed.

The first thing to do is to go to the postgresql.conf file and adjust the following parameters:

wal_level = replica
max_wal_senders = 10     # or whatever value >...
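Once the master accepts replication connections (which also requires a matching entry in pg_hba.conf), the standby can be created with pg_basebackup. The following is a sketch; the host name, user, and target directory are placeholders:

```
$ pg_basebackup -h master.example.com -U repl \
      -D /var/lib/postgresql/15/standby \
      -R --checkpoint=fast
```

The -R flag writes a suitable primary_conninfo setting into postgresql.auto.conf and creates the standby.signal file, so the new node comes up as a streaming replica as soon as it is started.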

Upgrading to synchronous replication

So far, asynchronous replication has been covered in reasonable detail. However, asynchronous replication means that a commit on a slave is allowed to happen after the commit on the master. If the master crashes, data that has not yet made it to the slave might be lost, even though replication is running.

Synchronous replication solves this problem – if PostgreSQL replicates synchronously, a commit has to be flushed to disk by at least one replica before it can go through on the master. Synchronous replication therefore substantially reduces the odds of data loss.

In PostgreSQL, configuring synchronous replication is easy. Only two things have to be done (in any order):

  • Adjust the synchronous_standby_names setting in the postgresql.conf file on the master
  • Add an application_name setting to the primary_conninfo parameter in the config file in the replica

Let’s get started with the postgresql.conf file on...
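As a minimal sketch of the two settings, assuming a replica registered under the illustrative name node_a:

```
# postgresql.conf on the master:
synchronous_standby_names = 'FIRST 1 (node_a)'

# postgresql.conf (or postgresql.auto.conf) on the replica;
# the host and user are placeholders:
primary_conninfo = 'host=master.example.com user=repl application_name=node_a'
```

After reloading both servers, pg_stat_replication on the master should show the standby with sync_state set to sync.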

Making use of replication slots

After that introduction to synchronous replication and dynamically adjustable durability, I want to focus on a feature called replication slots.

What is the purpose of a replication slot? Let’s consider the following example – there is a master and a slave. On the master, a large transaction is executed, and the network connection is not fast enough to ship all the data in time. At some point, the master removes its transaction log (checkpoint). If the slave is too far behind, a resync is needed. As we have already seen, the wal_keep_size setting (which replaced wal_keep_segments in PostgreSQL 13) can be used to reduce the risk of failing replication. The question is this – what is the best value for wal_keep_size? Sure, more is better, but how much is enough?

Replication slots will solve this problem for us – if we are using a replication slot, a master can only recycle the transaction log once it has been consumed by all replicas. The advantage here is...
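As a sketch, a physical replication slot can be created on the master and then referenced from the replica; the slot name standby_1 is illustrative:

```
test=# SELECT pg_create_physical_replication_slot('standby_1');

-- on the replica, in postgresql.conf:
-- primary_slot_name = 'standby_1'

test=# SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

The pg_replication_slots view shows whether a consumer is attached (active) and how much WAL the slot is still holding back (restart_lsn) – worth monitoring, because a slot with no consumer will retain WAL indefinitely.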

Making use of the CREATE PUBLICATION and CREATE SUBSCRIPTION commands

Starting with version 10.0, the PostgreSQL community introduced two new commands – CREATE PUBLICATION and CREATE SUBSCRIPTION. These can be used for logical replication, which means that you can selectively replicate data and achieve close-to-zero-downtime upgrades. So far, only binary transaction log replication has been covered. However, sometimes we might not want to replicate an entire database instance – replicating a table or two might be enough. This is exactly when logical replication is the right tool to use.

Before getting started, the first thing to do is change wal_level to logical in postgresql.conf as follows, and then restart:

wal_level = logical

Then, we can create a simple table:

test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE

The same table layout has to exist in the second database as well to make this work. PostgreSQL will not automatically...
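Assuming t_test has also been created on the target side, a minimal publication/subscription pair might look as follows; the connection string is a placeholder:

```
-- on the source database:
test=# CREATE PUBLICATION my_pub FOR TABLE t_test;

-- on the target database:
test=# CREATE SUBSCRIPTION my_sub
           CONNECTION 'host=source.example.com dbname=test user=repl'
           PUBLICATION my_pub;
```

Creating the subscription performs an initial copy of the published tables and then streams subsequent changes to them continuously.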

Setting up an HA cluster using Patroni

After this introduction to PostgreSQL replication and some basic techniques, it is time to put it all together and solve an important problem – PostgreSQL HA. A single server or a single piece of hardware is always prone to error. If a server fails, we want to avoid the entire infrastructure going down. What we want is some kind of backup server to take over in case of failure.

In the PostgreSQL world, Patroni has become the de facto standard for achieving HA. Therefore, this section is all about Patroni and how you can use it to make PostgreSQL database setups more reliable.

Understanding how Patroni operates

Before we can get started, it is important to explain the underlying logic of Patroni in general. Traditionally, an HA setup consisted of two servers connected via two network interfaces. One network connection would move the data inside the cluster, while the second served as a witness to distinguish...
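To give a feel for what such a setup looks like, here is a heavily abbreviated sketch of a patroni.yml file. All host names, addresses, and credentials are placeholders, and a real configuration needs additional settings:

```
scope: demo-cluster            # name of the whole HA cluster
name: node1                    # name of this cluster member
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.0.10:8008
etcd3:                         # the consensus store Patroni uses for leader election
  hosts: 192.168.0.20:2379
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.0.10:5432
  data_dir: /var/lib/postgresql/15/main
  authentication:
    replication:
      username: replicator
      password: secret
    superuser:
      username: postgres
      password: secret
```

Each node runs its own Patroni agent with a file like this; the agents coordinate through the consensus store to decide which node is the leader and which nodes follow it as replicas.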

Summary

In this chapter, we learned about the most important features of PostgreSQL replication, such as streaming replication and replication conflicts. We then learned about PITR, as well as replication slots. A book on replication is never complete unless it spans some 400 pages, but we have covered the key points that every administrator should know about setting up and running replication.

The next chapter, Chapter 11, Deciding on Useful Extensions, is about useful extensions to PostgreSQL. We will learn about extensions that provide even more functionality and have been widely adopted by the industry.

Questions

  • What is the purpose of logical replication?
  • What is the performance impact of synchronous replication?
  • Why wouldn’t you always use synchronous replication?

The answers to these questions can be found in the GitHub repository (https://github.com/PacktPublishing/Mastering-PostgreSQL-15-).

