
You're reading from  Mastering PostgreSQL 15 - Fifth Edition

Product type: Book
Published in: Jan 2023
Publisher: Packt
ISBN-13: 9781803248349
Edition: 5th Edition
Author (1)
Hans-Jürgen Schönig

Hans-Jürgen Schönig has 20 years' experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called CYBERTEC PostgreSQL International GmbH. It has successfully served countless customers around the globe. Before founding CYBERTEC PostgreSQL International GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.


Making Sense of Backups and Replication

In Chapter 9, Handling Backup and Recovery, we learned a lot about backup and recovery, which is essential for administration. So far, only logical backups have been covered – that will be remedied in this chapter.

This chapter is all about PostgreSQL’s transaction log and what we can do with it to improve our setup and make things more secure.

In this chapter, we will cover the following topics:

  • Understanding the transaction log
  • Transaction log archiving and recovery
  • Setting up asynchronous replication
  • Upgrading to synchronous replication
  • Making use of replication slots
  • Making use of the CREATE PUBLICATION and CREATE SUBSCRIPTION commands
  • Setting up an HA cluster using Patroni

By the end of this chapter, you will be able to set up transaction log archiving and replication. In addition, you will understand the basic concepts of high availability (HA) and be able to apply them in...

Understanding the transaction log

Every modern database system provides functionality to make sure that a system can survive a crash if something goes wrong or somebody pulls the plug. This is true for filesystems and database systems alike.

PostgreSQL provides the means to ensure that a crash cannot harm the integrity or the content of your data. It guarantees that if the power fails, the system will always be able to come back up and resume its work.

This kind of safety is provided by the Write-Ahead Log (WAL), historically also known as the xlog. The idea is not to write to a data file directly but to write every change to the log first. Why is this important? Imagine that we are writing some data, as follows:

INSERT INTO data ... VALUES ('12345678');

Let’s assume that this data was written directly to the data file. If the operation failed midway, the data file would be corrupted: it might contain half-written rows, columns without index pointers, missing...
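To see the WAL at work, we can ask a running server for its current write position. The following functions are part of the standard PostgreSQL catalog (shown here as an illustrative psql session, without output):

```
test=# SELECT pg_current_wal_lsn();
test=# SELECT pg_walfile_name(pg_current_wal_lsn());
```

The first call returns the current write location inside the WAL as a log sequence number (LSN); the second maps that LSN to the name of the WAL segment file that holds it.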

Transaction log archiving and recovery

After our brief introduction to the transaction log in general, it is time to focus on the process of transaction log archiving. As we have already seen, the transaction log contains a sequence of binary changes that are made to the storage system. So, why not use it to replicate database instances and do a lot of other cool stuff, such as archiving?

Configuring for archiving

The first thing we want to do in this chapter is to create a configuration to perform standard Point-in-Time Recovery (PITR). There are a couple of advantages of using PITR over ordinary dumps:

  • We will lose less data because we can restore to an arbitrary point in time, not just to the fixed point at which the backup was taken.
  • Restoring will be faster because indexes don’t have to be created from scratch. They are just copied over and are ready to use.

Configuration for PITR is easy. Just a handful of changes have to be made in the postgresql.conf file, as shown...
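As a minimal sketch, the relevant settings look as follows; the archive directory (/archive) is an illustrative placeholder and must exist and be writable by the postgres user:

```
wal_level = replica                    # keep enough WAL information for recovery
archive_mode = on                      # requires a server restart
archive_command = 'cp %p /archive/%f'  # %p = path to the WAL file, %f = its file name
```

PostgreSQL runs archive_command once for every completed 16 MB WAL segment; the command must return 0 only if the file has been archived successfully.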

Setting up asynchronous replication

The idea behind streaming replication is simple. After an initial base backup, the secondary server can connect to the master, fetch the transaction log in real time, and apply it. Transaction log replay is no longer a single operation but rather a continuous process that is supposed to keep running for as long as the cluster exists.

Performing a basic setup

In this section, we will learn how to set up asynchronous replication quickly and easily. The goal is to set up a system that consists of two nodes.

Basically, most of the work has already been done for WAL archiving. However, to make things easy to understand, we will look at the entire process of setting up streaming, because we cannot assume that WAL shipping is already set up as needed.

The first thing to do is to go to the postgresql.conf file and adjust the following parameters:

wal_level = replica
max_wal_senders = 10     # or whatever value >...
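Once the master accepts replication connections (which also requires a matching entry in pg_hba.conf), the standby can be created with pg_basebackup. The following is a sketch; the host name, user, and target directory are placeholders:

```
$ pg_basebackup -h master.example.com -U repl \
      -D /var/lib/postgresql/15/standby \
      -R --checkpoint=fast
```

The -R flag writes a suitable primary_conninfo setting into postgresql.auto.conf and creates the standby.signal file, so the new node comes up as a streaming replica as soon as it is started.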

Upgrading to synchronous replication

So far, asynchronous replication has been covered in reasonable detail. However, asynchronous replication means that a commit on a slave is allowed to happen after the commit on the master. If the master crashes, data that has not yet made it to the slave might be lost, even though replication is running.

Synchronous replication solves this problem – if PostgreSQL replicates synchronously, a commit has to be flushed to disk by at least one replica before it can go through on the master. Synchronous replication therefore substantially reduces the odds of data loss.

In PostgreSQL, configuring synchronous replication is easy. Only two things have to be done (in any order):

  • Adjust the synchronous_standby_names setting in the postgresql.conf file on the master
  • Add an application_name setting to the primary_conninfo parameter in the config file in the replica

Let’s get started with the postgresql.conf file on...
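As a minimal sketch of the two settings, assuming a replica registered under the illustrative name node_a:

```
# postgresql.conf on the master:
synchronous_standby_names = 'FIRST 1 (node_a)'

# postgresql.conf (or postgresql.auto.conf) on the replica;
# the host and user are placeholders:
primary_conninfo = 'host=master.example.com user=repl application_name=node_a'
```

After reloading both servers, pg_stat_replication on the master should show the standby with sync_state set to sync.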

Making use of replication slots

After that introduction to synchronous replication and dynamically adjustable durability, I want to focus on a feature called replication slots.

What is the purpose of a replication slot? Let’s consider the following example – there is a master and a slave. On the master, a large transaction is executed, and the network connection is not fast enough to ship all the data in time. At some point, the master removes its transaction log (checkpoint). If the slave is too far behind, a resync is needed. As we have already seen, the wal_keep_size setting (which replaced wal_keep_segments in PostgreSQL 13) can be used to reduce the risk of failing replication. The question is this – what is the best value for wal_keep_size? Sure, more is better, but how much is enough?

Replication slots will solve this problem for us – if we are using a replication slot, a master can only recycle the transaction log once it has been consumed by all replicas. The advantage here is...
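As a sketch, a physical replication slot can be created on the master and then referenced from the replica; the slot name standby_1 is illustrative:

```
test=# SELECT pg_create_physical_replication_slot('standby_1');

-- on the replica, in postgresql.conf:
-- primary_slot_name = 'standby_1'

test=# SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

The pg_replication_slots view shows whether a consumer is attached (active) and how much WAL the slot is still holding back (restart_lsn) – worth monitoring, because a slot with no consumer will retain WAL indefinitely.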

Making use of the CREATE PUBLICATION and CREATE SUBSCRIPTION commands

Starting with version 10.0, the PostgreSQL community introduced two new commands – CREATE PUBLICATION and CREATE SUBSCRIPTION. These can be used for logical replication, which means that you can selectively replicate data and achieve close-to-zero-downtime upgrades. So far, only binary transaction log replication has been covered. However, sometimes we might not want to replicate an entire database instance – replicating a table or two might be enough. This is exactly when logical replication is the right tool to use.

Before getting started, the first thing to do is change wal_level to logical in postgresql.conf as follows, and then restart:

wal_level = logical

Then, we can create a simple table:

test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE

The same table layout has to exist in the second database as well to make this work. PostgreSQL will not automatically...
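Assuming t_test has also been created on the target side, a minimal publication/subscription pair might look as follows; the connection string is a placeholder:

```
-- on the source database:
test=# CREATE PUBLICATION my_pub FOR TABLE t_test;

-- on the target database:
test=# CREATE SUBSCRIPTION my_sub
           CONNECTION 'host=source.example.com dbname=test user=repl'
           PUBLICATION my_pub;
```

Creating the subscription performs an initial copy of the published tables and then streams subsequent changes to them continuously.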

Setting up an HA cluster using Patroni

After this introduction to PostgreSQL replication and some basic techniques, it is time to put it all together and solve an important problem – PostgreSQL HA. A single server or a single piece of hardware is always prone to error. If a server fails, we want to avoid the entire infrastructure going down. What we want is some kind of backup server to take over in case of failure.

In the PostgreSQL world, Patroni has become the de facto standard for achieving HA. Therefore, this section is all about Patroni and how you can use it to make PostgreSQL database setups more reliable.

Understanding how Patroni operates

Before we can get started, it is important to explain the underlying logic of Patroni in general. Traditionally, an HA setup consisted of two servers connected via two network interfaces. One network connection would move the data inside the cluster, while the second served as a witness to distinguish...
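To give a feel for what such a setup looks like, here is a heavily abbreviated sketch of a patroni.yml file. All host names, addresses, and credentials are placeholders, and a real configuration needs additional settings:

```
scope: demo-cluster            # name of the whole HA cluster
name: node1                    # name of this cluster member
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.0.10:8008
etcd3:                         # the consensus store Patroni uses for leader election
  hosts: 192.168.0.20:2379
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.0.10:5432
  data_dir: /var/lib/postgresql/15/main
  authentication:
    replication:
      username: replicator
      password: secret
    superuser:
      username: postgres
      password: secret
```

Each node runs its own Patroni agent with a file like this; the agents coordinate through the consensus store to decide which node is the leader and which nodes follow it as replicas.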

Summary

In this chapter, we learned about the most important features of PostgreSQL replication, such as streaming replication and replication conflicts. We then learned about PITR, as well as replication slots. A book on replication is never complete unless it spans some 400 pages, but we have covered the key points that every administrator should know about setting up and running replication.

The next chapter, Chapter 11, Deciding on Useful Extensions, is about useful extensions to PostgreSQL. We will learn about extensions that provide even more functionality and have been widely adopted by the industry.

Questions

  • What is the purpose of logical replication?
  • What is the performance impact of synchronous replication?
  • Why wouldn’t you always use synchronous replication?

The answers to these questions can be found in the GitHub repository (https://github.com/PacktPublishing/Mastering-PostgreSQL-15-).

