PostgreSQL 14 Administration Cookbook

Chapter 11: Backup and Recovery

Most people admit that backups are essential, yet they devote very little time to thinking about the topic.

The first recipe in this chapter is about understanding and controlling crash recovery. You need to know what happens if a database server crashes so that you can decide whether a recovery operation needs to be performed.

The next recipe is all about planning. That's really the best place to start before you perform backups.

The physical backup mechanisms here were initially written by Simon Riggs (one of the authors of this book) for PostgreSQL 8.0 in 2004 and have been supported by him ever since, now with increasing help from the community as its popularity grows. 2ndQuadrant and EDB have also been providing database recovery services since 2004, and regrettably, many people have needed them as a result of missing or damaged backups.

It is important to note that, in the last few years, the native streaming...

Understanding and controlling crash recovery

Crash recovery is the PostgreSQL subsystem that saves us if the server crashes, whether on its own or as part of a wider system failure.

It's good to understand a little about it and what we can do to control it in our favor.

How to do it…

If PostgreSQL crashes, there will be a message in the server log with the severity level set to PANIC. PostgreSQL will immediately restart and attempt to recover using the transaction log, also known as the Write-Ahead Log (WAL).

The WAL consists of a series of files written to the pg_wal subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in the WAL, hence the name write-ahead log, a synonym for transaction log. Note that the former name is probably more accurate, since the WAL also records changes that are not related to transactions. When a transaction commits, the default (and safe) behavior is to force the...
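
You can observe the WAL from within the server. The following is a minimal sketch using standard system functions (the output will, of course, differ on your system):

postgres=# SHOW wal_level;
postgres=# SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());

The first command shows how much detail is being written to the WAL, while the second reports the current write position and the name of the WAL file that contains it.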

Planning your backups

This recipe is all about thinking ahead and planning. If you're reading this recipe before you've decided to take a backup, well done!

The key thing to understand is that you should plan your recovery, not your backup. The type of backup you take influences the type of recovery that is possible, so you must give some thought to what you are trying to achieve beforehand.

If you want to plan your recovery, then you need to consider the different types of failure that can occur. What type of recovery do you wish to perform?

You need to consider the following main aspects:

  • Full or partial database?
  • Everything or just object definitions?
  • Point-in-Time Recovery (PITR)
  • Restore performance

We need to look at the characteristics of the utilities to understand what our backup and recovery options are. It's often beneficial to have multiple types of backup to cover the different possible types of...

Hot logical backups of one database

Logical backup makes a copy of the data in the database by dumping the content of each table, as well as object definitions for that same database (such as schemas, tables, indexes, views, privileges, triggers, and constraints).

How to do it…

The command to do this is simple. The following is an example of doing this when using a database called pgbench:

pg_dump -F c pgbench > dumpfile

Alternatively, you can use the following command:

pg_dump -F c -f dumpfile pgbench
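
Either way, you can inspect the resulting archive without restoring anything by listing its table of contents; a quick sketch:

pg_restore --list dumpfile

This is also a cheap sanity check that the file is a valid pg_dump archive.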

Finally, note that you can also run pg_dump via the pgAdmin 4 GUI, as shown in the following screenshot:

Figure 11.2 – Using the pgAdmin 4 GUI

How it works…

The pg_dump utility produces a single output file. If required, you can use the split command to break this output file into multiple pieces.
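
For instance, a minimal sketch using 1 GB pieces (the piece size and file names are arbitrary choices):

split -b 1G dumpfile dumpfile_part_

The pieces can later be recombined with cat dumpfile_part_* > dumpfile before running pg_restore.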

The pg_dump archive file, also known ...

Hot logical backups of all databases

If you have more than one database in your PostgreSQL server, you may want to take a logical backup of all of the databases at the same time.

How to do it…

Our recommendation is that you repeat exactly what you do for one database to each database in your cluster. You can run individual dumps in parallel if you want to speed things up.

Once this is complete, dump the global information using the following command:

pg_dumpall -g
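
As an illustration, the whole procedure might look like the following shell sketch (the backup directory is a placeholder, and connection options are omitted; adjust both to your environment):

psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate" |
while read dbname; do
    pg_dump -F c -f "/backup/$dbname.dump" "$dbname"
done
pg_dumpall -g > /backup/globals.sql

To speed things up, each pg_dump could be started in the background, resources permitting.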

How it works…

To back up all databases, you may be told that you need to use the pg_dumpall utility. The following are four good reasons why you shouldn't do that:

  • If you use pg_dumpall, the only output produced will be in a script file. Script files can't benefit from all the features of archive files, such as parallel and selective restore of pg_restore. By making your backup in this way, you will immediately deprive yourself of flexibility...

Backups of database object definitions

Sometimes, it's useful to get a dump of the object definitions that make up a database. This is useful for comparing what's in the database against the definitions in a data- or object-modeling tool. It's also useful to make sure that you can recreate objects in the correct schema, tablespace, and database with the correct ownership and permissions.

How to do it…

There are several important commands to note here.

  • The basic command to dump the definitions for every database of your PostgreSQL instance is as follows:
    pg_dumpall --schema-only > myscriptdump.sql

This includes all objects, including roles, tablespaces, databases, schemas, tables, indexes, triggers, constraints, views, functions, ownerships, and privileges.

  • If you want to dump PostgreSQL role definitions, use the following command:
    pg_dumpall --roles-only > myroles.sql
  • If you want to dump PostgreSQL tablespace definitions...

A standalone hot physical backup

Hot physical backup is an important capability for databases.

Physical backup allows us to get a completely consistent view of the changes to all databases at once. Physical backup also allows us to back up even while DDL changes are being executed on the database. Apart from resource constraints, there is no additional overhead or locking with this approach.

Physical backup procedures used to be slightly more complex than logical ones, but some defaults were changed in version 10 to simplify them; since then, taking a backup with pg_basebackup has been very easy, even with default settings.

In this recipe, we will first describe the easiest method, which is to use the pg_basebackup utility, and then provide a lower-level equivalent process to explain physical backups in more detail and describe the changes required for additional features, such as differential backup or a parallel...
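
As a preview of the easiest method, here is a minimal pg_basebackup sketch (the target directory is a placeholder):

pg_basebackup -D /var/lib/pgsql/standalone_backup -F tar -z -P

Here, -D sets the destination directory, -F tar selects the tar output format, -z compresses the output, and -P reports progress while the backup runs.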

Hot physical backups with Barman

The main motivation to start a new open source project for disaster recovery of PostgreSQL databases was the lack (back in 2011) of a simple and standard procedure for managing backups and, most importantly, recovery. Disasters and failures in ICT will happen.

As a database administrator, your duty is to plan for the backup and recovery of PostgreSQL databases and to perform regular tests, in order to sweep away the stress and fear that typically follow such unexpected events. Barman, which stands for Backup and Recovery Manager, is definitely a tool that you can use for these purposes.

Barman hides most of the complexity of working with PostgreSQL backups. For more information on the underlying technologies, you can refer to other recipes in this chapter: Understanding and controlling crash recovery, Planning your backups, Hot physical backup and continuous archiving, and Recovery to a point in time. It is important to be aware of how Barman...
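
To give a flavor of the interface, here is a minimal sketch (myserver is a placeholder for a server defined in Barman's configuration):

barman backup myserver
barman list-backup myserver

The first command takes a new backup of that server; the second lists the backups that Barman currently holds for it.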

Recovery of all databases

Recovery of a complete database server, including all of its databases, is an important feature. This recipe covers how to execute a recovery in the simplest way possible.

Some complexities are discussed here, though most are covered in later recipes.

Getting ready

Find a suitable server on which to perform the restore.

Before you recover onto a live server, always make another backup. Whatever problem you thought you had can get worse if you aren't prepared.

Physical backups (including Barman ones) are more efficient than logical ones, but they are subject to additional restrictions.

To be precise, a single instance of Barman can manage backups of several servers running different versions of PostgreSQL. However, when it comes to recovery, the same requirements as for the PITR technology of PostgreSQL apply – in particular, the following:

  • You must recover on a server with the same hardware architecture and...

Recovery to a point in time

If your database suffers a problem at 3:22 p.m. and your backup was taken at 4:00 a.m., you're probably hoping there is a way to recover the changes made between those two times. What you need is known as Point-in-Time Recovery (PITR).

Regrettably, if you've made a backup with the pg_dump utility at 4:00 a.m., then you won't be able to recover to any other time. As a result, the term PITR has become synonymous with the physical backup and restore technique in PostgreSQL.
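
To preview the mechanism on PostgreSQL 12 and later: the target time is set as a recovery parameter, while restore_command fetches archived WAL files. A minimal sketch of the relevant postgresql.conf settings (the archive path and the timestamp are placeholders):

restore_command = 'cp /path/to/wal_archive/%f "%p"'
recovery_target_time = '2022-03-15 15:00:00'

Recovery then begins when the server is started with a recovery.signal file present in the data directory.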

Getting ready

If you have a backup made with the pg_dump utility, then give up all hope of using it as a starting point for PITR. It's a frequently asked question, but the answer is still no. The fact that it gets asked so often is exactly why we are pleading with you to plan your backups ahead of time.

First, you need to decide the point in time you would like to recover to. If the answer is as late as possible,...

Recovery of a dropped/damaged table

You may drop or even damage a table in some way. Tables can be damaged for physical reasons, such as disk corruption, or by poorly specified UPDATE or DELETE commands that update too many rows or overwrite critical data.

Recovering from such situations using a backup is a common request.

How to do it…

The methods differ, depending on the type of backup you have available. If you have multiple types of backup, you have a choice.

Logical – from the custom dump taken with pg_dump -F c

If you've taken a logical backup using the pg_dump utility in a custom file, then you can simply extract the table you want from the dumpfile, like so:

pg_restore -t mydroppedtable dumpfile | psql

Alternatively, you can directly connect to the database using -d. If you use this option, then you can allow multiple jobs in parallel...
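
For instance, a minimal sketch combining both options (the database name and job count are assumptions):

pg_restore -d pgbench -t mydroppedtable -j 2 dumpfile

This restores only the named table, connecting directly to the pgbench database and allowing up to two parallel jobs.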

Recovery of a dropped/damaged database

Recovering a complete database is also required sometimes. It's actually a lot easier than recovering a single table. Many users choose to place all of their tables in a single database; in that case, this recipe isn't relevant.

How to do it…

The methods differ, depending on the type of backup you have available. If you have multiple types of backup, you have a choice.

Logical – from the custom dump -F c

Recreate the database in the original server using parallel tasks to speed things along. This can be executed remotely without needing to transfer the dumpfile between systems, as shown in the following example, where we use the -j option to specify four parallel processes:

pg_restore -h myhost -d postgres --create -j 4 dumpfile

Logical – from the script dump created by pg_dump

Recreate the database in the original server. This can be executed remotely...
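
Since a script dump is plain SQL, a minimal sketch of the remote case uses psql (the host and file names are assumptions):

psql -h myhost -f myscriptdump.sql postgres

Note that the script must contain the CREATE DATABASE statement (produced by the --create option) for this to recreate the database itself.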

Extracting a logical backup from a physical one

Once you have a physical backup, you can extract a logical backup from it, applying some of the recipes that we have already seen.

This recipe is quite short because it is essentially a combination of recipes that we have already described. Nevertheless, it is important because it clarifies that you don't need to worry about extracting logical backups if you already have physical ones.

Getting ready

You just need to decide whether you want to extract a logical backup corresponding to a specific point in time or simply to the latest available snapshot.

How to do it…

First, perform a PITR, as indicated in the Recovery to a point in time recipe earlier in this chapter. If you want a logical backup corresponding to the latest available snapshot, just omit the --target-time clause. Then, follow the Hot logical backups of one database recipe to take a logical backup from the temporary instance.
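
As an illustration, once the temporary instance is up, the logical backup might be taken like this (a sketch; the port of the temporary instance and the database name are assumptions):

pg_dump -p 5433 -F c -f dumpfile pgbench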

Finally...

Improving performance of logical backup/recovery

Performance is often a concern in any medium-sized or large database.

Backup performance is often a delicate issue because resource usage may need to be limited to remain within certain boundaries. There may also be a restriction on the maximum runtime for the backup – for example, a weekly backup that runs every Sunday may need to complete before the working week begins.

Again, restore performance may be more important than backup performance, even if backup is the more obvious concern.

In this recipe, we will discuss the performance of logical backup and recovery; the physical case is quite different and is examined in the recipes after that.

Getting ready

If performance is a concern or is likely to become one, then you should read the Planning your backups recipe first.

How to do it…

You can use the -j option to specify the number of parallel processes that pg_dump should use to perform the database backup. This requires that you use...
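
For example, a minimal sketch of a parallel dump (parallel pg_dump requires the directory output format; the job count and names are assumptions):

pg_dump -F d -j 4 -f dumpdir pgbench

The output is a directory rather than a single file, and it can later be restored with pg_restore, also in parallel.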

Improving performance of physical backup/recovery

Physical backups are quite different from logical ones, and this difference extends to the options available for making them faster.

In both cases, it is possible to use multiple parallel processes, although for quite different reasons. Physical backups are mostly constrained by network and storage bandwidth, meaning that the benefit of parallelism is limited, although not negligible. Usually, there is little benefit in using more than four parallel processes, and you can expect to reduce backup time to 40–60% of the single-process time. In any case, the more processes you use, the greater the impact on the running system.

Incremental backup and restore are currently available only for physical backups. Although, in theory, it is possible to implement incremental behavior for logical backup/restore, in practice, this feature does not exist yet. Perhaps this is because physical backups are by nature faster and...

Validating backups

In this recipe, we will use the data checksums feature to detect, as early as possible, data corruption caused by I/O malfunctions.

It is important to discover such problems as soon as possible: we want a chance to recover lost data from one of our older backups, and we want to stop data errors before they spread to the rest of the database, as new data comes to depend on existing data.

Getting ready

This feature is disabled by default, since it results in some overhead; it can be enabled when the cluster is initialized, by using the --data-checksums option of the initdb utility, or on an existing cluster (while it is cleanly shut down), with pg_checksums --enable.

Also, before trying this recipe, you should be familiar with how to take backups and how to restore them afterward, which are the subjects of most of this chapter.

How to do it…

First, check whether data checksums are enabled:

postgres=# SHOW data_checksums ;
 data_checksums...
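
While the cluster is cleanly shut down, the checksums can also be verified offline with the pg_checksums utility; a minimal sketch (the data directory path is an assumption):

pg_checksums --check -D /var/lib/pgsql/data

This scans all data files and reports any block whose checksum does not match.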