Instant PostgreSQL Backup and Restore How-to [Instant] — Save 50%
A step-by-step guide to backing up and restoring your database using safe, efficient, and proven recipes with this book and ebook
This article by Shaun M. Thomas, from the book, Instant PostgreSQL Backup and Restore How-to explains about another backup method available to PostgreSQL, which is a base backup, which consists of the actual data files themselves. These kinds of backups do not need to be restored, only uncompressed or copied. Using them can be more complicated, but they can be ready much faster depending on the database size. The developers have kindly provided pg_basebackup as a simple starting point.
(For more resources related to this topic, see here.)
Next we need to modify the postgresql.conf file for our database to run in the proper mode for this type of backup. Change the following configuration variables:
wal_level = archive max_wal_senders = 5
Then we must allow a super user to connect to the replication database, which is used by pg_basebackup. We do that by adding the following line to pg_hba.conf:
local replication postgres peer
Finally, restart the database instance to commit the changes.
How to do it...
Though it is only one command, pg_basebackup requires at least one switch to obtain a binary backup, as shown in the following step:
Execute the following command to create the backup in a new directory named db_backup:
$> pg_basebackup -D db_backup -x
How it works...
For PostgreSQL, WAL stands for Write Ahead Log. By changing wal_level to archive, those logs are written in a format compatible with pg_basebackup and other replicationbased tools.
By increasing max_wal_senders from the default of zero, the database will allow tools to connect and request data files. In this case, up to five streams can request data files simultaneously. This maximum should be sufficient for all but the most advanced systems.
The pg_hba.conf file is essentially a connection access control list (ACL). Since pg_basebackup uses the replication protocol to obtain data files, we need to allow local connections to request replication.
Next, we send the backup itself to a directory (-D) named db_backup. This directory will effectively contain a complete copy of the binary files that make up the database.
Finally, we added the -x flag to include transaction logs (xlogs), which the database will require to start, if we want to use this backup. When we get into more complex scenarios, we will exclude this option, but for now, it greatly simplifies the process.
The pg_basebackup tool is actually fairly complicated. There is a lot more involved under the hood.
Viewing backup progress
For manually invoked backups, we may want to know how long the process might take, and its current status. Luckily, pg_basebackup has a progress indicator, which does that by using the following command:
$> pg_basebackup -P -D db_backup
Like many of the other switches, -P can be combined with tape archive format, standalone backups, database clones, and so on. This is clearly not necessary for automated backup routines, but could be useful for one-off backups monitored by an administrator.
Compressed tape archive backups
Many binary backup files come in the TAR (Tape Archive) format, which we can activate using the -f flag and setting it to t for TAR. Several Unix backup tools can directly process this type of backup, and most administrators are familiar with it.
If we want a compressed output, we can set the -z flag, especially in the case of large databases. For our sample database, we should see almost a 20x compression ratio. Try the following command:
$> pg_basebackup -Ft -z -D db_backup
The backup file itself will be named base.tar.gz within the db_backup directory, reflecting its status as a compressed tape archive. In case the database contains extra tablespaces, each becomes a separate compressed archive. Each file can be extracted to a separate location, such as a different set of disks, for very complicated database instances.
For the sake of this example, we ignored the possible presence of extra tablespaces than the pg_default default included in every installation. User-created tablespaces will greatly complicate your backup process.
Making the backup standalone
By specifying -x, we tell the database that we want a "complete" backup. This means we could extract or copy the backup anywhere and start it as a fully qualified database. As we mentioned before, the flag means that you want to include transaction logs, which is how the database recovers from crashes, checks integrity, and performs other important tasks. The following is the command again, for reference:
$> pg_basebackup -x -D db_backup
When combined with the TAR output format and compression, standalone binary backups are perfect for archiving to tape for later retrieval, as each backup is compressed and self-contained. By default, pg_basebackup does not include transaction logs, because many (possibly most) administrators back these up separately. These files have multiple uses, and putting them in the basic backup would duplicate efforts and make backups larger than necessary.
We include them at this point because it is still too early for such complicated scenarios. We will get there eventually, of course.
Because pg_basebackup operates through PostgreSQL's replication protocol, it can execute remotely. For instance, if the database was on a server named Production, and we wanted a copy on a server named Recovery, we could execute the following command from Recovery:
$> pg_basebackup -h Production -x -D /full/db/path
For this to work, we would also need this line in pg_hba.conf for Recovery:
host replication postgres Recovery trust
Though we set the authentication method to trust, this is not recommended for a production server installation. However, it is sufficient to allow Recovery to copy all data from Production. With the -x flag, it also means that the database can be started and kept online in case of emergency. It is a backup and a running server.
Compression is very CPU intensive, but there are some utilities capable of threading the process. Tools such as pbzip2 or pigz can do the compression instead. Unfortunately, this only works in the case of a single tablespace (the default one; if you create more, this will not work). The following is the command for compression using pigz:
$> pg_basebackup -Ft -D - | pigz -j 4 > db_backup.tar.gz
It uses four threads of compression, and sets the backup directory to standard output (-) so that pigz can process the output itself.
In this article we saw the process of obtaining a binary backup. Though, we saw that this process is more complex and tedious, but at the same time it is much faster.
Resources for Article :
- Introduction to PostgreSQL 9 [Article]
- Backup in PostgreSQL 9 [Article]
- Recovery in PostgreSQL 9 [Article]
About the Author :
Shaun M. Thomas is a long time database administrator, and has been working with PostgreSQL since late 2000. Over the years, he has contributed frequently to the PostgreSQL Performance mailing list to help other DBAs get the most out of his favorite database. In 2011 and 2012, he presented at the Postgres Open conference on topics such as handling extreme throughput, High Availability, server redundancy, and failover techniques.
Currently he serves as the database architect at OptionsHouse, an online options brokerage with a PostgreSQL database that handles over one billion queries per day. This database, he says, is proof that PostgreSQL is ready for the big time.
He has great hopes for PostgreSQL as it continues to break new ground and make history.