Understanding Point-In-Time-Recovery

Exclusive offer: get 50% off this eBook here
PostgreSQL Replication

PostgreSQL Replication — Save 50%

Understand basic replication concepts and efficiently replicate PostgreSQL using high-end techniques to protect your data and run your server without interruptions with this book and ebook

$29.99    $15.00
by Hans-Jürgen Schönig Zoltan Böszörmenyi | September 2013 | Open Source

In this article by Hans-Jürgen Schönig and Zoltan Böszörmenyi, authors of PostgreSQL Replication, we will cover the following:

  • Understanding the concepts behind PITR
  • Configuring PostreSQL for PITR
  • Running pg_basebackup
  • Recovering PostgreSQL to a certain point in time

(For more resources related to this topic, see here.)

Understanding the purpose of PITR

PostgreSQL offers a tool called pg_dump to backup a database. Basically, pg_dump will connect to the database, read all the data in transaction isolation level "serializable" and return the data as text. As we are using "serializable", the dump is always consistent. So, if your pg_dump starts at midnight and finishes at 6 A.M, you will have created a backup, which contains all the data as of midnight but no further data. This kind of snapshot creation is highly convenient and perfectly feasible for small to medium amounts of data.

A dump is always consistent. This means that all foreign keys are intact; new data added after starting the dump will be missing. It is most likely the most common way to perform standard backups.

But, what if your data is so valuable and maybe so large in size that you want to backup it incrementally? Taking a snapshot from time to time might be enough for some applications; for highly critical data, it is clearly not. In addition to that, replaying 20 TB of data in textual form is not efficient either. Point-In-Time-Recovery has been designed to address this problem. How does it work? Based on a snapshot of the database, the XLOG will be replayed later on. This can happen indefinitely or up to a point chosen by you. This way, you can reach any point in time.

This method opens the door to many different approaches and features:

  • Restoring a database instance up to a given point in time
  • Creating a standby database, which holds a copy of the original data
  • Creating a history of all changes

In this article, we will specifically feature on the incremental backup functionality and describe how you can make your data more secure by incrementally archiving changes to a medium of choice.

Moving to the bigger picture

The following picture provides an overview of the general architecture in use for Point-In-Time-Recovery:

PostgreSQL produces 16 MB segments of transaction log. Every time one of those segments is filled up and ready, PostgreSQL will call the so called archive_command. The goal of archive_command is to transport the XLOG file from the database instance to an archive. In our image, the archive is represented as the pot on the bottom-right side of the image.

The beauty of the design is that you can basically use an arbitrary shell script to archive the transaction log. Here are some ideas:

  • Use some simple copy to transport data to an NFS share
  • Run rsync to move a file
  • Use a custom made script to checksum the XLOG file and move it to an FTP server
  • Copy the XLOG file to a tape

The possible options to manage XLOG are only limited by imagination.

The restore_command is the exact counterpart of the archive_command. Its purpose is to fetch data from the archive and provide it to the instance, which is supposed to replay it (in our image, this is labeled as Restored Backup). As you have seen, replay might be used for replication or simply to restore a database to a given point in time as outlined in this article. Again, the restore_command is simply a shell script doing whatever you wish, file by file.

It is important to mention that you, the almighty administrator, are in charge of the archive. You have to decide how much XLOG to keep and when to delete it. The importance of this task cannot be underestimated.

Keep in mind, when then archive_command fails for some reason, PostgreSQL will keep the XLOG file and retry after a couple of seconds. If archiving fails constantly from a certain point on, it might happen that the master fills up. The sequence of XLOG files must not be interrupted; if a single file is missing, you cannot continue to replay XLOG. All XLOG files must be present because PostgreSQL needs an uninterrupted sequence of XLOG files; if a single file is missing, the recovery process will stop there at the very latest.

Archiving the transaction log

After taking a look at the big picture, we can take a look and see how things can be put to work.

The first thing you have to do when it comes to Point-In-Time-Recovery is to archive the XLOG. PostgreSQL offers all the configuration options related to archiving through postgresql.conf.

Let us see step by step what has to be done in postgresql.conf to start archiving:

  1. First of all, you should turn archive_mode on.
  2. In the second step, you should configure your archive command. The archive command is a simple shell command taking two parameters:
    1. %p: This is a placeholder representing the XLOG file that should be archived, including its full path (source).
    2. %f: This variable holds the name of the XLOG without the path pointing to it.

Let us set up archiving now. To do so, we should create a place to put the XLOG. Ideally, the XLOG is not stored on the same hardware as the database instance you want to archive. For the sake of this example, we assume that we want to apply an archive to /archive. The following changes have to be made to postgresql.conf:

wal_level = archive # minimal, archive, or hot_standby # (change requires restart) archive_mode = on # allows archiving to be done # (change requires restart) archive_command = 'cp %p /archive/%f' # command to use to archive a logfile segment # placeholders: %p = path of file to archive # %f = file name only

Once those changes have been made, archiving is ready for action and you can simply restart the database to activate things.

Before we restart the database instance, we want to focus your attention on wal_level. Currently three different wal_level settings are available:

  • minimal
  • archive
  • hot_standby

The amount of transaction log produced in the case of just a single node is by far not enough to synchronize an entire second instance. There are some optimizations in PostgreSQL, which allow XLOG-writing to be skipped in the case of single-node mode. The following instructions can benefit from wal_level being set to minimal: CREATE TABLE AS, CREATE INDEX, CLUSTER, and COPY (if the table was created or truncated within the same transaction).

To replay the transaction log, at least archive is needed. The difference between archive and hot_standby is that archive does not have to know about currently running transactions. For streaming replication, however, this information is vital.

Restarting can either be done through pg_ctl –D /data_directory –m fast restart directly or through a standard init script.

The easiest way to check if our archiving works is to create some useless data inside the database. The following code snippets shows a million rows can be made easily:

test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1,
1000000);

SELECT 1000000
test=# SELECT * FROM t_test LIMIT 3;
generate_series
-----------------
1
2
3
(3 rows)

We have simply created a list of numbers. The important thing is that 1 million rows will trigger a fair amount of XLOG traffic. You will see that a handful of files have made it to the archive:

iMac:archivehs$ ls -l
total 131072
-rw------- 1 hs wheel 16777216 Mar 5 22:31
000000010000000000000001
-rw------- 1 hs wheel 16777216 Mar 5 22:31
000000010000000000000002
-rw------- 1 hs wheel 16777216 Mar 5 22:31
000000010000000000000003
-rw------- 1 hs wheel 16777216 Mar 5 22:31
000000010000000000000004

Those files can be easily used for future replay operations.

If you want to save storage, you can also compress those XLOG files. Just add gzip to your archive_command.

Taking base backups

In the previous section, you have seen that enabling archiving takes just a handful of lines and offers a great deal of flexibility. In this section, we will see how to create a so called base backup, which can be used to apply XLOG later on. A base backup is an initial copy of the data.

Keep in mind that the XLOG itself is more or less worthless. It is only useful in combination with the initial base backup.

In PostgreSQL, there are two main options to create an initial base backup:

  • Using pg_basebackup
  • Traditional copy/rsync based methods

The following two sections will explain in detail how a base backup can be created:

Using pg_basebackup

The first and most common method to create a backup of an existing server is to run a command called pg_basebackup, which was introduced in PostgreSQL 9.1.0. Basically pg_basebackup is able to fetch a database base backup directly over a database connection. When executed on the slave, pg_basebackup will connect to the database server of your choice and copy all the data files in the data directory over to your machine. There is no need to log into the box anymore, and all it takes is one line of code to run it; pg_basebackup will do all the rest for you.

In this example, we will assume that we want to take a base backup of a host called sample.postgresql-support.de. The following steps must be performed:

  • Modify pg_hba.conf to allow replication
  • Signal the master to take pg_hba.conf changes into account
  • Call pg_basebackup

Modifying pg_hba.conf

To allow remote boxes to log into a PostgreSQL server and to stream XLOG, you have to explicitly allow replication.

In PostgreSQL, there is a file called pg_hba.conf, which tells the server which boxes are allowed to connect using which type of credentials. Entire IP ranges can be allowed or simply discarded through pg_hba.conf.

To enable replication, we have to add one line for each IP range we want to allow. The following listing contains an example of a valid configuration:

# TYPE DATABASE USER ADDRESS METHOD
host replication all 192.168.0.34/32 md5

In this case we allow replication connections from 192.168.0.34. The IP range is identified by 32 (which simply represents a single server in our case). We have decided to use MD5 as our authentication method. It means that the pg_basebackup has to supply a password to the server. If you are doing this in a non-security critical environment, using trust as authentication method might also be an option.

What happens if you actually have a database called replication in your system? Basically, setting the database to replication will just configure your streaming behavior, if you want to put in rules dealing with the database called replication, you have to quote the database name as follows: "replication". However, we strongly advise not to do this kind of trickery to avoid confusion.

Signaling the master server

Once pg_hba.conf has been changed, we can tell PostgreSQL to reload the configuration. There is no need to restart the database completely. We have three options to make PostgreSQL reload pg_hba.conf:

  • By running an SQL command: SELECT pg_reload_conf();
  • By sending a signal to the master: kill –HUP 4711 (with 4711 being the process ID of the master)
  • By calling pg_ctl: pg_ctl –D $PGDATA reload (with $PGDATA being the home directory of your database instance)

Once we have told the server acting as data source to accept streaming connections, we can move forward and run pg_basebackup.

pg_basebackup – basic features

pg_basebackup is a very simple-to-use command-line tool for PostgreSQL. It has to be called on the target system and will provide you with a ready-to-use base backup, which is ready to consume the transaction log for Point-In-Time-Recovery.

The syntax of pg_basebackup is as follows:

iMac:dbhs$ pg_basebackup --help pg_basebackup takes a base backup of a running PostgreSQL server. Usage: pg_basebackup [OPTION]... Options controlling the output: -D, --pgdata=DIRECTORY receive base backup into directory -F, --format=p|t output format (plain (default), tar) -x, --xlog include required WAL files in backup (fetch mode) -X, --xlog-method=fetch|stream include required WAL files with specified method -z, --gzip compress tar output -Z, --compress=0-9 compress tar output with given compression level General options: -c, --checkpoint=fast|spread set fast or spread checkpointing -l, --label=LABEL set backup label -P, --progress show progress information -v, --verbose output verbose messages -V, --version output version information, then exit -?, --help show this help, then exit Connection options: -h, --host=HOSTNAME database server host or socket directory -p, --port=PORT database server port number -s, --status-interval=INTERVAL time between status packets sent to server (in seconds) -U, --username=NAME connect as specified database user -w, --no-password never prompt for password -W, --password force password prompt (should happen automatically)

A basic call to pg_basebackup would look like that:

iMac:dbhs$ pg_basebackup -D /target_directory \
-h sample.postgresql-support.de

In this example, we will fetch the base backup from sample.postgresql-support.de and put it into our local directory called /target_directory. It just takes this simple line to copy an entire database instance to the target system.

When we create a base backup as shown in this section, pg_basebackup will connect to the server and wait for a checkpoint to happen before the actual copy process is started. In this mode, this is necessary because the replay process will start exactly at this point in the XLOG. The problem is that it might take a while until a checkpoint kicks in; pg_basebackup does not enforce a checkpoint on the source server straight away to make sure that normal operations are not disturbed.

If you don't want to wait on a checkpoint, consider using --checkpoint=fast. It will enforce an instant checkpoint and pg_basebackup will start copying instantly.

By default, a plain base backup will be created. It will consist of all the files in directories found on the source server. If the base backup should be stored on tape, we suggest to give –-format=t a try. It will automatically create a TAR archive (maybe on a tape). If you want to move data to a tape, you can save an intermediate step easily this way. When using TAR, it is usually quite beneficial to use it in combination with --gzip to reduce the size of the base backup on disk.

There is also a way to see a progress bar while doing the base backup but we don't recommend to use this option (--progress) because it requires pg_basebackup to determine the size of the source instance first, which can be costly.

pg_basebackup – self-sufficient backups

Usually a base backup without XLOG is pretty worthless. This is because the base backup is taken while the master is fully operational. While the backup is taken, those storage files in the database instance might have been modified heavily. The purpose of the XLOG is to fix those potential issues in the data files reliably.

But, what if we want to create a base backup, which can live without (explicitly archived) XLOG? In this case, we can use the --xlog-method=stream option. If this option has been chosen, pg_basebackup will not just copy the data as it is but it will also stream the XLOG being created during the base backup to our destination server. This will provide us with just enough XLOG to allow us to start a base backup made that way directly. It is self-sufficient and does not need additional XLOG files. This is not Point-In-Time-Recovery but it can come in handy in case of trouble. Having a base backup, which can be started right away, is usually a good thing and it comes at fairly low cost.

Please note that --xlog-method=stream will require two database connections to the source server, not just one. You have to keep that in mind when adjusting max_wal_senders on the source server.

If you are planning to use Point-In-Time-Recovery and if there is absolutely no need to start the backup as it is, you can safely skip the XLOG and save some space this way (default mode).

Making use of traditional methods to create base backups

These days pg_basebackup is the most common way to get an initial copy of a database server. This has not always been the case. Traditionally, a different method has been used which works as follows:

  • Call SELECT pg_start_backup('some label');
  • Copy all data files to the remote box through rsync or any other means.
  • Run SELECT pg_stop_backup();

The main advantage of this old method is that there is no need to open a database connection and no need to configure XLOG-streaming infrastructure on the source server.

A main advantage is also that you can make use of features such as ZFS-snapshots or similar means, which can help to dramatically reduce the amount of I/O needed to create an initial backup.

Once you have started pg_start_backup, there is no need to hurry. It is not necessary and not even especially desirable to leave the backup mode soon. Nothing will happen if you are in backup mode for days. PostgreSQL will archive the transaction log as usual and the user won't face any kind of downside. Of course, it is bad habit not to close backups soon and properly. However, the way PostgreSQL works internally does not change when a base backup is running. There is nothing filling up, no disk I/O delayed, or anything of this sort.

Tablespace issues

If you happen to use more than one tablespace, pg_basebackup will handle this just fine if the filesystem layout on the target box is identical to the filesystem layout on the master. However, if your target system does not use the same filesystem layout there is a bit more work to do. Using the traditional way of doing the base backup might be beneficial in this case.

In case you are using --format=t (for TAR), you will be provided with one TAR file per tablespace.

Keeping an eye on network bandwidth

Let us imagine a simple scenario involving two servers. Each server might have just one disk (no SSDs). Our two boxes might be interconnected through a 1 Gbit link. What will happen to your applications if the second server starts to run a pg_basebackup? The second box will connect, start to stream data at full speed and easily kill your hard drive by using the full bandwidth of your network. An application running on the master might instantly face disk wait and offer bad response times. Therefore it is highly recommended to control the bandwidth used up by rsync to make sure that your business applications have enough spare capacity (mainly disk, CPU is usually not an issue).

If you want to limit rsync to, say, 20 MB/sec, you can simply use rsync --bwlimit=20000. This will definitely make the creation of the base backup take longer but it will make sure that your client apps will not face problems.

In general we recommend a dedicated network interconnect between master and slave to make sure that a base backup does not affect normal operations.

Limiting bandwidth cannot be done with pg_basebackup onboard functionality.Of course, you can use any other tool to copy data and achieve similar results.

If you are using gzip compression with –-gzip, it can work as an implicit speed brake. However, this is mainly a workaround.

Replaying the transaction log

Once we have created ourselves a shiny initial base backup, we can collect the XLOG files created by the database. When the time has come, we can take all those XLOG files and perform our desired recovery process. This works as described in this section.

Performing a basic recovery

In PostgreSQL, the entire recovery process is governed by a file named recovery.conf, which has to reside in the main directory of the base backup. It is read during startup and tells the database server where to find the XLOG archive, when to end replay, and so forth.

To get you started, we have decided to include a simple recovery.conf sample file for performing a basic recovery process:

restore_command = 'cp /archive/%f %p'
recovery_target_time = '2013-10-10 13:43:12'

The restore_command is essentially the exact counterpart of the archive_command you have seen before. While the archive_command is supposed to put data into the archive, the restore_command is supposed to provide the recovering instance with the data file by file. Again, it is a simple shell command or a simple shell script providing one chunk of XLOG after the other. The options you have here are only limited by imagination; all PostgreSQL does is to check for the return code of the code you have written, and replay the data provided by your script.

Just like in postgresql.conf, we have used %p and %f as placeholders; the meaning of those two placeholders is exactly the same.

To tell the system when to stop recovery, we can set the recovery_target_time. The variable is actually optional. If it has not been specified, PostgreSQL will recover until it runs out of XLOG. In many cases, simply consuming the entire XLOG is a highly desirable process; if something crashes, you want to restore as much data as possible. But, it is not always so. If you want to make PostgreSQL stop recovery at a specific point in time, you simply have to put the proper date in. The crucial part here is actually to know how far you want to replay XLOG; in a real work scenario this has proven to be the trickiest question to answer.

If you happen to a recovery_target_time, which is in the future, don't worry, PostgreSQL will start at the very last transaction available in your XLOG and simply stop recovery. The database instance will still be consistent and ready for action. You cannot break PostgreSQL, but, you might break your applications in case data is lost because of missing XLOG.

Before starting PostgreSQL, you have to run chmod 700 on the directory containing the base backup, otherwise, PostgreSQL will error out:

iMac:target_directoryhs$ pg_ctl -D /target_directory\
start

server starting
FATAL: data directory "/target_directory" has group or world access
DETAIL: Permissions should be u=rwx (0700).

This additional security check is supposed to make sure that your data directory cannot be read by some user accidentally. Therefore an explicit permission change is definitely an advantage from a security point of view (better safe than sorry).

Now that we have all the pieces in place, we can start the replay process by starting PostgreSQL:

iMac:target_directoryhs$ pg_ctl –D /target_directory \
start

server starting
LOG: database system was interrupted; last known up at 2013-03-10
18:04:29 CET
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: starting point-in-time recovery to 2013-10-10 13:43:12+02
LOG: restored log file "000000010000000000000006" from archive
LOG: redo starts at 0/6000020
LOG: consistent recovery state reached at 0/60000B8
LOG: restored log file "000000010000000000000007" from archive
LOG: restored log file "000000010000000000000008" from archive
LOG: restored log file "000000010000000000000009" from archive
LOG: restored log file "00000001000000000000000A" from archive
cp: /tmp/archive/00000001000000000000000B: No such file or
directory
LOG: could not open file "pg_xlog/00000001000000000000000B" (log
file 0, segment 11): No such file or directory
LOG: redo done at 0/AD5CE40
LOG: last completed transaction was at log time 2013-03-10
18:05:33.852992+01
LOG: restored log file "00000001000000000000000A" from archive
cp: /tmp/archive/00000002.history: No such file or directory
LOG: selected new timeline ID: 2
cp: /tmp/archive/00000001.history: No such file or directory
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

The amount of log produced by the database tells us everything we need to know about the restore process and it is definitely worth investigating this information in detail.

The first line indicates that PostgreSQL has found out that it has been interrupted and that it has to start recovery. From the database instance point of view, a base backup looks more or less like a crash needing some instant care by replaying XLOG; this is precisely what we want.

The next couple of lines (restored log file ...) indicate that we are replaying one XLOG file after the other that have been created since the base backup. It is worth mentioning that the replay process starts at the sixth file. The base backup knows where to start, so PostgreSQL will automatically look for the right XLOG file.

The message displayed after PostgreSQL reaches the sixth file (consistent recovery state reached at 0/60000B8) is of importance. PostgreSQL states that it has reached a consistent state. This is important. The reason is that the data files inside a base backup are actually broken by definition, but, the data files are not broken beyond repair. As long as we have enough XLOG to recover, we are very well off. If you cannot reach a consistent state, your database instance will not be usable and your recovery cannot work without providing additional XLOG.

Practically speaking, not being able to reach a consistent state usually indicates a problem somewhere in your archiving process and your system setup. If everything up to now has been working properly, there is no reason not to reach a consistent state.

Once we have reached a consistent state, one file after the other will be replayed successfully until the system finally looks for the 00000001000000000000000B file. The problem is that this file has not been created by the source database instance. Logically, an error pops up.

Not finding the last file is absolutely normal; this type of error is expected if the recovery_target_time does not ask PostgreSQL to stop recovery before it reaches the end of the XLOG stream. Don't worry, your system is actually fine. You have successfully replayed everything to the file showing up exactly before the error message.

As soon as all the XLOG has been consumed and the error message discussed earlier has been issued, PostgreSQL reports the last transaction it was able or supposed to replay, and starts up. You have a fully recovered database instance now and you can connect to the database instantly. As soon as the recovery has ended, recovery.conf will be renamed by PostgreSQL to recovery.done to make sure that it does not do any harm when the new instance is restarted later on at some point.

More sophisticated positioning in the XLOG

Up to now, we have recovered a database up to the very latest moment available in our 16 MB chunks of transaction log. We have also seen that you can define the desired recovery timestamp. But the question now is: How do you know which point in time to perform the recovery to? Just imagine somebody has deleted a table during the day. What if you cannot easily determine the recovery timestamp right away? What if you want to recover to a certain transaction?

recovery.conf has all you need. If you want to replay until a certain transaction, you can refer to recovery_target_xid. Just specify the transaction you need and configure recovery_target_inclusive to include this very specific transaction or not. Using this setting is technically easy but as mentioned before, it is not easy by far to find the right position to replay to.

In a typical setup, the best way to find a reasonable point to stop recovery is to use pause_at_recovery_target. If this is set to true, PostgreSQL will not automatically turn into a productive instance if the recovery point has been reached. Instead, it will wait for further instructions from the database administrator. This is especially useful if you don't know exactly how far to replay. You can replay, log in, see how far the database is, change to the next target time, and continue replaying in small steps.

You have to set hot_standby = on in postgresql.conf to allow reading during recovery.

Resuming recovery after PostgreSQL has paused can be done by calling a simple SQL statement: SELECT pg_xlog_replay_resume(). It will make the instance move to the next position you have set in recovery.conf. Once you have found the right place, you can set the pause_at_recovery_target back to false and call pg_xlog_replay_resume. Alternatively, you can simply utilize pg_ctl –D ... promote to stop recovery and make the instance operational.

Was this explanation too complicated? Let us boil it down to a simple list:

  • Add a restore_command to the recovery.conf file.
  • Add recovery_target_time to the recovery.conf file.
  • Set pause_at_recovery_target to true in the recovery.conf file.
  • Set hot_standby to on in postgresql.conf file.
  • Start the instance to be recovered.
  • Connect to the instance once it has reached a consistent state and as soon as it stops recovering.
  • Check if you are already where you want to be.
  • If you are not:
    • Change recovery_target_time.
    • Run SELECT pg_xlog_replay_resume().
    • Check again and repeat this section if it is necessary.

Keep in mind that once recovery has finished and once PostgreSQL has started up as a normal database instance, there is (as of 9.2) no way to replay XLOG later on.

Instead of going through this process, you can of course always use filesystem snapshots. A filesystem snapshot will always work with PostgreSQL because when you restart a snapshotted database instance, it will simply believe that it had crashed before and recover normally.

Cleaning up the XLOG on the way

Once you have configured archiving, you have to store the XLOG being created by the source server. Logically, this cannot happen forever. At some point, you really have to get rid of this XLOG; it is essential to have a sane and sustainable cleanup policy for your files.

Keep in mind, however, that you must keep enough XLOG so that you can always perform recovery from the latest base backup. But if you are certain that a specific base backup is not needed anymore, you can safely clean out all the XLOG that is older than the base backup you want to keep.

How can an administrator figure out what to delete? The best method is to simply take a look at your archive directory:

000000010000000000000005
000000010000000000000006
000000010000000000000006.00000020.backup
000000010000000000000007
000000010000000000000008

Check out the filename in the middle of the listing. The .backup file has been created by the base backup. It contains some information about the way the base backup has been made and tells the system where to continue replaying the XLOG. If the backup file belongs to the oldest base backup you need to keep around, you can safely erase all the XLOG lower than file number 6; in this case, file number 5 could be safely deleted.

In our case, 000000010000000000000006.00000020.backup contains the following information:

START WAL LOCATION: 0/6000020 (file 000000010000000000000006)
STOP WAL LOCATION: 0/60000E0 (file 000000010000000000000006)
CHECKPOINT LOCATION: 0/6000058
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2013-03-10 18:04:29 CET
LABEL: pg_basebackup base backup
STOP TIME: 2013-03-10 18:04:30 CET

The .backup file will also provide you with relevant information such as the time the base backup has been made. It is plain there and so it should be easy for ordinary users to read this information.

As an alternative to deleting all the XLOG files at one point, it is also possible to clean them up during replay. One way is to hide an rm command inside your restore_command. While this is technically possible, it is not necessarily wise to do so (what if you want to recover again?).

Also, you can add the recovery_end_command command to your recovery.conf file. The goal of recovery_end_command is to allow you to automatically trigger some action as soon as the recovery ends. Again, PostgreSQL will call a script doing precisely what you want. You can easily abuse this setting to clean up the old XLOG when the database declares itself active.

Switching the XLOG files

If you are going for an XLOG file-based recovery, you have seen that one XLOG will be archived every 16 MB. What would happen if you never manage to create 16 MB of changes? What if you are a small supermarket, which just makes 50 sales a day? Your system will never manage to fill up 16 MB in time.

However, if your system crashes, the potential data loss can be seen as the amount of data in your last unfinished XLOG file. Maybe this is not good enough for you.

A postgresql.conf setting on the source database might help. The archive_timeout tells PostgreSQL to create a new XLOG file at least every x seconds. So, if you are this little supermarket, you can ask the database to create a new XLOG file every day shortly before you are heading for home. In this case, you can be sure that the data of the day will safely be on your backup device already.

It is also possible to make PostgreSQL switch to the next XLOG file by hand. A procedure named pg_switch_xlog() is provided by the server to do the job:

test=# SELECT pg_switch_xlog();
pg_switch_xlog
----------------
0/17C0EF8
(1 row)

You might want to call this procedure when some important patch job has finished or if you want to make sure that a certain chunk of data is safely in your XLOG archive.

Summary

In this article, you have learned about Point-In-Time-Recovery, which is a safe and easy way to restore your PostgreSQL database to any desired point in time. PITR will help you to implement better backup policies and make your setups more robust.

Resources for Article:


Further resources on this subject:


PostgreSQL Replication Understand basic replication concepts and efficiently replicate PostgreSQL using high-end techniques to protect your data and run your server without interruptions with this book and ebook
Published: August 2013
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:

About the Author :


Hans-Jürgen Schönig

Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called "Cybertec Schönig & Schönig GmbH" (www.postgresql-support.de), which has successfully served countless customers around the globe.

Before founding Cybertec Schönig & Schönig GmbH in the year 2000, he worked as database developer at a private research company focusing on the Austrian labor market where he was primarily focusing on data mining and forecast models.

He has written several books dealing with PostgreSQL already.

Zoltan Böszörmenyi

Zoltan Böszörmenyi has over 15 years experience in the software development and IT industry. He started working with PostgreSQL in 1995 and has since been working exclusively developing and implementing solutions using it. Among many other things, he has extended ECPG, the embedded SQL-in-C flavor in PostgreSQL. He has also developed unique solutions for POS hardware. He also occasionally does training on PostgreSQL. He has held senior-level positions but now serves as the CTO of Cybertec Schönig & Schönig GmbH.

Books From Packt


PostgreSQL Server Programming
PostgreSQL Server Programming

Instant PostgreSQL Starter [Instant]
Instant PostgreSQL Starter [Instant]

PostgreSQL 9 Admin Cookbook
PostgreSQL 9 Admin Cookbook

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook
Microsoft SQL Server 2012 Integration Services: An Expert Cookbook

PostgreSQL 9.0 High Performance
PostgreSQL 9.0 High Performance

Instant PostgreSQL Backup and Restore How-to [Instant]
Instant PostgreSQL Backup and Restore How-to [Instant]

Mastering phpMyAdmin 3.4 for Effective MySQL Management
Mastering phpMyAdmin 3.4 for Effective MySQL Management

MySQL Management and Administration with Navicat
MySQL Management and Administration with Navicat


No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
y
X
D
S
h
S
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software