Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Troubleshooting PostgreSQL

You're reading from  Troubleshooting PostgreSQL

Product type Book
Published in Mar 2015
Publisher Packt
ISBN-13 9781783555314
Pages 164 pages
Edition 1st Edition
Languages

Table of Contents (17) Chapters

Troubleshooting PostgreSQL
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Installing PostgreSQL Creating Data Structures Handling Indexes Reading Data Efficiently and Correctly Getting Transactions and Locking Right Writing Proper Procedures PostgreSQL Monitoring Fixing Backups and Replication Handling Hardware and Software Disasters A Standard Approach to Troubleshooting Index

Chapter 9. Handling Hardware and Software Disasters

In this chapter, you will be introduced to issues related to hardware and software failure. Even expensive hardware is far from perfect, and it may fail from time to time. Troubles could occur in the form of failing memory, broken hard drives, or damaged filesystems, kernel-related issues and so on. In this chapter, some of the most common troubles will be covered.

Here are some of the important topics we will cover:

  • Checksums—Preventing silent corruption

  • Zero out damaged pages

  • Dealing with index corruption

  • Dumping single pages

  • Resetting the transaction log

  • Power out related issues

Checksums – preventing silent corruption


When reading books about high availability and fault tolerance, in general, I always get the impression that most systems work on the assumption that crashes are atomic. What do I mean by that? Let's imagine there are two servers; one server has crashed and the other takes over. Unfortunately, crashes are not like this in many cases. In the real world, crashes are often far from atomic. Trouble is likely to build up gradually until things go really wrong. Consider memory corruption; when the memory goes wrong, it might not lead to an instant crash, and even when it does, the system might restart again without problems before troubles return. In many cases, a problem that accumulates silently is way more dangerous than a simple crash. The main danger in the case of silent corruption is that problems could gradually spread in the system without anybody noticing.

To prevent problems related to silent corruption caused by memory consumption, PostgreSQL...

Zeroing out damaged pages


Once in a while, things do go wrong even if all precautions have been taken. A filesystem might lose some blocks here and there or a disk might simply lose a couple of sectors. The following might happen on the PostgreSQL side in this case:

test=# SELECT count(*) FROM t_test;
ERROR:  invalid page in block 535 of relation 
    base/16384/16436

If a block (or a couple of blocks) has fallen victim to a problem in the filesystem, PostgreSQL will error out and tell the end user that the query cannot be completed anymore.

In the scenario outlined here, you can be certain of one thing: some data has been lost in the storage system. It is important to point out that loss of data is virtually never caused by PostgreSQL itself. In most cases, we are talking about broken hardware, broken filesystems, or some other memory-related problem that has spread its blessings to your data files.

If storage-related problems arise, the general rule is, "don't touch stuff; back up stuff!"...

Dealing with index corruption


Once in a while, an index may blow up. In most cases, broken indexes are caused by hardware failures. Once again, the most suspicious thing in the case of index corruption is the RAM. So, always check your system memory in case an index has blown up.

In general, the same rules apply: take a snapshot first, and then simply recreate those broken indexes. REINDEX can help in this case:

test=# \h REINDEX
Command:     REINDEX
Description: rebuild indexes
Syntax:
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } 
  name [ FORCE ]

Keep in mind that REINDEX needs a SHARE lock, which ensures that no writes can happen. In some setups, this could turn out to be a problem. To avoid locking, it makes sense to turn to CREATE INDEX CONCURRENTLY. A concurrent index build takes longer than a normal build. However, it avoids a nasty table lock that blocks writes. It can be very beneficial to many applications.

In general, an index-related problem is usually not as bad as a bad block...

Dumping individual pages


Advanced users who really want to inspect their broken systems in detail can focus their attention on pageinspect. It allows them to inspect a single data or index page on the disk. This requires a fair amount of knowledge about the inner working of PostgreSQL, but it can give valuable insights into what might have gone wrong in a system.

To make use of the module, first it has to be installed. It works like this:

test=# CREATE EXTENSION pageinspect;
CREATE EXTENSION

Once the module is in place, a table can be inspected. To do so, a couple of functions are available. The most important function is get_raw_page. It will be needed for subsequent inspections:

get_raw_page(relname text, fork text, blkno int) 
  returns bytea

The get_raw_page function returns a bytea field containing the content of the page. The first parameter tells us the relation name we want to inspect. The second parameter needs the so-called relation fork. In PostgreSQL, a table does not only consist...

Resetting the transaction log


In this section, resetting the transaction log will be covered. Before we get started, I want to personally issue a warning: don't take this lightly. Resetting the transaction log is a harsh thing to do. It almost always leads to some data loss, and it does not guarantee that your data will still be fully consistent. Resetting xlog is the last thing to consider when things go south.

The same rules as we covered before apply here: always take a snapshot of the filesystem or shut down the database, and create a binary copy of the data directory before using pg_resetxlog. Let me stress my point. In my 15-year long career as a PostgreSQL consultant, I have had to do this only a handful of times. Usually, this can be resolved in some other way.

However, if PostgreSQL does not start up anymore because the xlog is broken, pg_resetxlog can come to your rescue. Here is how the syntax works:

$ pg_resetxlog --help
pg_resetxlog resets the PostgreSQL transaction log.

Usage...

Power-out-related issues


Hardware is not always a problem. It might easily happen that you face something as simple as a power out. In general, PostgreSQL can easily survive power outs. The transaction log takes care of all failure-related issues, and with the help of xlog, it is always possible to bring the database back to a consistent state. It is safely possible to pull the plug and rely on the quality of PostgreSQL.

However, there are some nasty cases that can lead to even nastier troubles. On commit, PostgreSQL relies on the fact that data can be flushed to the disk. A system call, fsync, is used to ensure that everything relevant is forced to the disk. However, what happens if fsync cannot do its job? What if data cannot be (or data is not) flushed to the disk? If your system is up and running, everything is fine. The data will make it to the disk at some point, and life is good! However, what if a crash happens? The situation is grimmer. In short, anything can happen. If PostgreSQL...

Summary


In this chapter, the basic and most common problems were discussed. This includes hardware failure as well as problems related to broken indexes, broken xlog, and a lot more.

In the next chapter, everything you learned in this book will be summed up and presented in one extensive example.

lock icon The rest of the chapter is locked
You have been reading a chapter from
Troubleshooting PostgreSQL
Published in: Mar 2015 Publisher: Packt ISBN-13: 9781783555314
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}