Troubleshooting PostgreSQL: Intercept problems and challenges typically faced by PostgreSQL database administrators with the best troubleshooting techniques

Troubleshooting PostgreSQL

Chapter 1. Installing PostgreSQL

In this chapter, we will cover what can go wrong during the installation process and what can be done to avoid those things from happening. At the end of the chapter, you should be able to avoid all of the pitfalls, traps, and dangers you might face during the setup process.

For this chapter, I have compiled some of the core problems that I have seen over the years, as follows:

Deciding on a version during installation
Memory and kernel issues
Preventing problems by adding checksums to your database instance
Wrong encodings and subsequent import errors
Polluted template databases
Killing the postmaster badly

At the end of the chapter, you should be able to install PostgreSQL and protect yourself against the most common issues popping up immediately after installation.

Deciding on a version number

The first thing to work on when installing PostgreSQL is to decide on the version number. In general, a PostgreSQL version number consists of three numbers separated by dots. Here are some examples:

9.4.0, 9.4.1, or 9.4.2
9.3.4, 9.3.5, or 9.3.6

The last digit is the so-called minor release. When a new minor release is issued, it generally means that some bugs have been fixed (for example, some time zone changes, crashes, and so on). A minor release never adds features, removes functions, or makes changes of that sort. The same applies to something truly important: the storage format. It won't change with a new minor release.

These little facts have a wide range of consequences. As the binary format and the functionality are unchanged, you can simply upgrade your binaries, restart PostgreSQL, and enjoy your improved minor release.

When the digit in the middle changes, things get a bit more complex. A changing middle digit is called a major release. It usually happens around once a year and provides you with significant new functionality. If this happens, we cannot just stop or start the database anymore to replace the binaries. In this case, we face a real migration process, which will be discussed later on in this book.

If the first digit changes, something really important has happened. Examples of such important events were introductions of SQL (6.0), the Windows port (8.0), streaming replication (9.0), and so on. Technically, there is no difference between a change in the first digit and a change in the second: they mean the same thing to the end user, and in both cases a migration process is needed.
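The version rules described above can be sketched as a small shell helper. This is only an illustration under the three-part numbering scheme used in this chapter; upgrade_kind is a hypothetical name and not part of PostgreSQL.

```shell
# upgrade_kind OLD NEW
# Hypothetical helper: classifies an upgrade between two three-part
# version numbers such as 9.4.1, following the rules in this section.
upgrade_kind() {
    old_branch=${1%.*}   # strip the last component: 9.4.1 -> 9.4
    new_branch=${2%.*}
    if [ "$old_branch" = "$new_branch" ]; then
        echo "minor"     # same branch: replace the binaries and restart
    else
        echo "major"     # first or middle digit changed: migration needed
    fi
}

upgrade_kind 9.4.0 9.4.2
upgrade_kind 9.3.6 9.4.1
```

The first call reports a minor upgrade and the second a major one, because only the second call crosses from the 9.3 branch to the 9.4 branch.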

The question that now arises is this: if you have a choice, which version of PostgreSQL should you use? Well, in general, it is a good idea to take the latest stable release. In PostgreSQL, every version number following the design patterns I just outlined is a stable release.

Tip

As of PostgreSQL 9.4, the PostgreSQL community provides fixes for versions as old as PostgreSQL 9.0. So, if you are running an older version of PostgreSQL, you can still enjoy bug fixes and so on.

Methods of installing PostgreSQL

Before digging into troubleshooting itself, the installation process will be outlined. The following choices are available:

Installing binary packages
Installing from source

Installing from source is not too hard to do. However, this chapter will focus on installing binary packages only. Nowadays, most people (not including me) like to install PostgreSQL from binary packages because it is easier and faster.

Basically, two types of binary packages are common these days: RPM (Red Hat-based) and DEB (Debian-based). In this chapter, installing these two types of packages will be discussed.

Installing RPM packages

Most Linux distributions include PostgreSQL. However, the shipped PostgreSQL version is somewhat ancient in many cases. Recently, I saw a Linux distribution that still featured PostgreSQL 8.4, a version already abandoned by the PostgreSQL community. Distributors tend to ship older versions to ensure that new bugs are not introduced into their systems. For high-performance production servers, outdated versions might not be the best idea, however.

Clearly, for many people, it is not feasible to run long-outdated versions of PostgreSQL. Therefore, it makes sense to make use of repositories provided by the community. The list of distributions for which RPM repositories are available can be found at http://yum.postgresql.org/repopackages.php.

Once you have found your distribution, the first step is to install the repository information. For Fedora 20, this is done as shown in the next listing:

yum install http://yum.postgresql.org/9.4/fedora/fedora-20-x86_64/pgdg-fedora94-9.4-1.noarch.rpm

Once the repository has been added, we can install PostgreSQL:

yum install postgresql94-server postgresql94-contrib
/usr/pgsql-9.4/bin/postgresql94-setup initdb
systemctl enable postgresql-9.4.service
systemctl start postgresql-9.4.service

First of all, PostgreSQL 9.4 is installed. Then a so-called database instance is created (initdb). Next, the service is enabled to make sure that it is always there after a reboot, and finally, the postgresql-9.4 service is started.

Tip

The term database instance is an important concept. It basically describes an entire PostgreSQL environment (setup). A database instance is fired up when PostgreSQL is started. Databases are part of a database instance.

Installing Debian packages

Installing Debian packages is also not too hard. By the way, the process on Ubuntu as well as on some other similar distributions is the same as that on Debian, so you can directly use the knowledge gained from this section for other distributions.

Create a simple file called /etc/apt/sources.list.d/pgdg.list and add a line for the PostgreSQL repository (all the following steps can be done as the root user or using sudo):

deb http://apt.postgresql.org/pub/repos/apt/ YOUR_DEBIAN_VERSION_HERE-pgdg main

So, in the case of Debian Wheezy, the following line would be useful:

deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main
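As a rough sketch, the repository line can also be generated from the release codename. The pgdg_source_line function is a hypothetical helper, and the commented-out usage assumes that the lsb_release tool is installed on your system.

```shell
# pgdg_source_line CODENAME
# Hypothetical helper: prints the repository line for a given
# Debian or Ubuntu release codename.
pgdg_source_line() {
    codename=$1
    echo "deb http://apt.postgresql.org/pub/repos/apt/ ${codename}-pgdg main"
}

pgdg_source_line wheezy

# On a real system (as root, and assuming lsb_release is available):
# pgdg_source_line "$(lsb_release -cs)" > /etc/apt/sources.list.d/pgdg.list
```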

Once we have added the repository, we can import the signing key:

# wget --quiet -O - \
  https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
OK

Voilà! Things are mostly done. In the next step, the repository information can be updated:

apt-get update

Once this has been done successfully, it is time to install PostgreSQL:

apt-get install "postgresql-9.4"

If no error is issued by the operating system, it means you have successfully installed PostgreSQL. The beauty here is that PostgreSQL will fire up automatically after a restart. A simple database instance has also been created for you.

If everything has worked as expected, you can give it a try and log in to the database:

root@chantal:~# su - postgres
$ psql postgres
psql (9.4.1)
Type "help" for help.
postgres=#

Memory and kernel issues

After this brief introduction to installing PostgreSQL, it is time to focus on some of the most common problems.

Fixing memory issues

Some of the most important issues are related to the kernel and memory. Up to version 9.2, PostgreSQL was using the classical System V shared memory to cache data, store locks, and so on. Since PostgreSQL 9.3, most of this memory is allocated as mapped memory instead, which solves most of the issues people had been facing during installation.

However, in PostgreSQL 9.2 or before, you might have faced the following error message:

FATAL: Could not create shared memory segment
DETAIL: Failed system call was shmget (key=5432001, size=1122263040, 03600)
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 1122263040 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.

Tip

If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.

The PostgreSQL documentation contains more information about shared memory configuration.

If you are facing a message like this, it means that the kernel does not provide you with enough shared memory to satisfy your needs. Where does this need for shared memory come from? Back in the old days, PostgreSQL stored a lot of stuff in shared memory, such as the I/O cache (shared_buffers), locks, autovacuum-related information, and a lot more. Traditionally, most Linux distributions have had a tight grip on the memory, and they don't issue large shared memory segments; for example, Red Hat has long limited the maximum amount of shared memory available to applications to 32 MB. In most cases, this is not enough to run PostgreSQL in a useful way, especially not if performance does matter (and it usually does).

To fix this problem, you have to adjust kernel parameters. The Managing Kernel Resources section of the PostgreSQL Administrator's Guide will tell you exactly why we have to adjust kernel parameters.

For more information, check out the PostgreSQL documentation at http://www.postgresql.org/docs/9.4/static/kernel-resources.html.

That section describes all the kernel parameters that are relevant to PostgreSQL. Note that every operating system needs slightly different values here (for open files, semaphores, and so on).

Since not all operating systems can be covered in this little book, only Linux and Mac OS X will be discussed here in detail.

Adjusting kernel parameters for Linux

In this section, parameters relevant to Linux will be covered. If shmget (previously mentioned) fails, two parameters must be changed:

$ sysctl -w kernel.shmmax=17179869184
$ sysctl -w kernel.shmall=4194304

In this example, shmmax and shmall have both been set to 16 GB. Note that shmmax is given in bytes, while shmall is given in 4 kB pages (17179869184 / 4096 = 4194304). The kernel will now provide you with a great deal of shared memory.
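The relation between the two values can be sketched in a few lines of shell arithmetic. The shm_settings function is a hypothetical helper; the page size of 4096 bytes is an assumption that matches the example above and can be checked with getconf PAGE_SIZE.

```shell
# shm_settings GIGABYTES [PAGE_SIZE]
# Hypothetical helper: prints matching shmmax and shmall settings
# for a shared memory limit given in gigabytes.
shm_settings() {
    gigabytes=$1
    page_size=${2:-4096}                           # assumed page size in bytes
    shmmax=$((gigabytes * 1024 * 1024 * 1024))     # limit in bytes
    shmall=$((shmmax / page_size))                 # limit in pages
    echo "kernel.shmmax=$shmmax"
    echo "kernel.shmall=$shmall"
}

shm_settings 16
```

For 16 GB, this prints the same two values that were passed to sysctl -w in the listing above.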

There is more: to handle concurrency, PostgreSQL needs something called semaphores. These semaphores are also provided by the operating system. The following kernel variables are available:

SEMMNI: This is the maximum number of semaphore identifiers. It should be at least ceil((max_connections + autovacuum_max_workers + 4) / 16).
SEMMNS: This is the maximum number of system-wide semaphores. It should be at least ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17, and it should have room for other applications in addition to this.
SEMMSL: This is the maximum number of semaphores per set. It should be at least 17.
SEMMAP: This is the number of entries in the semaphore map.
SEMVMX: This is the maximum value of the semaphore. It should be at least 1000.
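The formulas in this list can be evaluated with integer arithmetic in the shell. The following is a minimal sketch; sem_requirements is a hypothetical helper, and the values 100 and 3 are only example settings for max_connections and autovacuum_max_workers.

```shell
# sem_requirements MAX_CONNECTIONS AUTOVACUUM_MAX_WORKERS
# Hypothetical helper: prints the minimum semaphore settings
# according to the formulas listed above.
sem_requirements() {
    max_connections=$1
    autovacuum_max_workers=$2
    procs=$((max_connections + autovacuum_max_workers + 4))
    sets=$(( (procs + 15) / 16 ))      # integer form of ceil(procs / 16)
    echo "SEMMNI>=$sets SEMMNS>=$((sets * 17)) SEMMSL>=17"
}

sem_requirements 100 3
```

With these example values, 107 processes need 7 semaphore sets, and therefore at least 7 * 17 = 119 semaphores system-wide. Remember that SEMMNS also needs room for other applications.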

Don't change these variables unless you really have to. Changes can be made with sysctl, as was shown for the shared memory.

Adjusting kernel parameters for Mac OS X

If you happen to run Mac OS X and plan to run a large system, there are also some kernel parameters that need changes. In this case, /etc/sysctl.conf has to be changed. Here is an example:

kern.sysv.shmmax=4194304
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=1024

Mac OS X is somewhat nasty to configure. The reason is that you have to set all five parameters to make this work. Otherwise, your changes will be silently ignored, and this can be really painful.

In addition to that, it has to be assured that SHMMAX is an exact multiple of 4096. If it is not, trouble is near.
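A quick way to check this condition before editing the configuration is a modulo test. This is only a sketch; is_page_multiple is a hypothetical helper name.

```shell
# is_page_multiple VALUE
# Hypothetical helper: succeeds if VALUE is an exact multiple of 4096.
is_page_multiple() {
    [ $(( $1 % 4096 )) -eq 0 ]
}

if is_page_multiple 4194304; then
    echo "4194304 is a valid SHMMAX value"
fi
```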

If you want to change these parameters on the fly, recent versions of OS X provide a sysctl command just like Linux. Here is how it works, using the values from the previous example:

sysctl -w kern.sysv.shmmax=4194304
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=32
sysctl -w kern.sysv.shmseg=8
sysctl -w kern.sysv.shmall=1024

Fixing other kernel-related limitations

If you are planning to run a large-scale system, it can also be beneficial to raise the maximum number of open files allowed. To do that, /etc/security/limits.conf can be adapted, as shown in the next example:

postgres    hard    nofile    1024
postgres    soft    nofile    1024

This example says that the postgres user can have up to 1,024 open files per session.
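To see which limits are actually in effect, the shell's built-in ulimit command can be used. The following sketch would typically be run in a session of the postgres user; nofile_at_least is a hypothetical helper.

```shell
# nofile_at_least NUMBER
# Hypothetical helper: succeeds if the soft limit on open files of the
# current shell is at least NUMBER (or unlimited).
nofile_at_least() {
    current=$(ulimit -n)
    [ "$current" = "unlimited" ] || [ "$current" -ge "$1" ]
}

# Show the soft and hard limits of the current session.
echo "soft limit: $(ulimit -n), hard limit: $(ulimit -Hn)"

if nofile_at_least 1024; then
    echo "at least 1024 open files are allowed"
fi
```

Keep in mind that changes to limits.conf only apply to sessions started after the change.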

Note that this is only important for large systems; the default limit on open files won't hurt an average setup.

Avoiding template pollution

It is somewhat important to understand what happens during the creation of a new database in your system. The most important point is that CREATE DATABASE (unless told otherwise) clones the template1 database, which is available in all PostgreSQL setups.

This cloning has some important implications. If you have loaded a very large amount of data into template1, all of that will be copied every time you create a new database. In many cases, this is not really desirable but happens by mistake. People new to PostgreSQL sometimes put data into template1 because they don't know where else to place new tables and so on. The consequences can be disastrous.

However, you can also use this common pitfall to your advantage. You can place the functions you want in all your databases in template1 (for monitoring, for example).
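If template1 has already been polluted, CREATE DATABASE can be told to clone the pristine template0 database instead. The following sketch only prints the statement; clean_createdb_sql is a hypothetical helper, reporting is a made-up database name, and the commented-out line assumes that you can connect with psql as shown earlier in this chapter.

```shell
# clean_createdb_sql DBNAME
# Hypothetical helper: prints a CREATE DATABASE statement that clones
# template0 instead of the default template1.
clean_createdb_sql() {
    printf 'CREATE DATABASE "%s" TEMPLATE template0;\n' "$1"
}

clean_createdb_sql reporting

# To run the statement against a server:
# clean_createdb_sql reporting | psql postgres
```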

Killing the postmaster

After PostgreSQL has been installed and started, many people wonder how to stop it. The simplest way is, of course, to use your init scripts: service postgresql stop or /etc/init.d/postgresql stop.

However, some administrators tend to be a bit crueler and use kill -9 to terminate PostgreSQL. In general, this is not really beneficial because it will cause some nasty side effects. Why is this so?

The PostgreSQL architecture works like this: when you start PostgreSQL you are starting a process called postmaster. Whenever a new connection comes in, this postmaster forks and creates a so-called backend process (BE). This process is in charge of handling exactly one connection. In a working system, you might see hundreds of processes serving hundreds of users. The important thing here is that all of those processes are synchronized through some common chunk of memory (traditionally, shared memory, and in the more recent versions, mapped memory), and all of them have access to this chunk. What might happen if a database connection or any other process in the PostgreSQL infrastructure is killed with kill -9? A process modifying this common chunk of memory might die while making a change. The process killed cannot defend itself against the onslaught, so who can guarantee that the shared memory is not corrupted due to the interruption?

This is exactly when the postmaster steps in. It detects that one of these backend processes has died unexpectedly. To prevent the potential corruption from spreading, it kills every other database connection, goes into recovery mode, and fixes the database instance. Then new database connections are allowed again.

While this makes a lot of sense, it can be quite disturbing to those users who are connected to the database system. Therefore, it is highly recommended not to use kill -9. A normal kill will be fine.

Keep in mind that a kill -9 cannot corrupt your database instance, which will always start up again. However, it is pretty nasty to kick everybody out of the system just because of one process!
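For reference, the postmaster treats three signals as shutdown requests: SIGTERM (smart), SIGINT (fast), and SIGQUIT (immediate). The same modes are available through pg_ctl stop -m. The following sketch maps a mode name to its signal; shutdown_signal is a hypothetical helper, and SIGKILL (kill -9) is deliberately not part of it.

```shell
# shutdown_signal MODE
# Hypothetical helper: prints the signal that requests the given
# shutdown mode from the postmaster.
shutdown_signal() {
    case $1 in
        smart)     echo TERM ;;   # wait for sessions to end (a normal kill)
        fast)      echo INT ;;    # disconnect sessions, then shut down cleanly
        immediate) echo QUIT ;;   # abort at once; recovery runs on next start
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

shutdown_signal smart
shutdown_signal fast
```

A normal kill sends SIGTERM, which is why it is the safe choice recommended above.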

What you will learn

Detect bottlenecks caused by missing indexes

Optimize your data structures for optimal memory footprint

Write better, performance-optimized stored procedures

Monitor PostgreSQL in an efficient way and deal with system corruption and filesystem issues

Detect replication-related problems and make replication more fail-safe

Fix missing indexes and problems arising out of transaction locking

Find slow queries and optimize your system for speed

Charles Feduke Aug 20, 2015

Phenomenal book, gets right to the point and reads quickly and conversationally. We had some issues with performance with ingesting large data sets and I was able to read this book in an afternoon and figure out where we needed to go in order to speed things up. There's a ton of just really great info in this book and pretty much no beginner info, other than clarifications where the official documentation might be ambiguous. The only missing thing in the indexing chapter is a discussion about full text indexing (which does a really great job of performing better than the pg_trgm extension suggested in that chapter for our particular use case - wildcard matching with unanchored LIKE predicates).

Charles C. Jensen May 27, 2016

Excellent guide on troubleshooting PostgreSQL--offers a few good tips not covered as thoroughly in my other PostgreSQL books. Very handy reference with examples and illustrations.

Lisa P. Jul 19, 2017

A book that offers really interesting ideas for configuring Postgres and for structuring and managing your databases in the best way. Also useful alongside the official documentation.

tony Jul 12, 2015

I felt like I learned quite a bit by reading this book. As someone who works on an ops team that uses PostgreSQL as part of our stack, there were a few things that I found fairly helpful such as the queries related to finding out how Postgres determines the best method of execution and how the order of a query is executed. The query order outline was very helpful. Additionally, this book highlights some useful functionality that is already contained in PostgreSQL, such as the distance operator for geometric values, that save time by including it as a query instead of having to write a specific function.

In terms of its organization, it seems to touch on all of the most common pain points such as indices, normalization (although it doesn't touch too much on the concept except to explain how this can get out of hand), replication, joins, etc. I'll continue to use this book as a reference, especially when going through the PostgreSQL documentation to research a problem finds daunting (although I'm usually able to find what I'm looking for with a simple search).

I would note that this book is, like PostgreSQL in general, much more for Linux, a little less for OSX and a lot less for Windows. Some sections, such as modifying the OS parameters (primarily kernels) for optimization require some background knowledge as the different kernel parameters are not explained so much as they are presented as "change your kernel parameter to this". Other concepts, such as checksums, are recommended but not necessarily explained very much (although a Google search or browse on Wikipedia should clear that up if you're not familiar).

I don't think this book is so much for someone who's working with PostgreSQL for the very first time, but intermediate users should not have any issues following along and picking up tips to optimize their performance. The explanations on data type comparisons were pretty good as well.

Luca Ferrari Dec 05, 2017

I work with PostgreSQL since version 7.3, I don't claim to be an expert, but I believe I know enough concepts about such system. The book is interesting, but it just points out to possible problems the author had faced during his life. I was expecting something more complex and deeper, like an index checklist or hardware sizing.
