You're reading from PostgreSQL 14 Administration Cookbook

Product type: Book
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781803248974
Edition: 1st Edition

Authors (2):

Simon Riggs
Simon Riggs is the CTO of 2ndQuadrant, having contributed to PostgreSQL as a major developer and committer for 14 years. He has written and designed features for replication, performance, BI, management, and security. Under his guidance, 2ndQuadrant is now a leading developer of open source PostgreSQL, serving hundreds of clients in the USA, Europe, and worldwide. Simon is a frequent speaker at many conferences on PostgreSQL Futures. He has worked as a database architect for 30 years.

Gianni Ciolli

Gianni Ciolli is the Vice President for Solutions Architecture at EnterpriseDB (EDB). As a PostgreSQL consultant, he has driven many successful enterprise deployments for customers in every part of the globe. Gianni is respected worldwide as a popular speaker and trainer at many PostgreSQL conferences in Europe and abroad over the last 14 years. He has worked with free and open-source software since the 1990s as an active member of the community (Prato Linux User Group, and Italian PostgreSQL Users Group). Gianni has a Ph.D. in Mathematics from the University of Florence. He lives in London with his son. His other interests include music, drama, poetry, and athletics.


Chapter 9: Regular Maintenance

In these busy times, many people live by the motto if it ain't broke, don't fix it. I believe that too, but it isn't an excuse for skipping the maintenance that ensures your database servers won't break in the first place.

Database maintenance is about making your database run smoothly.

PostgreSQL prefers regular maintenance, so please read the Planning maintenance recipe for more information.

We recognize that you're here for a reason and are looking for a quick solution to your needs. You're probably thinking – Fix the problem first, and I'll plan later. So, off we go!

PostgreSQL provides a utility command named VACUUM, which is a reference to a garbage collector that sweeps up all of the bad things and fixes them – or at least most of them. That's the single most important thing you need to remember to do – I say single because closely connected...

Controlling automatic database maintenance

autovacuum is enabled by default in PostgreSQL and mostly does a great job of maintaining your PostgreSQL database. We say mostly because it doesn't know everything you do about the database, such as the best time to perform maintenance actions. Let's explore the settings that can be tuned so that you can use vacuums efficiently.

Getting ready

Exercising control requires some thinking about what you want:

  • What are the best times of day to do things? When are system resources more available?
  • Which days are quiet, and which are not?
  • Which tables are critical to the application, and which are not?

How to do it…

Perform the following steps:

  1. The first thing you must do is make sure that autovacuum is switched on, which is the default. Check that you have the following parameters enabled in your postgresql.conf file:
    autovacuum = on 
    track_counts = on 
  2. PostgreSQL...
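Besides reading postgresql.conf, you can verify these settings on a live server with a standard catalog query (this works however the parameters were set, including via ALTER SYSTEM):

```sql
-- Confirm that autovacuum and row-level statistics are enabled,
-- and where each value came from
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('autovacuum', 'track_counts');
```

If track_counts is off, autovacuum has no statistics to act on, so both must show as on.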

Avoiding auto-freezing and page corruptions

In the life cycle of a row, there are two routes that a row can take in PostgreSQL – a row version dies and needs to be removed by VACUUM, or a row version gets old enough and needs to be frozen, a task that is also performed by the VACUUM process. The removal of dead rows is easy to understand, while the second seems strange and surprising.

PostgreSQL uses internal transaction identifiers that are 4 bytes long, so we only have 2^32 transaction IDs (about four billion). PostgreSQL starts again from the beginning when that wraps around, circularly allocating new identifiers. The reason we do this is that moving to an 8-byte identifier has various other negative effects and costs that we would rather not pay for, so we keep the 4-byte transaction identifier. The impact is that we need to do regular sweeps of the entire database to mark tuples as frozen, meaning they are visible to all users – that's...
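To see which tables are closest to needing a freeze, you can inspect the age of each table's relfrozenxid; the following is a simple sketch using standard catalog columns:

```sql
-- Tables whose oldest unfrozen transaction ID is most in need of freezing;
-- compare xid_age against autovacuum_freeze_max_age (default 200 million)
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
```

When a table's age approaches autovacuum_freeze_max_age, autovacuum launches an aggressive anti-wraparound vacuum on it regardless of other settings.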

Removing issues that cause bloat

Bloat can be caused by long-running queries or long-running write transactions that execute alongside write-heavy workloads. Resolving that is mostly down to understanding the workloads that are running on the server.

Getting ready

Look at the age of the oldest snapshots that are running, like this:

postgres=# SELECT now() -
  CASE
  WHEN backend_xid IS NOT NULL
  THEN xact_start
  ELSE query_start END
  AS age
, pid
, backend_xid AS xid
, backend_xmin AS xmin
, state
FROM  pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY 1 DESC;
age             |  pid  |   xid    |   xmin   |        state       
----------------+--...
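If the query above reveals a session that is clearly abandoned, you can end it with the standard admin functions; the pid shown here is a placeholder for a value taken from the query output:

```sql
-- First try cancelling just the current query of the offending backend
SELECT pg_cancel_backend(4711);

-- If that is not enough, terminate the whole session
SELECT pg_terminate_backend(4711);
```

Cancelling is gentler, since the client connection survives; terminating ends the session and rolls back its open transaction, releasing its snapshot so that VACUUM can make progress.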

Removing old prepared transactions

You may have been routed here from other recipes, so you might not even know what prepared transactions are, let alone what an old prepared transaction looks like.

The good news is that prepared transactions don't just happen at random; they happen in certain situations. If you don't know what I'm talking about, that's OK! You don't need to, and better still, you probably don't have any prepared transactions either.

Prepared transactions are part of the two-phase commit feature, also known as 2PC. A transaction commits in two stages rather than one, allowing multiple databases to have synchronized commits. Its typical use is to combine multiple so-called resource managers using the XA protocol, which is usually provided by a Transaction Manager (TM), as used by the Java Transaction API (JTA) and others. If none of this means anything to you, then you...
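Old prepared transactions can be listed from a system view and then resolved by their global identifier; the gid below is a placeholder, not a real value:

```sql
-- List prepared transactions, oldest first
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- Resolve one by its global identifier
ROLLBACK PREPARED 'my-old-gid';
-- or, if you know it should complete:
-- COMMIT PREPARED 'my-old-gid';
```

An orphaned prepared transaction holds locks and an old snapshot indefinitely, so it blocks VACUUM from cleaning up and must be committed or rolled back explicitly.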

Actions for heavy users of temporary tables

If you are a heavy user of temporary tables in your applications, then there are some additional actions that you may need to perform.

How to do it…

There are four main things to check, which are as follows:

  • Make sure you run VACUUM on system tables or enable autovacuum so that it will do this for you.
  • Monitor running queries to see how many temporary files are active and how large they are.
  • Tune the memory parameters. Think about increasing the temp_buffers parameter, but be careful not to over-allocate memory.
  • Separate the temp table's I/O. In a query-intensive system, you may find that reads/writes to temporary files exceed reads/writes on permanent data tables and indexes. In this case, you should create new tablespace(s) on separate disks, and ensure that the temp_tablespaces parameter is configured to use the additional tablespace(s).
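The last two points can be sketched as follows; the tablespace name, path, and sizes are examples rather than recommendations, and should be adapted to your hardware:

```sql
-- A dedicated tablespace on a separate disk for temporary objects
CREATE TABLESPACE temp01 LOCATION '/mnt/fastdisk/pg_temp01';

-- Direct temporary tables and temp files to it, and give temp tables
-- more cache per session (be careful: temp_buffers is per backend)
ALTER SYSTEM SET temp_tablespaces = 'temp01';
ALTER SYSTEM SET temp_buffers = '64MB';
SELECT pg_reload_conf();
```

Both parameters can also be set per session with SET, which is useful for testing before changing the server-wide defaults.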

How...

Identifying and fixing bloated tables and indexes

PostgreSQL implements Multiversion Concurrency Control (MVCC), which allows users to read data at the same time as writers make changes. This is an important feature for concurrency in database applications as it can allow the following:

  • Better performance because of fewer locks
  • Greatly reduced deadlocking
  • Simplified application design and management

Bloated tables and indexes are a natural consequence of the MVCC design in PostgreSQL. Bloat is caused mainly by updates, as we must retain both the old and the new row versions for a certain period. Since these extra row versions are required to provide MVCC, some amount of bloat is normal and acceptable. Tuning to remove bloat completely isn't useful and is probably a waste of time.

Bloating results in increased disk consumption, as well as performance loss – if a table is twice as big as...
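A first, rough way to spot bloated tables is the dead-tuple counters in pg_stat_user_tables; treat this as an indicator only, since the statistics are approximate:

```sql
-- Tables with the most dead row versions, with a rough dead-row percentage
SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup
             / greatest(n_live_tup + n_dead_tup, 1), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A persistently high dead_pct on a busy table suggests that vacuuming is not keeping up, or that something (an old snapshot, a prepared transaction) is preventing cleanup.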

Monitoring and tuning a vacuum

This recipe covers both the VACUUM command and autovacuum, which I refer to collectively as vacuums (non-capitalized).

If you're currently waiting for a long-running vacuum (or autovacuum) to finish, go straight to the How to do it... section.

If you've just had a long-running vacuum complete, then you may want to think about setting a few parameters for next time, so read the How it works… section.

Getting ready

Let's watch what happens when we run a large VACUUM. Don't run VACUUM FULL, because it runs for a long time while holding an AccessExclusiveLock on the table. Ouch.

First, locate which process is running this VACUUM by using the pg_stat_activity view to identify the specific pid (34399 is just an example).
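A simple way to find that pid is a filter on pg_stat_activity; note that the pattern match below is a heuristic for manual VACUUM commands and will not catch autovacuum workers, whose query text begins differently:

```sql
-- Find client backends currently running a manual VACUUM
SELECT pid, query_start, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE 'vacuum%';
```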

How to do it…

Repeatedly execute the following query to see the progress of the...

Maintaining indexes

Just as tables can become bloated, so can indexes. However, reusing space in indexes is much less effective. In the Identifying and fixing bloated tables and indexes recipe, you saw that non-HOT updates can cause bloated indexes. Non-primary key indexes are also prone to some bloat from normal INSERT commands, as is common in most relational databases. Indexes can become a problem in many database applications that involve a high proportion of INSERT and DELETE commands.

autovacuum does not detect bloated indexes, nor does it do anything to rebuild indexes. Therefore, we need to look at other ways to maintain indexes.

Getting ready

PostgreSQL supports commands that will rebuild indexes for you. The client utility, reindexdb, allows you to execute the REINDEX command conveniently from the operating system:

$ reindexdb

This executes the SQL REINDEX command on every table in the default...
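On PostgreSQL 12 and later, REINDEX also accepts a CONCURRENTLY option, which rebuilds an index without taking locks that block writers; the object names below are examples only:

```sql
-- Rebuild a single bloated index without blocking concurrent writes
REINDEX INDEX CONCURRENTLY pgbench_accounts_pkey;

-- Or rebuild every index on one table
REINDEX TABLE CONCURRENTLY pgbench_accounts;
```

The concurrent form takes longer and needs extra disk space for the transient copy, but it is usually the right choice on a production system.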

Finding unused indexes

Selecting the correct set of indexes for a workload is known to be a hard problem. It usually involves trial and error by developers and DBAs to get a good mix of indexes.

Tools for identifying slow queries exist and many SELECT statements can be improved by adding an index.

What many people forget is to check whether the mix of indexes remains valuable over time, which is something for the DBA to investigate and optimize.

How to do it…

PostgreSQL keeps track of each access against an index. We can view that information and use it to see whether an index is unused, as follows:

postgres=# SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes ORDER BY idx_scan;
 schemaname |       indexrelname       | idx_scan
------------+--------------------------+----------
 public     | pgbench_accounts_bid_idx | ...
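A useful refinement (a sketch, not part of the recipe) is to list only indexes that have never been scanned and that do not enforce a constraint, since unique and primary key indexes must be kept even when no query scans them:

```sql
-- Candidate unused indexes: never scanned since the stats were last
-- reset, and not backing a unique or primary key constraint
SELECT s.schemaname, s.relname, s.indexrelname
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
  AND NOT i.indisprimary;
```

Remember that idx_scan counts since the last statistics reset, so check how long the counters have been accumulating before trusting a zero.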

Carefully removing unwanted indexes

Carefully removing? Do you mean pressing Enter gently after typing DROP INDEX? Err, no!

The reasoning is that it takes a long time to build an index and a short time to drop it.

What we want is a way of removing an index so that if we discover that removing it was a mistake, we can put the index back again quickly.

Getting ready

The following query will list all invalid indexes, if any:

SELECT ir.relname AS indexname 
, it.relname AS tablename 
, n.nspname AS schemaname 
FROM pg_index i 
JOIN pg_class ir ON ir.oid = i.indexrelid 
JOIN pg_class it ON it.oid = i.indrelid 
JOIN pg_namespace n ON n.oid = it.relnamespace 
WHERE NOT i.indisvalid; 

Take note of these indexes so that later you can tell whether a given index is invalid because we marked it as invalid during this recipe, in which case it can safely be marked as valid again, or because it was already invalid for other reasons.
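The general idea is to mark an index invalid so that the planner ignores it while PostgreSQL keeps maintaining it, which makes the removal instantly reversible. Directly updating pg_index requires superuser rights and should be tested carefully before use in production; the index name below is an example:

```sql
-- Make the planner ignore the index without dropping it
UPDATE pg_index
SET indisvalid = false
WHERE indexrelid = 'public.pgbench_accounts_bid_idx'::regclass;

-- If performance suffers, put it back immediately
UPDATE pg_index
SET indisvalid = true
WHERE indexrelid = 'public.pgbench_accounts_bid_idx'::regclass;
```

Because the index is still updated on every write, re-validating it is instantaneous, unlike rebuilding a dropped index.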

How to do it…

...

Planning maintenance

Monitoring systems are not a substitute for good planning. They alert you to unplanned situations that need attention. The more unplanned things you respond to, the greater the chance that you will need to respond to multiple emergencies at once. And when that happens, something will break. Ultimately, that is your fault. If you wish to take your responsibilities seriously, you should plan for this.

How to do it…

This recipe is all about planning, so we'll provide discussion points rather than portions of code. We'll cover the main points that should be addressed and provide a list of points as food for thought, around which the actual implementation should be built:

  • Let's break a rule: If you don't have a backup, take one now. I mean now – go on, off you go! Then, let's talk some more about planning maintenance. If you already have, well done! It's hard to keep your job as a DBA if you lose data...