You're reading from PostgreSQL 14 Administration Cookbook

Product type: Book
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781803248974
Edition: 1st Edition

Authors (2): Simon Riggs, Gianni Ciolli

Simon Riggs

Simon Riggs is the CTO of 2ndQuadrant, having contributed to PostgreSQL as a major developer and committer for 14 years. He has written and designed features for replication, performance, BI, management, and security. Under his guidance, 2ndQuadrant is now a leading developer of open source PostgreSQL, serving hundreds of clients in USA, Europe, and worldwide. Simon is a frequent speaker at many conferences on PostgreSQL Futures. He has worked as a database architect for 30 years.

Gianni Ciolli

Gianni Ciolli is the Vice President for Solutions Architecture at EnterpriseDB (EDB). As a PostgreSQL consultant, he has driven many successful enterprise deployments for customers in every part of the globe. Gianni is respected worldwide as a popular speaker and trainer at many PostgreSQL conferences in Europe and abroad over the last 14 years. He has worked with free and open-source software since the 1990s as an active member of the community (Prato Linux User Group, and Italian PostgreSQL Users Group). Gianni has a Ph.D. in Mathematics from the University of Florence. He lives in London with his son. His other interests include music, drama, poetry and athletics.
Chapter 7: Database Administration

In Chapter 5, Tables and Data, we looked at the contents of tables and various complexities. Now, we'll turn our attention to larger administration tasks that we need to perform from time to time, such as creating things, moving things around, storing things neatly, and removing them when they're no longer required.

The most sensible way to perform major administrative tasks is to write a script to do what you think is required. This allows you to run the script on a system test server, and then run it again on the production server once you're happy with it. Manically typing commands against production database servers isn't wise. Worse, using an admin tool can lead to serious issues if that tool doesn't show you the SQL you're about to execute. If you haven't dropped your first live table yet, don't worry; there is still time. Perhaps you might want to read Chapter 11, Backup and Recovery, first...

Writing a script that either succeeds entirely or fails entirely

Database administration often involves applying a coordinated set of changes to the database. One of PostgreSQL's greatest strengths is its transaction system, wherein almost all actions can be executed inside a transaction. This allows us to build a script with many actions that will either all succeed or all fail. This means that if any of these actions fail, then all the other actions in the script are rolled back and never become visible to any other user, which can be critically important in a production system. This property is referred to as atomicity in the sense that the script is intended as a single unit that cannot be split. This is the meaning of the A in the ACID properties of database transactions.

Transactions apply to Data Definition Language (DDL), which refers to the set of SQL commands that are used to define, modify...

Writing a psql script that exits on the first error

The default mode for the psql script tool is to continue processing when it finds an error. This sounds silly, but it exists for historical compatibility only. There are some easy and permanent ways to avoid this, so let's look at them.

Getting ready

Let's start with a simple script, with a command we know will fail:

$ $EDITOR test.sql
mistake1;
mistake2;
mistake3;

Execute the following script using psql to see what the results look like:

$ psql -f test.sql
psql:test.sql:1: ERROR:  syntax error at or near "mistake1"
LINE 1: mistake1;
        ^
psql:test.sql:2: ERROR:  syntax error at or near "mistake2"
LINE 1: mistake2;
        ^
psql:test.sql:3: ERROR:  syntax error at or near "mistake3"
LINE 1: mistake3;
      ...
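To make psql stop at the first error instead, set the ON_ERROR_STOP variable, either on the command line with -v or inside the script (or ~/.psqlrc) with \set. Re-running the same script then produces only the first error:

```
$ psql -f test.sql -v ON_ERROR_STOP=1
psql:test.sql:1: ERROR:  syntax error at or near "mistake1"
LINE 1: mistake1;
        ^
```

Adding the line `\set ON_ERROR_STOP on` to your ~/.psqlrc file makes this the permanent default for all your psql sessions.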

Using psql variables

In the previous recipe, you learned how to use the ON_ERROR_STOP variable. Here, we will show you how to work with any variable, including user-defined ones.

Getting ready

As an example, we will create a script that takes a table name as a parameter. We will keep it simple because we just want to show how variables work.

For instance, we might want to add a text column to a table and then set it to a given value. So, we must write the following lines in a file called vartest.sql:

ALTER TABLE mytable ADD COLUMN mycol text;
UPDATE mytable SET mycol = 'myval';

The script can be run as follows:

psql -f vartest.sql

How to do it…

We change vartest.sql as follows:

\set tabname mytable
\set colname mycol
\set colval 'myval'
ALTER TABLE :tabname ADD COLUMN :colname text;
UPDATE :tabname SET :colname = :'colval';
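If you remove the three \set lines from the script, the same variables can instead be supplied on the psql command line with the -v (or --set) option; values set this way are visible to the script exactly as if they had been set with \set (note that a \set left inside the script would override the command-line value):

```
$ psql -v tabname=mytable -v colname=mycol -v colval=myval -f vartest.sql
```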

How it works…

What do these changes mean? We have defined...

Placing query output into psql variables

It is also possible to store some values that have been produced by a query into variables – for instance, to reuse them later in other queries.

In this recipe, we will demonstrate this approach with a concrete example.

Getting ready

In the Controlling automatic database maintenance recipe of Chapter 9, Regular Maintenance, we will describe VACUUM, showing that it runs regularly on each table based on the number of rows that might need vacuuming (dead rows). The VACUUM command will run if that number exceeds a given threshold, which by default is just above 20% of the row count.

In this recipe, we will create a script that picks the table with the largest number of dead rows and runs VACUUM on it, assuming you have some tables already in existence.

How to do it…

The script is as follows:

SELECT schemaname
, relname
, n_dead_tup
, n_live_tup...
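The \gset meta-command sends the current query buffer to the server and stores each output column in a psql variable of the same name. A self-contained sketch of how the truncated script above might continue (the exact column list and the VACUUM step are an assumption, not the book's verbatim text) is:

```sql
SELECT schemaname
, relname
, n_dead_tup
, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 1
\gset
VACUUM ANALYZE :"schemaname".:"relname";
```

Note that \gset requires the query to return exactly one row, which the LIMIT 1 clause guarantees here.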

Writing a conditional psql script

psql supports the conditional \if, \elif, \else, and \endif meta-commands. In this recipe, we will demonstrate some of them.

Getting ready

We want to improve the vartest.sql script so that it runs VACUUM only if there are dead rows in the table.

How to do it…

We can add conditional commands to vartest.sql, resulting in the following script:

\set needs_vacuum false
SELECT schemaname
, relname
, n_dead_tup
, n_live_tup
, n_dead_tup > 0 AS needs_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 1
\gset
\if :needs_vacuum
\qecho Running VACUUM on table :"relname" in schema :"schemaname"
\qecho Rows before: :n_dead_tup dead, :n_live_tup live
VACUUM ANALYZE :schemaname.:relname;
\qecho Waiting 1 second...
SELECT pg_sleep(1);
SELECT n_dead_tup AS n_dead_tup_now
,      n_live_tup AS n_live_tup_now
FROM pg_stat_user_tables...

Investigating a psql error

Error messages can sometimes be cryptic, and you may be left wondering, why did this error happen at all?

For this purpose, psql recognizes two variables – VERBOSITY and CONTEXT; valid values are terse, default, or verbose for the former and never, errors, or always for the latter. A more verbose error message will hopefully specify extra details, and the context information will be included. Here is an example to show the difference:

postgres=# \set VERBOSITY terse
postgres=# \set CONTEXT never
postgres=# select * from missingtable;
ERROR:  relation "missingtable" does not exist at character 15

This is quite a simple error, so we don't need the extra details, but it is nevertheless useful for illustrating the extra detail you get when raising verbosity and enabling context information:

postgres=# \set VERBOSITY verbose
postgres...
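With VERBOSITY set to verbose, the same error additionally reports the SQLSTATE code and the source location where it was raised; the exact function name and line number depend on your PostgreSQL version, so the following is only indicative:

```
postgres=# \set VERBOSITY verbose
postgres=# select * from missingtable;
ERROR:  42P01: relation "missingtable" does not exist
LINE 1: select * from missingtable;
                      ^
LOCATION:  parserOpenTable, parse_relation.c:...
```

The leading 42P01 is the SQLSTATE error code (undefined_table), which is stable across versions and therefore useful for scripting, unlike the human-readable message.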

Setting the psql prompt with useful information

When you're connecting to multiple systems, it can be useful to configure your psql prompt so that it tells you what you are connected to.

To do this, we will edit the psql profile file so that we can execute commands when we first start psql. In the profile file, we will set values for two special variables, called PROMPT1 and PROMPT2, that control the command-line prompt.

Getting ready

Identify and edit the ~/.psqlrc file that will be executed when you start psql.

How to do it…

My psql prompt looks like this:

Figure 7.1 – The psql prompt set by ~/.psqlrc

As you can see, it has a banner that highlights my employer's company name – I have this set for when we do demos. You can skip that part, or you can create some word art, being careful with backslashes since they are escape characters:

\echo '________ _____  _______'
\echo '| ______)...
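The prompt itself is then set through PROMPT1 (and PROMPT2 for continuation lines) using psql's % escapes. As a sketch of what you might put in ~/.psqlrc, the following shows host (%M), port (%>), user (%n), database (%/), and transaction status (%x); the exact layout is a matter of taste:

```
\set PROMPT1 '%M:%> %n@%/ %R%#%x '
\set PROMPT2 '%M:%> %n@%/ %R%#%x '
```

The %# escape prints # for a superuser and > otherwise, and %x appends * while a transaction is open, so you can tell at a glance whether you are in a transaction on a production connection.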

Using pgAdmin for DBA tasks

In this recipe, we will show you how to use pgAdmin for some administration tasks in your database. pgAdmin is one of the two graphical interfaces that we introduced in the Using graphical administration tools recipe in Chapter 1, First Steps.

Getting ready

You should have already installed pgAdmin as part of the Using graphical administration tools recipe of Chapter 1, First Steps, which includes website pointers. If you haven't done so, please read it now.

Remember to install pgAdmin 4, which is the latest generation of the software; the previous one, pgAdmin 3, has not been supported for a few years, so it will give various errors on PostgreSQL 10 and above.

How to do it…

The first task of a DBA is to get access to the database and get a first glance at its contents. In that respect, we have already learned how to create a connection, access...

Scheduling jobs for regular background execution

Normal user tasks cause the user to wait while the task executes. Frequently, there is a requirement to run tasks, or "jobs", in the background without the user present; the component that handles this is referred to as a job scheduler. You can use cron, but some users look for an in-database solution.

pgAgent is our recommended job scheduler for Postgres. It is supplied as part of the pgAdmin package, but as a separate component, and can be operated from the pgAdmin GUI or using a simple command-line API. pgAgent keeps a history of job executions so that you can see what is happening and what is not happening.

Getting ready

If you want to manage a new database from an existing pgAgent installation, then you don't need to prepare anything. If you want to set up pgAgent on a new database, execute the following command:

CREATE EXTENSION pgagent;

pgAgent is an external program, not a binary plugin, so you do not need to modify...

Performing actions on many tables

As a database administrator, you will often need to apply multiple commands as part of the same overall task. This task could be one of the following:

  • Performing many different actions on multiple tables
  • Performing the same action on multiple tables
  • Performing the same action on multiple tables in parallel
  • Performing different actions, one on each table, in parallel

The first is a general case where you need to make a set of coordinated changes. The solution is to write a script, as we've already discussed. We can also call this static scripting because you write the script manually and then execute it.

The second type of task can be achieved very simply with dynamic scripts, where we write a script that writes another script. This technique is the main topic of this recipe.

Performing actions in parallel sounds cool, and it would be useful if it were easy. In some ways...

Adding/removing columns on a table

As designs change, we may want to add or remove columns from our data tables. These are common operations in development, though they need more careful planning on a running production database server as they take full locks and may run for long periods.

How to do it…

You can add a new column to a table using the following command:

ALTER TABLE mytable
ADD COLUMN last_update_timestamp TIMESTAMP WITHOUT TIME ZONE;

You can drop the same column using the following command:

ALTER TABLE mytable
DROP COLUMN last_update_timestamp;

You can combine multiple operations when using ALTER TABLE, which then applies the changes in a sequence. This allows you to perform a useful trick: drop the column with IF EXISTS and then add it back, which leaves you with a fresh, empty column whether or not it existed before (unlike ADD COLUMN IF NOT EXISTS, which would keep the existing contents):

ALTER TABLE mytable
DROP COLUMN IF EXISTS last_update_timestamp, ADD COLUMN last_update_timestamp TIMESTAMP...

Changing the data type of a column

Thankfully, changing column data types is not an everyday task, but when we need to do it, we must understand the behavior to ensure we can execute the change without any problem.

Getting ready

Let's start with a simple example of a table, with just one row, as follows:

CREATE TABLE birthday
( name       TEXT
, dob        INTEGER);
INSERT INTO birthday VALUES ('simon', 690926);
postgres=# select * from birthday;

This gives us the following output:

 name  |  dob  
-------+--------
 simon | 690926
(1 row)

How to do it…

Let's say we want to change the dob column to another data type. Let's try this with a simple example first, as follows:

postgres=# ALTER TABLE birthday
postgres-# ALTER COLUMN dob SET DATA TYPE text;
ALTER TABLE

This works fine. Let's just change...
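The reverse direction is harder: converting text back to a number or date requires telling PostgreSQL how to interpret the stored values, which is what the USING clause of ALTER COLUMN is for. As a sketch (the 'YYMMDD' format string is an assumption about how the dob values are encoded, not something taken from the book):

```sql
ALTER TABLE birthday
ALTER COLUMN dob SET DATA TYPE date
USING to_date(dob, 'YYMMDD');
-- note: two-digit years are ambiguous; check how to_date
-- resolves the century before running this on real data
```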

Changing the definition of an enum data type

PostgreSQL comes with several data types, but users can create custom types to faithfully represent any value. Data type management is mostly, but not exclusively, a developer's job, and data type design goes beyond the scope of this book. This is a quick recipe that only covers the simpler problem of the need to apply a specific change to an existing data type.

Getting ready

Enumerated data types are defined like this:

CREATE TYPE satellites_uranus AS ENUM ('titania','oberon');

The other popular case is composite data types, which are created as follows:

CREATE TYPE node AS 
( node_name text,
  connstr text,
  standbys text[]);

How to do it…

If you misspelled some enumerated values and realized it too late, you can fix it like so:

ALTER TYPE satellites_uranus RENAME VALUE 'titania' TO 'Titania'; 
ALTER TYPE satellites_uranus...
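Values can also be added to an existing enum without recreating it, optionally at a specific position in the sort order; for example (the new value here is hypothetical):

```sql
ALTER TYPE satellites_uranus ADD VALUE 'miranda' BEFORE 'oberon';
```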

Adding a constraint concurrently

A table constraint is a guarantee that must be satisfied by all of the rows in the table. Therefore, adding a constraint to a table is a two-phase procedure – first, the constraint is created, and second, the existing rows are validated. Both happen in the same transaction, and the table will be locked according to the type of constraint for the whole duration.

For example, if we add a Foreign Key to a table, we will lock the table to prevent all write transactions against it. This validation could run for an hour in some cases and prevent writes for all that time.

This recipe demonstrates another case: it is possible to split those two phases across multiple transactions, which allows validation to run with a lower lock level than the one required to add the constraint, reducing the effect of locking on the table.

First, we create the constraint and mark it as NOT VALID to make it clear that...
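The two phases then look like this (the table and column names are illustrative):

```sql
-- Phase 1: create the constraint without scanning existing rows;
-- takes a stronger lock, but only very briefly
ALTER TABLE orders
  ADD CONSTRAINT orders_customerid_fkey
  FOREIGN KEY (customerid) REFERENCES cust (customerid) NOT VALID;

-- Phase 2: validate existing rows under a weaker lock,
-- allowing concurrent writes while the scan runs
ALTER TABLE orders VALIDATE CONSTRAINT orders_customerid_fkey;
```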

Adding/removing schemas

Separating groups of objects is a good way of improving administrative efficiency. You need to know how to create new schemas and remove schemas that are no longer required.

How to do it…

To add a new schema, issue this command:

CREATE SCHEMA sharedschema;

If you want that schema to be owned by a particular user, then you can add the following option:

CREATE SCHEMA sharedschema AUTHORIZATION scarlett;

If you want to create a new schema that has the same name as an existing user so that the user becomes the owner, then try this:

CREATE SCHEMA AUTHORIZATION scarlett;

In many database systems, the schema name is the same as that of the owning user. PostgreSQL allows schemas that are owned by one user to have objects owned by another user within them. This can be especially confusing when you have a schema that has the same name as the owning user. To avoid this, you should have two types of schema: schemas that are named...
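Removing a schema is the mirror operation; DROP SCHEMA refuses to remove a non-empty schema unless you add CASCADE:

```sql
DROP SCHEMA sharedschema;            -- fails if the schema still contains objects
DROP SCHEMA sharedschema CASCADE;    -- also drops everything inside it
DROP SCHEMA IF EXISTS sharedschema;  -- avoids an error if it is already gone
```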

Moving objects between schemas

Once you've created schemas for administration purposes, you'll want to move existing objects to keep things tidy.

How to do it…

To move one table from its current schema to a new schema, use the following command:

ALTER TABLE cust
SET SCHEMA anotherschema;

If you want to move all objects, you can consider renaming the schema itself by using the following query:

ALTER SCHEMA existingschema RENAME TO anotherschema;

This only works if another schema with that name does not exist. Otherwise, you'll need to run ALTER TABLE for each table you want to move. You can follow the Performing actions on many tables recipe, earlier in this chapter, to achieve that.

Views, sequences, functions, aggregates, and domains can also be moved by ALTER commands with SET SCHEMA options.
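For example (the object names are illustrative):

```sql
ALTER VIEW     my_view      SET SCHEMA anotherschema;
ALTER SEQUENCE my_seq       SET SCHEMA anotherschema;
ALTER FUNCTION my_func(int) SET SCHEMA anotherschema;
ALTER DOMAIN   my_domain    SET SCHEMA anotherschema;
```

Note that functions must be identified by name and argument types, since PostgreSQL allows several functions with the same name.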

How it works…

When you move tables to a new schema, all the indexes, triggers, and rules that...

Adding/removing tablespaces

Tablespaces allow us to store PostgreSQL data across different devices. We may want to do that for performance or administrative ease, or our database may have run out of disk space.

Getting ready

Before we can create a useful tablespace, we need the underlying devices in a production-ready form. Think carefully about the speed, volume, and robustness of the disks you are about to use. Make sure that they have been configured correctly. Those decisions will affect your life for the next few months and years!

Disk performance is a subtle issue that most people think can be decided in a few seconds. We recommend reading Chapter 10Performance and Concurrency, of this book, as well as additional books on the same topic, to learn more.

Once you've done all of that, you can create a directory for your tablespace. The directory must be as follows:

  • Empty
  • Owned by the PostgreSQL-owning user ID
  • ...
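Once a suitable directory exists, the tablespace can be created, and later removed, with the following commands; the path shown is an example only:

```sql
CREATE TABLESPACE new_tablespace
  LOCATION '/var/lib/postgresql/tablespaces/new_tablespace';

-- a tablespace can only be dropped once it no longer contains any objects
DROP TABLESPACE new_tablespace;
```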

Moving objects between tablespaces

At some point, you may need to move data between tablespaces.

Getting ready

First, create your tablespaces. Once the old and new tablespaces exist, we can issue the commands to move the objects inside them.

How to do it…

Tablespaces can contain both permanent and temporary objects.

Permanent data objects include tables, indexes, and TOAST objects. We don't need to worry too much about TOAST objects because they are created in and always live in the same tablespace as their main table. So, if you alter the tablespace of a table, its TOAST objects will also move:

ALTER TABLE mytable SET TABLESPACE new_tablespace;

Indexes can exist in separate tablespaces, and moving a table leaves the indexes where they are. Don't forget to run ALTER INDEX commands as well, one for each index, as follows:

ALTER INDEX mytable_val_idx SET TABLESPACE new_tablespace;

Temporary objects cannot be explicitly moved...
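If you need to move every table in a tablespace at once, there is a bulk form of the command; temporary objects, meanwhile, are directed to a tablespace via the temp_tablespaces parameter rather than being moved explicitly (the database name below is illustrative):

```sql
-- move all tables (that you own) from one tablespace to another
ALTER TABLE ALL IN TABLESPACE old_tablespace SET TABLESPACE new_tablespace;

-- send temporary tables and sort files to a given tablespace
ALTER DATABASE mydb SET temp_tablespaces = 'new_tablespace';
```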

Accessing objects in other PostgreSQL databases

Sometimes, you may want to access data in other PostgreSQL databases. The reasons for this may be as follows:

  • You have more than one database server, and you need to extract data (such as a reference) from one server and load it into the other.
  • You want to access data that is in a different database on the same database server, which was split for administrative purposes.
  • You want to make some changes that you do not wish to roll back in the event of an error or transaction abort. These are known as function side effects or autonomous transactions.

You may also be considering this because you are exploring the scale-out, sharding, or load balancing approaches. If so, read the last part of this recipe (the See also section) and then skip to Chapter 12, Replication and Upgrades.

Note

PostgreSQL includes two separate mechanisms for accessing external PostgreSQL...
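One of those mechanisms, the Foreign Data Wrapper infrastructure via postgres_fdw, can be sketched as follows; the server name, connection options, and credentials are all placeholders:

```sql
CREATE EXTENSION postgres_fdw;

CREATE SERVER otherdb
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', port '5432', dbname 'otherdb');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER otherdb
  OPTIONS (user 'scarlett', password 'secret');

-- make the remote tables visible as foreign tables in a local schema
CREATE SCHEMA remote;
IMPORT FOREIGN SCHEMA public FROM SERVER otherdb INTO remote;
```

After this, `SELECT * FROM remote.sometable` reads from the other database as if the table were local.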

Accessing objects in other foreign databases

In the previous recipe, you learned how to use objects from a different PostgreSQL database, either with dblink or by using the Foreign Data Wrapper infrastructure. Here, we will explore another variant of the latter – using Foreign Data Wrappers to access databases other than PostgreSQL.

There are many Foreign Data Wrappers for other database systems, all of which are maintained as extensions independently from the PostgreSQL project. The PostgreSQL Extension Network (PGXN), which we mentioned in Chapter 3, Server Configuration, is a good place to see which extensions are available.

Just note this so that you don't get confused: while you can find Foreign Data Wrappers to access several database systems, there are also other wrappers for different types of data sources, such as text files, web services, and so on. There is even postgres_fdw, a backport of the contrib...

Making views updatable

PostgreSQL supports the SQL standard CREATE VIEW command, which supports automatic UPDATE, INSERT, and DELETE commands, provided they are simple enough.

Note that certain types of updates are forbidden because it is either impossible or impractical to derive the corresponding list of modifications on the constituent tables. We'll discuss those issues here.

Getting ready

First, you need to consider that only simple views can be made to receive insertions, updates, and deletions easily. The SQL standard differentiates between views that are simple and updatable, and more complex views that cannot be expected to be updatable.

So, before we proceed, we need to understand what a simple updatable view is and what it is not. Let's start with the cust table:

postgres=# SELECT * FROM cust;
 customerid | firstname | lastname | age 
------------+-----------+----------+-----
 ...

Using materialized views

Every time we select rows from a view, we select from the result of the underlying query. If that query is slow and we need to use it more than once, then it makes sense to run the query once, save its output as a table, and then select the rows from the latter.

This procedure has been available for a long time, and there is a dedicated syntax for it, called CREATE MATERIALIZED VIEW, that we will describe in this recipe.

Getting ready

Let's create two randomly populated tables, of which one is large:

CREATE TABLE dish
( dish_id SERIAL PRIMARY KEY
, dish_description text
);
CREATE TABLE eater
( eater_id SERIAL
, eating_date date
, dish_id int REFERENCES dish (dish_id)
);
INSERT INTO dish (dish_description)
VALUES ('Lentils'), ('Mango'), ('Plantain'), ('Rice'), ('Tea');
INSERT INTO eater(eating_date, dish_id)
SELECT floor(abs(sin(n)) * 365) :: int + date '2014-01...
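With those tables in place, a materialized view over the slow aggregation query can be created and refreshed like this (the query itself is an illustrative sketch, not necessarily the book's exact example):

```sql
CREATE MATERIALIZED VIEW meals_by_dish AS
  SELECT d.dish_description, count(*) AS meals
  FROM eater e
  JOIN dish d USING (dish_id)
  GROUP BY d.dish_description;

-- re-run the underlying query and replace the stored contents
REFRESH MATERIALIZED VIEW meals_by_dish;
```

Unlike an ordinary view, the materialized view's contents do not track the base tables automatically; they are only updated when you run REFRESH.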

Using GENERATED data columns

You are probably used to the idea that a column can have a default value that's been set by a function; this is how we use sequences to set column values in tables. The SQL Standard provides a new syntax for this, which is referred to as GENERATED … AS IDENTITY. PostgreSQL supports this, but we won't discuss that here.

We can also use views to dynamically calculate new columns as if the data had been stored. PostgreSQL 12+ allows the user to specify that columns can be generated and stored in the table automatically, which is easier and faster than writing a trigger to do this. This is a very important performance and usability feature since we can store data that may take significant time to calculate, so this is much better than just using views. We refer to this feature as GENERATED ALWAYS, which also follows the SQL Standard syntax.

How to do it…

Let's start with an example table:

CREATE TABLE example
( id ...
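A self-contained sketch of a stored generated column (not the book's exact table definition) looks like this; the generated column cannot be written directly and is recomputed whenever the row is inserted or updated:

```sql
CREATE TABLE example
( id         int GENERATED ALWAYS AS IDENTITY
, netprice   numeric NOT NULL
, taxrate    numeric NOT NULL DEFAULT 0.2
, grossprice numeric GENERATED ALWAYS AS (netprice * (1 + taxrate)) STORED
);

INSERT INTO example (netprice) VALUES (100);
-- grossprice is computed and stored automatically
```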

Using data compression

As data volumes expand, we often think about whether there are ways to compress data to save space. There are many patents awarded in data compression, so the development of open source solutions has been slower than normal. PostgreSQL 14 contains some exciting innovations.

Getting ready

Make sure you're running Postgres 14+.

Various types of data compression are available for PostgreSQL:

  • Automatic compression of long data values (TOAST)
  • Extensions that offer compressed data types (for example, for JSON)
  • Compression of WAL files
  • Dump file compression
  • Base backup compression
  • SSL compression (this is considered insecure, so it's only used on private networks)
  • GiST and SP-GiST index compression
  • Btree index compression (also known as deduplication)

Only the first three types of compression will be discussed here, but we focus mainly on the parameters that allow us to control how long data values are automatically...
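For the first of these, PostgreSQL 14 adds a choice of TOAST compression algorithm: the traditional pglz or the newer, typically faster lz4 (provided the server was built with lz4 support). The table and column names below are illustrative:

```sql
-- server-wide default algorithm for new columns: 'pglz' or 'lz4'
SHOW default_toast_compression;

-- choose the algorithm per column (affects newly stored values only)
ALTER TABLE mytable
  ALTER COLUMN mycol SET COMPRESSION lz4;
```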
