Exploring the disk layout of PGDATA
In the previous sections, you have seen how to install PostgreSQL and connect to it, but we have not looked at the storage part of a cluster. Since the aim of PostgreSQL, as well as the aim of any relational database, is to permanently store data, the cluster needs some sort of permanent storage. In particular, PostgreSQL exploits the underlying filesystem to store its own data. All of the PostgreSQL-related stuff is contained in a directory known as PGDATA.
The PGDATA directory acts as the disk container that stores all the data of the cluster, including the users’ data and cluster configuration.
The following is an example of the content of PGDATA for a running PostgreSQL 16 cluster:
$ ls -1 /postgres/16/data
base
global
pg_commit_ts
pg_dynshmem
pg_hba.conf
pg_ident.conf
pg_logical
pg_multixact
pg_notify
pg_replslot
pg_serial
pg_snapshots
pg_stat
pg_stat_tmp
pg_subtrans
pg_tblspc
pg_twophase
PG_VERSION
pg_wal
pg_xact
postgresql.auto.conf
postgresql.conf
postmaster.opts
postmaster.pid
The PGDATA directory is structured in several files and subdirectories. The main files are as follows:
postgresql.confis the main configuration file, used by default when the service is started.postgresql.auto.confis the automatically included configuration file used to store dynamically changed settings via SQL instructions.pg_hba.confis the HBA file that provides the configuration regarding available database connections.PG_VERSIONis a text file that contains the major version number (useful when inspecting the directory to understand which version of the cluster has managed thePGDATAdirectory).postmaster.pidis the PID of the postmaster process, the first launched process in the cluster.
The main directories available in PGDATA are as follows:
baseis a directory that contains all the users’ data, including databases, tables, and other objects.globalis a directory containing cluster-wide objects.pg_walis the directory containing the WAL files.pg_statandpg_stat_tmpare, respectively, the storage of permanent and temporary statistical information about the status and health of the cluster.
Of course, all files and directories in PGDATA are important for the cluster to work properly, but so far, the preceding is the “core” list of objects that are fundamental in PGDATA itself. Other files and directories will be discussed in later chapters.
Objects in the PGDATA directory
PostgreSQL does not name objects on disk, such as tables, in a mnemonic or human-readable way; instead, every file is named after a numeric identifier. You can see this by having a look, for instance, at the base subdirectory:
$ ls -1 /postgres/16/data/base
1
16386
4
5
As you can see from the preceding code, the base directory contains four objects, named 1,4, 5, and 16386. Please note that these numbers could be different on your machine. In particular, each of the preceding is a directory that contains other files, as shown here:
$ ls -1 /postgres/16/data/base/16386 | head
112
113
1247
1247_fsm
1247_vm
1249
1249_fsm
1249_vm
1255
1255_fsm
As you can see, each file is named with a numeric identifier. Internally, PostgreSQL holds a specific catalog that allows the database to match a mnemonic name to a numeric identifier, and vice versa. The integer identifier is named OID (or, Object Identifier); this name is a historical term that today corresponds to the so-called filenode. The two terms will be used interchangeably in this section.
There is a specific utility that allows you to inspect a PGDATA directory and extract mnemonic names: oid2name. For example, if you executed the oid2name utility, you’d get a list of all available databases similar to the following one:
$ oid2name
All databases:
Oid Database Name Tablespace
----------------------------------
16390 forumdb pg_default
5 postgres pg_default
4 template0 pg_default
1 template1 pg_default
As you can see, the Oid numbers in the oid2name output reflect the same directory names listed in the base directory; every subdirectory has a name corresponding to the database.
You can even go further and inspect a single file going into the database directory, specifying the database where you are going to search for an object name with the -d flag:
$ cd /postgres/16/data/base/1
$ oid2name -d template1 -f 3395
From database "template1":
Filenode Table Name
-------------------------------------
3395 pg_init_privs_o_c_o_index
As you can see from the preceding example, the 3395 file in the /postgres/16/data/base/1 directory corresponds to the table named pg_init_privs_o_c_o_index. Therefore, when PostgreSQL needs to interact with a table like this, it will seek the disk to the /postgres/16/data/base/1/3395 file.
From the preceding example, it should be clear that every SQL table is stored as a file with a numeric name. However, PostgreSQL does not allow a single file to be greater than 1 GB in size, so what happens if a table grows beyond that limit? PostgreSQL “attaches” another file with a numeric extension that indicates the next chunk of 1 GB of data. In other words, if your table is stored in the 123 file, the second gigabyte will be stored in the 123.1 file, and if another gigabyte of storage is needed, another file, 123.2, will be created. Therefore, the filenode refers to the very first file related to a specific table, but more than one file can be stored on disk.
Tablespaces
PostgreSQL pretends to find all its data within the PGDATA directory, but that does not mean that your cluster is “jailed” in this directory. In fact, PostgreSQL allows “escaping” the PGDATA directory by means of tablespaces. A tablespace is a directory that can be outside the PGDATA directory and can also belong to different storage. Tablespaces are mapped into the PGDATA directory by means of symbolic links stored in the pg_tblspc subdirectory. In this way, the PostgreSQL processes do not have to look outside PGDATA, but are still able to access “external” storage. A tablespace can be used to achieve different aims, such as enlarging the storage data or providing different storage performances for specific objects. For instance, you can create a tablespace on a slow disk to contain infrequently accessed objects and tables, keeping fast storage within another tablespace for frequently accessed objects.
You don’t have to make links by yourself: PostgreSQL provides the TABLESPACE feature to manage this and the cluster will create and manage the appropriate links under the pg_tblspc subdirectory.
For instance, the following is a PGDATA directory that has three different tablespaces:
$ ls -l /postgres/16/data/pg_tblspc/
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16384 -> /data/tablespaces/ts_a
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16385 -> /data/tablespaces/ts_b
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16386 -> /data/tablespaces/ts_c
As you can see from the preceding example, there are three tablespaces that are attached to the /data storage. You can inspect them with oid2name and the -s flag:
$ oid2name -s
All tablespaces:
Oid Tablespace Name
------------------------
1663 pg_default
1664 pg_global
16384 ts_a
16385 ts_b
16386 ts_c
As you can see, the numeric identifiers of the symbolic links are mapped to the mnemonic names of the tablespaces. From the preceding example, you can observe that there are also two particular tablespaces:
pg_defaultis the default tablespace corresponding to “none,” the default storage to be used for every object when nothing is explicitly specified. In other words, every object stored directly under thePGDATAdirectory is attached to thepg_defaulttablespace.pg_globalis the tablespace used for system-wide objects.
By default, both of the preceding tablespaces refer directly to the PGDATA directory, meaning any cluster without a custom tablespace is totally contained within the PGDATA directory.