Getting to Know Your Cluster
To be a proficient user and administrator of a PostgreSQL cluster, you first must know and understand how PostgreSQL works. A database system is a very complex beast, and PostgreSQL, being an enterprise-level Database Management System (DBMS), is in no way a simple software system. However, thanks to very good design and implementation, once you understand the basic concepts and terminology of PostgreSQL, things will quickly become comprehensible and clear.
This chapter will continue from the foundation of the previous chapter and introduce you to some other PostgreSQL terminology and concepts, as well as teaching you how to interact with the cluster. You will also be introduced to the psql client, which ships with PostgreSQL and is the recommended way to connect to your database. You are free to use any SQL client that can connect to PostgreSQL, and all the code and examples shown in this chapter will run out of the box in any other client as well, but we recommend that you take some time to learn psql. Shipped with PostgreSQL, psql is guaranteed to work in any situation and is the default way to connect to a cluster. psql is a text-only client; if you are more comfortable using a graphical client, you can have a look at pgAdmin4, one of the most famous PostgreSQL graphical clients.
This chapter covers the following main topics:
- Managing your cluster
- Connecting to the cluster
- Exploring the disk layout of PGDATA
- Exploring configuration files and parameters
The knowledge required in this chapter is as follows:
- How to install binary packages on your Unix machine
- PostgreSQL basic terminology (from the previous chapter)
- Basic Unix command-line usage
- Basic SQL statements covered in this chapter
The chapter examples can be run on the standalone Docker image, which you can find in the book’s GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition. For installation and usage of the Docker images available for this book, please refer to the instructions in Chapter 1, Introduction to PostgreSQL.
Managing your cluster
Managing a cluster means being able to start, stop, take control, and get information about the status of a PostgreSQL instance.
From an operating system point of view, PostgreSQL is a service that can be started, stopped, and, of course, monitored. As you saw in the previous chapter, usually when you install PostgreSQL, you also get a set of operating system-specific tools and scripts to integrate PostgreSQL with your operating system service management. Usually, you will find system service files or other operating system-specific tools, like pg_ctlcluster, which is shipped with Debian GNU/Linux and its derivatives.
PostgreSQL ships with a specific tool called pg_ctl, which helps in managing the cluster and the related running processes. This section introduces you to the basic usage of pg_ctl and to the processes that you can encounter in a running cluster. Whichever service management system your operating system uses, pg_ctl will always be available to the PostgreSQL administrator in order to take control of a database instance.
The pg_ctl command-line utility allows you to perform different actions on a cluster, mainly initialize, start, restart, stop, and so on. pg_ctl accepts the command to execute as the first argument, followed by other specific arguments. The main commands are as follows:
- start, stop, and restart execute the corresponding actions on the cluster.
- status reports the current status (running or not) of the cluster.
- initdb (or init for short) executes the initialization of the cluster, possibly removing any previously existing data.
- reload causes the PostgreSQL server to reload the configuration, which is useful when you want to apply configuration changes.
- promote is used when the cluster is running as a replica server (namely, a standby node) and, from now on, must be detached from the original primary, becoming independent (replication will be explained in later chapters).
pg_ctl interacts mainly with the postmaster (the first process launched within a cluster), which in turn “redirects” commands to other existing processes. For instance, when pg_ctl starts a server instance, it makes the postmaster process run, which in turn completes all the startup activities, including launching other utility processes (as briefly explained in the previous chapter). On the other hand, when pg_ctl stops a cluster, it issues a halt command to the postmaster, which in turn requires other active processes to exit, waiting for them to finish.
The postmaster process is just the very first PostgreSQL-related process launched within the instance; on some systems, there is a process named “postmaster,” while on other operating systems, there are only processes named “postgres.” The first process ever launched, whatever its actual name, is referred to as the postmaster. The name postmaster is just that: a name used to identify a process among the others (in particular, the first process launched within the cluster).
pg_ctl needs to know where PGDATA is located, and this can be specified either by setting an environment variable named PGDATA or by passing it on the command line by means of the -D flag.
Interacting with the cluster status (for example, stopping it) is an action that not every user should be able to perform; usually, only an operating system administrator should be able to interact with services, including PostgreSQL.
PostgreSQL, in order to mitigate the side effects of privilege escalation, does not allow a cluster to be run by privileged users, such as root. Therefore, PostgreSQL is run by a “normal” user, usually named postgres on all operating systems. This unprivileged user owns the PGDATA directory and runs the postmaster process and, therefore, all the processes launched by the postmaster itself.
pg_ctl must be run by the same unprivileged operating system user that is going to run the cluster.
If you are using the Docker image, PostgreSQL is already running as the main service, without any need for manual intervention. This means that issuing a stop or a restart command will force you out of the container, because the container itself shuts down.
$ pg_ctl status
pg_ctl: server is running (PID: 1)
/usr/lib/postgresql/16/bin/postgres
The command reports back that the server is running, with a Process Identifier (PID) equal to one (this number will be different on your machine). Moreover, the command reports the executable file used to launch the server: in the above example, /usr/lib/postgresql/16/bin/postgres.
If the server is not running for any reason, the pg_ctl command will report an appropriate message to indicate that it is unable to find a running instance of PostgreSQL:
$ pg_ctl status
pg_ctl: no server running
In order to report the status of the cluster, pg_ctl needs to know where the database is storing its own data, that is, where PGDATA is on disk. There are two ways to make pg_ctl aware of where PGDATA is:
- Setting an environment variable named PGDATA, containing the path of the data directory
- Using the -D command-line flag to specify the path to the data directory
Almost every PostgreSQL cluster-related command searches for the value of PGDATA as an environment variable or as a -D command-line option.
In the previous examples, no PGDATA has been specified, because it was assumed that the value of PGDATA was provided by an environment variable.
It is quite easy to verify this, for example, in the Docker container:
$ echo $PGDATA
/postgres/16/data
$ pg_ctl status
pg_ctl: server is running (PID: 1)
/usr/lib/postgresql/16/bin/postgres
$ export PGDATA=/postgres/16/data
$ pg_ctl status
pg_ctl: server is running (PID: 1)
The command-line argument, specified with -D, always takes precedence over any PGDATA environment variable, so if you don’t set or misconfigure the PGDATA variable but, instead, pass the right value on the command line, everything works fine:
$ export PGDATA=/postgres/data # wrong PGDATA!
$ pg_ctl status -D /postgres/16/data
pg_ctl: server is running (PID: 1)
/usr/lib/postgresql/16/bin/postgres "-D" "/postgres/16/data"
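The precedence rule just described can be sketched in a few lines of Python. This is only an illustration of the lookup order, not how pg_ctl is actually implemented: the function name resolve_pgdata is hypothetical, and the sketch deliberately ignores other spellings of the option that the real tool may accept.

```python
import os


def resolve_pgdata(argv, environ=None):
    """Return the data directory a pg_ctl-like tool would pick, or None.

    Hypothetical helper: it only models "-D beats the PGDATA variable".
    """
    environ = os.environ if environ is None else environ
    # 1. A -D option on the command line always wins.
    for i, arg in enumerate(argv):
        if arg == "-D" and i + 1 < len(argv):
            return argv[i + 1]          # separated form: -D /path
        if arg.startswith("-D") and len(arg) > 2:
            return arg[2:]              # attached form: -D/path
    # 2. Otherwise fall back to the PGDATA environment variable, if set.
    return environ.get("PGDATA")


if __name__ == "__main__":
    # Wrong PGDATA in the environment, right value on the command line.
    print(resolve_pgdata(["status", "-D", "/postgres/16/data"],
                         {"PGDATA": "/postgres/data"}))
```

Passing the environment as a plain dictionary keeps the function easy to try out without touching the real environment of your shell.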
The same concepts of PGDATA and the -D optional argument apply to pretty much any “low-level” command that acts against a cluster. They also make clear that, with the same set of executables, you can run multiple instances of PostgreSQL on the same machine, as long as you keep the PGDATA directory of each one separate.
Do not use the same PGDATA directory for multiple versions of PostgreSQL. While it could be tempting, on your own test machine, to have a single PGDATA directory that can be used in turn by a PostgreSQL 16 and a PostgreSQL 15 instance, this will not work as expected and you risk losing all your data. Luckily, PostgreSQL is smart enough to see that PGDATA has been created and used by a different version and refuses to operate, but please be careful not to share the same PGDATA directory among different instances.
$ pg_ctl start
waiting for server to start....
LOG: starting PostgreSQL 16.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.1.0, 64-bit
LOG: listening on IPv6 address "::1", port 5432
LOG: listening on IPv4 address "127.0.0.1", port 5432
LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
LOG: database system was shut down at 2023-07-19 07:20:24 EST
LOG: database system is ready to accept connections
done
server started
The stop and restart commands do not work on the Docker images from this book’s repository because such containers are running PostgreSQL as the main process; therefore, stopping (or restarting) will cause the container to exit. Similarly, there is no need to start the service because it is automatically started once the container starts.
The pg_ctl command launches the postmaster process, which prints out a few log lines before redirecting the logs to the appropriate log file. The server started message at the end confirms that the server has started. During the startup, the PID of the postmaster is reported within square brackets; in the above example, the postmaster is the operating system process number 27765.
Now, if you run pg_ctl again to check the server, you will see that it has been started:
$ pg_ctl status
pg_ctl: server is running (PID: 27765)
/usr/pgsql-16/bin/postgres
As you can see, the server is now running and pg_ctl shows the PID of the running postmaster (27765), as well as the executable command line (in this case, /usr/pgsql-16/bin/postgres).
Remember: The postmaster process is the first process ever started in the cluster. Both the backend processes and the postmaster are started from the postgres executable, and the postmaster is just the root of all PostgreSQL processes, with the main aim of keeping all the other processes under control.
$ pg_ctl stop
waiting for server to shut down....
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: background worker "logical replication launcher" (PID 27771) exited with exit code 1
LOG: shutting down
LOG: checkpoint starting: shutdown immediate
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.035 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=237 kB; lsn=0/1529DC8, redo lsn=0/1529DC8
LOG: database system is shut down
done
server stopped
During a shutdown, the system prints a few messages to inform the administrator about what is happening, and as soon as the server stops, the message server stopped confirms that the cluster is no longer running.
Shutting down a cluster can be much more problematic than starting it, and for that reason, it is possible to pass extra arguments to the stop command in order to let pg_ctl act accordingly. There are three ways of stopping a cluster:
- The smart mode means that the PostgreSQL cluster will gently wait for all the connected clients to disconnect and only then will it shut the cluster down.
- The fast mode will immediately disconnect every client and will shut down the server without having to wait.
- The immediate mode will abort every PostgreSQL process, including client connections, and shut down the cluster in a dirty way, meaning that the server will need some specific activity on the restart to clean up such dirty data (more on this in the next chapters).
In any case, once a stop command is issued, the server will not accept any new incoming connections from clients, and depending on the stop mode you have selected, existing connections will be terminated. The default stop mode, if none is specified, is fast, which forces an immediate disconnection of the clients but ensures data integrity.
If you want to change the stop mode, you can use the -m flag, specifying the mode name, as follows:
$ pg_ctl stop -m smart
waiting for server to shut down........................ done
server stopped
In the preceding example, the pg_ctl command will wait, printing a dot every second, until all the clients disconnect from the server. In the meantime, if you try to connect to the same cluster from another client, you will receive an error, because the server has entered the stopping procedure:
$ psql
psql: error: could not connect to server: FATAL: the database system is shutting down
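The three stop modes can be summarized with a small Python sketch that builds the pg_ctl command line for a given mode. The helper name stop_command is hypothetical; the signal names in the table are those the PostgreSQL documentation associates with each shutdown mode, and the sketch only builds the argument list without running anything.

```python
# Signal that the postmaster receives for each shutdown mode,
# as described in the PostgreSQL documentation.
STOP_SIGNALS = {
    "smart": "SIGTERM",      # wait for clients to disconnect
    "fast": "SIGINT",        # disconnect clients, shut down cleanly
    "immediate": "SIGQUIT",  # abort processes, recovery needed on restart
}


def stop_command(mode="fast", pgdata=None):
    """Build (not run) the pg_ctl invocation for the requested stop mode."""
    if mode not in STOP_SIGNALS:
        raise ValueError(f"unknown stop mode: {mode!r}")
    command = ["pg_ctl", "stop", "-m", mode]
    if pgdata:
        # Explicit data directory, overriding any PGDATA variable.
        command += ["-D", pgdata]
    return command


if __name__ == "__main__":
    print(" ".join(stop_command("smart")))
```

Building the argument list separately from executing it makes it easy to review which mode is about to be used before handing the list to something like subprocess.run.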
You have already learned how the postmaster is the root of all PostgreSQL processes, but as explained in Chapter 1, Introduction to PostgreSQL, PostgreSQL will launch multiple different processes at startup. These processes are in charge of keeping the cluster operational and in good health. This section provides a glance at the main processes you can find in a running cluster, allowing you to recognize each of them and their respective purposes.
If you inspect a running cluster from the operating system point of view, you will see a bunch of processes tied to PostgreSQL:
$ pstree -p postgres
postgres(1)─┬─postgres(34)
            ├─postgres(35)
            ├─postgres(37)
            ├─postgres(38)
            └─postgres(39)
$ ps -C postgres -af
postgres     1     0  0 11:08 ?        00:00:00 postgres
postgres    34     1  0 11:08 ?        00:00:00 postgres: checkpointer
postgres    35     1  0 11:08 ?        00:00:00 postgres: background writer
postgres    37     1  0 11:08 ?        00:00:00 postgres: walwriter
postgres    38     1  0 11:08 ?        00:00:00 postgres: autovacuum launcher
postgres    39     1  0 11:08 ?        00:00:00 postgres: logical replication launcher
The PID numbers reported in these examples refer to the Docker container, where the first PostgreSQL process has a PID equal to 1. On other machines, you will get different PID numbers.
As you can see, the process with PID 1 is the one that spawns several other child processes and hence is the first and main PostgreSQL process launched; as such, it is usually called the postmaster. The other processes are as follows:
- checkpointer is the process responsible for executing the checkpoints, which are points in time where the database ensures that all the data is actually stored persistently on the disk.
- background writer is responsible for helping to push the data out of the memory to permanent storage.
- walwriter is responsible for writing out the Write-Ahead Logs (WALs), the logs that are needed to ensure data reliability even in the case of a database crash.
- logical replication launcher is the process responsible for handling logical replication.
Depending on the exact configuration of the cluster, there could be other processes active:
- Background workers: These are processes that can be customized by the user to perform background tasks.
- WAL receiver and/or WAL sender: These are processes involved in receiving data from or sending data to another cluster in replication scenarios.
Many of the concepts and aims of the preceding process list will become clearer as you progress through the book’s chapters, but for now, it is sufficient that you know that PostgreSQL has a few other processes that are always active without any regard to incoming client connections.
When a client connects to your cluster, a new process is spawned: this process, named the backend process, is responsible for serving the client requests (meaning executing the queries and returning the results). You can see and count connections by inspecting the process list:
$ ps -C postgres -af
UID        PID  PPID  C STIME TTY          TIME CMD
postgres     1     0  0 11:08 ?        00:00:00 postgres
postgres    34     1  0 11:08 ?        00:00:00 postgres: checkpointer
postgres    35     1  0 11:08 ?        00:00:00 postgres: background writer
postgres    37     1  0 11:08 ?        00:00:00 postgres: walwriter
postgres    38     1  0 11:08 ?        00:00:00 postgres: autovacuum launcher
postgres    39     1  0 11:08 ?        00:00:00 postgres: logical replication launcher
postgres    40     1  0 04:35 ?        00:00:00 postgres: postgres postgres [local] idle
If you compare the preceding list with the previous one, you will see that there is another process with PID 40: this process is a backend process. In particular, this process represents a client connection to the database named postgres.
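To make the reading of the process list more concrete, here is a minimal Python sketch that classifies a process from the text shown in the CMD column. It assumes the title layout seen in the listing above (user, database, host, and state for a backend); the function name classify and the AUXILIARY set are illustrative choices, and the set only covers the processes that appear in this chapter.

```python
# Auxiliary process titles seen in the listings of this chapter
# (a real cluster may show more, depending on its configuration).
AUXILIARY = {
    "checkpointer",
    "background writer",
    "walwriter",
    "autovacuum launcher",
    "logical replication launcher",
}


def classify(title):
    """Classify a process by its title, as shown in the CMD column of ps."""
    prefix = "postgres: "
    if not title.startswith(prefix):
        # No "postgres: " prefix: treat it as the postmaster itself.
        return ("postmaster", None)
    rest = title[len(prefix):]
    if rest in AUXILIARY:
        return ("auxiliary", rest)
    parts = rest.split(maxsplit=3)
    if len(parts) < 4:
        return ("other", rest)
    user, database, host, state = parts
    return ("backend",
            {"user": user, "database": database, "host": host, "state": state})


if __name__ == "__main__":
    titles = [
        "postgres",
        "postgres: checkpointer",
        "postgres: postgres postgres [local] idle",
    ]
    backends = [t for t in titles if classify(t)[0] == "backend"]
    print(len(backends), "client connection(s)")
```

Counting the entries classified as backend gives a rough count of the client connections, which is the same information you obtain by reading the ps output by eye.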
PostgreSQL uses a multi-process approach to concurrency instead of a multi-threaded one. There are different reasons for this: most notably, the isolation and portability that a multi-process approach offers. Moreover, on modern hardware and software, forking a process is no longer such an expensive operation.
Therefore, once PostgreSQL is running, there is a tree of processes that roots at the postmaster. The aim of the latter is to spawn new processes when there is the need to handle new database connections, as well as to monitor all maintenance processes to ensure that the cluster is running fine.
Connecting to the cluster
Once PostgreSQL is running, it awaits incoming database connections to serve; as soon as a connection comes in, PostgreSQL serves it by connecting the client to the right database. This means that to interact with the cluster, you need to connect to it. However, you don’t connect to the whole cluster; rather, you ask PostgreSQL to interact with one of the databases the cluster is serving. Therefore, when you connect to the cluster, you need to connect to a specific database. This also means that the cluster must have at least one database from the very beginning of its life.
When you initialize the cluster with the initdb command, PostgreSQL builds the filesystem layout of the PGDATA directory and builds two template databases, named template0 and template1. The template databases are used as a starting point to clone other new databases, which can then be used by normal users to connect to. In a freshly installed PostgreSQL cluster, you usually end up with a postgres database, used to allow the database administrator user postgres to connect to and interact with the cluster.
To connect to one of the databases, either a template or a user-defined one, you need a client to connect with. PostgreSQL ships with psql, a command-line client that allows you to connect, interact with, and administer databases and the cluster itself.
Other clients do exist, but they will not be discussed in this book. You are free to choose the client you like the most, since every command, query, and example shown in the book will run with no exception under every compatible client.
While connecting interactively to the cluster is an important task for a database administrator, often, developers need their own applications to connect to the cluster. To achieve this, the applications need a so-called connection string, a URI indicating all the required parameters to connect to the database.
The template databases
The template1 database is the first database created when the system is initialized, and then it is cloned into template0. This means that the two databases are, at least initially, identical, and the aim of template0 is to act as a safe copy for rebuilding template1 in case it is accidentally damaged or removed.
You can inspect available databases using the psql -l command. On a fresh installation, you will get the following three databases:
$ psql -l
                                            List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
 postgres  | postgres | UTF8     | it_IT.UTF-8 | it_IT.UTF-8 |            | libc            |
 template0 | postgres | UTF8     | it_IT.UTF-8 | it_IT.UTF-8 |            | libc            | =c/postgres          +
           |          |          |             |             |            |                 | postgres=CTc/postgres
 template1 | postgres | UTF8     | it_IT.UTF-8 | it_IT.UTF-8 |            | libc            | =c/postgres          +
           |          |          |             |             |            |                 | postgres=CTc/postgres
(3 rows)
In the Docker image, you will also see the forumdb database, which has been automatically created for you to let you interact with other examples.
It is interesting to note that, alongside the two template databases, there’s a third database that is created during the installation process: the postgres database. That database belongs to the postgres user, which is, by default, the only database administrator created during the initialization process. This database is a common space to be used for connections instead of the template databases.
The name template indicates the real aim of these two databases: when you create a new database, PostgreSQL clones a template database as a common base. This is somewhat like creating a user home directory on Unix systems: the system clones a skeleton directory and assigns the new copy to the user. PostgreSQL does the same: it clones template1 and assigns the newly created database to the user that requested it.
What this also means is that whatever object you put into template1, you will find the very same object in freshly created databases. This can be really useful for providing a common base database and having all other databases brought to life with the same set of attributes and objects.
Nevertheless, you are not forced to use template1 as the base template; in fact, you can create your own databases and use them as templates for other databases. However, please keep in mind that, by default (and most notably on a newly initialized system), the template1 database is the one that is cloned for the first databases you will create.
Another difference between template1 and template0, apart from the former being the default for new databases, is that you cannot connect to the latter. This is in order to prevent accidental damage to template0 (the safety copy).
It is important to note that the cluster (and all user-defined databases) can work even without the template databases: the template1 and template0 databases are not fundamental for the other databases to run. However, if you lose the templates, you will be required to use another database as a template every time you perform an action that requires it, such as creating a new database.
Template databases are not meant for interactive connections, and you should not connect to the template databases unless you need to customize them. PostgreSQL will refuse to use a database as a skeleton for another database if there are active connections to it.
The psql command-line client
The psql command is the command-line interface that ships with every installation of PostgreSQL. While you can certainly use a graphical user interface to connect and interact with the databases, a basic knowledge of psql is mandatory in order to administer a PostgreSQL cluster. In fact, a specific psql version is shipped with every release of PostgreSQL; therefore, it is the most up-to-date client speaking the same language (i.e., protocol) as the cluster. Moreover, the client is lightweight and useful even in emergency situations when a GUI is not available.
psql accepts several options to connect to a database, mainly the following:
- -d: The database name
- -U: The username
- -h: The host (either an IPv4 or IPv6 address or a hostname)
If no option is specified, psql assumes that your operating system user is trying to connect, over a local connection, with a database user of the same name to a database of the same name. Take the following connection:
$ id
uid=999(postgres) gid=999(postgres) groups=999(postgres),101(ssl-cert)
$ psql
psql (16.0)
Type "help" for help.
postgres=#
This means that the current operating system user (postgres) has asked psql to connect to a database named postgres via the PostgreSQL user named postgres on the local machine. Explicitly, the connection could have been requested as follows:
$ psql -U postgres -d postgres
psql (16.0)
Type "help" for help.
postgres=#
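The defaulting rules described above can be written down as a tiny Python function. This is a simplified sketch: the name effective_connection is hypothetical, and the function deliberately ignores environment variables and other configuration sources that a real client may also consult.

```python
def effective_connection(os_user, user=None, dbname=None):
    """Return the (database user, database) pair a bare psql call would use.

    Simplified model: the database user defaults to the operating system
    user, and the database name defaults to the database user.
    """
    db_user = user or os_user
    database = dbname or db_user
    return db_user, database


if __name__ == "__main__":
    # Operating system user postgres running plain "psql".
    print(effective_connection("postgres"))
    # Same operating system user running "psql -d template1 -U luca".
    print(effective_connection("postgres", user="luca", dbname="template1"))
```

Note that when only the username is overridden, the database name follows the database user rather than the operating system user, which is why specifying the database explicitly is often the safest habit.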
The first thing to note is that once a connection has been established, the command prompt changes: psql reports the database to which the user has been connected (postgres) and a sign to indicate they are a superuser (#). If the user is not a database administrator, a > sign is placed at the end of the prompt.
If you need to connect to a database that is named differently from your operating system username, you need to specify it:
$ psql -d template1
psql (16.0)
Type "help" for help.
template1=#
Similarly, if you need to connect to a database that does not correspond to your operating system username, with a PostgreSQL user that is different from your operating system username, you have to explicitly pass both parameters to psql:
$ id
uid=999(postgres) gid=999(postgres) groups=999(postgres),101(ssl-cert)
$ psql -d template1 -U luca
psql (16.0)
Type "help" for help.
template1=>
As you can see from the preceding example, the operating system user postgres has connected to the template1 database with the PostgreSQL user luca. Since the latter is not a system administrator, the command prompt ends with the > sign.
To leave psql and close the connection, issue the \q command:
$ psql -d template1 -U luca
psql (16.0)
Type "help" for help.
template1=> \q
Entering SQL statements via psql
Once you are connected to a database via psql, you can issue any statement you like. Statements must be terminated by a semicolon, indicating that the next Enter key will execute the statement. The following is an example where the Enter key has been emphasized:
$ psql -d template1 -U luca
psql (16.0)
Type "help" for help.
template1=> SELECT current_time; <ENTER>
    current_time
--------------------
 06:04:57.435155-05
(1 row)
SQL is a case-insensitive language, so you can enter statements in either uppercase, lowercase, or a mix. The same rule applies to identifiers such as column names: unless quoted, they are case-insensitive, because PostgreSQL folds them to lowercase. If you need identifiers with a specific case, you need to enclose them in double quotes.
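The identifier rule can be illustrated with a short Python sketch that mimics how an unquoted name is folded to lowercase while a double-quoted one keeps its case. The function fold_identifier is a hypothetical helper written for this illustration and covers only simple identifiers.

```python
def fold_identifier(identifier):
    """Return the name the server would effectively look up.

    Unquoted identifiers are folded to lowercase; double-quoted ones are
    taken literally (a doubled quote inside stands for a single quote).
    """
    quoted = (len(identifier) >= 2
              and identifier[0] == '"'
              and identifier[-1] == '"')
    if quoted:
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


if __name__ == "__main__":
    print(fold_identifier("Current_Time"))   # unquoted: case is ignored
    print(fold_identifier('"MyColumn"'))     # quoted: case is preserved
```

This is why a column created as "MyColumn" must always be referenced with the double quotes, while MyColumn, MYCOLUMN, and mycolumn all refer to the same unquoted name.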
Another way to execute the statement is to issue a \g command, again followed by <ENTER>. This is useful when connecting via a terminal emulator that has keys remapped:
template1=> SELECT current_time \g <ENTER>
    current_time
--------------------
 06:07:03.328744-05
(1 row)
Until you end a statement with a semicolon or \g, psql will keep the content you are typing in the query buffer, so you can also edit multiple lines of text as follows:
template1=> SELECT
template1-> current_time
template1-> ;
    current_time
--------------------
 06:07:28.908215-05
(1 row)
Note how the psql command prompt has changed on the lines following the first one: the difference is there to remind you that you are editing a multi-line statement and psql has not (yet) found a statement terminator (either a semicolon or the \g command).
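The prompt variations seen so far (the = or - separator and the # or > marker) can be reproduced with a few lines of Python. This is a simplified model of the default prompt only: the function name psql_prompt is hypothetical, and the real client supports more prompt states and full customization.

```python
def psql_prompt(database, superuser, continuation=False):
    """Build a prompt string in the style shown in this chapter.

    - "=" separates the database name on the first line of a statement,
      "-" on the following lines of a multi-line statement;
    - "#" marks a superuser, ">" any other user.
    """
    separator = "-" if continuation else "="
    marker = "#" if superuser else ">"
    return f"{database}{separator}{marker} "


if __name__ == "__main__":
    print(psql_prompt("postgres", superuser=True))
    print(psql_prompt("template1", superuser=False))
    print(psql_prompt("template1", superuser=False, continuation=True))
```

Reading the prompt this way tells you at a glance which database you are connected to, whether you hold superuser privileges, and whether psql is still waiting for a statement terminator.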
One useful feature of the psql query buffer is the capability to edit its content in an external editor. If you issue the \e command, your favorite editor will pop up with the content of the last-edited query. You can then edit and refine your SQL statement as much as you want, and once you exit the editor, psql will read what you have produced and execute it. The editor to use is chosen with the EDITOR operating system environment variable.
It is also possible to execute all the statements included in a file or edit a file before executing it. As an example, assume the test.sql file has the following content:
$ cat test.sql
SELECT current_database();
SELECT current_time;
SELECT current_role;
The file has three very simple SQL statements. In order to execute the whole file at once, you can use the \i special command followed by the name of the file:
template1=> \i test.sql
 current_database
------------------
 template1
(1 row)

    current_time
--------------------
 06:08:43.077305-05
(1 row)

 current_role
--------------
 luca
(1 row)
As you can see, the client has executed, one after the other, every statement within the file. If you need to edit the file without leaving psql, you can issue \e test.sql to open your favorite editor, make changes, and return to psql.
SQL is case-insensitive and space-insensitive: you can write it in all uppercase or all lowercase, with however many horizontal and vertical spaces you want. In this book, SQL keywords will be written in uppercase and the statements will be formatted to read cleanly.
A glance at the psql commands
Every command specific to psql starts with a backslash character (\). It is possible to get some help with SQL statements and PostgreSQL commands via the special \h command, after which you can specify the specific statement you want help for:
template1=> \h SELECT
Command:     SELECT
Description: retrieve rows from a table or view
Syntax:
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    [ * | expression [ [ AS ] output_name ] [, ...] ]
...
URL: https://www.postgresql.org/docs/16/sql-select.html
The displayed help is, for space reasons, concise. You can find a much more verbose description and usage examples in the online documentation. For this reason, at the end of the help screen, there is a link reference to the online documentation.
Help on the psql commands themselves is available via the \? command:
template1=> \?
General
  \copyright              show PostgreSQL usage and distribution terms
  \crosstabview [COLUMNS] execute query and display results in crosstab
  \errverbose             show most recent error message at maximum verbosity
  \g [FILE] or ;          execute query (and send results to file or |pipe)
  \gdesc                  describe result of query, without executing it
  ...
There are also a lot of introspection commands, such as \d to list all user-defined tables. These special commands are, under the hood, a way to execute queries against the PostgreSQL system catalogs, which are, in turn, registries about all objects that live in a database. The introspection commands will be shown later in the book and are useful as shortcuts to get an idea of which objects are defined in the current database.
Introducing the connection string
LibPQ is the underlying library that every application can use to connect to a PostgreSQL cluster and is, for example, used in C and C++ clients, as well as non-native connectors.
A connection string in LibPQ is a URI made up of several parts:
postgresql://username@host:port/database
Here, we have the following:
- postgresql is a fixed string that specifies the protocol the URI refers to.
- username is the PostgreSQL username to use when connecting to the database.
- host is the hostname (or IP address) to connect to.
- port is the TCP/IP port the server is listening on (by default, 5432).
- database is the name of the database to which you want to connect.
The following connections are all equivalent:
$ psql -d template1 -U luca -h localhost
$ psql postgresql://luca@localhost/template1
$ psql postgresql://luca@localhost:5432/template1
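To see why the two URI forms above are equivalent, the following Python sketch splits a connection URI into its parts with the standard urllib.parse module and fills in the default port when it is missing. The function name parse_connection_uri is hypothetical, and the sketch handles only the simple user@host:port/database layout discussed here.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 5432  # port used when the URI does not specify one


def parse_connection_uri(uri):
    """Split a postgresql:// URI into user, host, port, and database."""
    parts = urlsplit(uri)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"not a PostgreSQL URI: {uri!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORT,
        "dbname": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    short = parse_connection_uri("postgresql://luca@localhost/template1")
    full = parse_connection_uri("postgresql://luca@localhost:5432/template1")
    print(short == full)
```

Application drivers usually accept the URI directly, so a parser like this is rarely needed in practice; it is shown only to make the structure of the string explicit.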
Solving common connection problems
Please note that the solutions provided here are just for testing purposes and not for production usage. All of the security settings will be explained in later chapters, so the aim of the following subsection is just to help you get your test environment usable.
Database “foo” does not exist
This means either you misspelled the name of the database in the connection string or you are trying to connect without specifying the database name.
For instance, the following connection fails when executed by an operating system user named luca because, by default, it is assuming that the user luca is trying to connect to a database with the same name (meaning luca) since none has been explicitly set:
$ psql
psql: error: could not connect to server: FATAL: database "luca" does not exist
The solution is to provide an existing database name via the -d option or to create a database with the same name as the user.
Connection refused
As an example, imagine PostgreSQL is running on a machine named venkman and we are trying to connect from another host on the same network:
$ psql -h venkman -U luca template1
psql: error: could not connect to server: could not connect to server: Connection refused
Is the server running on host "venkman" (192.168.222.123) and accepting
TCP/IP connections on port 5432?
In this case, the database cluster is running on the remote host but is not accepting connections from the outside. Usually, you have to fix the server configuration or connect to the remote machine (via SSH, for instance) and open a local connection from there.
To quickly solve the problem, edit the postgresql.conf file (usually located under the PGDATA directory) and ensure the listen_addresses option is set to an asterisk, so that the server will listen on any available network address (or to the IP address of your external network interface, to listen only there):
listen_addresses = '*'
After a restart of the service, by means of the
restart command issued to
pg_ctl, the client will be able to connect. Please note that enabling the server to listen on any available network address might not be the optimal solution and can expose the server to risks in a production environment. Later in the book, you will learn how to specifically configure the connection properties for your server.
No pg_hba.conf entry
This error should never happen in the Docker container used for this chapter, because its configuration is already allowing trusted connections. However, other PostgreSQL installations will be stricter; therefore, knowing about this type of error message can help you to quickly figure out where the configuration problem is.
As an example, the following connection is refused:
$ psql -h localhost -U luca template1
psql: error: could not connect to server: FATAL: no pg_hba.conf entry for host "127.0.0.1", user "luca", database "template1", SSL off
The reason for this is that the pg_hba.conf file contains no rule to let the user luca in on the localhost interface. So, for instance, adding a single line such as the following to the pg_hba.conf file can fix the problem:
host all luca 127.0.0.1/32 trust
You need to reload the configuration in order to apply changes. The format of every line in the pg_hba.conf file will be discussed later, but for now, please assume that the preceding line instructs the cluster to accept any connection incoming from localhost by means of the user luca.
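When you hit this error, a quick first check is to list the network rules that mention your user at all. The following is a rough sketch, assuming a hypothetical helper named hba_rules_for_user; it only filters lines by the user column and does not reproduce how PostgreSQL actually evaluates rules (it ignores address matching, role groups, comma-separated lists, and included files).

```shell
# Hypothetical helper: print the host-type rules of an HBA file whose user
# column is the given user or "all". Pass "-" as the file to read from stdin.
hba_rules_for_user() {
  user=$1
  file=$2
  awk -v user="$user" '
    /^[ \t]*(#|$)/ { next }                               # skip comments and blank lines
    $1 ~ /^host/ && ($3 == user || $3 == "all") { print } # host, hostssl, ...
  ' "$file"
}

# Demo on an inline sample; only the rule for luca is printed.
hba_rules_for_user luca - <<'EOF'
# TYPE  DATABASE  USER    ADDRESS        METHOD
local   all       all                    trust
host    all       luca    127.0.0.1/32   trust
host    all       enrico  127.0.0.1/32   trust
EOF
```

If the function prints nothing for your user, no host rule names that user, which is consistent with the error message shown above.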
Exploring the disk layout of PGDATA
In the previous sections, you have seen how to install PostgreSQL and connect to it, but we have not looked at the storage part of a cluster. Since the aim of PostgreSQL, as well as the aim of any relational database, is to permanently store data, the cluster needs some sort of permanent storage. In particular, PostgreSQL exploits the underlying filesystem to store its own data. All of the PostgreSQL-related stuff is contained in a directory known as PGDATA. The PGDATA directory acts as the disk container that stores all the data of the cluster, including the users’ data and cluster configuration.
The following is an example of the content of PGDATA for a running PostgreSQL 16 cluster:
$ ls -1 /postgres/16/data
base
global
pg_commit_ts
pg_dynshmem
pg_hba.conf
pg_ident.conf
pg_logical
pg_multixact
pg_notify
pg_replslot
pg_serial
pg_snapshots
pg_stat
pg_stat_tmp
pg_subtrans
pg_tblspc
pg_twophase
PG_VERSION
pg_wal
pg_xact
postgresql.auto.conf
postgresql.conf
postmaster.opts
postmaster.pid
The main files are as follows:
postgresql.conf is the main configuration file, used by default when the service is started.
postgresql.auto.conf is the automatically included configuration file used to store settings changed dynamically via SQL instructions.
pg_hba.conf is the HBA file that provides the configuration regarding available database connections.
PG_VERSION is a text file that contains the major version number (useful when inspecting the directory to understand which version of the cluster has managed the PGDATA directory).
postmaster.pid is a file that contains the PID of the postmaster process, the first launched process in the cluster.
The main directories available in
PGDATA are as follows:
base is a directory that contains all the users’ data, including databases, tables, and other objects.
global is a directory containing cluster-wide objects.
pg_wal is the directory containing the WAL files.
pg_stat and pg_stat_tmp are, respectively, the storage of permanent and temporary statistical information about the status and health of the cluster.
Of course, all files and directories in
PGDATA are important for the cluster to work properly, but so far, the preceding is the “core” list of objects that are fundamental in
PGDATA itself. Other files and directories will be discussed in later chapters.
Objects in the PGDATA directory
PostgreSQL does not name objects on disk, such as tables, in a mnemonic or human-readable way; instead, every file is named after a numeric identifier. You can see this by having a look, for instance, at the base subdirectory:
$ ls -1 /postgres/16/data/base
1
16386
4
5
As you can see from the preceding code, the base directory contains four objects, named 1, 4, 5, and 16386. Please note that these numbers could be different on your machine. Each of the preceding is a directory that contains other files, as shown here:
$ ls -1 /postgres/16/data/base/16386 | head
112
113
1247
1247_fsm
1247_vm
1249
1249_fsm
1249_vm
1255
1255_fsm
As you can see, each file is named with a numeric identifier. Internally, PostgreSQL holds a specific catalog that allows the database to match a mnemonic name to a numeric identifier, and vice versa. The integer identifier is named OID (Object Identifier); this name is a historical term that today corresponds to the so-called filenode. The two terms will be used interchangeably in this section.
There is a specific utility that allows you to inspect a PGDATA directory and extract mnemonic names: oid2name. For example, if you executed the oid2name utility, you’d get a list of all available databases similar to the following one:
$ oid2name
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  16390        forumdb  pg_default
      5       postgres  pg_default
      4      template0  pg_default
      1      template1  pg_default
You can even go further and inspect a single file by going into the database directory and specifying the database in which to search for the object name with the -d option:
$ cd /postgres/16/data/base/1
$ oid2name -d template1 -f 3395
From database "template1":
  Filenode                 Table Name
-------------------------------------
      3395  pg_init_privs_o_c_o_index
As you can see from the preceding example, the 3395 file in the /postgres/16/data/base/1 directory corresponds to the table named pg_init_privs_o_c_o_index. Therefore, when PostgreSQL needs to interact with a table like this, it will access that file on disk.
From the preceding example, it should be clear that every SQL table is stored as a file with a numeric name. However, PostgreSQL does not allow a single file to be greater than 1 GB in size, so what happens if a table grows beyond that limit? PostgreSQL “attaches” another file with a numeric extension that indicates the next chunk of 1 GB of data. In other words, if your table is stored in the
123 file, the second gigabyte will be stored in the
123.1 file, and if another gigabyte of storage is needed, another file,
123.2, will be created. Therefore, the filenode refers to the very first file related to a specific table, but more than one file can be stored on disk.
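The naming rule for the additional files can be sketched with a little shell arithmetic. The helper name segment_file is hypothetical, and the sketch assumes the 1 GB limit described above: given a filenode and a byte offset within the table, it prints the name of the file that holds that offset.

```shell
# Hypothetical helper: name of the file that stores a given byte offset of a
# table, assuming each file holds at most 1 GB (1073741824 bytes).
segment_file() {
  filenode=$1
  offset=$2
  segment=$(( offset / 1073741824 ))
  if [ "$segment" -eq 0 ]; then
    printf '%s\n' "$filenode"                 # the first gigabyte has no suffix
  else
    printf '%s.%s\n' "$filenode" "$segment"   # later gigabytes get .1, .2, ...
  fi
}

segment_file 123 0             # prints 123
segment_file 123 1073741824    # prints 123.1
segment_file 123 2500000000    # prints 123.2
```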
PostgreSQL expects to find all its data within the PGDATA directory, but that does not mean that your cluster is “jailed” in this directory. In fact, PostgreSQL allows “escaping” the PGDATA directory by means of tablespaces. A tablespace is a directory that can be outside the PGDATA directory and can also belong to different storage. Tablespaces are mapped into the PGDATA directory by means of symbolic links stored in the pg_tblspc subdirectory. In this way, the PostgreSQL processes do not have to look outside PGDATA, but are still able to access “external” storage. A tablespace can be used to achieve different aims, such as enlarging the available storage or providing different storage performance for specific objects. For instance, you can create a tablespace on a slow disk to contain infrequently accessed objects and tables, keeping fast storage within another tablespace for frequently accessed objects.
You don’t have to make links by yourself: PostgreSQL provides the TABLESPACE feature to manage this, and the cluster will create and manage the appropriate links under the pg_tblspc subdirectory.
For instance, the following is a PGDATA directory that has three different tablespaces:
$ ls -l /postgres/16/data/pg_tblspc/
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16384 -> /data/tablespaces/ts_a
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16385 -> /data/tablespaces/ts_b
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16386 -> /data/tablespaces/ts_c
As you can see from the preceding example, there are three tablespaces that are attached to the /data storage. You can inspect them with oid2name and the -s flag:
$ oid2name -s
All tablespaces:
   Oid  Tablespace Name
------------------------
  1663       pg_default
  1664        pg_global
 16384             ts_a
 16385             ts_b
 16386             ts_c
As you can see, the numeric identifiers of the symbolic links are mapped to the mnemonic names of the tablespaces. From the preceding example, you can observe that there are also two particular tablespaces:
pg_default is the default tablespace corresponding to “none,” the default storage to be used for every object when nothing is explicitly specified. In other words, every object stored directly under the PGDATA directory is attached to the pg_default tablespace.
pg_global is the tablespace used for system-wide objects.
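Since user-defined tablespaces are just symbolic links under pg_tblspc, you can also resolve them with ordinary shell tools. The following is a minimal sketch, assuming a hypothetical helper named list_tablespace_links and the readlink command being available on your system; the fallback path /postgres/16/data is the example location used in this chapter.

```shell
# Hypothetical helper: print "identifier -> target" for every symbolic link
# found in the given directory, skipping anything that is not a link.
list_tablespace_links() {
  dir=$1
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
  done
}

# Prints nothing if the directory is missing or contains no links.
list_tablespace_links "${PGDATA:-/postgres/16/data}/pg_tblspc"
```

Unlike oid2name -s, this shows the filesystem targets rather than the tablespace names, so the two outputs complement each other.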
Exploring configuration files and parameters
Usually, when changing the configuration of the cluster, you must edit the postgresql.conf file to write the new settings and then, depending on the context of the settings you have edited, either send the cluster a SIGHUP signal (that is, reload the configuration) or restart it.
Every configuration parameter is associated with a context, and depending on the context, you can apply changes with or without a cluster restart. Available contexts are as follows:
internal: A group of parameters that are set at compile time and therefore cannot be changed at runtime.
postmaster: All the parameters that require the cluster to be restarted (that is, to kill the postmaster process and start it again) to activate them.
sighup: All the configuration parameters that can be applied with a SIGHUP signal sent to the postmaster process, which is equivalent to issuing a reload command in the operating system service manager.
backend, superuser-backend: All the parameters that can be set at runtime but will be applied to the next normal or administrative connection.
user, superuser: A group of settings that can be changed at runtime and are immediately active for normal and administrative connections.
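The practical consequence of each context can be summarized as a small lookup. The following sketch uses a hypothetical function named how_to_apply and wording of our own for the actions; it also includes backend and user, which are the non-superuser counterparts of superuser-backend and superuser in PostgreSQL.

```shell
# Hypothetical helper: what has to happen before a changed parameter is active,
# given the context the parameter belongs to.
how_to_apply() {
  case "$1" in
    internal)                  echo "cannot be changed at runtime" ;;
    postmaster)                echo "restart the cluster" ;;
    sighup)                    echo "reload the configuration" ;;
    backend|superuser-backend) echo "open a new connection" ;;
    user|superuser)            echo "nothing, active immediately" ;;
    *) echo "unknown context: $1" >&2; return 1 ;;
  esac
}

how_to_apply sighup    # prints: reload the configuration
```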
The following is an example of the content of a postgresql.conf file:
$ cat /postgres/16/data/postgresql.conf
shared_buffers = 512MB
maintenance_work_mem = 128MB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
work_mem = 32MB
min_wal_size = 1GB
max_wal_size = 2GB
The postgresql.auto.conf file has the very same syntax as the main postgresql.conf file but is automatically overwritten by PostgreSQL when the configuration is changed at runtime directly within the system, by means of specific administrative statements such as ALTER SYSTEM. The postgresql.auto.conf file is always loaded at the very last moment, therefore overriding other settings. In a fresh installation, this file is empty, meaning it will not override any other custom setting.
You are not tied to having a single configuration file, and, in fact, there are specific directives that can be used to include other configuration files. The configuration of the cluster will be detailed in a later chapter.
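The "last one wins" behavior can be illustrated with a rough sketch that scans one or more configuration files in order and keeps the last value found for a setting. The function name conf_value is hypothetical, and the parsing is deliberately naive: it requires the name = value form, treats everything after a # as a comment, and does not follow include directives. It is a reading aid, not a replacement for asking the running cluster.

```shell
# Hypothetical helper: print the last value assigned to a setting across the
# given files, read in order (pass "-" to read from standard input).
conf_value() {
  name=$1
  shift
  awk -v name="$name" '
    { sub(/#.*/, "") }                      # drop comments
    {
      eq = index($0, "=")
      if (eq == 0) next                     # not a name = value line
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == name) last = val           # later lines override earlier ones
    }
    END { if (last != "") print last }
  ' "$@"
}

# Demo on inline content; prints 64MB because the later line wins.
conf_value work_mem - <<'EOF'
shared_buffers = 512MB
work_mem = 32MB
work_mem = 64MB   # later line wins
EOF
```

Passing postgresql.conf first and postgresql.auto.conf second mirrors the loading order described above, so a value stored in the latter is the one printed.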
The PostgreSQL HBA file (pg_hba.conf) is another text file that contains the connection allowance: it lists the databases, users, and networks that are allowed to connect to your cluster. The HBA method can be thought of as a firewall embedded into PostgreSQL. As an example, the following is an excerpt from a pg_hba.conf file:
host all luca 192.168.222.1/32 md5
hostssl all enrico 192.168.222.1/32 md5
In short, the preceding lines mean that the user luca can connect to any database in the cluster from the machine with the IPv4 address 192.168.222.1, while the user enrico can connect to any database from the same machine but only on an SSL-encrypted connection. All the available pg_hba.conf rules will be detailed in a later chapter, but for now, it is sufficient to know that this file acts as a “list of firewall rules” for incoming connections.
PostgreSQL can handle several databases within a single cluster, served out of disk storage contained in a single directory named
PGDATA. The cluster runs many different processes; one, in particular, is named
postmaster and is in charge of spawning other processes, one per client connection, and keeping track of the status of maintenance processes.
The configuration of the cluster is managed via text-based configuration files, the main one being
postgresql.conf. It is possible to filter incoming user connections by means of rules placed in the
pg_hba.conf text file.
You can interact with the cluster status by means of the pg_ctl tool or, depending on your operating system, by other provided programs, such as the operating system service manager.
This chapter has presented you with the relevant information so that you are able not only to install PostgreSQL but also to start and stop it regularly, integrate it with your operating system, and connect to the cluster.
In the following chapter, you will learn how to manage users and connections.
Verify your knowledge
- What is the pg_ctl command?
pg_ctl is a command shipped with PostgreSQL that allows you to start, restart, stop, and do other actions on the cluster. It is often used as the way to manage the whole cluster. See the pg_ctl section for more details.
- What is a template database?
A template database is a database that can be used as a base to clone another (new) database that will initially include the same objects. See the The template databases section for more details.
- What is the psql client?
psql is the official client application to connect to a PostgreSQL database. It is a command-line application that can be used to enter SQL statements and get results out of the cluster. It is shipped with every version of PostgreSQL. See the The psql command-line client section for more details.
- What is a connection string?
A connection string is a URI that specifies all the properties required to connect to a database, often including the username, the host, the database, and so on. See the The connection string section for more details.
- What are the psql special commands?
The special commands are all the short commands that begin with a backslash symbol, like, for example, \d. They are informative commands valid only within the psql client. See the A glance at the psql commands section for more details.
PGDATA disk layout: https://www.postgresql.org/docs/current/storage-file-layout.html
initdb official documentation: https://www.postgresql.org/docs/current/app-initdb.html
pg_ctl official documentation: https://www.postgresql.org/docs/current/app-pg-ctl.html
pgAdmin4 graphical client for PostgreSQL: https://www.pgadmin.org/