Reader small image

You're reading from  Learn PostgreSQL - Second Edition

Product typeBook
Published inOct 2023
PublisherPackt
ISBN-139781837635641
Edition2nd Edition
Right arrow
Authors (2):
Luca Ferrari
Luca Ferrari
author image
Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

Enrico Pirozzi
Enrico Pirozzi
author image
Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi

View More author details
Right arrow

Transactions, MVCC, WALs, and Checkpoints

This chapter introduces you to transactions, a fundamental part of every enterprise-level database system. Transactions are a way for a database to manage multiple operations, making them as though they were a single atomic operation. PostgreSQL has very rich and standard-compliant transaction machinery that allows users to specifically define transaction properties, including nested transactions.

PostgreSQL relies heavily on transactions to keep data consistent across concurrent connections and parallel activities, and thanks to Write-Ahead Logs (WALs), PostgreSQL does its best to keep the data safe and reliable. Moreover, PostgreSQL implements Multi-Version Concurrency Control (MVCC), a way to maintain high concurrency among transactions.

The chapter can be split into two parts: the first part is more practical and provides concrete examples of what transactions are, how to use them, and how to understand MVCC. The second part is...

Technical requirements

In order to proceed, you need to know the following:

  • How to issue SQL statements via psql
  • How to connect to the cluster and a database
  • How to check and modify the cluster configuration

The chapter examples are available in the book’s code repository and can be run on the standalone Docker image that you can find in the book’s GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition.

Introducing transactions

A transaction is an atomic unit of work that either succeeds or fails. Transactions are a key feature of any database system and are what allows a database to implement the properties: Atomicity, Consistency, Isolation, and Durability (ACID). Altogether, the ACID properties mean that the database must be able to handle units of work whole (atomicity), store data in a permanent way (durability), without inter-mixed changes to the data (consistency), and in a way that concurrent actions are executed as if they were alone (isolation).

You can think of a transaction as a bunch of related statements that, ultimately, will either all succeed or fail. Transactions are everywhere in a database, and you will have already used them even if you did not realize it: function calls, single statements, and so on are executed in a (tiny) transaction block. In other words, every action you issue against the database is executed within a transaction, even if you did not...

Transaction isolation levels

In a concurrent database system, you could encounter three different problems:

  • Dirty reads: A dirty read happens when the database allows a transaction to see work-in-progress data from other not-yet-finished transactions. In other words, data that has not been consolidated is visible to other transactions. No production-ready database allows that, and PostgreSQL is no exception: you are assured your transaction will only perceive data that has been consolidated, and in order to be consolidated, the transactions that created such data must be complete.
  • Unrepeatable reads: An unrepeatable read happens when the same query, within the same transaction, executed multiple times, perceives a different set of data. This essentially means that the data has changed between two sequential executions of the same query in the same transaction. PostgreSQL does not allow this kind of problem by means of snapshots: every transaction can perceive the...

Explaining MVCC

xmin is only a part of the story of managing MVCC. PostgreSQL labels every tuple in the database with four different fields, named xmin (already described), xmax, cmin, and cmax. Similar to what you learned about xmin, in order to make those fields appear in a query result, you need to explicitly reference them – for instance:

forumdb=> SELECT xmin, xmax, cmin, cmax, * FROM tags ORDER BY tag;
 xmin | xmax | cmin | cmax | pk | tag  | parent
------+------+------+------+----+------+--------
 4854 |    0 |    0 |    0 | 24 | c++  |
 4853 |    0 |    0 |    0 | 23 | java |
 4852 |    0 |    0 |    0 | 22 | perl |
 4855 |    0 |    0 |    0 | 25 | unix |
(4 rows)

The meaning of xmin has been already described in a previous section: it indicates the transaction identifier of the transaction that created the tuple. The xmax field, on the other hand, indicates the xid of the transaction that invalidated the tuple, for example, because it has deleted the data...

Savepoints

A transaction is a block of work that must either succeed or fail as a whole. A savepoint is a way to split a transaction into smaller blocks that can be rolled back independently of each other. Thanks to savepoints, you can divide a big transaction (one transaction with multiple statements) into smaller chunks, allowing a subset of the bigger transaction to fail without having the overall transaction fail. PostgreSQL does not handle transaction nesting, so you cannot issue a nested set of BEGIN or COMMIT/ROLLBACK statements. Savepoints, however, allow PostgreSQL to mimic the nesting of transaction blocks.

Savepoints are marked with a mnemonic name, which you can use to commit or roll back. The name must be unique within the transaction, and if you reuse the same over and over, the previous savepoints with the same name will be discarded. Let’s see an example:

forumdb=> BEGIN;
BEGIN
forumdb=> INSERT INTO tags( tag ) VALUES ( 'Eclipse IDE'...

Deadlocks

A deadlock is an event that happens when different transactions depend on each other in a circular way. Deadlocks are, to some extent, normal events in a concurrent database environment and nothing an administrator should worry about, unless they become extremely frequent, meaning there is some dependency error in the applications and the transactions.

When a deadlock happens, there is no choice but to terminate the locked transactions. PostgreSQL has a very powerful deadlock detection engine that does exactly this job: it finds stalled transactions and, in the case of a deadlock, terminates them (producing a ROLLBACK).

In order to produce a deadlock, imagine two concurrent transactions applying changes to the very same tuples in a conflicting way. For example, the first transaction could do something like the following:

-- session 1
forumdb=> BEGIN;
BEGIN
forumdb=> SELECT txid_current();
 txid_current
--------------
         4875
(1 row)
forumdb=> UPDATE...

How PostgreSQL handles persistency and consistency: WALs

In the previous sections, you have seen how to interact with transactions and, most notably, how PostgreSQL executes every statement within a transaction, either explicitly or implicitly.

PostgreSQL does, internally, very complex work to ensure that consolidated data in storage reflects the status of the committed transactions. In other words, data can be considered consolidated only if the transaction that produced (or modified) it has been successfully committed. But this also means that, once a transaction has been successfully committed, its data is “safe” on storage, no matter what happens in the future: if a transaction is reported to be successful, its data must be made persistent, even if the database or the whole system crashes.

PostgreSQL manages transactions and data consolidations by means of WALs. This section introduces you to the concept of WALs and their use within PostgreSQL.

WALs

...

VACUUM

In the previous sections, you learned how PostgreSQL exploits MVCC to store different versions of the same data (tuples) that different transactions can perceive, depending on their active snapshot. However, keeping different versions of the same tuples requires extra space with regard to the last active version, and this space could fill your storage sooner or later. To prevent that, and reclaim storage space, PostgreSQL provides an internal tool named VACUUM, the aim of which is to analyze stored tuple versions and remove the ones that are no longer perceivable.

Remember: a tuple is not perceivable (visible) when there are no more active transactions that can reference the version, which means having the tuple version within their snapshot. A not-perceivable tuple is often called a dead tuple, marking the fact that it is not required anymore in the database life cycle.

VACUUM can be an I/O-intensive operation, since it must reclaim and free disk space...

Summary

PostgreSQL exploits MVCC to provide high concurrent access to underlying data, and this means that every transaction perceives a snapshot of the data while the system keeps different versions of the same tuples. Sooner or later, invalid tuples will be removed, and storage space will be reclaimed. On one hand, MVCC provides better concurrency, but on the other hand, it requires extra effort to reclaim the storage space once transactions no longer reference dead tuples. PostgreSQL provides VACUUM for this aim and also has a background process machinery, named autovacuum, to periodically and non-invasively keep a system clean and healthy.

In order to improve I/O and reliability, PostgreSQL stores data in a journal written sequentially, the WAL. The WAL is split into segments, and at particular time intervals, named checkpoints, all the dirty data in memory is forced to a specified position in the storage, and the WAL segments are recycled.

In this chapter, you have learned...

Verify your knowledge

  • What is a transaction?

    A transaction is a unit of work that is either consolidated or discarded as a whole. A transaction can be made by a single statement or multiple statements and can be implicit or explicit. See the Introducing transactions section for more details.

  • What is an xid and to which problem is it subject?

    An xid is a transaction identifier, a number that uniquely represents a transaction within the whole cluster. Being stored as a counter, the value is subject to the so-called problem of xid wraparound, which VACUUM and autovacuum freezing solve. See the More about transaction identifiers – the XID wraparound problem section for more details.

  • What is MVCC?

    MVCC is a technique by which, at a given instant, multiple versions of a tuple can exist within a database. Depending on the currently running transactions and their commit status, a different version is used. See the Explaining MVCC...

References

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://discord.gg/jYWCjF6Tku

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Learn PostgreSQL - Second Edition
Published in: Oct 2023Publisher: PacktISBN-13: 9781837635641
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

author image
Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi