You're reading from Learn PostgreSQL - Second Edition

Product typeBook

Published inOct 2023

PublisherPackt

ISBN-139781837635641

Edition2nd Edition

Concepts

Data Analysis

Authors (2):

Luca Ferrari

Enrico Pirozzi

View More author details

Transactions, MVCC, WALs, and Checkpoints

This chapter introduces you to transactions, a fundamental part of every enterprise-level database system. Transactions are a way for a database to manage multiple operations, making them as though they were a single atomic operation. PostgreSQL has very rich and standard-compliant transaction machinery that allows users to specifically define transaction properties, including nested transactions.

PostgreSQL relies heavily on transactions to keep data consistent across concurrent connections and parallel activities, and thanks to Write-Ahead Logs (WALs), PostgreSQL does its best to keep the data safe and reliable. Moreover, PostgreSQL implements Multi-Version Concurrency Control (MVCC), a way to maintain high concurrency among transactions.

The chapter can be split into two parts: the first part is more practical and provides concrete examples of what transactions are, how to use them, and how to understand MVCC. The second part is...

Technical requirements

In order to proceed, you need to know the following:

How to issue SQL statements via psql
How to connect to the cluster and a database
How to check and modify the cluster configuration

The chapter examples are available in the book’s code repository and can be run on the standalone Docker image that you can find in the book’s GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition.

Introducing transactions

A transaction is an atomic unit of work that either succeeds or fails. Transactions are a key feature of any database system and are what allows a database to implement the properties: Atomicity, Consistency, Isolation, and Durability (ACID). Altogether, the ACID properties mean that the database must be able to handle units of work whole (atomicity), store data in a permanent way (durability), without inter-mixed changes to the data (consistency), and in a way that concurrent actions are executed as if they were alone (isolation).

You can think of a transaction as a bunch of related statements that, ultimately, will either all succeed or fail. Transactions are everywhere in a database, and you will have already used them even if you did not realize it: function calls, single statements, and so on are executed in a (tiny) transaction block. In other words, every action you issue against the database is executed within a transaction, even if you did not...

Transaction isolation levels

In a concurrent database system, you could encounter three different problems:

Dirty reads: A dirty read happens when the database allows a transaction to see work-in-progress data from other not-yet-finished transactions. In other words, data that has not been consolidated is visible to other transactions. No production-ready database allows that, and PostgreSQL is no exception: you are assured your transaction will only perceive data that has been consolidated, and in order to be consolidated, the transactions that created such data must be complete.
Unrepeatable reads: An unrepeatable read happens when the same query, within the same transaction, executed multiple times, perceives a different set of data. This essentially means that the data has changed between two sequential executions of the same query in the same transaction. PostgreSQL does not allow this kind of problem by means of snapshots: every transaction can perceive the...

Explaining MVCC

xmin is only a part of the story of managing MVCC. PostgreSQL labels every tuple in the database with four different fields, named xmin (already described), xmax, cmin, and cmax. Similar to what you learned about xmin, in order to make those fields appear in a query result, you need to explicitly reference them – for instance:

forumdb=> SELECT xmin, xmax, cmin, cmax, * FROM tags ORDER BY tag;
 xmin | xmax | cmin | cmax | pk | tag  | parent
------+------+------+------+----+------+--------
 4854 |    0 |    0 |    0 | 24 | c++  |
 4853 |    0 |    0 |    0 | 23 | java |
 4852 |    0 |    0 |    0 | 22 | perl |
 4855 |    0 |    0 |    0 | 25 | unix |
(4 rows)

The meaning of xmin has been already described in a previous section: it indicates the transaction identifier of the transaction that created the tuple. The xmax field, on the other hand, indicates the xid of the transaction that invalidated the tuple, for example, because it has deleted the data...

Savepoints

A transaction is a block of work that must either succeed or fail as a whole. A savepoint is a way to split a transaction into smaller blocks that can be rolled back independently of each other. Thanks to savepoints, you can divide a big transaction (one transaction with multiple statements) into smaller chunks, allowing a subset of the bigger transaction to fail without having the overall transaction fail. PostgreSQL does not handle transaction nesting, so you cannot issue a nested set of BEGIN or COMMIT/ROLLBACK statements. Savepoints, however, allow PostgreSQL to mimic the nesting of transaction blocks.

Savepoints are marked with a mnemonic name, which you can use to commit or roll back. The name must be unique within the transaction, and if you reuse the same over and over, the previous savepoints with the same name will be discarded. Let’s see an example:

forumdb=> BEGIN;
BEGIN
forumdb=> INSERT INTO tags( tag ) VALUES ( 'Eclipse IDE'...

Deadlocks

A deadlock is an event that happens when different transactions depend on each other in a circular way. Deadlocks are, to some extent, normal events in a concurrent database environment and nothing an administrator should worry about, unless they become extremely frequent, meaning there is some dependency error in the applications and the transactions.

When a deadlock happens, there is no choice but to terminate the locked transactions. PostgreSQL has a very powerful deadlock detection engine that does exactly this job: it finds stalled transactions and, in the case of a deadlock, terminates them (producing a ROLLBACK).

In order to produce a deadlock, imagine two concurrent transactions applying changes to the very same tuples in a conflicting way. For example, the first transaction could do something like the following:

-- session 1
forumdb=> BEGIN;
BEGIN
forumdb=> SELECT txid_current();
 txid_current
--------------
         4875
(1 row)
forumdb=> UPDATE...

How PostgreSQL handles persistency and consistency: WALs

In the previous sections, you have seen how to interact with transactions and, most notably, how PostgreSQL executes every statement within a transaction, either explicitly or implicitly.

PostgreSQL does, internally, very complex work to ensure that consolidated data in storage reflects the status of the committed transactions. In other words, data can be considered consolidated only if the transaction that produced (or modified) it has been successfully committed. But this also means that, once a transaction has been successfully committed, its data is “safe” on storage, no matter what happens in the future: if a transaction is reported to be successful, its data must be made persistent, even if the database or the whole system crashes.

PostgreSQL manages transactions and data consolidations by means of WALs. This section introduces you to the concept of WALs and their use within PostgreSQL.

WALs

...

VACUUM

In the previous sections, you learned how PostgreSQL exploits MVCC to store different versions of the same data (tuples) that different transactions can perceive, depending on their active snapshot. However, keeping different versions of the same tuples requires extra space with regard to the last active version, and this space could fill your storage sooner or later. To prevent that, and reclaim storage space, PostgreSQL provides an internal tool named VACUUM, the aim of which is to analyze stored tuple versions and remove the ones that are no longer perceivable.

Remember: a tuple is not perceivable (visible) when there are no more active transactions that can reference the version, which means having the tuple version within their snapshot. A not-perceivable tuple is often called a dead tuple, marking the fact that it is not required anymore in the database life cycle.

VACUUM can be an I/O-intensive operation, since it must reclaim and free disk space...

Summary

PostgreSQL exploits MVCC to provide high concurrent access to underlying data, and this means that every transaction perceives a snapshot of the data while the system keeps different versions of the same tuples. Sooner or later, invalid tuples will be removed, and storage space will be reclaimed. On one hand, MVCC provides better concurrency, but on the other hand, it requires extra effort to reclaim the storage space once transactions no longer reference dead tuples. PostgreSQL provides VACUUM for this aim and also has a background process machinery, named autovacuum, to periodically and non-invasively keep a system clean and healthy.

In order to improve I/O and reliability, PostgreSQL stores data in a journal written sequentially, the WAL. The WAL is split into segments, and at particular time intervals, named checkpoints, all the dirty data in memory is forced to a specified position in the storage, and the WAL segments are recycled.

In this chapter, you have learned...

Verify your knowledge

What is a transaction?
A transaction is a unit of work that is either consolidated or discarded as a whole. A transaction can be made by a single statement or multiple statements and can be implicit or explicit. See the Introducing transactions section for more details.

What is an xid and to which problem is it subject?
An xid is a transaction identifier, a number that uniquely represents a transaction within the whole cluster. Being stored as a counter, the value is subject to the so-called problem of xid wraparound, which VACUUM and autovacuum freezing solve. See the More about transaction identifiers – the XID wraparound problem section for more details.

What is MVCC?
MVCC is a technique by which, at a given instant, multiple versions of a tuple can exist within a database. Depending on the currently running transactions and their commit status, a different version is used. See the Explaining MVCC...

References

PostgreSQL transaction isolation levels – official documentation: https://www.postgresql.org/docs/current/sql-set-transaction.html
PostgreSQL transaction isolation level SERIALIZABLE – official documentation: https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE
PostgreSQL savepoints – official documentation: https://www.postgresql.org/docs/current/sql-savepoint.html
PostgreSQL VACUUM – official documentation: https://www.postgresql.org/docs/current/sql-vacuum.html

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://discord.gg/jYWCjF6Tku

The rest of the chapter is locked

You have been reading a chapter from

Learn PostgreSQL - Second Edition

Published in: Oct 2023Publisher: PacktISBN-13: 9781837635641

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (2)

Luca Ferrari

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master's degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an Adjunct Professor by Nipissing University. An avid Unix user, he is a strong advocate of open source, and in his free time, he collaborates with a few projects. He met PostgreSQL back in release 7.3; he was a founder and former president of the Italian PostgreSQL Community (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Read more about Luca Ferrari

Enrico Pirozzi

Enrico Pirozzi, EnterpriseDB certified on implementation management and tuning, with a master's in computer science, has been a PostgreSQL DBA since 2003. Based in Italy, he has been providing database advice to clients in industries such as manufacturing and web development for 10 years. He has been training others on PostgreSQL since 2008. Dedicated to open source technology since early in his career, he is a cofounder of the PostgreSQL Italian mailing list, PostgreSQL-it, and of the PostgreSQL Italian community site, PSQL
Read more about Enrico Pirozzi

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages