Reader small image

You're reading from  PostgreSQL 14 Administration Cookbook

Product typeBook
Published inMar 2022
PublisherPackt
ISBN-139781803248974
Edition1st Edition
Concepts
Right arrow
Authors (2):
Simon Riggs
Simon Riggs
author image
Simon Riggs

Simon Riggs is the CTO of 2ndQuadrant, having contributed to PostgreSQL as a major developer and committer for 14 years. He has written and designed features for replication, performance, BI, management, and security. Under his guidance, 2ndQuadrant is now a leading developer of open source PostgreSQL, serving hundreds of clients in USA, Europe, and worldwide. Simon is a frequent speaker at many conferences on PostgreSQL Futures. He has worked as a database architect for 30 years.
Read more about Simon Riggs

Gianni Ciolli
Gianni Ciolli
author image
Gianni Ciolli

Gianni Ciolli is the Vice President for Solutions Architecture at EnterpriseDB (EDB). As a PostgreSQL consultant, he has driven many successful enterprise deployments for customers in every part of the globe.Gianni is respected worldwide as a popular speaker and trainer at many PostgreSQL conferences in Europe and abroad over the last 14 years. He has worked with free and open-source software since the 1990s as an active member of the community (Prato Linux User Group, and Italian PostgreSQL Users Group). Gianni has a Ph.D. in Mathematics from the University of Florence. He lives in London with his son. His other interests include music, drama, poetry and athletics.
Read more about Gianni Ciolli

View More author details
Right arrow

Quickly estimating the number of rows in a table

We don't always need an accurate count of rows, especially on a large table that may take a long time to execute. Administrators often need to estimate how big a table is so that they can estimate how long other operations may take.

How to do it…

The Postgres optimizer can provide a quick estimate of the number of rows in a table simply by using its statistics:

EXPLAIN SELECT * FROM mytable;
                          QUERY PLAN                            
----------------------------------------------------------------
 Seq Scan on mytable  (cost=0.00..2640.00 rows=100000 width=97)
(1 row)

We can directly compute a similar number using roughly the same calculation:

SELECT (CASE WHEN reltuples > 0 THEN pg_relation_size(oid)*reltuples/(8192*relpages) 
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

This gives us the following output:

 estimated_row_count
---------------------
               99960
(1 row)

Both queries return a row count very quickly, no matter how large the table that we are examining is, because they use statistics that were collected in advance. You may want to create a SQL function for the preceding calculation, so you won't need to retype the SQL code every now and then.

The following function estimates the total number of rows using a mathematical procedure called extrapolation. In other words, we take the average number of bytes per row resulting from the last statistics collection, and we apply it to the current table size:

CREATE OR REPLACE FUNCTION estimated_row_count(text) 
RETURNS bigint 
LANGUAGE sql 
AS $$ 
SELECT (CASE WHEN reltuples > 0 THEN 
              pg_relation_size($1)*reltuples/(8192*relpages) 
             ELSE 0 
             END)::bigint 
FROM pg_class 
WHERE oid = $1::regclass; 
$$;

How it works…

We saw the pg_relation_size() function earlier, so we know that it brings back an accurate value for the current size of the table.

When we vacuum a table in Postgres, we record two pieces of information in the pg_class catalog entry for the table. These two items are the number of data blocks in the table (relpages) and the number of rows in the table (reltuples). Some people think they can use the value of reltuples in pg_class as an estimate, but it could be severely out of date. You will also be fooled if you use information in another table named pg_stat_user_tables, which is discussed in more detail in Chapter 10Performance and Concurrency.

The Postgres optimizer uses the relpages and reltuples values to calculate the average rows per block, which is also known as the average tuple density.

If we assume that the average tuple density remains constant over time, then we can calculate the number of rows using this formula: Row estimate = number of data blocks * rows per block.

We include some code to handle cases where the reltuples or relpages fields are zero. The Postgres optimizer actually works a little harder than we do in that case, so our estimate isn't very good.

The WHERE oid = 'mytable'::regclass; syntax introduces the concept of object identifier types. They just use a shorthand trick to convert the name of an object to the object identifier number for that object. The best way to understand this is to think of that syntax as meaning the same as a function named relname2relid().

There's more…

The good thing about the preceding recipe is that it returns a value in about the same time, no matter how big the table is. The bad thing about it is that pg_relation_size() requests a lock on the table, so if any other user has an AccessExclusiveLock lock on the table, then the table size estimate will wait for the lock to be released before returning a value.

Err... so what is an AccessExclusiveLock lock? While performing a SQL maintenance action, such as changing the data type of a column, PostgreSQL will lock out all other actions on that table, including pg_relation_size, which takes a lock in the AccessShareLock mode. For me, a typical case is when I issue some form of SQL maintenance action, such as ALTER TABLE, and the statement takes much longer than I thought it would. At that point, I think, Oh, was that table bigger than I thought? How long will I be waiting? Yes, it's better to calculate that beforehand, but hindsight doesn't get you out of the hole you are in right now. So, we need a way to calculate the size of a table without needing the lock.

A solution is to look at the operating system files that Postgres uses to store data, and figure out how large they are, but that requires a high level of security than most people usually allow. In any case, looking at files without a lock could cause problems if the table were dropped or changed.

Previous PageNext Page
You have been reading a chapter from
PostgreSQL 14 Administration Cookbook
Published in: Mar 2022Publisher: PacktISBN-13: 9781803248974
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Simon Riggs

Simon Riggs is the CTO of 2ndQuadrant, having contributed to PostgreSQL as a major developer and committer for 14 years. He has written and designed features for replication, performance, BI, management, and security. Under his guidance, 2ndQuadrant is now a leading developer of open source PostgreSQL, serving hundreds of clients in USA, Europe, and worldwide. Simon is a frequent speaker at many conferences on PostgreSQL Futures. He has worked as a database architect for 30 years.
Read more about Simon Riggs

author image
Gianni Ciolli

Gianni Ciolli is the Vice President for Solutions Architecture at EnterpriseDB (EDB). As a PostgreSQL consultant, he has driven many successful enterprise deployments for customers in every part of the globe.Gianni is respected worldwide as a popular speaker and trainer at many PostgreSQL conferences in Europe and abroad over the last 14 years. He has worked with free and open-source software since the 1990s as an active member of the community (Prato Linux User Group, and Italian PostgreSQL Users Group). Gianni has a Ph.D. in Mathematics from the University of Florence. He lives in London with his son. His other interests include music, drama, poetry and athletics.
Read more about Gianni Ciolli