An Overview of PostgreSQL

Achim Vannahme

December 2015

In this article by Salahadin Juba, Achim Vannahme, and Andrey Volkov author of the book Learning PostgreSQL, PostgreSQL is a great tool from different perspectives, first of all postgres is an open source software with a very high development activities, commercial and community support, and very broad customer base; due to this, PostgreSQL has a very rich and attractive features; it is secure, reliable, cost effective, and customizable.

(For more resources related to this topic, see here.)

PostgreSQL supports several advanced SQL statements such as window functions, common table expressions, and lateral joins. Advanced SQL statements allow the developers to build very complex logic easily using only SQL language. Also, PostgreSQL has very rich extensions and data types; these extensions falls in different categories such as foreign data wrappers, administration utilities and functions, reporting, geographical information system and so on. PostgreSQL can be easily extended by utilizing PL/pgSQL, SQL and C languages, add to that several programing languages such as PL/Perl and PL/python. PostgreSQL supports several rich data types JSON, XML, ARRAY, and hstore.

From administration point of view, postgres is very easy to install and maintain. Setting up and maintaining a streaming replication is relatively easy. PostgreSQL shipped with different utilities and tools to allow the database administrators to perform logical as well as physical dumps and to maintain different aspect of database health. PostgreSQL configuration settings allow the database administrator to get the best of the used hardware for PostgreSQL cluster.

In this article will an overview about PostgreSQL basic data structures is given.

PostgreSQL identifiers

PostgreSQL object names length is 63 characters; also PostgreSQL follow ANSI SQL regarding case sensitivity. If one wanted, to use camel case in renaming database objects, he could achieve that by double quoting the identifier name. The PostgreSQL identifier names have the following constraints:

  • The identifier name should start with underscore or a letter. Letters can be Latin or non-Latin letters.
  • The identifier name can be composed of letters, digits, underscore and dollar sign. For compatibility reasons, the usage of dollar sign is not recommended.
  • The minimum length of the identifier is typically one and the maximum length is 63.

In addition to above, it is not recommended to use keywords as table names.

PostgreSQL Objects hierarchy             

Understanding PostgreSQL database logical objects organization helps in understanding object relations and interactions. PostgreSQL databases, roles, tablespaces, settings and template languages have the same, and highest level of hierarchy among all the object in the PostgreSQL cluster.

Template databases

By default, when a database is created it is cloned form a template database called Template1. The template database contains a set of tables, views and functions which is used to model the relation between user defined database objects. These tables, views and functions are a part of system catalog schema which is called pg_catalog.

The schema is very close to namespace concept in object oriented languages. It is often used to organize database objects, functionality, security access, or eliminate name collision.

The PostgreSQL server has two template databases:

  • Template1: The default database to be cloned. It can be modified to allow global modification to all newly created databases. For example, if one intends to use a certain extension in all databases, then he could install this extension in Template1 database. Certainly, installing an extension in Template1 will not be cascaded to already existing databases; but, it will affect the databases which will be created after this installation.
  • Template0: A safeguard or version database, it has several purposes as follows:
  • If Template1 is corrupted by a user, then it can be used to fix Template1.
  • It is handy, in restoring a database dump. When a developer dumps a database, all extensions are also dumped. If the extension is already installed in Template1 this will lead to collision because the newly created database already contains the extensions.
  • Unlike Template1, Template0 does not contain encoding-specific or locale-specific data.

One could create a database using a user database as a template; this is very handy for testing, database refactoring purposes, deployment plans and so on.

User databases

One could have as many databases as he wants in a database cluster. A client connection to the PostgreSQL server can access only the data in a single database which specified in the connection string. That means data is not shared within databases, unless postgres foreign data wrapper or dblink extension is used.

Every database in the database cluster has an owner and a set of associated permissions to control allowed actions for a particular role. The psql meta-command \l is used to list all databases in the database cluster with associated attributes.

The database access privileges are

  • Create (C): The create access privilege allows the specified role to create new schema in the database.
  • Connect (-c): When a role tries to connect to database, the connect permissions is checked. Note that, in Template0 the PostgreSQL is allowed only to connect.
  • Temporary (T): The temporary access privilege allows the specified role to crate temporary tables. Temporary tables are very similar to tables but they are not persistent and they are destroyed after the user session is terminated.

Encoding allows you to store text in a variety of character sets including one byte character set such as SQL_ASCII or multiple byte character set such as UTF-8. PostgreSQL supports a rich set of character encodings; for the full list of character encoding http://www.postgresql.org/docs/current/static/multibyte.html.

In addition to the above attribute, a PostgreSQL database has other several attributes for various purposes including:

  • Maintenance: The attribute datfrozenxid is used to determine if a database vacuum is required.
  • Storage management: The dattablespace attribute is used to determine database tablespace.
  • Concurrency: The datconnlimit attribute is used to determine the number of allowed connections. -1 means no limits.
  • Protection: The datallowconn attribute disables connection to a database. This is used mainly to protect Template0 from being altered.

Roles

Roles belong to the PostgreSQL server cluster and not to a certain database. A role can be either a database user or a database group. The role concept subsumes the concepts of users and groups in old PostgreSQL versions. For compatibility reasons with PostgreSQL version 8.1, the CREATE USER and CREATE GROUP SQL commands are still supported.

The roles have several attributes as follows:

  • Super user: A super user role can bypass all permissions checks except login attribute.
  • Login: A role with login attribute can be used to connect to a database by a client.
  • Create database: A role with create database attribute can create databases.
  • Initiating replication: a role with this attribute can be used for streaming replication.
  • Password: The role password can be used with md5 authentication method. Also, it can be encrypted. The password expiration can be controlled by specifying the validity period. Note that, the password differs from the OS password.
  • Connection limit: Connection limit specifies the number of concurrent connection the user can initiate. Connection creation consumes hardware resources; thus, it is recommended to use connection pooling tools such as pgpool-II or pgbouncer or some APIs such as apache DBCP or c3p0.
  • Inherit: If specified, the role will inherit the privileges assigned to the roles it is a member of. If not specified, Inherit is the default.

When a database cluster is created, the postgres super user role is created by default.

CREATE USER is equivalent to CREATE ROLE with LOGIN option and CREATE GROUP is equivalent to CREATE ROLE with NOLOGIN option.

A role can be a member of another role to ease accessing and managing database permissions; for example, one could create a role with no login also known as group and grant it permissions to access database objects. If a new role needs to access the same database objects with the same permissions as the group, the new role could be assigned a membership to this group. This is achieved by the SQL commands GRANT and REVOKE.

Roles of a cluster do not necessarily have the privilege to access every database in the cluster.

Tablespace

Tablespace is a defined storage location for database or database objects. Tablespaces are used by administrators to achieve the following:

  • Maintenance: If the hard disk partition runs out of space where the database cluster is created and cannot be extended, a tablespace on another partition can be created to solve this problem by moving the data to another location.
  • Optimisation: Heavily accessed data could be stored in fast media such as solid-state-drive (SSD). At the same time tables which is not performance critical could be stored on slow disk.

The SQL statement to create tablespace is CREATE TABLESPACE.

Template programming languages

Template programming language is used to register a new language in a convenient way. There are two ways to create a programing language; the first way is by specifying only the programing language name. In this way PostgreSQL consult the programing language template and determine the parameters. The second way is to specify the name as well as parameters. The SQL command to create a language is CREATE ALNGUAGE.

In PostgreSQL versions older than 9.1, create extension can be used to install a programming language.

Settings

PostgreSQL setting controls different aspect of PostgreSQL server including replication, write ahead logs, resource consumption, query planning, logging, authentication, statistic collection, garbage collection, client connections, lock management, error handling and debug options.

The SQL command below shows the number of PostgreSQL settings.

postgres=# SELECT count(*) FROM pg_settings;
count
-------
   219
(1 row)

Setting parameters

The setting names are case-insensitive, the setting value domains can be:

  • Boolean: 0, 1, true, false, on, off or any case-insensitive form of the previous values. ENABLE_SEQSCAN setting falls in this category.
  • Integer: An integer might specify a memory or time value; there is an implicit unit for each setting such as second or minute. In order to remove confusion PostgreSQL allows units to be specified. For example, one could specify 128 MB as a shared_buffers setting value.
  • Enum: predefined values such as ERROR and WARNNING.
  • Floating point: cpu_operator_cost has a floating point domain. cpu_operator_cost is used to optimize PostgreSQL execution plans.
  • String: A string might be used to specify file location on a hard disk such as the location of authentication file.

Setting context

The setting context determines how to change a setting value and when the change can take effect, the setting contexts are:

  • internal: The setting cannot be changed directly. One might need to recompile the server source code or initialize the database cluster to change this. For example, PostgreSQL identifiers length is 63 characters.
  • postmaster: Changing a setting value requires restarting the server. Values for these setting are typically stored in the PostgreSQL postgresql.conf file.
  • SIGHUP: No server restart is required. The setting change can be made by amending the postgresql.conf followed by sending a SIGHUP signal to postgres server process.
  • Backend: No server restart is required; they also can be set for a particular session.
  • superuser: Only a super user can change this setting, the setting can be set in postgresql.conf or via SET command.
  • user: Similar to superuser, typically used to change session-local values.

PostgreSQL provides SET and SHOW commands to change and inspect a setting parameter values respectively. Those commands are used to change setting parameters with superuser and user context. Typically, changing the value of a setting parameter in the postgresql.conf file makes the effect global. For example, let us assume you would like some clients to be able to perform read only operation; this is useful to configure some tools such as Confluence (Atlassian). In this case we could achieve that by setting the default_transaction_read_only parameter.

postgres=# SET default_transaction_read_only to on;
SET
postgres=# CREATE TABLE test_readonly AS SELECT 1;
ERROR: cannot execute SELECT INTO in a read-only transaction

In the preceding example, the creation of a table has failed within the opened session; however, if one opens new session and tries to execute the command CREATE TABLE, it will be executed successfully because the default value of default_transaction_read_only setting is off. Setting the default_transaction_read_only parameter in the postgresql.conf will have a global effect.

PostgreSQL provides pg_reload_conf () function, this function is equivalent to sending SIGHUP signal to postgres process.

PostgreSQL database components

A PostgreSQL database could be considered as a container for database schema; the database must contain at least one schema. A database schema is used to organize database objects in a similar way to namespaces in high programing languages.

Schema

Object names can be reused in different schema without conflict. Schema contain all database named objects including tables, views, functions, aggregates, indexes, sequences, triggers, data types, domains and ranges.

By default there is a schema called public in the template databases; that means, all newly created databases contain also this schema. All users by default can access this schema implicitly; again this is inherited from template databases. Allowing this access pattern stimulates the situation where the server is not schema aware; this is useful in small companies where there is no need to have a complex security. Also, this enables smooth transition from non-schema-aware databases.

Warning: in a multi-user and multi database environment setup remember to revoke the ability for all users to create objects in the public schema. This is done by the following command in the newly created database or in Template1 database.

REVOKE CREATE ON SCHEMA public FROM PUBLIC.

When a user wants to access a certain object he needs to specify the schema name and the object name separated by period(.); for example, to select all entries in pg_database in the pg_catalog schema, one need to write the following command.

SELECT * FROM pg_catalog.pg_database;
-- or
TABLE pg_catalog.pg_database;

Qualified database object names are sometimes tedious to write; so, many developers prefer to use the unqualified object name which is composed only the object name without the schema. PostgreSQL provides a search_path setting which is similar to using directive in C++ language. The search path is composed of schemas which used by the server to search for the object. The default search path as shown below is $user, public. If there is a schema with the same name as the user, then it will be used first to search for objects or creating new objects. If the object was not found in the schemas specified in the search_path then an error will be thrown.

SHOW search_path;
--------------
$user,public

Schema usages

Schema is used for several reasons as explained below:

  • Control authorization: in a multi-user database environment one could use schemas to group objects based on roles.
  • Organize database objects: one could organize database objects in groups based on the business logic. For example, historical and auditing data could be logically grouped and organized in a specific schema.
  • Maintain third party SQL code: extensions available in the contribution package can be used with several applications. Maintaining these extensions in separate schemas enables the developer to reuse these extensions and to update them easily.

For more information about CREATE SCHEMA command syntax, one could use psql \h meta-command or visit PostgreSQL manual http://www.postgresql.org/docs/current/static/sql-createschema.html.

Tables

The CREATE TABLE SQL statement is very rich; it can be used for several purposes such as cloning a table which is handy in database refactoring to create uninstallation script to rollback changes. Also, it could be used to materialize the result of SELECT SQL statement to boost performance, or temporarily storing the data for later use.

PostgreSQL tables are used internally to model views and sequences.

In PostgreSQL tables can have different types as follow:

  • Permanent table: The table life cycle starts with table creation and ends with the table dropping.
  • Temporary table: The table life cycle is the user session. This is used often with procedural languages to model some business logic.
  • Unlogged table: Operations on unlogged table are much faster than permanent tables because data is not written into the WAL files. Unlogged table are not crash-safe. Also, since streaming replication is based on shipping log files, unlogged tables cannot be replicated to the slave node.
  • Child table: A child table is a table which inherits one or more tables. The inheritance is often used with constraint exclusion to physically partition data on the hard disk and gain performance in retrieving subset of data which have a certain value.

The create table syntax is quite long, the full syntax of create table can be found on http://www.postgresql.org/docs/current/static/sql-createtable.html. The create table SQL commands normally requires the following input:

  • Table name of the created table.
  • The table type.
  • The table storage parameters, these parameters are used to control table storage allocation and other several administrative tasks.
  • The table columns including the data type, default values and constraint.

PostgreSQL native data types

When designing a database table one should take care in picking the appropriate data type, when database goes to production changing the data type of a column might become very costly operation specifically for heavily loaded tables. The cost comes often form locking the table and in some cases rewriting it. When picking a data type, consider a balance between the following factors:

  • Extensibility: Can the maximum length of a type increased or decreased without a full table rewrite, and a full table scan.
  • Data type size: Going for a safe option such as choosing big integer instead of integer will cause more storage consumption.

PostgreSQL provides a very extensive set data type, some of the native data type categories are:

  • Numeric type
  • Character type
  • Date and time type

The above data types are almost common for all relational databases; also, often they are sufficient to model traditional applications.

Views

A view can be seen as a named query or as a wrapper around a SELECT statement. Views are an essential building block for relational database, from UML modeling perspective, a view can be thought as a method for a UML class. Views share several advantages with procedures, so the following benefits are shared among views as well as stored procedures, views can be used to

  • Simplify complex queries and increase code modularity
  • Tune performance by cashing the view results for later use
  • Decrease the amount of SQL code
  • Bridge between relational database and OO languages—especially updatable views
  • Implement authorization on row level by opting out rows which do not meet a certain predicate
  • Implement interfaces and abstraction layer between high level languages and Relational databases
  • Implement a last minute changes without redeploying the software

Unlike stored procedures, the views dependency tree is maintained in the database, thus altering a view might be forbidden due to cascading effect.

The way views are designed are essential, here is a list of general advices to handle views properly.

In PostgreSQL, one cannot find a dangling view due to the maintenance of the view dependency tree. However, for stored procedure this might happen.

  • The view definition should be crisp: The view should meet current business need but not potential future business needs. It should be designed to provide a certain functionality or service. Note that, the more attributes in the view, the more effort required to re-factor this view.  In addition to that, when the view aggregates data from many tables and used as an interface, performance degradation might appear due to many factors such as: bad execution plans due to outdated statistics for some tables, and execution plan time generation.
  • Views dependency: When having complex business logic implemented in the database using views and stored procedure, then database re-factoring, mainly for base tables might be very expensive. To solve this issue, consider to migrating the business logic to the application business tier.
  • Take care of business tier needs: Some frameworks such as object relational mappers might require specific needs such as unique key -row number-.

In PostgreSQL, the view is internally modeled as table with a _RUTURN rule. So, the following two pieces of code are equivalent:

CREATE VIEW test AS SELECT 1 AS one;

CREATE TABLE test (one INTEGER);
CREATE RULE "_RETURN" AS ON SELECT TO test DO INSTEAD
   SELECT 1;

The preceding example, is for explanation only, it is not a recommended to play with PostgreSQL catalogue manually including reserved RULEs. Also, note that a table can be converted to a view and not vice versa.

When one creates views, the created tables are used to maintain the dependency between the created views; and the queries which are used to define the views are the one which executed when one SELECT from the view. So when executing the query:

SELECT * FROM test;

We actually execute:

SELECT * FROM(SELECT 1) AS test;

To understand views dependency, let us build a view using another view as follows:

--date_trunc function is similar to trunc function for numbers,
CREATE VIEW day_only AS
SELECT date_trunc('day', now()) AS day;

CREATE VIEW month_only AS
SELECT date_trunc('month', day_only.day)AS month FROM day_only;

The preceding views month_only and day_only are truncating the time to day and month respectively. The month_only view depends on day_only. In order to drop the day_only view one could achieve this by one of the following options as shown as follows:

  • Drop first the month_only view then the date_only view:
    example=# DROP VIEW day_only;
    ERROR: cannot drop view day_only because other objects depend on it
    DETAIL: view month_only depends on view day_only
    HINT: Use DROP ... CASCADE to drop the dependent objects too.
    example=# DROP VIEW month_only;
    DROP VIEW
    example=# DROP VIEW day_only;
    DROP VIEW
  • Use CASCADE option when dropping the view:
    example=# DROP VIEW day_only CASCADE;
    NOTICE: drop cascades to view month_only
    DROP VIEW

Indexes

Index is a physical database objects which is defined on a table column or list of columns. In PostgreSQL, there are many types of indexes and several ways to use them, indexes can be used in general to:

  • Optimize performance, index allows an efficient retrieval of small number or rows from the table, small number is often determined by the table number of rows.
  • Validate constraint, unlike check constraint; index can be used to validate constraints on several rows. For example, the UNIQUE check constraint creates behind the scene a unique index on the column.

Index types

PostgreSQL support different indexes, each index can be used for a certain scenario or a data type:

  • B-Tree index: This is the default index in postgres when the index type is not specified with CREATE INDEX command. The "B" stands for balanced, that means the data on both side of the tree is roughly equal. B-tree can be used for equality, ranges, and null predicates. B-Tree index supports all postgres data types.
  • Hash indexes: The hash indexes are not well supported in PostgreSQL. They are not transactional safe and not replicated to slave nodes in streaming replication. They are useful for equality predicates, but since B-tree index can support this use case and they are not well supported; it is not recommended to use them.
  • Generalized inverted index (GIN): The GIN index is useful when several values should map to one row, it can be used with complex data structure such as arrays and full-text search.
  • Generalized Search Tree (GiST): The GiST indexes allow building general balanced tree structures. They are useful in indexing the geometric data types, as well as full-text search.
  • Block range index (BRIN): This will be introduced in PostgreSQL 9.5, BRIN index is useful for very large tables where size is limited. BRIN index is slower than B-tree index but requires less space comparing to B-tree.

Partial index

Partial index indexes only subset of the table data which meets a certain predicate; the WHERE clause is used with the index. The idea behind the partial index is to decrease the index size, thus making it more maintainable and faster to access

Indexes on expressions

As said earlier, an index could be created on a table column or multiple columns; also, it can be created on expressions and function results. For example, when using lower (first_name) function the value of the function is not determined unless the function is executed, thus the index cannot be used. To solve this, an index on lower function could be used as follows and the lower function then could be used in a query.

CREATE index ON account(lower(first_name));
SELECT * FROM account WHERE lower(first_name) = 'foo';

Another usage of expression indexes is to find rows by casting a data type to another data type. For example, the birth of a child could be stored as a timestamp; however, we often search for a birth date not birth time.

Unique index

Unique index guarantees the uniqueness of a certain value in a row across the whole table.

In PostgreSQL renaming conventions, the suffixes for unique and normal indexes are key and idx respectively.

One could also use unique and partial indexes together. For example, let us assume we have a table called employee where each employee must have a supervisor except for the company head. We could model this as self-referencing table as follows:

CREATE TABLE employee (employee_id INT PRIMARY KEY, supervisor_id INT);
ALTER TABLE employee ADD CONSTRAINT supervisor_id_fkey FOREIGN KEY (supervisor_id) REFERENCES employee(employee_id);

To guarantee only one row is not assigned to a supervisor we could add the following unique index:

CREATE UNIQUE INDEX ON employee ((1)) WHERE supervisor_id IS NULL;

The unique index on the constant expression (1) will allow only one row with a null value. With the first insert of a null value a unique index will be built using the value 1, a second attempt to add a row with a null value will cause an error because the value 1 is already indexed.

Multicolumn index

Multi-column index can be used for a certain pattern of query conditions. If a query has a following pattern as:

SELECT * FROM table WHERE column1 = constant1 and column2= constant2 AND … columnn = constantn;

Then an index could be created on column1, column2,…, columnn where n is less or equal 32 . Currently only B-tree, GIN and GiST support multi-column indexes. When creating a multicolumn index the column order is important. A multicolumn index can be used to retrieve data if the leftmost columns are used with equality expressions and inequality expressions on the first column that does not have equality expressions.

Since multicolumn index size is often big, the planner might prefer to perform a sequential scan rather than reading the index.

Functions

A PostgreSQL functions is used to provide a distinct service and often composed of a set of declarations, expressions, and statements. PostgreSQL has a very rich built in functions almost for all existing data types. In this chapter we will focus on user defined functions and we will not give details about the syntax and function parameters since this will be covered in the next chapters

PostgreSQL native programming languages

PostgreSQL supports out of the box user-defined function to be written in C and SQL and PL/pgSQL. There are also three other procedural languages come with the standard PostgreSQL distribution: PL/Tcl, PL/Python, and PL/Perl; but, one needs to create the language in order to use them via CREATE EXTENSION PostgreSQL command or via createlang utility tool.

This simplest way to create a language and make it accessible to all databases is to create it in Template1 directly after the PostgreSQL cluster installation; note that one do not need to do this step for C, SQL and PL/pgSQL.

For beginners, the most convenient languages to use are SQL and PL/pgSQL since they are supported directly; also they are highly portable and do not need special care during upgrading the PostgreSQL cluster. Creating functions in C language is not as easy as creating it in SQL or PL/pgSQL; but, since C language is a general programing language one could use it to create very complex functions to handle complex data types such as images.

PostgreSQL function usages

PostgreSQL can be used in several scenarios, for example some developers use functions as an abstract interface with higher programming languages to hide the data model. Also, functions can have several other usages including:

  • Perform complex logic that is difficult to perform with SQL queries.
  • Dynamic SQL where function argument can be used to pass table and views names via EXECUTE statement.
  • Perform actions before or after the execution of SQL statement via the trigger system.
  • Perform exception handling and additional logging via EXCEPTION blocks and RASIE statement respectively.
  • Clean the SQL code by reusing the common code and bundle SQL codes in modules.

PostgreSQL function dependency

When using PostgreSQL function, one needs to be careful not ending with dangling functions since the dependency between functions in not well maintained in PostgreSQL system catalogue. The following example shows how one could end up with a dangling function:

CREATE OR REPLACE FUNCTION test_dep (INT) RETURNS INT AS $$
BEGIN
RETURN $1;
END;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION test_dep_2(INT) RETURNS INT AS
$$
BEGIN
RETURN test_dep($1);
END;
$$
LANGUAGE plpgsql;
DROP FUNCTION test_dep(int);
SELECT test_dep_2 (5);

ERROR: function test_dep(integer) does not exist
LINE 1: SELECT test_dep($1)
               ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
QUERY: SELECT test_dep($1)
CONTEXT: PL/pgSQL function test_dep_2(integer) line 3 at RETURN

In the preceding example, two functions where created one depend on the other. The test_dep function was dropped leaving test_depend_2 as a dangling function.

PostgreSQL function categories

When creating a function, by default it is marked as VOLATILE, if the volatility classification is not specified. If the created function is not VOLATILE, it is important to mark it as stable or immutable because this will help the optimizer to generate the optimal exaction plans. PostgreSQL functions can have one of three volatility classifications:

  • VOLATILE: The volatile function can return a different result on successive calls even if the function argument did not change or it can change the data in the database.
  • STABLE and IMMUTABLE: These functions cannot modify the database and is guaranteed to return the same result for the same argument. The stable function provides this guarantee within statement scope, while immutable function provides this guarantee globally without any scope.

For example, the function random() is volatile; since, it will give a different result for each call. The function round() is immutable because it will give the same result for the same argument always. The functions current time is stable since it will give always the same result within statement or transaction.

PostgreSQL anonymous functions

PostgreSQL provide DO statement which is used to execute anonymous code block. The DO statement reduces the needs to create shell script for administration purposes. However, one should note that in all PostgreSQL function are transactional, so if one would like to create for example indexes on partitioned tables shell scripting is a better alternative.

PostgreSQL user defined datatypes

PostgreSQL provides two methods for implementing user defined data type through the following commands:

  • CREATE DOMAIN: Create domain command allows developer to create a user defined data types with constraints, this help in making the source code more modular.
  • CREATE TYPE: Create type is used often to create composite type which is useful in procedural languages and used as the return data type. Also, one can use create type to create ENUM type which is useful in decreasing the number of joins specifically for lookup tables.

Often developers tend not to use user defined data types and use flat tables instead due to lack of support on the driver side such JDBC and ODBC. Still, in JDBC composite data types can be retried as a Java object and parsed manually.

PostgreSQL CREATE DOMAIN command

Domain is a data type with optional constraints, as other database objects it should have a unique name within the schema scope.

The first use case of domains is to use it for common patterns. For example, a text type which does not allow null values and does not contain spaces is a common pattern:

CREATE DOMAIN text_without_space_and_null AS TEXT NOT NULL CHECK (value !~ '\s');

In order to test the text_without_space_and_null let us use it in a table definition and execute several insert statements as follows:

CREATE TABLE test_domain (
test_att text_without_space_and_null
);
INSERT INTO test_domain values ('hello');
INSERT INTO test_domain values ('hello with space');
-- This error is raised: ERROR: value for domain text_without_space_and_null violates check constraint "text_without_space_and_null_check"

INSERT INTO test_domain values (NULL);
-- This error is raised: ERROR: domain text_without_space_and_null does not allow null values

Another good use case for creating domains is to create distinct identifiers across several tables; since some people tend to used numbers instead of names to retrieve information.

One could do that by creating a sequence and wrap it with a domain.

CREATE SEQUENCE global_id_seq;
CREATE DOMAIN global_id INT DEFAULT NEXTVAL('global_id_seq') NOT NULL;

Finally, one could alter the domain using ALTER DOMAIN command. If a new constraint is added to the domain, it will cause all attributes using this domain to be validated against the new constraint. One can control this by suppressing the constraint validation on old values and then cleaning up the tables individually. For example, let us assume we would like to have a constraint on the text length of the text_without_space_and_null domian, this could be done as follows:

ALTER DOMAIN text_without_space_and_null ADD CONSTRAINT text_without_space_and_null_length_chk check (length(value)<=15);

PostgreSQL CREATE TYPE command

Composite data type are very useful in creating function specially wen the return type is a row of several values. For example, let us assume we would like to have a function that returns the seller_id, seller_name, number of advertisement and the total rank for a certain customer account. The first step is to create a type as follows:

CREATE TYPE seller_information AS (seller_id INT, seller_name TEXT, number_of_advertisements BIGINT, total_rank float);

Then we can use the newly created data type as the return type of the function as follows:

CREATE OR REPLACE FUNCTION seller_information (account_id INT ) RETURNS seller_information AS
$$
SELECT
   seller_id,
   first_name || last_name as seller_name,
   count(*),
   sum(rank)::float/count(*)
FROM
   account INNER JOIN
   seller_account ON (account.account_id =
   seller_account.account_id) LEFT JOIN
   advertisement ON (advertisement.seller_account_id =
   seller_account.seller_account_id)LEFT JOIN
   advertisement_rating ON
     (advertisement.advertisement_id =
   advertisement_rating.advertisement_id)
WHERE
   account.account_id = $1
GROUP BY
   seller_id,
   first_name,
   last_name
$$
LANGUAGE SQL;

CREATE TYPE could be also to define ENUM; an ENUM type is a special data type that enables an attribute to be assigned one of a predefined constants. The usage of ENUM data types reduces the number of join to create some queries thus makes the SQL code more compact and easier to understand. For example:

CREATE TYPE rank AS ENUM ('poor', 'fair', 'good', 'very good', 'excellent');

ENUM data type order is determined by the order of the values in the ENUM at the creation time. So in our example, poor always comes first as shown in the example as follows:

CREATE TABLE rank_type_test (
id SERIAL PRIMARY KEY,
rank rank
);
INSERT into rank_type_test(rank) VALUES ('poor') , ('fair'), ('very good') ,( 'excellent'), ('good'), ('poor') ;

SELECT * FROM rank_type_test ORDER BY rank ASC;
id |   rank
----+-----------
17 | poor
22 | poor
18 | fair
21 | good
19 | very good
20 | excellent
(6 rows)

ENUM PostgreSQL data types are type safe and different ENUM data type cannot be compared with each other. Also ENUM data type can be altered and new values can be added; unfortunately it is not possible to take out old values.

Trigger and rule systems

PostgreSQL provides triggers and rules systems to perform a certain function automatically when an event is performed including INSERT, UPDATE, and DELETE.

Triggers and rules cannot be defined on SELECT statements except for the _RETURN which is used in PostgreSQL views internal implementation.

From functionality point of view, trigger system is more generic, it can be used to implement complex actions easier than rules; however, both trigger and rule systems can be used to implement the same functionality in several cases.

From performance point of view rules tends to be faster than triggers; but, but triggers tend to be simpler and more compatible with other RDBMs since the rule system is a PostgreSQL extension.

PostgreSQL rule system

Creating a rule will either rewire the default rule or create a new rule for a specific action on specific table or view. The rules are created by CREATE RULE statement that should match the following syntax:

CREATE [ OR REPLACE ] RULE name AS ON event
   TO table_name [ WHERE condition ]
   DO [ ALSO | INSTEAD ] { NOTHING | command | ( command ; command ... ) }

where event can be one of:

   SELECT | INSERT | UPDATE | DELETE

In other word, a rule on insert action can change the insert action behavior or can create a new action for the insert. When using the rule system one needs to note that it is based on C macro system, that means one can get strange results when it is used with volatile functions such as random() and sequence functions such as nextval().

Writing rules is often quicker than writing triggers and the code is crisper and shorter. A common case of using rules is to ignore CRUD operation on the table in order to protect the table against data changes. This scenario could be used for example to protect look up tables or to ignore CRUD operations. A very simple technique to do that without changing the client codes is as follows:

CREATE RULE forbid_inserts AS
   ON INSERT TO sample_table
   DO INSTEAD NOTHING;

When creating rule, one could have a conditional rule i.e. rewrite the action if a certain condition is met as shown in the rule synopsis below. However, one cannot have conditional rules for insert update and delete on views without having unconditional rule. To solve this one could create an unconditional dummy rule. Rules on views is one way to implement updatable views.

Finally, similar to triggers when defining rules one should take care of infinite recursion. Wrong rule definitions can cause the following error ERROR: infinite recursion detected in rules for relation …

PostgreSQL trigger system

PostgreSQL trigger calls a function when a certain event occurred on table, view, or foreign table. Triggers are executed when a user tries to modify data through a data manipulation language (DML) events including INSERT, UPDATE, DELETE or TRUNCATE. The create trigger synopsis is as follows:

CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
   ON table_name
   [ FROM referenced_table_name ]
   [ NOT DEFERRABLE | [ DEFERRABLE ] { INITIALLY IMMEDIATE | INITIALLY DEFERRED } ]
   [ FOR [ EACH ] { ROW | STATEMENT } ]
   [ WHEN ( condition ) ]
   EXECUTE PROCEDURE function_name ( arguments )

where event can be one of:

   INSERT
   UPDATE [ OF column_name [, ... ] ]
   DELETE
   TRUNCATE

Trigger events time context is one of the following:

  • BEFORE: Applied on tables only and fired before constraints are checked and the operation is performed. Useful for checking data constraint on several tables not possible to model using referential integrity constraints.
  • AFTER: Applied on tables only and fired after the operation is performed. Useful for cascading the changes for other tables. An example use case is data auditing.
  • INSTEAD OF: Applied on views and used to make views updatable.

When the trigger is marked for each row, then the trigger function will be executed for each row that has been affected by the CRUD operation. Statement trigger is only executed once per operation. When the WHEN condition is supplied, then only rows which fulfill the condition will be handled by the trigger. Finally, triggers can be marked as CONSTRAINT to control when the trigger can be executed; the trigger can be executed after the end of the statement or at the end of the transaction. The constraint trigger must be AFTER and FOR EACH ROW trigger and the firing time controlled by the following options:

  • DEFERRABLE
  • INITIALLY DEFERRED
  • NOT DEFERRABLE
  • INITIALLY IMMEDIATE

Trigger names define the execution order of the triggers which have the same firing time context alphabetically.

The trigger function should fulfill the following requirements:

  • Return type: The trigger function should return the trigger pseudo type.
  • Return Value: The trigger function must return a value. The value is often NULL for AFTER and statement level triggers or RECORD/ROW with the exact structure of the table that fired the trigger in other cases.
  • No arguments: The trigger function must be declared without argument, even if one needs to pass argument to it. Argument passing is achieved via TG_ARG variable.

For row level trigger which is fired BEFORE the actual operation, returning null values will cancel the operation, which means the next trigger will not be fired and the affected row will not be deleted, updated or inserted. For the trigger which is fired AFTER the operation or statement level trigger, the return value will be ignored; however, the operation will be aborted if the trigger function raises an exception or an error due to the relational database transactional behavior.

Summary

The article introduces the main logical components of a PostgreSQL server and their hierarchy. The basic schema objects such as tables, indexes, views, functions, user defined data type, rule and trigger systems have also been discussed. The article also reveals some tricky points of some of those structures and describes some of the best practices of using them.

Still, PostgreSQL supports many other schema objects e.g. user defined aggregates, operators, casts, conversions, collations, foreign data wrappers, operator classes, text search dictionaries and others, not covered by the article, which makes PostgreSQL extremely flexible and powerful database management system.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Learning PostgreSQL

Explore Title