If you think that a PostgreSQL server is just a storage system, and the only way to communicate with it is by executing SQL statements, you are limiting yourself tremendously. That is using just a tiny part of the database's features.
A PostgreSQL server is a powerful framework that can be used for all kinds of data processing, and even some non-data server tasks. It is a server platform that allows you to easily mix and match functions and libraries from several popular languages. Consider this complicated, multi-language sequence of work:
Call a string parsing function in Perl.
Convert the string to XSLT and process the result using JavaScript.
Ask for a secure stamp from an external time-stamping service such as www.guardtime.com, using their SDK for C.
Write a Python function to digitally sign the result.
This can be implemented as a series of simple function calls using several of the available server programming languages. The developer needing to accomplish all this work can just call a single PostgreSQL function without having to be aware of how the data is being passed between languages and libraries:
SELECT convert_to_xslt_and_sign(raw_data_string);
In this book, we will discuss several facets of PostgreSQL server programming. PostgreSQL has all of the native server-side programming features available in most larger database systems, such as triggers (automated actions invoked each time data is changed). But it also has uniquely deep abilities to override the built-in behavior, down to very basic operators. Examples of this customization include the following:
Write user-defined functions (UDFs) in C for carrying out complex computations.
Add complicated constraints to make sure that data in the server meets guidelines.
Create triggers in many languages to make related changes to other tables, log the actions, or forbid the action to happen if it does not meet certain criteria.
Define new data types and operators in the database.
Use the geography types defined in the PostGIS package.
Add your own index access methods for either existing or new data types, making some queries much more efficient.
What sort of things can you do with these features? There are limitless possibilities, such as the ones listed as follows:
Write data extractor functions to get just the interesting parts from structured data, such as XML or JSON, without needing to ship the whole, possibly huge, document to the client application.
Process events asynchronously, like sending mail without slowing down the main application. You could create a mail queue for changes to user info, populated by a trigger. A separate mail-sending process can consume this data whenever it's notified by an application process.
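As a sketch of how such a queue might look (the table, trigger, channel, and column names here are illustrative, not from a real application), a trigger can populate a mail queue and use NOTIFY to wake up the consumer:

```sql
-- Hypothetical sketch: a mail queue populated by a trigger,
-- with a NOTIFY waking up a separate mail-sending process.
CREATE TABLE mail_queue (
    id serial PRIMARY KEY,
    recipient text NOT NULL,
    body text NOT NULL,
    queued_at timestamp DEFAULT current_timestamp
);

CREATE OR REPLACE FUNCTION queue_user_info_mail() RETURNS trigger AS $$
BEGIN
    -- assumes the audited table has an email column
    INSERT INTO mail_queue(recipient, body)
    VALUES (NEW.email, 'Your user info was changed');
    PERFORM pg_notify('mail_queue', NEW.email);  -- wake the mail sender
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notify_mail_sender
AFTER UPDATE ON user_info
FOR EACH ROW EXECUTE PROCEDURE queue_user_info_mail();
```

A separate process then issues LISTEN mail_queue; and drains the table whenever it is notified, so the main application never waits on the mail server.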
The rest of this chapter is presented as a series of descriptions of common data management tasks showing how they can be solved in a robust and elegant way via server programming.
The samples in this chapter are all tested to work, but they come with minimal commentary. They are here just to show you various things server programming can accomplish. The techniques described will be explained thoroughly in later chapters.
Developers write their code in a number of different languages, and it could be designed to run just about anywhere. When writing an application, some people follow the philosophy that as much of the application's logic as possible should be pushed to the client. We see this in the explosion of applications leveraging JavaScript inside browsers. Others like to push the logic into the middle tier, with an application server handling the business rules. These are all valid ways to design an application, so why would you want to program in the database server?
Let's start with a simple example. Many applications include a list of customers who have a balance in their account. We'll use this sample schema and data:
CREATE TABLE accounts(owner text, balance numeric);
INSERT INTO accounts VALUES ('Bob',100);
INSERT INTO accounts VALUES ('Mary',200);
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
When using a database, the most common way to interact with it is to use SQL queries. If you want to move 14 dollars from Bob's account to Mary's account, with simple SQL it would look like this:
UPDATE accounts SET balance = balance - 14.00 WHERE owner = 'Bob';
UPDATE accounts SET balance = balance + 14.00 WHERE owner = 'Mary';
But you also have to make sure that Bob actually has enough money (or credit) in his account. It's also important that if anything fails, none of the changes happen. In an application program, the preceding code snippet becomes:
BEGIN;
SELECT balance FROM accounts WHERE owner = 'Bob' FOR UPDATE;
-- now the application checks that the balance is actually bigger than 14
UPDATE accounts SET balance = balance - 14.00 WHERE owner = 'Bob';
UPDATE accounts SET balance = balance + 14.00 WHERE owner = 'Mary';
COMMIT;
But did Mary actually have an account? If she did not, the last UPDATE will "succeed" by updating zero rows. If any of the checks fail, you should do a ROLLBACK instead of a COMMIT. Once you have done all this for all the clients that transfer money, a new requirement will invariably arrive. Perhaps the minimum amount that can be transferred is now 5.00, and you will need to revisit all your code in all your clients again.
So what can you do to make all of this more manageable, more secure, and more robust? This is where server programming, executing code on the database server itself, can help. You can move the computations, checks, and data manipulations entirely into a user-defined function (UDF) on the server. This not only ensures that you have only one copy of the operation logic to manage, but also makes things faster by avoiding several round-trips between client and server. If required, you can also make sure that only as much information as needed is given out of the database. For example, most client applications have no business knowing how much money Bob has in his account. Mostly, they only need to know whether there is enough money to make the transfer, or more to the point, whether the transaction succeeded.
PostgreSQL includes its own programming language named PL/pgSQL, which is designed to integrate easily with SQL commands. PL stands for programming language, and pgSQL is shorthand for PostgreSQL; this is just one of the many languages available for writing server code.
Unlike basic SQL, PL/pgSQL includes procedural elements, such as if/then/else statements and loops. You can easily execute SQL statements, or even loop over the result of a SQL statement, in the language.
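To give a flavor of these procedural elements, here is a minimal anonymous DO block (available in PostgreSQL 9.0 and later) combining a loop with an IF/ELSE statement; it only raises notices and touches no tables:

```sql
DO $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..3 LOOP           -- loop over a range
        IF i % 2 = 0 THEN        -- conditional branching
            RAISE NOTICE '% is even', i;
        ELSE
            RAISE NOTICE '% is odd', i;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

The same control structures are available inside named functions, as the transfer() example below shows.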
The integrity checks needed for the application can be done in a PL/pgSQL function which takes three arguments: names of the payer and recipient, and the amount to pay. This sample also returns the status of the payment:
CREATE OR REPLACE FUNCTION transfer(
    i_payer text,
    i_recipient text,
    i_amount numeric(15,2))
RETURNS text AS
$$
DECLARE
    payer_bal numeric;
BEGIN
    SELECT balance INTO payer_bal
      FROM accounts
     WHERE owner = i_payer FOR UPDATE;
    IF NOT FOUND THEN
        RETURN 'Payer account not found';
    END IF;
    IF payer_bal < i_amount THEN
        RETURN 'Not enough funds';
    END IF;
    UPDATE accounts SET balance = balance + i_amount
     WHERE owner = i_recipient;
    IF NOT FOUND THEN
        RETURN 'Recipient does not exist';
    END IF;
    UPDATE accounts SET balance = balance - i_amount
     WHERE owner = i_payer;
    RETURN 'OK';
END;
$$ LANGUAGE plpgsql;
Here are a few examples of using this function, assuming you haven't executed the previously proposed UPDATE statements yet:
postgres=# SELECT * FROM accounts;
 owner | balance
-------+---------
 Bob   |     100
 Mary  |     200
(2 rows)

postgres=# SELECT * FROM transfer('Bob','Mary',14.00);
 transfer
----------
 OK
(1 row)

postgres=# SELECT * FROM accounts;
 owner | balance
-------+---------
 Mary  |  214.00
 Bob   |   86.00
(2 rows)
Your application would need to check the return code and decide how to handle these errors. As long as it is written to reject any unexpected value, you could later extend this function to do more checking, such as enforcing a minimum transferable amount, and be sure that the new rule could not be bypassed. There are three errors this function can return:
postgres=# SELECT * FROM transfer('Fred','Mary',14.00);
        transfer
-------------------------
 Payer account not found
(1 row)

postgres=# SELECT * FROM transfer('Bob','Fred',14.00);
         transfer
--------------------------
 Recipient does not exist
(1 row)

postgres=# SELECT * FROM transfer('Bob','Mary',500.00);
     transfer
------------------
 Not enough funds
(1 row)
For these checks to always work, you would need to make all transfer operations go through the function, rather than manually changing the values with SQL statements.
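One way to enforce this, sketched here under the assumption that transfer() is created by a privileged role and declared SECURITY DEFINER (the simple version above is not), is to revoke direct write access to the table and grant EXECUTE on the function instead:

```sql
-- sketch: direct changes to accounts are forbidden for ordinary users,
-- so the transfer() function becomes the only way to move money
REVOKE INSERT, UPDATE, DELETE ON accounts FROM PUBLIC;
GRANT EXECUTE ON FUNCTION transfer(text, text, numeric) TO PUBLIC;
```

With these grants in place, an ordinary user running a bare UPDATE on accounts gets a permission error, while SELECT * FROM transfer(...) still works.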
The sample output shown here has been created with PostgreSQL's psql utility, usually running on a Linux system. Most of the code will work the same way if you are using a GUI utility such as pgAdmin3 to access the server instead. When you see lines like this:

postgres=# SELECT 1;

the postgres=# part is the prompt shown by the psql command.
Examples in this book have been tested using PostgreSQL 9.2. They will probably work on PostgreSQL 8.3 and later. There have not been many major changes to how server programming happens in the last few versions of PostgreSQL, but the syntax has become stricter over time to reduce the possibility of mistakes in server programming code. Due to the nature of those changes, most code written for newer versions will still run on older ones, unless it uses very new features. However, older code can easily fail to run on newer versions because of one of the newly enforced restrictions.
When using the psql utility to execute a query, PostgreSQL normally outputs the result using vertically aligned columns:
$ psql -c "SELECT 1 AS test"
 test
------
    1
(1 row)

$ psql
psql (9.2.1)
Type "help" for help.

postgres=# SELECT 1 AS test;
 test
------
    1
(1 row)
You can tell that you are seeing regular output because it ends with a count of the rows returned.
This type of output is hard to fit into the text of a book like this. It's easier to print the output from what the program calls expanded display, which breaks each column onto a separate line. You can switch to expanded display using either the -x command-line switch, or by sending \x to the psql program. Here is an example of each:
$ psql -x -c "SELECT 1 AS test"
-[ RECORD 1 ]
test | 1

$ psql
psql (9.2.1)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# SELECT 1 AS test;
-[ RECORD 1 ]
test | 1
Notice how the expanded output doesn't show the row count, and instead labels each output row with its number. To save space, not all of the examples in the book will show expanded output being turned on. You can normally tell which type you are seeing by differences like this: whether you see aligned rows or RECORD labels. Expanded mode is normally preferred when the output of the query is too wide to fit into the available width of the book.
Server programming means more than just writing server functions. There are many other things you can do in the server that can be considered programming.
For more complex tasks you can define your own types, operators, and casts from one type to another, letting you actually compare apples and oranges.
As shown in the next example, you can define a type, fruit_qty, for fruit-with-quantity, and then teach PostgreSQL to compare apples and oranges, say, by making one orange worth 1.5 apples when converting between the two:
postgres=# CREATE TYPE FRUIT_QTY as (name text, qty int);
CREATE TYPE
postgres=# SELECT '("APPLE", 3)'::FRUIT_QTY;
 fruit_qty
-----------
 (APPLE,3)
(1 row)

CREATE FUNCTION fruit_qty_larger_than(
    left_fruit FRUIT_QTY,
    right_fruit FRUIT_QTY)
RETURNS BOOL AS
$$
BEGIN
    IF (left_fruit.name = 'APPLE' AND right_fruit.name = 'ORANGE') THEN
        RETURN left_fruit.qty > (1.5 * right_fruit.qty);
    END IF;
    IF (left_fruit.name = 'ORANGE' AND right_fruit.name = 'APPLE') THEN
        RETURN (1.5 * left_fruit.qty) > right_fruit.qty;
    END IF;
    RETURN left_fruit.qty > right_fruit.qty;
END;
$$ LANGUAGE plpgsql;

postgres=# SELECT fruit_qty_larger_than('("APPLE", 3)'::FRUIT_QTY,
                                        '("ORANGE", 2)'::FRUIT_QTY);
 fruit_qty_larger_than
-----------------------
 f
(1 row)

postgres=# SELECT fruit_qty_larger_than('("APPLE", 4)'::FRUIT_QTY,
                                        '("ORANGE", 2)'::FRUIT_QTY);
 fruit_qty_larger_than
-----------------------
 t
(1 row)

CREATE OPERATOR > (
    leftarg = FRUIT_QTY,
    rightarg = FRUIT_QTY,
    procedure = fruit_qty_larger_than,
    commutator = <  -- the commutator of > is <
);

postgres=# SELECT '("ORANGE", 2)'::FRUIT_QTY > '("APPLE", 2)'::FRUIT_QTY;
 ?column?
----------
 t
(1 row)

postgres=# SELECT '("ORANGE", 2)'::FRUIT_QTY > '("APPLE", 3)'::FRUIT_QTY;
 ?column?
----------
 f
(1 row)
Server programming can also mean setting up automated actions (triggers), so that some operations in the database automatically cause other things to happen as well. For example, you can set up a process where making an offer on some items automatically reserves those items in the stock table.
So let's create a fruit stock table:
CREATE TABLE fruits_in_stock (
    name text PRIMARY KEY,
    in_stock integer NOT NULL,
    reserved integer NOT NULL DEFAULT 0,
    CHECK (in_stock BETWEEN 0 AND 1000),
    CHECK (reserved <= in_stock)
);
The CHECK constraints make sure that some basic rules are followed: you can't have more than 1000 fruits in stock (they'll probably go bad), you can't have negative stock, and you can't reserve more than what you have.
CREATE TABLE fruit_offer (
    offer_id serial PRIMARY KEY,
    recipient_name text,
    offer_date timestamp DEFAULT current_timestamp,
    fruit_name text REFERENCES fruits_in_stock,
    offered_amount integer
);
The fruit_offer table has an ID for the offer (so that you can distinguish between offers later), the recipient, the date, the offered fruit's name, and the offered amount.
To automate the reservation management, you first need a trigger function that implements the management logic:
CREATE OR REPLACE FUNCTION reserve_stock_on_offer () RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE fruits_in_stock
           SET reserved = reserved + NEW.offered_amount
         WHERE name = NEW.fruit_name;
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE fruits_in_stock
           SET reserved = reserved - OLD.offered_amount
                                   + NEW.offered_amount
         WHERE name = NEW.fruit_name;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE fruits_in_stock
           SET reserved = reserved - OLD.offered_amount
         WHERE name = OLD.fruit_name;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
You then have to tell PostgreSQL to call this function each time a row in the fruit_offer table is changed:
CREATE TRIGGER manage_reserve_stock_on_offer_change
AFTER INSERT OR UPDATE OR DELETE ON fruit_offer
FOR EACH ROW EXECUTE PROCEDURE reserve_stock_on_offer();
After this we are ready to test the functionality. First, we will add some fruit to our stock:
INSERT INTO fruits_in_stock(name,in_stock)
VALUES ('APPLE', 500),
       ('ORANGE', 500);
Then, we check that stock (this is using the expanded display):
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM fruits_in_stock;
-[ RECORD 1 ]----
name     | APPLE
in_stock | 500
reserved | 0
-[ RECORD 2 ]----
name     | ORANGE
in_stock | 500
reserved | 0
Next, let's make an offer of 100 apples to Bob:
postgres=# INSERT INTO fruit_offer(recipient_name,fruit_name,offered_amount)
           VALUES('Bob','APPLE',100);
INSERT 0 1
postgres=# SELECT * FROM fruit_offer;
-[ RECORD 1 ]--+---------------------------
offer_id       | 1
recipient_name | Bob
offer_date     | 2013-01-25 15:21:15.281579
fruit_name     | APPLE
offered_amount | 100
On checking the stock we see that indeed 100 apples are reserved:
postgres=# SELECT * FROM fruits_in_stock;
-[ RECORD 1 ]----
name     | ORANGE
in_stock | 500
reserved | 0
-[ RECORD 2 ]----
name     | APPLE
in_stock | 500
reserved | 100
If we change the offered amount, the reservation follows:
postgres=# UPDATE fruit_offer SET offered_amount = 115 WHERE offer_id = 1;
UPDATE 1
postgres=# SELECT * FROM fruits_in_stock;
-[ RECORD 1 ]----
name     | ORANGE
in_stock | 500
reserved | 0
-[ RECORD 2 ]----
name     | APPLE
in_stock | 500
reserved | 115
We also get some extra benefits. First, because of the constraint on the stock table, you can't sell the reserved apples:
postgres=# UPDATE fruits_in_stock SET in_stock = 100 WHERE name = 'APPLE';
ERROR:  new row for relation "fruits_in_stock" violates check constraint "fruits_in_stock_check"
DETAIL:  Failing row contains (APPLE, 100, 115).
More interestingly, you also can't reserve more than you have, even though the constraints are on another table:
postgres=# UPDATE fruit_offer SET offered_amount = 1100 WHERE offer_id = 1;
ERROR:  new row for relation "fruits_in_stock" violates check constraint "fruits_in_stock_check"
DETAIL:  Failing row contains (APPLE, 500, 1100).
CONTEXT:  SQL statement "UPDATE fruits_in_stock SET reserved = reserved - OLD.offered_amount + NEW.offered_amount WHERE name = NEW.fruit_name"
PL/pgSQL function reserve_stock_on_offer() line 8 at SQL statement
When you finally delete the offer, the reservation is released:
postgres=# DELETE FROM fruit_offer WHERE offer_id = 1;
DELETE 1
postgres=# SELECT * FROM fruits_in_stock;
-[ RECORD 1 ]----
name     | ORANGE
in_stock | 500
reserved | 0
-[ RECORD 2 ]----
name     | APPLE
in_stock | 500
reserved | 0
In a real system, you probably would archive the old offer before deleting it.
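Archiving could itself be automated with one more trigger. The following sketch (the archive table name is made up for illustration) copies each offer into an archive table just before it is deleted:

```sql
-- sketch: keep a copy of every deleted offer
CREATE TABLE fruit_offer_archive (LIKE fruit_offer);

CREATE OR REPLACE FUNCTION archive_fruit_offer() RETURNS trigger AS $$
BEGIN
    INSERT INTO fruit_offer_archive SELECT OLD.*;  -- whole old row
    RETURN OLD;  -- returning OLD lets the DELETE proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER archive_offer_on_delete
BEFORE DELETE ON fruit_offer
FOR EACH ROW EXECUTE PROCEDURE archive_fruit_offer();
```

Because this is a BEFORE DELETE trigger, the copy is taken before the row disappears, and the reservation-release trigger shown earlier still fires afterwards.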
If you need to know who did what to the data and when it was done, one way to do that is to log every action that is performed on an important table.
There are at least two equally valid ways of doing the auditing:
Use auditing triggers
Allow tables to be accessed only through functions, and do the auditing inside these functions
Here, we will take a look at minimal examples of both approaches.
First, let's create the tables:
CREATE TABLE salaries (
    emp_name text PRIMARY KEY,
    salary integer NOT NULL
);

CREATE TABLE salary_change_log (
    changed_by text DEFAULT CURRENT_USER,
    changed_at timestamp DEFAULT CURRENT_TIMESTAMP,
    salary_op text,
    emp_name text,
    old_salary integer,
    new_salary integer
);

REVOKE ALL ON salary_change_log FROM PUBLIC;
GRANT ALL ON salary_change_log TO managers;
You don't generally want your users to be able to change audit logs, so grant the right to access them only to managers. If you plan to let users access the salary table directly, you should put an auditing trigger on it:
CREATE OR REPLACE FUNCTION log_salary_change () RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO salary_change_log(salary_op,emp_name,new_salary)
        VALUES (TG_OP,NEW.emp_name,NEW.salary);
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO salary_change_log(salary_op,emp_name,old_salary,new_salary)
        VALUES (TG_OP,NEW.emp_name,OLD.salary,NEW.salary);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO salary_change_log(salary_op,emp_name,old_salary)
        VALUES (TG_OP,OLD.emp_name,OLD.salary); -- OLD here: NEW is not set on DELETE
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE TRIGGER audit_salary_change
AFTER INSERT OR UPDATE OR DELETE ON salaries
FOR EACH ROW EXECUTE PROCEDURE log_salary_change();
Now, let's test out some salary management:
postgres=# INSERT INTO salaries values('Bob',1000);
INSERT 0 1
postgres=# UPDATE salaries set salary = 1100 where emp_name = 'Bob';
UPDATE 1
postgres=# INSERT INTO salaries values('Mary',1000);
INSERT 0 1
postgres=# UPDATE salaries set salary = salary + 200;
UPDATE 2
postgres=# SELECT * FROM salaries;
-[ RECORD 1 ]--
emp_name | Bob
salary   | 1300
-[ RECORD 2 ]--
emp_name | Mary
salary   | 1200
Each one of those changes is saved into the salary change log table for auditing purposes:
postgres=# SELECT * FROM salary_change_log;
-[ RECORD 1 ]--------------------------
changed_by | frank
changed_at | 2012-01-25 15:44:43.311299
salary_op  | INSERT
emp_name   | Bob
old_salary |
new_salary | 1000
-[ RECORD 2 ]--------------------------
changed_by | frank
changed_at | 2012-01-25 15:44:43.313405
salary_op  | UPDATE
emp_name   | Bob
old_salary | 1000
new_salary | 1100
-[ RECORD 3 ]--------------------------
changed_by | frank
changed_at | 2012-01-25 15:44:43.314208
salary_op  | INSERT
emp_name   | Mary
old_salary |
new_salary | 1000
-[ RECORD 4 ]--------------------------
changed_by | frank
changed_at | 2012-01-25 15:44:43.314903
salary_op  | UPDATE
emp_name   | Bob
old_salary | 1100
new_salary | 1300
-[ RECORD 5 ]--------------------------
changed_by | frank
changed_at | 2012-01-25 15:44:43.314903
salary_op  | UPDATE
emp_name   | Mary
old_salary | 1000
new_salary | 1200
On the other hand, you may not want anybody to have direct access to the salary table, in which case you can perform the following:
REVOKE ALL ON salaries FROM PUBLIC;
Instead, give users access to only two functions: the first for looking up salaries, and the second, available only to managers, for changing them.
The functions themselves will have full access to the underlying tables because they are declared as SECURITY DEFINER, which means they run with the privileges of the user who created them.
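In practice, that means revoking the default PUBLIC execute right on the salary-changing function and granting it back only to managers, once the two functions shown below exist. A minimal sketch:

```sql
-- sketch: anybody may look up salaries, but only managers may change them
-- (functions are executable by PUBLIC by default, so revoke that first)
REVOKE EXECUTE ON FUNCTION set_salary(text, int) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION set_salary(text, int) TO managers;
```

get_salary() keeps its default PUBLIC execute privilege, since any user is allowed to look.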
The salary lookup function will look like the following:
CREATE OR REPLACE FUNCTION get_salary(text) RETURNS integer AS $$
    -- if you look at other people's salaries, it gets logged
    INSERT INTO salary_change_log(salary_op,emp_name,new_salary)
    SELECT 'SELECT',emp_name,salary
      FROM salaries
     WHERE upper(emp_name) = upper($1)
       AND upper(emp_name) != upper(CURRENT_USER); -- don't log selects of one's own salary
    -- return the requested salary
    SELECT salary FROM salaries WHERE upper(emp_name) = upper($1);
$$ LANGUAGE SQL SECURITY DEFINER;
Notice that we implemented a "soft security" approach: you can look up other people's salaries, but you have to do it responsibly, that is, only when you need to, as your manager will know that you have checked.
The set_salary() function abstracts away the need to check whether the employee exists; if not, the employee is created. Setting someone's salary to 0 removes him from the salary table. Thus, the interface is much simplified, and the client application of these functions needs to know and do less:
CREATE OR REPLACE FUNCTION set_salary(i_emp_name text, i_salary int)
RETURNS text AS $$
DECLARE
    old_salary integer;
BEGIN
    SELECT salary INTO old_salary
      FROM salaries
     WHERE upper(emp_name) = upper(i_emp_name);
    IF NOT FOUND THEN
        INSERT INTO salaries VALUES(i_emp_name, i_salary);
        INSERT INTO salary_change_log(salary_op,emp_name,new_salary)
        VALUES ('INSERT',i_emp_name,i_salary);
        RETURN 'INSERTED USER ' || i_emp_name;
    ELSIF i_salary > 0 THEN
        UPDATE salaries
           SET salary = i_salary
         WHERE upper(emp_name) = upper(i_emp_name);
        INSERT INTO salary_change_log(salary_op,emp_name,old_salary,new_salary)
        VALUES ('UPDATE',i_emp_name,old_salary,i_salary);
        RETURN 'UPDATED USER ' || i_emp_name;
    ELSE -- salary set to 0
        DELETE FROM salaries WHERE upper(emp_name) = upper(i_emp_name);
        INSERT INTO salary_change_log(salary_op,emp_name,old_salary)
        VALUES ('DELETE',i_emp_name,old_salary);
        RETURN 'DELETED USER ' || i_emp_name;
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Now, drop the audit trigger (or the changes will be logged twice) and test the new functionality:
postgres=# DROP TRIGGER audit_salary_change ON salaries;
DROP TRIGGER
postgres=# SELECT set_salary('Fred',750);
-[ RECORD 1 ]------------------
set_salary | INSERTED USER Fred

postgres=# SELECT set_salary('frank',100);
-[ RECORD 1 ]-------------------
set_salary | INSERTED USER frank

postgres=# SELECT * FROM salaries;
-[ RECORD 1 ]---
emp_name | Bob
salary   | 1300
-[ RECORD 2 ]---
emp_name | Mary
salary   | 1200
-[ RECORD 3 ]---
emp_name | Fred
salary   | 750
-[ RECORD 4 ]---
emp_name | frank
salary   | 100

postgres=# SELECT set_salary('mary',0);
-[ RECORD 1 ]-----------------
set_salary | DELETED USER mary

postgres=# SELECT * FROM salaries;
-[ RECORD 1 ]---
emp_name | Bob
salary   | 1300
-[ RECORD 2 ]---
emp_name | Fred
salary   | 750
-[ RECORD 3 ]---
emp_name | frank
salary   | 100

postgres=# SELECT * FROM salary_change_log;
...
-[ RECORD 6 ]--------------------------
changed_by | gsmith
changed_at | 2013-01-25 15:57:49.057592
salary_op  | INSERT
emp_name   | Fred
old_salary |
new_salary | 750
-[ RECORD 7 ]--------------------------
changed_by | gsmith
changed_at | 2013-01-25 15:57:49.062456
salary_op  | INSERT
emp_name   | frank
old_salary |
new_salary | 100
-[ RECORD 8 ]--------------------------
changed_by | gsmith
changed_at | 2013-01-25 15:57:49.064337
salary_op  | DELETE
emp_name   | mary
old_salary | 1200
new_salary |
We notice that the employee names are not stored with consistent case. It would be easy to enforce consistency by adding a constraint:
CHECK (emp_name = upper(emp_name))
However, it is even better to simply make sure that the name is stored in uppercase, and the simplest way to do that is with a trigger:
CREATE OR REPLACE FUNCTION uppercase_name () RETURNS trigger AS $$
BEGIN
    NEW.emp_name = upper(NEW.emp_name);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- only INSERT and UPDATE: NEW is not defined for DELETE triggers
CREATE TRIGGER uppercase_emp_name
BEFORE INSERT OR UPDATE ON salaries
FOR EACH ROW EXECUTE PROCEDURE uppercase_name();
The next set_salary() call for a new employee will now insert emp_name in uppercase:
postgres=# SELECT set_salary('arnold',80);
-[ RECORD 1 ]--------------------
set_salary | INSERTED USER arnold
As the uppercasing happened inside a trigger, the function response still shows a lowercase name, but in the database it is uppercase:
postgres=# SELECT * FROM salaries;
-[ RECORD 1 ]----
emp_name | Bob
salary   | 1300
-[ RECORD 2 ]----
emp_name | Fred
salary   | 750
-[ RECORD 3 ]----
emp_name | frank
salary   | 100
-[ RECORD 4 ]----
emp_name | ARNOLD
salary   | 80
After fixing the existing mixed-case emp_name values, we can make sure that all future emp_name values will be in uppercase by adding a constraint:
postgres=# update salaries set emp_name = upper(emp_name)
postgres-#  where not emp_name = upper(emp_name);
UPDATE 3
postgres=# alter table salaries add constraint emp_name_must_be_uppercase
postgres-#  CHECK (emp_name = upper(emp_name));
ALTER TABLE
If this behavior is needed in more places, it would make sense to define a new type, say u_text, which is always stored as uppercase. You will learn more about this approach in the chapter about defining user types.
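As a taste of that, a cut-down version of the idea can already be had with a domain. A domain can reject (though not automatically convert) lowercase input; the names here are illustrative:

```sql
-- sketch: a text domain that only accepts uppercase values
CREATE DOMAIN u_text AS text
    CHECK (VALUE = upper(VALUE));

CREATE TABLE salaries_v2 (
    emp_name u_text PRIMARY KEY,
    salary integer NOT NULL
);

-- INSERT INTO salaries_v2 VALUES ('bob', 100);  -- fails the domain check
-- INSERT INTO salaries_v2 VALUES ('BOB', 100);  -- accepted
```

A full u_text type that silently uppercases its input needs the user-defined type machinery covered later; a domain can only validate, not rewrite, the value.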
The last example in this chapter is about using functions for different ways of sorting.
Say we are given the task of sorting words by their vowels only, and in addition to that, making the last vowel the most significant one when sorting. While this task may seem really complicated at first, it is easy to solve with functions:
CREATE OR REPLACE FUNCTION reversed_vowels(word text)
RETURNS text AS $$
    vowels = [c for c in word.lower() if c in 'aeiou']
    vowels.reverse()
    return ''.join(vowels)
$$ LANGUAGE plpythonu IMMUTABLE;

-- assumes a table created along the lines of: CREATE TABLE words(word text);
postgres=# SELECT word, reversed_vowels(word)
           FROM words
           ORDER BY reversed_vowels(word);
    word     | reversed_vowels
-------------+-----------------
 Abracadabra | aaaaa
 Great       | ae
 Barter      | ea
 Revolver    | eoe
(4 rows)
The best part is that you can use your new function in an index definition:
postgres=# CREATE INDEX reversed_vowels_index
           ON words (reversed_vowels(word));
CREATE INDEX
The system will automatically use this index whenever the function reversed_vowels(word) is used in a WHERE clause or an ORDER BY.
Developing application software is complicated. Some of the approaches to help manage that complexity are so popular that they've been given simple acronyms to remember them. Next, we'll introduce some of these principles and show how server programming helps make them easier to follow.
The first is KISS (keep it simple, stupid). One of the main techniques of successful programming is writing simple code, that is, code that you can easily understand three years from now, and that others can understand as well. It is not always achievable, but it almost always makes sense to write your code in the simplest way possible. You may rewrite parts of it later for various reasons, such as speed, code compactness, to show off how clever you are, and so on. But always write the code first in a simple way, so you can be absolutely sure that it does what you want. Not only do you get working code fast, you also have something to compare against when you try more advanced ways to do the same thing.
And remember, debugging is harder than writing code; so if you write the code in the most complex way you can, you will have a really hard time debugging it.
It is often easier to write a set-returning function instead of a complex query. Yes, it will probably run slower than the same thing implemented as a single complex query, because the optimizer can do very little with code written as functions, but the speed may be sufficient for your needs. If more speed is needed, it's usually possible to refactor the code piece by piece, folding parts of the function into larger queries where the optimizer has a better chance of discovering better query plans, until the performance is acceptable again.
Remember that most of the time, you don't need the absolutely fastest code. For your clients or bosses, the best code is the one that does the job well and arrives on time.
DRY stands for "don't repeat yourself". It means implementing any piece of business logic just once, and putting the code for doing it in the right place.
It may sometimes be hard; for example, you do want to do some checking of your web forms in the browser, but still do the final check in the database. As a general guideline, though, it is very much valid.
Server programming helps a lot here. If your data manipulation code lives in the database, near the data, all the data's users have easy access to it, and you will not need to maintain similar code in a C++ Windows program, two PHP websites, and a bunch of Python scripts doing nightly management tasks. If any of them needs to do this thing to the customers table, they just call:
SELECT * FROM do_this_thing_to_customers(arg1, arg2, arg3);
And that's it!
If the logic behind the function needs changing, you just change the function with no downtime and no complicated orchestration of pushing database query updates to several clients. Once the function is changed in the database, it is changed for all users.
YAGNI stands for "you ain't gonna need it". In other words, don't do more than you absolutely need to.
If you have a creepy feeling that your client is not yet well aware of what the final database will look like or what it will do, it's helpful to resist the urge to design "everything" into the database. A much better way is to do the minimal implementation that satisfies the current spec, but do it with extensibility in mind. It is much easier to "paint yourself into a corner" when implementing a big spec with large imaginary parts.
If you organize your access to the database through functions, it is often possible to do even large rewrites of business logic without touching the frontend application code. Your application still does SELECT * FROM do_this_thing_to_customers(arg1, arg2, arg3), even after you have rewritten the function five times and changed the whole table structure twice.
Usually, when you hear the acronym SOA (service-oriented architecture), it comes from enterprise software people selling you a complex set of SOAP services. But the essence of SOA is organizing your software platform as a set of services that clients and other services call to perform certain well-defined atomic tasks, such as:
Checking a user's password and credentials
Presenting him/her with a list of his/her favorite websites
Selling him/her a new red dog collar with complementary membership in the red-collared dog club
These services can be implemented as SOAP calls with corresponding WSDL definitions, Java servers with servlet containers, and complex management infrastructure. They can also be a set of PostgreSQL functions, taking a set of arguments and returning a set of values. If the arguments or return values are complex, they can be passed as XML or JSON, but often a simple set of standard PostgreSQL data types is enough. In Chapter 9, Scaling Your Database with PL/Proxy, we will learn how to make such a PostgreSQL-based SOA service infinitely scalable.
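For instance, the first of those tasks could be exposed as a single function taking plain text arguments and returning a boolean. The table, function name, and hashing choice below are illustrative only; a real system should use a proper password-hashing scheme rather than a bare md5():

```sql
-- hypothetical credential store and login-check service function
CREATE TABLE app_user (
    username text PRIMARY KEY,
    password_hash text NOT NULL
);

CREATE OR REPLACE FUNCTION check_login(i_username text, i_password text)
RETURNS boolean AS $$
    SELECT EXISTS (
        SELECT 1
          FROM app_user
         WHERE username = i_username
           AND password_hash = md5(i_password)  -- illustrative; use a real KDF
    );
$$ LANGUAGE sql STABLE SECURITY DEFINER;
```

The client simply calls SELECT check_login('bob', 'secret'); and never sees the stored hashes, which is exactly the kind of well-defined atomic service SOA asks for.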
Some of the preceding techniques are available in other databases, but PostgreSQL's extensibility does not stop here. In PostgreSQL, you can just write User-defined functions (UDFs) in any of the most popular scripting languages. You can also define your own types, not just domains, which are standard types with some extra constraints attached, but new full-fledged types too.
For example, the Dutch company MGRID has developed a value-with-unit set of data types, so that you can divide 10 km by 0.2 hours and get the result 50 km/h. Of course, you can also cast the same result to meters per second or any other unit of speed. And yes, you can even get it as a fraction of c, the speed of light.
This kind of functionality needs both the types themselves and overloaded operators, which know that if you divide a distance by a time, the result is a speed. You will also need user-defined casts, which are conversion functions between types, invoked either automatically or manually.
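A greatly simplified sketch of the idea looks as follows (the type and function names are invented; MGRID's actual implementation is far more complete):

```sql
-- Toy value-with-unit types: the unit is fixed by the type itself.
CREATE TYPE distance AS (amount numeric);  -- kilometres
CREATE TYPE duration AS (amount numeric);  -- hours
CREATE TYPE speed    AS (amount numeric);  -- km/h

-- The function that knows distance / duration = speed.
CREATE FUNCTION distance_div_duration(d distance, t duration)
RETURNS speed AS $$
    SELECT ROW((d).amount / (t).amount)::speed;
$$ LANGUAGE sql IMMUTABLE;

-- Overload the division operator for these two types.
CREATE OPERATOR / (
    LEFTARG  = distance,
    RIGHTARG = duration,
    FUNCTION = distance_div_duration
);

-- SELECT ROW(10)::distance / ROW(0.2)::duration;  yields (50) as speed
```

A real implementation would carry the unit in the value, define casts between compatible units, and reject operations that make no dimensional sense.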
MGRID developed this for use in medical applications, where the cost of an error can be high: the difference between 10 ml and 10 cc can be vital. A similar system could also have averted many other disasters where using the wrong units produced bad computation results. If the unit always travels together with the amount, the possibility of these kinds of errors is greatly diminished.

You can also add your own index methods if you have some programming skills and your problem domain is not well served by the existing indexes. There is already a respectable set of index types included in core PostgreSQL, as well as several others developed outside the core.
The latest index method to be officially included in PostgreSQL is KNN (K Nearest Neighbor), a clever index that can return K rows ordered by their distance from the desired search target. One use of KNN is fuzzy text search, where it can rank full-text search results by how well they match the search terms. Before KNN, this kind of thing was done by querying all rows that matched even a little, then sorting them all by the distance function and returning the top K rows as the last step.
With a KNN index, the index access can start returning the rows in the desired order, so a simple LIMIT K clause returns the top K matches.
The KNN index can also be used for real distances, for example answering the request "Give me the 10 nearest pizza places to central station."
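The pizza query above can be sketched like this (the table, its contents, and the coordinates are invented; the <-> distance operator over a GiST index on a point column is what makes the ordered index scan possible):

```sql
-- Hypothetical pizza-places table with a point-typed location column.
CREATE TABLE pizza_places (
    name     text,
    location point
);

-- A GiST index on point supports KNN ordering by distance.
CREATE INDEX pizza_places_location_idx
    ON pizza_places USING gist (location);

-- The 10 nearest pizza places to the central station at (59.44, 24.75):
SELECT name
  FROM pizza_places
 ORDER BY location <-> point '(59.44,24.75)'
 LIMIT 10;
```

The planner can satisfy the ORDER BY ... <-> directly from the index, returning rows in distance order without sorting the whole table.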
As you can see, index types are separate from the data types they index. As another example, the same GIN (Generalized Inverted Index) can be used for full-text search (together with stemmers, thesauri, and other text-processing machinery) as well as for indexing the elements of integer arrays.
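Both uses look the same from the DDL side (the table and column names here are invented):

```sql
-- One index method, two very different data types.

-- GIN over a tsvector expression, for full-text search:
CREATE INDEX docs_fts_idx
    ON docs USING gin (to_tsvector('english', body));

-- GIN over an integer array column, for fast element lookups
-- with operators like @> (contains):
CREATE INDEX items_tags_idx
    ON items USING gin (tags);   -- tags is of type integer[]
```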
Yet another place where server-side programming can be used is for caching values that are expensive to compute. The basic pattern here is:
Check if the value is cached.
If not or the value is too old, compute and cache it.
Return the cached value.
For example, calculating sales for a company is a perfect candidate for caching. Suppose a large retail company has 1,000 stores with potentially millions of individual sales transactions per day. If corporate headquarters is looking for sales trends, it is much more efficient if the daily sales numbers are precalculated at the store level instead of summing up millions of transactions each time.
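Using this sales example, the check/compute/return pattern can be sketched as a PL/pgSQL function over a small cache table (all the table, column, and function names here are invented; the ON CONFLICT clause needs PostgreSQL 9.5 or later, and older versions would use a DELETE-then-INSERT or an UPDATE-or-INSERT loop instead):

```sql
CREATE TABLE sales_cache (
    store_id   integer,
    sales_date date,
    total      numeric,
    computed   timestamptz DEFAULT now(),
    PRIMARY KEY (store_id, sales_date)
);

CREATE OR REPLACE FUNCTION daily_sales(p_store integer, p_date date)
RETURNS numeric AS $$
DECLARE
    v_total numeric;
BEGIN
    -- 1. Check if the value is cached (and still fresh).
    SELECT total INTO v_total
      FROM sales_cache
     WHERE store_id = p_store AND sales_date = p_date
       AND computed > now() - interval '1 hour';
    IF FOUND THEN
        RETURN v_total;                -- 3. Return the cached value.
    END IF;

    -- 2. Not cached or too old: compute and cache it.
    SELECT sum(amount) INTO v_total
      FROM sales
     WHERE store_id = p_store AND sales_date = p_date;

    INSERT INTO sales_cache (store_id, sales_date, total)
         VALUES (p_store, p_date, v_total)
    ON CONFLICT (store_id, sales_date)
    DO UPDATE SET total = EXCLUDED.total, computed = now();

    RETURN v_total;
END;
$$ LANGUAGE plpgsql;
```

Callers simply run SELECT daily_sales(42, current_date); and never need to know whether the value came from the cache or was just computed.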
If the value is simple, such as a user's information looked up from a single table by user ID, you don't need to do anything. The value gets cached in PostgreSQL's internal page cache, and all lookups to it are so fast that even on a very fast network, most of the lookup time is spent in the network, not in the actual lookup. In such a case, getting data from a PostgreSQL database is as fast as getting it from any other in-memory cache (such as memcached), but without any extra overhead of managing the cache.
Another use case of caching is implementing materialized views. These are views that are precomputed when needed, rather than each time one selects from them. Some SQL databases have materialized views as a separate database object, but in PostgreSQL you have to build them yourself, using other database features to automate the process.
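A minimal hand-rolled version looks as follows (the table names are invented; note that newer PostgreSQL versions, 9.3 and later, also provide a built-in CREATE MATERIALIZED VIEW command):

```sql
-- The "materialized view" is just a real table holding precomputed data.
CREATE TABLE matview_daily_totals AS
    SELECT sales_date, sum(amount) AS total
      FROM sales
  GROUP BY sales_date;

-- A refresh function, to be called on a schedule or from a trigger.
CREATE OR REPLACE FUNCTION refresh_daily_totals() RETURNS void AS $$
BEGIN
    TRUNCATE matview_daily_totals;
    INSERT INTO matview_daily_totals
        SELECT sales_date, sum(amount)
          FROM sales
      GROUP BY sales_date;
END;
$$ LANGUAGE plpgsql;
```

More sophisticated variants refresh only the changed rows, driven by triggers on the base tables.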
The main advantages of keeping most data manipulation code server-side are the following.
Doing the computation near the data is almost always a performance win, as the latencies for getting the data are minimal. In a typical data-intensive computation, most of the time tends to be spent getting the data; therefore, making data access inside the computation faster is the best way to make the whole thing fast. On my laptop, it takes 2.2 ms to query one random row from a 100,000-row table into the client, but only 0.12 ms to get the same data inside the database. That is almost 20 times faster, and this is on the same machine over Unix sockets. The difference can be bigger if there is a network connection between the client and the server.
A small real-world story:
A friend of mine was called in to help a large company (I'm sure you all know it, though I can't tell you which one) try to make its e-mail sending application faster. They had implemented their e-mail generation system with all the latest Java EE technologies: first getting the data from the database, passing it around between services, serializing and deserializing it several times, and finally running an XSLT transform on the data to produce the e-mail text. The end result was that it produced only a few hundred e-mails per second, and they were falling behind with their responses.
When he rewrote the process to use a PL/Perl function inside the database to format the data, so that the query returned fully formatted e-mails, it suddenly started spewing out tens of thousands of e-mails per second, and they had to add a second copy of sendmail to actually be able to send them out.
If all data manipulation code is in the database, either as database functions or views, the actual upgrade process becomes very easy. All that is needed is running a DDL script that redefines the functions; all the clients automatically use the new code, with no downtime and no complicated coordination between several frontend systems and teams.
If all access for possibly insecure servers goes through functions, the database user those servers use can be granted access only to the needed functions and nothing else. They can't see the table data, or even the fact that the tables exist. So even if such a server is compromised, all it can do is continue to call the same functions. There is also no way to steal passwords, e-mails, or other sensitive information by issuing ad hoc queries like SELECT * FROM users; and getting all the data in the database.
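Locking a role down this way takes only a few statements (the role name and the check_login function are hypothetical examples, not real objects):

```sql
-- A login role for the possibly insecure application server.
CREATE ROLE webapp LOGIN PASSWORD 'secret';

-- Take away direct table access...
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM webapp;

-- ...and grant execute rights on exactly the functions it needs.
GRANT EXECUTE ON FUNCTION check_login(text, text) TO webapp;

-- webapp can now run:  SELECT check_login('bob', '...');
-- but SELECT * FROM users; fails with a permission error.
```

For a tight setup, you would also revoke the default PUBLIC execute privilege on other functions in the schema, so the role can call only what is explicitly granted.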
And the most important thing: programming in the server is fun!
Programming inside the database server is not always the first thing that comes to mind for many developers, but its unique placement inside the application stack gives it some powerful advantages. Your application can be faster, more secure, and more maintainable by pushing your logic into the database. With server-side programming in PostgreSQL, you can:
Secure your data using functions
Audit access to your data using triggers
Enrich your data using custom data types
Analyze your data using custom operators
And this is just the very start of what you can do inside PostgreSQL. Throughout the rest of this book, you will learn about many other ways to write powerful applications by programming inside PostgreSQL.