Using Unrestricted Languages

by Hannu Krosing, Jim Mlodgenski, Kirk Roybal | August 2013 | Open Source

This article, Using Unrestricted Languages, by Hannu Krosing, Jim Mlodgenski, and Kirk Roybal, the authors of PostgreSQL Server Programming, provides a gist of writing server-side code in languages other than the built-in PL/pgSQL.

You may have noticed that some of the PLs in PostgreSQL can be declared as untrusted. Their names all end in the letter u to remind you, each time you use them to create a function, that they are untrusted.

This untrustedness brings up many questions:

  • Does being untrusted mean that such languages are somehow inferior to trusted ones?
  • Can I still write an important function in an untrusted language?
  • Will they silently eat my data and corrupt the database?

The answers are no, yes, and maybe, respectively. Let's discuss these questions in order.


Are untrusted languages inferior to trusted ones?

No. On the contrary, these languages are untrusted in the same way that a sharp knife is untrusted: it should not be entrusted to very small children, at least not without adult supervision. They have extra powers that ordinary SQL, the trusted languages (such as PL/pgSQL), and the trusted variants of the same language (PL/Perl versus PL/Perlu) don't have.

You can use the untrusted languages to read and write directly on the server's disks, to open sockets, and to make Internet queries to the outside world. You can even send arbitrary signals to any process running on the database host. Generally, you can do anything the native language of the PL can do.

However, you probably should not trust arbitrary database users with the right to define functions in these languages. Always think twice before granting the privilege to use an untrusted language to a user or group.

Can you use untrusted languages for important functions?

Absolutely. Sometimes, an untrusted language may be the only way to accomplish a task from inside the server. Performing simple queries and computations should do nothing harmful to your database, and neither should connecting to the external world for sending e-mails, fetching web pages, or doing SOAP requests. They may cause delays and even queries that get stuck, but these can usually be dealt with by setting an upper limit on how long a query can run, using an appropriate statement timeout value. Setting a reasonable statement timeout by default is a good practice anyway.
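For example, a timeout can be set per session or per role like this (the 30-second value and the webuser role are only illustrative):

SET statement_timeout = '30s';                      -- this session only
ALTER ROLE webuser SET statement_timeout = '30s';   -- default for one role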

So, if you don't deliberately do risky things, the probability of harming the database is no bigger than when using a "trusted" (also known as "restricted") variant of the language. However, if you give the language to someone who starts changing bytes on the production database "to see what happens", you will probably get what you asked for.

Will untrusted languages corrupt the database?

The power to corrupt the database is definitely there, since the functions run as the system user of the database server with full access to the filesystem. So, if you blindly start writing into the data files and deleting important logs, it is very likely that your database will be corrupted.

Additional types of denial-of-service attacks are also possible, such as using up all memory or opening all IP ports; but there are ways to overload the database using plain SQL as well, so that part is not much different from ordinary trusted database access with the ability to run arbitrary queries.

So yes, you can corrupt the database, but please don't do it on a production server. If you do, you will be sorry.

Why untrusted?

PostgreSQL's ability to use an untrusted language is a powerful way to perform some nontraditional things from database functions. Creating these functions in a PL is an order of magnitude smaller task than writing an extension function in C. For example, a function to look up a hostname for an IP address is only a few lines in PL/Pythonu:

CREATE FUNCTION gethostbyname(hostname text)
  RETURNS inet
AS $$
import socket
return socket.gethostbyname(hostname)
$$ LANGUAGE plpythonu SECURITY DEFINER;

You can test it immediately after creating the function by using psql:

hannu=# select gethostbyname('www.postgresql.org');
 gethostbyname
----------------
 98.129.198.126
(1 row)

Creating the same function in the most untrusted language, C, involves writing tens of lines of boilerplate code, worrying about memory leaks, and all the other problems coming from writing code in a low-level language. I recommend prototyping in some PL language if possible, and in an untrusted language if the function needs something that the restricted languages do not offer.

Why PL/Python?

All of these tasks could be done equally well using PL/Perlu or PL/Tclu; I chose PL/Pythonu mainly because Python is the language I am most comfortable with. This also translates to having written some PL/Python code, which I plan to discuss and share with you in this article.

Quick introduction to PL/Python

PL/pgSQL is a language unique to PostgreSQL and was designed to add blocks of computation and SQL inside the database. While it has grown in its breadth of functionality, it still lacks the completeness of syntax of a full programming language. PL/Python allows your database functions to be written in Python with all the depth and maturity of writing Python code outside the database.

A minimal PL/Python function

Let's start from the very beginning (again):

CREATE FUNCTION hello(name text)
  RETURNS text
AS $$
return 'hello %s !' % name
$$ LANGUAGE plpythonu;

Here, we see that creating the function starts by defining it as any other PostgreSQL function with a RETURNS definition of a text field:

CREATE FUNCTION hello(name text) RETURNS text

The difference from what we have seen before is that the LANGUAGE part specifies plpythonu (the language ID for the PL/Pythonu language):

$$ LANGUAGE plpythonu;

Inside the function body, it is very much a normal Python function, returning a value obtained by formatting the name passed as an argument into the string 'hello %s !' using the standard Python formatting operator %:

return 'hello %s !' % name

Finally, let's test how this works:

hannu=# select hello('world');
     hello
---------------
 hello world !
(1 row)

And yes, it returns exactly what is expected!

Data type conversions

The first and last things happening when a PL function is called by PostgreSQL are converting argument values between the PostgreSQL and PL types. The PostgreSQL types need to be converted to the PL types on entering the function, and then the return value needs to be converted back into the PostgreSQL type.

Except for PL/pgSQL, which uses PostgreSQL's own native types in computations, the PLs are based on existing languages with their own understanding of what types (integer, string, date, …) are, how they should behave, and how they are represented internally. They are mostly similar to PostgreSQL's understanding but quite often are not exactly the same. PL/Python converts data from PostgreSQL type to Python types as shown in the following table:

PostgreSQL                Python 2   Python 3   Comments
------------------------  ---------  ---------  ----------------------------------------------
int2, int4                int        int
int8                      long       int
real, double, numeric     float      float      This may lose precision for numeric values.
bytea                     str        bytes      No encoding conversion is done, nor should
                                                any encoding be assumed.
text, char(), varchar(),  str        str        On Python 2, the string will be in the server
and other text types                            encoding; on Python 3, it is a Unicode string.
All other types           str        str        PostgreSQL's type output function is used to
                                                convert to this string.

Inside the function, all computation is done using Python types, and the return value is converted back to a PostgreSQL type using the following rules (this is a direct quote from the official PL/Python documentation at http://www.postgresql.org/docs/current/static/plpython-data.html):

  • When the PostgreSQL return type is Boolean, the return value will be evaluated for truth according to the Python rules. That is, 0 and the empty string are false, but notably the string 'f' is true.
  • When the PostgreSQL return type is bytea, the return value will be converted to a string (Python 2) or bytes (Python 3) using the respective Python built-ins, with the result being converted to bytea.
  • For all other PostgreSQL return types, the returned Python value is converted to a string using Python's built-in str, and the result is passed to the input function of the PostgreSQL data type.

Strings in Python 2 are required to be in the PostgreSQL server encoding when they are passed to PostgreSQL. Strings that are not valid in the current server encoding will raise an error; but not all encoding mismatches can be detected, so garbage data can still result when this is not done correctly. Unicode strings are converted to the correct encoding automatically, so it can be safer and more convenient to use those. In Python 3, all strings are Unicode strings.

In other words, 0, False, an empty sequence (including the empty string ''), and an empty dictionary all become PostgreSQL false; everything else becomes true.

One notable exception is that the check for None is done before any other conversions; even for Booleans, None is always converted to NULL and never to the Boolean value false.
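To see these rules in action, here is a minimal sketch (the function name is our own): a function that simply returns its text argument as a Boolean.

CREATE FUNCTION pybool(x text) RETURNS boolean AS $$
# '' is false, any non-empty string such as 'f' is true,
# and a NULL argument arrives as None and is returned as NULL
return x
$$ LANGUAGE plpythonu;

Calling select pybool(''), pybool('f'), pybool(NULL); should return false, true, and NULL, respectively.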

For the bytea type (the PostgreSQL byte array), the conversion from Python's string representation is an exact copy, with no encoding or other conversions applied.

Writing simple functions in PL/Python

Writing functions in PL/Python is not much different in principle from writing functions in PL/pgSQL. You still have the exact same syntax around the function body in $$, and the argument name, types, and returns all mean the same thing regardless of the exact PL/language used.

A simple function

So a simple add_one() function in PL/Python looks like this:

CREATE FUNCTION add_one(i int)
  RETURNS int
AS $$
return i + 1
$$ LANGUAGE plpythonu;

It can't get much simpler than that, can it?

What you see here is that the arguments are passed to the Python code after being converted to the appropriate Python types, and the result is passed back and converted to the appropriate PostgreSQL type for the return value.

Functions returning a record

To return a record from a Python function, you can use:

  • A sequence or list of values in the same order as the fields in the return record
  • A dictionary with keys matching the fields in the return record
  • A class or type instance with attributes matching the fields in the return record

Here are samples of the three ways to return a record.

First, using an instance:

CREATE OR REPLACE FUNCTION userinfo(
    INOUT username name,
    OUT user_id oid,
    OUT is_superuser boolean)
AS $$
class PGUser:
    def __init__(self, username, user_id, is_superuser):
        self.username = username
        self.user_id = user_id
        self.is_superuser = is_superuser

u = plpy.execute("""\
    select usename,usesysid,usesuper
    from pg_user
    where usename = '%s'""" % username)[0]
user = PGUser(u['usename'], u['usesysid'], u['usesuper'])
return user
$$ LANGUAGE plpythonu;

Then, a little simpler one using a dictionary:

CREATE OR REPLACE FUNCTION userinfo(
    INOUT username name,
    OUT user_id oid,
    OUT is_superuser boolean)
AS $$
u = plpy.execute("""\
    select usename,usesysid,usesuper
    from pg_user
    where usename = '%s'""" % username)[0]
return {'username': u['usename'],
        'user_id': u['usesysid'],
        'is_superuser': u['usesuper']}
$$ LANGUAGE plpythonu;

Finally, using a tuple:

CREATE OR REPLACE FUNCTION userinfo(
    INOUT username name,
    OUT user_id oid,
    OUT is_superuser boolean)
AS $$
u = plpy.execute("""\
    select usename,usesysid,usesuper
    from pg_user
    where usename = '%s'""" % username)[0]
return (u['usename'], u['usesysid'], u['usesuper'])
$$ LANGUAGE plpythonu;

Notice [0] at the end of u = plpy.execute(...)[0] in all the examples. It is there to extract the first row of the result, as even for one-row results plpy.execute still returns a list of results.

Danger of SQL injection!

Since we have neither prepared the query with plpy.prepare() and passed the username as a parameter to plpy.execute(), nor used the plpy.quote_literal() method (both techniques are discussed later) to safely quote the username before merging it into the query, we are open to a security flaw known as SQL injection. So, make sure that you only let trusted users call this function or supply the username argument.

Calling the function defined via any of these three CREATE commands will look exactly the same:

hannu=# select * from userinfo('postgres');
 username | user_id | is_superuser
----------+---------+--------------
 postgres |      10 | t
(1 row)

It usually does not make sense to declare a class inside a function just to return a record value. This possibility is included mostly for cases where you already have a suitable class with a set of attributes matching the ones the function returns.

Table functions

When returning a set from a PL/Python function, you have three options:

  • Return a list or any other sequence of values of the return type
  • Return an iterator or generator
  • yield the return values from a loop

Here, we have three ways to generate all even numbers up to the argument value using these different styles.

First, returning a list of integers:

CREATE FUNCTION even_numbers_from_list(up_to int)
  RETURNS SETOF int
AS $$
return range(0, up_to, 2)
$$ LANGUAGE plpythonu;

The list here is returned by the built-in Python function range, which produces all even numbers below the argument. This gets returned as a table of integers, one integer per row, from the PostgreSQL function. If the RETURNS clause of the function definition said int[] instead of SETOF int, the same function would return the even integers as a single PostgreSQL array.
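For instance, a sketch of that array-returning variant (the function name is our own):

CREATE FUNCTION even_numbers_as_array(up_to int)
  RETURNS int[]
AS $$
# the same list as before, but returned as one array value, not a set of rows
return range(0, up_to, 2)
$$ LANGUAGE plpythonu;

A call such as select even_numbers_as_array(6); should return the single array value {0,2,4}.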

The next function returns a similar result using a generator and returning both the even number and the odd one following it. Also, notice the different PostgreSQL syntax RETURNS TABLE(...) used this time for defining the return set:

CREATE FUNCTION even_numbers_from_generator(up_to int)
  RETURNS TABLE (even int, odd int)
AS $$
return ((i, i+1) for i in xrange(0, up_to, 2))
$$ LANGUAGE plpythonu;

The generator is constructed using a generator expression (x for x in <seq>). Finally, the third function is defined using a generator with explicit yield syntax, and yet another PostgreSQL syntax is used for returning SETOF RECORD, with the record structure defined this time by OUT parameters:

CREATE FUNCTION even_numbers_with_yield(
    up_to int,
    OUT even int,
    OUT odd int)
  RETURNS SETOF RECORD
AS $$
for i in xrange(0, up_to, 2):
    yield i, i+1
$$ LANGUAGE plpythonu;

The important part here is that you can use any of the preceding ways to define a PL/Python set returning function, and they all work the same. Also, you are free to return a mixture of different types for each row of the set:

CREATE FUNCTION birthdates(OUT name text, OUT birthdate date)
  RETURNS SETOF RECORD
AS $$
return (
    {'name': 'bob', 'birthdate': '1980-10-10'},
    {'name': 'mary', 'birthdate': '1983-02-17'},
    ['jill', '2010-01-15'],
)
$$ LANGUAGE plpythonu;

This yields the following result:

hannu=# select * from birthdates();
 name | birthdate
------+------------
 bob  | 1980-10-10
 mary | 1983-02-17
 jill | 2010-01-15
(3 rows)

As you see, the data-returning part of PL/Pythonu is much more flexible than that of a function written in PL/pgSQL.

Running queries in the database

If you have ever accessed a database in Python, you know that most database adapters conform to a somewhat loose standard called Python Database API Specification v2.0 or DBAPI 2 for short.

The first thing you need to know about database access in PL/Python is that in-database queries do not follow this API.

Running simple queries

Instead of using the standard API, there are just two functions for doing all database access: plpy.execute() for running a query, and plpy.prepare() for turning query text into a query plan, or a prepared query.

The simplest way to do a query is with:

res = plpy.execute(<query text>, [<row count>])

This takes a textual query and an optional row count, and returns a result object, which emulates a list of dictionaries, one dictionary per row.

As an example, if you want to access a field 'name' of the third row of the result, you use:

res[2]['name']

The index is 2 and not 3 because Python lists are indexed starting from 0, so the first row is res[0], the second row res[1], and so on.
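As a small sketch of working with the result object (the query is only an example):

# fetch at most 10 rows; the result emulates a list of dictionaries
rows = plpy.execute("select usename from pg_user", 10)
for row in rows:
    plpy.notice(row['usename'])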

Using prepared queries

In an ideal world this would be all that is needed, but plpy.execute(query, cnt) has two shortcomings:

  • It does not support parameters
  • The plan for the query is not saved, requiring the query text to be parsed and run through the optimizer at each invocation

We will show a way to properly construct a query string later, but for most use cases simple parameter passing is enough. So, the execute(query, [maxrows]) call becomes a set of two statements:

plan = plpy.prepare(<query text>, <list of argument types>)
res = plpy.execute(plan, <list of values>, [<row count>])

For example, to query if a user 'postgres' is a superuser, use the following:

plan = plpy.prepare("select usesuper from pg_user where usename = $1", ["text"])
res = plpy.execute(plan, ["postgres"])
print res[0]["usesuper"]

The first statement prepares the query, which parses the query string into a query tree, optimizes this tree to produce the best query plan available, and returns a prepared query object. The second statement uses the prepared plan to query for a specific user's superuser status.

The prepared plan can be used multiple times, so you could go on to check whether user bob is a superuser:

res = plpy.execute(plan, ["bob"])
print res[0]["usesuper"]

Caching prepared queries

Preparing the query can be quite an expensive step, especially for more complex queries where the optimizer has to choose from a rather large set of possible plans; so, it makes sense to re-use results of this step if possible.

The current implementation of PL/Python does not automatically cache query plans (prepared queries), but you can do it easily yourself.

try:
    plan = SD['is_super_qplan']
except:
    SD['is_super_qplan'] = plpy.prepare("....
    plan = SD['is_super_qplan']
<the rest of the function>

The values in SD[] and GD[] only live inside a single database session, so it only makes sense to do the caching in case you have long-lived connections.
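Put together with the superuser lookup from the previous section, the full caching pattern might look like this (the function name is our own):

CREATE OR REPLACE FUNCTION is_superuser(username text)
  RETURNS boolean
AS $$
try:
    plan = SD['is_super_qplan']
except KeyError:
    # first call in this session: prepare the plan and cache it in SD[]
    SD['is_super_qplan'] = plpy.prepare(
        "select usesuper from pg_user where usename = $1", ["text"])
    plan = SD['is_super_qplan']
rows = plpy.execute(plan, [username])
if len(rows) == 0:
    return None     # no such user; None becomes NULL
return rows[0]['usesuper']
$$ LANGUAGE plpythonu;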


Writing trigger functions in PL/Python

As with other PLs, PL/Pythonu can be used to write trigger functions. The declaration of a trigger function differs from an ordinary function in its return type, RETURNS TRIGGER. So, a simple trigger function that just notifies the caller that it was indeed called looks like this:

CREATE OR REPLACE FUNCTION notify_on_call()
  RETURNS TRIGGER
AS $$
plpy.notice('I was called!')
$$ LANGUAGE plpythonu;

After creating this function, the trigger can be tested by creating a table and a trigger that uses the function:

hannu=# CREATE TABLE ttable(id int);
CREATE TABLE
hannu=# CREATE TRIGGER ttable_notify BEFORE INSERT ON ttable
            EXECUTE PROCEDURE notify_on_call();
CREATE TRIGGER
hannu=# INSERT INTO ttable VALUES(1);
NOTICE: I was called!
CONTEXT: PL/Python function "notify_on_call"
INSERT 0 1

Of course, the preceding trigger function is quite useless, as is any trigger that does not know when and on what data change it was called. All the data needed by a trigger when it is called is passed in via the trigger dictionary, called TD. In TD, you have the following values:

Key                 Value
------------------  ----------------------------------------------------------
TD["event"]         The event the trigger function is called for: one of the
                    strings INSERT, UPDATE, DELETE, or TRUNCATE.
TD["when"]          One of BEFORE, AFTER, or INSTEAD OF.
TD["level"]         ROW or STATEMENT.
TD["old"]           The before-command image of the row. For row-level UPDATE
                    and DELETE triggers, this contains a dictionary of the
                    values of the triggering row before the changes have been
                    made by the command. It is None for other cases.
TD["new"]           The after-command image of the row. For row-level INSERT
                    and UPDATE triggers, this contains a dictionary of the
                    values of the triggering row after the changes have been
                    made by the command. It is None for other cases. If you
                    are in a BEFORE or INSTEAD OF trigger, you can make
                    changes to this dictionary and then signal PostgreSQL to
                    use the changed tuple by returning the string MODIFY from
                    the trigger function.
TD["name"]          The trigger name from the CREATE TRIGGER command.
TD["table_name"]    The name of the table on which the trigger occurred.
TD["table_schema"]  The schema of the table on which the trigger occurred.
TD["relid"]         The object identifier (OID) of the table on which the
                    trigger occurred.
TD["args"]          If the CREATE TRIGGER command included arguments, they are
                    available from TD["args"][0] to TD["args"][n-1].

In addition to doing anything you can do in ordinary PL/Python functions, such as modifying data in tables, writing to files and sockets, and sending e-mails, you can also affect the behavior of the triggering command.

If TD["when"] is ("BEFORE", "INSTEAD OF") and TD["level"] == "ROW", you can return SKIP to abort the event. Returning None or OK indicates that the row is unmodified and it is OK to continue. Returning None is also the default behavior for Python if the function does a simple return or runs to the end without a return statement; so, you don't need to do anything for this case.

In case you have modified values in TD["new"] and you want PostgreSQL to continue with the new values, you can return MODIFY to indicate that you've modified the new row. This can be done if TD["event"] is INSERT or UPDATE; otherwise, the return value is ignored.
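As a minimal sketch (the table and its flagged column are hypothetical), a BEFORE row-level trigger that rewrites the incoming row could look like this:

CREATE OR REPLACE FUNCTION force_flag()
  RETURNS TRIGGER
AS $$
if TD["event"] in ("INSERT", "UPDATE"):
    TD["new"]["flagged"] = True   # hypothetical boolean column on the table
    return "MODIFY"               # tell PostgreSQL to use the changed row
$$ LANGUAGE plpythonu;

CREATE TRIGGER force_flag_trigger
    BEFORE INSERT OR UPDATE ON mytable   -- hypothetical table
    FOR EACH ROW
    EXECUTE PROCEDURE force_flag();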

Exploring the inputs of a trigger

The following trigger function is useful when developing triggers, so that you can easily see what the trigger function is really getting when called:

CREATE OR REPLACE FUNCTION explore_trigger()
  RETURNS TRIGGER
AS $$
import pprint
nice_data = pprint.pformat(
    (
        ('TD["table_schema"]', TD["table_schema"]),
        ('TD["event"]',        TD["event"]),
        ('TD["when"]',         TD["when"]),
        ('TD["level"]',        TD["level"]),
        ('TD["old"]',          TD["old"]),
        ('TD["new"]',          TD["new"]),
        ('TD["name"]',         TD["name"]),
        ('TD["table_name"]',   TD["table_name"]),
        ('TD["relid"]',        TD["relid"]),
        ('TD["args"]',         TD["args"]),
    )
)
plpy.notice('explore_trigger:\n' + nice_data)
$$ LANGUAGE plpythonu;

This function formats all the data passed to the trigger in TD using pprint.pformat, and then sends it to the client as a standard PostgreSQL NOTICE using plpy.notice(). To test this out, we create a simple table and then put an AFTER … FOR EACH ROW … trigger using this function on that table:

CREATE TABLE test(
    id serial PRIMARY KEY,
    data text,
    ts timestamp DEFAULT clock_timestamp()
);

CREATE TRIGGER test_explore_trigger
    AFTER INSERT OR UPDATE OR DELETE ON test
    FOR EACH ROW
    EXECUTE PROCEDURE explore_trigger('one', 2, null);

Now, we can explore what the trigger function actually gets:

hannu=# INSERT INTO test(id,data) VALUES(1, 'firstrowdata');
NOTICE: explore_trigger:
(('TD["table_schema"]', 'public'),
 ('TD["event"]', 'INSERT'),
 ('TD["when"]', 'AFTER'),
 ('TD["level"]', 'ROW'),
 ('TD["old"]', None),
 ('TD["new"]', {'data': 'firstrowdata', 'id': 1, 'ts': '2013-05-13 12:04:03.676314'}),
 ('TD["name"]', 'test_explore_trigger'),
 ('TD["table_name"]', 'test'),
 ('TD["relid"]', '35163'),
 ('TD["args"]', ['one', '2', 'null']))
CONTEXT: PL/Python function "explore_trigger"
INSERT 0 1

Most of this is expected and corresponds well to the TD values given in the previous table. What may be a little unexpected is the fact that the arguments given in the CREATE TRIGGER statement are all converted to strings, even the NULL. When developing your own triggers, either in PL/Python or any other language, it may be useful to put this trigger on the table as well to check that the inputs to the trigger are as expected. For example, it is easy to see that if you omit the FOR EACH ROW part, TD['old'] and TD['new'] will both be None, as the trigger definition defaults to FOR EACH STATEMENT.

A log trigger

Now, we can put this knowledge to work and write a trigger that logs changes to the table either to a file or to a special log-collector process over UDP. Logging to a file is the simplest way to preserve a permanent record of changes, even those made in transactions that were later rolled back; if the changes were logged to a log table instead, the ROLLBACK command would remove the log records as well. This may be a crucial audit requirement for your business.

Of course, this also has a downside: you will be logging changes that may not be permanent due to the transaction being rolled back, but this is the price you have to pay for not losing the log records.

CREATE OR REPLACE FUNCTION log_trigger()
  RETURNS TRIGGER
AS $$
args = tuple(TD["args"])
if not SD.has_key(args):
    protocol = args[0]
    if protocol == 'udp':
        import socket
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        def logfunc(msg, addr=args[1], port=int(args[2]), sock=sock):
            sock.sendto(msg, (addr, port))
    elif protocol == 'file':
        f = open(args[1], 'a+')
        def logfunc(msg, f=f):
            f.write(msg + '\n')
            f.flush()
    else:
        raise ValueError, 'bad logdest in CREATE TRIGGER'
    SD[args] = logfunc
    SD['env_plan'] = plpy.prepare("""
        select clock_timestamp(),
               txid_current(),
               current_user,
               current_database()""", [])
logfunc = SD[args]
env_info_row = plpy.execute(SD['env_plan'])[0]
import json
log_msg = json.dumps(
    {'txid'  : env_info_row['txid_current'],
     'time'  : env_info_row['clock_timestamp'],
     'user'  : env_info_row['current_user'],
     'db'    : env_info_row['current_database'],
     'table' : '%s.%s' % (TD['table_name'], TD['table_schema']),
     'event' : TD['event'],
     'old'   : TD['old'],
     'new'   : TD['new'],
    })
logfunc(log_msg)
$$ LANGUAGE plpythonu;

First, this trigger checks if it already has a logger function defined and cached in the function's local dictionary SD[]. As the same trigger may be used with many different log destinations, the log function is stored under a key constructed as a Python tuple from the trigger function arguments in the CREATE TRIGGER statement. We cannot use the TD["args"] list directly as a key, as Python dictionary keys have to be immutable, which a list is not, but a tuple is.

If the key is not present, meaning this is the first call to this particular trigger, we have to create an appropriate log function and store it. To do this, we examine the first argument for the log destination type.

For the udp log type, we create a UDP socket for writing. Then, we define a function, passing in this socket and also the other two trigger arguments as default arguments for the function. This is the most convenient way to create a closure, and to bundle a function with some data values in Python.

For the file type, we just open this file in the append mode (a+) and also create a log function. The log function writes a message to this file and flushes the write, so the data is written to the file immediately and not some time later when the write buffer fills up. The log function created in either of these cases is stored in SD[tuple(TD["args"])].

At this point, we also prepare and save a query plan for getting other data we want to log and save this in SD['env_plan']. Now that we are done with the one-time preparations, we can proceed with the actual logging part, which is really very simple.

Next, we retrieve the logging function (logfunc = SD[args]) and get the row of the other logged data:

env_info_row = plpy.execute(SD['env_plan'])[0]

Finally, we convert all the logged data into one JSON object (log_msg = json.dumps({...})) and then use the logging function to send it to the log, logfunc(log_msg).

And that's it.

Next, let's test it out to see how it works by adding another trigger to our test table we created earlier:

CREATE TRIGGER test_audit_trigger
    AFTER INSERT OR UPDATE OR DELETE ON test
    FOR EACH ROW
    EXECUTE PROCEDURE log_trigger('file', '/tmp/test.json.log');

Any changes to the table made via INSERT, UPDATE, or DELETE are logged into /tmp/test.json.log. This file is initially owned by the user running the server, usually postgres; so to look at it you need to either be that user or the root user, or you have to change the permissions on the file to allow reading.

If you want to test the UDP logging part, you just have to define another trigger with different arguments:

CREATE TRIGGER test_audit_trigger_udp
    AFTER INSERT OR UPDATE OR DELETE ON test
    FOR EACH ROW
    EXECUTE PROCEDURE log_trigger('udp', 'localhost', 9999);

Constructing queries

PL/Python does a good job of managing values passed to prepared query plans, but a standard PostgreSQL query plan can take an argument in only a very limited number of places. Sometimes, you may want to construct whole queries, not just pass values to predefined queries. For example, you can't have an argument for a table name or a field name.

So, how would you proceed if you want to construct a query from the function's arguments and be sure that everything is quoted properly and no SQL injection would be possible? PL/Python provides three functions to help you with proper quoting of identifiers and data just for this purpose.

The function plpy.quote_ident(name) is meant for quoting identifiers, that is, anything that names a database object or its attribute like a table, a view, a field name, or function name. It surrounds the name with double quotes and takes care of properly escaping anything inside the string which would break the quoting:

hannu=# DO LANGUAGE plpythonu $$
plpy.notice(plpy.quote_ident(r'5" \"'))
$$;
NOTICE: "5"" \"""
CONTEXT: PL/Python anonymous code block
DO

And yes, 5" \" is a legal table or field name in PostgreSQL; you just have to always quote it if you use it in any statement.

The DO syntax creates an anonymous block inside your database session. It is a very handy way to run some procedural language code without needing to create a function.

The other two functions are for quoting literal values. The function, plpy.quote_literal(litvalue), is for quoting strings and plpy.quote_nullable(value_or_none) is for quoting a value, which may be None. Both of these functions quote strings in a similar way, by enclosing them in single quotes (str becomes 'str') and doubling any single quotes or backslashes:

hannu=# DO LANGUAGE plpythonu $$
plpy.notice(plpy.quote_literal(r" \' "))
$$;
NOTICE: E' \\'' '
CONTEXT: PL/Python anonymous code block
DO

The only difference between these two is that plpy.quote_nullable() can also take the value None, which will be rendered as the string NULL without any surrounding quotes. The argument to both of these has to be a string or a Unicode string. If you want them to work with a value of any Python type, wrapping the value in str(value) usually works well.
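Putting these helpers together, here is a sketch of a function (our own illustration) that safely builds a query around an identifier argument:

CREATE OR REPLACE FUNCTION count_rows(tablename text)
  RETURNS bigint
AS $$
# the table name is an identifier, so it goes through plpy.quote_ident();
# literal values would go through plpy.quote_literal() or plpy.quote_nullable()
q = "select count(*) as cnt from %s" % plpy.quote_ident(tablename)
return plpy.execute(q)[0]['cnt']
$$ LANGUAGE plpythonu;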

Handling exceptions

As with any bit of code, you need to make sure you handle errors when they occur, and your PL/Python functions are no exception.

Before Version 9.1 of PostgreSQL, any error in an SQL query caused the surrounding transaction to be rolled back:

hannu=# DO LANGUAGE plpythonu $$
hannu$# plpy.execute('insert into ttable values(1)')
hannu$# plpy.execute('fail!')
hannu$# $$;
ERROR: spiexceptions.SyntaxError: syntax error at or near "fail"
LINE 1: fail!
        ^
QUERY: fail!
CONTEXT: Traceback (most recent call last):
  PL/Python anonymous code block, line 3, in <module>
    plpy.execute('fail!')
PL/Python anonymous code block

You can manually use savepoints to control the boundaries of the rolled-back block, at least as far back as version 8.4 of PostgreSQL. This will reduce the amount of the transaction that is rolled back:

CREATE OR REPLACE FUNCTION syntax_error_rollback_test()
  RETURNS void
AS $$
plpy.execute('insert into ttable values(1)')
try:
    plpy.execute('SAVEPOINT foo;')
    plpy.execute('insert into ttable values(2)')
    plpy.execute('fail!')
except:
    pass
plpy.execute('insert into ttable values(3)')
$$ LANGUAGE plpythonu;

hannu=# select syntax_error_rollback_test();
 syntax_error_rollback_test
----------------------------

(1 row)

When the SAVEPOINT foo; command is executed in PL/Python, an SQL error will not cause a full ROLLBACK but the equivalent of ROLLBACK TO SAVEPOINT foo;, so only the effects of commands between the SAVEPOINT and the error are rolled back:

hannu=# select * from ttable;
 id
----
  1
  3
(2 rows)

In version 9.1, there are two important changes in how PostgreSQL exceptions are handled. If no SAVEPOINT or subtransaction is used, each invocation of plpy.prepare() and plpy.execute() is run in its own subtransaction, so that an error will only roll back this subtransaction and not all of the current transaction. Since using a separate subtransaction for each database interaction involves extra costs, and you may want to control the subtransaction boundaries anyway, a new Python context manager, plpy.subtransaction(), is provided.

For an explanation of Python's context managers, refer to http://docs.python.org/library/stdtypes.html#context-manager-types. Using the with statement in Python 2.6 or newer, you can wrap a group of database interactions in one subtransaction in a more Pythonic way:

hannu=# CREATE TABLE test_ex(i int);
CREATE TABLE
hannu=# DO LANGUAGE plpythonu $$
hannu$# plpy.execute('insert into test_ex values(1)')
hannu$# try:
hannu$#     with plpy.subtransaction():
hannu$#         plpy.execute('insert into test_ex values(2)')
hannu$#         plpy.execute('fail!')
hannu$# except plpy.spiexceptions.SyntaxError:
hannu$#     pass # silently ignore, avoid doing this in prod. code
hannu$# plpy.execute('insert into test_ex values(3)')
hannu$# $$;
DO
hannu=# select * from test_ex;
 i
---
 1
 3
(2 rows)

Atomicity in Python

While the subtransactions manage data changes in the PostgreSQL database, the variables on the Python side of the fence live their own separate lives. Python does not provide even single-statement-level atomicity, as demonstrated by the following:

>>> a = 1
>>> a[1] = a = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object does not support item assignment
>>> a
1
>>> a = a[1] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object does not support item assignment
>>> a
2

As you can see, it is possible for even a single multi-assignment statement to be executed only halfway through. This means that you have to be prepared to fully manage your Python data yourself. The function plpy.subtransaction() won't help you in any way with managing Python variables.

Debugging PL/Python

First, let's start by stating that there is no debugger support when running functions in PL/Python; so it is a good idea to develop and debug a PL/Python function as a pure Python function as much as possible and only do the final integration in PL/Python. To help with this, you can have a similar environment in your Python development environment using the plpy module.

Just put the module in your path and do import plpy before you try running your prospective PL/Pythonu functions in an ordinary interpreter. If you use any of the plpy.execute(...) or plpy.prepare() functions, you also need to set up a database connection before using these by calling plpy.connect(<connectstring>).
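A sketch of this workflow, assuming the stand-in plpy module mirrors the server's plpy.execute() behavior and the connect string fits your setup:

# prototype the function body in a plain Python 2 interpreter
import plpy                   # the stand-in module from your path, not the server's
plpy.connect('dbname=test')   # required before any plpy.execute()/plpy.prepare()

def superuser_names():
    res = plpy.execute("select usename from pg_user where usesuper")
    return [row['usename'] for row in res]

# test here first, then paste the body into CREATE FUNCTION
print superuser_names()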

Using plpy.notice() for tracking the function's progress

The debugging technology I use most often in any language is printing out intermediate values as the function progresses. If the printout rolls past too fast, you can slow it down by sleeping a second or two after each print.

In standard Python, it would look like this:

def fact(x):
    f = 1
    while (x > 0):
        f = f * x
        x = x - 1
        print 'f:%d, x:%d' % (f, x)
    return f

It will print out all intermediate values for f and x as it runs:

>>> fact(3)
f:3, x:2
f:6, x:1
f:6, x:0
6

If you try to use print in a PL/Python function, you will discover that nothing is printed. In fact, there is no single logical place to print to when running a pluggable language inside a PostgreSQL server.

The closest thing to print in PL/Python is the function plpy.notice(), which sends a PostgreSQL NOTICE to the client, and also to the server log if log_min_messages is set to notice or lower:

CREATE FUNCTION fact(x int)
  RETURNS int
AS $$
global x
f = 1
while (x > 0):
    f = f * x
    x = x - 1
    plpy.notice('f:%d, x:%d' % (f, x))
return f
$$ LANGUAGE plpythonu;

Running this is much more verbose than the version with print, because each NOTICE also includes information about the CONTEXT from where the NOTICE comes:

hannu=# select fact(3);
NOTICE: f:3, x:2
CONTEXT: PL/Python function "fact"
NOTICE: f:6, x:1
CONTEXT: PL/Python function "fact"
NOTICE: f:6, x:0
CONTEXT: PL/Python function "fact"
 fact
------
    6
(1 row)

PL/Pythonu function arguments are passed in as globals

If you compared the fact(x) function in Python and PL/Python, you may have noticed an extra line at the beginning of the PL/Python version:

global x

This is needed to overcome an implementation detail that often surprises PL/Pythonu developers; the function arguments are not the function arguments in the Python sense and neither are they locals. They are passed in as variables in the function's global scope.
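To see why it matters, here is a sketch of the same function without the declaration (our own illustration): because x is assigned inside the body, Python treats it as a local variable, and reading it before the first assignment raises UnboundLocalError.

CREATE FUNCTION fact_broken(x int)
  RETURNS int
AS $$
f = 1
while (x > 0):     # raises UnboundLocalError: the 'x = x - 1' assignment
    f = f * x      # below makes x local, so it is unbound when first read
    x = x - 1
return f
$$ LANGUAGE plpythonu;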

Using assert

Similar to ordinary Python programming, you can also use Python's assert statement to catch conditions which should not happen:

CREATE OR REPLACE FUNCTION fact(x int)
  RETURNS int
AS $$
global x
assert x >= 0, "argument must be a positive integer"
f = 1
while (x > 0):
    f = f * x
    x = x - 1
return f
$$ LANGUAGE plpythonu;

To test this, call fact() with a negative number:

hannu=# select fact(-1);
ERROR: AssertionError: argument must be a positive integer
CONTEXT: Traceback (most recent call last):
  PL/Python function "fact", line 3, in <module>
    assert x>=0, "argument must be a positive integer"
PL/Python function "fact"

You will get a message about AssertionError together with the location of the failing line number.

Redirecting sys.stdout and sys.stderr

If all the code you need to debug is your own, the preceding two techniques will cover most of your needs. However, what do you do in cases where you use some third party libraries which print out debug information to sys.stdout and/or sys.stderr?

Well, in this case, you can replace Python's sys.stdout and sys.stderr with your own pseudo file objects that store everything written there for later retrieval. Here is a pair of functions: the first captures sys.stdout (or restores the original if called with the argument do_capture set to false), and the second returns everything captured:

CREATE OR REPLACE FUNCTION capture_stdout(do_capture bool)
  RETURNS text
AS $$
import sys
if do_capture:
    try:
        sys.stdout = GD['stdout_to_notice']
    except KeyError:
        class WriteAsNotice:
            def __init__(self, old_stdout):
                self.old_stdout = old_stdout
                self.printed = []
            def write(self, s):
                self.printed.append(s)
            def read(self):
                text = ''.join(self.printed)
                self.printed = []
                return text
        GD['stdout_to_notice'] = WriteAsNotice(sys.stdout)
        sys.stdout = GD['stdout_to_notice']
    return "sys.stdout captured"
else:
    sys.stdout = GD['stdout_to_notice'].old_stdout
    return "restored original sys.stdout"
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION read_stdout()
  RETURNS text
AS $$
return GD['stdout_to_notice'].read()
$$ LANGUAGE plpythonu;

Here is a sample session using the preceding functions:

hannu=# select capture_stdout(true);
   capture_stdout
---------------------
 sys.stdout captured
(1 row)

hannu=# DO LANGUAGE plpythonu $$
hannu$# print 'TESTING sys.stdout CAPTURING'
hannu$# import pprint
hannu$# pprint.pprint( {'a':[1,2,3], 'b':[4,5,6]} )
hannu$# $$;
DO
hannu=# select read_stdout();
           read_stdout
----------------------------------
 TESTING sys.stdout CAPTURING    +
 {'a': [1, 2, 3], 'b': [4, 5, 6]}+

(1 row)

Thinking out of the "SQL database server" box

We'll wrap up this article on PL/Python with a couple of sample PL/Pythonu functions for doing some things you would not usually consider doing inside a database function or trigger.

Generating thumbnails when saving images

Our first example uses the powerful Python Imaging Library (PIL) to generate thumbnails of uploaded photos. For ease of interfacing with various client libraries, this function takes the incoming image data as a base64-encoded string:

CREATE FUNCTION save_image_with_thumbnail(image64 text)
  RETURNS int
AS $$
import Image, cStringIO

size = (64, 64)  # thumbnail size

# convert base64 encoded text to binary image data
raw_image_data = image64.decode('base64')

# create a pseudo-file to read image from
infile = cStringIO.StringIO(raw_image_data)
pil_img = Image.open(infile)
pil_img.thumbnail(size, Image.ANTIALIAS)

# create a stream to write the thumbnail to
outfile = cStringIO.StringIO()
pil_img.save(outfile, 'JPEG')
raw_thumbnail = outfile.getvalue()

# store result into database and return row id
q = plpy.prepare('''
    INSERT INTO photos(image, thumbnail)
    VALUES ($1, $2)
    RETURNING id''', ('bytea', 'bytea'))
res = plpy.execute(q, (raw_image_data, raw_thumbnail))

# return column id of first row
return res[0]['id']
$$ LANGUAGE plpythonu;

The Python code is more or less a straight rewrite from the PIL tutorial, except that the files to read the image from and write the thumbnail to are replaced with Python's standard file-like StringIO objects. For all this to work, you need to have PIL installed on your database server host.

In Debian/Ubuntu, this can be done by running sudo apt-get install python-imaging. On most modern Linux distributions, an alternative is to use Python's own package distribution system by running sudo easy_install PIL.
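A client-side sketch of calling this function (assuming psycopg2 on the client, the photos table used by the function, and placeholder file and database names):

# Python 2 client-side sketch
import psycopg2

conn = psycopg2.connect('dbname=test')
cur = conn.cursor()
image64 = open('photo.jpg', 'rb').read().encode('base64')
cur.execute('select save_image_with_thumbnail(%s)', (image64,))
print cur.fetchone()[0]   # id of the newly inserted photos row
conn.commit()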

Sending an e-mail

The next sample is a function for sending e-mails from inside a database function:

CREATE OR REPLACE FUNCTION send_email(
    sender text,       -- sender e-mail
    recipients text,   -- comma-separated list of recipient addresses
    subject text,      -- email subject
    message text,      -- text of the message
    smtp_server text   -- SMTP server to use for sending
) RETURNS void
AS $$
import smtplib
msg = "From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n%s" % \
      (sender, recipients, subject, message)
recipients_list = [r.strip() for r in recipients.split(',')]
server = smtplib.SMTP(smtp_server)
server.sendmail(sender, recipients_list, msg)
server.quit()
$$ LANGUAGE plpythonu;

This function formats a message (msg = ...), converts the comma-separated To: addresses into a list of e-mail addresses (recipients_list = [r.strip()...), connects to an SMTP server, and then passes the message to the SMTP server for delivery.
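A call might look like this (all the addresses and the server name are placeholders):

hannu=# select send_email('hannu@example.com',
                          'bob@example.com, mary@example.com',
                          'test subject',
                          'hello from the database!',
                          'localhost');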

To use this function in a production system, it would probably require a bit more checking on the formats and some extra error handling in case something goes wrong. You can read more about Python's smtplib at http://docs.python.org/library/smtplib.html.

Summary

In this article, we saw that it is relatively easy to do things well beyond what a simple SQL database server is supposed to support, thanks to its pluggable language support.

In fact, you can do almost anything in the PostgreSQL server that you could do in any other application server. Hopefully, this article has just scratched the surface of what you can do inside a PostgreSQL server.


