PostgreSQL Development Essentials

3.3 (3 reviews total)
By Manpreet Kaur , Baji Shaik
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Advanced SQL

About this book

PostgreSQL is the most advanced open source database in the world. It is easy to install, configure, and maintain by following the documentation; however, it’s difficult to develop applications using programming languages and design databases accordingly. This book is what you need to get the most out of PostgreSQL

You will begin with advanced SQL topics such as views, materialized views, and cursors, and learn about performing data type conversions. You will then perform trigger operations and use trigger functions in PostgreSQL. Next we walk through data modeling, normalization concepts, and the effect of transactions and locking on the database.

The next half of the book covers the types of indexes, constrains, and the concepts of table partitioning, as well as the different mechanisms and approaches available to write efficient queries or code. Later, we explore PostgreSQL Extensions and Large Object Support in PostgreSQL. Finally, you will perform database operations in PostgreSQL using PHP and Java. By the end of this book, you will have mastered all the aspects of PostgreSQL development. You will be able to build efficient enterprise-grade applications with PostgreSQL by making use of these concepts

Publication date:
September 2016
Publisher
Packt
Pages
210
ISBN
9781783989003

 

Chapter 1.  Advanced SQL

This book is all about an open source software product, a relational database called PostgreSQL. PostgreSQL is an advanced SQL database server, available on a wide range of platforms. The purpose of this book is to teach database developers the fundamental practices and techniques to program database applications with PostgreSQL.

In this chapter, we will discuss the following advanced SQL topics:

  • Creating views

  • Understanding materialized views

  • Creating cursors

  • Using the GROUP BY clause

  • Using the HAVING clause

  • Understanding complex topics such as subqueries and joins

 

Creating views


A view is a virtual table based on the result set of an SQL statement. Just like a real table, a view consist of rows and columns. The fields in a view are from one or more real tables in the database. Generally speaking, a table has a set of definitions that physically stores data. A view also has a set of definitions built on top of table(s) or other view(s) that does not physically store data. The purpose of creating views is to make sure that the user does not have access to all the data and is being restricted through a view. Also, it's better to create a view if we have a query based on multiple tables so that we can use it straightaway rather than writing a whole PSQL again and again.

Database views are created using the CREATE VIEW statement. Views can be created from a single table or multiple tables, or another view.

The basic CREATE VIEW syntax is as follows:

CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE [condition];

Let's take a look at each of these commands:

  • CREATE VIEW: This command helps create the database's view.

  • SELECT: This command helps you select the physical and virtual columns that you want as part of the view.

  • FROM: This command gives the table names with an alias from where we can fetch the columns. This may include one or more table names, considering you have to create a view at the top of multiple tables.

  • WHERE: This command provides a condition that will restrict the data for a view. Also, if you include multiple tables in the FROM clause, you can provide the joining condition under the WHERE clause.

You can then query this view as though it were a table. (In PostgreSQL, at the time of writing, views are read-only by default.) You can SELECT data from a view just as you would from a table and join it to other tables; you can also use WHERE clauses. Each time you execute a SELECT query using the view, the data is rebuilt, so it is always up-to-date. It is not a frozen copy stored at the time the view was created.

Let's create a view on supplier and order tables. But, before that, let's see what the structure of the suppliers and orders table is:

CREATE TABLE suppliers
(supplier_id number primary key,
Supplier_name varchar(30),
Phone_number number);
CREATE TABLE orders
(order_number number primary key,
Supplier_id  number references suppliers(supplier_id),
Quanity number,
Is_active varchar(10),
Price number);
CREATE VIEW active_supplier_orders AS 
SELECT suppliers.supplier_id, suppliers.supplier_name  orders.quantity, orders.price 
FROM suppliers
INNER JOIN orders
ON suppliers.supplier_id = orders.supplier_id
WHERE suppliers.supplier_name = 'XYZ COMPANY'
And orders.active='TRUE';

The preceding example will create a virtual table based on the result set of the SELECT statement. You can now query the PostgreSQL VIEW as follows:

SELECT * FROM active_supplier_orders;

Deleting and replacing views

To delete a view, simply use the DROP VIEW statement with view_name. The basic DROP VIEW syntax is as follows:

DROP VIEW IF EXISTS view_name;

If you want to replace an existing view with one that has the same name and returns the same set of columns, you can use a CREATE OR REPLACE command.

The following is the syntax to modify an existing view:

CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name(s)
WHERE condition;

Let's take a look at each of these commands:

  • CREATE OR REPLACE VIEW: This command helps modify the existing view.

  • SELECT: This command selects the columns that you want as part of the view.

  • FROM: This command gives the table name from where we can fetch the columns. This may include one or more table names, since you have to create a view at the top of multiple tables.

  • WHERE: This command provides the condition to restrict the data for a view. Also, if you include multiple tables in the FROM clause, you can provide the joining condition under the WHERE clause.

Let's modify a view, supplier_orders, by adding some more columns in the view. The view was originally based on supplier and order tables having supplier_id, supplier_name, quantity, and price. Let's also add order_number in the view.

CREATE OR REPLACE VIEW active_supplier_orders AS 
SELECT suppliers.supplier_id, suppliers.supplier_name  orders.quantity, orders.price,order. order_number 
FROM suppliers
INNER JOIN orders
ON suppliers.supplier_id = orders.supplier_id
WHERE suppliers.supplier_name = 'XYZ COMPANY'
And orders.active='TRUE';;
 

Materialized views


A materialized view is a table that actually contains rows but behaves like a view. This has been added in the PostgreSQL 9.3 version. A materialized view cannot subsequently be directly updated, and the query used to create the materialized view is stored in exactly the same way as the view's query is stored. As it holds the actual data, it occupies space as per the filters that we applied while creating the materialized view.

Why materialized views?

Before we get too deep into how to implement materialized views, let's first examine why we may want to use materialized views.

You may notice that certain queries are very slow. You may have exhausted all the techniques in the standard bag of techniques to speed up those queries. In the end, you will realize that getting queries to run as fast as you want simply isn't possible without completely restructuring the data.

Now, if you have an environment where you run the same type of SELECT query multiple times against the same set of tables, then you can create a materialized view for SELECT so that, on every run, this view does not go to the actual tables to fetch the data, which will obviously reduce the load on them as you might be running a Data Manipulation Language (DML) against your actual tables at the same time. So, basically, you take a view and turn it into a real table that holds real data rather than a gateway to a SELECT query.

Read-only, updatable, and writeable materialized views

A materialized view can be read-only, updatable, or writeable. Users cannot perform DML statements on read-only materialized views, but they can perform them on updatable and writeable materialized views.

Read-only materialized views

You can make a materialized view read-only during creation by omitting the FOR UPDATE clause or by disabling the equivalent option in the database management tool. Read-only materialized views use many mechanisms similar to updatable materialized views, except they do not need to belong to a materialized view group.

In a replication environment, a materialized table holds the table data and resides in a different database. A table that has a materialized view on it is called a master table. The master table resides on a master site and the materialized view resides on a materialized-view site.

In addition, using read-only materialized views eliminates the possibility of introducing data conflicts on the master site or the master materialized view site, although this convenience means that updates cannot be made on the remote materialized view site.

The syntax to create a materialized view is as follows:

CREATE MATERIALIZED VIEW  view_name AS SELECT  columns FROM table;

The CREATE MATERIALIZED VIEW command helps us create a materialized view. The command acts in way similar to the CREATE VIEW command, which was explained in the previous section.

Let's make a read-only materialized view for a supplier table:

CREATE MATERIALIZED VIEW suppliers_matview AS
SELECT * FROM suppliers;

This view is a read-only materialized view and will not reflect the changes to the master site.

Updatable materialized views

You can make a materialized view updatable during creation by including the FOR UPDATE clause or enabling the equivalent option in the database management tool. In order for changes that have been made to an updatable materialized view to be reflected in the master site during refresh, the updatable materialized view must belong to a materialized view group.

When we say "refreshing the materialized view," we mean synchronizing the data in the materialized view with data in its master table.

An updatable materialized view enables you to decrease the load on master sites because users can make changes to data on the materialized view site.

The syntax to create an updatable materialized view is as follows:

CREATE MATERIALIZED VIEW  view_name  FOR UPDATE 
AS 
SELECT columns FROM table;

Let's make an updatable materialized view for a supplier table:

CREATE MATERIALIZED VIEW suppliers_matview FOR UPDATE
AS
SELECT * FROM suppliers;

Whenever changes are made in the suppliers_matview clause, it will reflect the changes to the master sites during refresh.

Writeable materialized views

A writeable materialized view is one that is created using the FOR UPDATE clause like an updatable materialized view is, but it is not a part of a materialized view group. Users can perform DML operations on a writeable materialized view; however, if you refresh the materialized view, then these changes are not pushed back to the master site and are lost in the materialized view itself. Writeable materialized views are typically allowed wherever fast-refreshable, read-only materialized views are allowed.

 

Creating cursors


A cursor in PostgreSQL is a read-only pointer to a fully executed SELECT statement's result set. Cursors are typically used within applications that maintain a persistent connection to the PostgreSQL backend. By executing a cursor and maintaining a reference to its returned result set, an application can more efficiently manage which rows to retrieve from a result set at different times without re-executing the query with different LIMIT and OFFSET clauses.

The four SQL commands involved with PostgreSQL cursors are DECLARE, FETCH, MOVE, and CLOSE.

The DECLARE command both defines and opens a cursor, in effect defining the cursor in memory, and then populates the cursor with information about the result set returned from the executed query. A cursor may be declared only within an existing transaction block, so you must execute a BEGIN command prior to declaring a cursor.

Here is the syntax for DECLARE:

DECLARE cursorname [ BINARY ] [ INSENSITIVE ] [ SCROLL ] CURSOR  FOR query
[ FOR { READ ONLY | UPDATE [ OF  column [, ...] ] } ]

DECLARE cursorname is the name of the cursor to create. The optional BINARY keyword causes the output to be retrieved in binary format instead of standard ASCII; this can be more efficient, though it is only relevant to custom applications as clients such as psql are not built to handle anything but text output. The INSENSITIVE and SCROLL keywords exist to comply with the SQL standard, though they each define PostgreSQL's default behavior and are never necessary. The INSENSITIVE SQL keyword exists to ensure that all data retrieved from the cursor remains unchanged from other cursors or connections. As PostgreSQL requires the cursors to be defined within transaction blocks, this behavior is already implied. The SCROLL SQL keyword exists to specify that multiple rows at a time can be selected from the cursor. This is the default in PostgreSQL, even if it is unspecified.

The CURSOR FOR query is the complete query and its result set will be accessible by the cursor when executed.

The [FOR { READ ONLY | UPDATE [ OF column [, ...] ] } ] cursors may only be defined as READ ONLY, and the FOR clause is, therefore, superfluous.

Let's begin a transaction block with the BEGIN keyword, and open a cursor named order_cur with SELECT * FROM orders as its executed select statement:

BEGIN;
DECLARE order_cur CURSOR
FOR SELECT * FROM orders;

Once the cursor is successfully declared, it means that the rows retrieved by the query are now accessible from the order_cur cursor.

Using cursors

In order to retrieve rows from the open cursor, we need to use the FETCH command. The MOVE command moves the current location of the cursor within the result set and the CLOSE command closes the cursor, freeing up any associated memory.

Here is the syntax for the FETCH SQL command:

FETCH [ FORWARD | BACKWARD]
[ # | ALL | NEXT | PRIOR ]
{ IN | FROM } 
cursor

cursor is the name of the cursor from where we can retrieve row data. A cursor always points to a current position in the executed statement's result set and rows can be retrieved either ahead of the current location or behind it. The FORWARD and BACKWARD keywords may be used to specify the direction, though the default is forward. The NEXT keyword (the default) returns the next single row from the current cursor position. The PRIOR keyword causes the single row preceding the current cursor position to be returned.

Let's consider an example that fetches the first four rows stored in the result set, pointed to by the order_cur cursor. As a direction is not specified, FORWARD is implied. It then uses a FETCH statement with the NEXT keyword to select the fifth row, and then another FETCH statement with the PRIOR keyword to again select the fourth retrieved row.

FETCH 4 FROM order_cur;

In this case, the first four rows will be fetched.

Closing a cursor

You can use the CLOSE command to explicitly close an open cursor. A cursor can also be implicitly closed if the transaction block that it resides within is committed with the COMMIT command, or rolled back with the ROLLBACK command.

Here is the syntax for the CLOSE command, where Cursorname is the name of the cursor intended to be closed:

CLOSE
Cursorname;
 

Using the GROUP BY clause


The GROUP BY clause enables you to establish data groups based on columns. The grouping criterion is defined by the GROUP BY clause, which is followed by the WHERE clause in the SQL execution path. Following this execution path, the result set rows are grouped based on like values of grouping columns and the WHERE clause restricts the entries in each group.

Note

All columns that are used besides the aggregate functions must be included in the GROUP BY clause. The GROUP BY clause does not support the use of column aliases; you must use the actual column names. The GROUP BY columns may or may not appear in the SELECT list. The GROUP BY clause can only be used with aggregate functions such as  SUM , AVG , COUNT , MAX , and MIN .

The following statement illustrates the syntax of the GROUP BY clause:

SELECT expression1, expression2, ... expression_n,
aggregate_function (expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;

The expression1, expression2, ... expression_n commands are expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY clause.

Let's take a look at these commands:

The GROUP BY clause must appear right after the FROM or WHERE clause. Followed by the GROUP BY clause is one column or a list of comma-separated columns. You can also put an expression in the GROUP BY clause.

As mentioned in the previous paragraph, the GROUP BY clause divides rows returned from the SELECT statement into groups. For each group, you can apply an aggregate function, for example, to calculate the sum of items or count the number of items in the groups.

Let's look at a GROUP BY query example that uses the SUM function (http://www.techonthenet.com/oracle/functions/sum.php). This example uses the SUM function to return the name of the product and the total sales (for the product).

SELECT product, SUM(sale) AS "Total sales"
FROM order_details
GROUP BY product;

In the select statement, we have sales where we applied the SUM function and the other field product is not part of SUM, we must use in the GROUP BY clause.

 

Using the HAVING clause


In the previous section, we discussed about GROUP BY clause, however if you want to restrict the groups of returned rows, you can use HAVING clause. The HAVING clause is used to specify which individual group(s) is to be displayed, or in simple language we use the HAVING clause in order to filter the groups on the basis of an aggregate function condition.

Note: The WHERE clause cannot be used to return the desired groups. The WHERE clause is only used to restrict individual rows. When the GROUP BY clause is not used, the HAVING clause works like the WHERE clause.

The syntax for the PostgreSQL HAVING clause is as follows:

SELECT expression1, expression2, ... expression_n, 
aggregate_function (expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n
HAVING group_condition;

Parameters or arguments

aggregate_function can be a function such as SUM, COUNT, MIN, MAX, or AVG.

expression1, expression2, ... expression_n are expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY clause.

conditions are the conditions used to restrict the groups of returned rows. Only those groups whose condition evaluates to true will be included in the result set.

Let's consider an example where you try to fetch the product that has sales>10000:

SELECT product, SUM(sale) AS "Total sales"
FROM order_details
GROUP BY product
Having sum(sales)>10000;

The PostgreSQL HAVING clause will filter the results so that only the total sales greater than 10000 will be returned.

 

Using the UPDATE operation clauses


The PostgreSQL UPDATE query is used to modify the existing records in a table. You can use the WHERE clause with the UPDATE query to update selected rows; otherwise, all the rows will be updated.

The basic syntax of the UPDATE query with the WHERE clause is as follows:

UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];

You can combine n number of conditions using the AND or OR operators.

The following is an example that will update SALARY for an employee whose ID is 6:

UPDATE employee SET SALARY = 15000 WHERE ID = 6;

This will update the salary to 15000 whose ID = 6.

 

Using the LIMIT clause


The LIMIT clause is used to retrieve a number of rows from a larger data set. It helps fetch the top n records. The LIMIT and OFFSET clauses allow you to retrieve just a portion of the rows that are generated by the rest of the query from a result set:

SELECT select_list
FROM table_expression
[LIMIT { number | ALL }] [OFFSET number]

If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause.

The OFFSET clause suggests skipping many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause. If both OFFSET and LIMIT appear, then the OFFSET rows will be skipped before starting to count the LIMIT rows that are returned.

 

Using subqueries


A subquery is a query within a query. In other words, a subquery is a SQL query nested inside a larger query. It may occur in a SELECT, FROM, or WHERE clause. In PostgreSQL, a subquery can be nested inside a SELECT, INSERT, UPDATE, DELETE, SET, or DO statement or inside another subquery. It is usually added within the WHERE clause of another SQL SELECT statement. You can use comparison operators, such as >, <, or =. Comparison operators can also be a multiple-row operator, such as IN, ANY, SOME, or ALL. It can be treated as an inner query that is an SQL query placed as a part of another query called as outer query. The inner query is executed before its parent query so that the results of the inner query can be passed to the outer query.

The following statement illustrates the subquery syntax:

SELECT column list
FROM table 
WHERE table.columnname expr_operator
(SELECT column FROM table)

The query inside the brackets is called the inner query. The query that contains the subquery is called the outer query.

PostgreSQL executes the query that contains a subquery in the following sequence:

  • First, it executes the subquery

  • Second, it gets the results and passes it to the outer query

  • Third, it executes the outer query

Let's consider an example where you want to find employee_id, first_name, last_name, and salary for employees whose salary is higher than the average salary throughout the company.

We can do this in two steps:

  1. First, find the average salary from the employee table.

  2. Then, use the answer in the second SELECT statement to find employees who have a higher salary from the result (which is the average salary).

        SELECT avg(salary) from employee;

        Result: 25000
        SELECT employee_id,first_name,last_name,salary
        FROM employee
        WHERE salary > 25000;

This does seem rather inelegant. What we really want to do is pass the result of the first query straight into the second query without needing to remember it, and type it back for a second query.

The solution is to use a subquery. We put the first query in brackets, and use it as part of

a WHERE clause to the second query, as follows:

SELECT employee_id,first_name,last_name,salary
FROM employee
WHERE salary > (Select avg(salary) from employee);

PostgreSQL runs the query in brackets first, that is, the average of salary. After getting the answer, it then runs the outer query, substituting the answer from the inner query, and tries to find the employees whose salary is higher than the average.

Note

Note: A subquery that returns exactly one column value from one row is called a scalar subquery. The SELECT query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or column as a scalar subquery. If the subquery returns no rows during a particular execution, it is not an error, and the scalar result is taken to be null. The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery.

 

Subqueries that return multiple rows


In the previous section, we saw subqueries that only returned a single result because an aggregate function was used in the subquery. Subqueries can also return zero or more rows.

Subqueries that return multiple rows can be used with the ALL, IN, ANY, or SOME operators. We can also negate the condition like NOT IN.

Correlated subqueries

A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike non-correlated subqueries that are executed exactly once prior to the execution of a containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.

The following statement illustrates the syntax of a correlated subquery:

SELECT column1,column2,..
FROM table 1 outer
WHERE column1 operator( SELECT column1 from table 2 WHERE  column2=outer.column4)

The PostgreSQL runs will pass the value of column4 from the outer table to the inner query and will be compared to column2 of table 2. Accordingly, column1 will be fetched from table 2 and depending on the operator it will be compared to column1 of the outer table. If the expression turned out to be true, the row will be passed; otherwise, it will not appear in the output.

But with the correlated queries you might see some performance issues. This is because of the fact that for every record of the outer query, the correlated subquery will be executed. The performance is completely dependent on the data involved. However, in order to make sure that the query works efficiently, we can use some temporary tables.

Let's try to find all the employees who earn more than the average salary in their department:

SELECT last_name, salary, department_id
FROM employee outer
WHERE salary >
(SELECT AVG(salary)
FROM employee
WHERE department_id = outer.department_id);

For each row from the employee table, the value of department_id will be passed into the inner query (let's consider that the value of department_id of the first row is 30) and the inner query will try to find the average salary of that particular department_id = 30. If the salary of that particular record will be more than the average salary of department_id = 30, the expression will turn out to be true and the record will come in the output.

Existence subqueries

The PostgreSQL EXISTS condition is used in combination with a subquery, and is considered to be met if the subquery returns at least one row. It can be used in a SELECT, INSERT, UPDATE, or DELETE statement. If a subquery returns any rows at all, the EXISTS subquery is true, and the NOT EXISTS subquery is false.

The syntax for the PostgreSQL EXISTS condition is as follows:

WHERE EXISTS ( subquery );

Parameters or arguments

The subquery is a SELECT statement that usually starts with SELECT * rather than a list of expressions or column names. To increase performance, you could replace SELECT * with SELECT 1 as the column result of the subquery is not relevant (only the rows returned matter).

Note

The SQL statements that use the EXISTS condition in PostgreSQL are very inefficient as the subquery is re-run for every row in the outer query's table. There are more efficient ways, such as using joins to write most queries, that do not use the EXISTS condition.

Let's look at the following example that is a SELECT statement and uses the PostgreSQL EXISTS condition:

SELECT *
FROM products
WHERE EXISTS (SELECT 1
FROM inventory
WHERE products.product_id = inventory.product_id);

This PostgreSQL EXISTS condition example will return all records from the products table where there is at least one record in the inventory table with the matching product_id. We used SELECT 1 in the subquery to increase performance as the column result set is not relevant to the EXISTS condition (only the existence of a returned row matters).

The PostgreSQL EXISTS condition can also be combined with the NOT operator, for example:

SELECT *
FROM products

WHERE NOT EXISTS (SELECT 1
FROM inventory
WHERE products.product_id = inventory.product_id);

This PostgreSQL NOT EXISTS example will return all records from the products table where there are no records in the inventory table for the given product_id.

 

Using the Union join


The PostgreSQL UNION clause is used to combine the results of two or more SELECT statements without returning any duplicate rows.

The basic rules to combine two or more queries using the UNION join are as follows:

  • The number and order of columns of all queries must be the same

  • The data types of the columns on involving table in each query must be same or compatible

  • Usually, the returned column names are taken from the first query

By default, the UNION join behaves like DISTINCT, that is, eliminates the duplicate rows; however, using the ALL keyword with the UNION join returns all rows, including the duplicates, as shown in the following example:

SELECT <column_list>
FROM  table 
WHERE  condition
GROUP BY  <column_list>  [HAVING ] condition
UNION
SELECT <column_list>
FROM  table 
WHERE  condition
GROUP BY  <column_list>  [HAVING ] condition
ORDER BY column list;

The queries are all executed independently, but their output is merged. The Union operator may place rows in the first query, before, after, or in between the rows in the result set of the second query. To sort the records in a combined result set, you can use ORDER BY.

Let's consider an example where you combine the data of customers belonging to two different sites. The table structure of both the tables is the same, but they have data of the customers from two different sites:

SELECT customer_id,customer_name,location_id
FROM  customer_site1 
UNION
SELECT customer_id,customer_name,location_id
FROM  customer_site2
ORDER BY customer_name  asc;

Both the SELECT queries would run individually, combine the result set, remove the duplicates (as we are using UNION), and sort the result set according to the condition, which is customer_name in this case.

 

Using the Self join


The tables we are joining don't have to be different ones. We can join a table with itself. This is called a self join. In this case, we will use aliases for the table; otherwise, PostgreSQL will not know which column of which table instance we mean. To join a table with itself means that each row of the table is combined with itself, and with every other row of the table. The self join can be viewed as a joining of two copies of the same table. The table is not actually copied but SQL carries out the command as though it were.

The syntax of the command to join a table with itself is almost the same as that of joining two different tables:

SELECT a.column_name, b.column_name...
FROM table1 a, table1 b 
WHERE condition1 and/or condition2

To distinguish the column names from one another, aliases for the actual table names are used as both the tables have the same name. Table name aliases are defined in the FROM clause of the SELECT statement.

Let's consider an example where you want to find a list of employees and their supervisor. For this example, we will consider the Employee table that has the columns Employee_id, Employee_name, and Supervisor_id. The Supervisor_id contains nothing but the Employee_id of the person who the employee reports to.

In the following example, we will use the table Employee twice; and in order to do this, we will use the alias of the table:

SELECT a.emp_id AS "Emp_ID", a.emp_name AS "Employee Name",
b.emp_id AS "Supervisor ID",b.emp_name AS "Supervisor Name"
FROM employee a, employee b
WHERE a.supervisor_id = b.emp_id;

For every record, it will compare the Supervisor_id to the Employee_id and the Employee_name to the supervisor name.

 

Using the Outer join


Another class of join is known as the OUTER JOIN. In OUTER JOIN, the results might contain both matched and unmatched rows. It is for this reason that beginners might find such joins a little confusing. However, the logic is really quite straightforward.

The following are the three types of Outer joins:

  • The PostgreSQL LEFT OUTER JOIN (or sometimes called LEFT JOIN)

  • The PostgreSQL RIGHT OUTER JOIN (or sometimes called RIGHT JOIN)

  • The PostgreSQL FULL OUTER JOIN (or sometimes called FULL JOIN)

Left outer join

Left outer join returns all rows from the left-hand table specified in the ON condition, and only those rows from the other tables where the joined fields are equal (the join condition is met). If the condition is not met, the values of the columns in the second table are replaced by null values.

The syntax for the PostgreSQL LEFT OUTER JOIN is:

SELECT columns
FROM table1
LEFT OUTER JOIN table2
ON condition1, condition2

In the case of LEFT OUTER JOIN, an inner join is performed first. Then, for each row in table1 that does not satisfy the join condition with any row in table2, a joined row is added with null values in the columns of table2. Thus, the joined table always has at least one row for each row in table1.

Let's consider an example where you want to fetch the order details placed by a customer. Now, there can be a scenario where a customer doesn't have any order placed that is open, and the order table contains only those orders that are open. In this case, we will use a left outer join to get information on all the customers and their corresponding orders:

SELECT customer.customer_id, customer.customer_name, orders.order_number
FROM customer
LEFT OUTER JOIN orders
ON customer.customer_id = orders.customer_id

This LEFT OUTER JOIN example will return all rows from the customer table and only those rows from the orders table where the join condition is met.

If a customer_id value in the customer table does not exist in the orders table, all fields in the orders table will display as <null> in the result set.

Right outer join

Another type of join is called a PostgreSQL RIGHT OUTER JOIN. This type of join returns all rows from the right-hand table specified in the ON condition, and only those rows from the other table where the joined fields are equal (join condition is met). If the condition is not met, the value of the columns in the first table is replaced by null values.

The syntax for the PostgreSQL RIGHT OUTER JOIN is as follows:

SELECT columns
FROM table1
RIGHT OUTER JOIN table2
ON table1.column = table2.column;
Condition1, condition2;

In the case of RIGHT OUTER JOIN, an inner join is performed first. Then, for each row in table2 that does not satisfy the join condition with any row in table1, a joined row is added with null values in the columns of table1. This is the converse of a left join; the result table will always have a row for each row in table2.

Let's consider an example where you want to fetch the invoice information for the orders. Now, when an order is completed, we generate an invoice for the customer so that he can pay the amount. There can be a scenario where the order has not been completed, so the invoice is not generated yet. In this case, we will use a right outer to get all the orders information and corresponding invoice information.

SELECT invoice.invoice_id, invoice.invoice_date,  orders.order_number
FROM invoice
RIGHT OUTER JOIN orders
ON invoice.order_number= orders.order_number

This RIGHT OUTER JOIN example will return all rows from the order table and only those rows from the invoice table where the joined fields are equal. If an order_number value in the invoice table does not exist, all the fields in the invoice table will display as <null> in the result set.

Full outer join

Another type of join is called a PostgreSQL FULL OUTER JOIN. This type of join returns all rows from the left-hand table and right-hand table with nulls in place where the join condition is not met.

The syntax for the PostgreSQL FULL OUTER JOIN is as follows:

SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.column = table2.column;
Condition1,condition2;

First, an inner join is performed. Then, for each row in table1 that does not satisfy the join condition with any row in table2, a joined row is added with null values in the columns of table2. Also, for each row of table2 that does not satisfy the join condition with any row in table1, a joined row with null values in the columns of table1 is added.

Let's consider an example where you want to fetch an invoice information and all the orders information. In this case, we will use a full outer to get all the orders information and the corresponding invoice information.

SELECT invoice.invoice_id, invoice.invoice_date, orders.order_number
FROM invoice
FULL  OUTER JOIN orders
ON invoice.order_number= orders.order_number;

This FULL OUTER JOIN example will return all rows from the invoice table and the orders table and, whenever the join condition is not met, <null> will be extended to those fields in the result set.

If an order_number value in the invoice table does not exist in the orders table, all the fields in the orders table will display as <null> in the result set. If order number in order's table does not exist in the invoice table, all fields in the invoice table will display as <null> in the result set.

 

Summary


After reading this chapter, you will be familiar with advanced concepts of PostgreSQL. We talked about views and materialized views, which are really significant. We also talked about cursors that help run a few rows at a time rather than full query at once. This helps avoid memory overrun when results contain a large number of rows. Another usage is to return a reference to a cursor that a function has created and allow the caller to read the rows. In addition to these, we discussed the aggregation concept by using the GROUP BY clause, which is really important for calculations. Another topic that we discussed in this chapter is subquery, which is a powerful feature of PostgreSQL. However, subqueries that contain an outer reference can be very inefficient. In many instances, these queries can be rewritten to remove the outer reference, which can improve performance. Other than that, the concept we covered is join, along with self, union, and outer join; these are really helpful when we need data from multiple tables. In the next chapter, we will discuss conversion between the data types and how to deal with arrays. Also we will talk about some complex data types, such as JSON and XML.

About the Authors

  • Manpreet Kaur

    Manpreet Kaur currently works as a business intelligence solution developer at an IT-based MNC in Chandigarh. She has over 7 years of work experience in the field of developing successful analytical solutions in data warehousing, analytics and reporting, and portal and dashboard development in the PostgreSQL and Oracle databases. She has worked on business intelligence tools such as Noetix, SSRS, Tableau, and OBIEE. She has a good understanding of ETL tools such as Informatica and Oracle Data Integrator (ODI). Currently, she works on analytical solutions using Hadoop and OBIEE 12c.

    Additionally, she is very creative and enjoys oil painting. She also has a youtube channel, Oh so homemade, where she posts easy ways to make recycled crafts.

    Browse publications by this author
  • Baji Shaik

    Baji Shaik is a database administrator and developer. He is currently working as a database consultant at OpenSCG. He has an engineering degree in telecommunications, and he started his career as a C# and Java developer. He started working with databases in 2011 and, over the years, he has worked with Oracle, PostgreSQL, and Greenplum. His background spans a wide depth and breadth of expertise and experience in SQL/NoSQL database technologies. He has architectured and designed many successful database solutions addressing challenging business requirements. He has provided solutions using PostgreSQL for reporting, business intelligence, data warehousing, applications, and development support. He has a good knowledge of automation, orchestration, and DevOps in a cloud environment.

    He comes from a small village named Vutukutu in Andhra Pradesh and currently lives in Hyderabad. He likes to watch movies, read books, and write technical blogs. He loves to spend time with family. He has tech-reviewed Troubleshooting PostgreSQL by Packt Publishing. He is a certified PostgreSQL professional.

    Browse publications by this author

Latest Reviews

(3 reviews total)
The book was a piece of garbage. Whole chunks of code were missing. It was obviously written by somebody with a poor grasp of English and had not been edited or proof-read.
good. I like small book, free to look anywhere.
good book, very well written