This book is all about an open source software product, a relational database called PostgreSQL. PostgreSQL is an advanced SQL database server, available on a wide range of platforms. The purpose of this book is to teach database developers the fundamental practices and techniques to program database applications with PostgreSQL.
In this chapter, we will discuss the following advanced SQL topics:
Creating views
Understanding materialized views
Creating cursors
Using the
GROUP BY
clauseUsing the
HAVING
clauseUnderstanding complex topics such as subqueries and joins
A view is a virtual table based on the result set of an SQL statement. Just like a real table, a view consist of rows and columns. The fields in a view are from one or more real tables in the database. Generally speaking, a table has a set of definitions that physically stores data. A view also has a set of definitions built on top of table(s) or other view(s) that does not physically store data. The purpose of creating views is to make sure that the user does not have access to all the data and is being restricted through a view. Also, it's better to create a view if we have a query based on multiple tables so that we can use it straightaway rather than writing a whole PSQL again and again.
Database views are created using the CREATE VIEW
statement. Views can be created from a single table or multiple tables, or another view.
The basic CREATE
VIEW
syntax is as follows:
CREATE VIEW view_name AS SELECT column1, column2 FROM table_name WHERE [condition];
Let's take a look at each of these commands:
CREATE VIEW
: This command helps create the database's view.SELECT
: This command helps you select the physical and virtual columns that you want as part of the view.FROM
: This command gives the table names with an alias from where we can fetch the columns. This may include one or more table names, considering you have to create a view at the top of multiple tables.WHERE
: This command provides a condition that will restrict the data for a view. Also, if you include multiple tables in theFROM
clause, you can provide the joining condition under theWHERE
clause.
You can then query this view as though it were a table. (In PostgreSQL, at the time of writing, views are read-only by default.) You can SELECT
data from a view just as you would from a table and join it to other tables; you can also use WHERE
clauses. Each time you execute a SELECT
query using the view, the data is rebuilt, so it is always up-to-date. It is not a frozen copy stored at the time the view was created.
Let's create a view on supplier and order tables. But, before that, let's see what the structure of the suppliers
and orders
table is:
CREATE TABLE suppliers (supplier_id number primary key, Supplier_name varchar(30), Phone_number number); CREATE TABLE orders (order_number number primary key, Supplier_id number references suppliers(supplier_id), Quanity number, Is_active varchar(10), Price number); CREATE VIEW active_supplier_orders AS SELECT suppliers.supplier_id, suppliers.supplier_name orders.quantity, orders.price FROM suppliers INNER JOIN orders ON suppliers.supplier_id = orders.supplier_id WHERE suppliers.supplier_name = 'XYZ COMPANY' And orders.active='TRUE';
The preceding example will create a virtual table based on the result set of the SELECT
statement. You can now query the PostgreSQL VIEW
as follows:
SELECT * FROM active_supplier_orders;
To delete a view, simply use the DROP VIEW
statement with view_name
. The basic DROP
VIEW
syntax is as follows:
DROP VIEW IF EXISTS view_name;
If you want to replace an existing view with one that has the same name and returns the same set of columns, you can use a CREATE OR REPLACE
command.
The following is the syntax to modify an existing view:
CREATE OR REPLACE VIEW view_name AS SELECT column_name(s) FROM table_name(s) WHERE condition;
Let's take a look at each of these commands:
CREATE OR REPLACE VIEW
: This command helps modify the existing view.SELECT
: This command selects the columns that you want as part of the view.FROM
: This command gives the table name from where we can fetch the columns. This may include one or more table names, since you have to create a view at the top of multiple tables.WHERE
: This command provides the condition to restrict the data for a view. Also, if you include multiple tables in theFROM
clause, you can provide the joining condition under theWHERE
clause.
Let's modify a view, supplier_orders
, by adding some more columns in the view. The view was originally based on supplier and order tables having supplier_id
, supplier_name
, quantity
, and price
. Let's also add order_number
in the view.
CREATE OR REPLACE VIEW active_supplier_orders AS SELECT suppliers.supplier_id, suppliers.supplier_name orders.quantity, orders.price,order. order_number FROM suppliers INNER JOIN orders ON suppliers.supplier_id = orders.supplier_id WHERE suppliers.supplier_name = 'XYZ COMPANY' And orders.active='TRUE';;
A materialized view is a table that actually contains rows but behaves like a view. This has been added in the PostgreSQL 9.3 version. A materialized view cannot subsequently be directly updated, and the query used to create the materialized view is stored in exactly the same way as the view's query is stored. As it holds the actual data, it occupies space as per the filters that we applied while creating the materialized view.
Before we get too deep into how to implement materialized views, let's first examine why we may want to use materialized views.
You may notice that certain queries are very slow. You may have exhausted all the techniques in the standard bag of techniques to speed up those queries. In the end, you will realize that getting queries to run as fast as you want simply isn't possible without completely restructuring the data.
Now, if you have an environment where you run the same type of SELECT
query multiple times against the same set of tables, then you can create a materialized view for SELECT
so that, on every run, this view does not go to the actual tables to fetch the data, which will obviously reduce the load on them as you might be running a Data Manipulation Language (DML) against your actual tables at the same time. So, basically, you take a view and turn it into a real table that holds real data rather than a gateway to a SELECT
query.
A materialized view can be read-only, updatable, or writeable. Users cannot perform DML statements on read-only materialized views, but they can perform them on updatable and writeable materialized views.
You can make a materialized view read-only during creation by omitting the FOR UPDATE
clause or by disabling the equivalent option in the database management tool. Read-only materialized views use many mechanisms similar to updatable materialized views, except they do not need to belong to a materialized view group.
In a replication environment, a materialized table holds the table data and resides in a different database. A table that has a materialized view on it is called a master table. The master table resides on a master site and the materialized view resides on a materialized-view site.
In addition, using read-only materialized views eliminates the possibility of introducing data conflicts on the master site or the master materialized view site, although this convenience means that updates cannot be made on the remote materialized view site.
The syntax to create a materialized view is as follows:
CREATE MATERIALIZED VIEW view_name AS SELECT columns FROM table;
The CREATE MATERIALIZED VIEW
command helps us create a materialized view. The command acts in way similar to the CREATE VIEW
command, which was explained in the previous section.
Let's make a read-only materialized view for a supplier table:
CREATE MATERIALIZED VIEW suppliers_matview AS SELECT * FROM suppliers;
This view is a read-only materialized view and will not reflect the changes to the master site.
You can make a materialized view updatable during creation by including the FOR UPDATE
clause or enabling the equivalent option in the database management tool. In order for changes that have been made to an updatable materialized view to be reflected in the master site during refresh, the updatable materialized view must belong to a materialized view group.
When we say "refreshing the materialized view," we mean synchronizing the data in the materialized view with data in its master table.
An updatable materialized view enables you to decrease the load on master sites because users can make changes to data on the materialized view site.
The syntax to create an updatable materialized view is as follows:
CREATE MATERIALIZED VIEW view_name FOR UPDATE AS SELECT columns FROM table;
Let's make an updatable materialized view for a supplier table:
CREATE MATERIALIZED VIEW suppliers_matview FOR UPDATE AS SELECT * FROM suppliers;
Whenever changes are made in the suppliers_matview
clause, it will reflect the changes to the master sites during refresh.
A writeable materialized view is one that is created using the FOR UPDATE
clause like an updatable materialized view is, but it is not a part of a materialized view group. Users can perform DML operations on a writeable materialized view; however, if you refresh the materialized view, then these changes are not pushed back to the master site and are lost in the materialized view itself. Writeable materialized views are typically allowed wherever fast-refreshable, read-only materialized views are allowed.
A cursor in PostgreSQL is a read-only pointer to a fully executed SELECT
statement's result set. Cursors are typically used within applications that maintain a persistent connection to the PostgreSQL backend. By executing a cursor and maintaining a reference to its returned result set, an application can more efficiently manage which rows to retrieve from a result set at different times without re-executing the query with different LIMIT
and OFFSET
clauses.
The four SQL commands involved with PostgreSQL cursors are DECLARE
, FETCH
, MOVE
, and CLOSE
.
The DECLARE
command both defines and opens a cursor, in effect defining the cursor in memory, and then populates the cursor with information about the result set returned from the executed query. A cursor may be declared only within an existing transaction block, so you must execute a BEGIN
command prior to declaring a cursor.
Here is the syntax for DECLARE
:
DECLARE cursorname [ BINARY ] [ INSENSITIVE ] [ SCROLL ] CURSOR FOR query [ FOR { READ ONLY | UPDATE [ OF column [, ...] ] } ]
DECLARE cursorname
is the name of the cursor to create. The optional BINARY
keyword causes the output to be retrieved in binary format instead of standard ASCII; this can be more efficient, though it is only relevant to custom applications as clients such as psql are not built to handle anything but text output. The INSENSITIVE
and SCROLL
keywords exist to comply with the SQL standard, though they each define PostgreSQL's default behavior and are never necessary. The INSENSITIVE
SQL keyword exists to ensure that all data retrieved from the cursor remains unchanged from other cursors or connections. As PostgreSQL requires the cursors to be defined within transaction blocks, this behavior is already implied. The SCROLL
SQL keyword exists to specify that multiple rows at a time can be selected from the cursor. This is the default in PostgreSQL, even if it is unspecified.
The CURSOR FOR
query is the complete query and its result set will be accessible by the cursor when executed.
The [FOR { READ ONLY | UPDATE [ OF column [, ...] ] } ]
cursors may only be defined as READ ONLY
, and the FOR
clause is, therefore, superfluous.
Let's begin a transaction block with the BEGIN
keyword, and open a cursor named order_cur
with SELECT * FROM orders
as its executed select statement:
BEGIN; DECLARE order_cur CURSOR FOR SELECT * FROM orders;
Once the cursor is successfully declared, it means that the rows retrieved by the query are now accessible from the order_cur
cursor.
In order to retrieve rows from the open cursor, we need to use the FETCH
command. The MOVE
command moves the current location of the cursor within the result set and the CLOSE
command closes the cursor, freeing up any associated memory.
Here is the syntax for the FETCH
SQL command:
FETCH [ FORWARD | BACKWARD] [ # | ALL | NEXT | PRIOR ] { IN | FROM } cursor
cursor
is the name of the cursor from where we can retrieve row data. A cursor
always points to a current position in the executed statement's result set and rows can be retrieved either ahead of the current location or behind it. The FORWARD
and BACKWARD
keywords may be used to specify the direction, though the default is forward. The NEXT
keyword (the default) returns the next single row from the current cursor position. The PRIOR
keyword causes the single row preceding the current cursor position to be returned.
Let's consider an example that fetches the first four rows stored in the result set, pointed to by the order_cur
cursor. As a direction is not specified, FORWARD
is implied. It then uses a FETCH
statement with the NEXT
keyword to select the fifth row, and then another FETCH
statement with the PRIOR
keyword to again select the fourth retrieved row.
FETCH 4 FROM order_cur;
In this case, the first four rows will be fetched.
You can use the CLOSE
command to explicitly close an open cursor. A cursor can also be implicitly closed if the transaction block that it resides within is committed with the COMMIT
command, or rolled back with the ROLLBACK
command.
Here is the syntax for the CLOSE
command, where Cursorname
is the name of the cursor intended to be closed:
CLOSE Cursorname;
The GROUP BY
clause enables you to establish data groups based on columns. The grouping criterion is defined by the GROUP BY
clause, which is followed by the
WHERE
clause in the SQL execution path. Following this execution path, the result set rows are grouped based on like values of grouping columns and the
WHERE
clause restricts the entries in each group.
Note
All columns that are used besides the aggregate functions must be included in the GROUP BY
clause. The GROUP BY
clause does not support the use of column aliases; you must use the actual column names. The GROUP BY
columns may or may not appear in the
SELECT
list. The GROUP BY
clause can only be used with aggregate functions such as
SUM
,
AVG
,
COUNT
,
MAX
, and
MIN
.
The following statement illustrates the syntax of the GROUP BY
clause:
SELECT expression1, expression2, ... expression_n, aggregate_function (expression) FROM tables WHERE conditions GROUP BY expression1, expression2, ... expression_n;
The expression1, expression2, ... expression_n
commands are expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY
clause.
Let's take a look at these commands:
aggregate_function
: This performs many functions, such asSUM
(http://www.techonthenet.com/oracle/functions/sum.php),COUNT
(http://www.techonthenet.com/oracle/functions/count.php),MIN
(http://www.techonthenet.com/oracle/functions/min.php),MAX
(http://www.techonthenet.com/oracle/functions/max.php), orAVG
(http://www.techonthenet.com/oracle/functions/avg.php).tables
: This is where you can retrieve records from. There must be at least one table listed in theFROM
clause.conditions
: This is a condition that must be met for the records to be selected.
The GROUP BY
clause must appear right after the FROM
or WHERE
clause. Followed by the GROUP BY
clause is one column or a list of comma-separated columns. You can also put an expression in the GROUP BY
clause.
As mentioned in the previous paragraph, the GROUP BY
clause divides rows returned from the
SELECT
statement into groups. For each group, you can apply an aggregate function, for example, to calculate the sum of items or count the number of items in the groups.
Let's look at a GROUP BY
query example that uses the SUM
function (http://www.techonthenet.com/oracle/functions/sum.php). This example uses the SUM
function to return the name of the product and the total sales (for the product).
SELECT product, SUM(sale) AS "Total sales" FROM order_details GROUP BY product;
In the select statement, we have sales where we applied the SUM
function and the other field product is not part of SUM
, we must use in the GROUP BY
clause.
In the previous section, we discussed about GROUP BY
clause, however if you want to restrict the groups of returned rows, you can use HAVING
clause. The HAVING
clause is used to specify which individual group(s) is to be displayed, or in simple language we use the HAVING
clause in order to filter the groups on the basis of an aggregate function condition.
Note: The WHERE
clause cannot be used to return the desired groups. The WHERE
clause is only used to restrict individual rows. When the GROUP BY
clause is not used, the HAVING
clause works like the WHERE
clause.
The syntax for the PostgreSQL HAVING
clause is as follows:
SELECT expression1, expression2, ... expression_n, aggregate_function (expression) FROM tables WHERE conditions GROUP BY expression1, expression2, ... expression_n HAVING group_condition;
aggregate_function
can be a function such as SUM
, COUNT
, MIN
, MAX
, or AVG
.
expression1, expression2, ... expression_n
are expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY
clause.
conditions
are the conditions used to restrict the groups of returned rows. Only those groups whose condition evaluates to true will be included in the result set.
Let's consider an example where you try to fetch the product that has sales>10000
:
SELECT product, SUM(sale) AS "Total sales" FROM order_details GROUP BY product Having sum(sales)>10000;
The PostgreSQL HAVING
clause will filter the results so that only the total sales greater than 10000
will be returned.
The PostgreSQL UPDATE
query is used to modify the existing records in a table. You can use the WHERE
clause with the UPDATE
query to update selected rows; otherwise, all the rows will be updated.
The basic syntax of the UPDATE
query with the WHERE
clause is as follows:
UPDATE table_name SET column1 = value1, column2 = value2...., columnN = valueN WHERE [condition];
You can combine n number of conditions using the AND
or OR
operators.
The following is an example that will update SALARY
for an employee whose ID
is 6
:
UPDATE employee SET SALARY = 15000 WHERE ID = 6;
This will update the salary to 15000
whose ID = 6
.
The LIMIT
clause is used to retrieve a number of rows from a larger data set. It helps fetch the top n records. The LIMIT
and OFFSET
clauses allow you to retrieve just a portion of the rows that are generated by the rest of the query from a result set:
SELECT select_list
FROM table_expression
[LIMIT { number | ALL }] [OFFSET number]
If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows). LIMIT ALL
is the same as omitting the LIMIT
clause.
The OFFSET
clause suggests skipping many rows before beginning to return rows. OFFSET 0
is the same as omitting the OFFSET
clause. If both OFFSET
and LIMIT
appear, then the OFFSET
rows will be skipped before starting to count the LIMIT
rows that are returned.
A subquery is a query within a query. In other words, a subquery is a SQL query nested inside a larger query. It may occur in a SELECT
, FROM
, or WHERE
clause. In PostgreSQL, a subquery can be nested inside a SELECT
, INSERT
, UPDATE
, DELETE
, SET
, or DO
statement or inside another subquery. It is usually added within the WHERE
clause of another SQL SELECT
statement. You can use comparison operators, such as >
, <
, or =
. Comparison operators can also be a multiple-row operator, such as IN
, ANY
, SOME
, or ALL
. It can be treated as an inner query that is an SQL query placed as a part of another query called as outer query. The inner query is executed before its parent query so that the results of the inner query can be passed to the outer query.
The following statement illustrates the subquery syntax:
SELECT column list FROM table WHERE table.columnname expr_operator (SELECT column FROM table)
The query inside the brackets is called the inner query. The query that contains the subquery is called the outer query.
PostgreSQL executes the query that contains a subquery in the following sequence:
First, it executes the subquery
Second, it gets the results and passes it to the outer query
Third, it executes the outer query
Let's consider an example where you want to find employee_id
, first_name
, last_name
, and salary
for employees whose salary is higher than the average salary throughout the company.
We can do this in two steps:
First, find the average salary from the
employee
table.Then, use the answer in the second
SELECT
statement to find employees who have a highersalary
from the result (which is the average salary).
SELECT avg(salary) from employee; Result: 25000 SELECT employee_id,first_name,last_name,salary FROM employee WHERE salary > 25000;
This does seem rather inelegant. What we really want to do is pass the result of the first query straight into the second query without needing to remember it, and type it back for a second query.
The solution is to use a subquery. We put the first query in brackets, and use it as part of
a WHERE
clause to the second query, as follows:
SELECT employee_id,first_name,last_name,salary FROM employee WHERE salary > (Select avg(salary) from employee);
PostgreSQL runs the query in brackets first, that is, the average of salary. After getting the answer, it then runs the outer query, substituting the answer from the inner query, and tries to find the employees whose salary
is higher than the average.
Note
Note: A subquery that returns exactly one column value from one row is called a scalar subquery. The SELECT
query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or column as a scalar subquery. If the subquery returns no rows during a particular execution, it is not an error, and the scalar result is taken to be null. The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery.
In the previous section, we saw subqueries that only returned a single result because an aggregate function was used in the subquery. Subqueries can also return zero or more rows.
Subqueries that return multiple rows can be used with the ALL
, IN
, ANY
, or SOME
operators. We can also negate the condition like NOT IN
.
A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike non-correlated subqueries that are executed exactly once prior to the execution of a containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.
The following statement illustrates the syntax of a correlated subquery:
SELECT column1,column2,.. FROM table 1 outer WHERE column1 operator( SELECT column1 from table 2 WHERE column2=outer.column4)
The PostgreSQL runs will pass the value of column4
from the outer table to the inner query and will be compared to column2
of table 2
. Accordingly, column1
will be fetched from table 2
and depending on the operator it will be compared to column1
of the outer table. If the expression turned out to be true, the row will be passed; otherwise, it will not appear in the output.
But with the correlated queries you might see some performance issues. This is because of the fact that for every record of the outer query, the correlated subquery will be executed. The performance is completely dependent on the data involved. However, in order to make sure that the query works efficiently, we can use some temporary tables.
Let's try to find all the employees who earn more than the average salary in their department:
SELECT last_name, salary, department_id FROM employee outer WHERE salary > (SELECT AVG(salary) FROM employee WHERE department_id = outer.department_id);
For each row from the employee
table, the value of department_id
will be passed into the inner query (let's consider that the value of department_id
of the first row is 30
) and the inner query will try to find the average salary of that particular department_id = 30
. If the salary of that particular record will be more than the average salary of department_id = 30
, the expression will turn out to be true and the record will come in the output.
The PostgreSQL EXISTS
condition is used in combination with a subquery, and is considered to be met if the subquery returns at least one row. It can be used in a SELECT
, INSERT
, UPDATE
, or DELETE
statement. If a subquery returns any rows at all, the EXISTS
subquery is true, and the NOT EXISTS
subquery is false.
The syntax for the PostgreSQL EXISTS
condition is as follows:
WHERE EXISTS ( subquery );
The subquery
is a SELECT
statement that usually starts with SELECT *
rather than a list of expressions or column names. To increase performance, you could replace SELECT *
with SELECT 1
as the column result of the subquery is not relevant (only the rows returned matter).
Note
The SQL statements that use the EXISTS
condition in PostgreSQL are very inefficient as the subquery is re-run for every row in the outer query's table. There are more efficient ways, such as using joins to write most queries, that do not use the EXISTS
condition.
Let's look at the following example that is a SELECT
statement and uses the PostgreSQL EXISTS
condition:
SELECT * FROM products WHERE EXISTS (SELECT 1 FROM inventory WHERE products.product_id = inventory.product_id);
This PostgreSQL EXISTS
condition example will return all records from the products
table where there is at least one record in the inventory
table with the matching product_id
. We used SELECT 1
in the subquery to increase performance as the column result set is not relevant to the EXISTS
condition (only the existence of a returned row matters).
The PostgreSQL EXISTS
condition can also be combined with the NOT
operator, for example:
SELECT * FROM products WHERE NOT EXISTS (SELECT 1 FROM inventory WHERE products.product_id = inventory.product_id);
This PostgreSQL NOT EXISTS
example will return all records from the products
table where there are no records in the inventory
table for the given product_id
.
The PostgreSQL UNION
clause is used to combine the results of two or more SELECT
statements without returning any duplicate rows.
The basic rules to combine two or more queries using the UNION
join are as follows:
The number and order of columns of all queries must be the same
The data types of the columns on involving table in each query must be same or compatible
Usually, the returned column names are taken from the first query
By default, the UNION
join behaves like DISTINCT
, that is, eliminates the duplicate rows; however, using the ALL
keyword with the UNION
join returns all rows, including the duplicates, as shown in the following example:
SELECT <column_list> FROM table WHERE condition GROUP BY <column_list> [HAVING ] condition UNION SELECT <column_list> FROM table WHERE condition GROUP BY <column_list> [HAVING ] condition ORDER BY column list;
The queries are all executed independently, but their output is merged. The Union operator may place rows in the first query, before, after, or in between the rows in the result set of the second query. To sort the records in a combined result set, you can use ORDER BY
.
Let's consider an example where you combine the data of customers belonging to two different sites. The table structure of both the tables is the same, but they have data of the customers from two different sites:
SELECT customer_id,customer_name,location_id FROM customer_site1 UNION SELECT customer_id,customer_name,location_id FROM customer_site2 ORDER BY customer_name asc;
Both the SELECT
queries would run individually, combine the result set, remove the duplicates (as we are using UNION
), and sort the result set according to the condition, which is customer_name
in this case.
The tables we are joining don't have to be different ones. We can join a table with itself. This is called a self join. In this case, we will use aliases for the table; otherwise, PostgreSQL will not know which column of which table instance we mean. To join a table with itself means that each row of the table is combined with itself, and with every other row of the table. The self join can be viewed as a joining of two copies of the same table. The table is not actually copied but SQL carries out the command as though it were.
The syntax of the command to join a table with itself is almost the same as that of joining two different tables:
SELECT a.column_name, b.column_name... FROM table1 a, table1 b WHERE condition1 and/or condition2
To distinguish the column names from one another, aliases for the actual table names are used as both the tables have the same name. Table name aliases are defined in the FROM
clause of the SELECT
statement.
Let's consider an example where you want to find a list of employees and their supervisor. For this example, we will consider the Employee table that has the columns Employee_id, Employee_name, and Supervisor_id. The Supervisor_id contains nothing but the Employee_id of the person who the employee reports to.

In the following example, we will use the table Employee twice; and in order to do this, we will use the alias of the table:
SELECT a.emp_id AS "Emp_ID", a.emp_name AS "Employee Name", b.emp_id AS "Supervisor ID",b.emp_name AS "Supervisor Name" FROM employee a, employee b WHERE a.supervisor_id = b.emp_id;
For every record, it will compare the Supervisor_id to the Employee_id and the Employee_name to the supervisor name.
Another class of join is known as the OUTER JOIN
. In OUTER JOIN
, the results might contain both matched and unmatched rows. It is for this reason that beginners might find such joins a little confusing. However, the logic is really quite straightforward.
The following are the three types of Outer joins:
The PostgreSQL
LEFT OUTER JOIN
(or sometimes calledLEFT JOIN
)The PostgreSQL
RIGHT OUTER JOIN
(or sometimes calledRIGHT JOIN
)The PostgreSQL
FULL OUTER JOIN
(or sometimes calledFULL JOIN
)
Left outer join returns all rows from the left-hand table specified in the ON
condition, and only those rows from the other tables where the joined fields are equal (the join condition is met). If the condition is not met, the values of the columns in the second table are replaced by null values.

The syntax for the PostgreSQL LEFT OUTER JOIN
is:
SELECT columns FROM table1 LEFT OUTER JOIN table2 ON condition1, condition2
In the case of LEFT OUTER JOIN
, an inner join is performed first. Then, for each row in table1
that does not satisfy the join condition with any row in table2
, a joined row is added with null values in the columns of table2
. Thus, the joined table always has at least one row for each row in table1
.
Let's consider an example where you want to fetch the order
details placed by a customer. Now, there can be a scenario where a customer doesn't have any order placed that is open, and the order table contains only those orders that are open. In this case, we will use a left outer join to get information on all the customers and their corresponding orders:
SELECT customer.customer_id, customer.customer_name, orders.order_number FROM customer LEFT OUTER JOIN orders ON customer.customer_id = orders.customer_id
This LEFT OUTER JOIN
example will return all rows from the customer
table and only those rows from the orders
table where the join condition is met.
If a customer_id
value in the customer
table does not exist in the orders
table, all fields in the orders
table will display as <null>
in the result set.
Another type of join is called a PostgreSQL RIGHT OUTER JOIN
. This type of join returns all rows from the right-hand table specified in the ON
condition, and only those rows from the other table where the joined fields are equal (join condition is met). If the condition is not met, the value of the columns in the first table is replaced by null values.

The syntax for the PostgreSQL RIGHT OUTER JOIN
is as follows:
SELECT columns FROM table1 RIGHT OUTER JOIN table2 ON table1.column = table2.column; Condition1, condition2;
In the case of RIGHT OUTER JOIN
, an inner join is performed first. Then, for each row in table2
that does not satisfy the join condition with any row in table1
, a joined row is added with null values in the columns of table1
. This is the converse of a left join; the result table will always have a row for each row in table2
.
Let's consider an example where you want to fetch the invoice
information for the orders. Now, when an order is completed, we generate an invoice for the customer so that he can pay the amount. There can be a scenario where the order has not been completed, so the invoice is not generated yet. In this case, we will use a right outer to get all the orders
information and corresponding invoice
information.
SELECT invoice.invoice_id, invoice.invoice_date, orders.order_number FROM invoice RIGHT OUTER JOIN orders ON invoice.order_number= orders.order_number
This RIGHT OUTER JOIN
example will return all rows from the order
table and only those rows from the invoice
table where the joined fields are equal. If an order_number
value in the invoice
table does not exist, all the fields in the invoice
table will display as <null>
in the result set.
Another type of join is called a PostgreSQL FULL OUTER JOIN
. This type of join returns all rows from the left-hand table and right-hand table with nulls in place where the join condition is not met.

The syntax for the PostgreSQL FULL OUTER JOIN
is as follows:
SELECT columns FROM table1 FULL OUTER JOIN table2 ON table1.column = table2.column; Condition1,condition2;
First, an inner join is performed. Then, for each row in table1
that does not satisfy the join condition with any row in table2
, a joined row is added with null values in the columns of table2
. Also, for each row of table2
that does not satisfy the join condition with any row in table1
, a joined row with null values in the columns of table1
is added.
Let's consider an example where you want to fetch an invoice
information and all the orders
information. In this case, we will use a full outer to get all the orders
information and the corresponding invoice
information.
SELECT invoice.invoice_id, invoice.invoice_date, orders.order_number FROM invoice FULL OUTER JOIN orders ON invoice.order_number= orders.order_number;
This FULL OUTER JOIN
example will return all rows from the invoice
table and the orders
table and, whenever the join condition is not met, <null>
will be extended to those fields in the result set.
If an order_number
value in the invoice
table does not exist in the orders
table, all the fields in the orders
table will display as <null>
in the result set. If order number in order's table does not exist in the invoice
table, all fields in the invoice
table will display as <null>
in the result set.
After reading this chapter, you will be familiar with advanced concepts of PostgreSQL. We talked about views and materialized views, which are really significant. We also talked about cursors that help run a few rows at a time rather than full query at once. This helps avoid memory overrun when results contain a large number of rows. Another usage is to return a reference to a cursor that a function has created and allow the caller to read the rows. In addition to these, we discussed the aggregation concept by using the GROUP BY
clause, which is really important for calculations. Another topic that we discussed in this chapter is subquery, which is a powerful feature of PostgreSQL. However, subqueries that contain an outer reference can be very inefficient. In many instances, these queries can be rewritten to remove the outer reference, which can improve performance. Other than that, the concept we covered is join, along with self, union, and outer join; these are really helpful when we need data from multiple tables. In the next chapter, we will discuss conversion between the data types and how to deal with arrays. Also we will talk about some complex data types, such as JSON and XML.