PostgreSQL in Action

Achim Vannahme

September 2015

In this article by Salahadin Juba, Achim Vannahme, and Andrey Volkov, authors of the book Learning PostgreSQL, we will discuss PostgreSQL (pronounced Post-Gres-Q-L) or Postgres is an open source, object-relational database management system. It emphasizes extensibility, creativity, and compatibility. It competes with major relational database vendors, such as Oracle, MySQL, SQL servers, and others. It is used by different sectors, including government agencies and the public and private sectors. It is cross-platform and runs on most modern operating systems, including Windows, Mac, and Linux flavors. It conforms to SQL standards and it is ACID complaint.

(For more resources related to this topic, see here.)

An overview of PostgreSQL

PostgreSQL has many rich features. It provides enterprise-level services, including performance and scalability. It has a very supportive community and very good documentation.

The history of PostgreSQL

The name PostgreSQL comes from post-Ingres database. the history of PostgreSQL can be summarized as follows:

  • Academia: University of California at Berkeley (UC Berkeley)
    • 1977-1985, Ingres project: Michael Stonebraker created RDBMS according to the formal relational model
    • 1986-1994, postgres: Michael Stonebraker created postgres in order to support complex data types and the object-relational model.
    • 1995, Postgres95: Andrew Yu and Jolly Chen changed postgres to postgres query language (P) with an extended subset of SQL.
  • Industry
    • 1996, PostgreSQL: Several developers dedicated a lot of labor and time to stabilize Postgres95. The first open source version was released on January 29, 1997. With the introduction of new features, and enhancements, and at the start of open source projects, the Postgres95 name was changed to PostgreSQL.

PostgreSQL began at version 6, with a very strong starting point by taking advantage of several years of research and development. Being an open source with a very good reputation, PostgreSQL has attracted hundreds of developers. Currently, PostgreSQL has innumerable extensions and a very active community.

Advantages of PostgreSQL

PostgreSQL provides many features that attract developers, administrators, architects, and companies.

Business advantages of PostgreSQL

PostgreSQL is free, open source software (OSS); it has been released under the PostgreSQL license, which is similar to the BSD and MIT licenses. The PostgreSQL license is highly permissive, and PostgreSQL is not a subject to monopoly and acquisition. This gives the company the following advantages.

  • There is no associated licensing cost to PostgreSQL.
  • The number of deployments of PostgreSQL is unlimited.
  • A more profitable business model.
  • PostgreSQL is SQL standards compliant. Thus finding professional developers is not very difficult. PostgreSQL is easy to learn and porting code from one database vendor to PostgreSQL is cost efficient. Also, PostgreSQL administrative tasks are easy to automate. Thus, the staffing cost is significantly reduced.
  • PostgreSQL is cross-platform, and it has drivers for all modern programming languages; so, there is no need to change the company policy about the software stack in order to use PostgreSQL.
  • PostgreSQL is scalable and it has a high performance.
  • PostgreSQL is very reliable; it rarely crashes. Also, PostgreSQL is ACID compliant, which means that it can tolerate some hardware failure. In addition to that, it can be configured and installed as a cluster to ensure high availability (HA).

User advantages of PostgreSQL

PostgreSQL is very attractive for developers, administrators, and architects; it has rich features that enable developers to perform tasks in an agile way. The following are some attractive features for the developer:

  • There is a new release almost each year; until now, starting from Postgres95, there have been 23 major releases.
  • Very good documentation and an active community enable developers to find and solve problems quickly. The PostgreSQL manual is over than 2,500 pages in length.
  • A rich extension repository enables developers to focus on the business logic. Also, it enables developers to meet requirement changes easily.
  • The source code is available free of charge, it can be customized and extended without a huge effort.
  • Rich clients and administrative tools enable developers to perform routine tasks, such as describing database objects, exporting and importing data, and dumping and restoring databases, very quickly.
  • Database administration tasks do not requires a lot of time and can be automated.
  • PostgreSQL can be integrated easily with other database management systems, giving software architecture good flexibility in putting software designs.

Applications of PostgreSQL

PostgreSQL can be used for a variety of applications. The main PostgreSQL application domains can be classified into two categories:

  • Online transactional processing (OLTP): OLTP is characterized by a large number of CRUD operations, very fast processing of operations, and maintaining data integrity in a multiaccess environment. The performance is measured in the number of transactions per second.
  • Online analytical processing (OLAP): OLAP is characterized by a small number of requests, complex queries that involve data aggregation, and a huge amount of data from different sources, with different formats and data mining and historical data analysis.

OLTP is used to model business operations, such as customer relationship management (CRM). OLAP applications are used for business intelligence, decision support, reporting, and planning. An OLTP database size is relatively small compared to an OLAP database. OLTP normally follows the relational model concepts, such as normalization when designing the database, while OLAP is less relational and the schema is often star shaped.

Unlike OLTP, the main operation of OLAP is data retrieval. OLAP data is often generated by a process called Extract, Transform and Load (ETL). ETL is used to load data into the OLAP database from different data sources and different formats.

PostgreSQL can be used out of the box for OLTP applications. For OLAP, there are many extensions and tools to support it, such as the PostgreSQL COPY command and Foreign Data Wrappers (FDW).

Success stories

PostgreSQL is used in many application domains, including communication, media, geographical, and e-commerce applications. Many companies provide consultation as well as commercial services, such as migrating proprietary RDBMS to PostgreSQL in order to cut off licensing costs. These companies often influence and enhance PostgreSQL by developing and submitting new features.

The following are a few companies that use PostgreSQL:

  • Skype uses PostgreSQL to store user chats and activities. Skype has also affected PostgreSQL by developing many tools called Skytools.
  • Instagram is a social networking service that enables its user to share pictures and photos. Instagram has more than 100 million active users.
  • The American Chemical Society (ACS): More than one terabyte of data for their journal archive is stored using PostgreSQL.

In addition to the preceding list of companies, PostgreSQL is used by HP, VMware, and Heroku. PostgreSQL is used by many scientific communities and organizations, such as NASA, due to its extensibility and rich data types.

Forks

There are more than 20 PostgreSQL forks; PostgreSQL extensible APIs makes postgres a great candidate to fork. Over years, many groups have forked PostgreSQL and contributed their findings to PostgreSQL. The following is a list of popular PostgreSQL forks:

  • HadoopDB is a hybrid between the PostgreSQL, RDBMS, and MapReduce technologies to target analytical workload.
  • Greenplum is a proprietary DBMS that was built on the foundation of PostgreSQL. It utilizes the shared-nothing and massively parallel processing (MPP) architectures. It is used as a data warehouse and for analytical workloads.
  • The EnterpriseDB advanced server is a proprietary DBMS that provides Oracle capabilities to cap Oracle fees.
  • Postgres-XC (eXtensible Cluster) is a multi-master PostgreSQL cluster based on the shared-nothing architecture. It emphasis write-scalability and provides the same APIs to applications that PostgreSQL provides.
  • Vertica is a column-oriented database system, which was started by Michael Stonebraker in 2005 and acquisitioned by HP in 2011. Vertica reused the SQL parser, semantic analyzer, and standard SQL rewrites from the PostgreSQL implementation.
  • Netzza is a popular data warehouse appliance solution that was started as a PostgreSQL fork.
  • Amazon Redshift is a popular data warehouse management system based on PostgreSQL 8.0.2. It is mainly designed for OLAP applications.

The PostgreSQL architecture

PostgreSQL uses the client/server model; the client and server programs could be on different hosts. The communication between the client and server is normally done via TCP/IP protocols or Linux sockets. PostgreSQL can handle multiple connections from a client. A common PostgreSQL program consists of the following operating system processes:

  • Client process or program (frontend): The database frontend application performs a database action. The frontend could be a web server that wants to display a web page or a command-line tool to perform maintenance tasks. PostgreSQL provides frontend tools, such as psql, createdb, dropdb, and createuser.
  • Server process (backend): The server process manages database files, accepts connections from client applications, and performs actions on behalf of the client; the server process name is postgres.

PostgreSQL forks a new process for each new connection; thus, the client and server processes communicate with each other without the intervention of the server main process (postgres), and they have a certain lifetime determined by accepting and terminating a client connection.

The abstract architecture of PostgreSQL

The aforementioned abstract, conceptual PostgreSQL architecture can give an overview of PostgreSQL's capabilities and interactions with the client as well as the operating system. The PostgreSQL server can be divided roughly into four subsystems as follows:

  • Process manager: The process manager manages client connections, such as the forking and terminating processes.
  • Query processor: When a client sends a query to PostgreSQL, the query is parsed by the parser, and then the traffic cop determines the query type. A Utility query is passed to the utilities subsystem. The Select, insert, update, and delete queries are rewritten by the rewriter, and then an execution plan is generated by the planner; finally, the query is executed, and the result is returned to the client.
  • Utilities: The utilities subsystem provides the means to maintain the database, such as claiming storage, updating statistics, exporting and importing data with a certain format, and logging.
  • Storage manager: The storage manager handles the memory cache, disk buffers, and storage allocation.

Almost all PostgreSQL components can be configured, including the logger, planner, statistical analyzer, and storage manager. PostgreSQL configuration is governed by the application nature, such as OLAP and OLTP. The following diagram shows the PostgreSQL abstract, conceptual architecture:

 Learning PostgreSQL

PostgreSQL's abstract, conceptual architecture

The PostgreSQL community

PostgreSQL has a very cooperative, active, and organized community. In the last 8 years, the PostgreSQL community published eight major releases. Announcements are brought to developers via the PostgreSQL weekly newsletter. There are dozens of mailing lists organized into categories, such as users, developers, and associations. Examples of user mailing lists are pgsql-general, psql-doc, and psql-bugs. pgsql-general is a very important mailing list for beginners. All non-bug-related questions about PostgreSQL installation, tuning, basic administration, PostgreSQL features, and general discussions are submitted to this list.

The PostgreSQL community runs a blog aggregation service called Planet PostgreSQL—https://planet.postgresql.org/. Several PostgreSQL developers and companies use this service to share their experience and knowledge.

Summary

PostgreSQL is an open source, object-oriented relational database system. It supports many advanced features and complies with the ANSI-SQL:2008 standard. It has won industry recognition and user appreciation. The PostgreSQL slogan "The world's most advanced open source database" reflects the sophistication of the PostgreSQL features. PostgreSQL is a result of many years of research and collaboration between academia and industry. Companies in their infancy often favor PostgreSQL due to licensing costs. PostgreSQL can aid profitable business models. PostgreSQL is also favoured by many developers because of its capabilities and advantages.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Learning PostgreSQL

Explore Title
comments powered by Disqus