Oracle GoldenGate technology overview
Let's take a look at GoldenGate's fundamental building blocks: the capture process, trail files, data pump, server collector, and apply process. The order in which they are listed reflects the sequence of events for GoldenGate data replication across distributed systems. A Manager process runs on both the source and the target systems and oversees the processing and transmission of data.
All the individual processes are modular and can be easily decoupled or combined to provide the best solution to meet the business requirements. It is normal practice to configure multiple capture and apply processes to balance the load and enhance the performance. You can read more about this in Chapter 9, Performance Tuning.
The filtering and transformation of data can be done either at the source by the capture process or at the target by the apply process. This is achieved through parameter files, which are explained in detail in Chapter 4, Configuring Oracle GoldenGate.
Extract – the capture process
Oracle GoldenGate's capture process, known as Extract, obtains the necessary data from the database's transaction logs. For Oracle, these are the online redo logs, which contain all the data changes made in the database. Depending on the requirements, GoldenGate may not need access to the source database at all, and it extracts only committed transactions from the online redo logs. It can, however, read archived redo logs to extract data from long-running transactions, and it can access the database to support features such as compression (but more about these later in the book).
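The idea of extracting only committed transactions can be illustrated with a small Python sketch. It is a conceptual model only, not GoldenGate's implementation: the record layout (`txn`, `kind`, `row`) and the function name `committed_changes` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def committed_changes(log_records):
    """Yield (txn_id, operations) only for transactions that commit."""
    pending = defaultdict(list)  # txn_id -> operations buffered so far
    for record in log_records:
        txn, kind = record["txn"], record["kind"]
        if kind == "commit":
            # The transaction is now durable, so release its operations.
            yield txn, pending.pop(txn, [])
        elif kind == "rollback":
            # Uncommitted work is discarded and never leaves the source.
            pending.pop(txn, None)
        else:
            pending[txn].append((kind, record["row"]))

# Hypothetical log: transaction 1 commits, transaction 2 rolls back.
redo_log = [
    {"txn": 1, "kind": "insert", "row": {"id": 10}},
    {"txn": 2, "kind": "update", "row": {"id": 20}},
    {"txn": 1, "kind": "commit"},
    {"txn": 2, "kind": "rollback"},
]
captured = list(committed_changes(redo_log))
```

In this sketch, the operations of a transaction are held back until its commit record is seen, which is also why a long-running transaction may require reading older (archived) log data.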
The Extract process will regularly checkpoint its read and write position, typically to a file. The checkpoint data ensures GoldenGate can recover its processes without data loss in the case of failure.
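To see why a checkpoint allows recovery without data loss, consider the following minimal Python sketch. It assumes a simple JSON checkpoint file holding a single read position; the file name `extract.cpt`, the helper names, and the simulated failure are all hypothetical, and the real GoldenGate checkpoint format is different.

```python
import json
import os
import tempfile

def load_checkpoint(path):
    """Return the last saved read position, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["position"]

def save_checkpoint(path, position):
    """Write to a temporary file, then rename it over the checkpoint file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"position": position}, f)
    os.replace(tmp, path)

def process(records, path, fail_at=None):
    """Process records from the checkpointed position onward."""
    done = []
    for pos in range(load_checkpoint(path), len(records)):
        if pos == fail_at:
            raise RuntimeError("simulated abend")
        done.append(records[pos])
        save_checkpoint(path, pos + 1)
    return done

records = ["r0", "r1", "r2", "r3"]
ckpt = os.path.join(tempfile.mkdtemp(), "extract.cpt")

try:
    process(records, ckpt, fail_at=2)  # fails before handling "r2"
except RuntimeError:
    pass

resumed = process(records, ckpt)  # restarts from the saved position
```

After the simulated failure, the second run picks up at the first unprocessed record rather than starting over or skipping ahead.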
The Extract process can have one of the following statuses:
STOPPED
STARTING
RUNNING
ABENDED
The ABENDED status dates back to the Tandem computer, where processes either stop (end normally) or abend (end abnormally). Abend is short for abnormal end.
Since Oracle GoldenGate 11gR2, the capture process can be configured in three different modes:
We will learn more about these different capture modes and how to configure them later in the book.
To replicate transactional data efficiently from one database to another, Oracle GoldenGate converts the captured data into a canonical format, which is written to trail files on both the source and the target system. The provision of source and target trail files in the GoldenGate architecture eliminates any single point of failure and ensures data integrity is maintained. A dedicated checkpoint process keeps track of the data being written to the trails on both the source and the target for fault tolerance.
It is possible to configure GoldenGate to not use trail files on the source system and write data directly from the database's redo logs to the target server data collector. In this case, the Extract process sends data in large blocks across a TCP/IP network to the target system. However, this configuration is not recommended due to the possibility of data loss occurring during unplanned system or network outages. Oracle best practice states that local trail files provide a history of transactions and support the recovery of data for retransmission via a data pump.
When trail files are used on the source system, known as local trail files, GoldenGate requires an additional Extract process, called a data pump, that sends data in large blocks across a TCP/IP network to the target system. As previously stated, this is the best practice and it should be adopted for all Extract configurations.
The server collector process runs on the target system and accepts data from the source (Extract/data pump). Its job is to reassemble the data and write it to a GoldenGate trail file, known as a remote trail. It also handles the decryption of received data when configured.
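The hand-off from local trail to remote trail can be pictured with a short Python sketch. This is a simplified model under stated assumptions: trails are plain lists, the network is replaced by a generator, and the names `data_pump`, `server_collector`, and `block_size` are illustrative only, not GoldenGate parameters.

```python
def data_pump(local_trail, block_size):
    """Yield the local trail's records in fixed-size blocks for transmission."""
    for start in range(0, len(local_trail), block_size):
        yield local_trail[start:start + block_size]

def server_collector(blocks):
    """Reassemble received blocks, in order, into a remote trail."""
    remote_trail = []
    for block in blocks:
        remote_trail.extend(block)
    return remote_trail

# Hypothetical local trail written by the primary Extract.
local_trail = [f"change-{n}" for n in range(5)]

blocks = list(data_pump(local_trail, block_size=2))
remote_trail = server_collector(blocks)
```

Because the local trail persists on the source, the blocks can be generated again after an outage, which is the recovery benefit described above for the data pump configuration.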
Replicat – the apply process
The apply process, known in GoldenGate as Replicat, is the final step in the data delivery. It reads the trail file and applies it to the target database in the form of DML (deletes, updates, and inserts) or DDL (database structural changes). This can be concurrent with the data capture or performed later.
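As a rough illustration of applying trail records as DML, the following Python sketch replays inserts, updates, and deletes against an in-memory table. The trail record shape `(operation, key, values)` and the function `apply_trail` are hypothetical; a real Replicat issues SQL against the target database instead.

```python
def apply_trail(trail, table):
    """Apply insert, update, and delete records to a table keyed by primary key."""
    for op, key, values in trail:
        if op == "insert":
            table[key] = dict(values)
        elif op == "update":
            table[key].update(values)
        elif op == "delete":
            del table[key]
        else:
            raise ValueError(f"unsupported operation: {op}")
    return table

# Hypothetical remote trail contents, in commit order.
trail = [
    ("insert", 1, {"name": "Ann", "city": "Oslo"}),
    ("insert", 2, {"name": "Bob", "city": "Rome"}),
    ("update", 1, {"city": "Bergen"}),
    ("delete", 2, None),
]
target = apply_trail(trail, {})
```

The records are applied strictly in trail order, so the target ends up in the same state as the source regardless of whether the apply runs concurrently with capture or later.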
The Replicat process will regularly checkpoint its read and write position, typically to a database table. The checkpoint data ensures that GoldenGate recovers its processes without data loss in the case of failure.
The Replicat process can have one of the following statuses:
STOPPED
STARTING
RUNNING
ABENDED
DDL is supported only in unidirectional configurations and homogeneous (Oracle-to-Oracle) environments.
Oracle GoldenGate 12c now supports three Replicat configuration modes:
Classic Replicat
Coordinated Replicat
Integrated Replicat
We will learn more about these later in the book.
The Manager process runs on both source and target systems. Its job is to control activities such as starting, stopping, monitoring, and restarting processes; allocating data storage; and reporting errors and events. The Manager process must exist in any GoldenGate implementation. However, there can be only one Manager process per change data capture (CDC) configuration on the source and target.
The Manager process can have either of the following statuses:
As in previous releases, Oracle GoldenGate 12c ships with its own command-line interface, known as the GoldenGate Software Command Interpreter (GGSCI). This tool provides the administrator with a comprehensive array of commands to create, configure, and monitor all GoldenGate processes. You will become very familiar with GGSCI as we continue through this book.
Oracle GoldenGate 12c is command-line-driven. However, there is a product called Oracle GoldenGate Director that provides a GUI for configuration. Oracle Enterprise Manager 12c Cloud Control offers monitoring functionality and basic administration through GoldenGate modules.
The following diagram illustrates the GoldenGate processes and their dependencies. The arrows depict replicated data flow (committed transactions) including checkpoint data and configuration data. The Extract and Replicat processes periodically checkpoint to a file for persistence. The parameter file provides the configuration data. As described in the previous paragraphs, two options exist to send data from the source to the target. These are shown as broken arrows in the process flow:
Having discovered all the processes required for GoldenGate to replicate data, let's now dive a little deeper into the architecture and its configurations.