Your message has been sent.
This article has been saved to your account.
Go to my account
This article has been emailed to your Kindle.
Send this article
In the world of DB2 replication, we have two main options—SQL replication and Q replication, both of which involve replicating between source and target tables. The choice of replication solution depends on a number of factors, of which the fundamental ones are:
- Type of source
- Type of target
- Operating system support
In this article by Pav Kumar-Chatterjee, author of IBM InfoSphere Replication Server and Data Event Publisher, we look at the different types of replication available, namely the base replication methods of unidirectional, bidirectional, and peer-to-peer.
|Read more about this book|
(For more resources on IBM, see here.)
There are four basic types of Q replication:
- Event Publishing
Replicating to a stored procedure or a Consistent Change Data (CCD) table are a subset of unidirectional replication. Let's look at each of these in more detail.
In unidirectional replication, we can replicate all of the rows and columns of a source table or we can just replicate a subset of the rows and columns. We cannot really perform any transformation on this data. If we want to perform some sort of transformation, then we would need to replicate to a stored procedure.
Replicating to a stored procedure
Stored procedure replication is a subset of unidirectional replication in which the target is not a table as such, but a stored procedure, as shown in the following diagram:
A stored procedure can transform the data and output the results to a target table. This target table is not known to Q Apply. These stored procedures can be written in SQL, C, or Java.
Prior to DB2 9.7 the source table and the stored procedure must have the same name, and the target table name can be any name we like.
In bidirectional replication, we replicate copies of tables between two servers, each of which has a copy of the table. Note that we can only set up bidirectional replication between two servers. Unlike unidirectional replication, where we can replicate a subset of rows and columns, this is not possible in bidirectional replication. The tables on both servers can have different names, but must have the same number of rows and columns. The columns must have identical column names of compatible data types. It is not possible to do any data transformation using this type of replication.
Because we are updating records on both servers, it is possible that the same record will be updated at the same time on both servers.
Although Q replication provides a conflict detection mechanism, we strongly advise that the driving application should be written or modified in such a way that such conflicts be avoided. The conflict detection provided by Q replication should be treated as a safety net and not the primary conflict resolution mechanism.
This mechanism allows us to choose which data values are used to detect conflicts (key column values only, changed column values, or all column values) and which server should win if such a conflict is detected. The row in the losing system is rolled back and the record is written to the IBMQSNAP_EXCEPTIONS table for review.
One of the related subjects to conflict detection is the concept of which server takes precedence in a conflict, or to put it more bluntly, which server is the master and which is the slave! If there is a conflict, then whichever server takes precedence will not apply changes from the other server. This ensures that the servers remain in sync. There is a more egalitarian option, which is that no server takes precedence. In this situation, rows are applied irrespective of whether or not there is a conflict, which ultimately leads to a divergence of the contents of the databases, which is not good!
There are two types of bidirectional replication—the first type is where we have an active/passive setup and the second type is where we have an active/active setup. The type of replication you choose will have implications on which server is defined as the master and which as the slave and what to do if a Q subscription is inadvertently inactivated.
In an active/passive setup, the passive server should be made the master. In an active/active setup, the choice of which system is the master is a decision you have to make.
Peer-to-peer replication allows us to replicate data between two or more servers. This is different from bidirectional replication, which is only between two servers. Each server has a copy of the table (which can have a different schema and name), but must have the same number of rows and columns and these columns must have identical column names and compatible data types. It is not possible to do any data transformation using this type of replication.
In peer-to-peer replication, there is no such thing as a master or slave server—each server will have the most recent copy of the table—eventually! What this means is that there will be a slight delay between the first server having a copy of the table and the last server having that copy. This is an asynchronous process, so at any one time the tables might be different, but once applications stop updating them, then the tables will converge to the most recently updated value. This type of processing means that there isn't any "manual" conflict detection as such (it is handled automatically by Q Apply), because the latest update will always win.
If two applications update the same record at exactly the same time, then Q replication uses the server number allocated when the peer-to-peer environment was set up to determine the winner. This type of processing means that two columns are added to each of the tables in the Q replication environment, where the first column is a timestamp of when the row was last updated (GMT) and the second column is the machine number. These updates are performed through triggers on the tables.
eBook Price: $35.99
Book Price: $59.99
|Read more about this book|
(For more resources on IBM, see here.)
Tree replication comes in two flavors: bidirectional, which we call B-Tree replication, and unidirectional, which we call U-Tree replication.
A variation on the bidirectional replication theme is that it would be nice to be able to replicate from one master to two slaves in a bidirectional manner. This requirement was addressed with B-Tree replication.
A B-Tree replication structure is one, which looks like a tree (as shown in the preceding diagram). DB2A replicates with DB2B, DB2C, and DB2<n> in a bidirectional manner, but DB2B, DB2C, and DB2<n> do not replicate directly with each other—they have to replicate through DB2A, which is what differentiates B-Tree replication from peer-to-peer replication.
For B-Tree replication, we can replicate between one master and many slaves in a bidirectional manner. In SQL replication terms, this was called Update Anywhere. Note that it is not possible to set up B-Tree replication using the Replication Center—we need to use ASNCLP commands.
We can also replicate from one source to many targets in a unidirectional scenario, which we call a U-Tree scenario. In the preceding figure, DB2A replicates with DB2B, DB2C, and DB2<n> in a unidirectional manner (we can have more than three targets). Note, there is no radio button in the Replication Center to set up unidirectional U-Tree replication. What we have to do is set up unidirectional replication from DB2A to DB2B, and then for DB2A to DB2C, and so on. It is easier to use ASNCLP commands.
Replicating to a Consistent Change Data table
Let's first look at the definition of Consistent Change Data (CCD) replication. CCD table replication is a subset of unidirectional replication, in which the target is a CCD table, which contains a row for each insert/delete/update that occurs on the source table. These CCD tables can be complete and/or condensed.
There are three main uses of CCD table replication:
- To populate an operational data store
- To keep a history of changes made to the source table for audit purposes
- To enable multi-target update
Consider the situation where we want to populate an Operational Data Store (ODS) with data from our live system. We want to replicate all operations apart from delete operations. Before the introduction of CCD tables, our only option was to use a stored procedure. One of the parameters that the Q Apply program passes to a stored procedure is the operation (insert, delete, and so on) that occurred on the source system.
We can use CCD tables to keep a history of changes made to a table, or as a feed to InfoSphere DataStage.
The multi-target update scenario uses Q replication to populate the CCD table and then uses SQL Replication to populate the multiple target tables as shown in the following diagram:
We can only specify that a target table be a CCD table in a unidirectional setup. A CCD target table is made up of the following columns (only four of the metadata columns are compulsory, the other four are optional, and the order of the columns does not matter):
|<user key columns>
<user nonkey columns>
<user computed columns>
|Compulsory auditing columns|
|Optional auditing columns|
The IBMSNAP_INTENTSEQ column is a sequence number that uniquely identifies a change and is ascending within a transaction. The IBMSNAP_OPERATION column is a flag that indicates the type of operation for a record. The IBMSNAP_COMMITSEQ column is a sequence number that provides transactional order. The IBMSNAP_LOGMARKER column is the time that the data were committed.
The IBMSNAP_AUTHID column is the authorization ID that is associated with the transaction. This column is for used for Linux, UNIX, Windows, and z/OS. For z/OS, this is the primary authorization ID. The IBMSNAP_AUTHTKN column is the authorization token that is associated with the transaction. This column is for z/OS only and is the correlation ID—it will be NULL for Linux, UNIX, and Windows. The IBMSNAP_PLANID column is the plan name that is associated with the transaction. This column is for z/OS only—it will be NULL for Linux, UNIX, and Windows. And finally, the IBMSNAP_UOWID column is the unit-of-work identifier from the log record for this unit of work. This column is used for Linux, UNIX, Windows, and z/OS.
The <user computed columns> columns will be the before image columns and must be NULLABLE (because there is no before image when we insert a row, the before image is NULL).
Now let's look at what data is stored in a CCD table. With CCD tables we can specify that the target table is: COMPLETE or NONCOMPLETE and CONDENSED or NONCONSENSED. These are interpreted as follows:
- Complete (COMPLETE=Y): A complete CCD table contains every row of interest from the source table and is initialized with a full set of source data. All target table loading options are valid for complete CCDs (automatic, manual, or no load).
- Noncomplete (COMPLETE=N): A noncomplete CCD table contains only changes to the source table and starts with no data. A noncomplete CCD table records all UPDATE operations at the source. The only valid load option for noncomplete CCD tables is no load.
- Condensed (CONDENSED=Y): A condensed CCD table contains one row for every key value in the source table and contains only the latest value for the row. For condensed CCD tables, a primary key is required, which is used in case of an update conflict. If such a conflict occurs, all the source columns are forced into the row (CONFLICT_ACTION=F).
- Noncondensed (CONDENSED=N): A noncondensed CCD table contains multiple rows with the same key value, one row for every UPDATE, INSERT, or DELETE operation at the source table. When added to the CCD table, all of the rows become INSERT operations. No primary key is required.
The options for handling unexpected conditions at the target are limited for CCD tables:
For condensed and complete, two choices are available:
- Force the source change into the target table (CONFLICT_ACTION=F)
- Ignore the condition and continue (CONFLICT_ACTION=I)
For any combination other than condensed and complete, the only valid option is to force the change into the target table.
For all CCD table types, the only valid conflict rule is to check only key columns (CONFLICT_RULE=K).
Before we move on, let's quickly look at CCD tables in an SQL Replication environment. In SQL Replication, there is the concept of internal and external CCD tables, which does not exist in Q replication. In SQL Replication terminology, all Q replication CCD tables are external CDD tables.
The Event Publishing functionality captures changes to source tables and converts committed transactional data to messages in an XML format. Each message can contain an entire transaction or only a row-level change. These messages are put on WebSphere MQ message queues and read by a message broker or other applications. We can publish subsets of columns and rows from source tables so that we publish only the data that we need.
In this article we took a look at the different types of Q Replication.
- Lotus Notes Domino 8: Upgrader's Guide [Book]
- Q Replication Components in IBM Replication Server [Article]
- IBM WebSphere MQ commands [Article]
- WebSphere MQ Sample Programs [Article]
- Q Subscription Maintenance in IBM Infosphere [Article]
eBook Price: $35.99
Book Price: $59.99
About the Author :
Pav Kumar-Chatterjee (Eur Ing, CENG, MBCS) has been involved in DB2 support on the mainframe platform since 1991, and on midrange platforms since 2000. Before joining IBM he worked as a database administrator in the airline industry as well as various financial institutions in the UK and Europe. He has held various positions during his time at IBM, including in the Software Business Services team and the global BetaWorks organization. His current position is a DB2 technical specialist in the Software Business. He was involved with Information Integrator (the forerunner of Replication Server) since its inception, and has helped numerous customers design and implement Q replication solutions, as well as speaking about Q replication at various conferences.
Pav Kumar-Chatterjee has co-authored the “DB2 pureXML Cookbook” (978-0-13-815047-1) published in August 2009.