IBM InfoSphere Replication Server and Data Event Publisher

By Pav Kumar-Chatterjee
    Advance your knowledge in tech with a Packt subscription

  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Q Replication Overview

About this book

Business planning is no longer just about defining goals, analyzing critical issues, and then creating strategies. You must aid business integration by linking changed-data events in DB2 databases on Linux, UNIX, and Windows with EAI solutions , message brokers, data transformation tools, and more. Investing in this book will save you many hours of work (and heartache) as it guides you around the many potential pitfalls to a successful conclusion.

This book will accompany you throughout your Q replication journey. Compiled from many of author's successful projects, the book will bring you some of the best practices to implement your project smoothly and within time scales. The book has in-depth coverage of Event Publisher, which publishes changed-data events that can run updated data into crucial applications, assisting your business integration processes. Event Publisher also eliminates the hand coding typically required to detect DB2 data changes that are made by operational applications.

We start with a brief discussion on what replication is and the Q replication release currently available in the market. We then go on to explore the world of Q replication in more depth. The latter chapters cover all the Q replication components and then talk about the different layers that need to be implemented—the DB2 database layer, the WebSphere MQ layer, and the Q replication layer. We conclude with a chapter on how to troubleshoot a problem. The Appendix (available online) demonstrates the implementation of 13 Q replication scenarios with step-by-step instructions.

Publication date:
August 2010
Publisher
Packt
Pages
344
ISBN
9781849681544

 

Chapter 1. Q Replication Overview

Welcome to the start of your journey along the Q replication road. Any journey can be a bumpy ride, but after reading this book and going through the numerous examples, your journey will be a smoother one! In this first chapter, we will take you through the following discussion points:

  • Why we want to replicate data.

  • What is available today in the IBM world of data replication.

  • The toolsets available to set up and administer a replication environment and look at the code that we need to install for a functioning Q replication solution.

  • Introduce the architecture of Q replication. We look at the different types of replication available, namely the base replication methods of unidirectional, bidirectional, and peer-to-peer, and the replication architectures built on these base methods.

  • Replicating XML data types and compressed tables. We look at some of the design points when considering replicating compressed table.

  • Q replication conflict detection.

  • Available transformation processing for both regular and XML data.

 

Why do we want to replicate data


Much has been written about why we need to replicate data, so we will keep this short. What's wrong with just storing our data in one place? Well, in today's 24x7 world where being without data for even a short period of time could be catastrophic to our business, we need a method to be able to take a copy of our data and possibly more than one copy and store it securely in a different location. This copy should be complete and be stored as many miles away as possible. Also the amount of data that has to be stored is ever increasing and being generated at a fast rate, so our method needs to be able to handle large volumes of data very quickly.

Overview of what is available today

In the IBM software world today, there are a number of options available to replicate data:

  • InfoSphere (formerly WebSphere) Replication Server

  • InfoSphere CDC (formerly the Data Mirror suite of programs)

  • The DB2 High Availability Disaster Recovery (HADR) functionality

  • Traditional log shipping

In this book, we will cover the first option InfoSphere Replication Server, which from now on, we will refer to as DB2 replication. The other options are outside the scope of this book.

The different replication options

In the world of DB2 replication, we have two main options—SQL replication and Q replication, both of which involve replicating between source and target tables. Event publishing is a subset of Q replication, in that the target is not a table but a WebSphere MQ queue. The choice of replication solution depends on a number of factors, of which the fundamental ones are:

  • Type of source

  • Type of target

  • Operating system support

The DB2 Information Center contains a table, which compares the three types of replication. This table can be used as a quick checklist for determining the best solution to a given business requirement (http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.swg.im.iis.db.repl.intro.doc/topics/iiyrcintrsbsc.html).

The following figure shows the basic operations of SQL replication:

Updates to any tables are logged, and if the table is a registered table (TAB1 and TAB2), then the SQL Capture program (Capture for short) reads the information from the DB2 log and inserts the row into a change data table (CD_<table-name>)—there is one of these for each registered source table. The SQL Apply program (Apply for short) reads from these change data tables and updates the target tables (TAB3 and TAB4).

In Q replication, we do not have the concept of change data tables, as shown in the following figure:

Any updates to registered tables, which the Q Capture program (Q Capture for short) detects are put onto a WebSphere MQ queue. The Q Apply program (Q Apply for short) then reads from these queues and updates the target tables.

In Event Publishing, there is no Q Apply and no target tables as shown in the following diagram:

Q Capture puts messages into the WebSphere MQ queues, and it is up to the consuming applications to destructively/non-destructively read from these queues.

Replication toolset

We have three ways of administering a replication environment. We can use:

  • The Replication Center GUI

  • The ASNCLP command interface

  • Standard SQL

We recommend that when you are new to replication, you should use the Replication Center and once you are confident with the process, you can then progress onto the ASNCLP interface. For defining production systems, we recommend using the ASNCLP interface, because the commands can be scripted.

The ASNCLP interface generates SQL, which is run against the appropriate control tables to define and administer the replication environment. Therefore, in theory, it is possible for us to write our own SQL to do this. However, the SQL can be complicated and manual coding could result in errors, and therefore we recommend not using this method.

The Replication Center GUI

The Replication Center GUI can be used to set up and administer a Q replication scenario. See Chapter 6, Administration Tasks, for details on accessing and using the Replication Center. The launchpad screen is shown next.

The Replication Center has a series of wizards, which are very useful if we are new to replication. The default wizard screen is the launchpad screen, and it can be accessed from the main replication screen through Replication Center | Launchpad. The wizards take us through all the steps necessary to set up a replication environment.

Using the Replication Center, it is possible to generate an SQL script for a particular function. The ability for the Replication Center to generate ASNCLP scripts in addition to SQL scripts is being planned for a future release.

The ASNCLP command interface

The ASNCLP interface (discussed in detail in Chapter 5, The ASNCLP Command Interface) allows us to enter commands from the command line, and more importantly, allows us to combine various commands into a script file, which can then be run from the command line. In this book, we will focus on ASNCLP scripts. It is supported on the Linux, UNIX, and Windows platforms. It is also supported on z/OS natively, through USS. We can also administer replication on z/OS from Linux, UNIX, and Windows system if we catalog the z/OS databases on the Linux, UNIX, and Windows system.

In the next section, we will look at the constituent components of Q replication.

 

Q replication constituent components


We like to think that the Q replication architecture is made up of three interconnected layers: the DB2 database layer, the WebSphere MQ layer, and finally the Q replication layer—each layer needs to be set up and tested before we move on to the next layer. An overview of the Q replication process is shown in the following diagram:

The basic operating architecture of a Q replication system involves:

  • An application processing a row in a table and DB2 logging this operation

  • A Q Capture program calling the DB2 log reader to "process" the DB2 log and place rows that have been committed and that it wants to replicate onto a WebSphere MQ queue

  • A Q Apply program "reading" from this queue and applying the row to the target table

In this setup, we have two components that need to be installed—the Q replication code and the WebSphere MQ code. We will discuss the installation of both of these in some detail.

With the current packaging, the Q replication code for homogeneous replication already comes bundled with the base code for DB2—all we have to install is a replication license key. The license for InfoSphere Replication Server is called isrs.lic and for InfoSphere Data Event Publisher the license is called isep.lic. Use the DB2 db2licm command to install the license key and to confirm that the license key has been successfully applied.

Turning to the WebSphere MQ component, we can use either WebSphere MQ V6 or V7 with DB2 replication.

The WebSphere MQ V6 Information Center can be found at http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/index.jsp.

The WebSphere MQ V7 Information Center can be found at http://publib.boulder.ibm.com/infocenter/wmqv7/v7r0/index.jsp.

For the procedure to install WebSphere MQ, consult the WebSphere MQ Information Center and search for install server. As an example, we will now take you through installing WebSphere MQ V6 on x86 64-bit Linux (which comes packaged as C87RUML.tar.gz). We need to perform the following tasks:

  1. 1. As root, use gunzip and tar to unpack the WebSphere MQ packages:

    # gunzip C87RUML.tar.gz
    # tar -xvf C87RUML.tar
  2. 2. As root, the first task we need to perform is accept the MQ license, as shown next:

    # ./mqlicense.sh
  3. 3. Now we can install the base packages. As root, issue the following commands:

    # rpm -U MQSeriesRuntime-6.0.1-0.x86_64.rpm
    # rpm -U MQSeriesServer-6.0.1-0.x86_64.rpm
    # rpm -U MQSeriesSDK-6.0.1-0.x86_64.rpm
  4. 4. If we want the WebSphere MQ sample programs, which include amqsput, amqsget, amqsgbr, amqsbcg, and so on (which we do!), then we have to install the following package:

    # rpm -U MQSeriesSamples-6.0.1-0.x86_64.rpm

For future reference, to uninstall WebSphere MQ, perform the following steps:

  1. 1. We can check which packages are installed using the following command:

    # rpm -q -a | grep MQSeries
  2. 2. We can check what version of WebSphere MQ we are running by using the following command:

    # dspmqver

    This should give us an output similar to the following:

    Name: WebSphere MQ
    Version: 6.0.0.0
    CMVC level: p000-L080610
    BuildType: IKAP - (Production)

On UNIX systems, if we are running on a 64-bit system, then we need to add the WebSphere MQ library to the LD_LIBRARY_PATH environment variable. If we do not do this, then when we try and start Q Capture (or Q Apply), we will see the following messages in the process log file:

2009-09-02-12.47.39.730985 <ASNMQLOD:MQCONN> ASN0584E "Q Capture" : "ASN" : "AdminThread" : An error occurred while the program was dynamically loading the WebSphere MQ library "libmqm_r.so". Error code: "0x90000076", "Cannot load the specified library". Environment variable ASNUSEMQCLIENT is set to "".

The ASN0584E message tells us to set the LD_LIBRARY_PATH environment variable. To check the current setting of this variable, we can either list the current values of all environment variables, using the env command, or we can list the value of this specific variable by using the echo command and prefixing the variable name with a dollar sign:

echo $LD_LIBRARY_PATH

We can temporarily set the value of this parameter (for the duration of the session in which the command was issued), using the following command:

LD_LIBRARY_PATH=/opt/mqm/lib64:$LD_LIBRARY_PATH

If we ever need to remove the packages, we would use the commands as shown:

#rpm -ev MQSeriesRuntime-6.0.1-0.x86_64.rpm
#rpm -ev MQSeriesServer-6.0.1-0.x86_64.rpm
#rpm -ev MQSeriesSDK-6.0.1-0.x86_64.rpm
#rpm -ev MQSeriesSamples-6.0.1-0.x86_64.rpm

The Q Capture and Q Apply programs are discussed in detail in Chapter 2, The Q Capture and Q Apply programs. Typically, these programs will be installed on different servers, in which case we have to pay attention to the machine clock time on the servers.

Note

The machine clock time on all servers involved in replication should be synchronized.

The times on all servers need to be synchronized, because each captured record has a timestamp associated with it, and Q Apply will not apply in the future. Therefore, if the Q Capture server is ahead of the Q Apply server, then Q Apply will wait until it has reached the timestamp in the replicated record before applying it. If the Apply server time is ahead of the Capture server time, then we will not hit the "Apply will not apply in the future" problem, but the latency figures will be out by the time difference.

In the next section, we will look at the different types of Q replication.

 

The different types of Q replication


There are four basic types of Q replication:

  • Unidirectional

  • Bidirectional

  • Peer-to-peer

  • Event Publishing

Replicating to a stored procedure or a Consistent Change Data (CCD) table are a subset of unidirectional replication.

Let's look at each of these in more detail. In the following sections, we talk about source and target tables. You may be wondering, what about views, triggers, and so on? You should check the Pre-setup evaluation section of Appendix A, for a list of objects to check for, before deciding on whether Q replication is the correct solution.

Unidirectional replication

In unidirectional replication, we can replicate all of the rows and columns of a source table or we can just replicate a subset of the rows and columns. We cannot really perform any transformation on this data. If we want to perform some sort of transformation, then we would need to replicate to a stored procedure, which we will discuss in detail in Appendix A.

Replicating to a stored procedure

Stored procedure replication is a subset of unidirectional replication in which the target is not a table as such, but a stored procedure, as shown in the following diagram:

A stored procedure can transform the data and output the results to a target table. This target table is not known to Q Apply. These stored procedures can be written in SQL, C, or Java. An example of replicating to a stored procedure is shown in the Replication to a stored procedure section of Appendix A.

Note

Prior to DB2 9.7 the source table and the stored procedure must have the same name, and the target table name can be any name we like.

Bidirectional replication

In bidirectional replication, we replicate copies of tables between two servers, each of which has a copy of the table. Note that we can only set up bidirectional replication between two servers. Unlike unidirectional replication, where we can replicate a subset of rows and columns, this is not possible in bidirectional replication. The tables on both servers can have different names, but must have the same number of rows and columns. The columns must have identical column names of compatible data types. It is not possible to do any data transformation using this type of replication.

Because we are updating records on both servers, it is possible that the same record will be updated at the same time on both servers.

Note

Although Q replication provides a conflict detection mechanism, we strongly advise that the driving application should be written or modified in such a way that such conflicts be avoided. The conflict detection provided by Q replication should be treated as a safety net and not the primary conflict resolution mechanism.

This mechanism allows us to choose which data values are used to detect conflicts (key column values only, changed column values, or all column values) and which server should win if such a conflict is detected. The row in the losing system is rolled back and the record is written to the IBMQSNAP_EXCEPTIONS table for review. Conflict detection is discussed in detail in the Q replication conflict detection section.

One of the related subjects to conflict detection is the concept of which server takes precedence in a conflict, or to put it more bluntly, which server is the master and which is the slave! If there is a conflict, then whichever server takes precedence will not apply changes from the other server. This ensures that the servers remain in sync. There is a more egalitarian option, which is that no server takes precedence. In this situation, rows are applied irrespective of whether or not there is a conflict, which ultimately leads to a divergence of the contents of the databases, which is not good!

There are two types of bidirectional replication—the first type is where we have an active/passive setup and the second type is where we have an active/active setup. The type of replication you choose will have implications on which server is defined as the master and which as the slave and what to do if a Q subscription is inadvertently inactivated.

In an active/passive setup, the passive server should be made the master. In an active/active setup, the choice of which system is the master is a decision you have to make. See the Conflict detection: update/delete conflict section of Appendix A for further discussion.

Peer-to-peer replication

Peer-to-peer replication allows us to replicate data between two or more servers. This is different from bidirectional replication, which is only between two servers. Each server has a copy of the table (which can have a different schema and name), but must have the same number of rows and columns and these columns must have identical column names and compatible data types. It is not possible to do any data transformation using this type of replication.

In peer-to-peer replication, there is no such thing as a master or slave server—each server will have the most recent copy of the table—eventually! What this means is that there will be a slight delay between the first server having a copy of the table and the last server having that copy. This is an asynchronous process, so at any one time the tables might be different, but once applications stop updating them, then the tables will converge to the most recently updated value. This type of processing means that there isn't any "manual" conflict detection as such (it is handled automatically by Q Apply), because the latest update will always win.

If two applications update the same record at exactly the same time, then Q replication uses the server number allocated when the peer-to-peer environment was set up to determine the winner. This type of processing means that two columns are added to each of the tables in the Q replication environment, where the first column is a timestamp of when the row was last updated (GMT) and the second column is the machine number. These updates are performed through triggers on the tables.

Tree replication

Tree replication comes in two flavors: bidirectional, which we call B-Tree replication, and unidirectional, which we call U-Tree replication.

A variation on the bidirectional replication theme is that it would be nice to be able to replicate from one master to two slaves in a bidirectional manner. This requirement was addressed with B-Tree replication.

A B-Tree replication structure is one, which looks like a tree (as shown in the preceding diagram). DB2A replicates with DB2B, DB2C, and DB2<n> in a bidirectional manner, but DB2B, DB2C, and DB2<n> do not replicate directly with each other—they have to replicate through DB2A, which is what differentiates B-Tree replication from peer-to-peer replication.

For B-Tree replication, we can replicate between one master and many slaves in a bidirectional manner. In SQL replication terms, this was called Update Anywhere. Note that it is not possible to set up B-Tree replication using the Replication Center—we need to use ASNCLP commands, which is described in detail in the Bidirectional replication to two targets (B-Tree) of Appendix A.

We can also replicate from one source to many targets in a unidirectional scenario, which we call a U-Tree scenario. In the preceding figure, DB2A replicates with DB2B, DB2C, and DB2<n> in a unidirectional manner (we can have more than three targets). Note, there is no radio button in the Replication Center to set up unidirectional U-Tree replication. What we have to do is set up unidirectional replication from DB2A to DB2B, and then for DB2A to DB2C, and so on. It is easier to use ASNCLP commands, which are described in detail in the Unidirectional replication to two targets (U-Tree) section of Appendix A.

Replicating to a Consistent Change Data table

Let's first look at the definition of Consistent Change Data (CCD) replication. CCD table replication is a subset of unidirectional replication, in which the target is a CCD table, which contains a row for each insert/delete/update that occurs on the source table. These CCD tables can be complete and/or condensed (this will be explained later).

There are three main uses of CCD table replication:

  • To populate an operational data store

  • To keep a history of changes made to the source table for audit purposes

  • To enable multi-target update

Consider the situation where we want to populate an Operational Data Store (ODS) with data from our live system. We want to replicate all operations apart from delete operations. Before the introduction of CCD tables, our only option was to use a stored procedure. One of the parameters that the Q Apply program passes to a stored procedure is the operation (insert, delete, and so on) that occurred on the source system. See the Replication to a stored procedure section of Appendix A.

We can use CCD tables to keep a history of changes made to a table, or as a feed to InfoSphere DataStage.

The multi-target update scenario uses Q replication to populate the CCD table and then uses SQL Replication to populate the multiple target tables as shown in the following diagram:

In the Replicating to a CCD table section of Appendix A, we go through the steps needed to set up replication to a CCD table.

We can only specify that a target table be a CCD table in a unidirectional setup. A CCD target table is made up of the following columns (only four of the metadata columns are compulsory, the other four are optional, and the order of the columns does not matter):

<user key columns>

<user nonkey columns>

<user computed columns>

 

IBMSNAP_INTENTSEQ

IBMSNAP_OPERATION

IBMSNAP_COMMITSEQ

IBMSNAP_LOGMARKER

Compulsory auditing columns

IBMSNAP_AUTHID

IBMSNAP_AUTHTKN

IBMSNAP_PLANID

IBMSNAP_UOWID

Optional auditing columns

The IBMSNAP_INTENTSEQ column is a sequence number that uniquely identifies a change and is ascending within a transaction. The IBMSNAP_OPERATION column is a flag that indicates the type of operation for a record. The IBMSNAP_COMMITSEQ column is a sequence number that provides transactional order. The IBMSNAP_LOGMARKER column is the time that the data were committed.

The IBMSNAP_AUTHID column is the authorization ID that is associated with the transaction. This column is for used for Linux, UNIX, Windows, and z/OS. For z/OS, this is the primary authorization ID. The IBMSNAP_AUTHTKN column is the authorization token that is associated with the transaction. This column is for z/OS only and is the correlation ID—it will be NULL for Linux, UNIX, and Windows. The IBMSNAP_PLANID column is the plan name that is associated with the transaction. This column is for z/OS only—it will be NULL for Linux, UNIX, and Windows. And finally, the IBMSNAP_UOWID column is the unit-of-work identifier from the log record for this unit of work. This column is used for Linux, UNIX, Windows, and z/OS.

The <user computed columns> columns will be the before image columns and must be NULLABLE (because there is no before image when we insert a row, the before image is NULL).

Now let's look at what data is stored in a CCD table. With CCD tables we can specify that the target table is: COMPLETE or NONCOMPLETE and CONDENSED or NONCONSENSED. These are interpreted as follows:

  • Complete (COMPLETE=Y): A complete CCD table contains every row of interest from the source table and is initialized with a full set of source data. All target table loading options are valid for complete CCDs (automatic, manual, or no load).

  • Noncomplete (COMPLETE=N): A noncomplete CCD table contains only changes to the source table and starts with no data. A noncomplete CCD table records all UPDATE operations at the source. The only valid load option for noncomplete CCD tables is no load.

  • Condensed (CONDENSED=Y): A condensed CCD table contains one row for every key value in the source table and contains only the latest value for the row. For condensed CCD tables, a primary key is required, which is used in case of an update conflict. If such a conflict occurs, all the source columns are forced into the row (CONFLICT_ACTION=F).

  • Noncondensed (CONDENSED=N): A noncondensed CCD table contains multiple rows with the same key value, one row for every UPDATE, INSERT, or DELETE operation at the source table. When added to the CCD table, all of the rows become INSERT operations. No primary key is required.

The options for handling unexpected conditions at the target are limited for CCD tables:

For condensed and complete, two choices are available:

  • Force the source change into the target table (CONFLICT_ACTION=F)

  • Ignore the condition and continue (CONFLICT_ACTION=I)

For any combination other than condensed and complete, the only valid option is to force the change into the target table.

For all CCD table types, the only valid conflict rule is to check only key columns (CONFLICT_RULE=K).

Before we move on, let's quickly look at CCD tables in an SQL Replication environment. In SQL Replication, there is the concept of internal and external CCD tables, which does not exist in Q replication. In SQL Replication terminology, all Q replication CCD tables are external CDD tables.

Event Publishing

The Event Publishing functionality captures changes to source tables and converts committed transactional data to messages in an XML format. Each message can contain an entire transaction or only a row-level change. These messages are put on WebSphere MQ message queues and read by a message broker or other applications. We can publish subsets of columns and rows from source tables so that we publish only the data that we need.

 

DB2 replication sources


In this section, we cover the various DB2 objects that can be used as replication sources, such as XML data types, compressed tables, and large objects.

Replicating XML data types

From DB2 9.5 onwards, we can replicate tables, which contain columns of data type XML, and an example is shown in the Unidirectional replication for an XML data type section of Appendix A. We can set up unidirectional, bidirectional, and peer-to-peer replication.

From DB2 9.7 onwards, in unidirectional replication, we can use XML expressions to transform XML data between the source and target tables. Examples of supported and unsupported XML expressions are shown next.

Supported XML expressions include XMLATTRIBUTES, XMLCOMMENT, XMLCAST, XMLCONCAT, XMLDOCUMENT, XMLELEMENT, XMLFOREST, XMLNAMESPACES, XMLPARSE, XMLPI, XMLQUERY, XMLROW, XMLSERIALIZE, XMLTEXT, and XMLVALIDATE.

Unsupported XML expressions include XMLAGG, XMLGROUP, XMLTABLE, XMLXSROBJECTID, and XMLTRANSFORM.

For a complete up-to-date list, check out the DB2 Information Center at http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.swg.im.iis.repl.qrepl.doc/topics/iiyrqsubcxmlexpress.html.

Replicating compressed tables

From DB2 9.7 onwards, tables can have both the COMPRESS YES and DATA CAPTURE CHANGES table options set, which means we can now replicate compressed tables.

The issue with replicating a compressed table, is what happens if the compression dictionary is changed while Q Capture is down? Once Q Capture is started again, then it will try and read logs and records that were compressed with the previous compression dictionary, and not succeed. To address this, when a table has both the COMPRESS YES and DATA CAPTURE CHANGES options set, then the table can have two dictionaries: an active data compression dictionary and a historical compression dictionary.

Note

We should not create more than one data compression dictionary while Q Capture is down.

If a table is set to DATA CAPTURE NONE, then if a second dictionary exists, it will be removed during the next REORG TABLE operation or during table truncate operations (LOAD REPLACE, IMPORT REPLACE, or TRUNCATE TABLE).

Replicating large objects

If a row change involves columns with large object (LOB) data, Q Capture copies the LOB data directly from the source table to the send queue.

If we are replicating or publishing data from LOB columns in a source table, then Q Capture will automatically divide the LOB data into multiple messages to ensure that the messages do not exceed the MAX MESSAGE SIZE value of the Replication Queue Map used to transport the data.

If we are going to replicate LOB data, then we need to ensure that the MAXDEPTH value for the Transmission Queue and Administration Queue on the source system, and the Receive Queue on the target system, is large enough to account for divided LOB messages.

If we select columns that contain LOB data types for a Q subscription, we need to make sure that the source table enforces at least one unique database constraint (a unique index, primary key, and so on). Note that we do not need to select the columns that make up this uniqueness property for the Q subscription.

Other DB2 objects

In addition to the previous objects, let's look at some other DB2 objects and see if they can be used as a Q replication source:

  • What about views? Not at the present time.

  • What about DB2 system tables? No.

  • What about Materialized Query Tables (MQTs)? Yes as of DB2 9.7.

  • What about range-partitioned tables? Yes as of DB2 9.7.

  • What about hash-partitioned tables? Yes, see the Q replication in a DPF environment section.

So now let's move on to looking at Q replication filtering and transformations.

 

Q replication filtering and transformations


Let's first look at what is possible when it comes to filtering rows and columns, and then move on to look at transformations.

Filtering rows/columns

Let's first look at row filtering. It is only possible to filter rows for replication in a unidirectional scenario, and this is done in the Q subscription. For an example, see the Creating a Q subscription section of Chapter 6.

What about the number of columns we want to replicate—can we replicate just a subset of the source table columns? For the latest release of code, we can subset the columns to be replicated. Note that we cannot replicate more columns than are defined at the target table or target stored procedure and that the column names must still match, which is shown in the following diagram:

For unidirectional replication only, the target table can have more columns than the source table as shown in the following diagram, but these "non-source" columns cannot be part of the target table key and must be defined as NULLABLE or NOT NULL WITH DEFAULT, as shown next.

Any filtering of rows or columns in unidirectional replication is specified at Q subscription definition time. At this time, we can specify:

  • Which columns to replicate and how they map to columns at the target table (or to parameters in a stored procedure)

  • A search condition to determine which rows from the source table are replicated

As stated at the beginning of this chapter, Q replication is built for speed with transformations not being a major factor. However, although Q replication does not have the transformation capabilities of SQL Replication, it does have some transformation capabilities, which are described in the following sections.

Before and After SQL—alternatives

In Q replication, there is no concept of before and after SQL, as there is in SQL Replication. In a unidirectional setup, we can use SQL expressions to transform data between the source and target tables. We can map multiple source columns to a single target column, or to create other types of computed columns at the target. An example is shown in the Q subscription for unidirectional replication section of Chapter 5 .

Stored procedure processing

If we want to perform transformations with Q replication, then we need to use stored procedure processing. This allows us to call external routines to perform all the transformations we want. The Replication to a stored procedure section of Appendix A shows an example of how to set up Q replication to a stored procedure.

We now move on to look at conflict detection in a Q replication environment.

 

Q replication conflict detection


This section looks at conflict detection, what it is, when it occurs and how we deal with it.

What is conflict detection?

Let's start by defining what we mean by a conflict. A conflict occurs in bidirectional replication when the same record is processed at the same time on the two servers. We then have to decide which server is the winner, which we do when we set up the Q subscription for the table.

When do conflicts occur?

In unidirectional replication, the only time we need conflict detection is if we are updating the target table outside of Q replication, which is not recommended! Q subscription for unidirectional replication section of Chapter 5, covers scenarios where the target table is updated outside of Q replication.

There is no conflict detection with Event Publishing.

We need conflict detection in multi-directional replication. Let's first look at bidirectional replication and then move on to peer-to-peer replication.

Bidirectional replication uses data values (which we can choose) to detect and resolve conflicts. The choice of data values is determined by the CONFLICT RULE parameter we specify when we create a Q subscription. The process is that "before" values at the source server are compared against the "current" values at the target server, and based on the level of conflict detection Q Capture sends a different combination of before and/or after values to Q Apply. The CONFLICT RULE options are:

  • Key column values only: Q Apply attempts to update or delete the target row by checking the values in the key columns. Q Apply detects the following conflicts:

    • A row is not found in the target table

    • A row is a duplicate of a row that already exists in the target table

    With this conflict rule, Q Capture sends the least amount of data to Q Apply for conflict checking. No before values are sent, only the after values for any changed columns are sent.

  • Key and changed column values: Q Apply attempts to update or delete the target row by checking the key columns and the columns that changed in the update. Q Apply detects the following conflicts:

    • A row is not found in the target table

    • A row is a duplicate of a row that already exists in the target table

    • A row is updated at both servers simultaneously and the same column values changed

    If a row is updated at both servers simultaneously and the different column values changed, then there is no conflict. With this conflict rule, Q Apply merges updates that affect different columns into the same row. Because Q Apply requires the before values for changed columns for this conflict action, Q Capture sends the before values of changed columns.

  • All column values: Q Apply attempts to update or delete the target row by checking all columns that are in the target table. With this conflict rule, Q Capture sends the greatest amount of data to Q Apply for conflict checking.

    Note

    If we replicate the LOB columns, then conflicts are not detected. This is because Q replication does not replicate "before" values for LOB data types. See the Replicating large objects section for more information.

In a peer-to-peer configuration, conflict detection and resolution are managed automatically by Q Apply in a way that assures data convergence. We do not need to set anything up, as was discussed in the Peer-to-peer replication section.

The Conflict detection examples section of Appendix A walks through a couple of conflict detection examples.

 

Q replication and HADR


We can combine Q replication and the High Availability Disaster Recovery (HADR) feature of DB2 in a couple of ways. In the first way, we can use Q replication to replicate data between database DB2A and DB2B and then use HADR to protect the DB2B database, as shown in the following diagram:

There is a new Q Apply parameter in DB2 9.7 called LOADCOPY_PATH, which can be used instead of the HADR DB2_LOAD_COPY_NO_OVERRIDE registry variable when target tables are loaded by Q Apply using the DB2 LOAD utility.

In the second way, we can use HADR to provide local resilience and Q replication to provide remote resilience, as shown in the following diagram:

 

Q replication in a DPF environment


Q replication works well in a Database Partition Facility (DPF) environment, but there are a couple of design points to be aware of. Consider the sample configuration shown in the following diagram:

We have four servers called RED01, RED02, BLUE01, and BLUE02. We want to replicate from the RED side to the BLUE side. Each side will have four data partitions and a catalog partition with one DAS instance per box. The instance name is db2i001 and the database name is TP1.

The RED side is shown next. There are five database configuration files (DB CFG) and one database manager configuration file (DBM CFG). It is the "Detailed" table, which is replicated from the RED side to the BLUE side.

Note the following:

  • MQ installed on RED01

  • Replication control tables created on partition 0

  • Q Capture and Q Apply run on RED01

  • EXPLAIN tables defined on RED01

Tables with referential integrity

The first design point deals with tables, which have referential integrity. We need to ensure that all related parent and child tables are on the same partition. If we do not do this and start replication, then we will get ASN7628E errors.

Table load and insert considerations

If we want to load from the application, then the staging table should be partitioned similarly to the detailed table so that we can make use of collocation (therefore, we need the same partition group and same partition key).

If we want to insert from the application, then the staging table should be defined on partition 1 ONLY. If we are using INSERT, then we would use INSERT/SELECT/DELETE to transfer data from the staging table to the detailed table. We also need to perform simple housekeeping tasks on the staging table, for example regular online reorganizations.

An example of an INSERT/SELECT/DELETE statement is shown next:

with fred (id,name,trans_date) as
(
select id,name,trans_date from stag_tab
(delete from stag_tab where trans_date = current timestamp -10 minutes)
)
select count(*) from fred
(insert into det_tab select id,name,trans_date from fred);

Every time we run the above SQL, it will move records that are more than 10 minutes old from stag_tab to det_tab.

 

Summary


In this chapter, we reviewed the different types of replication available today, and how they are compared. We looked at the different types of Q replication, namely unidirectional/stored procedure/CCD, bidirectional, peer-to-peer, tree replication, and Event Publishing. We discussed various DB2 replication sources including XML data and compressed data and looked at filtering and transformations. Finally, we covered operating Q replication in HADR and DPF environments. Now that we have an overview of what Q replication is, we can look at the Q replication components in more detail, which is what we will do in the next chapter.

About the Author

  • Pav Kumar-Chatterjee

    Pav Kumar-Chatterjee (Eur Ing, CENG, MBCS) has been involved in DB2 support on the mainframe platform since 1991, and on midrange platforms since 2000. Before joining IBM he worked as a database administrator in the airline industry as well as various financial institutions in the UK and Europe. He has held various positions during his time at IBM, including in the Software Business Services team and the global BetaWorks organization. His current position is a DB2 technical specialist in the Software Business. He was involved with Information Integrator (the forerunner of Replication Server) since its inception, and has helped numerous customers design and implement Q replication solutions, as well as speaking about Q replication at various conferences.
    Pav Kumar-Chatterjee has co-authored the “DB2 pureXML Cookbook” (978-0-13-815047-1) published in August 2009.

    Browse publications by this author
IBM InfoSphere Replication Server and Data Event Publisher
Unlock this book and the full library for $5 a month*
Start now