Oracle Warehouse Builder 11g: Getting Started
Extract, Transform, and Load data to build a dynamic, operational data warehouse
In this article by Bob Griesemer, we will learn how to synchronize objects, how synchronization is used in mappings, and how tables are auto-bound to dimensional objects.
We created tables, dimensions, and a cube; and new tables were automatically created for each dimension and cube. We then created mappings to map data from tables to tables, dimensions, and a cube. But what happens if a table definition is updated after we've defined it and created a mapping or mappings that include it? What if a dimensional object is changed? What happens to its underlying table? This is what we are going to discuss in this section.
One set of changes that we'll frequently find ourselves making is changes to the data we've defined for our data warehouse. We may get some new requirements that lead us to capture a new data element that we have not captured yet. We'll need to update our staging table to store it and our staging mapping to load it. Our dimension mapping(s) will need to be updated to store the new data element along with the underlying table. We could make manual edits to all the affected objects in our project, but the Warehouse Builder provides us some features to make that easier.
Changes to tables
Let's start the discussion by looking at table updates. If we have a new data element that needs to be captured, it will mean finding out where that data resides in our source system and updating the associated table definition in our module for that source system.
Updating object definitions
There are a couple of ways to update table definitions. Our choice will depend on how the table was defined in the Warehouse Builder in the first place. The two options are:
- It could be a table in a source database system, in which case the table was physically created in the source database and we just imported the table definition into the Warehouse Builder.
- It could be a table we defined in our project in the Warehouse Builder and then deployed to the target database to create it. Our staging table would be an example of this second option.
In the first case, we can re-import the source table using the procedures generally used for importing source metadata. When re-importing tables, the Warehouse Builder will do a reconciliation process to update the already imported table with any changes it detects in the source table. For the second case, we can manually edit the table definition in our project to reflect the new data element.
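Conceptually, the reconciliation a re-import performs is a comparison of the freshly read source columns against the already-imported definition. The following is a rough illustration only, not OWB internals; the column names and types are invented examples:

```python
# Hypothetical sketch of the reconciliation a re-import performs:
# compare the current source columns against the imported definition
# and report what was added, dropped, or changed.

def reconcile(imported, source):
    """imported/source: dicts mapping column name -> data type."""
    added   = {c: t for c, t in source.items() if c not in imported}
    dropped = [c for c in imported if c not in source]
    changed = {c: (imported[c], source[c])
               for c in imported if c in source and imported[c] != source[c]}
    return added, dropped, changed

# Invented example: the source system gained a STORE_AREA_SIZE column.
imported = {'STORE_NAME': 'VARCHAR2', 'STORE_NUMBER': 'VARCHAR2'}
source   = {'STORE_NAME': 'VARCHAR2', 'STORE_NUMBER': 'VARCHAR2',
            'STORE_AREA_SIZE': 'NUMBER'}
print(reconcile(imported, source))
# -> ({'STORE_AREA_SIZE': 'NUMBER'}, [], {})
```

The re-import then applies the detected additions and changes to the table definition stored in the workspace, which is why no manual editing is needed for Oracle sources.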
For the first case where the table is in a source database system, the action we choose also depends on whether that source table is in an Oracle database or a third-party database. If it is in a third-party database, re-importing will currently fail with an error, so we'll be forced to make manual edits to our metadata for that source until that bug is fixed. If the table is in an Oracle database, re-importing the table definition is not a problem: the reconciliation process will run and pick up any new data elements or changes to existing ones.
For a hands-on example here, let's turn to our new project that we created earlier while discussing snapshots. We copied our POS_TRANS_STAGE table over to this project, so let's use that table as an example of a changing table, as we defined the table structure manually in the Warehouse Builder Design Center and then deployed it to the target database to actually create it. For this example, we won't actually re-deploy it because we'll be using that second project we created. It doesn't have a valid location defined, but we can still edit the table definition and investigate how to reconcile that edit in the next section.
So, let's edit the POS_TRANS_STAGE table in the ACME_PROJ_FOR_COPYING project in the Design Center by double-clicking on it to launch it in the Data Object Editor. We'll just add a column called STORE_AREA_SIZE to the table for storing the size of the store in square feet or square meters. We'll click on the Columns tab, scroll all the way to the end, enter the name of the column, then select NUMBER for the data type, and leave the precision and scale at their default of 0 for this example.
We can validate and generate the object without having a valid location defined, so we'll do that. The validation and generation should complete successfully; and if we look at the script, we'll see the new column included.
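As a hedged illustration of what the generated script contains, the following sketch assembles a CREATE TABLE statement of the general shape OWB emits. The column list is abbreviated, and every column other than STORE_AREA_SIZE is a placeholder name, not the table's actual definition:

```python
# Illustrative only: roughly the DDL shape OWB's generator produces for the
# edited staging table. Columns other than STORE_AREA_SIZE are placeholders.
columns = [
    ('SALE_QUANTITY', 'NUMBER'),          # placeholder column
    ('SALE_DOLLAR_AMOUNT', 'NUMBER(10,2)'),  # placeholder column
    ('STORE_AREA_SIZE', 'NUMBER'),        # precision/scale of 0 => plain NUMBER
]
ddl = ('CREATE TABLE POS_TRANS_STAGE (\n'
       + ',\n'.join(f'  {name} {dtype}' for name, dtype in columns)
       + '\n)')
print(ddl)
```

Scanning the generated script for the new column name is a quick way to confirm the edit took effect before moving on.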
We now need a mapping that uses that table, which we have back in our original project. Let's use the copy-and-paste technique we used earlier to copy the STAGE_MAP mapping over to this new project. We'll open the ACME_DW_PROJECT project, answering Save to the prompt to save or revert. Then we'll right-click on the STAGE_MAP mapping entry and select Copy from the pop-up menu. Finally, we'll open the ACME_PROJ_FOR_COPYING project, right-click on the Mappings node, and select Paste from the pop-up menu.
We ordinarily won't copy an object and paste it into a whole new project just for making changes. We're only doing it here so that we can make changes without worrying about interfering with a working project.
Many operators we use in a mapping represent a corresponding workspace object. If the workspace object (for instance, a table) changes, then the operator also needs to change to be kept in sync. The process of synchronization is what accomplishes that, and it has to be invoked by us when changes are made.
Now that we have the updated table definition for the POS_TRANS_STAGE table, we have to turn our attention to any mappings that have included a table operator for the changed table because they will have to be synchronized to pick up the change. We know how to create a mapping with a table operator that represents a table in the database. These operators are bound to an actual table using a table definition like we just edited. When the underlying table definition gets updated, we have to synchronize those changes with any mappings that include that table. We now have our STAGE_MAP mapping copied over to our new project. So let's open that in the mapping editor by double-clicking on it and investigate the process of synchronizing.
We'll double-check to make sure we've opened the mapping in the correct project as we now have the same mapping name defined in two separate projects. This is perfectly acceptable and any changes we make to one won't affect the other, but we need to make doubly sure that we're in the correct project. In this case we want to be in the ACME_PROJ_FOR_COPYING project, not in the original ACME_DW_PROJECT project. Another reason is that the operators in the mapping still point back to the original object, which we're going to fix by synchronizing; and we don't want to update the wrong mapping.
When we open the mapping for the first time, it may have all the objects overlapping in the middle of the mapping. So we'll just click on the auto-layout button. If we look at the POS_TRANS_STAGE mapping operator, we can scroll down the INOUTGRP1 attribute group or maximize the operator to view all the attributes to see that the new STORE_AREA_SIZE column is not included.
To update the operator in the mapping to include the new column name, we must perform the task of synchronization, which reconciles the two and makes any changes to the operator to reflect the underlying table definition. We could just manually edit the properties for the operator to add the new column name, but that still wouldn't actually synchronize it with the actual table. Doing the synchronization will accomplish both—add the new column name and synchronize with the table. In this particular case there is another reason to synchronize, and that is we copied this mapping into the new project from another mapping where it was already synchronized with tables in that project. This synchronization information is not automatically updated when the mapping is copied.
To synchronize, we right-click on the header of the table operator in the mapping and select Synchronize... from the pop-up menu, or click on the table operator header and select Synchronize... from the main menu Edit entry, or press the F7 key. This will pop up the Synchronize dialog box as shown next:
Now we can see why it's so important to make sure we're in the correct project. The entry indicating the repository object from which it will synchronize is still set to the original POS_TRANS_STAGE table in the ACME_DW_PROJECT project, not the one we just edited in this project. If we relied on that entry alone, we might think we were in the wrong project. We need to click on the dropdown menu and select the POS_TRANS_STAGE table in our new COPY_MODULE. In fact, this new copy module is the only one available, which is good because we wouldn't want to select an object in another module; it's only set that way here because the mapping was just copied from that other project. We can also tell something is a little strange because the path listed for the POS_TRANS_STAGE table stops at ACME_DW_PROJECT and no icon is displayed for the type of object. When we select the POS_TRANS_STAGE table in our new project, we get the correct display as shown next:
This looks much better. Notice how the path includes the workspace name now to fully place it in context. It knows what kind of object it is, a table, so it can display the correct icon. Now we can proceed with deciding whether this will be inbound or outbound.
Inbound or outbound
Now that we have the correct repository object specified to synchronize with, we have to select whether this is an inbound or outbound synchronization. Inbound, the default, is the one we want: it uses the specified repository object to update the operator in our mapping to match. If we were to select outbound, it would update the workspace object with the changes we've made to the operator in the mapping.
Matching and synchronizing strategy
Having decided on inbound, we now have to decide upon a matching strategy to use. The online help goes into good detail about what each of those strategies is, but in our case, we'll want to select Match By Object Position or Match By Object Name. The Match by Object ID option uses the underlying unique ID that is created for each attribute to do the matching with, and that unique ID is not guaranteed to match between projects. It is a uniquely created ID internal to the Warehouse Builder metadata, which uniquely identifies each attribute. The unique ID it stores in the operator for each attribute is the unique ID from the original table it was synchronized with. If we use that option, it will treat all the objects as new because it is not going to get a match on any of them due to using different unique IDs for the copied table.
If we select the Replace synchronize strategy, its side effect in the mapping is that all the connections we've made to the existing attributes in the table from the aggregator will be deleted. This is because it has removed all the existing attributes and replaced them with new attributes from the new table with all the new IDs. If we had selected the Merge synchronize strategy, it would leave all the existing attributes alone. However, it would add in (or merge in) all the attributes from the new table, in effect duplicating them all in our operator, which is clearly not what we want.
Thankfully, there is a solution that will work fine, and that is either of the other two Matching Strategy selections. By selecting Match by Object Position, we'd be telling it to match the operator with the repository object position-by-position, regardless of the unique IDs. So it will not wipe out any connections we've already made as long as there is an attribute in the same corresponding position in the workspace table object. The same holds true for Match By Object Name, but this option matches objects by the name of the object and not the position or ID. We know the operator will match all the names and positions of the existing columns, and that the new column has been added to the end. Therefore, we can use either of those two strategies to match and our mapping will remain intact with the existing connections.
With these two options, the Synchronize Strategy of merge or replace does not make any difference because all the attributes of the operator in the mapping will be matched in either case. They only indicate what to do with differences. And because the only difference is a new column in the table, regardless of whether we merge in the difference or replace the difference, the net effect is the addition of the new column in the operator.
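The interaction between the matching strategy and the synchronize strategy can be modeled with a small sketch. This is purely illustrative Python, not OWB internals; the attribute names and unique IDs are invented:

```python
# Hypothetical model of the Synchronize matching strategies. An attribute is
# a (unique_id, name) pair; position is implied by list order.

def plan(operator_attrs, table_attrs, match_by, strategy):
    """Return (creates, deletes) keys for synchronizing an operator
    with a workspace table. match_by: 'id', 'name', or 'position';
    strategy: 'merge' or 'replace'."""
    if match_by == 'id':
        op_keys = {uid for uid, _ in operator_attrs}
        tb_keys = {uid for uid, _ in table_attrs}
    elif match_by == 'name':
        op_keys = {name for _, name in operator_attrs}
        tb_keys = {name for _, name in table_attrs}
    else:  # position
        op_keys = set(range(len(operator_attrs)))
        tb_keys = set(range(len(table_attrs)))
    creates = tb_keys - op_keys   # table attributes with no match in operator
    unmatched = op_keys - tb_keys  # operator attributes with no match in table
    # Replace removes unmatched operator attributes; Merge keeps them.
    deletes = unmatched if strategy == 'replace' else set()
    return creates, deletes

# The copied mapping still carries the ORIGINAL table's unique IDs (101, 102),
# while the table in this project has different IDs (201-203).
operator = [(101, 'SALE_QUANTITY'), (102, 'SALE_DOLLAR_AMOUNT')]
table    = [(201, 'SALE_QUANTITY'), (202, 'SALE_DOLLAR_AMOUNT'),
            (203, 'STORE_AREA_SIZE')]

# Match by ID + Replace: nothing matches, so every existing attribute is
# deleted and re-created, severing the connections in the mapping.
print(plan(operator, table, 'id', 'replace'))

# Match by name + Replace: only the new column is created; the existing
# attributes (and their connections) survive untouched.
print(plan(operator, table, 'name', 'replace'))
```

With a name or position match, switching the strategy to merge changes nothing here, because there are no unmatched operator attributes left either to delete or to keep.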
Viewing the synchronization plan
Based on our selection of the matching and synchronization strategy, the dialog box gives us the option to view what it is going to do before we do it just to be sure we have made the proper selections. We can click on the View Synchronization Plan button to launch a dialog box, which will show us what it is going to do. It is nice because we can view the plan without having it actually do anything. So let's select Match By Object ID for the matching strategy and Replace for synchronization strategy, and click on the View Synchronization Plan button. This will launch the Synchronization Plan dialog box as shown next:
The source is the POS_TRANS_STAGE table definition in the workspace and the target is the table operator in the mapping for that table. When matching by object ID, nothing is going to match because the object IDs for the new table are all different from the original table. The Replace option says to replace all differences with the source definitions, so we'll see deletes for all the attributes in the target and creates for all the attributes from the source. That is why all the connections would disappear from our mapping if we used this option.
Let's try the Merge synchronize strategy option with the object ID match by changing that drop-down in the dialog box to Merge and clicking on the Refresh Plan button. Remember that no actual changes are being made here; it is only telling us what it would do if we made those selections in the main dialog box and clicked on OK there. This option will display the following:
Here we can see that it is creating new entries in the target for all the table entries as it didn't find any matches. It is leaving all the existing target attributes alone, thus merging in the differences. This is clearly not what we want because, as indicated previously, it will add in duplicates of every attribute.
If we select either Match By Object Position or Match By Object Name and refresh the plan, we'll see that it lists one action of Create for the new column. There may be updates for existing columns that match, but there should be no other creates or deletes showing. This is what we want, so we'll click on the OK button to close the Synchronization Plan dialog box. Back in the main Synchronization dialog box, we'll select Match By Object Name as the matching strategy and Replace as the synchronization strategy.
If this had been an actual update that we had source data for, we would make sure the source data table definition in the workspace was updated to reflect the new data element. As this is related to the store, in our case it would probably have been the STORE source table in the ACME_POS module under Databases | Non-Oracle | ODBC. We would then perform the same synchronization operation we just performed on the STORES table operator. We would then have to map that new value through the joiner and to the new column in the POS_TRANS_STAGE table.
We will move along now and discuss one final change and the feature the Warehouse Builder provides for handling it. It is the changing of dimensions and their underlying tables, and keeping them properly bound.
Changes to dimensional objects and auto-binding
When we created dimensional objects (dimensional in this context referring to both dimensions and cubes), we saw how the Warehouse Builder automatically created a matching relational table for us. This table holds the data for the object because we had selected ROLAP for the storage implementation. The table was then bound to the dimension, with the dimension attributes and levels bound to columns in the table.
If we carried our previous example one step further, we'd need to add an attribute to our STORE dimension to hold the new value for the size of the store. This would mean we would have to make sure the STORE dimension and the STORE table stayed properly synchronized. This is not quite the same concept as we just discussed. We are now talking about two data objects, and not a data object and an operator in a mapping. That is why the Warehouse Builder generally refers to this as binding instead of synchronizing.
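Conceptually, auto-binding derives one relational column per dimension attribute, which is what keeps the dimension and its table structurally in step. The following is an illustrative Python sketch only; the attribute list is a simplified stand-in for the real STORE dimension:

```python
# Illustrative sketch of auto-binding: each dimension attribute gets a table
# column of the same name and type, recorded as the binding.

def auto_bind(dimension_attrs):
    """Map each dimension attribute to a bound table column."""
    return [{'column': a['name'], 'type': a['type'], 'bound_to': a['name']}
            for a in dimension_attrs]

# Simplified stand-in for the STORE dimension's attributes, including the
# new AREA_SIZE attribute we are about to add.
store_dim = [{'name': 'ID', 'type': 'NUMBER'},
             {'name': 'STORE_NAME', 'type': 'VARCHAR2'},
             {'name': 'AREA_SIZE', 'type': 'NUMBER'}]
for col in auto_bind(store_dim):
    print(col)
```

Running Auto Bind again after editing the dimension simply re-derives this column list, which is why the table picks up the new column without an UnBind.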
Let's begin by setting up our ACME_PROJ_FOR_COPYING project with copies of the STORE dimension and STORE table for trying this out. We're going to copy the STORE dimension over to the new project and leave the STORE table behind because, as we'll see in a moment, that is going to get automatically generated.
Now that we have the dimension copied over, there is a bit of housekeeping we need to do first. As with the mapping having a reference back to the table in the original project, our STORE dimension will still be bound back to the STORE table in the ACME_DW_PROJECT, and we need to fix that first before continuing with our example. This will be good for us to get more practice working with objects in the Data Object Editor. So let's open the STORE dimension in the editor by double-clicking on it.
If we right-click in the header of the dimension in the Canvas and select Detail View..., another tab is created in the Canvas named for the dimension and it now displays both the dimension and its underlying table; however, we never copied over the table. If we click on the table, the window details pane updates to display the information about the table. It is showing us that the dimension is still bound to the original table in our main project we just copied from. This is demonstrated in the following image:
Clearly, that is not good. We need to sever the connection with that table and rebind it to a table in the current project. So let's click on the Dimensional tab, right-click on the header of the STORE dimension, and select UnBind from the pop-up menu. This will switch us right back to the STORE tab, but the STORE table will no longer be there. Now we need a table to be bound to this dimension. This is where the Auto Bind function comes in. With Auto Bind, we can have the Warehouse Builder automatically create the table for us with all the dimension attributes properly bound. To do this, we need to be back on the Dimensional tab.
There is no officially documented process or function for manually binding a dimension and table together. However, it's possible to do it via a feature called Experts, which is one of the more advanced topics we can't cover in this introductory article. There is an Expert that has been specifically developed already to do just that. It can be found on the Oracle Technology web site on the Warehouse Builder Utility Exchange at http://www.oracle.com/technology/products/warehouse/htdocs/OWBexchange.html.
In the product area box, click on the Experts button and then the Search button. The Create Dimension Expert utility includes an Expert to manually bind a table and dimension. The version says 10gR2, but it will still work in 11gR1, which we're using for this article.
This is a site Oracle maintains for Warehouse Builder tips, features, code, utilities, and so on that are not found in the official release and are not officially supported. However, much of the content (including the previously referenced Expert) is developed by the Oracle developers themselves who actually work on the Warehouse Builder. We can find a lot of good stuff on that site, so we need not fear the fact that Oracle says it's "unsupported".
Let's click on the Dimensional tab. Now right-click on the STORE dimension and select Auto Bind from the pop-up menu. This will create a new STORE table for us, automatically bind the existing dimension attributes and levels to columns in the table, and switch us back to the STORE tab showing us the details. Now if we click on the STORE table, it will show COPY_MODULE as the module and not ACME_DWH. We can now proceed with our previous example about adding a column. In this case, we'll want to add a column to the STORE dimension to save the size value. So let's go back to the Dimensional tab to do that.
We could have saved ourselves an Auto Bind step here by just editing the dimension before we did the first Auto Bind. But the intent was to re-create the situation as it would be for real if we had to edit a dimension that was already bound to a table, which is a real-world situation we'll run into quite frequently. The first Auto Bind was just to set up that scenario.
On the Attributes tab, scroll down to the end and enter a new attribute called AREA_SIZE. Change the data type to NUMBER with the precision and scale set to zero. We'll make it an attribute of the STORE level. So click on the STORE level on the Levels tab and scroll down the attributes, and check the box beside the AREA_SIZE name.
The view of the dimension and table on the STORE tab is read-only, meaning we can't make any changes there. Switching back to the Dimensional tab, we can make changes. However, the Data Object Editor can sometimes still show "Read-only" on the Dimensional tab. Simply refresh the view by right-clicking on the Canvas window background and selecting Refresh to clear the read-only setting. Sometimes it can also help to click in the canvas outside the dimension and then reselect the dimension to redisplay the attributes below. The preceding changes can then be made on the Dimensional tab of the canvas.
Let's save our work and go back to the STORE tab to check the STORE table, and we'll see that there is no AREA_SIZE column. On the Dimensional tab, we need to perform the Auto Bind again on the dimension, and that will update the table to include the new column. We do not need to do the UnBind this time because the correct table is bound; we just want it updated in place.
After the Auto Bind, the table has been updated now to include the new column. We can verify this on the STORE tab by inspecting the table.
If this were a working project we had previously deployed, we would need to deploy this updated table and the dimension to actually update the database. We would also need to perform the synchronization (which we discussed in the previous section) in any mappings that included a dimension operator for the STORE dimension, so any mapping operators that referenced the dimension would be up to date.
When making a change for real like this, make sure the deployment action for the table is set to Upgrade and not Replace if there is data in the table already, else the deployment will fail. It should default to Upgrade. However, watch out for the error that may occur when trying to do a deployment with an Upgrade status.
This completes our discussion of some additional editing features that we can use as we develop and maintain increasingly complicated data warehouses. We have touched upon just the basics about the Warehouse Builder in this article that we need to know to be able to use it to construct a data warehouse. There is a wealth of more advanced features and capabilities we did not have time to cover in this article, so we just mentioned a few along the way. But this lays the groundwork and has equipped us with the ability to build a complete working data warehouse. There are a lot of resources on the Internet to help us further our education about the tool and to provide assistance if we have questions, much of which Oracle provides directly from its web sites.
In this article, we have seen a valuable feature: the ability to keep objects synchronized with the operators in the mappings that refer to them. We have also seen how to automatically update objects that are bound together, such as dimensions and the tables used to implement them. These features will greatly assist in the task of making changes to our data warehouse, which will inevitably be needed in any data warehouse project we undertake, as nothing stays static for long and constant improvement should always be happening.
About the Author:
Bob Griesemer has over 27 years of software and database engineering/DBA experience in both government and industry, solving database problems, designing and loading data warehouses, developing code, leading teams of developers, and satisfying customers. He has been working in various roles involving database development and administration with the Oracle Database with every release since Version 6 of the database from 1993 to the present. He has also been performing various tasks, including data warehouse design and implementation, administration, backup and recovery, development of Perl code for web-based database access, writing Java code utilizing JDBC, migrating legacy databases to Oracle, and developing Developer/2000 Oracle Forms applications. He is currently an Oracle Database Administrator Certified Associate, and is employed by the Northrop Grumman Corporation, where he is currently a Senior Database Analyst on a large data warehouse project.