Oracle Warehouse Builder 11g: Getting Started — Save 50%
Extract, Transform, and Load data to build a dynamic, operational data warehouse
In this article by Bob Griesemer, we will discuss about Mapping, mainly STORE Mapping in Oracle Warehouse Builder 11g. We will build mappings with the additional features of Oracle Warehouse Builder. We will introduce the concept of transformations and operators that are available in OWB, which can be used for transforming and manipulating data between source and target. Along the way, we'll get to build a quick mapping for creating and loading a table that will be used as a lookup table.
In this article, we will begin to see the real power and flexibility the Warehouse Builder provides us for loading a data warehouse. When we complete the mappings in this article, we will have a complete collection of objects and mappings. We can deploy and run these to build and load our data warehouse.
The basic procedure to build a mapping is the same—start with adding a source and a target, and then include any operators in between needed for data flow and transformation.
Let's start this article with the STORE dimension and we'll see some new operators that are involved in transformations.Let's begin by creating a new mapping called STORE_MAP. In the Design Center, we will right-click on the Mappings node of the ACME_DW_PROJECT | Databases | Oracle | ACME_DWH database and select New.... Enter STORE_MAP for the name of the mapping and we will be presented with a blank Mapping Editor window. In this window, we will begin designing our mapping to load data into the STORE dimension.
Adding source and target operators
Once the POS_TRANS_STAGE staging table is loaded with data, this can be used to load data into our dimensions and cube. We'll now use this POS_TRANS_STAGE table as our source table. Let's drag this table onto the mapping from the Explorer window.
The target for this mapping is going to be the STORE dimension, so we'll drag this dimension from Databases | Oracle | ACME_DWH | Dimensions onto the mapping and drop it to the right of the POS_TRANS_STAGE table operator. Remember that we build our mappings from the left to the right, with source on the left and target on the right. We'll be sure to leave some space between the two because we'll be filling that in with some more operators as we proceed.
Now that we have our source and target included, let's take a moment to consider the data elements we're going to need for our target and where to get them from the source. Our target for this mapping, the STORE dimension, has the following attributes for the STORE level for which we'll need to have source data:
For the REGION level, we'll need data for the following attributes:
For the COUNTRY level, we'll need data for the following attributes:
The complete and fully expanded STORE dimension in our mapping appears like the following screenshot:
We might be tempted to include the ID fields in the above list of data elements for populating, but these are the attributes that will be filled in automatically by the Warehouse Builder. The Warehouse Builder fills them using the sequence that was automatically created for us when we built the dimension. We don't have to be concerned with connecting any source data to them.
Now that we know what we need to populate in our STORE dimension, let's turn our attention over to the POS_TRANS_STAGE dimension for the candidate data elements that we can use. In this table, we see the following data elements for populating data in our STORE dimension:
It is easy to see which of these attributes will be used to map data to attributes in the STORE level of the STORE dimension. They will map into the corresponding attributes in the dimension in the STORE group. We'll need to connect the following attributes together:
- STORE_NAME to NAME
- STORE_NUMBER to STORE_NUMBER
- STORE_ADDRESS1 to ADDRESS1
- STORE_ADDRESS2 to ADDRESS2
- STORE_CITY to CITY
- STORE_STATE to STATE
- STORE_ZIPPOSTALCODE to ZIP_POSTALCODE
- STORE_REGION to REGION_NAME
There is another attribute in our STORE dimension that we haven't accounted for yet—the COUNTY attribute. We don't have an input attribute to provide direct information about it. It is a special case that we will handle after we take care of these more straightforward attributes and will involve the lookup table that we discussed earlier in the introduction of this article.
We're not going to directly connect the attributes mentioned in the list by just dragging a line between each of them. There are some issues with the source data that we are going to have to account for in our mapping. Connecting the attributes directly like that would mean the data would be loaded into the dimension as is, but we have investigated the source data and discovered that much of the source data contains trailing blanks due to the way the transactional system stores it. Some of the fields should be made all uppercase for consistency.
Given this additional information, we'll summarize the issues with each of the fields that need to be corrected before loading into the target and then we'll see how to implement the necessary transformations in the mapping to correct them:
- STORE_NAME, STORE_NUMBER: We need to trim spaces and change these attributes to uppercase to facilitate queries as they are part of the business identifier
- STORE_ADDRESS1, ADDRESS2, CITY, STATE, and ZIP_POSTALCODE: We need to trim spaces and change the STATE attribute to uppercase
- STORE_REGION: We need to trim spaces and change this attribute to uppercase
All of these needs can be satisfied and we can have the desired effect by applying pre-existing SQL functions to the data via Transformation Operators.
|Extract, Transform, and Load data to build a dynamic, operational data warehouse|
eBook Price: $32.99
Book Price: $54.99
Adding Transformation Operators
The Transformation Operator is a generic operator that is used to represent several built-in or custom-built functions or procedures for operating on data in order to make some kind of change or transformation to it. Let's take a look at the available list of transformations. In the Design Center, we can look at a list of available transformations either custom or pre-built in the database in the Global Explorer panel under Public Transformations. There are several categories of transformations available to us as shown in the following screenshot:
We are primarily interested in the Character category because that is where we'll find functions that operate on character strings, and that can convert them to uppercase and remove whitespace. We can expand any of these lists to take a look at the names of the various transformations names available. We can also go to the online help for detailed explanations of all the functions in the Transformations heading under the ETL Design Reference section of the online help table of contents. You can access this by selecting Help | Table of Contents from the main menu of the Design Center, or by pressing the F1 key. The particular transformation names we need under the character heading are the upper() function to convert to uppercase and the trim() function to remove whitespace.
We can now move back to the Mapping Editor where we're creating our STORE_MAP mapping and begin to add the transformations, and through them connect source to target. The first data element we'll map is the store name, so let's drag a Transformation Operator onto the mapping and drop it between the POS_TRANS_STAGE table and the STORE dimension. We can find the Transformation Operator in the Palette window as shown in the following screenshot:
After dropping the Transformation Operator on the mapping, it will pop up a dialog box where we select the transformation we want to use. We have two options in this dialog box— create an unbound operator (basically, one that is not tied to an existing repository object) or select from an existing object. We'll select an existing one because we know that the function that will suit our purpose already exists. We'll scroll the window down until we see the TRIM() function as shown in the following screenshot:
Searching for a function
If we want to find the function quickly, rather than manually scrolling the window down, there's a not-so-obvious feature of this dialog box called the search capability. If we start typing the name of the function we want, it will automatically scroll down the list with each letter typed, until it settles on the one we want. For example, type a T and it highlights the very first line, TRANSFORMATIONS. Type an R next and it stays on that line because transformations start with those two letters. But type an I next and it jumps right down to the TRIM function we need. This option is much better to quickly find what we're looking for than manually scrolling the window. This option is a great help as it's so easy to scroll right by what we're looking for without realizing it. If you click anywhere on the window before typing, the search string will start the search at that point.
We'll now click on the TRIM function in the window and then on the OK button. This will display a TRIM Transformation Operator window on our mapping. It is like any other operator in that it has attributes, which are in groups depending on whether they are input, output, or both. In this case, a TRIM operator has one input attribute and one output attribute. The input attribute is the string we want to trim the whitespace from and the output attribute represents the result of applying the TRIM operator to the input string. It looks like the following screenshot:
With all of these Transformation Operators that we can select from, the attribute names will appear similar to the above character attributes named CHAR_, a return value named VALUE. We can change these if we want, but this will become tedious when large numbers of Transformation Operators are required. Leaving the Transformation Operators's attributes as they are will not affect the operation of the mapping.
We can now connect our STORE_NAME attribute in the POS_TRANS_STAGE mapping table operator to this new TRIM operator. We'll drag a line from STORE_NAME to CHAR_ in the TRIM operator. This succeeds in mapping the input for our new TRIM Transformation Operator, but now we need to map the output somewhere. We could just drag a line from the VALUE output attribute over to the STORE dimension. But we've said before that we need to apply an UPPER transformation on this value as well as a trim, so the value that ultimately gets loaded into our dimension will be in all uppercase letters.
Upper and lowercase issues
When working between an MS SQL Server Database and an Oracle Database, we will frequently find that the case of the strings we're working with becomes an issue. The Oracle Database is very case sensitive. If we store a string in the database as 'Some String', then searching for 'some string' will not get a match. It will match in SQL Server, even though the case is different. This is why, it is a good idea to store key fields that uniquely identify a record in the database in all uppercase. By doing so, we won't encounter a possible situation where two records get loaded into our data warehouse that differ by only the case of the key identifier, or we don't get a match at all because of a different case.
Now we need an UPPER transformation added for our STORE_NAME, so let's drag another Transformation Operator onto the mapping and drop it to the right of the TRIM operator. It is perfectly acceptable and very common, in fact, to have to map the output from one Transformation Operator into the input of another Transformation Operator. We will select the UPPER() function this time from the resulting pop-up window. It is close to the TRIM function in the dialog box as shown in the following screenshot:
The UPPER() function is similar to the TRIM() function in the number of arguments it takes and the value it returns—which is one in both the cases. It is different in that the UPPER() function does not specify the type of the arguments as the TRIM() function does. We can see from the previous screenshot that the type is listed as UNSPECIFIED. We know these are character functions because that is where we found them in the list. As a TRIM function removes characters (blank characters) from a string, this string must be a varchar2 string and not a string of a char type. A varchar2 string is a variable length string up to the maximum length it was defined with; so if you remove some characters from it, it just shrinks in size. However a char string is a string with a fixed length.
The database will fill up a char string with blanks up to the maximum size of the string if you store a string that is smaller than the defined size of the char string. A TRIM() will have no effect on this kind of field. An UPPER() function, on the other hand, will work on a string of any type. The Oracle Database SQL Language Reference manual (which can be found at http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions214.htm#i90176) indicates that the parameter can be any of the following: CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. When we look up the TRIM() function, we see that it can only be a VARCHAR2 for input.
Thus, the UPPER transformation is not able to know beforehand the exact type of the input and output parameters. It will know about that only when we drag an actual value to it. There is also a difference in how the operator looks on the mapping just after being dropped. We can see in the following screenshot that unlike other operators in our mapping, the attribute type indicators in this operator (that appear to the right of the attributes) show as blank boxes:
These blank boxes will fill in automatically at some point after an attribute gets mapped to it, so let's continue. We will need the output of the TRIM operator to be the input of this one, so we'll drag a line from VALUE in the TRIM operator to CHAR_ in the UPPER operator. The type indicator in the window on the mapping will eventually update automatically to reflect the type that was connected to it, which is VARCHAR2 in this case.
We can now connect to the STORE dimension as we don't have any more transformations that we need to do to the STORE_NAME, but we need to decide where in the STORE dimension to connect. This is where it could be easy to make the wrong connection because looking at the STORE dimension we see that it has three NAME attributes, all of which can be used as input, as circled in the following screenshot:
Here the key in deciding which attribute to use depends on the levels and hierarchy that can exist in a dimension, and how certain attributes can be designated as dimension attributes, which can be found at every level of the dimension. In this case, the NAME attribute is just such an attribute. The hierarchy in this case is COUNTRY, REGION, STORE; and each level of the hierarchy has a NAME associated with it. This is the name of the store, so let's make sure to drag the line to the NAME attribute in the STORE group of the STORE dimension operator, which is the bottommost operator in the above screenshot.
At this point, our mapping should look roughly similar to the following. The placement of the operators on the mapping will vary, but it should generally be similar to the POS_TRANS_STAGE mapping table on the left as input, the TRIM and UPPER operators in the middle connecting the STORE_NAME attribute from input to the NAME attribute of the STORE group in the STORE dimension as output:
To keep our mapping from becoming too cluttered, collapse the transformation windows into their icon view after making all the connections through them. The icons take up much less space on the mapping and can always be opened again by double-clicking on them if needed in the future. We can click on the down arrow in the upper-right corner of the window to collapse them into the icon view.
Next, we'll take care of the STORE_NUMBER as it is the second part of the business identifier for the store. The name and number of the store are what uniquely identify a single store in the ACME Toys and Gizmos company, and are handled similarly in our mapping. The same two transformations are needed for the STORE_NUMBER field as for STORE_NAME from the POS_TRANS_STAGE input table, but we can't reuse the existing two transformations we just dropped onto our mapping. We will need to drag two more transformations to our mapping, making one a TRIM and another an UPPER just as we did for the STORE_NAME. We'll connect them in a manner similar to how we connected the previous two transformations, but this time we'll start with the POS_TRANS_STAGE mapping table operator. We will connect the STORE_NUMBER attribute to the input of the TRIM, the output of the TRIM to the input of the new UPPER we just dropped onto the mapping, and the output of the UPPER to the STORE_NUMBER attribute of the STORE dimension. There is only one STORE_NUMBER attribute in the dimension, unlike the name, because the STORE_NUMBER is not defined as a dimension attribute; it exists only at the STORE level as a level attribute.
At this point we have our STORE_NAME and STORE_NUMBER attributes connected to the dimension, and we'll continue with the two address fields, the city, the state, and the zip/postal code field. We determined that these fields will need to have spaces trimmed, but we do not want to make them uppercase except for the state field. They are not a part of the unique business identifier for an individual store and, apart from the state field, can be any combination of characters and/or numbers, which make them less likely to be queried for. The state field contains states in the US, which are commonly expressed as two uppercase characters, and so we'll apply the UPPER transformation to it.
We will need six more Transformation Operators dropped into our mapping, with five being for TRIM operators for each of those five fields and one for an UPPER() function to use for the state field. The following attributes of the POS_TRANS_STAGE mapping table operator will provide the input for the five TRIM operators:
The output of the TRIM operators for all but the STORE_STATE attribute will be connected directly to STORE level attributes of the STORE dimension as follows:
The TRIM output for the STORE_STATE attribute will be connected to the UPPER Transformation Operator, and the output from the UPPER operator will be connected to the STATE attribute in the STORE dimension.
After making all these connections, our mapping should now look similar to the following with all the Transformation Operators collapsed into their icon views:
We're not done with this mapping yet as we still have to map the STORE_REGION attribute to the STORE level, and map both the REGION and COUNTRY levels. Before continuing, let's save our work so far with the Mapping | Save All menu entry on the Mapping Editor. We can also use the Ctrl+S key combination.
Information about both the region and country comes from two attributes in our source staging table, the STORE_REGION and STORE_COUNTRY attributes. These are character fields for the name of the region and country the store is located in. While designing our STORE dimension , a NAME and DESCRIPTION field were created for us by default. We decided to leave it that way as that is a common design technique for dimensions and avoids the error we mentioned back then about not having any updatable fields. As we don't have separate name and description fields to draw from at this point, we'll just fill the same information into both fields in the STORE dimension. The NAME field is identified as the business identifier, so we'll put the value we store there into uppercase and leave the description in whatever case the source was in.
Let's start with the region attribute. We can see in our STORE dimension that there is a REGION_NAME attribute in the STORE group (level). This attribute indicates in which region on the REGION level this store is located. Looking at the REGION level we can see that there is a COUNTRY_NAME located there, which indicates the country from the COUNTRY level where the region is located. In terms of our mapping, this determines where we map the STORE_REGION and STORE_COUNTRY attributes to.
The first mapping change we'll do for the region is to finish up the STORE level attributes by mapping the STORE_REGION from the stage table to the REGION_NAME attribute in the STORE dimension, STORE level. We indicated earlier that names should be capitalized and spaces trimmed, so we'll drag two more Transformation Operators into our mapping—TRIM and UPPER—and map the STORE_REGION to the TRIM, the TRIM to the UPPER, and the UPPER to the REGION_NAME field.
This completes the STORE level except for the COUNTY attribute, and we still have this attribute plus the REGION and COUNTRY levels to complete. At this point, we've become more proficient in doing our mapping and including transformations. So we'll just continue to the REGION level and add the following connections and transformations without having to walk through each one in detail:
- STORE_REGION to NAME in the REGION level using TRIM and UPPER transformations
- STORE_REGION to DESCRIPTION in the REGION level using a TRIM transformation
- STORE_COUNTRY to COUNTRY_NAME in the REGION level using TRIM and UPPER transformations
- STORE_COUNTRY to NAME in the COUNTRY level using TRIM and UPPER transformations
- STORE_COUNTRY to DESCRIPTION in the COUNTRY level using a TRIM transformation
But we'll want to implement the following tip to make our mapping easier and less cluttered.
We have had a couple of instances earlier where the same input attribute needs to be mapped to more than one target attribute. We learned previously that we couldn't reuse a Transformation Operator on two different input attributes. However, we can reuse a Transformation Operator if the output goes to two different attributes. Multiple connections can be created from an output attribute in an operator, but only one input connection is allowed.
For example, we can use just one TRIM operator on the REGION_NAME and have its output go to an UPPER operator and also directly to the DESCRIPTION attribute in the REGION group of the STORE dimension. The output of the UPPER operator can then be connected to both the REGION_NAME of the STORE level and the NAME attribute of the REGION level. The same technique can be applied to the mappings for the COUNTRY_NAME.
The bottom line is to reuse the TRIM and UPPER operators just added for the NAME in the REGION level, and add one TRIM and one UPPER operator for the COUNTRY_NAME.
After adding the two additional transformations and making the connections already mentioned, our mapping should now look similar to the following screenshot:
Our STORE dimension is now mapped for every attribute except for the COUNTY attribute. We've saved this one for last because it is the most complex of our attributes to map for this dimension. The reason is that we don't have an exact match with an attribute from our input staging table to use. Let's save our work at this point and then investigate further how we need to map this attribute.
Using a Key Lookup operator
Key Lookup operators, as the name implies, are used for looking up information from other sources based on some key attribute(s) in a mapping. This is exactly what we will need to do to get the information for the COUNTY attribute of our STORE dimension. However, only tables, views, dimensions, and cubes can be used as the source for this operator. This means we need a table that can be used to look up the required county information. Importing the source metadata for a flat file called counties.csv, creates a file in our ACME_FILES module in Design Center for this file that contains the names of counties. It looks like we ought to be able to use the information in that file to build a lookup table, so that's exactly what we're going to do right now. We will use a simple Warehouse Builder mapping to do it in a couple of easy steps. First, we will need to create an external table to represent the counties.csv file. We could use the counties.csv file directly, but that would require using the SQL*Loader utility, which would not be consistent with the PL/SQL access that can be used for all the other sources. So we will create an external table using the simple steps outlined in the next section, and then follow that by using that external table as the source in a new mapping to load a lookup table.
Creating an external table
Metadata was imported from the counties.csv file which was created in a module separate from our main database module because a file is not a part of the database. However, external tables are created in the database as they are accessed just like regular database tables. However, unlike a regular table whose data is stored in the database, an external table's data is stored in a flat file that is external to the database.
External tables are created under the Oracle | ACME_DWH | External Tables node in the Design Center, so we'll right-click on it and select New... from the pop-up menu. This will launch the External Table wizard, which will guide us through the process. It is a three-step process that involves providing a name to use for the external table, specifying the file to use, and specifying the default location. The steps are as follows:
- By clicking on Next on the Welcome screen, we come to the screen labeled Step 1. We'll name this external table COUNTIES and click on Next to continue to the screen labeled Step 2.
- In this step we'll select the file that contains the metadata for the external table. It will display the name of any files that have been defined in our Files module. We can see our COUNTIES_CSV file listed, so we'll select that and click on Next to continue.
- This brings us to the screen labeled Step 3 where we will select the default location to use for this table. The drop-down menu on this screen will display the file locations that have been defined in the Design Center. We will select the ACME_FILES_LOCATION entry, which is for the files that exist for this project. Clicking on Next will bring us to the Summary screen where we can verify the information we just specified. It should look similar to the following screenshot:
- When we click on the Finish button, it will create a new entry called COUNTIES under the External Tables node in our project in the Design Center.
If we see anything we'd like to change, we can click on the Back button to move back through the screens to make any changes and click on Next until we get back here .
The wizard has created an external table with the column attributes that were listed in the Summary screen. These attributes correspond to the fields that are stored in the counties.csv flat file. We can query this table just as we query a table in the database
Creating and loading a lookup table
Now that we have our source table defined for our new lookup table, let's create a new mapping called COUNTIES_LOOKUP_MAP using the same method we've used previously. The steps to create a lookup table are:
- Right-click on the Mappings node, select New..., enter COUNTIES_LOOKUP_MAP in the name field, and click on the OK button.
- In the Mapping Editor that pops up, let's drag an External Table Operator from the Palette window onto the mapping.
- On the Add External Table Operator pop-up window that appears, our COUNTIES external table is visible. We will select that and click on the OK button to continue. This will drop an External Table Operator on our mapping that is bound to our COUNTIES external table.
- We need to get that data loaded into a regular table in the database, so next we'll drag a Table Operator onto the mapping.
- In the resulting Add Table Operator pop up that appears, we specify what table we want to add. We've seen this add operator dialog box before, but we've always been choosing an existing object to add. This time we're going to check the first option to Create unbound operator with no attributes and we'll give it the name COUNTIES_LOOKUP by typing that name into the box. This is shown in the following screenshot:
- We'll click on OK and it will drop a Table Operator onto our mapping with no attributes defined in it.
As this table doesn't yet exist, there are a few different ways we can go about creating it to hold our county information. We could create the table in the Design Center in the Tables node under our database module, and then drag that table into our mapping. Alternatively, we could create the table in the database and then import the metadata for that table as we import source metadata, or we can take the path we're taking now to make full use of the Warehouse Builder's automation and flexibility and create the table as we need it.
We need to define the attributes and we know we need the data loaded from the external table, so let's use these attributes in our example . We might think we have to enter each of these attributes individually into the Table Operator and then drag a line from the corresponding attribute in the external table over to the new table. But the Warehouse Builder makes this very easy; with one drag we can map an attribute group instead of individual attributes.
Let's drag a line from the output group (OUTGRP1) of our COUNTIES external table over to the input/output group (INOUTGRP1) of our new COUNTIES_LOOKUP table. With that one action, the new table operator immediately goes from being empty to having three attributes defined in it. These attributes have names that are the same as the external table attribute names and connecting lines are drawn for all three attributes to map them from the external table. This is very neat, and it just saved us a bunch of time.
This mapping is done. However, there is one more step we need to take to actually create the lookup table definition. Remember we created our table operator as an unbound operator, which means it's not associated with any database object. If we look in the ACME_DWH | Tables node, there is no table named COUNTIES_LOOKUP. The steps to create a new table object and to bind this operator to it are as follows:
When we right-click on the unbounded operator, the pop-up menu has a menu selection called Create and Bind.... With this option we will create a new table object in the OWB Tables node and bind this operator to it.
Let's select that menu entry from the pop-up menu and it will present us with the following dialog box:
- The name is the same as what we gave to the operator. We could name the underlying bound table something different, but it's best to leave it with the same name for clarity.
- The Create in: text field is to specify the module in which to create the new table under our project in the Design Center. It has defaulted to the Tables node under the current ACME_DWH module, and that is exactly where we want it. The drop-down option provides a listing of every Tables node in all the modules that are currently defined in our current project if we want to create it in one of those other modules.
- When we click on the OK button on this dialog box, a table is created in the Tables node and is bound to the operator.
To verify that, we can navigate to the ACME_DWH | Tables node under our database module and there is the new COUNTIES_LOOKUP table now. This completes the mapping and table creation. Our new table is now ready to include in a mapping as a Key Lookup operator.
To ensure that we don't end up with duplicate records in our new lookup table, we can take an extra step to define a primary key on this table.
When we use the option to create a table in this manner, it creates a basic, no-frills table with no constraints defined on it. To add a primary key, we'll perform the following steps:
- In the Design Center, open the COUNTIES_LOOKUP table in the Data Object Editor by double-clicking on it under the Tables node.
- Click on the Constraints tab.
- Click on the Add Constraint button.
- Type PK_COUNTIES_LOOKUP (or any other naming convention we might choose) in the Name column.
- In the Type column, click on the drop-down menu and select Primary Key
- Click on the Local Columns column, and then click on the Add Local Column button.
- Click on the drop-down menu that appears and select the ID column.
- Close the Data Object Editor.
The new table and mapping is now complete. It is very basic, but gives us an idea of the power of the Warehouse Builder to make our data warehouse design job easier. The mapping just handles inserts into the lookup table from the external table. We could add more bells and whistles to our lookup to handle updates or changes to the existing rows, but that is for more advanced topics.
We'll save our work up to this point with the Ctrl+S key combination, and then move on to make use of this new lookup table to retrieve the county information.
|Extract, Transform, and Load data to build a dynamic, operational data warehouse|
eBook Price: $32.99
Book Price: $54.99
Retrieving the key to use for a Lookup Operator
We now have a table definition created and a mapping completed that can load the table to use to look up the county name. But we need a key that will uniquely identify a record in the table and with which we can look up a county. The key has to be a data element that is unique in the file, and it would be the ID column we defined as the primary key for the table. It is a number that does not repeat itself for any of the rows in the file; so given a particular value of that number, we can find the county and the state that the county is in.
The STORE_NUMBER data element contained in the STORES source table has a code that indicates the county the store is located in for stores in the USA. This is actually a fixed known format, and the positions three through six of the number are actually the code for the location of the store in the county. This number is actually the ID number found in the counties.csv flat file, and which is the ID in the lookup table. So, we now have a key value that we can use to look up the county. However, there are still some more issues we have to work out before we can use it.
The county ID is only a portion of the entire STORE_NUMBER field, so we can't just use the STORE_NUMBER from input as the direct key to a Key Lookup Operator. We will have to extract the ID number out of it and then convert it to a number before we can use it to look up the county. This implies that some more transformations will be needed, so let's work on getting that county ID extracted from the STORE_NUMBER field.
Adding a SUBSTR Transformation Operator
The Transformation Operators available to us in OWB include a SUBSTR (or substring) transformation that will do exactly what we need to extract the county ID value out of the STORE_NUMBER field. The SUBSTR transformation takes three parameters—the string we want to extract the substring from, a number indicating the start position of the substring within the string, and a number indicating the length of the substring to extract.
So, let's drag a Transformation Operator onto the STORE_MAP mapping between the POS_TRANS_STAGE table and the STORE dimension below all the other Transformation Operators. On the resulting Add Transformation Operator pop-up window, select the SUBSTR() transformation and it will place the following operator into our mapping:
For the SUBSTR operator, we need to make sure we select the correct version as there are five different variants of SUBSTR we could choose from. They are SUBSTR, SUBSTR2, SUBSTR4, SUBSTRB, and SUBSTRC. The main SUBSTR version is the one we want because it works on regular character strings. The others only vary in the type of input character string they operate on. A more in-depth description of the SUBSTR() function and its variants is in the Oracle Database SQL Language Reference Manual, which is available online at the Oracle Database Documentation web site.
When first dropped on the mapping, this operator may not look exactly like the above screenshot in which the operator is fully expanded. To see the whole operator contents at once, we can click and drag an edge to manually make the window bigger or click on the symbol in the upper-right corner with the arrow pointing upwards as indicated in the following screenshot, which shows the operator before being expanded fully:
We didn't have this issue with any of the Transformation Operators we included earlier, but it's helpful here for being able to see the entire contents of the operator.
All windows on the mapping, and not just the Transformation Operators, have this feature for expanding the window size. We'll find that the table, dimension, and cube operators need to be expanded frequently to see the entire contents, and this is a quick way to do it.
Let's continue mapping attributes to the SUBSTR operator. The STRING attribute is easy, which will be the STORE_NUMBER from the POS_TRANS_STAGE table. So let's drag a line from STORE_NUMBER down to STRING. The position and length are not so obvious. We don't have any values in the source table to use for those two so we need to create something to use.
The second and the third parameter—the position and length—these need to be constant integer values that we supply. By looking at the list of operators available to us in the Palette window, we see that there is a CONSTANT operator as shown in the following image. We can use this operator by dragging it from the Palette window in the Mapping Editor:
Adding a Constant operator
We'll click and drag a Constant operator onto the mapping to the left of the SUBSTRTransformation Operator. We can see that it has an output group called OUTGRP1 by default. We'll right-click on it and select Open Details... from the pop-up menu. This opens an editor on the CONSTANT operator, which should look like the following screenshot:
The tab that is highlighted when the dialog box opens depends on what was right-clicked. As we can see, this editor has tabs for editing the Name of the operator, the Groups, and the Output Attributes of the output group. The Constant operator only allows output, so there is no input group defined or allowed. If it was an operator that allowed input, such as a function or procedure that took parameters (for example, the SUBSTR operator), there would be an additional tab for Input Attributes also.
Clicking on the Output Attributes tab we see that there are no attributes currently existing for this operator. This is where we will add our constants that we need for the SUBSTR operator. We can actually enter more than one constant in the same operator (which is a good thing to do if we are using those constants together anyway), which we are doing in this case. We could just as easily drag another Constant operator onto the mapping for the other constant we need; it's really just a matter of preference. Functionally, the result will be the same when OWB deploys and executes the mapping.
To add an attribute, click on the Add button. This will create an output attribute in the group with a default name and data type that we can then edit to suit our purposes. We'll change the name of this first constant attribute to reflect the destination for this value, that is the position attribute of the SUBSTR operator, so we'll name it POSITION also. Click on the default name, OUTPUT1, and it will highlight the name and we can then type in what we want it to be. We'll change the name to position.
We don't have to worry about capitalization as the Warehouse Builder will automatically convert everything to uppercase anyway. We'll see this when we click elsewhere on the dialog box and the focus moves out of that field, or if we close the dialog box and then look at the name in the operator on the mapping.
Next, we need to make sure the data type is correct. The position value to which we're going to map this constant in the SUBSTR operator is defined as a NUMBER with no precision or scale specified (that is, both set to zero).
We are not going to bother specifying a precision or scale for the constants we're creating because we don't need the extra data integrity checks in the database and the SUBSTR POSITION attribute is defined that way. We'll leave the precision and scale set to zeros, which is the default.
We need another constant value defined to indicate the length of the substring, so let's add another attribute on the Output Attributes tab of the CONSTANT Editor dialog box. We'll click on the Add button once more and change the name of this attribute to LENGTH to reflect its purpose. We'll leave the data type set to NUMBER, and the default precision and scale set to zero as we did for the POSITION attribute. We'll click on the OK button on the dialog box to close it.
We did not have a chance to specify the actual values of the two constant values we just entered because the Editor dialog box does not include that information. This is set via another property of the attributes, which is available to us in the Attribute Properties window on the left of the Mapping Editor. So we'll click on the first attribute in the CONSTANT operator, the POSITION, and look for the attribute property called Expression. It just so happens to be the very first attribute in the window, so let's click in the blank box to the right of Expression. In this field, we will enter the value to be set to this attribute. We'll enter 3 to indicate that we want the position of the substring to start at the third position of the source string. We can also use the Expression Builder by clicking on the button with three dots (...). But as this is a simple constant, we'll just enter 3 in the box.
Now we'll do the same thing for the LENGTH attribute and we'll enter a length of 4, which is the length of the county ID portion of the STORE_NUMBER.
The next step is to connect our constants to the corresponding attributes of our SUBSTR operator. We'll drag a line from POSITION in the CONSTANT operator to the POSITION attribute of the SUBSTR operator, and from the LENGTH attribute to the SUBSTRING_LENGTH attribute.
Adding a TO_NUMBER transformation
The SUBSTR value is ready and we can use it to look up the county ID, but there's one more transformation we need to apply before we can use it to look up the county name. First, it needs to be converted into a number to match the data type of the ID field in the COUNTIES_LOOKUP table. To do this, we will use the TO_NUMBER() function. So let's drag a Transformation Operator onto our mapping to the right of the SUBSTR operator and select TO_NUMBER from the resulting pop up.
This operator needs three parameters, only one of which is absolutely necessary—the expression we wish to convert to a number. The other two parameters are optional and include a format string that we can use if we have a particular format of number we want (such as a decimal point in a certain place) and a parameter that allows us to set a certain national language format to default to if it's different from the language set in the database. We'll just map the input expression because our number is a straight integer format number. So let's drag a line from the VALUE attribute of SUBSTR to the EXPR input attribute of the TO_NUMBER operator.
We are now ready to look up the number to find the county name. The final step we need to perform now is to actually add the Key Lookup operator that we'll use to do that, so let's continue with that task.
Adding a Key Lookup operator
After that little side trip to quickly create our lookup table and add a SUBSTR operator with a TO_NUMBER transformation to convert the result to a number, we can now add a Key Lookup operator to our mapping for looking up the county name. Let's drag a Key Lookup operator onto the mapping and drop it to the right of the TO_NUMBER operator. We can find the Key Lookup operator in the Palette window just as we did for the other operators we've added. After we drop it in the mapping, the KEY_LOOKUP Wizard is launched and it presents us with the Welcome screen. The wizard will guide us step-by-step through the process of defining our key lookup. It is composed of six steps, which we can see on the opening Welcome screen. A portion of this screen is shown here:
- After the welcome screen, the first step asks us for a name for this Key Lookup. It has a default name of KEY_LOOKUP. We'll change it to COUNTIES_LOOKUP and click on the Next button to proceed to the screen labeled Step 2.
- This screen indicates that Key Lookup operators require one input and one output group, and here we have an opportunity to rename the groups if we desire. We'll leave them with their default names INGRP1 and OUTGRP1, and click on Next to continue.
- The next screen labeled Step 3 asks us to select one or more operators to map into the input group of the Key Lookup operator. This is where we will specify that the output (or return value) of the TO_NUMBER operator should be the value to use as input. We will choose the output attribute from that operator and it will create an input operator in the Key Lookup to match it. We'll look for the TO_NUMBER operator in the left window. We will scroll it down if it's not visible and expand it by clicking on the plus sign, and then we will expand the RETURN entry by clicking on the plus sign. We'll click on the VALUE attribute of the TO_NUMBER operator to select it and will click on the right arrow (>) to assign it to the INGRP1 of the COUNTIES_LOOKUP Key Lookup operator as shown in the following screenshot:
- Now we've indicated that we want to use the output value from our TO_NUMBER operator to be the input value (or key) for the lookup operator. We'll click on Next and in the screen labeled Step 4 we will specify in which object to look up the value.
- The object can be a table, view, dimension, or cube object. In the screen labeled Step 4, we will select our new lookup table. So clicking on the drop-down menu at the top, we expand the ACME_DWH entry and see that it lists all the available tables, views, dimensions, and cubes in our project. We'll select the COUNTIES_LOOKUP table.
- We'll then click in the first row of the Lookup Table Key column. In the resulting drop-down menu that appears (which may take a moment or two to appear, so we'll be patient), we'll select the primary key we defined on the table. We could also have selected an individual column if we did not have a primary key on the table.
- Having selected the primary key, we now have to specify an input attribute.We'll click in that box and see that it has added a row beneath with the ID, which shows as the column to use for the Lookup Table Key. Now we need to select the Input Attribute in the row in order to select the column from input that we want to use to match to this key column. This would be the VALUE attribute from our TO_NUMBER operator that we used in step 3,so we'll select that from the resulting drop-down menu.
It may seem redundant to make this selection here, but there could be more than one attribute used in a lookup. Therefore, we have to go through this step to indicate which input attribute matches with which lookup table key. In this case, we happen to have only a single attribute to use for the lookup.
Now our dialog box should look similar to the following:
- We will click on the Next button to proceed to the final step where we will specify what to return if no record is found in the lookup table. Here we are only concerned with the COUNTY_NAME column as that is the value we need to map to the SALES cube. We'll specify a default value of UNKNOWN rather than just leave it NULL. So we'll click on NULL that currently appears as the default for the COUNTY_NAME value and type in 'UNKNOWN' in the box.
We have to make sure we include the single quotes around this string because it is a character string and the Oracle Database requires single quotes around character literals.
It also has an editor available to give us more power over the expression we might want to use to determine the value. But in our case, we only want a single string to be used, so we can just type it in.
- We will click on the Next button to proceed to the Summary screen. It should look similar to the following screenshot:
- We will click on the Finish button and the wizard ends and drops a Key Lookup operator on our mapping with a connection line already drawn from the output attribute of the TO_NUMBER operator.
- Now connect the COUNTY_NAME field from this Key Lookup operator to the COUNTY attribute in the STORE level of the STORE dimension and we are done with this mapping.
Now we have a completed mapping that will populate our STORE dimension. Our final mapping should look similar to the following screenshot:
Of course, there are an almost infinite number of ways we could have organized our mapping. The mapping we just saw was somewhat compressed to better fit the available size for the image, so we won't focus on making the mapping look exactly like that. The important thing is that all the connections are made as they are shown in the mapping and not where each individual operator appears on the mapping.
Having completed our STORE mapping, we'll save our work with the Ctrl+S key combination.
We have discussed STORE Mapping in Oracle Warehouse Builder.
If you have read this article you may be interested to view :
About the Author :
Bob Griesemer has over 27 years of software and database engineering/DBA experience in both government and industry, solving database problems, designing and loading data warehouses, developing code, leading teams of developers, and satisfying customers. He has been working in various roles involving database development and administration with the Oracle Database with every release since Version 6 of the database from 1993 to the present. He has also been performing various tasks, including data warehouse design and implementation, administration, backup and recovery, development of Perl code for web-based database access, writing Java code utilizing JDBC, migrating legacy databases to Oracle, and developing Developer/2000 Oracle Forms applications. He is currently an Oracle Database Administrator Certified Associate, and is employed by the Northrop Grumman Corporation, where he is currently a Senior Database Analyst on a large data warehouse project.