Oracle Warehouse Builder 11g: Getting Started — Save 50%
Extract, Transform, and Load data to build a dynamic, operational data warehouse
In this article by Bob Griesemer, we'll create the Time Dimension using the wizard that the Warehouse Builder provides for us to simplify the task of building cubes and dimensions. We'll also look at the Data Object Editor in detail.
Let's discuss briefly what a Time dimension is, and then we'll dive right into the Warehouse Builder Design Center and create one. A Time dimension is a key part of most data warehouses. It provides the time series information to describe our data. A key feature of data warehouses is being able to analyze data from several time periods and compare results between them. The Time dimension is what provides us the means to retrieve data by time period.
Do not be confused by the use of the word Time to refer to this dimension. In this case, it does not refer to the time of day but to time in general which can span days, weeks, months, and so on. We are using it because the Warehouse Builder uses the word Time for this type of dimension to signify a time period. So when referring to a Time dimension here, we will be talking about our time period dimension that we will be using to store the date. We will give the name Date to be clear about what information it contains.
Every dimension, whether time or not, has four characteristics that have to be defined in OWB:
- Dimension Attributes
- Level Attributes
The Levels are for defining the levels where aggregations will occur, or to which data can be summed. We must have at least two levels in our Time dimension. While reporting on data from our data warehouse, users will want to see totals summed up by certain time periods such as per day, per month, or per year. These become the levels. A multidimensional implementation includes metadata to enable aggregations automatically at those levels, if we use the OLAP feature. The relational implementation can make use of those levels in queries to sum the data. The Warehouse Builder has the following Levels available for the Time dimension:
- Fiscal week
- Calendar week
- Fiscal month
- Calendar month
- Fiscal quarter
- Calendar quarter
- Fiscal year
- Calendar year
The Dimension Attributes are individual pieces of information we're going to store in the dimension that can be found at more than one level. Each level will have an ID that identifies that level, a start and an end date for the time period represented at that level, a time span that indicates the number of days in the period, and a description of the level.
Each level has Level Attributes associated with it that provide descriptive information about the value in that level. The dimension attributes found at that level and additional attributes specific to the level are included. For example, if we're talking about the Month level, we will find attributes that describe the value for the month such as the month of the year it represents, or the month in the calendar quarter. These would be numbers indicating which month of the year or which month of the quarter it is.
The Oracle Warehouse Builder Users' Guide contains a more complete list of all the attributes that are available. OWB tracks which of these attributes are applicable to which level and allows the setting of a separate description that identifies the attribute for that level. Toward the end of the chapter, when we look at the Data Object Editor, we'll see the feature provided by the Warehouse Builder to view details about objects such as dimensions and cubes.
We must also define at least one Hierarchy for our Time dimension. A hierarchy is a structure in our dimension that is composed of certain levels in order; there can be one or more hierarchies in a dimension. Calendar month, calendar quarter, and calendar year can be a hierarchy. We could view our data at each of these levels, and the next level up would simply be a summation of all the lower-level data within that period. A calendar quarter sum would be the sum of all the values in the calendar month level in that quarter, and the multidimensional implementation includes the metadata to facilitate these kinds of calculations. This is one of the strengths of a multidimensional implementation.
The good news is that the Warehouse Builder contains a wizard that will do all the work for us—create our Time dimension and define the above four characteristics—just by asking us a few questions.
|Extract, Transform, and Load data to build a dynamic, operational data warehouse|
eBook Price: £20.99
Book Price: £33.99
Creating a Time dimension with the Time Dimension Wizard
Let's start creating our Time dimension by launching Design Center if it's not already running. In the Project Explorer window, we're going to expand the Databases node under ACME_DW_PROJECT, and then our ACME data warehouse node ACME_DWH. We will right-click on the Dimensions node, and select New | Using Time Wizard... to launch the Time Dimension Wizard. The Time Dimension Wizard will walk us through a six-step process to define the characteristics of our Time dimension. The first screen will describe these steps for us, which is shown here so we can see what it will be asking us:
- The first step of the wizard will ask us for a name for our Time dimension. We're going to call it DATE_DIM. If we try to use just DATE, it will give us an error message because that is a reserved word in the Oracle Database; so it won't let us use it.
- The next step will ask us what type of storage to use for our new dimension, as shown here:
- Now this brings us to step 3, which asks us to specify the data generation information for our dimension. The Time Dimension Wizard will be automatically creating a mapping for us to populate our Time dimension and will use this information to load data into it. It asks us what year we want to start with, and then how many total years to include starting with that year. The numbers entered here will be determined by what range of dates we expect to load the data for, which will depend on how much historical data we will have available to us. We have checked with the DBAs for ACME Toys and Gizmos Company to get an idea of how many years' worth of data they have and have found out that there is data going back three years. Based on this information, we're going to set the start year to 2007 with the number of years set to three to bring us up to 2009.
The other option available to us on the data generation step is the type of Time dimension to create. It can be based on a calendar year or fiscal year. This provides us with the flexibility to define our Time dimension based on what our company actually uses for its financial year. ACME Toys and Gizmos Company operates on a calendar-year basis, so we'll leave it set at calendar.
- This step is where we choose the hierarchy and levels for our Time dimension. We have to select one of the two hierarchies. We can use the Normal Hierarchy of day, month, quarter, and year; or we can choose the Week Hierarchy, which consists of two levels only—the day and the calendar week. Notice that if we choose the Week Hierarchy, we won't be able to view data by month, quarter, or year as these levels are not available to us. This is seen in the following image:
- Let's move on to step 5 where the wizard will provide us the details about what it is going to create. An example is shown in the following image, which is what you should see if you've made all the same selections as we've moved along.
- Continuing to the last step, it will display a progress bar as it performs each step and will display text in the main window indicating the step being performed. When it completes, we click on the Next button and it takes us to the final screen—the summary screen. This screen is a display of the objects it created and is similar to the previous display in step 5 of 6 that shows the pre-create settings. At this point, these objects have been created and we press the Finish button. Now we have a fully functional Time dimension for our data warehouse.
Here we get to designate whether we want a relational physical implementation in the database or a multidimensional implementation. This is what was referred to earlier as checking a box to switch between the two. Simply select one or the other, and this is how our design will be implemented in the database with no changes by us required at all. We're going to implement our data warehouse using the pure relational option. So we're going to select ROLAP, as shown in the image above. Both the pure relational implementation and the relational OLAP option, which we discussed in the last chapter, are available by selecting the ROLAP option here. We can set a deployment configuration option that defaults to deploying data objects only. But this can be changed to deploy the OLAP metadata to the OLAP catalog also. In both cases, this will result in the generation of relational database objects in a star schema. However, if that option is selected, it will only store the OLAP metadata in the OLAP catalog in the database. We'll see where to set that option when we look at the Data Object Editor.
The levels are not available to us because a week does not roll up or aggregate to a month. Some months have four weeks while some have five, and that's not even exact weeks. The only month that has a month evenly divided by weeks is February, and that's only during non-leap years. So, we can see that weeks do not sum up nicely into months, or any higher level of time. How about year? Surely, that must sum up nicely we might say, aren't there 52 weeks in a year? Multiply 52 by 7 and we get 364 days. So, even that won't work. Thus, if we choose to model weeks as one of our levels, we get day and week and that's it.
This points out an important aspect of aggregation when deciding what our levels should be. It's very important to keep that idea of aggregation or summing in mind when choosing levels, or we will end up with data that doesn't make sense. The Time Dimension Wizard will not allow us to choose levels that don't sum up correctly because it has predefined a list of levels for us to choose from, with preset hierarchies. However, when defining any other dimension type, we'll definitely have to keep this in mind as we'll be specifying levels and hierarchies ourselves rather than choosing from the predefined ones.
We're going to select the normal hierarchy, and now we can choose which of the levels to include. It is always a good idea to include the lowest level possible in our hierarchy to provide maximum flexibility in aggregating data in this dimension. If we leave out day, then we will never be able to view our data by day, but only by month at the lowest level.
In the following image we can see the dimension attributes, levels, and hierarchies that will be created:
We can also see a couple of extra items at the bottom that we haven't discussed yet such as a sequence and a map name.
The sequence is an object that will be created to populate the ID values with unique numbers. It is created automatically for us by the wizard. This ID value is used as what is called the Surrogate Identifier for a level record. This value stands in (acts as a surrogate) for the actual unique identifier for the record. The actual identifier is called a Business Identifier . It contains one or more attributes that have been selected by us to uniquely represent a one-level record to differentiate it from another.
When we link a dimension to a cube, it will use that surrogate identifier as the key to link to, as this is easier for the database to use than a potentially multi-attribute business identifier. However, we think in terms of the business identifier. The Time dimension created by the wizard creates an attribute called CODE, which it uses as the business identifier. It is a number that is used to represent the date for the level record. We will shortly use the New Dimension Wizard to create our Product dimension, and there we'll see how to specify a business identifier explicitly.
The format of the CODE, which is created automatically in a date dimension using the Time Wizard, is documented in the OWB User's Guide in Chapter 14 on Defi ning Dimensional Objects (http://download.oracle.com/docs/cd/B28359_01/owb.111/b31278/toc.htm). The section entitled Using a Time Dimension in a Cube Mapping explains that the format is YYYYMMDD where YYYY is the year, MM the two-digit month, and DD the two-digit day of the month. We'll need this information in Chapter 7 when we actually use the DATE_DIM dimension.
The DATE_DIM_MAP map entry that we can see in the previous image is a mapping for our DATE_DIM dimension, which can be run to populate the dimension. It will be created automatically for us by the wizard.
We could use the Data Object Editor to create our Time dimension, but we would have to manually specify each attribute, level, hierarchy, and sequence to use. Then we would have to create the mapping to populate it. So we definitely saved quite a bit of time by using the wizard.
The Time Dimension Wizard does quite a bit for us. Not only does it create the Time dimension, but also creates a couple of additional objects needed to support it. Take a look at the following image, which is what our project explorer looks like after running this wizard:
Besides the dimension that it created, we now have a mapping that appears under the Mappings node. This is what we will deploy and run to actually build our Time dimension. We can also see that a sequence was created under the Sequences node. This is the sequence the dimension will use for the ID attribute that it created automatically as the surrogate identifier.
This completes our Time dimension.
|Extract, Transform, and Load data to build a dynamic, operational data warehouse|
eBook Price: £20.99
Book Price: £33.99
Using the Data Object Editor
The Data Object Editor is the manual editor interface that the Warehouse Builder provides for us to create and edit objects. We did not have to use it to create a dimension, but more advanced implementations would definitely need to make use of it; for instance, to edit the cube to change the aggregation method that we just discussed. We'll take a brief look at it here before moving on to get an idea of some of the features it provides.
We can get to the Data Object Editor from the Project Explorer by double-clicking on an object, or by highlighting an object (by selecting it with a single click), and then selecting Edit | Open Editor from the menu. Let's open the DATE_DIM dimension in the Data Object Editor and examine it as shown here:
All of the editors available to us in OWB have this same basic layout. Only the content of each section changes depending on what is being edited. We'll start in the upper right window and move around counter clockwise, discussing each window briefly.
- Canvas: Every editor has an area in which the contents are displayed graphically. This is called the Canvas. As we're in the Data Object Editor, the objects in the Canvas will be the objects that we created to hold data, which in this case are our cube and dimensions. Each object is displayed in a box with the name of the object as the title of the box and attributes of the object listed inside the box. These boxes can be moved around and resized manually to suit our tastes. There are three tabs available in the Data Object Editor Canvas: one for Relational, one for Dimensional, and one for Business Definition. They are for displaying objects of the corresponding type. As we're working with cubes and dimensions, these will be displayed on the Dimensional tab. If we were working with the underlying tables, they would have appeared on the Relational tab. The Business Definitions are for interfacing with the Oracle Discoverer Business Intelligence tool to analyze data.
- Explorer: This is roughly analogous to the Project Explorer in the main Design Center interface, but it displays a subset of the objects that is applicable to the type of editor we have open. As this is the Data Object Editor, we can see other data objects in the Explorer. The Available Objects tab shows us objects that are available to include on our Canvas, the Selected Objects tab shows the objects that are actually currently on the Canvas, and will highlight the object currently selected.
- Configuration: The configuration window displays configuration information (properties) about items on our Canvas. If nothing shows in this window, just select an object in the Canvas by clicking on it and the configuration will appear. It is here that we can change the deployment option for the object to deploy OLAP metadata if we want a relational implementation to store the OLAP metadata. With the DATE_DIM dimension selected, scroll the Configuration window down to the Identification section, which contains a setting for the deployment option and we can see that it is set to deploy data objects only. For dimensions, the options are to deploy to catalog only (the OLAP catalog), deploy data objects only, or deploy all to do both. For cubes, there is an additional option to deploy aggregations.
- Palette: The Palette contains each of the objects that can be used in the Data Object Editor. In this case, they are all data objects. The list of objects available will change as the tab is changed in the Canvas to view different types of object. We can use this to create objects on our Canvas by clicking and dragging to the Canvas. This will create a new object where clicking and dragging from the Explorer will place an already created object on the canvas.
- Bird's Eye View: This window displays a miniature version of the entire Canvas and allows us to scroll around the Canvas without using the scroll bars. We can click and drag the blue-colored box around this window to view various portions of the main canvas, which will scroll as we move the blue box. We will find that in most editors, we will quickly outgrow the available space to display everything at once and will have to scroll around to see everything. This can come in very handy for rapidly scrolling the window.
- Dimension Details: This is the window on the lower right and it contains details about the dimension we are currently editing. If nothing is displayed in the window, just click on the DATE_DIM dimension and its details will appear. Six tabs will appear, which display information for us. The names on those six tabs will change depending on the type of object we have selected. If we go ahead and click on the Sales cube, we'll see the tabs change. Back to the dimension, the tabs from left to right are as follows:
- Name: This tab displays the name of the dimension along with some other information specific to the dimension type we are looking at. In this case, it's a Time dimension created by the Time Dimension Wizard and so it displays the range of data in our Time dimension.
- Storage: Here we can see what storage option is set for our dimension object in the database, whether Relational or Multidimensional. If we wanted to switch between the two, this is where we could do it. For a relational implementation, we're able to specify a star or snowflake schema and whether we want to create composite unique keys. A composite key is one made up of more than one column to define uniqueness for a record. In most cases, it is a good idea to have this checked as it enforces uniqueness in the database for our dimension records. For a dimension, it will use the business identifiers we've specified as the key fields.
- Attributes: The attributes tab is where we can see the attributes that are designed for our dimension. It displays the attributes in a tabular form allowing us to view and/or edit them, including adding new attributes or deleting the existing ones. It is here that we can also change the description of our attributes if we wanted, or add descriptions the wizard did not add.
- Levels: This is where we view and/or edit the levels for our dimension. We are able to edit some of the information on this tab for the Time dimension created by the wizard, but not all. We can check and uncheck boxes to indicate which of the various level types we want to use and which attributes are applicable to which level, but that is it. We are not able to add or remove any levels or attributes. If we were to view one of the other dimensions we created, it would be fully editable. For those other dimensions we could also assign different names and descriptions to the attributes for each level.
- Hierarchies: This tab will let us specify hierarchy information for our dimension and will even let us create a new hierarchy. It's possible that we may have selected more levels on the previous page and now need to assign them to a hierarchy. There is also a Create Map button here that will automatically generate the mapping for us if we modify the hierarchies. This is one of the benefits of the Time dimension created by the wizard. Ordinary dimensions such as our Store and Product dimension will not have this Create Map button displayed on their Hierarchies tab.
- Data Viewer : The Data Viewer is a more advanced feature that allows us to actually view the data in an object we are editing. This is only available for an object if it has been deployed to the database and has data loaded into it. It has a query capability to retrieve data and can specify a WHERE clause to get just the data we might need to see. For relational implementations, it will not display the data for a dimension or cube; but we can use it to view the data in the underlying table.
You may have noticed by now that the attributes in our Time Dimension are not editable. They all appear as one solid background. We can scroll the window to display them and see what they are set to, but we can't change them. This is a feature of the Time dimension that was created by the wizard. It has created extra objects (as we saw earlier) to support the Time dimension like a mapping that could break if the wrong changes are made. So, it disallows changes. It is possible to modify the dimension behind the scenes to edit things, but that is a much more advanced topic.
- Cube Details: If we click on the Sales Cube, the details window changes to display the details of our cube and the title changes to Cube Details. The tabs also change slightly:
- Name: It has a name tab like the dimensions to display its name.
- Storage: It has a storage tab as per dimensions. However, we see a different option here under the Relational (ROLAP) option where we can create bitmap indexes. An index is a database feature that allows faster access to data. It is somewhat analogous to the index of a book that allows us to get to a page in the book with the information we want much faster. A bitmap-type index refers to how it is stored in the database and is generally a better option to use for data warehouse implementations (so it is checked by default). There is also a composite unique key checkbox for cubes as there was for dimensions.
For a cube, checking this box will create a unique key out of the foreign keys for the dimensions referenced by the cube. We want to check this box to ensure we can't enter duplicate data into our cube, that is, more than one cube record with the same set of dimension attributes assigned.
- Dimensions: Instead of attributes, the cube has a tab for dimensions. The dimensions referenced by a cube are basically its attributes.
- Measures: The next tab is for the measures of the cube. It is for those values that we are storing in our cube as the facts that we wish to track.
- Aggregations: Instead of hierarchies, a cube has aggregations. There are various methods of aggregation that we can select, as seen in the drop-down box, the most common of which is sum, which is the default. This is where the default aggregation method referred to earlier can be changed. There will be no aggregations in a pure relational implementation, so we will leave this tab set to the defaults and not bother changing it.
- Data Viewer: There is a tab for the data viewer to view cube data just as there is for a dimension. For pure relational implementations, it views the underlying table data.
We can see on our canvas that not only does the DATE_DIM dimension appear, but also a box for our SALES cube with a connecting line to the DATE_DIM dimension box. The Data Object Editor will display objects in context and this is showing us that the DATE_DIM object is referenced from the SALES Cube.
These are the main features of the Data Object Editor. We can use it to view the objects the wizards have created for us, edit them, or create brand new objects from scratch. We can start with an empty canvas and drag new objects from the palette, or existing objects from the explorer, and then connect them. We will see other editors very similar to this from the next chapter when we start to look at ETL and mappings.
In this article, we dove right in to creating our dimension and a cube using the Warehouse Builder Design Center. We used the Wizard available to help us out, as well as investigated the flexibility to manually create, view, and edit objects using the Data Object Editor.
If you have read this article you may be interested to view :
About the Author :
Bob Griesemer has over 27 years of software and database engineering/DBA experience in both government and industry, solving database problems, designing and loading data warehouses, developing code, leading teams of developers, and satisfying customers. He has been working in various roles involving database development and administration with the Oracle Database with every release since Version 6 of the database from 1993 to the present. He has also been performing various tasks, including data warehouse design and implementation, administration, backup and recovery, development of Perl code for web-based database access, writing Java code utilizing JDBC, migrating legacy databases to Oracle, and developing Developer/2000 Oracle Forms applications. He is currently an Oracle Database Administrator Certified Associate, and is employed by the Northrop Grumman Corporation, where he is currently a Senior Database Analyst on a large data warehouse project.