Chapter 9. Performing Advanced Operations with Databases
In this chapter you will learn about advanced operations with databases. The first part of the chapter includes:
The second part of the chapter is fully devoted to datawarehouse-related concepts. The list of the topics that will be covered includes:
Preparing the environment
In order to learn the concepts of this chapter, a database with little or no data is useless. Therefore, the first thing you'll do is populating your Jigsaw puzzle database.
Time for action – populating the Jigsaw database
To load data massively into your Jigsaw database, you must have the Jigsaw database created and the MySQL server running. You already know how to do this. If not, please refer to Chapter 1 for the installation of MySQL and Chapter 8 for the creation of the Jigsaw database.
Note
This tutorial will overwrite all your data in the js
database. If you don't want to overwrite the data in your js
database, you could simply create a new database with a different name and run the js.sql
script to create the tables in your new database.
After checking that everything is in order, follow these instructions:
From Packt's website download the js_data.sql
script file.
Launch the MySQL query browser.
From the File menu select Open Script....
Locate the downloaded file and open it.
At the beginning of the script file you will see this line:
If you created a new database, replace the name js
by the name of your new database.
Click on the Execute button.
At the...
Looking up data in a database
You already know how to create, update, and delete data from a database. It's now time to learn to look up data. Lookup is the act of searching for information in a database. You can look up a column of a single table or you can do more complex lookups. Let's begin with the simplest way of looking up.
Sometimes you need to get information from a database table based on the data you have in your main stream. Let's see how you can do it.
Time for action – using a Database lookup step to create a list of products to buy
Suppose you have an online system for your customers to order products. On a daily basis, the system creates a file with the orders information. Now you will check if you have stock for the ordered products and make a list of the products you'll have to buy.
Create a new transformation.
From the Input category of steps, drag a Get data from XML step to the canvas.
Use it to read the orders.xml
file. In the Content tab, fill the Loop XPath option with the /orders/order
string. In the Fields tab get the fields.
Do a preview. You will see the following:
Note
To keep this exercise simple, the file contains a single product by order.
Add a Sort rows step and use it to sort the data by man_code
, prod_code
.
Add a Group by step and double-click it.
Use the upper grid for grouping by man_code
and prod_code
.
Use the lower grid for adding a field with the number of orders in each group. As Name write quantity
, as Subject ordernumber...
Time for action – using a Database join step to create a list of
suggested products to buy
If your customers ordered a product that is out of stock and you don't want to let them down, you will suggest them some alternative puzzles to buy.
Open the transformation of the previous tutorial and save it under a new name.
Delete the Text file output step.
Double-click the Group by step and add an aggregated field named customers
with the list of customers separated by (,
). Under Subject, select idcus
and as Type, select Concatenate strings separated by
,
.
Double-click the Database lookup step. In the Values to return from the lookup table grid, add pro_theme
as value in the String
field.
Add a Select values step. Use it to select the fields customers
, quantity
, pro_theme
, and pro_name
. Also rename quantity
as quantity_param
and pro_theme
as theme_param
. From the Lookup category, drag a Database join step to the canvas. Create a hop from the Select values step to this step.
Double-click the Database...
Introducing dimensional modeling
So far you have dealt with the Jigsaw puzzles database, a database used for daily operational work. In the real-world, a database like this is maintained by an On-Line Transaction Processing (OLTP) system. The users of an OLTP system perform operational tasks—sell products, process orders, control stock, and so on.
As a counterpart, a datawarehouse is a nonoperational database; it is a specialized database designed for decision support purposes. Users of a datawarehouse analyze the data, and they do it from different points of view.
The most used technique for delivering data to datawarehouse users is dimensional modeling. This technique makes databases simple and understandable.
The primary table in a dimensional model is the fact table. A fact table
stores numerical measurements of the business such as quantity of products sold, amount represented by the sold products, discounts, taxes, number of invoices, number of claims, and anything that can be measured...
Loading dimensions with data
A dimension is an entity that describes your business—customers and products are examples of dimensions. A very special dimension is the time dimension that you already know. A dimension table (no surprises here) is a table that contains information about a dimension. In this section you will learn to load dimension tables, that is, fill dimension tables with data.
Time for action – loading a region dimension with a
Combination lookup/update step
In this tutorial you will load a dimension that stores geographical information.
Launch Spoon.
Create a new transformation.
Drag a Table input step to the canvas and double-click it.
As connection select js
.
In the SQL area type the following query:
Click on OK.
Expand the Data Warehouse category of steps.
Select the Combination lookup/update step and drag it to the canvas.
Create a hop from the Table input step to this new step.
Double-click the Combination lookup/update step.
As Connection select dw
.
As Target table browse and select lk_regions
or simply type it.
Enter id
as Technical key field and lastupdate
as Date of last update field.
Click OK.
After the Combination lookup/update step, add an Update step.
Double-click the Update step.
Select dw
as Connection and lk_regions
as Target table.
Fill the upper grid adding the...
Time for action – keeping a history of product changes with the Dimension lookup/update step
Let's load a puzzles dimension along with the history of the changes in puzzle attributes:
Create a new transformation.
Drag a Table input step to the work area and double-click it.
Select js
as Connection.
Type the following query in the SQL area:
Click on OK.
Add an Add constants step, and create a hop from the Table input, step toward it.
Use the step to add a Date
field named changedate
. As Format type dd/MM/yyyy
, and as Value, type 01/10/2009
.
Expand the Data Warehouse category of steps.
Select the Dimension lookup/update step and drag it to the canvas.
Create a hop from the Add constants step to this new step.
Double-click the Dimension lookup/update step.
As Connection select dw
.
As Target table type lk_puzzles
.
Fill the Key fields as shown:
Select id
as Technical key field.
In the frame Creation...
In this chapter you learned to perform some advanced operations on databases.
First, you populated the Jigsaw database in order to have data for the activities in the chapter. Then, you learned to do simple and complex searches in a database.
Then you were introduced to dimensional concepts and learned what dimensions are and how to load them with PDI. You learned about Type I, Type II, Type III SCDs and mini-dimensions. You still have to learn when and how to use those dimensions. You will do so in Chapter 12.
The steps you learned in this and the preceding chapter are far from being the full list of steps that PDI offers to work with databases. However, taking into account all you learned, you are now ready to use PDI for implementing most of your database requirements. In the next chapter, you will switch to a totally different yet core subject needed to work with PDI—jobs.