Visual ETL Development With IBM DataStage

Exclusive offer: get 50% off this eBook here
Oracle SOA Suite 11g R1 Developer's Guide

Oracle SOA Suite 11g R1 Developer's Guide — Save 50%

Develop Service-Oriented Architecture Solutions with Oracle SOA Suite with this Oracle book and eBook

$35.99    $18.00
by Djoni Darmawikarta | May 2008 | Architecture & Analysis

ETL (Extract Transform Load) is the most resource consuming part of data warehouse development and maintenance. An ETL tool, particularly one that is GUI-based, can leverage the productivity and quality of ETL development and maintenance.

WebSphere DataStage from IBM is an ETL tool. In DataStage you build and execute ETL jobs visually on its GUI clients. This article by Djoni Darmawikarta shows how to build and execute an ETL job with DataStage. Specifically, we’ll build a job that loads a Customer Dimension table from an input sequential file (implementing an SCD1, Slowly Changing Dimension Type1; a well-known technique from Dimensional Data Warehouse methodology).

You use Designer, one of the DataStage clients, for ETL jobs development. When you start Designer, in addition to entering your host system and credential, you must also specify the project you’d like to work on.

Visual ETL Development With IBM DataStage

A DataStage project stores jobs and define their environment, such as their security and execution resources. Your project, as well as your user account, is typically created by your DataStage administrator.

To develop a new job, on the icon toolbar, click the drop-down arrow to the right of the New icon (I circle in red).

Visual ETL Development With IBM DataStage

We’ll develop a parallel job, so click the Parallel Job from the drop-down list.

Visual ETL Development With IBM DataStage

In DataStage, a parallel job can execute in parallel mode (in simple terms, this means the job does not process its data records sequentially, one record after another).

We can now start to design our job in the job canvas. For now, our parallel job name is Untitled 1, as shown on the canvas title.

Visual ETL Development With IBM DataStage

Our job extracts its input data from a customer sequential file, transforms the data, and then loads it into an Oracle table. The customer file contains new customers and name changes of the existing customers. The customer file has more data (more columns) than we need; we’ll extract only those we need.

In DataStage, you build a job by selecting and linking stages on the job canvas. So, on the Palette, click File, look for Sequential File stage by clicking the up or down arrow.

Visual ETL Development With IBM DataStage

Drag and drop our first stage, the Sequential File stage, from the Palette to the left side of the canvas.

We’d like to layout our stages left to right to visualize them processing data in that direction, so we start placing the input stage on the left of the canvas (we can move stages around the canvas, though, by dragging them).

Visual ETL Development With IBM DataStage

Rename the Sequential File stage to Customer by selecting (clicking) and clicking it again, and typing over the default name (Sequential_File_1).

Visual ETL Development With IBM DataStage

Next, add Transformer stage (under Processing palette) and Oracle Enterprise stage (under Database palette) that we need for our job. Rename them as shown.

We use the Transformer stage to change (transform) customer names to uppercase.

Visual ETL Development With IBM DataStage

Now that we have the stages we need for our job, we’ll link them together. Right click the Customer stage, hold and drag & drop on the Uppercase stage, and then, from Uppercase stage to the Customer_Dim stage.

Visual ETL Development With IBM DataStage

You can rename the links just like the way you rename stages.

Visual ETL Development With IBM DataStage

Just in case something happens to our system, we’d like to save our works so far.

Visual ETL Development With IBM DataStage

Name the job Load_Customer_Dim and you can choose the folder (called Category in DataStage) of your choice where you want to put the job.

Visual ETL Development With IBM DataStage

Notice the canvas title now reflects the name of our parallel job (changes from its default name given when we’re creating it).

Visual ETL Development With IBM DataStage

The Customer stage has a warning sign (the yellow! sign), which indicates that it has an error. This is because we haven’t specified its parameters. For example, the file it refers to. So, let’s fix it.

Double-click the Customer stage. On the Properties tab select the File property, and type in the file name including its full path. The sequential file must be accessible by the DataStage server.

Visual ETL Development With IBM DataStage

Next, we need to specify the format of our file. For example, set the Record level’s Final delimiter and Field default’s Delimiter as shown.

Visual ETL Development With IBM DataStage

We must also specify the record layout, i.e. its fields (columns).

Visual ETL Development With IBM DataStage

To test that the stage will be able to read the data as specified, click the View Data button (upper right hand corner).

You can specify the number of rows (records) you’d like to view.

Visual ETL Development With IBM DataStage

When you click OK and you have set the correct specification for the file, you’d see the data.

Visual ETL Development With IBM DataStage

When you finish viewing the data, close the view window as well as the property window. We’re done with the Customer stage.

Oracle SOA Suite 11g R1 Developer's Guide Develop Service-Oriented Architecture Solutions with Oracle SOA Suite with this Oracle book and eBook
Published: July 2010
eBook Price: $35.99
Book Price: $79.99
See more
Select your format and quantity:

We’ll now fix the specification of the Customer_Dim stage. What we want here is to insert a new customer and apply the customer name changes to the existing customer record on the CUSTOMER_DIM table. So, put down the specification in the sequence of the screenshots shown.

Visual ETL Development With IBM DataStage

Visual ETL Development With IBM DataStage

Visual ETL Development With IBM DataStage

Visual ETL Development With IBM DataStage

Visual ETL Development With IBM DataStage

Note that we use Oracle’s sequence to generate the CUSTOMER_ID; the value of this column doesn’t come from the input.

When you close the property window, back to the main Designer’s window all warnings on the stages should have disappeared.

Visual ETL Development With IBM DataStage

Though the Uppercase stage doesn’t have any warning sign, we haven’t actually specified what transformation we’d like this stage to do. We need the first two columns only. So, on the mapping window select the two columns on the input link, then drag and drop onto the load link.

Visual ETL Development With IBM DataStage

Visual ETL Development With IBM DataStage

We’d like to just pass the CUSTOMER_NO data, but we’d like to convert (transform) CUSTOMER_NAME data into uppercase. So, double-click the input.CUSTOMER_NAME on the Derivation section of the load link. Erase the content of the derivation editor (input.CUSTOMER_NAME), and then click the ... button (ellipsis).

Visual ETL Development With IBM DataStage

Select Function.

Visual ETL Development With IBM DataStage

Expand String and select UpCase.

Visual ETL Development With IBM DataStage

Visual ETL Development With IBM DataStage

Click the ellipsis button again, and then select Input Column.

Visual ETL Development With IBM DataStage

Select CUSTOMER_NAME and then click on the white space outside of the editor.

Visual ETL Development With IBM DataStage

Close the window by clicking OK.

Visual ETL Development With IBM DataStage

Let’s now compile the job by clicking the compile button (I circle in red).

Visual ETL Development With IBM DataStage

Confirm to save the job when prompted.

Visual ETL Development With IBM DataStage

When the compilation status confirms your job doesn’t have any error, you job is ready to run. Close the compilation status window; and let’s test this job.

Visual ETL Development With IBM DataStage

Note that you must already have the Customer sequential file with some data in it, as well the Customer_Dim table in its Oracle database—DataStage doesn’t (can’t) create it.

Click the run button (red-circled) to run the job.

Visual ETL Development With IBM DataStage

Click Run on the Job Run Options window.

Visual ETL Development With IBM DataStage

You can monitor the job execution right on the canvas (the flow of data, how many records have been processed). When the job is completed successfully you’ll see the green links with two numbers: one number shows the number of records processed (flowed) in the link (e.g. input: 5), the other the processing speed (5 rows/sec).

Visual ETL Development With IBM DataStage

When you check the Oracle table, you should have the four customer records in the table with the CUSTOMER_ID populated with the Oracle sequence values, and that the customer names are all in uppercase.

Summary

In this article I showed you how easy it is using a GUI client we built an ETL job with the IBM WebSphere DataStage. DataStage has a lot of other functions and features, such as the availability of built-in stages (we used only three stages in this article; we can also build our own custom stage), other GUI clients (such as that for debugging, job flow development.

Service Oriented Architecture: An Integration Blueprint Successfully implement your own enterprise integration architecture using the Trivadis Integration Architecture Blueprint with this SOA book and eBook
Published: June 2010
eBook Price: $20.99
Book Price: $34.99
See more
Select your format and quantity:

About the Author :


Djoni Darmawikarta built his career in IBM Asia Pacific and Canada as a software engineer, international consultant, instructor and project manager, for a total of 17 years. He's currently a technical specialist in the Data Warehousing and Business Intelligence team of a Toronto-based insurance company. Outside of his office works, Djoni writes IT articles and books.

 

Books From Packt

Mastering phpMyAdmin 2.11 for Effective MySQL Management
Mastering phpMyAdmin 2.11 for Effective MySQL Management

Building Powerful and Robust Websites with Drupal 6
Building Powerful and Robust Websites with Drupal 6

Learning Drupal 6 Module Development
Learning Drupal 6 Module Development

Building Websites with Joomla! 1.5
Building Websites with Joomla! 1.5

WordPress Complete
WordPress Complete

OpenCms 7 Development
OpenCms 7 Development

WordPress Theme Design
WordPress Theme Design

Drupal 5 Themes
Drupal 5 Themes

 


Your rating: None Average: 4.8 (4 votes)
Very helpful information. by
Very helpful information.
a great start by
It was extremely helpful. Please add some more complex projects too. Thanks in advance.
Very Nice by
Very nice. This helps.
thanks by
Very nicely discribed thank you for sharing it :)
Life saver by
This has been really helpful

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
7
D
t
X
X
B
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software