Data Profiling with IBM Information Analyzer

by Djoni Darmawikarta | July 2008 | Architecture & Analysis

Data profiling is essentially data mining with a different purpose: you mine the data to understand the data itself. Whereas data mining is more commonly used to gain business insight (e.g. customer buying characteristics), data profiling serves a technical purpose. More precisely, you profile data to gather and analyze its technical metadata characteristics. Information Analyzer, data profiling software from IBM, helps you gain insight into such characteristics as column data type and size (length).
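
To make "technical metadata characteristics" concrete, here is a minimal Python sketch of what profiling a column boils down to: inferring a data type and measuring value lengths from the data itself. It is not how Information Analyzer is implemented; the function names and the sample columns (ID, NOTE) are assumptions made up for illustration.

```python
from typing import Iterable, Optional


def infer_type(values: Iterable[Optional[str]]) -> str:
    """Guess the narrowest type that fits every non-null value."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "unknown"
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_null:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_column(values: list) -> dict:
    """Summarize the technical characteristics of one column."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "inferred_type": infer_type(values),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "null_count": sum(v is None for v in values),
    }


# Hypothetical sample data; the column names are made up for illustration.
rows = {
    "ID": ["1", "2", "3"],
    "NOTE": ["ok", None, "a much longer free-text value"],
}
profile = {name: profile_column(vals) for name, vals in rows.items()}
for name, stats in profile.items():
    print(name, stats)
```

Comparing a column's actual maximum length with its declared size is the kind of observation a profiling tool surfaces without anyone having to inspect the rows by hand.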

This article is based on a case in which a database table grew unexpectedly and strained its initial disk space allocation. The table's growth pattern, such as the number of new records (which was not that large), gave us no clue about the cause of the problem. In this article, Djoni Darmawikarta steps through the Information Analyzer process, runs one of its functions, called Column Analysis, on a simple table (a scaled-down version of the real table), and shows how the profiling output helps solve the problem.

Information Analyzer is client-server software. A data profiling user (metadata analyst) works in its GUI client, so to make it easier to show how I solved the problem, I'll use a lot of screenshots.

Our example data is an Oracle table that has two columns and three rows. (In real life, a table typically has more than 50 columns and a few million rows.)

When you start the Information Analyzer client, called Information Server Console, you'll see its start-up screen and then its log-in window.

When your log-in is successful, the console main window will show up.

Assuming the Oracle table that we'd like to profile is new, we must identify it to the Analyzer, which technically means importing its metadata.

Make sure you have connected the Oracle database to the Information Analyzer server before you import the metadata of its tables.

Expand Metadata Management from the HOME drop-down menu.

Then, click Import Metadata.

Our example Oracle data (table) is in the CLROPER database (hosted in DDOM02), so select CLROPER and then click Identify Next Level.

This might take a while, particularly for a database that has many tables and many columns, so just wait.

On the completion message screen, click OK to close the screen.

All tables in the CLROPER database will be identified (listed), including our example table named SPACE1. Next, we'll identify the columns of our SPACE1 table, so select SPACE1 and then click Identify Next Level.

The result shows that Analyzer has correctly identified the two columns of the table.
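
Conceptually, identifying tables and then columns amounts to reading the database catalog one level at a time. The sketch below illustrates the idea in Python, with an in-memory SQLite database standing in for Oracle so that it runs anywhere. The column names and types given to SPACE1 are hypothetical, and the helper functions are made up for illustration. On Oracle, comparable information is exposed by data dictionary views such as ALL_TAB_COLUMNS; whether Information Analyzer reads those views directly is not something this article establishes.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle database so the sketch is runnable.
conn = sqlite3.connect(":memory:")

# Hypothetical two-column layout; the article does not list SPACE1's real columns.
conn.execute("CREATE TABLE SPACE1 (ID INTEGER, NOTE VARCHAR(100))")


def list_tables(conn):
    """First level: the tables in the database."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cur]


def list_columns(conn, table):
    """Next level: (column name, declared type) for each column of one table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so only pass trusted names here.
    cur = conn.execute(f"PRAGMA table_info({table})")
    return [(row[1], row[2]) for row in cur]


tables = list_tables(conn)
columns = list_columns(conn, "SPACE1")
print(tables)
print(columns)
```

The declared types returned here are the "defined" metadata; profiling later compares them with what the data actually contains.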

Now, import the metadata of all columns of the table by selecting the table and then clicking Import.

Click OK to continue.

Wait for completion.

Click OK on the successful completion screen.

We’re now done with the metadata and ready to start our profiling task.

In Information Analyzer (as in most software these days), we group our profiling work into projects. Here, I just use an existing project (DJONI_TEST), so select Open Project from the drop-down arrow to the right of NO PROJECT SELECTED.

You’ll be shown the list of existing projects. Select your project, and click Open.

Our previous (existing) profiling work is shown.

Next, click Project Properties from the OVERVIEW drop-down menu.

Go to the Data Sources tab. Our SPACE1 table is not in the list yet, because we haven’t identified it specifically in our project (the previous steps did so only at the server-wide level). To add it to our project, click Add.

Expand the SPACE1 table to see its columns. Select all of the columns, as we want to profile all of them, and then click OK.

When completed, click Save All, and then close the Project Properties window.

Now we’re ready to profile our SPACE1 data, that is, to analyze its columns. On the main toolbar, select Investigate | Column analysis.
