Installing Pentaho Data Integration with MySQL

Exclusive offer: get 50% off this eBook here
Pentaho 3.2 Data Integration: Beginner's Guide

Pentaho 3.2 Data Integration: Beginner's Guide — Save 50%

Explore, transform, validate, and integrate your data with ease

$29.99    $15.00
by María Carina Roldán | April 2010 | Java

In this article by María Carina Roldán, author of Pentaho 3.2 Data Integration: Beginner's Guide, we will learn how to install Pentaho Data Integration (PDI). We will also play around with the graphical designer—Spoon. The article also discusses installing MySQL server.

In order to work with Pentaho 3.2 Data Integration(PDI) you need to install the software. It's a simple task; let's do it.

Time for action – installing PDI

These are the instructions to install Kettle, whatever your operating system. The only prerequisite to install PDI is to have JRE 5.0 or higher installed. If you don't have it, please download it from http://www.javasoft.com/ and install it before proceeding. Once you have checked the prerequisite, follow these steps:

  1. From http://community.pentaho.com/sourceforge/ follow the link to Pentaho Data Integration (Kettle). Alternatively, go directly to the download page http://sourceforge.net/projects/pentaho/files/Data Integration.
  2. Choose the newest stable release. At this time, it is 3.2.0.

  3. Download the file that matches your platform. The preceding screenshot should help you.
  4. Unzip the downloaded file in a folder of your choice —C:/Kettle or /home/your_dir/kettle.
  5. If your system is Windows, you're done. Under UNIX-like environments, it's recommended that you make the scripts executable. Assuming that you chose Kettle as the installation folder, execute the following command:
    cd Kettle
    chmod +x *.sh

What just happened?

You have installed the tool in just a few minutes. Now you have all you need to start working

Launching the PDI graphical designer: Spoon

Now that you've installed PDI, you must be eager to do some stuff with data. That will be possible only inside a graphical environment. PDI has a desktop designer tool named Spoon. Let's see how it feels to work with it.

Time for action – starting and customizing Spoon

In this tutorial you're going to launch the PDI graphical designer and get familiarized with its main features.

  1. Start Spoon.
    • If your system is Windows, type the following command:

      Spoon.bat
    • In other platforms such as Unix, Linux, and so on, type:
      Spoon.sh 
    • If you didn't make spoon.sh executable, you may type:
      sh Spoon.sh 
  2. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button. The main window appears. You will see a small window with the tip of the day. After reading it, close that window.
  3. A welcome! window appears with some useful links for you to see.
  4. Close the welcome window. You can open that window later from the main menu.
  5. Click Options... from the Edit menu. A window appears where you can change various general and visual characteristics. Uncheck the circled checkboxes:

  6. Select the tab window Look Feel.
  7. Change the Grid size and Preferred Language settings as follows:

  8. Click the OK button.
  9. Restart Spoon in order to apply the changes. You should neither see the repository dialog, nor the welcome window. You should see the following screen instead:

What just happened?

You ran for the first time the graphical designer of PDI Spoon, and applied some custom configuration.

From the Look Feel configuration window, you changed the size of the dotted grid that appears in the canvas area while you are working. You also changed the preferred language. In the Option tab window, you chose not to show either the repository dialog or the welcome window at startup. These changes were applied as you restarted the tool, not before.

The second time you launched the tool, the repository dialog didn't show up. When the main window appeared, all the visible texts were shown in French, which was the selected language, and instead of the welcome window, there was a blank screen.

Spoon

This tool that you're exploring in this section is the PDI's desktop design tool. With Spoon you design, preview, and test all your work, that is, transformations and jobs. When you see PDI screenshots, what you are really seeing are Spoon screenshots. The other PDI components that you will meet in the following chapters are executed from terminal windows.

Setting preferences in the Options window

In the tutorial you changed some preferences in the Options window. There are several look and feel characteristics you can change beyond those you changed. Feel free to experiment with this setting.

Remember to restart Spoon in order to see the changes applied.

If you choose any language as preferred language other than English, you should select a diff erent language as alternati ve. If you do so, every name or descripti on not translated to your preferred language will be shown in the alternative language.

Just for the curious people: Italian and French are the overall winners of the list of languages to which the tool has been translated from English. Below them follow Korean, Argenti neanSpanish, Japanese, and Chinese.

One of the setti ngs you changed was the appearance of the welcome window at start up. The welcome window has many useful links, all related with the tool: wiki pages, news, forum access, and more. It's worth exploring them.

You don't have to change the settings again to see the welcome window. You can open it from the menu Help | Show the Welcome Screen.

Storing transformations and jobs in a repository

The first time you launched Spoon, you chose No Repository. After that, you confi gured Spoon to stop asking you for the Repository option. You must be curious about what the repository is and why not to use it. Let's explain it. As said, the results of working with PDI are Transformati ons and Jobs. In order to save the Transformations and Jobs, PDI offers two methods:

  • Repository: When you use the repository method you save jobs and transformations in a repository. A repository is a relational database specially designed for this purpose.
  • Files: The files method consists of saving jobs and transformations as regular XML files in the filesystem, with extension kjb and ktr respectively.

The following diagram summarizes this:

You cannot mix the two methods (files and repository) in the same project. Therefore, you must choose the method when you start the tool. Why did we choose not to work with repository, or in other words, to work with fi les? This is mainly for the following two reasons:

  • Working with files is more natural and practical for most users.
  • Working with repository requires minimum database knowledge and that you also have access to a database engine from your computer. Having both preconditions would allow you to learn working with both methods. However, it's probable that you haven't.

Creating your first transformation

Until now, you've seen the very basic elements of Spoon. For sure, you must be waiti ng to do some interesting task beyond looking around. It's time to create your first transformation.

Time for action – creating a hello world transformation

How about starting by saying Hello to the World? Not original but enough for a very first practical exercise. Here is how you do it:

  1. Create a folder named pdi_labs under the folder of your choice.
  2. Open Spoon.
  3. From the main menu select File | New Transformation.
  4. At the left-hand side of the screen, you'll see a tree of Steps. Expand the Input branch by double-clicking it.
  5. Left -click the Generate Rows icon.
  6. Without releasing the button, drag-and-drop the selected icon to the main canvas. The screen will look like this:

  7. Double-click the Generate Rows step that you just put in the canvas and fill the text boxes and grid as follows:

  8. From the Steps tree, double-click the Flow step.
  9. Click the Dummy icon and drag-and-drop it to the main canvas.
  10. Click the Generate Rows step and holding the Shift key down, drag the cursor towards the Dummy step. Release the button. The screen should look like this:

  11. Right-click somewhere on the canvas to bring up a contextual menu.
  12. Select New note. A note editor appears.
  13. Type some description such as Hello World! and click OK.
  14. From the main menu, select Transformation | Configuration. A window appears to specify transformation properties. Fill the Transformation name with a simple name as hello_world. Fill the Description field with a short description such as My first transformation. Finally provide a more clear explanation in the Extended description text box and click OK.
  15. From the main menu, select File | Save.
  16. Save the transformation in the folder pdi_labs with the name hello_world.
  17. Select the Dummy step by left -clicking it.
  18. Click on the Preview button in the menu above the main canvas.

  19. A debug window appears. Click the Quick Launch button.
  20. The following window appears to preview the data generated by the transformation:

  21. Close the preview window and click the Run button.

  22. A window appears. Click Launch.
  23. The execution results are shown in the bottom of the screen. The Logging tab should look as follows:

Pentaho 3.2 Data Integration: Beginner's Guide Explore, transform, validate, and integrate your data with ease
Published: April 2010
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:

What just happened?

You've just created your first transformation.

First, you created a new transformation. From the tree on the left , you dragged two steps and drop them into the canvas. Finally, you linked them with a hop.

With the Generate Rows step, you created 10 rows of data with the message Hello World!. The Dummy step simply served as a destination of those rows.

After creating the transformation, you did a preview. The preview allowed you to see the content of the created data, this is, the 10 rows with the message Hello World!

Finally, you ran the transformation. You could see the results of the execution at the bottom of the windows. There is a tab named Step Metrics with information about what happens with each steps in the transformation. There is also a Logging tab showing a complete detail of what happened.

Directing the Kettle engine with transformations

As shown in the following diagram, transformation is an entity made of steps linked by hops. These steps and hops build paths through which data flows. The data enters or is created in a step, the step applies some kind of transformation to it, and finally the data leaves that step. Therefore, it's said that a transformation is data-flow oriented.

 

A transformation itself is not a program nor an executable file. It is just plain XML. The transformation contains metadata that tells the Kettle engine what to do. A step is the minimal unit inside a transformati on. A big set of steps is available. These steps are grouped in categories such as the input and flow categories that you saw in the example. Each step is conceived to accomplish a specific function, going from reading a parameter to normalizing a dataset. Each step has a configurati on window. These windows vary according to the functionality of the steps and the category to which they belong. What all steps have in common are the name and description:




Step property

Description

Name

A representative name inside the transformation.

A hop is a graphical representation of data flowing between two steps—an origin and a destination. The data that flows through that hop constitutes the output data of the origin step and the input data of the destination step.

Exploring the Spoon interface

As you just saw, the Spoon is the tool using which you create, preview, and run transformations. The following screenshot shows you the basic work areas:

Viewing the transformation structure

If you click the View icon in the upper left corner of the screen, the tree will change to show the structure of the transformation currently being edited.

 

Running and previewing the transformation

The Preview functionality allows you to see a sample of the data produced for selected steps. In the previous example, you previewed the output of the Dummy Step. The Run option effectively runs the whole transformation.

Whether you preview or run a transformation, you'll get an execution results window showing what happened. Let's explain it through an example.

Time for action – running and previewing the hello_world transformation

Let's do some testing and explore the results:

  1. Open the hello_world transformation.
  2. Edit the Generate Rows step, and change the limit from 10 to 1000 so that it generates 1,000 rows.
  3. Select the Logging tab window at the bottom of the screen
  4. Click on Run.
  5. In the Log level drop-down list, select RowLevel detail.
  6. Click on Launch.
  7. You can see how the logging window shows every task in a very detailed way.

  8. Edit the Generate Rows step, and change the limit to 10,000 so that it generates 10,000 rows.
  9. Select the Step Metrics.
  10. Run the transformation.
  11. You can see how the numbers change as the rows travel through the steps.

What just happened?

You did some tests with the hello_world transformation and saw the results in the Execution Results window.

Previewing the results in the Execution Results window

The Execution Results window shows you what is happening while you preview or run a transformation.

The Logging tab shows the execution of your transformation, step by step. By default, the level of the logging detail is Basic but you can change it to see different levels of detail—from a minimal logging (level Minimal) to a very detailed one (level RowLevel). The Step Metrics tab shows, for each step of the transformation, the executed operations and several status and information columns. You may be interested in the following columns:



Column

Description

Read

Contains the number of rows coming from previous steps

Written

Contains the number of rows leaving from this step toward the next

Input

Number of rows read from a file or table

Output

Number of rows written to a file or table

Errors

Errors in the execution. If there are errors, the whole row becomes red

Active

Tells the current status of the execution

In the example, you can see that the Generate Rows step writes rows, which then are read by the Dummy step. The Dummy step also writes the same rows, but in this case those go nowhere.

Installing MySQL

Before skipping to the next chapter, let's devote some minutes to the installation of MySQL.

As MySQL is the world's most popular open source database, it was the database engine chosen for the database-related tutorials in the article.

In this section you will learn to install the MySQL database engine both in Windows and Ubuntu, the most popular distributi on of Linux these days. As the procedures for installing the soft ware are different, a separate explanati on is given for each system.

Time for action – installing MySQL on Windows

In order to install MySQL on your Windows system, please follow these instructions:

  1. Open an internet browser and type http://dev.mysql.com/downloads/mysql/.
  2. Select the Microsoft Windows platform and download the mysql-essential package that matches your system: 32-bit or 64-bit.
  3. Double-click the downloaded file. A wizard will guide you through the process.
  4. When asked about the setup type, select Typical.
  5. Several screens follow. When the wizard is complete you'll have the option to configure the server. Check Configure the MySQL Server now and click Finish.
  6. A new wizard will be launched that lets you configure the server.
  7. When asked about the configurati on type, select Standard Configuration.
  8. When prompted, set the Windows options as shown in the next screenshot:

  9. When prompted for the security options, provide a password for the root user. You'll have to retype the password.
  10. In the next window click on Execute to proceed with the configuration. When the configuration is done, you'll see this:

  11. Click on Finish. After installing MySQL it is recommended that you install the GUI tools for administering and querying the database.
  12. Open an Internet browser and type http://dev.mysql.com/downloads/gui-tools/.
  13. Look for the Windows downloads and download the Windows (x86) package.
  14. Double-click the downloaded file. A wizard will guide you through the process.
  15. When asked about the setup type, select Complete.
  16. Several screens follow. Just follow the wizard instructions.
  17. When the wizard ends, you'll have the GUI tools added to the MySQL menu.

What just happened?

You downloaded and installed MySQL on your Windows system. You also installed MySQL GUI tools, a software package that includes an administrator and a query browser utility and that will make your life easier when working with the database.

Time for action – installing MySQL on Ubuntu

This tutorial shows you the procedure to install MySQL on Ubuntu.

Please follow these instructions:

  1. Check that you have access to the Internet.
  2. Open the Synaptic package manager from System | Administration | Synaptic Package Manager.

  3. Under Quick search type mysql-server and click on the Search button.
  4. Among the results, locate mysql-server-5.1, click in the tiny square to the left , and select Mark for Installation.
  5. You'll be prompted for confirmation. Click on Mark.
  6. Now search for a package named mysql-admin.
  7. When found, mark it for installation in the same way.
  8. Click on Apply on the main toolbar

  9. A window shows up asking for confirmation. Click on Mark again. What follows is the download process followed by the installation process.
  10. At a particular moment a window appears asking you for a password for the root user—the administrator of the database. Enter a password of your choice. You'll have to enter it twice.
  11. When the process ends, you will see the changes applied.

  12. Under Applications a new menu will also be added to access the GUI tools.

What just happened?

You installed MySQL server and GUI Tools in your Ubuntu system.

Summary

In this article, you were introduced to Pentaho Data Integration. Specifically, you learned how to install the tool. You were also introduced to Spoon, the graphical designer of PDI, and you created your first transformation.

About the Author :


María Carina Roldán

María Carina, born in Esquel, Argentina, earned her Bachelor's degree in Computer Science at UNLP in La Plata and then moved to Buenos Aires where she has been living since 1994.

She has worked as a BI consultant for almost 15 years. Over the last six, she has been dedicated full time to developing BI solutions using the Pentaho Suite. Currently, she works for Webdetails—a Pentaho company—as an ETL specialist.

She is the author of Pentaho 3.2 Data Integration Beginner’s Guide book published by Packt Publishing in April 2009 and co-author of Pentaho Data Integration 4 Cookbook, also published by Packt Publishing in June 2011.

Books From Packt

NetBeans Platform 6.9 Developer's Guide
NetBeans Platform 6.9 Developer's Guide

Funambol Mobile Open Source
Funambol Mobile Open Source

GlassFish Security
GlassFish Security

Amazon SimpleDB Developer Guide
Amazon SimpleDB Developer Guide

JSF 2.0 Cookbook [RAW]
JSF 2.0 Cookbook [RAW]

JBoss AS 5 Development
JBoss AS 5 Development

Flex 3 with Java
Flex 3 with Java

iReport 3.7
iReport 3.7

 

Your rating: None Average: 4 (1 vote)

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
d
C
n
3
y
4
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software