Reader small image

You're reading from  Instant Pentaho Data Integration Kitchen

Product typeBook
Published inJul 2013
Reading LevelBeginner
PublisherPackt
ISBN-139781849696906
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Sergio Ramazzina
Sergio Ramazzina
author image
Sergio Ramazzina

Sergio Ramazzina is an experienced software architect/trainer with more than 25 years of experience in the IT field. He has worked on a broad number of projects for banks and major Italian companies and has designed complex enterprise solutions in Java, JavaEE, and Ruby. He started using Pentaho products from the very beginning in late 2003. He gained thorough experience by deploying Pentaho as an open source BI solution, standalone or deeply integrated in other applications as the analytical engine of choice. In 2009, due to his experience in the Java/JavaEE world and appreciation for the open source world and its main ideas, he began participating actively as a contributor to some of the Pentaho projects such as JPivot, Saiku, CDF, and CDA and rose to the Pentaho Active Contributor level. At that time, he started participating as a BI architect and Pentaho expert on a wide number of projects where open source BI and Pentaho were the main players. In late 2010, he founded Serasoft, a young Italian consulting firm that specializes in delivering high value open source Business Intelligence solutions. With the team in Serasoft, he shared his passion and experience in designing and delivering highly innovative enterprise solutions to help users make their work more effective. In July 2013, he published his first book, Instant Pentaho Data Integration Kitchen, Packt Publishing. He is also passionate about skiing, tennis, and photography, and he loves his young daughter, Camilla, very much. You can follow him on Twitter at @sramazzina. You can also look at his profile on LinkedIn at http://it.linkedin.com/in/sramazzina/.
Read more about Sergio Ramazzina

Right arrow

Designing a simple PDI transformation (Simple)


This recipe guides you through creating a simple PDI transformation using the graphical development environment Spoon. Using this simple example, we will see how to play with PDI command-line tools. The goal of this recipe is to extract a list of customers located in a selected country. The country to be exported is identified through an input parameter and the export is made to an Excel file located in the same directory where the transformation is run.

Getting ready

To get ready for this recipe, you first need to check that your Java environment is configured properly; to do this, check that the JAVA_HOME environment variable is set. Even if PDI while starting up tries to guess the value of the JAVA_HOME environment variable from the system, it is always good practice to set the JAVA_HOME environment variable. As soon as this is done, you need to start Spoon, the PDI's graphical development environment. You can start Spoon from the command line using the appropriate script located in the PDI's home directory. As soon as you get into the PDI home directory, you can run the proper script depending on the specific operating system environment.

Have a look at the following options:

  • If you are on Windows, use the script spoon.bat to start the application

  • If you are on Linux or Mac, use the script spoon.sh to start the application

How to do it...

  1. Create a new empty transformation. To do this, you can perform either of the following steps:

    1. Click on the New button from the toolbar menu and select the Transformation item entry.

    2. Select the Transformation item entry by navigating to File | New or by pressing Ctrl + N.

  2. Go to the Transformation properties dialog and define a new transformation parameter called p_country. To do this, perform the following steps:

    1. Open the Transformation settings dialog by either pressing Ctrl + T or by right-clicking anywhere on the working area to the right and selecting Transformation settings from the newly displayed contextual menu.

    2. Once the dialog opens, select the Parameters tab and add a new parameter called p_country.

  3. Select the Design tab from the left view. The list of category folders will appear in the Steps panel. Perform the following steps on them:

    1. From the Input category folder, get a Text File Input step and drag-and-drop it into the working area on the right.

    2. From the Job category folder, get a Get Variables step and drag-and-drop it into the working area on the right.

    3. From the Flow category folder, get a Filter and a Dummy step and drag-and-drop them into the working area on the right.

    4. Rename the Dummy step Discarded customers.

    5. From the Output category folder, get a Microsoft Excel Output step and drag-and-drop it into the working area on the right.

  4. Connect the steps together in the following specified order:

    1. Connect the Text File Input step to the Get Variable step.

    2. Connect the Get Variable step to the Filter step.

    3. Connect the Filter step to the Microsoft Excel Output step and then connect the Filter step to the Dummy step as well.

  5. Configure the Text File Input step as follows:

    1. Open the Text File Input step properties dialog by either double-clicking on the step icon in the working area or right-clicking on the step icon and selecting Edit step.

    2. Configure the name of the step.

    3. Under the File tab, configure the file to be read by typing the complete name of the file in the File or Directory input field. Because the sample file is located in the same sample directory where the transformation resides, a good approach to naming the file in a way that is location independent is to use a system variable to parameterize the directory name where the file is located. In our case, the complete filename is ${Internal.Transformation.Filename.Directory}/customers.txt.

  6. After the name of the file has been typed in, click on the Add button; the file will be added to the selected files located beneath as follows:

    1. Select the Fields tab. You must fill in the table describing the fields format.

    2. Click on the Get Fields button. The Text File Input step automatically analyzes a first set of 100 rows in the input file and tries to guess the field names to be imported. By the end of this inspection, all the fields will automatically get defined. Remember to verify the guessed data types so that only the CUSTOMER_KEY field has an Integer data type while the remaining fields have a String data type.

    3. Click on OK and close the Text File Input step properties dialog.

  7. Configure the Get Variable step as follows:

    1. Open the Get Variable step properties by either double-clicking on the step icon in the working area or right-clicking on the step icon and selecting Edit step.

    2. Configure the name of the step.

    3. Click on the first row under the Field column, add a new column; named filter_country and press the Tab key.

    4. The cursor goes to the next column. Add the name of the parameter whose value is used to populate the new field: ${p_country}. Press the Tab key.

    5. In the next column, where the cursor goes, select String as the data type of the new field.

    6. Click on OK and close the Get Variable step properties dialog.

  8. Configure the Filter step as follows:

    1. Open the Filter step properties dialog by either double-clicking on the step icon in the working area or right-clicking on the step icon and selecting Edit step.

    2. From the Send "true" data to step combobox, select the Write selected country customers item entry to set the path that is to be navigated to any time the result of the condition is true.

    3. From the Send "false" data to step combobox, select the Discarded customers item entry to set the path that is to be navigated to anytime the result of the condition is false.

    4. Add a new condition called COUNTRY = filter_country.

    5. To also manage the cases where the p_country parameter has not been set by the user, when starting the procedure, add another condition called filter_country IS NULL. Set this new condition as an alternate to the previous condition by using the OR logical operator.

    6. Click on OK and close the Filter step properties dialog.

  9. Configure the Microsoft Excel Output step as follows:

    1. Open the Microsoft Excel Output step properties dialog by either double-clicking on the step icon in the working area or right-clicking on the step icon and selecting Edit step.

    2. Configure the name of the step.

    3. Under the File tab, configure the name of the export file by typing the complete name of the file in the Filename input field. Remember to write the name of the file without any extension because the file extension is located in the Extension input field. The exported file will be put in the same samples directory where the transformation resides. In any case, a good approach is to type the filename in a way that is location independent, using a system variable to parameterize the directory name where the file is located. In our case, the complete filename is ${Internal.Transformation.Filename.Directory}/selected_country_customers.

    4. Select the Fields tab. You must fill in the table describing the field's format.

    5. Click on the Get Fields button. The Text File Input step automatically analyzes a first set of 100 rows in the input file and tries to guess the field names to be imported. At the end of this inspection, all the fields will automatically get defined.

    6. Click on OK and close the Microsoft Excel Output step properties dialog.

  10. Save the transformation with the suggested name read-customers.ktr.

  11. The transformation design is now complete. You can now go to the Design a simple job (Simple) recipe and create the job that will use this transformation.

There's more...

Now that we have designed a sample transformation, let's analyze a quick way to easily find and get directly to the needed steps.

How to quickly find the steps to use

A set of category folders organizes transformations to facilitate the user's search process. If you are unsure about the right location of the step you are looking for, you can easily find it using the search functionality. To do this, go to the Search input text field located in the upper-left corner of the design view and write the name of the step you are looking for. While you are typing, you will see the items in the Steps panel whose name starts with the letters you are typing appear as shown in the following screenshot:

Previous PageNext Page
You have been reading a chapter from
Instant Pentaho Data Integration Kitchen
Published in: Jul 2013Publisher: PacktISBN-13: 9781849696906
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Sergio Ramazzina

Sergio Ramazzina is an experienced software architect/trainer with more than 25 years of experience in the IT field. He has worked on a broad number of projects for banks and major Italian companies and has designed complex enterprise solutions in Java, JavaEE, and Ruby. He started using Pentaho products from the very beginning in late 2003. He gained thorough experience by deploying Pentaho as an open source BI solution, standalone or deeply integrated in other applications as the analytical engine of choice. In 2009, due to his experience in the Java/JavaEE world and appreciation for the open source world and its main ideas, he began participating actively as a contributor to some of the Pentaho projects such as JPivot, Saiku, CDF, and CDA and rose to the Pentaho Active Contributor level. At that time, he started participating as a BI architect and Pentaho expert on a wide number of projects where open source BI and Pentaho were the main players. In late 2010, he founded Serasoft, a young Italian consulting firm that specializes in delivering high value open source Business Intelligence solutions. With the team in Serasoft, he shared his passion and experience in designing and delivering highly innovative enterprise solutions to help users make their work more effective. In July 2013, he published his first book, Instant Pentaho Data Integration Kitchen, Packt Publishing. He is also passionate about skiing, tennis, and photography, and he loves his young daughter, Camilla, very much. You can follow him on Twitter at @sramazzina. You can also look at his profile on LinkedIn at http://it.linkedin.com/in/sramazzina/.
Read more about Sergio Ramazzina