Pentaho 3.2 Data Integration: Beginner's Guide

Product type: Book
Published in: Apr 2010
Publisher: Packt
ISBN-13: 9781847199546
Pages: 492
Edition: 1st
Table of Contents (27 chapters)

Pentaho 3.2 Data Integration Beginner's Guide
Credits
Foreword
The Kettle Project
About the Author
About the Reviewers
Preface
1. Getting Started with Pentaho Data Integration
2. Getting Started with Transformations
3. Basic Data Manipulation
4. Controlling the Flow of Data
5. Transforming Your Data with JavaScript Code and the JavaScript Step
6. Transforming the Row Set
7. Validating Data and Handling Errors
8. Working with Databases
9. Performing Advanced Operations with Databases
10. Creating Basic Task Flows
11. Creating Advanced Transformations and Jobs
12. Developing and Implementing a Simple Datamart
13. Taking it Further
Working with Repositories
Pan and Kitchen: Launching Transformations and Jobs from the Command Line
Quick Reference: Steps and Job Entries
Spoon Shortcuts
Introducing PDI 4 Features
Pop Quiz Answers
Index

Chapter 10. Creating Basic Task Flows

So far you have been working with data. You got data from a file, a sheet, or a database, transformed it somehow, and sent it back to some file or table in a database. You did it by using PDI transformations. A PDI transformation does not run in isolation. Usually, it is embedded in a bigger process. Here are some examples:

  • Download a file, clean it, load the information of the file in a database, and fill an audit file with the result of the operation.

  • Generate a daily report and transfer the report to a shared repository.

  • Update a data warehouse. If something goes wrong, notify the administrator by e-mail.

All of these are typical processes in which a transformation is only one piece. These kinds of processes can be implemented with PDI jobs. In this chapter, you will learn to build basic jobs. These are the topics that will be covered:

  • Introduction to jobs

  • Executing tasks depending upon conditions

Introducing PDI jobs


A PDI job is analogous to a process. As with processes in real life, there are basic jobs and there are jobs that do really complex tasks. Let's start by creating a job in the first group—a hello world job.

Time for action – creating a simple hello world job


In this tutorial, you will create a very simple job so that you get an idea of what jobs are about.

Although this tutorial is about creating a job, you first have to create the transformation that the job will run.

  1. Open Spoon.

  2. Create a new transformation.

  3. Drag a Generate rows step to the canvas and double-click it.

  4. Add a String value named message, with the value Hello, World!.

  5. Click on OK.

  6. Add a Text file output step and create a hop from the Generate rows step to this new step.

  7. Double-click the step.

  8. Type ${LABSOUTPUT}/chapter10/hello as filename.

  9. In the Fields tab, add the only field in the stream—message.

  10. Click on OK.

  11. Inside the folder where you save your work, create a folder named transformations.

  12. Save the transformation with the name hello_world_file.ktr in the folder you just created. The following is your final transformation:

    Now you are ready to create the main job.

  13. Select File | New | Job or press Ctrl+Alt+N. A new job is created.

  14. Press Ctrl+J...

Receiving arguments and parameters in a job


Jobs, like transformations, become more flexible when they can receive parameters from outside. You already learned to parameterize your transformations by using named parameters and command-line arguments. Let's extend these concepts to jobs.
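The relationship between the two mechanisms can be sketched in plain shell terms. This is only an analogy, not Kettle syntax: the HELLOFOLDER variable stands in for the named parameter with its default value, the positional $1 stands in for command line argument 1, and the hello_path helper is hypothetical.

```shell
# Shell analogy (hypothetical helper, not Kettle syntax):
# HELLOFOLDER behaves like a named parameter with a default value;
# $1 behaves like command line argument 1.
hello_path() {
  folder="${HELLOFOLDER:-chapter10}"   # default used when no parameter is given
  name="${1:-World}"
  echo "Hello, ${name}! -> ${folder}/hello.txt"
}

hello_path Maria                           # parameter left at its default
( HELLOFOLDER=my_work; hello_path Maria )  # parameter supplied explicitly
```

The subshell in the last line mirrors supplying -param on one particular run without affecting the rest of the session.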

Time for action – customizing the hello world file with arguments and parameters


Let's create a more flexible version of the job you did in the previous section.

  1. Create a new transformation.

  2. Press Ctrl+T to bring up the Transformation properties window.

  3. Select the Parameters tab.

  4. Add a named parameter HELLOFOLDER. Insert chapter10 as the default value.

  5. Click on OK.

  6. Drag a Get System Info step to the canvas.

  7. Double-click the step.

  8. Add a field named yourname. Select command line argument 1 as the Type.

  9. Click on OK.

  10. Now add a Formula step located in the Scripting category of steps.

  11. Use the step to add a String field named message. As Formula, type "Hello, " & [yourname] & "!".

  12. Finally, add a Text file output step.

  13. Use the step to send the message data to a file. Enter ${LABSOUTPUT}/${HELLOFOLDER}/hello as the name of the file.

  14. Save the transformation in the transformations folder you created in the previous tutorial, under the name hello_world_param.ktr.

  15. Open the hello_world.kjb job you created...

Running jobs from a terminal window


In the main tutorial of this section, both the job and the transformation called by the job used a named parameter. The transformation also required a command-line argument. When you executed the job from Spoon, you provided both the parameter and the argument in the job dialog window. You will now learn to launch the job and provide that information from a terminal window.

Time for action – executing the hello world job from a terminal window


In order to run the job from a terminal window, follow these instructions:

  1. Open a terminal window.

  2. Go to the directory where Kettle is installed.

    • On Windows systems type:

      	C:\pdi-ce>kitchen /file:c:/pdi_labs/hello_world_param.kjb Maria -param:"HELLOFOLDER=my_work" /norep
    • On Unix, Linux, and other Unix-like systems type:

      	/home/yourself/pdi-ce/kitchen.sh /file:/home/yourself/pdi_labs/hello_world_param.kjb Maria -param:"HELLOFOLDER=my_work" /norep
  3. If your job is in another folder, modify the command accordingly. You may also replace the name Maria with your own. If the name contains spaces, enclose the whole argument in double quotes.

  4. You will see how the job runs, following the log in the terminal:

  5. Go to the output folder (the folder your LABSOUTPUT variable points to).

  6. A folder named my_work should have been created.

  7. Check the content of the folder. A file named hello.txt should be there. Edit the file. You should see the...
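The quoting advice in step 3 matters because it is the shell, not Kitchen, that splits the command line into words. The count_args helper below is just for illustration:

```shell
# The shell splits unquoted words before Kitchen ever sees them, so an
# unquoted name containing spaces arrives as two separate arguments.
count_args() { echo "$#"; }

count_args Maria Carina      # unquoted: two arguments
count_args "Maria Carina"    # quoted: one argument
```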

Using named parameters and command-line arguments in transformations


As you know, transformations accept both arguments from the command line and named parameters. When you run a transformation from Spoon, you supply the values for arguments and named parameters in the transformation dialog window that shows up when you launch the execution. From a terminal window, you provide those values in the Pan command line.

In this chapter you learned to run a transformation embedded in a job. Here, the methods you have for supplying named parameters and arguments needed by the transformation are quite similar. From Spoon you supply the values in the job dialog window that shows up when you launch the job execution. From the terminal window you provide the values in the Kitchen command line.

Note

Whether you run a job from Spoon or from Kitchen, the named parameters and arguments you provide are unique and shared by the main job and all transformations called by that job. Each transformation, as well...
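A rough shell analogy for the note above, with hypothetical function names: a value exported once by the parent is visible to the child it calls, just as a single -param value reaches both the main job and every transformation the job runs.

```shell
# One value, provided once, seen at both levels (names are illustrative):
child_transformation() { echo "transformation sees HELLOFOLDER=${HELLOFOLDER}"; }
main_job() { child_transformation; }

export HELLOFOLDER=my_work
main_job
```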

Time for action – calling the hello world transformation with fixed arguments and parameters


This time you will call the parameterized transformation from a new job.

  1. Open the hello_world.kjb job you created in the first section and save it as hello_world_fixedvalues.kjb.

  2. Double-click the Create a folder job entry.

  3. Replace the chapter10 string by the string fixedfolder.

  4. Double-click the transformation job entry.

  5. Change the Transformation filename to ${Internal.Job.Filename.Directory}/transformations/hello_world_param.ktr.

  6. Fill the Argument tab as follows.

  7. Click the Parameters tab and fill it as follows:

  8. Click on OK.

  9. Save the job.

  10. Open a terminal window and go to the directory where Kettle is installed.

    • On Windows systems type:

      	C:\pdi-ce>kitchen /file:c:/pdi_labs/hello_world_fixedvalues.kjb /norep
    • On Unix, Linux, and other Unix-like systems type:

      	/home/yourself/pdi-ce/kitchen.sh /file:/home/yourself/pdi_labs/hello_world_fixedvalues.kjb /norep
  11. When the execution finishes, check the output folder. A folder named...

Deciding between the use of a command-line argument and a named parameter


Both command-line arguments and named parameters are means for creating more flexible jobs and transformations. The following table summarizes the differences and the reasons for using one or the other. In the first column, the word argument refers to the external value you will use in your job or transformation. That argument could be implemented as a named parameter or as a command-line argument.
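The first two rows of that table can be paraphrased in shell terms. The helpers below are hypothetical and only mirror the reasoning: a default comes for free with the named-parameter style, while a mandatory value is only enforceable when you can test whether it was supplied.

```shell
# Default row: a fallback value is applied automatically.
greet_default() { echo "Hello, ${1:-World}!"; }

# Mandatory row: presence of the value can be tested and enforced.
greet_required() {
  [ -n "${1:-}" ] || { echo "a name is mandatory" >&2; return 1; }
  echo "Hello, $1!"
}

greet_default            # falls back to the default
greet_required Maria     # succeeds because a value was supplied
```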

Running job entries under conditions


A job may contain any number of entries, but not all of them always execute. Some execute only depending on the result of previous entries in the flow. Let's see this in practice.
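In shell terms, the idea resembles conditional execution with && and ||. The three functions below are stand-ins for job entries, not real PDI calls:

```shell
# Stand-ins for job entries; each just reports that it ran.
generate_report() { echo "report generated"; }
send_report()     { echo "report sent"; }
mail_admin()      { echo "ADMIN: something went wrong"; }

# send_report runs only if generate_report succeeds;
# mail_admin runs only if the success branch fails.
generate_report && send_report || mail_admin
```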

Time for action – sending a sales report and warning the administrator if something is wrong


Now you will build a sales report and send it by e-mail. In order to follow the tutorial, you will need two simple prerequisites:

  • As the report will be based on the Jigsaw database you created in Chapter 8, you will need the MySQL server running.

  • In order to send e-mails, you will need at least one valid Gmail account. Sign up for an account if you don't already have one. Alternatively, if you are familiar with your own SMTP configuration, you can use it instead.

Once you've checked these prerequisites, you are ready to start.

  1. Create a new transformation.

  2. Add a Get System Info step. Use it to add a field named today. As Type, select Today 00:00:00.

  3. Now add a Table input step.

  4. Double-click the step.

  5. As Connection, select js—the name of the connection to the jigsaw puzzles database.

    Note

    Note that if the connection is not shared, you will have to define it.

  6. In the SQL frame, type the following statement:

    SELECT   pay_code
           , COUNT...

Summary


In this chapter, you learned the basics about PDI jobs—what a job is, what you can do with a job, and how jobs are different from transformations. In particular, you learned to use a job for running one or more transformations.

You also saw how to use named parameters in jobs, and how to supply parameters and arguments to transformations when they are run from jobs.

In the next chapter, you will learn to create jobs that are a little more elaborate than the ones you created here, which will give you more power to implement all kinds of processes.


The table referenced in the section Deciding between the use of a command-line argument and a named parameter:

Situation: It is desirable to have a default for the argument.
  Solution using named parameters: Named parameters are perfect in this case. You provide default values at the time you define them.
  Solution using arguments: Before using the command-line argument, you have to evaluate whether it was provided on the command line. If not, you have to set the default value at that moment.

Situation: The argument is mandatory.
  Solution using named parameters: You have no means to determine whether the user provided a value for the named parameter.
  Solution using arguments: To know if the user provided...