Pentaho 3.2 Data Integration: Beginner's Guide

Product type: Book
Published in: Apr 2010
Publisher: Packt
ISBN-13: 9781847199546
Pages: 492
Edition: 1st
Table of Contents (27 chapters)

Pentaho 3.2 Data Integration Beginner's Guide
Credits
Foreword
The Kettle Project
About the Author
About the Reviewers
Preface
1. Getting Started with Pentaho Data Integration
2. Getting Started with Transformations
3. Basic Data Manipulation
4. Controlling the Flow of Data
5. Transforming Your Data with JavaScript Code and the JavaScript Step
6. Transforming the Row Set
7. Validating Data and Handling Errors
8. Working with Databases
9. Performing Advanced Operations with Databases
10. Creating Basic Task Flows
11. Creating Advanced Transformations and Jobs
12. Developing and Implementing a Simple Datamart
13. Taking it Further
Working with Repositories
Pan and Kitchen: Launching Transformations and Jobs from the Command Line
Quick Reference: Steps and Job Entries
Spoon Shortcuts
Introducing PDI 4 Features
Pop Quiz Answers
Index

Chapter 10. Creating Basic Task Flows

So far you have been working with data. You got data from a file, a sheet, or a database, transformed it somehow, and sent it back to some file or table in a database. You did it by using PDI transformations. A PDI transformation does not run in isolation. Usually, it is embedded in a bigger process. Here are some examples:

  • Download a file, clean it, load the information of the file in a database, and fill an audit file with the result of the operation.

  • Generate a daily report and transfer the report to a shared repository.

  • Update a data warehouse. If something goes wrong, notify the administrator by e-mail.

All of these are typical processes in which a transformation is only one piece. These kinds of processes can be implemented with PDI jobs. In this chapter, you will learn to build basic jobs. These are the topics that will be covered:

  • Introduction to jobs

  • Executing tasks depending upon conditions

Introducing PDI jobs


A PDI job is analogous to a process. As with processes in real life, there are basic jobs and there are jobs that do really complex tasks. Let's start by creating a job in the first group—a hello world job.

Time for action – creating a simple hello world job


In this tutorial, you will create a very simple job so that you get an idea of what jobs are about.

Although this tutorial is about creating a job, you first have to create the transformation that the job will run.

  1. Open Spoon.

  2. Create a new transformation.

  3. Drag a Generate rows step to the canvas and double-click it.

  4. Add a String value named message, with the value Hello, World!.

  5. Click on OK.

  6. Add a Text file output step and create a hop from the Generate rows step to this new step.

  7. Double-click the step.

  8. Type ${LABSOUTPUT}/chapter10/hello as filename.

  9. In the Fields tab, add the only field in the stream—message.

  10. Click on OK.

  11. Inside the folder where you save your work, create a folder named transformations.

  12. Save the transformation with the name hello_world_file.ktr in the folder you just created. The following is your final transformation:

    Now you are ready to create the main job.

  13. Select File | New | Job or press Ctrl+Alt+N. A new job is created.

  14. Press Ctrl+J...

Receiving arguments and parameters in a job


Jobs, like transformations, become more flexible when they can receive parameters from outside. You already learned to parameterize your transformations by using named parameters and command-line arguments. Let's extend these concepts to jobs.
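The relationship between the two mechanisms can be sketched in plain shell terms. This is only an analogy, not Kettle syntax: the HELLOFOLDER variable stands in for the named parameter with its default value, the positional $1 stands in for command line argument 1, and the hello_path helper is hypothetical.

```shell
# Shell analogy (hypothetical helper, not Kettle syntax):
# HELLOFOLDER behaves like a named parameter with a default value;
# $1 behaves like command line argument 1.
hello_path() {
  folder="${HELLOFOLDER:-chapter10}"   # default used when no parameter is given
  name="${1:-World}"
  echo "Hello, ${name}! -> ${folder}/hello.txt"
}

hello_path Maria                           # parameter left at its default
( HELLOFOLDER=my_work; hello_path Maria )  # parameter supplied explicitly
```

The subshell in the last line mirrors supplying -param on one particular run without affecting the rest of the session.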

Time for action – customizing the hello world file with arguments and parameters


Let's create a more flexible version of the job you did in the previous section.

  1. Create a new transformation.

  2. Press Ctrl+T to bring up the Transformation properties window.

  3. Select the Parameters tab.

  4. Add a named parameter HELLOFOLDER. Insert chapter10 as the default value.

  5. Click on OK.

  6. Drag a Get System Info step to the canvas.

  7. Double-click the step.

  8. Add a field named yourname. Select command line argument 1 as the Type.

  9. Click on OK.

  10. Now add a Formula step located in the Scripting category of steps.

  11. Use the step to add a String field named message. As Formula, type "Hello, " & [yourname] & "!".

  12. Finally, add a Text file output step.

  13. Use the step to send the message data to a file. Enter ${LABSOUTPUT}/${HELLOFOLDER}/hello as the name of the file.

  14. Save the transformation in the transformations folder you created in the previous tutorial, under the name hello_world_param.ktr.

  15. Open the hello_world.kjb job you created...

Running jobs from a terminal window


In the main tutorial of this section, both the job and the transformation called by the job used a named parameter. The transformation also required a command-line argument. When you executed the job from Spoon, you provided both the parameter and the argument in the job dialog window. You will now learn to launch the job and provide that information from a terminal window.

Time for action – executing the hello world job from a terminal window


In order to run the job from a terminal window, follow these instructions:

  1. Open a terminal window.

  2. Go to the directory where Kettle is installed.

    • On Windows systems type:

      	C:\pdi-ce>kitchen /file:c:/pdi_labs/hello_world_param.kjb Maria -param:"HELLOFOLDER=my_work" /norep
    • On Unix, Linux, and other Unix-like systems type:

      	/home/yourself/pdi-ce/kitchen.sh /file:/home/yourself/pdi_labs/hello_world_param.kjb Maria -param:"HELLOFOLDER=my_work" /norep
  3. If your job is in another folder, modify the command accordingly. You may also replace the name Maria with your own. If the name contains spaces, enclose the whole argument in double quotes.

  4. You will see how the job runs, following the log in the terminal:

  5. Go to the output folder (the folder your LABSOUTPUT variable points to).

  6. A folder named my_work should have been created.

  7. Check the content of the folder. A file named hello.txt should be there. Edit the file. You should see the...
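The quoting advice in step 3 matters because it is the shell, not Kitchen, that splits the command line into words. The count_args helper below is just for illustration:

```shell
# The shell splits unquoted words before Kitchen ever sees them, so an
# unquoted name containing spaces arrives as two separate arguments.
count_args() { echo "$#"; }

count_args Maria Carina      # unquoted: two arguments
count_args "Maria Carina"    # quoted: one argument
```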

Using named parameters and command-line arguments in transformations


As you know, transformations accept both arguments from the command line and named parameters. When you run a transformation from Spoon, you supply the values for arguments and named parameters in the transformation dialog window that shows up when you launch the execution. From a terminal window, you provide those values in the Pan command line.

In this chapter you learned to run a transformation embedded in a job. Here, the methods you have for supplying named parameters and arguments needed by the transformation are quite similar. From Spoon you supply the values in the job dialog window that shows up when you launch the job execution. From the terminal window you provide the values in the Kitchen command line.

Note

Whether you run a job from Spoon or from Kitchen, the named parameters and arguments you provide are unique and shared by the main job and all transformations called by that job. Each transformation, as well...
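A rough shell analogy for the note above, with hypothetical function names: a value exported once by the parent is visible to the child it calls, just as a single -param value reaches both the main job and every transformation the job runs.

```shell
# One value, provided once, seen at both levels (names are illustrative):
child_transformation() { echo "transformation sees HELLOFOLDER=${HELLOFOLDER}"; }
main_job() { child_transformation; }

export HELLOFOLDER=my_work
main_job
```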

Time for action – calling the hello world transformation with fixed arguments and parameters


This time you will call the parameterized transformation from a new job.

  1. Open the hello_world.kjb job you created in the first section and save it as hello_world_fixedvalues.kjb.

  2. Double-click the Create a folder job entry.

  3. Replace the chapter10 string by the string fixedfolder.

  4. Double-click the transformation job entry.

  5. Change the Transformation filename to ${Internal.Job.Filename.Directory}/transformations/hello_world_param.ktr.

  6. Fill the Argument tab as follows.

  7. Click the Parameters tab and fill it as follows:

  8. Click on OK.

  9. Save the job.

  10. Open a terminal window and go to the directory where Kettle is installed.

    • On Windows systems type:

      	C:\pdi-ce>kitchen /file:c:/pdi_labs/hello_world_fixedvalues.kjb /norep
    • On Unix, Linux, and other Unix-like systems type:

      	/home/yourself/pdi-ce/kitchen.sh /file:/home/yourself/pdi_labs/hello_world_fixedvalues.kjb /norep
  11. When the execution finishes, check the output folder. A folder named...

Deciding between the use of a command-line argument and a named parameter


Both command-line arguments and named parameters are means for creating more flexible jobs and transformations. The following table summarizes the differences and the reasons for using one or the other. In the first column, the word argument refers to the external value you will use in your job or transformation. That argument could be implemented as a named parameter or as a command-line argument.
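The first two rows of that table can be paraphrased in shell terms. The helpers below are hypothetical and only mirror the reasoning: a default comes for free with the named-parameter style, while a mandatory value is only enforceable when you can test whether it was supplied.

```shell
# Default row: a fallback value is applied automatically.
greet_default() { echo "Hello, ${1:-World}!"; }

# Mandatory row: presence of the value can be tested and enforced.
greet_required() {
  [ -n "${1:-}" ] || { echo "a name is mandatory" >&2; return 1; }
  echo "Hello, $1!"
}

greet_default            # falls back to the default
greet_required Maria     # succeeds because a value was supplied
```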

Running job entries under conditions


A job may contain any number of entries, but not all of them always execute. Some execute only depending on the result of previous entries in the flow. Let's see this in practice.
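In shell terms, the idea resembles conditional execution with && and ||. The three functions below are stand-ins for job entries, not real PDI calls:

```shell
# Stand-ins for job entries; each just reports that it ran.
generate_report() { echo "report generated"; }
send_report()     { echo "report sent"; }
mail_admin()      { echo "ADMIN: something went wrong"; }

# send_report runs only if generate_report succeeds;
# mail_admin runs only if the success branch fails.
generate_report && send_report || mail_admin
```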

Time for action – sending a sales report and warning the administrator if something is wrong


Now you will build a sales report and send it by e-mail. In order to follow the tutorial, you will need two simple prerequisites:

  • As the report will be based on the Jigsaw database you created in Chapter 8, you will need the MySQL server running.

  • In order to send e-mails, you will need at least one valid Gmail account. Sign up for an account if you don't already have one. Alternatively, if you are familiar with your own SMTP configuration, you can use it instead.

Once you've checked these prerequisites, you are ready to start.

  1. Create a new transformation.

  2. Add a Get System Info step. Use it to add a field named today. As Type, select Today 00:00:00.

  3. Now add a Table input step.

  4. Double-click the step.

  5. As Connection, select js—the name of the connection to the jigsaw puzzles database.

    Note

    Note that if the connection is not shared, you will have to define it.

  6. In the SQL frame, type the following statement:

    SELECT   pay_code
           , COUNT...

Summary


In this chapter, you learned the basics about PDI jobs—what a job is, what you can do with a job, and how jobs are different from transformations. In particular, you learned to use a job for running one or more transformations.

You also saw how to use named parameters in jobs, and how to supply parameters and arguments to transformations when they are run from jobs.

In the next chapter, you will learn to create jobs that are a little more elaborate than the ones you created here, which will give you more power to implement all kinds of processes.


The table referenced in the section Deciding between the use of a command-line argument and a named parameter:

Situation: It is desirable to have a default for the argument.
  Solution using named parameters: Named parameters are perfect in this case. You provide default values at the time you define them.
  Solution using arguments: Before using the command-line argument, you have to evaluate whether it was provided on the command line. If not, you have to set the default value at that moment.

Situation: The argument is mandatory.
  Solution using named parameters: You have no means to determine whether the user provided a value for the named parameter.
  Solution using arguments: To know if the user provided...