[box type="note" align="" class="" width=""]This is a book excerpt from Learning Pentaho Data Integration 8 CE - Third Edition written by María Carina Roldán. From this book, you will learn to explore, transform, and integrate your data across multiple sources.[/box]
Today, we will learn to configure and use the Job Executor step, along with capturing the result filenames.
The Job Executor is a PDI step that allows you to execute a Job several times, simulating a loop. The executor receives a dataset, and then executes the Job once for each row or set of rows of the incoming dataset. To understand how this works, we will build a very simple example. The Job that we will execute will have two parameters: a folder and a file. It will create the folder, and then it will create an empty file inside the new folder. Both the name of the folder and the name of the file will be taken from the parameters. The main Transformation will execute the Job iteratively for a list of folder and file names.
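Before building this in Spoon, it may help to see the looping semantics in plain Python. This is not PDI code, just a minimal sketch of what the Job Executor does conceptually; the rows list and the run_job helper are hypothetical stand-ins for the incoming dataset and the Job:

import os

# Hypothetical incoming dataset: one row per folder/file to create
rows = [
    {"folder_name": "folder1", "file_name": "sample1.tmp"},
    {"folder_name": "folder2", "file_name": "sample2.tmp"},
    {"folder_name": "folder3", "file_name": "sample3.tmp"},
]

def run_job(folder_name, file_name):
    # Stand-in for the Job: create the folder, then an empty file inside it
    os.makedirs(folder_name)
    open(os.path.join(folder_name, file_name), "w").close()

# What the Job Executor does: run the Job once per incoming row,
# binding each named parameter to a field of that row
for row in rows:
    run_job(row["folder_name"], row["file_name"])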
Let's start by creating the Job:
1. Create a new Job.
2. Double-click the work area to open the Job properties, and use the Parameters tab to define two named parameters: FOLDER_NAME and FILE_NAME.
3. Drag to the work area a START, a Create a folder, and a Create file entry, and link them in that order.
4. Double-click the Create a folder entry. As Folder name, type ${FOLDER_NAME}.
5. Double-click the Create file entry. As File name, type ${FOLDER_NAME}/${FILE_NAME}.
6. Save the Job and test it, providing values for the folder and filename. The Job should create a folder with an empty file inside, both with the names that you provide as parameters.
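You can also test the Job from outside Spoon with Kitchen, the PDI command-line tool for running jobs. The following sketch launches it from Python; it assumes that kitchen.sh (Kitchen.bat on Windows) is on your PATH and that the Job was saved as create_folder_and_file.kjb:

import subprocess

# Run the Job once, providing values for its two named parameters
subprocess.run([
    "kitchen.sh",                                 # use Kitchen.bat on Windows
    "-file=create_folder_and_file.kjb",
    "-param:FOLDER_NAME=c:/pentaho/files/folder1",
    "-param:FILE_NAME=sample.tmp",
    "-level=Basic",                               # logging level
], check=True)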
Now create the main Transformation:
1. Create a new Transformation.
2. Drag a Data Grid step to the work area.
3. In the Data Grid, define a single String field named folder_name, and fill the grid with the names of the folders to create, for example folder1, folder2, and folder3.
4. For the name of the file, you can use any name of your choice. As an example, we will generate a random name. For this, we use a Generate random value step and a UDJE (User Defined Java Expression) step, and configure them as shown:
5. With the last step selected, run a preview. You should see the full list of folders and filenames, as shown in the next sample image:
6. At the end of the stream, add a Job Executor step. You will find it under the Flow category of steps.
7. Double-click on the Job Executor step.
8. As Job, select the path to the Job created before, for example, ${Internal.Entry.Current.Directory}/create_folder_and_file.kjb
9. Configure the Parameters grid as follows:
10. Close the window and save the transformation.
11. Run the transformation. The Step Metrics in the Execution Results window reflects what happens:
12. Click on the Logging tab. You will see the full log for the Job.
13. Browse your filesystem. You will find all the folders and files just created.
As you can see, PDI executes the Job as many times as there are rows arriving at the Job Executor step, once for every row. Each time the Job executes, it receives values for the named parameters, and creates the folder and file using those values.
Just as with the Transformation Executor that you already know, the Job Executor can be configured with similar settings that allow you to customize the behavior and the output of the Job to be executed. Let's summarize the options.
The Job Executor doesn't cause the Transformation to abort if the Job that it runs has errors. To verify this, run the sample transformation again. As the folders already exist, you would expect each individual execution to fail. However, the Job Executor ends without error. To capture the errors in the execution of the Job, you have to get the execution results. This is how you do it:
1. Drag a new step to the work area to serve as the destination of the results, for example a Write to log or a Dummy step.
2. Create a hop from the Job Executor toward this new step. When asked for the kind of hop, choose This output will contain the execution results.
3. Double-click on the Job Executor and select the Execution results tab. You will see the list of metrics and results available. The Field name column has the names of the fields that will contain these results. If there are results you are not interested in, delete the value in the Field name column. For the results that you want to keep, you can leave the proposed field name or type a different name. The following screenshot shows an example that only generates a field for the log:
4. When you are done, click on OK.
5. With the destination step selected, run a preview. You will see the results that you just defined, as shown in the next example:
6. If you copy any of the lines and paste it into a text editor, you will see the full log for the execution, as shown in the following example:
2017/10/26 23:45:53 - create_folder_and_file - Starting entry [Create a folder]
2017/10/26 23:45:53 - create_folder_and_file - Starting entry [Create file]
2017/10/26 23:45:53 - Create file - File [c:/pentaho/files/folder1/sample_50n9q8oqsg6ib.tmp] created!
2017/10/26 23:45:53 - create_folder_and_file - Finished job entry [Create file] (result=[true])
2017/10/26 23:45:53 - create_folder_and_file - Finished job entry [Create a folder] (result=[true])
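Conceptually, getting the execution results means that, besides running the Job, the executor appends the metrics you selected as new fields of each outgoing row. Continuing the Python sketch from above (rows and run_job as before; the field names ExecutionNrErrors and ExecutionLogText mimic the defaults proposed by the step):

def run_job_capturing_results(folder_name, file_name):
    # Run the Job and report what happened instead of aborting
    try:
        run_job(folder_name, file_name)
        return 0, "Job finished (result=[true])"
    except OSError as e:
        return 1, "ERROR: %s" % e

# One outgoing row per execution, with the execution results as extra fields
output = []
for row in rows:
    nr_errors, log_text = run_job_capturing_results(row["folder_name"], row["file_name"])
    output.append({**row, "ExecutionNrErrors": nr_errors, "ExecutionLogText": log_text})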
As you know, jobs don't work with datasets; transformations do. However, you can still use the Job Executor to send the rows to the Job. Then, any transformation executed by the Job can read those rows using a Get rows from result step.
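Here is a plain-Python picture of that row passing, again with hypothetical stand-ins: the executor hands the incoming rows to the Job, and a transformation inside the Job picks them up, just as a Get rows from result step would:

def inner_transformation(result_rows):
    # Stand-in for a transformation that starts with a Get rows from result step
    for row in result_rows:
        print("processing", row)

def job_sending_rows(result_rows):
    # The Job simply runs the transformation, which receives the rows
    inner_transformation(result_rows)

# The Job Executor sends the incoming rows to the Job
job_sending_rows(rows)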
By default, the Job Executor executes the Job once for every row in your dataset, but there are several alternatives that you can configure in the Row grouping tab of the configuration window:
- You can send a fixed number of rows to the Job, so that it executes once for each group of that size.
- You can group the rows on the values of a field, so that the Job executes once for each set of consecutive rows sharing the same value for that field.
- You can accumulate rows for a given time interval, so that the Job executes once for each group of rows collected in that period.
If the Job has named parameters, as in the example that we built, you provide values for them in the Parameters tab of the Job Executor step. For each named parameter, you can assign the value of a field or a fixed (static) value. In case you execute the Job for a group of rows instead of a single one, the parameters will take their values from the first row of data sent to the Job.
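The following sketch shows the grouping behavior in the same plain-Python terms: with a group size of two, the Job runs once per group, and the named parameters take their values from the first row of each group:

GROUP_SIZE = 2  # number of rows to send to the Job on each execution

for start in range(0, len(rows), GROUP_SIZE):
    group = rows[start:start + GROUP_SIZE]
    first = group[0]
    # Parameters come from the first row of the group; the whole group
    # is still available to the Job as rows from result
    run_job(first["folder_name"], first["file_name"])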
At the output of the Job Executor, there is also the possibility to get the result filenames, that is, the files created or referenced by the executed Job. Let's modify the Transformation that we created to show an example of this kind of output:
1. Drag a Write to log step to the work area.
2. Create a hop from the Job Executor toward the Write to log step. When asked for the kind of hop, choose This output will contain the result file names after execution.
3. Double-click the Write to log step and add the filename field to the Fields grid.
4. Double-click the Job Executor and select the Result files tab.
5. Configure it as shown, providing a name for the field that will carry the filenames:
6. Close the window, save the transformation, and run it. For each file created by the Job, the Write to log step writes lines like the following to the log:
...
... - Write to log.0 -
... - Write to log.0 - ------------> Linenr 1------------------------------
... - Write to log.0 - filename = file:///c:/pentaho/files/folder1/sample_5agh7lj6ncqh7.tmp
... - Write to log.0 -
... - Write to log.0 - ====================
... - Write to log.0 -
... - Write to log.0 - ------------> Linenr 2------------------------------
... - Write to log.0 - filename = file:///c:/pentaho/files/folder2/sample_6n0rhmrpvj21n.tmp
... - Write to log.0 -
... - Write to log.0 - ====================
... - Write to log.0 -
... - Write to log.0 - ------------> Linenr 3------------------------------
... - Write to log.0 - filename = file:///c:/pentaho/files/folder3/sample_7ulkja68vf1td.tmp
... - Write to log.0 -
... - Write to log.0 - ====================
...
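In the same plain-Python terms, the result files output works like this: each execution of the Job adds the files it creates to a result filelist, and the executor emits one row per filename. The filename field mirrors the log shown above:

import os

result_files = []
for row in rows:
    run_job(row["folder_name"], row["file_name"])
    # The Job registers every file it creates in the result filelist
    path = os.path.abspath(os.path.join(row["folder_name"], row["file_name"]))
    result_files.append({"filename": "file:///" + path.replace(os.sep, "/")})

# The executor emits one row per result file
for result_file in result_files:
    print(result_file)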
The example that you just created showed how to capture the result filenames at the output of a Job Executor step.
We learned how to nest jobs and iterate the execution of jobs. You can learn more about executing transformations in an iterative way, and about launching transformations and jobs from the command line, in the book Learning Pentaho Data Integration 8 CE - Third Edition.