
You're reading from Instant Pentaho Data Integration Kitchen

Author: Sergio Ramazzina. Publisher: Packt. Published: July 2013. Edition: 1st. ISBN-13: 9781849696906. Reading level: Beginner. Topics: Java, Pentaho, Business Intelligence.

Executing PDI jobs from a filesystem (Simple)

This recipe guides you through starting a PDI job with the Kitchen script. In this case, the job we are going to start is stored locally on the computer's filesystem, but it could just as well be stored in any network location that is directly accessible from that computer. You will learn how to start simple jobs both with and without a set of input parameters previously defined in the job.

Command-line scripts are not only a fast way to start batch processes; they are also the easiest way to schedule our jobs with the operating system's scheduler. The Kitchen script accepts a set of inline arguments that supply the options the program needs to run our job in each specific situation.
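As an illustration of the scheduling use case, the following POSIX shell sketch wraps a Kitchen call so that a scheduler such as cron can run it and detect failures through the exit status. The function name run_kitchen_job, the PDI_HOME variable, and the default /opt/pentaho/data-integration location are assumptions made for this example, not part of PDI; the -file argument is the one described in the steps below.

```shell
#!/bin/sh
# Hypothetical wrapper for scheduled runs: starts one PDI job with Kitchen
# and propagates Kitchen's exit status (0 means success).
# ASSUMPTION: PDI is installed in $PDI_HOME, defaulting to the path below.

run_kitchen_job() {
    job_file=$1
    shift
    kitchen="${PDI_HOME:-/opt/pentaho/data-integration}/kitchen.sh"

    if [ ! -x "$kitchen" ]; then
        echo "kitchen.sh not found or not executable: $kitchen" >&2
        return 127
    fi

    # Extra arguments (for example -param:NAME=VALUE) are passed through as-is.
    "$kitchen" -file:"$job_file" "$@"
    status=$?

    if [ "$status" -ne 0 ]; then
        echo "Job $job_file failed with exit status $status" >&2
    fi
    return "$status"
}
```

For example, a script that defines this function and then calls `run_kitchen_job /home/sramazzina/tmp/samples/export-job.kjb` could be scheduled with a crontab line such as `0 2 * * * /path/to/run-export.sh`, where the script path is hypothetical. Because cron starts jobs with a minimal environment, it is safer to set JAVA_HOME and PDI_HOME explicitly inside that script.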

Getting ready

To get ready for this recipe, we first need to check that our Java environment is configured properly, that is, that the JAVA_HOME environment variable is set. Although the PDI scripts call other scripts at startup that try to detect our Java execution environment and determine the value of JAVA_HOME, it is always a good rule of thumb to set that variable properly whenever we work with a Java application.
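On Linux or Mac, this check can be scripted with a small POSIX shell function such as the following sketch. The function name check_java_home is hypothetical; it only verifies that JAVA_HOME is set and that $JAVA_HOME/bin/java exists and is executable.

```shell
#!/bin/sh
# Minimal check that JAVA_HOME is set and points to a usable Java runtime.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME ($JAVA_HOME) does not contain an executable bin/java" >&2
        return 1
    fi
    echo "Using Java from $JAVA_HOME"
}
```

If the check succeeds, running `"$JAVA_HOME/bin/java" -version` shows which Java runtime the variable points to.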

The Kitchen script is located in the PDI home directory, so the easiest way to launch it is to add the PDI home directory to the PATH variable. This lets you start the Kitchen script from any directory without specifying the absolute path to the script. If you do not do this, you will always have to specify the complete path to the Kitchen script file.
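On Linux or Mac, this is usually done in a shell startup file such as ~/.profile. The sketch below is one way to do it without adding the same directory twice; the function name add_pdi_to_path and the /opt/pentaho/data-integration location used in the usage note are assumptions, so substitute your own PDI home directory.

```shell
#!/bin/sh
# Prepend a PDI home directory to PATH unless it is already there, so that
# kitchen.sh and pan.sh can be started from any directory.
add_pdi_to_path() {
    pdi_home=$1
    case ":$PATH:" in
        *":$pdi_home:"*) ;;              # already on PATH: leave it unchanged
        *) PATH="$pdi_home:$PATH" ;;     # prepend so this PDI copy is found first
    esac
    export PATH
}
```

For example, `add_pdi_to_path /opt/pentaho/data-integration` followed by `command -v kitchen.sh` should print the full path to the Kitchen script if the directory is correct.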

To work through this recipe, we will use the samples in the <book_samples>/sample1 directory, where <book_samples> is the directory in which you unpacked the book's samples.

How to do it…

To start a PDI job on Linux or Mac, use the following steps:

Open the command-line terminal and go to the <book_samples>/sample1 directory.
Let's start the sample job. To tell Kitchen which job file to start, we use the -file argument with the following syntax:
```
-file:<complete_filename_to_job_file>
```
You can specify either an absolute or a relative path to the file. The simplest way to start the job is as follows:
```
$ kitchen.sh -file:./export-job.kjb
```
If you are not in the directory where the job file is located, you must specify the complete path to the job file, as follows:
```
$ kitchen.sh -file:/home/sramazzina/tmp/samples/export-job.kjb
```
Another option is to specify separately the directory where the job file is located and the name of the job file. To do this, we use the -dir argument together with the -file argument. The -dir argument specifies the location of the job file's directory, using the following syntax:
```
-dir:<complete_path_to_job_file_directory>
```
So, if we are in the same directory where the job resides, we can start the job with the following syntax:
```
$ kitchen.sh -dir:. -file:export-job.kjb
```
If we start the job from a directory other than the one where the job resides, we can use the -dir argument with an absolute path to set the job's directory, as follows:
```
$ kitchen.sh -dir:/home/sramazzina/tmp/samples -file:export-job.kjb
```
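As the steps above show, a relative -file path only works when Kitchen is started from the job's directory. A wrapper can make the command independent of the current directory by resolving the path to an absolute one first. The following POSIX shell sketch does that; run_job_file is a hypothetical helper name, and it assumes kitchen.sh is on PATH as described in Getting ready.

```shell
#!/bin/sh
# Sketch: start a job with Kitchen using an absolute -file path, so the
# command behaves the same whatever the current directory is.
# ASSUMPTION: kitchen.sh is reachable through PATH.
run_job_file() {
    job=$1
    shift
    if [ ! -f "$job" ]; then
        echo "Job file not found: $job" >&2
        return 1
    fi
    # Resolve the directory part to an absolute path and keep the file name.
    job_dir=$(cd "$(dirname "$job")" && pwd) || return 1
    job_abs="$job_dir/$(basename "$job")"
    kitchen.sh -file:"$job_abs" "$@"
}
```

Run from <book_samples>/sample1, `run_job_file ./export-job.kjb` calls kitchen.sh with -file: followed by the absolute path of export-job.kjb, which is the same form as the absolute-path example above.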

To start a PDI job with parameters on Linux or Mac, perform the following steps:

PDI jobs can accept named input parameters. To set them from the command line, we use the -param argument to specify each parameter for the job we are going to launch. The syntax is as follows:
```
-param:<parameter_name>=<parameter_value>
```
Our sample job and transformation accept a parameter called p_country that specifies the country whose customers we want to export to a file. Suppose we are in the same directory where the job file resides and we want to call our job to extract all the customers for the country USA. In this case, we can call the Kitchen script as follows:
```
$ kitchen.sh -param:p_country=USA -file:./export-job.kjb
```
Of course, you can add the -param argument to each of the other three cases we described previously.
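When a job takes several parameters, each one is passed with its own -param argument. The following POSIX shell sketch builds those arguments from NAME=VALUE pairs and keeps each pair as a single argument, so values containing spaces are not split. The helper name run_job_with_params is hypothetical, and it assumes kitchen.sh is on PATH.

```shell
#!/bin/sh
# Sketch: turn NAME=VALUE pairs into Kitchen -param arguments and start the job.
# Usage: run_job_with_params <job_file> NAME=VALUE [NAME=VALUE ...]
run_job_with_params() {
    job=$1
    shift
    # Reject anything that does not look like NAME=VALUE.
    for pair in "$@"; do
        case $pair in
            ?*=*) ;;
            *) echo "Not a NAME=VALUE pair: $pair" >&2; return 2 ;;
        esac
    done
    # Rebuild the positional parameters as -param:NAME=VALUE arguments:
    # append the prefixed copy, then drop the original, once per pair.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" "-param:$1"
        shift
        n=$((n - 1))
    done
    kitchen.sh "$@" -file:"$job"
}
```

For example, `run_job_with_params ./export-job.kjb p_country=USA` runs `kitchen.sh -param:p_country=USA -file:./export-job.kjb`, the same command as above.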

To start a PDI job on Windows, use the following steps:

On Windows, a PDI job stored on the filesystem is started following the same rules we saw previously, with the same arguments used in the same way. The main difference is in how we write the command-line arguments.
Whenever we start PDI jobs on Windows, we need to prefix the arguments with the / character instead of the - character used on Linux or Mac. This means that:
```
-file:<complete_filename_to_job_file>
```
becomes:
```
/file:<complete_filename_to_job_file>
```
And:
```
-dir:<complete_path_to_job_file_directory>
```
becomes:
```
/dir:<complete_path_to_job_file_directory>
```
From the <book_samples>/sample1 directory, you can start the job by running the Kitchen script as follows:
```
C:\temp\samples>Kitchen.bat /file:./export-job.kjb
```
Regarding PDI parameters on the command line, the second important difference on Windows is that we need to replace the = character in the parameter assignment with the : character. This means that:
```
-param:<parameter_name>=<parameter_value>
```
becomes:
```
/param:<parameter_name>:<parameter_value>
```
From the <book_samples>/sample1 directory, if you want to extract all the customers for the country USA, you can start the job as follows:
```
C:\temp\samples>Kitchen.bat /param:p_country:USA /file:./export-job.kjb
```
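To make the two Windows rules concrete, the following POSIX shell sketch applies them mechanically to a list of Linux-style arguments: the leading - becomes /, and in a -param argument the first = becomes :. The function to_windows_args is purely illustrative; it encodes the rules as this recipe states them and does not call PDI.

```shell
#!/bin/sh
# Illustration of the recipe's Linux-to-Windows rules:
#   -option:value       ->  /option:value
#   -param:NAME=VALUE   ->  /param:NAME:VALUE
# Prints one translated argument per line.
to_windows_args() {
    for arg in "$@"; do
        case $arg in
            -param:*=*)
                body=${arg#-param:}     # NAME=VALUE
                name=${body%%=*}        # text before the first '='
                value=${body#*=}        # text after the first '='
                printf '%s\n' "/param:$name:$value"
                ;;
            -*)
                printf '%s\n' "/${arg#-}"
                ;;
            *)
                printf '%s\n' "$arg"
                ;;
        esac
    done
}
```

For example, `to_windows_args -param:p_country=USA -file:./export-job.kjb` prints /param:p_country:USA and /file:./export-job.kjb on separate lines, matching the Kitchen.bat command above.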

To start PDI transformations, perform the following steps:

The Pan script starts PDI transformations. On Linux or Mac, you can find the pan.sh script in the PDI home directory. Assuming that you are in the <book_samples>/sample1 directory, where the transformation is located, you can start a simple transformation with the following command:
```
$ pan.sh -file:./read-customers.ktr
```
If you want to start a transformation and specify some parameters, use the following command:
```
$ pan.sh -param:p_country=USA -file:./read-customers.ktr
```
On Windows, use the Pan.bat script; the sample command is as follows:
```
C:\temp\samples>Pan.bat /file:./read-customers.ktr
```
Again, if you want to start a transformation and specify some parameters, follow the Windows parameter syntax described earlier and use the following command:
```
C:\temp\samples>Pan.bat /param:p_country:USA /file:./read-customers.ktr
```
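Since Kitchen runs jobs (.kjb files) and Pan runs transformations (.ktr files), a small dispatcher can choose the right script from the file extension. This POSIX shell sketch assumes both kitchen.sh and pan.sh are on PATH; run_pdi_file is a hypothetical name, and any extra arguments, such as -param:p_country=USA, are placed before -file as in the examples above.

```shell
#!/bin/sh
# Sketch: pick the PDI launcher from the file extension.
#   .kjb (job)            -> kitchen.sh
#   .ktr (transformation) -> pan.sh
# ASSUMPTION: both scripts are reachable through PATH.
run_pdi_file() {
    target=$1
    shift
    case $target in
        *.kjb) launcher=kitchen.sh ;;
        *.ktr) launcher=pan.sh ;;
        *) echo "Unknown PDI file type: $target" >&2; return 2 ;;
    esac
    "$launcher" "$@" -file:"$target"
}
```

For example, `run_pdi_file ./read-customers.ktr -param:p_country=USA` runs the same pan.sh command shown above, while `run_pdi_file ./export-job.kjb` starts the job with kitchen.sh.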


About the author

Sergio Ramazzina is an experienced software architect and trainer with more than 25 years of experience in the IT field. He has worked on a broad range of projects for banks and major Italian companies and has designed complex enterprise solutions in Java, JavaEE, and Ruby. He started using Pentaho products from the very beginning, in late 2003. He gained thorough experience by deploying Pentaho as an open source BI solution, either standalone or deeply integrated into other applications as the analytical engine of choice. In 2009, drawing on his experience in the Java/JavaEE world and his appreciation for the open source world and its main ideas, he began contributing actively to several Pentaho projects, such as JPivot, Saiku, CDF, and CDA, and rose to the Pentaho Active Contributor level. At that time, he started working as a BI architect and Pentaho expert on a wide range of projects in which open source BI and Pentaho were the main players. In late 2010, he founded Serasoft, a young Italian consulting firm that specializes in delivering high-value open source Business Intelligence solutions. With the Serasoft team, he shared his passion and experience in designing and delivering highly innovative enterprise solutions that help users make their work more effective. In July 2013, he published his first book, Instant Pentaho Data Integration Kitchen, Packt Publishing. He is also passionate about skiing, tennis, and photography, and he loves his young daughter, Camilla, very much. You can follow him on Twitter at @sramazzina. You can also look at his profile on LinkedIn at http://it.linkedin.com/in/sramazzina/.