
You're reading from Pentaho Data Integration Quick Start Guide

Product type: Book
Published in: Aug 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789343328
Edition: 1st Edition
María Carina Roldán

María Carina Roldán was born in Argentina and has a bachelor's degree in computer science. She started working with Pentaho back in 2006. She spent all these years developing BI solutions, mainly as an ETL specialist, and working for different companies around the world. Currently, she lives in Buenos Aires and works as an independent consultant. Carina is the author of Learning Pentaho Data Integration 8 CE, published by Packt in December 2017. She has also authored other books on Pentaho, all of them published by Packt.

Chapter 2. Getting Familiar with Spoon

This chapter will show you how to work with Spoon by designing, debugging, and testing a transformation. In addition to exploring Spoon features, you will also learn the basics of handling errors while designing a transformation.

This chapter will cover the following topics:

  • Exploring the Spoon interface
  • Designing, previewing, and running transformations
  • Defining and using Kettle variables
  • Running transformations with the Pan utility

Exploring the Spoon interface


In Chapter 1, Getting Started with PDI, you used Spoon to create your first transformation. In this chapter, you will learn more about the experience of working with Spoon. First, let's take a look at its interface. The following screenshot shows you the different areas, menus, and toolboxes present in Spoon:

Spoon interface

The following list briefly describes each component shown in the preceding screenshot:

  • Main Menu: This menu includes general options, such as opening and saving files (namely, transformations and jobs), editing and searching features, configuration settings, and help options.

Note

Most of the options in the main menu have keyboard shortcuts that you can memorize and use, if you prefer.


  • Main Toolbar: This toolbar serves as an alternative way to create, open, and save files.
  • Transformation Toolbar: This toolbar contains options for running, previewing, and validating the open transformation.
  • Work Area: This is the area where you create...

Designing, previewing, and running transformations


In this section, we will create a transformation that is a bit more interesting than the one you already built. In doing this, you will have a chance to learn about the process of designing transformations, while also previewing your work.

The task is as follows: you will be given a file with a list of cities in the USA, along with their zip codes and their state names. You will have to generate a file containing only the cities in the state of NY, sorted by zip code. We will split the task into the following steps:

  • Designing and previewing the transformation
  • Learning to deal with errors that may appear
  • Saving and running the transformation
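To make the goal of the task concrete before building it in Spoon, here is a minimal Python sketch of the same logic: read a CSV file, keep only the NY rows, and sort them by zip code. It is not PDI code and does not show how PDI works internally; it only mirrors the read, filter, and sort stages conceptually. The column names (city, zip_code, state), the assumption that the state column holds the abbreviation NY, and the sample rows are all hypothetical.

```python
import csv
import io


def ny_cities_by_zip(csv_text: str, state: str = "NY") -> list[dict]:
    """Keep rows whose state matches `state`, sorted by zip code.

    Column names (city, zip_code, state) are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Filter stage: keep only rows for the requested state.
    rows = [row for row in reader if row["state"] == state]
    # Sort stage: zip codes are compared as fixed-width strings,
    # so leading zeros (e.g. 02108) are preserved.
    rows.sort(key=lambda row: row["zip_code"])
    return rows


def write_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Output stage: serialize the rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data.
sample = """city,zip_code,state
Albany,12201,NY
Boston,02108,MA
Brooklyn,11201,NY
Buffalo,14201,NY
"""

result = ny_cities_by_zip(sample)
print(write_csv(result, ["city", "zip_code", "state"]))
```

Treating zip codes as strings rather than integers is a deliberate choice here: five-digit codes sort correctly as text, and converting them to numbers would drop leading zeros.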

Designing and previewing a transformation

Let's start by developing the first part of the transformation. We will read the file and filter the data. In this case, the solution is quite straightforward (this will not always be the case). There is a PDI step for each of the tasks to accomplish. The CSV file input...

Defining and using Kettle variables


In PDI, you can define and use variables, just as you would in any programming language. We already defined a couple of variables when we created the kettle.properties file in Chapter 1, Getting Started with PDI. Now, we will see where and how to use them.
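For reference, kettle.properties is a plain text file in Java properties format, with one variable per line. The following Python sketch reads simple key=value lines to show what such a file holds. It is a deliberately simplified reader, not the parser PDI uses: it ignores comments starting with # or !, but does not handle escapes, the ':' separator, or line continuations that the full format allows. The variable names and paths in the sample are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Simplified reader for key=value lines in Java properties style.

    Skips blank lines and comment lines starting with '#' or '!'.
    Does NOT handle escapes, ':' separators, or line continuations.
    """
    props = {}
    for raw in text.splitlines():
        # Leading whitespace is ignored, as in the real format.
        line = raw.lstrip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # this sketch skips lines without '='
            props[key.strip()] = value.lstrip()
    return props


# Hypothetical kettle.properties content.
sample = """
# Variables used in this book (names are hypothetical)
INPUT_FOLDER=/home/pdi/input
OUTPUT_FOLDER = /home/pdi/output
"""

print(parse_properties(sample))
```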

It's simple: any time you see a dollar sign next to a textbox, you can use a variable:

Sample textboxes that allow variables

You can reference a variable by enclosing its name in curly braces, preceded by a dollar sign (for example, ${INPUT_FOLDER}).

Note

A less commonly used notation for a variable is as follows: %%<variable name>%% (for example, %%INPUT_FOLDER%%).
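The idea behind both notations is plain text substitution: wherever ${NAME} or %%NAME%% appears in a field, the variable's value is used instead. The following Python sketch illustrates that idea; it is not PDI's implementation. The function name, the sample file name us_cities.csv, and the folder value are hypothetical, and leaving unknown names untouched is simply the choice made in this sketch.

```python
import re

# Matches ${NAME} (group 1) or %%NAME%% (group 2).
_VAR_PATTERN = re.compile(r"\$\{([^}]+)\}|%%([^%]+)%%")


def substitute(text: str, variables: dict[str, str]) -> str:
    """Replace ${NAME} and %%NAME%% with values from `variables`.

    Unknown names are left as they are (a choice made for this sketch).
    """
    def replace(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))

    return _VAR_PATTERN.sub(replace, text)


variables = {"INPUT_FOLDER": "/home/pdi/input"}  # hypothetical value
print(substitute("${INPUT_FOLDER}/us_cities.csv", variables))
print(substitute("%%INPUT_FOLDER%%/us_cities.csv", variables))
print(substitute("${UNKNOWN}/file.csv", variables))
```

Both notations resolve to the same value, so the choice between them is a matter of style; the ${...} form is the one you will see most often in PDI.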

Let's go back to the transformation created in the previous section. Instead of a fixed value for the location of the output file, we will use variables. Here is how to do it:

  1. Open the transformation (if you closed it). You can do this from the Main Menu or the Main Toolbar.
  2. Double-click on the Text file output step....

Running transformations with the Pan utility


So far, you have used Spoon to create and run transformations. However, to run a transformation in a production environment, you won't use Spoon; instead, you will use a command-line utility named Pan.

Let's quickly look at how to use this tool.

If you browse the PDI installation directory, you will see two versions of the utility: Pan.bat and pan.sh. Use the first on Windows and the second on Linux, macOS, and other Unix-like systems.

Note

In the next step-by-step tutorial, we will assume that you have Windows, but you should make the required adjustments if you have a different system.


The simplest way to run a transformation with Pan is to provide the full path of the transformation that you want to run. On Windows, you can execute Pan as follows:

Pan.bat /file=<ktr file name>

For Unix, Linux, and other Unix-like systems, use the following command:

./pan.sh /file=<ktr file name>
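If you schedule transformations from another program, you may want to build this command programmatically. The following Python sketch assembles the same command line, using the /file= option exactly as shown above, and runs it with subprocess. The installation path and the .ktr file name are hypothetical, and the function names are illustrative only. The sketch assumes the standard PDI layout, where the Unix script is named in lowercase (pan.sh) and is executable, and it relies only on the conventional rule that an exit code of 0 means success.

```python
import os
import subprocess


def build_pan_command(pdi_dir: str, ktr_file: str, windows: bool) -> list[str]:
    """Build the Pan command line as an argument list.

    Standard PDI downloads name the Unix script in lowercase (pan.sh).
    """
    script = "Pan.bat" if windows else "pan.sh"
    # Same /file=<ktr file name> form as in the commands above.
    return [os.path.join(pdi_dir, script), f"/file={ktr_file}"]


def run_transformation(pdi_dir: str, ktr_file: str) -> int:
    """Run Pan and return its exit code (0 means success)."""
    cmd = build_pan_command(pdi_dir, ktr_file, windows=(os.name == "nt"))
    # Run from the installation folder, as you would from a terminal.
    completed = subprocess.run(cmd, cwd=pdi_dir)
    return completed.returncode


# Hypothetical paths; prints the command without running Pan.
print(build_pan_command("/opt/pentaho/data-integration",
                        "/home/pdi/ny_cities.ktr", windows=False))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, and checking the return code lets a calling script stop when a transformation fails.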

Let's suppose that you want to run the first transformation...

Summary


This chapter served to help you get used to Spoon, the PDI graphical designer. First, you learned how to work with the tool when you created, previewed, and ran transformations. When you worked with transformations, you had the opportunity to use Kettle variables, both predefined and user defined. You also learned how to deal with common errors. Finally, you experimented with the Pan utility, which is used for running transformations from the command line.

Now that you have seen an overview of the tool, you're ready to get into the details of extracting data. That will be the subject of Chapter 3, Extracting Data.
