Pentaho 3.2 Data Integration: Beginner's Guide

Product type Book
Published in Apr 2010
Publisher Packt
ISBN-13 9781847199546
Pages 492 pages
Edition 1st Edition

Table of Contents (27 chapters)

Pentaho 3.2 Data Integration Beginner's Guide
Credits
Foreword
The Kettle Project
About the Author
About the Reviewers
Preface
1. Getting Started with Pentaho Data Integration
2. Getting Started with Transformations
3. Basic Data Manipulation
4. Controlling the Flow of Data
5. Transforming Your Data with JavaScript Code and the JavaScript Step
6. Transforming the Row Set
7. Validating Data and Handling Errors
8. Working with Databases
9. Performing Advanced Operations with Databases
10. Creating Basic Task Flows
11. Creating Advanced Transformations and Jobs
12. Developing and Implementing a Simple Datamart
13. Taking it Further
Working with Repositories
Pan and Kitchen: Launching Transformations and Jobs from the Command Line
Quick Reference: Steps and Job Entries
Spoon Shortcuts
Introducing PDI 4 Features
Pop Quiz Answers
Index

Chapter 5. Transforming Your Data with JavaScript Code and the JavaScript Step

Whichever transformation you need to do on your data, there is a good chance that PDI steps can do the job. Even so, it may happen that no step suits your requirements, or that an apparently minor transformation requires many steps linked in a confusing arrangement that is hard to test or understand. Putting colorful icons here and there is fun and practical, but in situations like the ones described above, you will inevitably have to code. This chapter explains how to do it with JavaScript and the special JavaScript step.

In this chapter you will learn how to:

  • Insert and test JavaScript code in your transformations

  • Distinguish situations where coding is the best option from those where there are better alternatives

Doing simple tasks with the JavaScript step


One of PDI's traditional steps is the JavaScript step, which allows you to write code inside PDI. In this section, you will learn how to use it for simple tasks.

Time for action – calculating scores with JavaScript


The International Musical Contest mentioned in Chapter 4 has already taken place. Each duet performed twice. In the first performance, technical skills were evaluated; in the second, the focus was on artistic performance.

Each performance was assessed by a panel of five judges who awarded a mark out of a possible 10.

The following is the detailed list of scores:

Note that the fields don't fit on the screen, so the lines are wrapped, and dotted lines have been added so that you can tell the lines apart.

Now you have to calculate, for each evaluated skill, the overall score as well as an average score.

  1. Download the sample file from the Packt website.

  2. Create a transformation and drag a Fixed file input step to the canvas to read the file.

  3. Fill in the configuration window as follows:

  4. Press the Get Fields button. A window appears to help you define the columns.

  5. Click between the fields to add markers that define the limits. The window will look like this:

  6. Click on Next...
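The Fixed file input step splits every line at the positions you mark in this dialog. The following standalone Node.js sketch illustrates that idea; it is not PDI's implementation, and the column names and widths in the hypothetical `LAYOUT` array are assumptions, because the real layout of the sample file is not reproduced in this excerpt.

```javascript
// Hypothetical column layout for the scores file. In PDI, the real names
// and widths come from the markers you place in the Fixed file input step.
const LAYOUT = [
  { name: "code", width: 4, type: "string" },
  { name: "skill", width: 10, type: "string" },
  { name: "judge1", width: 3, type: "number" },
  { name: "judge2", width: 3, type: "number" },
  { name: "judge3", width: 3, type: "number" },
  { name: "judge4", width: 3, type: "number" },
  { name: "judge5", width: 3, type: "number" },
];

// Cut one line into fields at fixed positions, trim the padding,
// and convert numeric columns. A missing numeric value becomes null.
function parseFixedWidth(line, layout) {
  const row = {};
  let start = 0;
  for (const { name, width, type } of layout) {
    const raw = line.slice(start, start + width).trim();
    row[name] = type === "number" ? (raw === "" ? null : Number(raw)) : raw;
    start += width;
  }
  return row;
}

console.log(parseFixedWidth("0001technical   8  9  7 10  8", LAYOUT));
```

Describing the layout as data, rather than hard-coding offsets in the loop, mirrors what the markers in the dialog do: change the markers and the same parsing logic still applies.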

Time for action – testing the calculation of averages


Let's test the code you've just created.

  1. Double-click the JavaScript step.

  2. Click on the Test script button.

  3. A window appears to create a set of rows for testing. Fill it in as follows:

  4. Click on Preview the transformation. A window appears showing five identical rows with the provided sample values. Close the preview window.

  5. Click on OK to test the code.

    A window appears showing the result of executing the script with the test data.

What just happened?

You tested the code of the JavaScript step.

You clicked on the Test script button, and created a dataset that served as the basis for testing the script. You previewed the test dataset.

After that, you ran the test itself. A window appeared showing what the dataset looks like after the execution of the script: the totalScore and wAverage fields were added, and the skill field was converted to uppercase.
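The script itself is not shown in this excerpt, so here is a rough standalone sketch, in plain JavaScript for Node.js, of the per-row logic the test describes: it adds `totalScore` and `wAverage` and converts `skill` to uppercase. The field names `judge1` to `judge5`, the function `scoreRow`, and the weights in `WEIGHTS` are assumptions for illustration. Inside the JavaScript step, input fields are referenced by name and new fields are declared in the step's fields grid; this sketch simply models each row as an object and mimics the Test script dialog by building five identical rows.

```javascript
// Hypothetical weights for judges 1 to 5. The chapter's actual weights
// are not given in this excerpt.
const WEIGHTS = [2, 1, 1, 1, 1];

// Per-row logic: sum the five marks, compute a weighted average,
// and convert the skill name to uppercase. The input row is not modified.
function scoreRow(row, weights) {
  const marks = [row.judge1, row.judge2, row.judge3, row.judge4, row.judge5];
  const totalScore = marks.reduce((sum, m) => sum + m, 0);
  const weightSum = weights.reduce((sum, w) => sum + w, 0);
  const weightedSum = marks.reduce((sum, m, i) => sum + m * weights[i], 0);
  return {
    ...row,
    skill: row.skill.toUpperCase(),
    totalScore,
    wAverage: weightedSum / weightSum,
  };
}

// Mimic the Test script dialog: five identical sample rows.
const sample = { skill: "technical", judge1: 9, judge2: 8, judge3: 7, judge4: 10, judge5: 8 };
const testRows = Array.from({ length: 5 }, () => ({ ...sample }));
const results = testRows.map((row) => scoreRow(row, WEIGHTS));
console.log(results[0]); // includes skill: 'TECHNICAL', totalScore: 42, wAverage: 8.5
```

Keeping the weights in a separate array, instead of writing them into the formula, is what makes it easy to supply them from outside, which is the subject of the next section.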

Testing the script using the Test script button

The Test script button...

Enriching the code


In the previous section, you learned how to insert code in your transformation by using a JavaScript step. In this section, you will see how to use variables defined outside the step to make your code more flexible. You will also learn how to take control of the rows from inside the JavaScript step.

Time for action – calculating flexible scores by using variables


Suppose that, at the time you are creating the transformation, the weights for calculating the weighted average are still unknown. You can make the transformation flexible by using parameters. Let's do it:

  1. Open the transformation of the previous section and save it with a new name.

  2. Press Ctrl+T to open the Transformation properties dialog window.

  3. Select the Parameters tab and fill it in as follows:

  4. Replace the JavaScript step with a new one and double-click it.

  5. Expand the Transform Scripts branch of the tree at the left of the window.

  6. Right-click the script named Script 1, select Rename, and type main as the new name.

  7. Position the mouse cursor over the editing window and right-click to bring up the following contextual menu:

  8. Select Add new to add the script, which will execute before your main code.

  9. A new script window appears. The script is added to the list of scripts under Transform Scripts.

  10. Bring up the contextual menu again, but this time clicking...
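The reason for having a separate script that executes before the main code is that it runs once, before any row is processed, while the main script runs for every row. The following Node.js sketch models that split with two ordinary functions. The parameter names `W1` to `W5`, the functions `start` and `main`, and the validation rules are assumptions; in PDI the parameter values would be read from inside the step rather than passed in as an object.

```javascript
// Hypothetical named parameters W1..W5 (one weight per judge), as they
// might be defined in the Parameters tab. Parameter values are strings.
const params = { W1: "2", W2: "1", W3: "1", W4: "1", W5: "1" };

// "Start" script: runs once, before the first row. Parse and validate
// the weights here so the per-row code does not repeat the work.
function start(p) {
  const weights = ["W1", "W2", "W3", "W4", "W5"].map((name) => {
    const raw = p[name];
    const w = Number(raw);
    if (typeof raw !== "string" || raw.trim() === "" || !Number.isFinite(w) || w < 0) {
      throw new Error(`Parameter ${name} is not a valid weight: ${raw}`);
    }
    return w;
  });
  const weightSum = weights.reduce((sum, w) => sum + w, 0);
  if (weightSum === 0) {
    throw new Error("At least one weight must be greater than zero");
  }
  return { weights, weightSum };
}

// "Main" script: runs once per row, reusing what start() prepared.
function main(row, ctx) {
  const marks = [row.judge1, row.judge2, row.judge3, row.judge4, row.judge5];
  const weightedSum = marks.reduce((sum, m, i) => sum + m * ctx.weights[i], 0);
  return { ...row, wAverage: weightedSum / ctx.weightSum };
}

const ctx = start(params);
console.log(main({ judge1: 9, judge2: 8, judge3: 7, judge4: 10, judge5: 8 }, ctx).wAverage); // 8.5
```

Parsing and validating the weights once both saves work and makes a bad parameter fail immediately, before any row has been processed.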

Reading and parsing unstructured files


It is marvelous to have input files where the information is well formed; that is, the number of columns and their data types are precise, all rows follow the same pattern, and so on. However, it is common to find input files where the information has little or no structure, or where the structure doesn't follow the matrix (n rows by m columns) you expect. In this section, you will learn how to deal with such files.

Time for action – changing a list of house descriptions with JavaScript


You won the lottery and decided to invest the money in a new house. You asked a real-estate agency for a list of candidate houses, and it gave you this:

...
Property Code: MCX-011
Status: Active
5 bedrooms
5 baths
Style: Contemporary
Basement
Laundry room
Fireplace
2 car garage
Central air conditioning
More Features: Attic, Clothes dryer, Clothes washer, Dishwasher

Property Code: MCX-012
4 bedrooms
3 baths
Fireplace
Attached parking
More Features: Alarm System, Eat in Kitchen, Powder Room

Property Code: MCX-013
3 bedrooms
...

You want to compare the properties before visiting them, but you're finding it hard to do so because the file doesn't have a precise structure. Fortunately, you have the JavaScript step, which will help you give the file some structure.

  1. Create a new transformation.

  2. Get the sample file from Packt site and read it with a Text file input step. Uncheck the Header checkbox and create a single...
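One way to picture the goal is a function that turns the listing into one record per property. The sketch below is a standalone Node.js approach, not the chapter's solution. It assumes that each property starts with a `Property Code:` line, treats `N bedrooms` and `N baths` as numbers, stores other `Key: value` lines as attributes, and collects the remaining lines, plus the comma-separated `More Features:` list, as features. The function name `parseListings` and the record layout are hypothetical. There is one important difference from the real exercise: the JavaScript step receives the file one line per row, so a solution there has to remember the current property code from one row to the next, while this sketch sees the whole text at once.

```javascript
// Group the lines of the listing into one record per property.
function parseListings(text) {
  const houses = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines separate properties but carry no data
    let m;
    if ((m = line.match(/^Property Code:\s*(.+)$/))) {
      current = { code: m[1], attributes: {}, features: [] };
      houses.push(current);
    } else if (current === null) {
      continue; // ignore anything before the first property code
    } else if ((m = line.match(/^(\d+)\s+bedrooms?$/i))) {
      current.bedrooms = Number(m[1]);
    } else if ((m = line.match(/^(\d+)\s+baths?$/i))) {
      current.baths = Number(m[1]);
    } else if ((m = line.match(/^More Features:\s*(.+)$/))) {
      current.features.push(...m[1].split(",").map((f) => f.trim()));
    } else if ((m = line.match(/^([^:]+):\s*(.+)$/))) {
      // Other "Key: value" lines, such as Status or Style.
      current.attributes[m[1].trim().toLowerCase()] = m[2].trim();
    } else {
      current.features.push(line); // bare lines such as "Fireplace"
    }
  }
  return houses;
}

const sampleText = "Property Code: MCX-012\n4 bedrooms\n3 baths\nFireplace\nMore Features: Alarm System, Powder Room";
console.log(parseListings(sampleText));
```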

Avoiding coding by using purpose-built steps


Throughout the exercises, you saw how powerful the JavaScript step is for helping you in your transformations. In older versions of PDI, coding JavaScript was the only means you had for doing specific tasks. In the latest releases of PDI, purpose-built steps have appeared that eliminate the need for coding in many cases. Here are some examples:

  • Formula: You saw it in Chapter 3. Before this step appeared, many functions, such as the text functions, could only be implemented with JavaScript.

  • Analytic Query: This step offers a way to retrieve information from rows before or after the current one.

  • Split field to rows: This step creates several rows from a single string value. You used this step in Chapter 3 to create a new row for each word found in a file.

Analytic Query and Split field to rows are examples of steps that not only eliminate the need for coding but also eliminate the need for accessing internal objects...
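To make concrete what these steps save you from coding, the following Node.js sketch reimplements their core ideas in a few lines. It is an illustration only, not how PDI implements the steps: `splitFieldToRows` and `addPrevNext` are hypothetical names, the `_prev`/`_next` field suffixes are an assumed naming convention, and the grouping that Analytic Query supports is left out.

```javascript
// Core idea of "Split field to rows": one input row whose field holds a
// delimited string becomes one output row per non-empty token.
// This sketch keeps the original field and adds the token as newField.
function splitFieldToRows(rows, field, delimiter, newField) {
  const out = [];
  for (const row of rows) {
    for (const token of String(row[field]).split(delimiter)) {
      const value = token.trim();
      if (value !== "") out.push({ ...row, [newField]: value });
    }
  }
  return out;
}

// Core idea of a lag/lead lookup like Analytic Query's: copy a field from
// the row `offset` positions before and after the current one (null when
// no such row exists).
function addPrevNext(rows, field, offset = 1) {
  return rows.map((row, i) => ({
    ...row,
    [`${field}_prev`]: i - offset >= 0 ? rows[i - offset][field] : null,
    [`${field}_next`]: i + offset < rows.length ? rows[i + offset][field] : null,
  }));
}

const words = splitFieldToRows([{ line: 1, text: "to be or not" }], "text", " ", "word");
console.log(words.map((r) => r.word)); // [ 'to', 'be', 'or', 'not' ]
console.log(addPrevNext(words, "word")[1]);
```

Doing the same inside a JavaScript step means keeping state across rows or creating extra rows by hand, which is exactly the kind of code these dedicated steps make unnecessary.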

Summary


In this chapter, you learned to code JavaScript into PDI. Specifically, you learned:

  • What the JavaScript step is and how to use it

  • How to modify fields and add new fields to your dataset from inside your JavaScript step

  • How to deal with unstructured input data

You also considered the pros and cons of coding JavaScript inside your transformations, as well as alternative ways to do things, avoiding writing code when possible.

As a bonus, you learned the concept of named parameters.

If you feel confident with all you've learned so far, you are certainly ready to move on to the next chapter, where you will learn, in a simple fashion, how to solve some sophisticated problems such as normalizing data from pivot tables.
