
You're reading from Pentaho 3.2 Data Integration: Beginner's Guide

Product type: Book
Published: Apr 2010
Publisher: Packt
ISBN-13: 9781847199546
Pages: 492
Edition: 1st Edition
Languages: Java
Concepts: Business Intelligence

Table of Contents (27 chapters)

Pentaho 3.2 Data Integration Beginner's Guide

Credits

Foreword

The Kettle Project

About the Author

About the Reviewers

Preface

1. Getting Started with Pentaho Data Integration

2. Getting Started with Transformations

3. Basic Data Manipulation

4. Controlling the Flow of Data

5. Transforming Your Data with JavaScript Code and the JavaScript Step

6. Transforming the Row Set

7. Validating Data and Handling Errors

8. Working with Databases

9. Performing Advanced Operations with Databases

10. Creating Basic Task Flows

11. Creating Advanced Transformations and Jobs

12. Developing and Implementing a Simple Datamart

13. Taking it Further

Working with Repositories

Pan and Kitchen: Launching Transformations and Jobs from the Command Line

Quick Reference: Steps and Job Entries

Spoon Shortcuts

Introducing PDI 4 Features

Pop Quiz Answers

Index

Appendix C. Quick Reference: Steps and Job Entries

This appendix summarizes the purpose of the steps and job entries used in the tutorials throughout the book. For each one, you can see the name of the Time for action section where it was introduced, along with references to other chapters that contain more examples using it.

Note

How to use this reference

Suppose you are inside Spoon, editing a transformation. If the transformation uses a step that you don't know and you want to understand what it does or how to use it, double-click the step and take note of the title of the settings window; that title is the name of the step. Then search for that name in the transformation steps reference table. The steps are listed in alphabetical order so that you can find them quickly. The last column tells you where in the book the step is explained.

The same applies to jobs. If you see an unknown entry in a job, double-click the entry and take note of the title of the settings window; that title is the name of the entry. Then search for that name in the job entries reference table. The job entries are also listed in alphabetical order.

Transformation steps

The following table includes all the transformation steps used in the book. For a full list of steps and their descriptions, select Help | Show step plug-in information in Spoon's main menu.

You can also visit http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+v3.2.+Steps for a full step reference along with some examples.

Name	Purpose	Time for action
Abort	Aborts a transformation	Aborting when there are too many errors (Chapter 7); also in Chapters 11 and 12
Add constants	Adds one or more constant fields to the stream	Gathering progress and merging all together (Chapter 4); also in Chapters 7, 8, and 9
Add sequence	Gets the next value from a sequence	Assigning tasks by Distributing (Chapter 4); also in Chapters 6 and 11
Append streams	Appends two streams in a given order: first all the rows of one stream, then all the rows of the other	Giving priority to Bouchard by using Append Stream (Chapter 4)
Calculator	Creates new fields by performing simple calculations	Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 6 and 8
Combination lookup/update	Updates a junk dimension. Alternatively, it can be used to update Type I SCD.	Loading a region dimension with a Combination lookup/update step (Chapter 9); also in Chapter 12
Copy rows to result	Writes rows to the executing job. The rows are then passed to the next entry in the job.	Splitting the generation of top scores by copying and getting rows (Chapter 11)
Data Validator	Validates fields based on a set of rules	Checking films file with the Data Validator (Chapter 7)
Database join	Executes a database query using stream values as parameters	Using a Database join step to create a list of suggested products to buy (Chapter 9)
Database lookup	Looks up values in a database table	Using a Database lookup step to create a list of products to buy (Chapter 9); also in Chapter 12
Delay row	For each incoming row, waits a given time before giving the row to the next step	Generating custom files by executing a transformation for every input row (Chapter 11)
Delete	Deletes data in a database table	Deleting data about discontinued items (Chapter 8)
Dimension lookup/update	Updates or looks up a Type II SCD. Alternatively, it can be used to update Type I SCD or hybrid dimensions.	Keeping a history of product changes with the Dimension lookup/update step (Chapter 9); also in Chapter 12
Dummy (do nothing)	This step type doesn't do anything! However, it is used often.	Creating a hello world transformation (Chapter 1); also in Chapters 2, 3, 7, and 9
Excel Input	Reads data from a Microsoft Excel (`.xls`) file	Browsing PDI new features by copying a dataset (Chapter 4); also in Chapter 8
Excel Output	Writes data to a Microsoft Excel (`.xls`) file	Getting data from an XML file with information about countries (Chapter 2); also in Chapters 4 and 10
Filter rows	Splits the stream in two based on a given condition. Alternatively, it is used to let only the rows that meet the condition pass.	Counting frequent words by filtering (Chapter 3); also in Chapters 4, 6, 7, 9, 11, and 12
Fixed file input	Reads data from a fixed width file	Calculating Scores with JavaScript (Chapter 5)
Formula	Creates new fields by using formulas. It uses Pentaho's libformula.	Reviewing examinations by using the Formula step (Chapter 3); also in Chapters 10 and 11
Generate Rows	Generates a given number of identical rows	Creating a hello world transformation (Chapter 1); also in Chapters 6, 9, and 10
Get data from XML	Gets data from XML files	Getting data from an XML file with information about countries (Chapter 2); also in Chapters 3 and 9
Get rows from result	Reads rows from a previous entry in a job	Splitting the generation of top scores by copying and getting rows (Chapter 11)
Get System Info	Gets information from the system, such as the system date, command-line arguments, and so on	Updating a file with news about examinations (Chapter 2); also in Chapters 7, 8, 10, 11, and 12
Get Variables	Takes the values of environment or Kettle variables and adds them as fields in the stream	Creating the time dimension dataset (Chapter 6)
Group by	Builds aggregates in a GROUP BY fashion. It works correctly only on sorted input; if the input is not sorted, only consecutive rows with the same key are grouped together	Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 4, 7, and 9
If field value is null	If a field is null, it changes its value to a constant. It can be applied to all fields of the same data type or to specific fields	Enhancing a films file by converting rows to columns (Chapter 6)
Insert / Update	Updates or inserts rows in a database table	Inserting new products or updating existent ones (Chapter 8)
Mapping (sub-transformation)	Runs a sub-transformation	Calculating the top scores with a subtransformation (Chapter 11)
Mapping input specification	Specifies the input interface of a sub-transformation	Calculating the top scores with a subtransformation (Chapter 11)
Mapping output specification	Specifies the output interface of a sub-transformation	Calculating the top scores with a subtransformation (Chapter 11)
Modified Java Script Value	Lets you write JavaScript code to modify fields or create new ones. It's also possible to write Java code	Calculating Scores with JavaScript (Chapter 5); also in Chapters 6, 7, and 11
Number range	Creates ranges based on a numeric field	Capturing errors while calculating the age of a film (Chapter 7); also in Chapter 8
Regex Evaluation	Evaluates a field with a regular expression	Validating Genres with a Regex Evaluation step (Chapter 7); also in Chapter 12
Row denormaliser	Denormalises rows by looking up key-value pairs	Enhancing a films file by converting rows to columns (Chapter 6)
Row Normaliser	Normalises de-normalised data, converting columns into rows	Enhancing the matches file by normalizing the dataset (Chapter 6)
Select values	Selects, reorders, or removes fields. Also allows you to change the metadata of fields	Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 4, 6, 7, 8, 9, 11, and 12
Set Variables	Sets Kettle variables based on a single input row	Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11); also in Chapter 12
Sort rows	Sorts rows based upon field values, ascending or descending	Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 4, 6, 7, 8, 9, and 11
Split field to rows	Splits a single string field and creates a new row for each split term	Counting frequent words by filtering (Chapter 3)
Split Fields	Splits a single field into more than one	Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 6 and 11
Stream lookup	Looks up values coming from another stream in the transformation	Finding out which language people speak (Chapter 3); also in Chapter 6
Switch / Case	Switches a row to a certain target step based on the value of a field	Assigning tasks by filtering priorities with the Switch / Case step (Chapter 4)
Table input	Reads data from a database table	Getting data about shipped orders (Chapter 8); also in Chapters 9, 10, and 12
Table output	Writes data to a database table	Loading a table with a list of manufacturers (Chapter 8); also in Chapters 9 and 12
Text file input	Reads data from a text file	Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 5, 6, 7, 8, and 11
Text file output	Writes data to a text file	Sending the results of matches to a plain file (Chapter 2); also in Chapters 3, 7, 9, 10, and 11
Update	Updates data in a database table	Loading a region dimension with a Combination lookup/update step (Chapter 9)
Value Mapper	Maps values of a certain field from one value to another	Browsing PDI new features by copying a dataset (Chapter 4)
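The Group by entry in the table notes that the step depends on sorted input. The following Java sketch illustrates why, under the assumption that the step aggregates only runs of consecutive rows sharing the same key and starts a new group whenever the key changes. It is not PDI's actual implementation; the `GroupByDemo` class, the `Row` and `Group` records, and the `groupConsecutive` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class GroupByDemo {
    // Hypothetical input row: a grouping key and a numeric value to sum.
    record Row(String key, int value) {}

    // Hypothetical aggregate result for one group of consecutive rows.
    record Group(String key, int sum) {}

    // Sums values over runs of consecutive rows that share the same key.
    // A new group starts each time the key differs from the previous row's key,
    // so unsorted input can produce several separate groups for one key.
    static List<Group> groupConsecutive(List<Row> rows) {
        List<Group> result = new ArrayList<>();
        boolean started = false;
        String currentKey = null;
        int currentSum = 0;
        for (Row row : rows) {
            if (started && !Objects.equals(currentKey, row.key())) {
                result.add(new Group(currentKey, currentSum)); // close the previous run
                currentSum = 0;
            }
            started = true;
            currentKey = row.key();
            currentSum += row.value();
        }
        if (started) {
            result.add(new Group(currentKey, currentSum)); // close the last run
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> unsorted = List.of(
                new Row("ARG", 2), new Row("BRA", 3), new Row("ARG", 1));
        // Sorting first (the role of a Sort rows step) makes equal keys adjacent.
        List<Row> sorted = new ArrayList<>(unsorted);
        sorted.sort((a, b) -> a.key().compareTo(b.key()));

        System.out.println(groupConsecutive(unsorted)); // ARG appears in two groups
        System.out.println(groupConsecutive(sorted));   // one group per key
    }
}
```

With the unsorted list, the two `ARG` rows are separated by a `BRA` row, so they end up in two different groups; after sorting, they are adjacent and are summed into a single group. This is why a Sort rows step usually precedes Group by in a transformation.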

Job entries

The following table includes all the job entries used in the book. For a full list of job entries and their descriptions, select Help | Show job entries plug-in information in Spoon's main menu.

You can also visit http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+v3.2.+Job+Entries for a full job entries reference along with some examples.

Name	Purpose	Time for action
Abort job	Aborts the job	Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11)
Create a folder	Creates a folder	Creating a simple Hello world job (Chapter 10)
Delete file	Deletes a file	Generating custom files by executing a transformation for every input row (Chapter 11)
Evaluate rows number in a table	Evaluates the number of rows in a table against a given condition	Loading the dimensions for the sales datamart (Chapter 12)
File Exists	Checks if a file exists	Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11)
Job	Executes a job	Generating the files with top scores by nesting jobs (Chapter 11); also in Chapter 12
Mail	Sends an e-mail	Sending a sales report and warning the administrator if something were wrong (Chapter 10)
Special entries	Start job entry; mandatory at the beginning of a job	Creating a simple Hello world job (Chapter 10); also in Chapters 11 and 12
Success	Forces the success of a job execution	Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11); also in Chapter 12
Transformation	Executes a transformation	Creating a simple Hello world job (Chapter 10); also in Chapters 11 and 12

Note

This appendix is just a quick reference; it is not meant for learning how to use PDI. To learn PDI from scratch, read the book starting from the first chapter.
