Pentaho 3.2 Data Integration: Beginner's Guide


Product type Book
Published in Apr 2010
Publisher Packt
ISBN-13 9781847199546
Pages 492 pages
Edition 1st Edition

Table of Contents (27) Chapters

Pentaho 3.2 Data Integration Beginner's Guide
Credits
Foreword
The Kettle Project
About the Author
About the Reviewers
Preface
Getting Started with Pentaho Data Integration
Getting Started with Transformations
Basic Data Manipulation
Controlling the Flow of Data
Transforming Your Data with JavaScript Code and the JavaScript Step
Transforming the Row Set
Validating Data and Handling Errors
Working with Databases
Performing Advanced Operations with Databases
Creating Basic Task Flows
Creating Advanced Transformations and Jobs
Developing and Implementing a Simple Datamart
Taking it Further
Working with Repositories
Pan and Kitchen: Launching Transformations and Jobs from the Command Line
Quick Reference: Steps and Job Entries
Spoon Shortcuts
Introducing PDI 4 Features
Pop Quiz Answers
Index

Chapter 13. Taking it Further

The lessons learned in previous chapters gave you the basics of PDI. If you liked working with PDI and intend to use it in your own projects, there is much more to learn, ranging from applying best practices to using PDI integrated with the Pentaho BI Suite.

This chapter points you in the right direction for taking it further. It begins with some advice to take into account in your daily work with PDI. After that, it introduces some advanced PDI concepts so that you know to what extent you can use the tool beyond the basics.

PDI best practices


If you intend to work seriously with PDI, knowing how to accomplish different tasks is not enough. Here are some guidelines that will help you go in the right direction.

  • Outline your ideas on paper before creating a transformation or a job:

    Don't drop steps randomly on the canvas trying to get things working. You could end up with a transformation or job that is difficult to understand and even useless.

  • Document your work:

    Write at least a simple description in the settings windows of your transformations and jobs. Replace the default names of steps and job entries with meaningful ones. Use notes to clarify the purpose of the transformations and jobs. By doing this, your work will be largely self-documenting.

  • Make your jobs and transformations clear to understand:

    Arrange the elements on the canvas so that it doesn't look like a puzzle to solve. Memorize the shortcuts for arrangement and alignment, and use them regularly. You'll find a full list in Appendix D, Spoon Shortcuts.

  • Organize...

Getting the most out of PDI


Throughout the book you learned, step by step, how to use PDI to accomplish several kinds of tasks: reading from different kinds of sources, writing back to them, transforming data in several ways, loading data into databases, and even loading a full data mart. You already have the knowledge and the experience to do whatever you want or need with PDI from now on. However, PDI offers some more features that may be useful to you as well. The following sections introduce them so that you know where to look in case you want to put them into practice.

Extending Kettle with plugins

As you saw while learning Kettle, there is a large set of steps and job entries to choose from when designing jobs and transformations. Together, steps and job entries number more than 200! If you still feel that you need more, there is another option: plugins.

Kettle plugins are basically steps or job entries that you install separately. The available...

Integrating PDI and the Pentaho BI suite


In this book you learned to use PDI standalone, but as mentioned in the first chapter, it is possible to use it integrated with the rest of the suite. There are a couple of options for doing so.

PDI as a process action

In Chapter 1 you were introduced to the Pentaho platform. Everything in the Pentaho platform is built from action sequences. An action sequence is, as its name suggests, a sequence of atomic actions that together accomplish a small business process.
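The idea of chaining atomic actions can be sketched in a few lines of plain Python. This is only a conceptual illustration and not the way the Pentaho platform defines or runs action sequences; the `run_sequence` helper, the three actions, and the sample rows are all hypothetical names and data invented for this sketch.

```python
from typing import Callable, Dict, List

# A shared context is passed from one action to the next (an assumption of this sketch).
Context = Dict[str, object]
Action = Callable[[Context], Context]


def run_sequence(actions: List[Action], context: Context) -> Context:
    """Run each atomic action in order, feeding it the context left by the previous one."""
    for action in actions:
        context = action(context)
    return context


# Hypothetical atomic actions, invented for illustration only.
def load_rows(context: Context) -> Context:
    # Stand-in for reading input data.
    return {**context, "rows": ["alpha", "beta"]}


def transform_rows(context: Context) -> Context:
    # Stand-in for a transformation: uppercase every row.
    return {**context, "rows": [row.upper() for row in context["rows"]]}


def render_report(context: Context) -> Context:
    # Stand-in for producing a report: one row per line.
    return {**context, "report": "\n".join(context["rows"])}


result = run_sequence([load_rows, transform_rows, render_report], {})
print(result["report"])
```

Each action does one small thing and knows nothing about the others; only the order in the list ties them into a process.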

Look at the following sample with regard to the Puzzle business:

Consider that you regularly receive updated price lists (one for each manufacturer) and you drop the files in a given folder. When you decide to hike the prices, you process one of those files and get a web-based report with the updated prices. You can implement that process with an action sequence.

There are four atomic actions in this sequence. You already know how to do the first and third actions (building the list of available...

PDI Enterprise Edition and Kettle Developer Support


Pentaho offers an Enterprise Edition of the Pentaho BI Suite and also of PDI. The PDI Enterprise Edition adds an Enterprise Console for performance monitoring, remote administration, and alerting. There is also a growing number of extra plugins for enterprise customers. In addition to the PDI extensions, customers get services and support, indemnification, software maintenance (fix versions, e.g. 3.2.2), and a knowledge base with additional technical resources.

Since the end of 2009, Pentaho also offers Kettle Developer Support for the Community Edition. With this, you can get direct assistance from the product experts for the design, development, and testing phases of the ETL lifecycle. This option is perfect for getting started, removing roadblocks, and troubleshooting ETL processes.

For further information, check the Pentaho site (www.pentaho.com).

Summary


This chapter provided you with a list of best practices to apply while working with PDI. If you follow the given advice, your work will not only be useful, but also flexible, reusable, documented, and neatly presented.

You were introduced to PDI plugins, a mechanism that allows you to customize the tool.

A quick overview of remote execution and clustering was given for those interested in using PDI in large environments.

Finally, an introduction showed you how PDI can be used not only as a standalone tool but also integrated with the Pentaho BI suite.

Some links and references were provided for those of you who, after reading the book and particularly this chapter, are anxious to learn more.

I hope you enjoyed reading the book and learning PDI, and will start using PDI to solve all your data requirements.
