Chapter 13. Taking it Further
The lessons in the previous chapters gave you the basics of PDI. If you liked working with PDI and intend to use it in your own projects, there is much more to learn, ranging from applying best practices to using PDI integrated with the Pentaho BI Suite.
This chapter points you in the right direction for taking it further. It begins with some advice to take into account in your daily work with PDI. After that, it introduces some advanced PDI concepts so that you know to what extent you can use the tool beyond the basics.
If you intend to work seriously with PDI, knowing how to accomplish different tasks is not enough. Here are some guidelines that will help you go in the right direction.
Outline your ideas on paper before creating a transformation or a job:
Don't drop steps randomly on the canvas trying to get things working. You could end up with a transformation or job that is difficult to understand and even useless.
Document your work:
Write at least a simple description in the transformation and job settings windows. Replace the default names of steps and job entries with meaningful ones. Use notes to clarify the purpose of the transformations and jobs. By doing this, your work will be largely self-documenting.
Make your jobs and transformations easy to understand:
Arrange the elements on the canvas so that it doesn't look like a puzzle to solve. Memorize the shortcuts for arrangement and alignment, and use them regularly. You'll find a full list in Appendix D, Spoon shortcuts.
Organize...
Getting the most out of PDI
Throughout the book you learned, step by step, how to use PDI for accomplishing several kinds of tasks: reading from different kinds of sources, writing back to them, transforming data in several ways, loading data into databases, and even loading a full data mart. You already have the knowledge and the experience to do anything you want or need with PDI from now on. However, PDI offers some more features that may be useful for you as well. The following sections introduce them and tell you where to look in case you want to put them into practice.
Extending Kettle with plugins
As you saw while learning Kettle, there is a large set of steps and job entries to choose from when designing jobs and transformations. Counting steps and entries together, there are more than 200! If you still feel that you need more, there is another option: plugins.
Kettle plugins are basically steps or job entries that you install separately. The available...
Integrating PDI and the Pentaho BI suite
In this book you learned to use PDI standalone, but as mentioned in the first chapter, it is possible to use it integrated with the rest of the suite. There are a couple of options for doing so.
In Chapter 1 you were introduced to the Pentaho platform. Everything in the Pentaho platform is built from action sequences. An action sequence is, as its name suggests, a sequence of atomic actions that together accomplish a small business process.
Look at the following example from the Puzzle business:
Suppose that you regularly receive updated price lists (one for each manufacturer) and you drop the files in a given folder. When you decide to raise the prices, you process one of those files and get a web-based report with the updated prices. You can implement that process with an action sequence.
There are four atomic actions in this sequence. You already know how to do the first and third actions (building the list of available...
PDI Enterprise Edition and Kettle Developer Support
Pentaho offers an Enterprise Edition of the Pentaho BI Suite and also of PDI. The PDI Enterprise Edition adds an Enterprise Console for performance monitoring, remote administration, and alerting. There is also a growing number of extra plugins for enterprise customers. In addition to the PDI extensions, customers get services and support, indemnification, software maintenance (fix versions, e.g. 3.2.2), and a knowledge base with additional technical resources.
Since the end of 2009, Pentaho has also offered Kettle Developer Support for the Community Edition. With this, you can get direct assistance from the product experts for the design, development, and testing phases of the ETL lifecycle. This option is well suited for getting started, removing roadblocks, and troubleshooting ETL processes.
For further information, check the Pentaho site (www.pentaho.com).
This chapter provided you with a list of best practices to apply while working with PDI. If you follow the given advice, your work will not only be useful, but also flexible, reusable, documented, and neatly presented.
You were introduced to PDI plugins, a mechanism that allows you to customize the tool.
A brief overview of remote execution and clustering was given for those interested in using PDI in large environments.
Finally, an introduction was given showing you how PDI can be used not only as a standalone tool but also integrated with the Pentaho BI suite.
Some links and references were provided for those of you who, after reading the book and particularly this chapter, are eager to learn more.
I hope you enjoyed reading the book and learning PDI, and that you will start using PDI to meet all your data requirements.