Table of Contents
Preface
Chapter 1: Working with Databases
Chapter 2: Reading and Writing Files
Chapter 3: Manipulating XML Structures
Chapter 4: File Management
Chapter 5: Looking for Data
Chapter 6: Understanding Data Flows
Chapter 7: Executing and Reusing Jobs and Transformations
Chapter 8: Integrating Kettle and the Pentaho Suite
Chapter 9: Getting the Most Out of Kettle
Appendix: Data Structures
Index
- Chapter 1: Working with Databases
- Introduction
- Connecting to a database
- Getting data from a database
- Getting data from a database by providing parameters
- Getting data from a database by running a query built at runtime
- Inserting or updating rows in a table
- Inserting new rows where a simple primary key has to be generated
- Inserting new rows where the primary key has to be generated based on stored values
- Deleting data from a table
- Creating or altering a database table from PDI (design time)
- Creating or altering a database table from PDI (runtime)
- Inserting, deleting, or updating a table depending on a field
- Changing the database connection at runtime
- Loading a parent-child table
- Chapter 2: Reading and Writing Files
- Introduction
- Reading a simple file
- Reading several files at the same time
- Reading unstructured files
- Reading files having one field by row
- Reading files with some fields occupying two or more rows
- Writing a simple file
- Writing an unstructured file
- Providing the name of a file (for reading or writing) dynamically
- Using the name of a file (or part of it) as a field
- Reading an Excel file
- Getting the value of specific cells in an Excel file
- Writing an Excel file with several sheets
- Writing an Excel file with a dynamic number of sheets
- Chapter 3: Manipulating XML Structures
- Introduction
- Reading simple XML files
- Specifying fields by using XPath notation
- Validating well-formed XML files
- Validating an XML file against DTD definitions
- Validating an XML file against an XSD schema
- Generating a simple XML document
- Generating complex XML structures
- Generating an HTML page using XML and XSL transformations
- Chapter 4: File Management
- Introduction
- Copying or moving one or more files
- Deleting one or more files
- Getting files from a remote server
- Putting files on a remote server
- Copying or moving a custom list of files
- Deleting a custom list of files
- Comparing files and folders
- Working with ZIP files
- Chapter 5: Looking for Data
- Introduction
- Looking for values in a database table
- Looking for values in a database (with complex conditions or multiple tables involved)
- Looking for values in a database with extreme flexibility
- Looking for values in a variety of sources
- Looking for values by proximity
- Looking for values consuming a web service
- Looking for values over an intranet or Internet
- Chapter 6: Understanding Data Flows
- Introduction
- Splitting a stream into two or more streams based on a condition
- Merging rows of two streams with the same or different structures
- Comparing two streams and generating differences
- Generating all possible pairs formed from two datasets
- Joining two or more streams based on given conditions
- Interspersing new rows between existent rows
- Executing steps even when your stream is empty
- Processing rows differently based on the row number
- Chapter 7: Executing and Reusing Jobs and Transformations
- Introduction
- Executing a job or a transformation by setting static arguments and parameters
- Executing a job or a transformation from a job by setting arguments and parameters dynamically
- Executing a job or a transformation whose name is determined at runtime
- Executing part of a job once for every row in a dataset
- Executing part of a job several times until a condition is true
- Creating a process flow
- Moving part of a transformation to a subtransformation
- Chapter 8: Integrating Kettle and the Pentaho Suite
- Introduction
- Creating a Pentaho report with data coming from PDI
- Configuring the Pentaho BI Server for running PDI jobs and transformations
- Executing a PDI transformation as part of a Pentaho process
- Executing a PDI job from the Pentaho User Console
- Generating files from the PUC with PDI and the CDA plugin
- Populating a CDF dashboard with data coming from a PDI transformation
- Chapter 9: Getting the Most Out of Kettle
- Introduction
- Sending e-mails with attached files
- Generating a custom log file
- Programming custom functionality
- Generating sample data for testing purposes
- Working with JSON files
- Getting information about transformations and jobs (file-based)
- Getting information about transformations and jobs (repository-based)
- Appendix: Data Structures
- Book's data structure
- Museum's data structure
- Outdoor data structure
- Steel Wheels structure


