Pentaho Data Integration 4 Cookbook


A newer version of this book is available: Pentaho Data Integration Cookbook, Second Edition.
  • Manipulate your data by exploring, transforming, validating, integrating, and more
  • Work with all kinds of data sources such as databases, plain files, and XML structures among others
  • Use Kettle in integration with other components of the Pentaho Business Intelligence Suite
  • Each recipe is a carefully organized sequence of instructions packed with screenshots, tables, and tips to complete the task as efficiently as possible

Book Details

Language : English
Paperback : 352 pages [ 235mm x 191mm ]
Release Date : June 2011
ISBN : 1849515247
ISBN 13 : 9781849515245
Author(s) : Adrián Sergio Pulvirenti, María Carina Roldán
Topics and Technologies : All Books, Big Data and Business Intelligence, Data, Cookbooks, Java, Open Source

Table of Contents

Preface
Chapter 1: Working with Databases
Chapter 2: Reading and Writing Files
Chapter 3: Manipulating XML Structures
Chapter 4: File Management
Chapter 5: Looking for Data
Chapter 6: Understanding Data Flows
Chapter 7: Executing and Reusing Jobs and Transformations
Chapter 8: Integrating Kettle and the Pentaho Suite
Chapter 9: Getting the Most Out of Kettle
Appendix: Data Structures
Index
  • Chapter 1: Working with Databases
    • Introduction
    • Connecting to a database
    • Getting data from a database
    • Getting data from a database by providing parameters
    • Getting data from a database by running a query built at runtime
    • Inserting or updating rows in a table
    • Inserting new rows where a simple primary key has to be generated
    • Inserting new rows where the primary key has to be generated based on stored values
    • Deleting data from a table
    • Creating or altering a database table from PDI (design time)
    • Creating or altering a database table from PDI (runtime)
    • Inserting, deleting, or updating a table depending on a field
    • Changing the database connection at runtime
    • Loading a parent-child table
  • Chapter 2: Reading and Writing Files
    • Introduction
    • Reading a simple file
    • Reading several files at the same time
    • Reading unstructured files
    • Reading files having one field by row
    • Reading files with some fields occupying two or more rows
    • Writing a simple file
    • Writing an unstructured file
    • Providing the name of a file (for reading or writing) dynamically
    • Using the name of a file (or part of it) as a field
    • Reading an Excel file
    • Getting the value of specific cells in an Excel file
    • Writing an Excel file with several sheets
    • Writing an Excel file with a dynamic number of sheets
  • Chapter 3: Manipulating XML Structures
    • Introduction
    • Reading simple XML files
    • Specifying fields by using XPath notation
    • Validating well-formed XML files
    • Validating an XML file against DTD definitions
    • Validating an XML file against an XSD schema
    • Generating a simple XML document
    • Generating complex XML structures
    • Generating an HTML page using XML and XSL transformations
  • Chapter 4: File Management
    • Introduction
    • Copying or moving one or more files
    • Deleting one or more files
    • Getting files from a remote server
    • Putting files on a remote server
    • Copying or moving a custom list of files
    • Deleting a custom list of files
    • Comparing files and folders
    • Working with ZIP files
  • Chapter 5: Looking for Data
    • Introduction
    • Looking for values in a database table
    • Looking for values in a database (with complex conditions or multiple tables involved)
    • Looking for values in a database with extreme flexibility
    • Looking for values in a variety of sources
    • Looking for values by proximity
    • Looking for values consuming a web service
    • Looking for values over an intranet or Internet
  • Chapter 6: Understanding Data Flows
    • Introduction
    • Splitting a stream into two or more streams based on a condition
    • Merging rows of two streams with the same or different structures
    • Comparing two streams and generating differences
    • Generating all possible pairs formed from two datasets
    • Joining two or more streams based on given conditions
    • Interspersing new rows between existent rows
    • Executing steps even when your stream is empty
    • Processing rows differently based on the row number
  • Chapter 7: Executing and Reusing Jobs and Transformations
    • Introduction
    • Executing a job or a transformation by setting static arguments and parameters
    • Executing a job or a transformation from a job by setting arguments and parameters dynamically
    • Executing a job or a transformation whose name is determined at runtime
    • Executing part of a job once for every row in a dataset
    • Executing part of a job several times until a condition is true
    • Creating a process flow
    • Moving part of a transformation to a subtransformation
  • Chapter 8: Integrating Kettle and the Pentaho Suite
    • Introduction
    • Creating a Pentaho report with data coming from PDI
    • Configuring the Pentaho BI Server for running PDI jobs and transformations
    • Executing a PDI transformation as part of a Pentaho process
    • Executing a PDI job from the Pentaho User Console
    • Generating files from the PUC with PDI and the CDA plugin
    • Populating a CDF dashboard with data coming from a PDI transformation
  • Chapter 9: Getting the Most Out of Kettle
    • Introduction
    • Sending e-mails with attached files
    • Generating a custom log file
    • Programming custom functionality
    • Generating sample data for testing purposes
    • Working with Json files
    • Getting information about transformations and jobs (file-based)
    • Getting information about transformations and jobs (repository-based)

                      Adrián Sergio Pulvirenti

                      Adrián Sergio Pulvirenti was born in Buenos Aires, Argentina, in 1972. He earned his Bachelor's degree in Computer Science at UBA, one of the most prestigious universities in South America. He has dedicated more than 15 years to developing desktop and web-based software solutions. Over the last few years he has been leading integration projects and the development of BI solutions.

                      María Carina Roldán

                      María Carina, born in Esquel, Argentina, earned her Bachelor's degree in Computer Science at UNLP in La Plata and then moved to Buenos Aires, where she has been living since 1994. She has worked as a BI consultant for almost 15 years. Over the last six, she has been dedicated full time to developing BI solutions using the Pentaho Suite. Currently, she works for Webdetails, a Pentaho company, as an ETL specialist. She is the author of the book Pentaho 3.2 Data Integration: Beginner's Guide, published by Packt Publishing in April 2009, and co-author of Pentaho Data Integration 4 Cookbook, also published by Packt Publishing in June 2011.

                      What you will learn from this book

                      • Configure Kettle to connect to databases, explore them, and perform CRUD operations
                      • Read, write, and parse simple and unstructured files
                      • Solve common Excel needs such as reading from a particular cell or generating several sheets at a time
                      • Read, validate, and generate simple and complex XML structures
                      • Manipulate files by copying, deleting, compressing, or transferring to remote servers
                      • Look up information from different sources such as databases, web services, or spreadsheets among others
                      • Work with data flows performing operations such as joining, merging, or filtering rows
                      • Customize the Kettle logs to your needs
                      • Embed Java code in your transformations to gain performance and flexibility
                      • Execute and reuse transformations and jobs in different ways
                      • Integrate Kettle with Pentaho Reporting, Pentaho Dashboards, Community Data Access, and Pentaho BI Platform

                      In Detail

                      Pentaho Data Integration (PDI, also called Kettle), one of the leading data integration tools, is widely used for all kinds of data manipulation, such as migrating data between applications or databases, exporting data from databases to flat files, data cleansing, and much more. Do you need quick solutions to the problems you face while using Kettle?

                      Pentaho Data Integration 4 Cookbook explains Kettle features in detail through clear and practical recipes that you can quickly apply to your solutions. The recipes cover a broad range of topics including processing files, working with databases, understanding XML structures, integrating with Pentaho BI Suite, and more.

                      Pentaho Data Integration 4 Cookbook shows you how to take advantage of all the aspects of Kettle through a set of practical recipes organized to help you find quick solutions to your needs. The initial chapters explain the details of working with databases, files, and XML structures. Then you will see different ways of searching for data, executing and reusing jobs and transformations, and manipulating streams. Finally, you will learn all the available options for integrating Kettle with other Pentaho tools.

                      Pentaho Data Integration 4 Cookbook has plenty of recipes with easy step-by-step instructions for accomplishing specific tasks. The examples and code are ready to be adapted to individual needs.

                      Learn to solve data manipulation problems using the Pentaho Data Integration tool Kettle.

                      Approach

                      This book has step-by-step instructions to solve data manipulation problems using PDI in the form of recipes. It has plenty of well-organized tips, screenshots, tables, and examples to aid quick and easy understanding.

                      Who this book is for

                      If you are a software developer or anyone involved or interested in developing ETL solutions or, more generally, in doing any kind of data manipulation, this book is for you. It does not cover PDI basics, SQL basics, or database concepts. You are expected to have a basic understanding of the PDI tool, the SQL language, and databases.
