Pentaho 3.2 Data Integration: Beginner's Guide

You're reading from Pentaho 3.2 Data Integration: Beginner's Guide

Product type Book
Published in Apr 2010
Publisher Packt
ISBN-13 9781847199546
Pages 492 pages
Edition 1st Edition


Time for action – creating a PDI repository


To create a repository, follow these steps:

  1. Open MySQL Command Line Client.

  2. In the command window, type the following:

    CREATE DATABASE PDI_REPO;
  3. Open Spoon.

  4. If the repository dialog appears, skip to step 6.

  5. Open the repository dialog from the Repository | Connect to repository menu.

  6. Click on New to create a new repository. The repository information dialog shows up. In this dialog, click on New to create a new database connection.

  7. The database connection window appears. Define a connection to the database you have just created and name the connection PDI_REPO_CONN.

    Tip

    If you want to review the steps for creating a database connection, see the Time for action – creating a connection to the Steel Wheels database section in Chapter 8.

  8. Test the connection to see that it is properly configured.

  9. Click OK to close the database connection window. The Select database connection box will show the created connection.

  10. Name the repository MY_REPO and type My first repository as its description.

  11. Click on Create or Upgrade.

  12. PDI will ask you if you are sure you want to create the repository on the specified database connection. Answer Yes if you are sure of the settings you entered.

  13. A dialog appears asking if you want to do a dry run to evaluate the generated SQL before execution.

  14. Answer No unless you want to preview the SQL that will create the repository. A progress window appears, showing you the progress while the repository is being created.

  15. Finally, you see a window with the message Kettle created the repository on the specified connection. Close the dialog window.

  16. Click on OK to close the repository information window. You will be back in the repository dialog, this time with a new repository available in the repository drop-down list.

  17. If you want to start working with the created repository, please refer to the Working with the repository storage system section. If not, click on No Repository. This will close the window.

What just happened?

In MySQL you created a new database named PDI_REPO. Then you used that database to create a PDI repository.

Creating repositories to store your transformations and jobs

A Kettle repository is a database that provides you with a storage system for your transformations and jobs. The repository is an alternative to the file-based system, in which transformations and jobs are saved as *.ktr and *.kjb files.

Before you can create a repository, the database that will hold it must already exist. In the tutorial, the repository was created in a MySQL RDBMS; however, you can create your repositories in any relational database that PDI can connect to.

Note

Use the PDI repository database exclusively as a PDI repository; don't store any other tables or data in it!

If the repository has already been created from another machine or by another user (that is, from another profile in the operating system), you don't have to create it again. In that case, just define the connection to the repository: follow all the instructions, but don't click the Create or Upgrade button.
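
Before clicking Create or Upgrade, you might want to check whether the target database already seems to hold a repository. The following Python sketch is a heuristic of our own, not part of PDI: the function names and the marker table names are hypothetical assumptions, and it uses an in-memory SQLite database only so that it runs without a server. On MySQL you would obtain the list of tables with SHOW TABLES instead.

```python
import sqlite3
from typing import Iterable, List, Sequence

# Hypothetical marker names: adjust them to the tables you actually see
# in a repository database created by your PDI version.
MARKERS = ("transformation", "job")


def looks_like_repository(table_names: Iterable[str],
                          markers: Sequence[str] = MARKERS) -> bool:
    """Heuristic: True if, for every marker, some table name contains it (case-insensitive)."""
    lowered = [name.lower() for name in table_names]
    return all(any(marker.lower() in name for name in lowered) for marker in markers)


def sqlite_table_names(conn: sqlite3.Connection) -> List[str]:
    """List the user tables of a SQLite database (stand-in for MySQL's SHOW TABLES)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(looks_like_repository(sqlite_table_names(conn)))  # empty database
    conn.execute("CREATE TABLE transformation (id INTEGER)")
    conn.execute("CREATE TABLE job (id INTEGER)")
    print(looks_like_repository(sqlite_table_names(conn)))  # both markers present
```

If the check returns True, the safer path is to define the connection only and leave Create or Upgrade alone; the heuristic can give false positives, so confirm by inspecting the tables yourself.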

Once you have created a repository, its name, description, and connection information are stored in a file named repositories.xml, located in the PDI home directory. The repository database is populated with a set of tables with familiar names such as transformation, job, steps, and steps_type.
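
To see what PDI recorded for each repository, you can read repositories.xml yourself. The following Python sketch assumes a layout in which top-level <repository> elements hold <name>, <description>, and <connection> children, and top-level <connection> elements hold <name> and <database> children. That layout is an assumption, so compare it with your own file. The SAMPLE string is hypothetical and only mirrors the tutorial's settings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class RepositoryEntry:
    name: str
    description: str
    connection: str  # name of the database connection used by the repository
    database: str    # database name taken from that connection, if present


def read_repositories(xml_text: str) -> List[RepositoryEntry]:
    """Parse repositories.xml text (assumed layout) into RepositoryEntry objects."""
    root = ET.fromstring(xml_text)
    # Map each top-level <connection> name to its <database> value.
    databases = {
        conn.findtext("name", default=""): conn.findtext("database", default="")
        for conn in root.findall("connection")
    }
    entries = []
    for repo in root.findall("repository"):
        conn_name = repo.findtext("connection", default="")
        entries.append(
            RepositoryEntry(
                name=repo.findtext("name", default=""),
                description=repo.findtext("description", default=""),
                connection=conn_name,
                database=databases.get(conn_name, ""),
            )
        )
    return entries


# Hypothetical sample mirroring the tutorial's settings (credentials omitted).
SAMPLE = """<repositories>
  <connection>
    <name>PDI_REPO_CONN</name>
    <type>MYSQL</type>
    <database>PDI_REPO</database>
  </connection>
  <repository>
    <name>MY_REPO</name>
    <description>My first repository</description>
    <connection>PDI_REPO_CONN</connection>
  </repository>
</repositories>"""

if __name__ == "__main__":
    for entry in read_repositories(SAMPLE):
        print(entry.name, "|", entry.description, "|", entry.connection, "->", entry.database)
```

To inspect your real configuration, pass the text of your own repositories.xml instead of SAMPLE. The sketch only reads the file; let Spoon make any changes to it.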

Note that you may have more than one repository: for example, different repositories for different projects, different repositories for different versions of a project, or one repository just for testing new PDI features and another for serious development. Therefore, it is important to give your repositories meaningful names and descriptions so that you don't confuse them.
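
When you keep several repositories, the advice about meaningful names and descriptions is easy to check mechanically. The following Python sketch is a hypothetical helper (PDI does not provide it): it takes (name, description) pairs, for example as read from repositories.xml, and flags names that collide when case is ignored as well as repositories that have no description.

```python
from collections import Counter
from typing import List, Tuple


def review_repositories(repos: List[Tuple[str, str]]) -> List[str]:
    """Return warnings for duplicate names (case-insensitive) and empty descriptions."""
    warnings = []
    # Count names after normalizing case and surrounding spaces.
    counts = Counter(name.strip().lower() for name, _ in repos)
    for name, count in sorted(counts.items()):
        if count > 1:
            warnings.append(f"repository name '{name}' is used {count} times")
    # A blank description gives no hint about what the repository is for.
    for name, description in repos:
        if not description.strip():
            warnings.append(f"repository '{name}' has no description")
    return warnings


if __name__ == "__main__":
    repos = [
        ("MY_REPO", "My first repository"),
        ("test", ""),
        ("TEST", "Trying out new PDI features"),
    ]
    for warning in review_repositories(repos):
        print(warning)
```

Comparing names case-insensitively is a deliberate choice: two repositories called test and TEST are easy to mix up in the drop-down list even though their names differ.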
