Pentaho 3.2 Data Integration: Beginner's Guide


Product type Book
Published in Apr 2010
Publisher Packt
ISBN-13 9781847199546
Pages 492 pages
Edition 1st Edition


Time for action – logging into a repository


To log into an existing repository, follow these instructions:

  1. Launch Spoon.

  2. If the repository dialog window doesn't show up, select Repository | Connect to repository from the main menu. The repository dialog window appears.

  3. In the drop-down list, select the repository you want to log into.

  4. Type your username and password. If you have never created any users, use the default username and password—admin and admin. Click on OK.

  5. You are now logged into the repository. The name of the repository is shown in the upper-left corner of Spoon.

What just happened?

You opened Spoon and logged into a repository. To do that, you provided the name of the repository and valid credentials. Once logged in, you were ready to start working with the repository.

Logging into a repository by using credentials

If you want to work with the repository storage system, you have to log into the repository before you begin your work. In order to do that, you have to choose the repository and provide a repository username and password.

The repository dialog that allows you to log into the repository can be opened from the main Spoon menu. If you intend to log into the repository often, you may prefer to select Edit | Options... and check the general option Show repository dialog at startup?. This causes the repository dialog to show up every time you launch Spoon.

It is possible to log into the repository automatically. Let's assume you have a repository named MY_REPO and you use the default user. Add the following lines to the kettle.properties file:

KETTLE_REPOSITORY=MY_REPO
KETTLE_USER=admin
KETTLE_PASSWORD=admin

The next time you launch Spoon, you will be logged into the repository automatically.
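The auto-login settings are ordinary key=value lines. As an illustration, the following Python sketch reads such a file and reports whether all three auto-login variables are set. It is a simplified parser, assuming one key=value pair per line and # or ! comments; it does not handle the full Java properties syntax (escapes, line continuations, : separators). The function names are hypothetical, and the default path ~/.kettle/kettle.properties is an assumption (it depends on KETTLE_HOME).

```python
from __future__ import annotations

from pathlib import Path

# The three variables that trigger automatic repository login.
AUTO_LOGIN_KEYS = ("KETTLE_REPOSITORY", "KETTLE_USER", "KETTLE_PASSWORD")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Deliberately simplified: no escape sequences or line continuations.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def auto_login_settings(props: dict[str, str]) -> dict[str, str] | None:
    """Return the auto-login settings if all three keys are non-empty, else None."""
    if all(props.get(key) for key in AUTO_LOGIN_KEYS):
        return {key: props[key] for key in AUTO_LOGIN_KEYS}
    return None


if __name__ == "__main__":
    # Assumed default location; adjust if KETTLE_HOME points elsewhere.
    path = Path.home() / ".kettle" / "kettle.properties"
    if path.exists():
        settings = auto_login_settings(parse_properties(path.read_text()))
        if settings:
            print("Spoon will log into", settings["KETTLE_REPOSITORY"],
                  "as", settings["KETTLE_USER"])
        else:
            print("Auto login is not configured")
```

Note that the sketch has to read the password in clear text to do its job, which is exactly why storing it in this file is discouraged.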

Tip

For details about the kettle.properties file, refer to the section on Kettle variables in Chapter 2.

Note

Because the login information, including the password, is stored as plain text in kettle.properties, auto login is not recommended.

Defining repository user accounts

To log into a repository, you need a user account. Every repository user has a profile that dictates the permissions that the user has on the repository. There are three predefined profiles:

Profile        Permissions
Read-only      Cannot create or modify any element in the repository
User           Can create, modify, and delete any object in the repository except users and profiles
Administrator  Has full permissions, including creating new users and profiles

There are also two predefined users:

  • admin: A user with the Administrator profile. This is the user you used to log into the repository for the first time, and it has full permissions on the repository.

  • guest: A user with the Read-only profile.

If you have the Administrator profile, you can create, modify, rename, or delete users and profiles from the Repository explorer. For details, refer to the section Examining and modifying the contents of a repository with the Repository explorer, later in this chapter. Any user may change their own user information, either from the Repository explorer or from the Repository | Edit current user menu option.
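To make these rules concrete, here is a small Python model of the three predefined profiles, the two predefined users, and the rule that administrators manage users while any user may edit their own account. It is a hypothetical illustration: the action names read, write, and manage_users are a simplification invented for this sketch and do not reflect how PDI stores profiles internally.

```python
from enum import Enum


class Profile(Enum):
    READ_ONLY = "Read-only"
    USER = "User"
    ADMINISTRATOR = "Administrator"


# Hypothetical, simplified permission names for each predefined profile.
PERMISSIONS = {
    Profile.READ_ONLY: {"read"},
    # "write" = create, modify, and delete objects, but not users or profiles.
    Profile.USER: {"read", "write"},
    Profile.ADMINISTRATOR: {"read", "write", "manage_users"},
}

# The two predefined users and their profiles.
PREDEFINED_USERS = {"admin": Profile.ADMINISTRATOR, "guest": Profile.READ_ONLY}


def can(profile: Profile, action: str) -> bool:
    """True if the profile grants the given action."""
    return action in PERMISSIONS[profile]


def can_edit_user(actor: str, actor_profile: Profile, target: str) -> bool:
    """Administrators may edit any user; everyone else only their own account."""
    return can(actor_profile, "manage_users") or actor == target


print(can(PREDEFINED_USERS["guest"], "write"))              # False
print(can_edit_user("guest", Profile.READ_ONLY, "guest"))   # True
```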

Creating transformations and jobs in repository folders

In a repository, the jobs and transformations are organized in folders. A folder in a repository fulfills the same purpose as a folder in your drive—it allows you to keep your work organized. Once you create a folder, you can save both transformations and jobs in it.

While connected to a repository, you design, preview, and run jobs and transformations just as you do with files. However, there are some differences when it comes to opening, creating, or saving your work. The following table summarizes how you do those tasks when logged into a repository:

Task                         Procedure
Open a transformation / job  Select File | Open. The Repository explorer shows up. Navigate the repository until you find the transformation or job you want to open, and double-click it.
Create a folder              Select Repository | Explore repository, expand the transformation or job tree, locate the parent folder, right-click it, and create the folder. Alternatively, double-click the parent folder.
Create a transformation      Select File | New | Transformation or press Ctrl+N.
Create a job                 Select File | New | Job or press Ctrl+Alt+N.
Save a transformation        Press Ctrl+T to open the transformation settings. Give a name to the transformation. In the Directory textbox, select the folder where the transformation is going to be saved. Press Ctrl+S. The transformation is saved in the selected directory under the given name.
Save a job                   Press Ctrl+J to open the job settings. Give a name to the job. In the Directory textbox, select the folder where the job is going to be saved. Press Ctrl+S. The job is saved in the selected directory under the given name.
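The folder-based organization described above can be pictured as a tree in which each folder holds subfolders, transformations, and jobs. The following Python sketch is a hypothetical in-memory model of that idea. The class and method names are invented for illustration and are not part of the PDI API; paths such as /sales/daily are made-up examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Folder:
    """A repository folder holding subfolders, transformations, and jobs."""
    name: str
    subfolders: dict[str, Folder] = field(default_factory=dict)
    transformations: set[str] = field(default_factory=set)
    jobs: set[str] = field(default_factory=set)


class Repository:
    """Hypothetical in-memory model of repository folders (not the PDI API)."""

    def __init__(self) -> None:
        self.root = Folder("/")

    def get_folder(self, path: str) -> Folder:
        """Walk a /-separated path; raises KeyError if a folder is missing."""
        folder = self.root
        for part in filter(None, path.split("/")):
            folder = folder.subfolders[part]
        return folder

    def create_folder(self, parent: str, name: str) -> Folder:
        """Create a folder under an existing parent (no-op if it already exists)."""
        return self.get_folder(parent).subfolders.setdefault(name, Folder(name))

    def save_transformation(self, directory: str, name: str) -> None:
        self.get_folder(directory).transformations.add(name)

    def save_job(self, directory: str, name: str) -> None:
        self.get_folder(directory).jobs.add(name)


repo = Repository()
repo.create_folder("/", "sales")
repo.create_folder("/sales", "daily")
repo.save_transformation("/sales/daily", "load_orders")
repo.save_job("/sales", "nightly_load")
```

As in Spoon, the target folder must exist before you save into it. This model signals a missing folder with a KeyError.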

Creating database connections, partitions, servers, and clusters

Besides users, profiles, jobs, and transformations, there are some additional PDI elements that you can define:

Element               Description
Database connections  Connection definitions to relational databases. These are covered in Chapter 8.
Partition schemas     Partitioning is a mechanism by which you send individual rows to different copies of the same step, for example, based on a field value. This is an advanced topic not covered in this book.
Slave servers         Slave servers are installed in remote machines to execute jobs and transformations remotely. They are introduced in Chapter 13.
Clusters              Clusters are groups of slave servers that collectively execute a job or a transformation. They are also introduced in Chapter 13.
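The partitioning mechanism described above can be illustrated outside PDI. The following Python sketch routes each row to one of N copies of a step, using the remainder of dividing an integer field by N. It is a conceptual illustration only. The function name and the customer_id field are hypothetical, and in PDI partition schemas are defined in Spoon rather than written in code.

```python
from __future__ import annotations

from collections import defaultdict


def partition_rows(rows: list[dict], field_name: str, copies: int) -> dict[int, list[dict]]:
    """Assign each row to a step copy: copy index = row[field_name] mod copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    buckets: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        buckets[row[field_name] % copies].append(row)
    return dict(buckets)


rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 7)]
for copy, assigned in sorted(partition_rows(rows, "customer_id", 3).items()):
    print(copy, [r["customer_id"] for r in assigned])
# prints:
# 0 [3, 6]
# 1 [1, 4]
# 2 [2, 5]
```

Because the copy depends only on the field value, all rows with the same customer_id always reach the same copy of the step.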

All these elements can also be created, modified, and deleted from the Repository explorer.

Once you create any of these elements, it is automatically shared by all repository users.

Backing up and restoring a repository

A PDI repository is a database. As such, you can back it up regularly with the utilities provided by your RDBMS. In addition, PDI offers a method for creating a backup in an XML file.

You create a backup from the Repository explorer. Right-click the name of the repository and select Export all objects to an XML file. You will be asked for the name and location of the XML file that will contain the backup data. In order to back up a single folder, instead of right-clicking the repository name, right-click the name of the folder.

You can also restore a backup made in an XML file from the Repository explorer. Right-click the name of the repository and select Import all objects from an XML file. You will be asked for the name and location of the XML file that contains the backup.
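Before relying on an exported file as a backup, you may want to check that it is well-formed XML and contains roughly what you expect. The following Python sketch parses the export and counts transformations and jobs. It assumes a specific layout: transformations/transformation and jobs/job elements directly under the root element. Verify that layout against a file exported from your own PDI version and adjust the names if they differ. The script name in the usage comment is hypothetical.

```python
from __future__ import annotations

import sys
import xml.etree.ElementTree as ET

# ASSUMED layout of the export file: container element -> item element.
CONTAINERS = {"transformations": "transformation", "jobs": "job"}


def summarize_export(xml_data: str | bytes) -> dict[str, int]:
    """Count the items in each assumed container directly under the root.

    Raises xml.etree.ElementTree.ParseError if the data is not well-formed XML.
    """
    root = ET.fromstring(xml_data)
    counts: dict[str, int] = {}
    for container, item in CONTAINERS.items():
        node = root.find(container)  # direct child of the root only
        counts[container] = 0 if node is None else len(node.findall(item))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_backup.py backup.xml
    # Read bytes so the parser honors the file's own encoding declaration.
    with open(sys.argv[1], "rb") as f:
        print(summarize_export(f.read()))
```

A count of zero for both containers is a hint that either the export is empty or the assumed element names do not match your PDI version.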
