Pentaho Data Integration Quick Start Guide: Create ETL processes using Pentaho

By María Carina Roldán
Book Aug 2018 178 pages 1st Edition
Product Details


Publication date : Aug 30, 2018
Length 178 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789343328
Vendor : Pentaho

Pentaho Data Integration Quick Start Guide

Chapter 1. Getting Started with PDI

Pentaho Data Integration (PDI) is a popular business intelligence tool, used for exploring, transforming, validating, and migrating data, along with other useful operations. PDI allows you to perform all of the preceding tasks thanks to its friendly user interface, modern architecture, and rich functionality. This book will introduce you to the tool, giving you a quick understanding of the daily tasks that you can perform with it.

We will cover the following topics in this chapter:

  • Introducing PDI
  • Installing PDI
  • Configuring the graphical designer tool
  • Creating a simple transformation
  • Understanding the Kettle home directory

Introducing PDI


PDI, also known as Kettle, is a very powerful tool. It can be used for performing typical Extract, Transform, and Load (ETL) processes. PDI gets data from different sources and manipulates it in many ways (deduplicating, filtering, cleaning, and formatting, among others), saving the data in different formats and destinations. The following diagram illustrates a very simple example of an ETL process designed with PDI:

ETL process

Aside from the preceding processes, PDI serves to migrate data between applications, access and manipulate real-time data, access data in the cloud, orchestrate administrative tasks, and more.
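To make the extract, transform, and load stages concrete, here is a minimal plain-Python analogy of the kind of flow described above. It is not PDI code and says nothing about how PDI works internally; the CSV source, the column names, and the function names are all hypothetical.

```python
import csv
import io


def extract(text):
    """Extract: read rows from CSV text (a stand-in for any data source)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, filter, deduplicate, and format the rows."""
    seen = set()
    result = []
    for row in rows:
        name = row["name"].strip().title()  # cleaning and formatting
        if not name or name in seen:  # filtering and deduplicating
            continue
        seen.add(name)
        amount = f"{float(row['amount']):.2f}"  # formatting numbers
        result.append({"name": name, "amount": amount})
    return result


def load(rows):
    """Load: write the rows to a destination (here, CSV text in memory)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "amount"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


source = "name,amount\n alice ,10\nBob,5.5\nALICE,3\n,7\n"
print(load(transform(extract(source))))
```

Running it prints a header plus two rows (Alice and Bob): the second Alice is dropped as a duplicate, and the row with an empty name is filtered out. In PDI, each of these stages would instead be a step on the canvas.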

Installing PDI


The following are the instructions to install the PDI Community Edition (CE), irrespective of the operating system that you may be using:

  • Make sure that you have JRE 8.0 installed.

Note

If you don't have JRE 8.0 installed, download it from http://www.java.com and install it before proceeding. Make sure that the JAVA_HOME system variable is set.

  • Go to the PDI page on SourceForge.net.

  • Download the available ZIP file, which will serve you for all platforms.
  • Unzip the downloaded file in a folder of your choice (for example, c:/software/pdi or /home/pdi_user/pdi).
  • Browse your disk and look for the PDI folder that was just created. You will see a folder named data-integration, with several subfolders (lib, plugins, samples, and more) and a bunch of scripts (spoon.bat, pan.bat, and others), which we will soon learn how to use.
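Before moving on, you may want to sanity-check the result of the steps above. The following is a minimal sketch, assuming the layout described in this section (a data-integration folder containing lib, plugins, samples, and the Spoon and Pan scripts). The function name check_pdi_install is hypothetical, and the check only looks at names; it does not prove that PDI will start.

```python
import os
from pathlib import Path

# Names taken from the installation steps; compared case-insensitively
# because some scripts may be capitalized (for example, Spoon.bat).
EXPECTED_FOLDERS = ["lib", "plugins", "samples"]
EXPECTED_SCRIPTS = ["spoon.bat", "spoon.sh", "pan.bat", "pan.sh"]


def check_pdi_install(install_dir, env=None):
    """Return a list of problems; an empty list means every check passed.

    install_dir is the folder you unzipped PDI into (for example,
    /home/pdi_user/pdi); env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    di = Path(install_dir) / "data-integration"
    if not di.is_dir():
        problems.append(f"{di} not found")
        return problems
    names = {entry.name.lower() for entry in di.iterdir()}
    for name in EXPECTED_FOLDERS + EXPECTED_SCRIPTS:
        if name not in names:
            problems.append(f"missing: {name}")
    return problems


# Usage:
#   for problem in check_pdi_install("/home/pdi_user/pdi"):
#       print(problem)
```

Passing the environment as a parameter keeps the helper easy to try out with a fake mapping instead of your real settings.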

Configuring the graphical designer tool


Spoon is PDI's desktop designer tool. With Spoon, you can design, preview, and test all of your work (that is, transformations and jobs).

Before starting to work with PDI, it's advisable to take a look at the Spoon interface and do some minimal configuration. The instructions are as follows:

  • Start Spoon: If your system is Windows, run Spoon.bat from within the PDI installation directory. On other platforms, such as Unix, Linux, and macOS, open a Terminal window, change to the PDI installation directory, and run ./spoon.sh.
  • The main window will show up, with a Welcome! window already open, as shown in the following screenshot:

Welcome page

Note

The Welcome! page includes some links to web resources, forums, and more, as well as some shortcuts for working with PDI. You can reach that window at any time by navigating to Help | Welcome Screen.
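If you prefer to launch Spoon from a script, the platform choice described in the first step can be sketched as follows. This is a hypothetical helper, assuming the script names given above (Spoon.bat and spoon.sh) and that the Unix script is executable. It only builds the command and the working directory, and leaves the actual launch to subprocess.

```python
import platform
from pathlib import Path


def spoon_command(pdi_dir, system=None):
    """Return (command, working_directory) for starting Spoon.

    pdi_dir is the data-integration folder; system defaults to the name
    reported by platform.system() ("Windows", "Linux", "Darwin", ...).
    """
    system = system or platform.system()
    pdi = Path(pdi_dir)
    script = "Spoon.bat" if system == "Windows" else "spoon.sh"
    # Run from within the installation directory, as the step above suggests.
    return [str(pdi / script)], str(pdi)


# Usage (not executed here):
#   import subprocess
#   cmd, cwd = spoon_command("/home/pdi_user/pdi/data-integration")
#   subprocess.Popen(cmd, cwd=cwd)
```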

In order to customize Spoon, do the following:

  • Click on Options... in the Tools menu. A window appears, where you can change various general characteristics, as follows:

Options

 

  • Many of the options in this tab will not make sense to you yet. Instead of changing anything here, select the Look & Feel tab:

Look & Feel options

  • Feel free to change any of the options in this tab (for example, the font color or size). Click on the OK button.
  • Restart Spoon to apply the changes.

Creating a simple transformation


Transformations and jobs are the main PDI artifacts. Transformations are data-flow oriented entities, while jobs are task-oriented. In this book, we will start by learning all about transformations, focusing on jobs later. To get a quick idea of what, exactly, a transformation is, we will start by creating a simple one. This will also allow you to see what it's like to work with Spoon.

Our first transformation will find out the current version of PDI (Kettle), and will print the value to the log. Proceed as follows:

  • On the Welcome page, click on the New transformation link, located under the WORK link group. Alternatively, press Ctrl + N.
  • A new tab will appear, with the title Transformation 1. It's in this tab that you will create your work.
  • To the left of the screen, under the Design tab, you'll see a tree of folders. Expand the Input folder by double-clicking on it.

Note

Note that if you work in macOS, a single click is enough.

  • Then, left-click on the Get System Info icon, and, without releasing the button, drag and drop the selected icon to the work area (that is, the blank area that occupies almost all of the screen). You should see something like this:

Dragging and dropping a step

  • Double-click on the Get System Info icon. A configuration window will show up. Fill in the first row in the grid, as shown in the following screenshot. Note that you don't have to type the Kettle version. Instead, you can choose it from a list of available options:

Configuring the Get System Info step

  • In the Design tab, double-click on the Utility folder, click on the Write to log icon, and drag and drop it to the work area.
  • Put the mouse cursor over the Get System Info icon and wait until a tiny toolbar shows up, as shown in the following screenshot:

Mouseover assistance toolbar

  • Click on the output connector (the icon highlighted in the preceding image) and drag it towards the Write to log icon. A greyed hop is displayed.
  • When the mouse cursor is over the Write to log step, release the button. A link (a hop, from now on) is created, from the first step to the second one. The screen should look as follows:

Connecting steps with a hop

Let's add a colored note to our work, as follows:

  • Right-click anywhere in the work area to bring up a contextual menu.
  • In the menu, select the New Note... option. A note editor will appear.
  • Type a description, such as My first transformation. Select the Font style tab and choose a nice font and some colors for your note, and then click on OK. The following should be the final result:

My first transformation

  • Save the transformation by pressing Ctrl + S. PDI will ask for a destination folder. Select the folder of your choice and give the transformation a name. PDI will save the transformation as a file with the .ktr extension (for example, sample_transformation.ktr).
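A .ktr file is XML, so a saved transformation can also be inspected outside Spoon. The sketch below parses a heavily simplified, hypothetical excerpt of such a file with Python's standard xml.etree.ElementTree. The element layout shown (step entries with a name and a type, hops under order) and the step type IDs SystemInfo and WriteToLog are assumptions; a real .ktr contains many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed excerpt of a .ktr file (real files hold
# many more elements, such as step coordinates and configuration).
SAMPLE_KTR = """<transformation>
  <info><name>sample_transformation</name></info>
  <order>
    <hop><from>Get System Info</from><to>Write to log</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Get System Info</name><type>SystemInfo</type></step>
  <step><name>Write to log</name><type>WriteToLog</type></step>
</transformation>"""


def summarize_ktr(xml_text):
    """Return (steps, hops): (name, type) pairs and enabled (from, to) pairs."""
    root = ET.fromstring(xml_text)
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [
        (h.findtext("from"), h.findtext("to"))
        for h in root.findall("order/hop")
        if h.findtext("enabled", default="Y") == "Y"  # skip disabled hops
    ]
    return steps, hops


steps, hops = summarize_ktr(SAMPLE_KTR)
print(steps)
print(hops)
```

To try it on your own file, pass the file's text (for example, Path("sample_transformation.ktr").read_text()) instead of SAMPLE_KTR, keeping in mind that the assumed layout may not match every PDI version.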

Finally, let's run the transformation to see what happens:

  • Click on the Run icon, located in the transformation toolbar:

Run icon in the transformation toolbar

  • A window named Run Options will appear. Click on Run.
  • At the bottom of the screen, you should see a log with the results of the execution:

Execution Results
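Spoon is not the only way to run a transformation: the installation folder also contains pan.bat (and, on Unix-like systems, pan.sh), the Pan command-line utility covered later in the book. As a minimal sketch, assuming Pan's -file and -level options, a Python helper could build the command for the transformation saved above. The function name and paths are hypothetical, and the command is only built here, not run.

```python
import platform
from pathlib import Path


def pan_command(pdi_dir, ktr_file, level="Basic", system=None):
    """Build (not run) a Pan command line for a saved transformation.

    pdi_dir is the data-integration folder; the -file and -level options
    are assumed here, as is "Basic" being a valid log level.
    """
    system = system or platform.system()
    script = "pan.bat" if system == "Windows" else "pan.sh"
    return [str(Path(pdi_dir) / script), f"-file={ktr_file}", f"-level={level}"]


# Usage (not executed here):
#   import subprocess
#   cmd = pan_command("/home/pdi_user/pdi/data-integration",
#                     "/home/pdi_user/sample_transformation.ktr")
#   subprocess.run(cmd, cwd="/home/pdi_user/pdi/data-integration", check=True)
```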

Understanding the Kettle home directory


When you run Spoon for the first time, a folder named .kettle is created in your home directory by default. This folder is referred to as the Kettle home directory.

The folder contains several configuration files, mainly created and updated by the different PDI tools. Among these files, there is the kettle.properties file.
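The default location described above can be computed as in this small sketch. It only covers the default case (a .kettle folder under the user's home directory, as returned by Python's Path.home()), and the function names are hypothetical.

```python
from pathlib import Path


def default_kettle_home(home=None):
    """Return the default Kettle home: the .kettle folder in the home directory."""
    base = Path(home) if home is not None else Path.home()
    return base / ".kettle"


def kettle_properties_file(home=None):
    """Return the expected location of kettle.properties inside the Kettle home."""
    return default_kettle_home(home) / "kettle.properties"


# Usage:
#   path = kettle_properties_file()
#   print(path, path.exists())  # exists() is True once Spoon has created it
```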

The purpose of the kettle.properties file, which is created along with the .kettle folder the first time you run Spoon, is to hold variable definitions with a broad scope: the whole Java Virtual Machine. Therefore, it's the perfect place to define general settings; some examples are as follows:

  • Database connection settings: host, database name, and so on
  • SMTP settings: SMTP server, port, and so on
  • Common input and output folders
  • Directory to send log files to

Before continuing, let's add some variables to the file. Suppose that you have two folders, named C:/PDI/INPUT and C:/PDI/OUTPUT, which you will use for storing files. The objective will be to add two variables, named INPUT_FOLDER and OUTPUT_FOLDER, containing those values:

  1. Locate the Kettle home directory, that is, the .kettle folder inside your home directory. If you work in Windows, your home directory could be C:\Documents and Settings\<your_name> or C:\Users\<your_name>, depending on which Windows version you have. If you work in Linux (or similar), it will most likely be /home/<your_name>/, and on macOS, /Users/<your_name>/.
  2. Edit the kettle.properties file. You will see that it only contains commented sample lines.
  3. You can safely remove the contents of the file and define your own variables by typing the following lines:
       INPUT_FOLDER=C:/PDI/INPUT
       OUTPUT_FOLDER=C:/PDI/OUTPUT
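To see what the file now contains from PDI's point of view, here is a deliberately simplified reader for lines like the two above. It is a sketch, not a full implementation of the Java properties format: the function name is hypothetical, and it ignores backslash escapes, line continuations, and ':' or space separators.

```python
def read_simple_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments.

    Deliberately simplified: unlike java.util.Properties, it ignores
    backslash escapes, line continuations, and ':' or space separators.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are skipped in this sketch
            props[key.strip()] = value.strip()
    return props


sample = """# Variables for the INPUT and OUTPUT folders
INPUT_FOLDER=C:/PDI/INPUT
OUTPUT_FOLDER=C:/PDI/OUTPUT
"""
print(read_simple_properties(sample))
```

Writing the folders with forward slashes is a sensible choice: in the properties format, a backslash starts an escape sequence, so a Windows path written with backslashes could be read differently than intended.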

Save the file and restart Spoon so that it can recognize the variables defined in the file. We will learn how to use these variables in Chapter 2, Getting Familiar with Spoon.
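As a rough preview only, assuming the ${NAME} reference syntax that PDI uses for variables, the sketch below shows the idea of substituting values read from kettle.properties into a path. The resolve function and the sales.csv file name are hypothetical, and this is not PDI's own resolution logic.

```python
import re

# ${NAME} references; the characters allowed in NAME are an assumption.
VARIABLE_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def resolve(text, variables):
    """Replace each ${NAME} with its value; unknown names are left as-is."""
    return VARIABLE_REFERENCE.sub(
        lambda match: variables.get(match.group(1), match.group(0)), text
    )


variables = {"INPUT_FOLDER": "C:/PDI/INPUT", "OUTPUT_FOLDER": "C:/PDI/OUTPUT"}
print(resolve("${INPUT_FOLDER}/sales.csv", variables))
```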

 

 

Summary


In this chapter, you were introduced to Pentaho Data Integration. Specifically, you learned what PDI is, and you installed the tool. You were introduced to Spoon, PDI's graphical designer tool, and you created your first transformation. You were also introduced to the Kettle home directory and the kettle.properties file, which will be used throughout the rest of the book.

In Chapter 2, Getting Familiar with Spoon, you will learn much more about the process of creating, testing, and running transformations in Spoon.


Key benefits

  • Take away the pain of starting with a complex and powerful system
  • Simplify your data transformation and integration work
  • Explore, transform, and validate your data with Pentaho Data Integration

Description

Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use PDI can be difficult or confusing. This book is the ideal solution: it reduces your learning curve with PDI and provides the guidance needed to make you productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.

What you will learn

  • Design, preview, and run transformations in Spoon
  • Run transformations using the Pan utility
  • Understand how to obtain data from different types of files
  • Connect to a database and explore it using the database explorer
  • Understand how to transform data in a variety of ways
  • Understand how to insert data into database tables
  • Design and run jobs for sequencing tasks and sending emails
  • Combine the execution of jobs and transformations

Table of Contents

15 Chapters
  • Title Page
  • Copyright and Credits
  • Dedication
  • Packt Upsell
  • Foreword
  • Contributors
  • Preface
  • Getting Started with PDI
  • Getting Familiar with Spoon
  • Extracting Data
  • Transforming Data
  • Loading Data
  • Orchestrating Your Work
  • Other Books You May Enjoy
  • Index

