Learning Pentaho Data Integration 8 CE - Third Edition

Product type Book
Published in Dec 2017
Publisher Packt
ISBN-13 9781788292436
Pages 500 pages
Edition 3rd Edition

Table of Contents (23) Chapters

Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Getting Started with Pentaho Data Integration
Getting Started with Transformations
Creating Basic Task Flows
Reading and Writing Files
Manipulating PDI Data and Metadata
Controlling the Flow of Data
Cleansing, Validating, and Fixing Data
Manipulating Data by Coding
Transforming the Dataset
Performing Basic Operations with Databases
Loading Data Marts with PDI
Creating Portable and Reusable Transformations
Implementing Metadata Injection
Creating Advanced Jobs
Launching Transformations and Jobs from the Command Line
Best Practices for Designing and Deploying a PDI Project

Chapter 13. Implementing Metadata Injection

This chapter is about a powerful feature of Pentaho Data Integration (PDI): metadata injection, which consists of injecting metadata into a template Transformation at runtime. In this chapter, we will explain the motivation behind this feature and then work through a couple of practical examples that show how to implement it.

We will be covering the following topics in this chapter:

  • Introducing metadata injection
  • Discovering metadata and injecting it
  • Identifying use cases to implement metadata injection

Introducing metadata injection


Throughout the book, we have been talking about PDI metadata, the data that describes the PDI datasets. Metadata includes field names and data types, among other attributes. Inside PDI, metadata refers not only to datasets but also to other entities. For example, the definition of an input file (name, description, columns) is also considered metadata.

You usually define the metadata in the configuration windows of the different steps. You do this manually while you are developing or modifying a Transformation in Spoon. This works perfectly when you know exactly what the data looks like (for example, when you are reading a file) or how you want it to be (for example, when you are creating new fields). There are situations, however, where this is not the case and you don't know the metadata until runtime. This is the kind of situation where metadata injection can help.
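The idea of a template that receives its metadata at runtime can be sketched outside PDI. The following is only a conceptual illustration in plain Python, not PDI's API: the `TEMPLATE` dictionary, the `inject` function, the file name `sales.txt`, and the field types are all hypothetical names chosen for the example.

```python
import copy

# Hypothetical "template": a step definition whose metadata is left empty
# because it is not known at design time.
TEMPLATE = {
    "step": "Text file input",
    "filename": None,
    "fields": [],
}


def inject(template, filename, fields):
    """Return a filled-in copy of the template; the template stays untouched."""
    filled = copy.deepcopy(template)
    filled["filename"] = filename
    filled["fields"] = [{"name": name, "type": ftype} for name, ftype in fields]
    return filled


# Metadata that only becomes available at runtime (assumed values).
runtime_fields = [("PRODUCTLINE", "String"), ("QUANTITYORDERED", "Integer")]
ready = inject(TEMPLATE, "sales.txt", runtime_fields)
```

The template is copied rather than modified, so the same template can be reused with different metadata on each run.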

Explaining how metadata injection works

Let's see how metadata injection works through a very simple example...

Discovering metadata and injecting it


Let's move to a use case a bit more elaborate than the previous one. We will continue working with sales data. In this case, we will work with an Excel file named sales_data.xls, which has a single sheet. There are several fields in this file, but we are only interested in the following: PRODUCTLINE, PRODUCTCODE, and QUANTITYORDERED. The problem is that the fields can be in any order in the Excel file. We will only know the order when we read the file.

In the same way as before, we need to create a template with missing data and then a Transformation that injects that data.

Let's start with the template. As we don't have the list of fields, we will fill the Fields grid with generic names—HEADER1, HEADER2, and so on. We have to select and keep only three fields. For this, we will use a Select Values step and leave the task of filling it to the Transformation that injects the missing data:

  1. Create a Transformation.
  2. Add a Microsoft Excel input step, and configure...
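The discovery part of this use case can be sketched in plain Python, independently of PDI. Given the header row as it is read at runtime, the function below works out which generic field (HEADER1, HEADER2, and so on) holds each of the three fields of interest. The function name, the `fieldname`/`rename` keys, and the sample header row are assumptions made for illustration; they are not taken from the book's solution or from the Select Values step's actual configuration.

```python
WANTED = ["PRODUCTLINE", "PRODUCTCODE", "QUANTITYORDERED"]


def discover_select_metadata(header_row, wanted=WANTED):
    """Map each wanted field to the generic HEADERn name at its position."""
    # Positions are 1-based so they line up with HEADER1, HEADER2, ...
    positions = {
        name.strip().upper(): index
        for index, name in enumerate(header_row, start=1)
    }
    missing = [name for name in wanted if name not in positions]
    if missing:
        raise ValueError(f"Fields not found in header: {missing}")
    # One entry per field to keep: the generic name and the name to rename it to.
    return [
        {"fieldname": f"HEADER{positions[name]}", "rename": name}
        for name in wanted
    ]


# Hypothetical header row, in an order that is only known at runtime.
header = ["ORDERNUMBER", "QUANTITYORDERED", "PRODUCTCODE", "PRODUCTLINE"]
select_metadata = discover_select_metadata(header)
```

The resulting list is the kind of information that the injecting Transformation would pass to the template, whatever order the columns arrive in.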

Identifying use cases to implement metadata injection


So far, we have used injection to deal with dynamic sources. The opposite case is dealing with dynamic targets. An example of this is generating files with a variable number of fields.

Metadata injection can also be used to reduce repetitive tasks. A typical example is the loading of text files into staging tables. Suppose that you have a text file that you want to load into a staging table. Besides the specific task of loading the table, you want to perform other tasks, such as applying validations (for example, checking for non-null values), storing audit information such as the user and timestamp of the execution, and counting the processed rows and logging the count in a result table.

Now suppose that you have to do this for a considerable number of different files. You could take this process as the base and start copying and pasting, adapting the process for each file. This is, however, not a good idea for several reasons:

  • It is time-consuming
  • It...
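The alternative to copying and pasting is a single generic process driven by per-file metadata. The sketch below shows that idea in plain Python rather than in PDI: one function performs the non-null validation, the row count, and the audit record, and only the specification changes from file to file. The `SPECS` entries, table names, column names, and the `load_to_staging` function are hypothetical, and the function returns the rows instead of writing to a real database.

```python
import csv
import io
from datetime import datetime, timezone


def load_to_staging(text, spec, user):
    """Validate delimited text against a spec; return (good_rows, audit)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=spec["delimiter"])
    good, rejected = [], 0
    for row in reader:
        # Reject rows where any mandatory column is empty or missing.
        if any(not (row.get(col) or "").strip() for col in spec["not_null"]):
            rejected += 1
            continue
        good.append(row)
    audit = {
        "table": spec["table"],
        "user": user,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "rows_loaded": len(good),
        "rows_rejected": rejected,
    }
    return good, audit


# Hypothetical per-file metadata: the only part that differs between files.
SPECS = [
    {"table": "stg_products", "delimiter": ",", "not_null": ["code"]},
    {"table": "stg_customers", "delimiter": ";", "not_null": ["id", "name"]},
]

sample = "code,name\nA1,Widget\n,Orphan\n"
rows, audit = load_to_staging(sample, SPECS[0], user="etl")
```

Adding a new file then means adding one entry to the specification list, not another copy of the process.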

Summary


In this chapter, you learned the basics of metadata injection: what it is about and how it works. After that, you developed a couple of examples with PDI, which will serve as patterns for implementing your own solutions. Finally, you were introduced to use cases where injection can be useful.

Having learned metadata injection, you now have all the knowledge needed to create advanced transformations. In the next chapter, we will switch back to jobs to continue learning advanced concepts.

 

 

 
