Chapter 13. Implementing Metadata Injection
This chapter covers a powerful feature of Pentaho Data Integration (PDI): metadata injection, which consists of injecting metadata into a template Transformation at runtime. We will explain the motivation behind this feature and then work through a couple of practical examples that show how to implement it.
This chapter covers the following topics:
- Introducing metadata injection
- Discovering metadata and injecting it
- Identifying use cases to implement metadata injection
Introducing metadata injection
Throughout the book, we have been talking about PDI metadata, the data that describes the PDI datasets. Metadata includes field names and data types, among other attributes. Inside PDI, metadata refers not only to datasets, but also to other entities. For example, the definition of an input file (name, description, columns) is also considered metadata.
You usually define the metadata in the configuration windows of the different steps. You do this manually while you are developing or modifying a Transformation in Spoon. This works perfectly when you know exactly what the data looks like (for example, when you are reading a file) or how you want it to be (for example, when you are creating new fields). In some situations, however, you don't know the metadata until runtime. This is the kind of situation where metadata injection can help.
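The idea of a fixed template whose metadata arrives at runtime can be illustrated outside PDI. The following is a minimal Python sketch, not PDI code: the function name `run_template`, the sample rows, and the choice of a comma-separated text source are all assumptions made for illustration.

```python
import csv
import io


def run_template(text, field_names, delimiter=","):
    """A 'template' reader: the parsing logic is fixed, while the field
    names (the metadata) are supplied by the caller at runtime."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(field_names, row)) for row in reader]


# Hypothetical data and field names, known only when the program runs.
sample = "S10_1678,30\nS10_1949,50\n"
rows = run_template(sample, ["PRODUCTCODE", "QUANTITYORDERED"])
```

The template never hard-codes a field name. Whoever calls it plays the role of the Transformation that injects the missing metadata.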
Explaining how metadata injection works
Let's see how metadata injection works through a very simple example...
Discovering metadata and injecting it
Let's move to a use case a bit more elaborate than the previous one. We will continue working with sales data. In this case, we will work with an Excel file named sales_data.xls, which has a single sheet. There are several fields in this file, but we are only interested in the following: PRODUCTLINE, PRODUCTCODE, and QUANTITYORDERED. The problem is that the fields can be in any order in the Excel file. We will only know the order when we read the file.
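The discovery part of this problem, finding where the three fields sit once the header row has been read, can be sketched in plain Python. This is only an illustration of the logic, not the PDI implementation: the function name `discover_positions` and the extra column ORDERNUMBER in the sample header are hypothetical.

```python
WANTED = ["PRODUCTLINE", "PRODUCTCODE", "QUANTITYORDERED"]


def discover_positions(header_row, wanted=WANTED):
    """Map each wanted field name to its zero-based column position."""
    positions = {name: idx for idx, name in enumerate(header_row)}
    missing = [name for name in wanted if name not in positions]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {name: positions[name] for name in wanted}


# Hypothetical header row, as it might be read from the file at runtime.
header = ["ORDERNUMBER", "QUANTITYORDERED", "PRODUCTCODE", "PRODUCTLINE"]
found = discover_positions(header)
```

Failing early when a wanted field is absent is a design choice of this sketch. It keeps a malformed file from silently producing rows with the wrong columns.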
In the same way as before, we need to create a template with missing data and then a Transformation that injects that data.
Let's start with the template. As we don't have the list of fields, we will fill the Fields grid with generic names: HEADER1, HEADER2, and so on. We have to select and keep only three fields. For this, we will use a Select Values step and leave the task of filling it to the Transformation that injects the missing data:
- Add a Microsoft Excel input step, and configure...
Identifying use cases to implement metadata injection
So far, we have used injection to deal with dynamic sources. The opposite case is dealing with dynamic targets, for example, generating files with a variable number of fields.
Metadata injection can also be used to reduce repetitive tasks. A typical example is the loading of text files into staging tables. Suppose that you have a text file that you want to load into a staging table. Besides the specific task of loading the table, you want to perform some additional tasks: for example, checking for non-null values, storing audit information such as the user and timestamp of the execution, and counting the processed rows and logging the count in a result table.
Now suppose that you have to do this for a large number of different files. You could take this process as the base and start copying and pasting, adapting the process for each file. This is, however, not a good idea, for several reasons:
- It is time-consuming
- It...
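The alternative to copying and pasting is a single generic process driven by a per-file description. The following Python sketch illustrates that idea under stated assumptions: the spec layout, the function `load_with_metadata`, and the target names `stg_products` and `stg_customers` are hypothetical, and the "load" simply returns the accepted rows instead of writing to a table.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical metadata: one entry per file to load.
SPECS = [
    {"target": "stg_products", "fields": ["code", "line"],
     "required": ["code"], "delimiter": ","},
    {"target": "stg_customers", "fields": ["id", "name", "country"],
     "required": ["id", "name"], "delimiter": ";"},
]


def load_with_metadata(text, spec, user):
    """Run one generic load, driven entirely by `spec` (the metadata)."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=spec["fields"],
                            delimiter=spec["delimiter"])
    accepted, rejected = [], []
    for row in reader:
        # Non-null validation on the fields the spec marks as required.
        if all(row.get(name) for name in spec["required"]):
            accepted.append(row)
        else:
            rejected.append(row)
    # Audit record: who ran the load, when, and how many rows were seen.
    audit = {
        "target": spec["target"],
        "user": user,
        "run_at": datetime.now(timezone.utc).isoformat(),
        "rows_read": len(accepted) + len(rejected),
        "rows_rejected": len(rejected),
    }
    return accepted, audit


rows, audit = load_with_metadata("S10,Classic Cars\n,Planes\n", SPECS[0], "etl")
```

Adding a new file then means adding one entry to `SPECS` rather than another copy of the process, which mirrors what a template Transformation plus injected metadata achieves in PDI.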
In this chapter, you learned the basics of metadata injection: what it is and how it works. After that, you developed a couple of examples with PDI, which will serve as patterns for implementing your own solutions. Finally, you were introduced to use cases where injection can be useful.
Having learned metadata injection, you now have the knowledge you need to create advanced transformations. In the next chapter, we will switch back to jobs to continue learning advanced concepts.