You're reading from Pentaho Data Integration Quick Start Guide

Product type: Book
Published in: Aug 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789343328
Edition: 1st Edition
Author (1)
María Carina Roldán

María Carina Roldán was born in Argentina and has a bachelor's degree in computer science. She started working with Pentaho back in 2006 and has spent the years since then developing BI solutions, mainly as an ETL specialist, for different companies around the world. Currently, she lives in Buenos Aires and works as an independent consultant. Carina is the author of Learning Pentaho Data Integration 8 CE, published by Packt in December 2017. She has also authored other books on Pentaho, all of them published by Packt.

Getting data from other sources


So far, we have been getting data from plain files and databases. These are two of the most common data sources, but PDI supports many more. Most of them, though not all, are grouped in the Input folder. The following subsections present some useful sources that the previous sections didn't cover.

XML and JSON

With PDI, you can read XML files or parse fields whose contents have an XML structure. In both cases, you parse the XML with the Get data from XML input step, using XPath notation to specify the fields to read. When the XML is very big or complex, there is an alternative step: XML Input Stream (StAX).
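
To get a feel for how XPath picks fields out of an XML structure, here is a minimal sketch in Python, outside PDI. It is not how the step is implemented. The sample document, its element names, and the read_orders helper are all invented for illustration, and Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML_TEXT = """
<orders>
  <order id="1001">
    <customer>Alice</customer>
    <total>25.50</total>
  </order>
  <order id="1002">
    <customer>Bob</customer>
    <total>40.00</total>
  </order>
</orders>
"""


def read_orders(xml_text):
    """Return one dict per <order> node, similar to one row per node."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    # "./order" selects every repeating <order> child of the root.
    for order in root.findall("./order"):
        rows.append({
            "id": order.get("id"),                    # XPath: @id
            "customer": order.findtext("customer"),   # XPath: customer
            "total": float(order.findtext("total")),  # XPath: total
        })
    return rows


rows = read_orders(XML_TEXT)
for row in rows:
    print(row)
```

Each path is evaluated relative to the repeating node, so every matching node yields one row with one value per field.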

Similarly, you can parse JSON structures with the JSON Input step. For specifying the fields in this case, you use JSONPath notation.
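
JSONPath plays the same role for JSON that XPath plays for XML. The following sketch, again in plain Python and outside PDI, evaluates a very small subset of JSONPath (only `$`, `.name`, `[*]`, and `[n]`) by hand, because the Python standard library has no JSONPath support. The sample data and the select helper are assumptions made up for this example. A real JSONPath engine supports far more syntax.

```python
import json
import re

# Hypothetical sample data, invented for this illustration.
JSON_TEXT = """
{"store": {"book": [
    {"title": "Book A", "price": 8.95},
    {"title": "Book B", "price": 12.99}
]}}
"""


def select(data, path):
    """Evaluate a tiny JSONPath subset: $, .name, [*] and [n]."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    # Each token is a (name, index) pair; exactly one of the two is non-empty.
    tokens = re.findall(r"\.(\w+)|\[(\*|\d+)\]", path[1:])
    current = [data]
    for name, index in tokens:
        matched = []
        for node in current:
            if name:
                # .name: descend into an object member.
                if isinstance(node, dict) and name in node:
                    matched.append(node[name])
            elif index == "*":
                # [*]: fan out over every array element.
                if isinstance(node, list):
                    matched.extend(node)
            else:
                # [n]: pick a single array element.
                position = int(index)
                if isinstance(node, list) and position < len(node):
                    matched.append(node[position])
        current = matched
    return current


data = json.loads(JSON_TEXT)
titles = select(data, "$.store.book[*].title")
second_price = select(data, "$.store.book[1].price")
print(titles)
print(second_price)
```

The `[*]` wildcard is what turns a single JSON document into several values, one per array element.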

Also, you can parse both XML and JSON structures with JavaScript or Java code, by using the Modified Java Script Value step or the User Defined Java Class step...
