
You're reading from Talend Open Studio Cookbook

Product type: Book
Published in: Oct 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782167266
Edition: 1st Edition
Author (1)
Rick Barton

Rick Barton is a freelance consultant who has specialized in data integration and ETL for the last 13 years as part of an IT career spanning over 25 years. After gaining a degree in Computer Systems from Cardiff University, he began his career as a firmware programmer before moving into Mainframe data processing and then into ETL tools in 1999. He has provided technical consultancy to some of the UK's largest companies, including banks and telecommunications companies, and was a founding partner of a Big Data integration consultancy. Four years ago he moved back into freelance development and has been working almost exclusively with Talend Open Studio and Talend Integration Suite, on multiple projects of various sizes in the UK. It is on these projects that he has learned many of the lessons that can be found in this, his first book.

Chapter 9. Working with XML, Queues, and Web Services

This chapter describes some of the features of the Talend data integration suite that interface with technologies used in the Talend ESB (Enterprise Service Bus) Studio. We will cover the following recipes in this chapter:

  • Using tXMLMap to read XML

  • Using tXMLMap to create an XML document

  • Reading complex hierarchical XML

  • Writing complex XML

  • Calling a SOAP web service

  • Calling a RESTful web service

  • Reading and writing to a queue

  • Ensuring lossless queues using sessions

Introduction


This chapter uses Talend Studio for ESB. It is an amalgam of tools and techniques associated with low-latency or real-time processing, and it covers the areas where the Talend DI tool set overlaps with the Talend ESB tool set.

But first, let's look at some of the key principles required for this chapter:

  • tXMLMap: tXMLMap is the XML equivalent of tMap. It provides most of the same functionality, with the added ability to process XML data.

  • XPATH: tXMLMap works well for moderately complex XML; however, deeply nested, multi-level XML is harder to handle. This is where XPATH is used: to decompose the input XML into more manageable chunks.

  • tXMLOutput, tWriteXMLField: These components are used to create complex multi-level XML structures from flat structures.

  • Web services: Talend Studio for ESB provides simple-to-use capabilities for creating and consuming both SOAP and RESTful web services.

  • Message Queues: Talend ESB contains a copy of...

Using tXMLMap to read XML


This recipe shows how we can convert an XML record stored in a file into a format that is readable by tXMLMap, and how we can then read and process the data in the XML record.
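Outside Talend, the same two stages (load the XML into an in-memory document, then read values out of it) can be sketched in plain Java. This is only an illustration using the JDK's org.w3c.dom and javax.xml.xpath APIs; it is not the code that Talend generates, and Talend's own Document type is a different class. The sample record and the element names customer and name are assumptions, since the contents of the recipe's input file are not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ReadXmlSketch {

    // Stage 1: turn XML text into an in-memory DOM Document.
    static Document toDocument(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Stage 2: read one text value out of the Document with an XPath expression.
    static String read(Document doc, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample record; the real file layout may differ.
        String xml = "<customer><name>Jane Smith</name>"
                   + "<countryOfResidence>UK</countryOfResidence></customer>";
        Document payload = toDocument(xml);
        System.out.println(read(payload, "/customer/name"));
    }
}
```

Holding the whole record in a single Document value mirrors the single payload column in the recipe: the downstream step receives one object and decides which parts of it to map.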

Getting ready

Open the job jo_cook_ch09_0010_readXMLFile.

How to do it...

The first stage of this process is to convert the XML file into Java Document format for use by the downstream component.

  1. Drag a tFileInputXML component onto the canvas.

  2. Edit the schema and add a column named payload. Make it a type of Document, as shown in the screenshot:

  3. Open the tFileInputXML component and change the File name/Stream field to context.cookbookData+"/chapter9/chapter09_jo_0010_customerData.xml".

  4. Change the Loop Xpath query field to "/".

  5. Add an Xpath query of ".", and tick the box Get Nodes.

  6. Your tFileInputXML should look like the one shown in the next screenshot:

    Reading using tXMLMap

  7. Add a tXMLMap component to the canvas and link to the tFileInputXML component.

  8. Open the tXMLMap component and right-click on payload...

Using tXMLMap to create an XML document


This recipe is the reverse of the previous recipe, in that we'll be reading in a flat format and converting it to an XML document for output. It is recommended that you have understood the previous recipe prior to attempting this one.
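As a rough picture of what "flat format in, XML document out" means, the following sketch builds a small document from one delimited row using the JDK's DOM and Transformer APIs. It is not Talend-generated code. The row layout (id;name) and the element names customer, id, and name are assumptions made for the example; only countryOfResidence and its value UK come from the recipe.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class CreateXmlSketch {

    // Append <name>value</name> to the given parent element.
    private static void addChild(Document doc, Element parent, String name, String value) {
        Element child = doc.createElement(name);
        child.setTextContent(value);
        parent.appendChild(child);
    }

    // Build a <customer> document from one row in the assumed layout "id;name".
    static String rowToXml(String row) {
        String[] fields = row.split(";", -1);
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element customer = doc.createElement("customer");
            doc.appendChild(customer);
            addChild(doc, customer, "id", fields[0]);
            addChild(doc, customer, "name", fields[1]);
            // Constant value, set in the mapping rather than read from the input.
            addChild(doc, customer, "countryOfResidence", "UK");

            // Serialize the document to text without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rowToXml("1;Jane Smith"));
    }
}
```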

Getting ready

Open the job jo_cook_ch09_0020_createXMLDocument.

How to do it...

The first stage of the process is to convert the input data into a Java Document that can store the XML.

  1. Drag a tXMLMap component onto the canvas, and link the tFileInputDelimited component to it.

  2. Create an output table named customerDocumentOut, and add a field named payload. Make the field a type of Document.

  3. You will see that the field in the output table has changed to become a simple XML structure.

  4. As we did in the previous recipe, retrieve the XML format from the file containing our target XML structure.

  5. Drag the fields from input to output, and set the countryOfResidence element to UK.

  6. Your tXMLMap component should look like the one in...

Reading complex hierarchical XML


The first two recipes show how tXMLMap can be used to map between XML formats visually, much like the tMap component; however, it can become overly complex and difficult to manage when there are multiple levels of hierarchy and multiple loops within the XML. This recipe shows how we can deconstruct a more complex XML record into individual sets of data while ensuring that the hierarchical relationships between the data are not lost.

Getting ready

Open the job jo_cook_ch09_0040_readComplexXML. If you view the input file chapter09_jo_0040_orderDate.xml, you will see a hierarchy in which a customer has many orders, and each order has many items.
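The idea of splitting a hierarchy into flat sets without losing the parent-child links can be sketched in plain Java with the JDK's XPath API. Each set has its own loop XPath, and the parent keys are picked up with relative paths such as "../@id". This is an illustration only, not what the Talend components do internally; the sample XML, its element names, and the id and sku attributes are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ComplexXmlSketch {

    // Hypothetical input: one customer with two orders and three items.
    static final String SAMPLE =
          "<customers><customer id=\"C1\">"
        + "<order id=\"O1\"><item sku=\"A\"/><item sku=\"B\"/></order>"
        + "<order id=\"O2\"><item sku=\"C\"/></order>"
        + "</customer></customers>";

    // Emit one ';'-delimited row per node matched by loopXPath.
    // Each column XPath is evaluated relative to the current loop node.
    static List<String> extract(String xml, String loopXPath, String... columnXPaths) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    loopXPath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                List<String> columns = new ArrayList<>();
                for (String column : columnXPaths) {
                    columns.add(xpath.evaluate(column, node));
                }
                rows.add(String.join(";", columns));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not extract rows", e);
        }
    }

    public static void main(String[] args) {
        // Orders carry their customer id; items carry both customer and order ids.
        System.out.println(extract(SAMPLE, "/customers/customer/order", "../@id", "@id"));
        System.out.println(extract(SAMPLE, "/customers/customer/order/item",
                "../../@id", "../@id", "@sku"));
    }
}
```

Because every order row keeps its customer id and every item row keeps both ids, the flat sets can be processed separately and still be joined back together later.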

How to do it...

First, we will create a customer schema using the XML schema wizard.

  1. In the metadata panel, under File XML | Chapter 09, right-click and select the option Create file xml.

  2. Name the XML file sc_cook_ch9_0040_XMLorderDataCustomer.

  3. Select Input XML, then click on Next.

  4. Click on Browser to select the XML...

Writing complex XML


This is a very useful recipe for building complex XML structures containing many looping elements and deep hierarchies, and once the principles are understood, it is simple and quick to implement. If you use XML frequently, then we hope that this will become one of your staple recipes.

Before starting the exercise, it helps to understand a little about the method.

Understanding the XML structure

The XML structure we are aiming to create is shown in the following screenshot:

As you can see, a customer can have many orders, and an order can have many order items.

Node

Note

I am using the term node to describe an XML tag that contains one or more other tags or nodes. For example, a customer node may contain many order nodes.

Method

We will build a three-tier XML structure, constructing the hierarchy one level at a time:

  1. First, we build the customer node.

  2. Then, we build an orders node that contains many individual order nodes and add them as a child of...
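The level-by-level approach can be sketched in plain Java with the JDK's DOM API: create the customer node, attach an orders node holding one order per distinct order id, then hang each item under its parent order. This is a minimal sketch of the idea, not the Talend job itself. The input layout (orderId;itemName), the element names, and the name and id attributes are assumptions.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NestedXmlSketch {

    // rows use the assumed flat layout "orderId;itemName" for a single customer.
    static String build(String customerName, List<String> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            // Level 1: the customer node.
            Element customer = doc.createElement("customer");
            customer.setAttribute("name", customerName);
            doc.appendChild(customer);

            // Level 2: an orders node with one order node per distinct order id.
            Element orders = doc.createElement("orders");
            customer.appendChild(orders);
            Map<String, Element> orderById = new LinkedHashMap<>();
            for (String row : rows) {
                String orderId = row.split(";", -1)[0];
                if (!orderById.containsKey(orderId)) {
                    Element order = doc.createElement("order");
                    order.setAttribute("id", orderId);
                    orders.appendChild(order);
                    orderById.put(orderId, order);
                }
            }

            // Level 3: each item is attached to its parent order.
            for (String row : rows) {
                String[] fields = row.split(";", -1);
                Element item = doc.createElement("item");
                item.setTextContent(fields[1]);
                orderById.get(fields[0]).appendChild(item);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Jane Smith", List.of("O1;pen", "O1;ink", "O2;pad")));
    }
}
```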

Calling a SOAP web service


This recipe shows how a SOAP-based web service can be called from Talend. We will be using a very simple Talend web service that will return the weather conditions in a given city.

Getting ready

  1. Open the job jo_cook_ch09_weatherService and run it. You will see the output in the console, the last line of which will be web service [endpoint: http://localhost:8090/services/cookbookWeatherService] published.

  2. This means that the web service is now available to be called by our consumer job.

  3. Now open the job jo_cook_ch09_0060_consumeSOAP.

How to do it...

  1. Drag a tESBConsumer component to the canvas and open it.

  2. Change the WSDL to http://localhost:8090/services/cookbookWeatherService?wsdl.

  3. Tick the box for Populate schema to repository on finish. This will ensure the XML metadata schemas we need for the SOAP request and response are created in the repository for us to use later.

  4. Click on the button to refresh the details, as highlighted in the following screenshot, and you will...
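For orientation, the sketch below shows the general shape of a SOAP 1.1 request message of the kind a consumer sends. It only builds the text of the envelope and sends nothing. The SOAP envelope namespace is the standard one, but the service namespace, the getWeather operation, and the city element are hypothetical placeholders; the real names are defined by the service's WSDL, which is why the recipe has the component read the WSDL instead.

```java
final class SoapEnvelopeSketch {

    // Standard SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical service namespace; the real one comes from the WSDL.
    static final String SERVICE_NS = "http://example.com/weather";

    // Escape the characters that would otherwise break the XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a request envelope for a hypothetical getWeather operation.
    static String weatherRequest(String city) {
        return "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_NS + "\" xmlns:w=\"" + SERVICE_NS + "\">"
             + "<soapenv:Body>"
             + "<w:getWeather><w:city>" + escape(city) + "</w:city></w:getWeather>"
             + "</soapenv:Body>"
             + "</soapenv:Envelope>";
    }

    public static void main(String[] args) {
        System.out.println(weatherRequest("London"));
    }
}
```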

Calling a RESTful web service


This recipe shows how a RESTful web service can be called from Talend. The REST service we will be using is a Google Maps service, so you will need to be connected to the Internet to perform this recipe.

Getting ready

Open the job jo_cook_ch09_0070_consumeRestService.

How to do it...

  1. Drag a tRESTClient component onto the canvas.

  2. Set the URL to "http://maps.googleapis.com" and the Relative Path to "maps/api/geocode/xml".

  3. Add two Query Parameters: "address" with a value of "Trafalgar Square", and "sensor" with a value of "false".

  4. Your tRESTClient should look like the one below:

  5. Link a tLogRow component to the response and error flows out of the tRESTClient component.

  6. Run the job.

How it works...

The tRESTClient component enables us to define all of the features of a REST request. In this case, we define the URL of the Google APIs, and then the Geocoder API within the maps functions.

Finally, we define the address for which we want the service to return information.
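To make the three settings concrete, the following sketch shows how a base URL, a relative path, and query parameters combine into a single request URL. It uses only the JDK and does not call the service; it illustrates the URL structure and is not a description of how tRESTClient is implemented. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class RestUrlSketch {

    // URL-encode a query parameter name or value as UTF-8 form data.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Join base URL, relative path, and query parameters into one URL.
    static String buildUrl(String baseUrl, String relativePath, Map<String, String> query) {
        StringBuilder url = new StringBuilder(baseUrl).append('/').append(relativePath);
        char separator = '?';
        for (Map.Entry<String, String> parameter : query.entrySet()) {
            url.append(separator)
               .append(encode(parameter.getKey()))
               .append('=')
               .append(encode(parameter.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // The values from the recipe: URL, Relative Path, and the two Query Parameters.
    static String geocodeUrl(String address) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("address", address);
        query.put("sensor", "false");
        return buildUrl("http://maps.googleapis.com", "maps/api/geocode/xml", query);
    }

    public static void main(String[] args) {
        System.out.println(geocodeUrl("Trafalgar Square"));
    }
}
```

Note that the space in "Trafalgar Square" is encoded as "+" in the query string, which is why parameter values should be passed separately rather than pasted into the URL by hand.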

There's more...

Reading and writing to a queue


Talend ESB is supplied with the Apache ActiveMQ software for creating message queues and topics. This recipe shows how we can write to and read from an ActiveMQ queue.

Getting ready

First, we'll need to start ActiveMQ.

  1. Navigate to the folder <talend installation folder>\Runtime_ESBSE\activemq\bin and double-click on the file activemq.bat.

  2. This will open a command window. Do not close this command window while you are doing this recipe.

  3. You can access the ActiveMQ administration console by opening the URL localhost:8161/admin. This will allow you to view your queues and topics.

  4. Open the job jo_cook_ch09_0080_readWriteQueue.

How to do it...

The first thing to do is to write a message to a queue.

Writing to the queue

  1. Drag a tMomOutput component to the canvas.

  2. Create a flow between the tFileInputXML and the tMomOutput components.

  3. Open the component and set the MQ Server to ActiveMQ, the To field to customerData, and the MessageType to Queue. Your tMomOutput should look...

Ensuring lossless queues using sessions


In any production system, it is imperative that data is not lost when it is read from a source or written to a target. This recipe shows how this is achieved when reading from and writing to queues using the tMom components.
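The principle behind a transacted session can be shown with a small in-memory model: messages sent inside a transaction stay invisible until commit, and a rollback discards them, so a failed run leaves no partial output on the queue. The class below is an illustrative model only; it is not the ActiveMQ or JMS API, and all names in it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TransactedQueueSketch {

    // Messages that have been committed and are visible to consumers.
    private final Deque<String> committed = new ArrayDeque<>();
    // Messages sent in the current transaction, not yet visible.
    private final List<String> pending = new ArrayList<>();

    void send(String message) { pending.add(message); }

    // Make every pending message visible in one step.
    void commit() { committed.addAll(pending); pending.clear(); }

    // Discard every pending message; the queue is left unchanged.
    void rollback() { pending.clear(); }

    int visibleMessages() { return committed.size(); }

    // Simulate a run that writes two messages and may fail after the first.
    static int visibleAfterRun(boolean failMidway) {
        TransactedQueueSketch queue = new TransactedQueueSketch();
        try {
            queue.send("customer 1");
            if (failMidway) {
                throw new IllegalStateException("simulated failure");
            }
            queue.send("customer 2");
            queue.commit();
        } catch (IllegalStateException e) {
            queue.rollback();
        }
        return queue.visibleMessages();
    }

    public static void main(String[] args) {
        System.out.println(visibleAfterRun(false)); // prints 2: both messages committed
        System.out.println(visibleAfterRun(true));  // prints 0: the partial run was rolled back
    }
}
```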

Getting ready

Open the job jo_cook_ch09_0090_losslesQueues.

How to do it...

In a similar fashion to creating sessions with a database, we will first add an ActiveMQ connection that will create the session.

  1. Drag a tMomConnection component to the canvas, open it, and tick the box Use Transacted.

  2. Open the tMomOutput component and tick the Use existing connection box.

  3. Set the To field to losslessQueue, and the Message Type to Queue.

    Add the rollback and commit components

  4. Drag a tMomCommit component to the canvas and link this to the tFixedFlowInput using an OnSubjobOk trigger.

  5. Open the component and set the MQ Server to ActiveMQ.

    Successful run

  6. Run the job and then check the queue in the web browser.

  7. You will see that the queue losslessQueue has been created...
