You're reading from  Talend Open Studio Cookbook

Product type: Book
Published in: Oct 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782167266
Edition: 1st Edition
Author (1)
Rick Barton

Rick Barton is a freelance consultant who has specialized in data integration and ETL for the last 13 years as part of an IT career spanning over 25 years. After gaining a degree in Computer Systems from Cardiff University, he began his career as a firmware programmer before moving into Mainframe data processing and then into ETL tools in 1999. He has provided technical consultancy to some of the UK's largest companies, including banks and telecommunications companies, and was a founding partner of a Big Data integration consultancy. Four years ago he moved back into freelance development and has been working almost exclusively with Talend Open Studio and Talend Integration Suite, on multiple projects of various sizes in the UK. It is on these projects that he has learned many of the lessons that can be found in this, his first book.

Chapter 5. Using Java in Talend

Java is a hugely popular and incredibly rich programming language. Talend is a Java code generator that makes use of many open source Java libraries, so Talend functionality can easily be extended by integrating Java code into Talend jobs.

This chapter contains recipes that show some of the techniques for making use of Java within Talend jobs.

  • Performing one-off pieces of logic using tJava

  • Setting the context and globalMap variables using tJava

  • Adding complex logic into a flow using tJavaRow

  • Creating pseudo components using tJavaFlex

  • Creating custom functions using code routines

  • Importing JAR files to allow use of external Java classes

Introduction


For many data integration requirements, the standard Talend components provide the means to process the data from start to end without needing any Java code other than in tMap.

For more complex requirements, it is often necessary to add Java logic to a job. In other cases, custom Java code may simply give a simpler, more elegant, or more efficient solution than the standard components.

Performing one-off pieces of logic using tJava


The tJava component allows one-off logic to be added to a job. Common uses of tJava include setting global or context variables prior to the main data processing stages and printing logging messages.

Getting ready

Open the job jo_cook_ch05_0000_tJava.

How to do it…

  1. Open the tJava component.

  2. Type in the following code:

    System.out.println("Executing job "+jobName+" at "+TalendDate.getDate("CCYY-MM-dd HH:mm:ss"));
  3. Run the job. You will see that a message is printed showing the job name and the date and time of execution.

How it works…

If you examine the code, you will see that the Java code is simply added to the generated code as is. This is why you must remember to add ; to the end of the line to avoid compilation errors.
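
To make the idea concrete, here is a minimal sketch of the same statement running inside a plain Java class. It is not Talend's generated code: the class TJavaSketch and the helpers buildMessage and formatDate are hypothetical names, and java.time is used as a stand-in for TalendDate, which is only available inside Talend. It also assumes that the Talend pattern CCYY corresponds to a four-digit year.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for a generated job class; not Talend's real output.
class TJavaSketch {
    // In a real job, jobName is supplied by the generated code.
    static final String jobName = "jo_cook_ch05_0000_tJava";

    // Stand-in for TalendDate.getDate(...): formats the supplied time with
    // the standard Java pattern "yyyy-MM-dd HH:mm:ss" (an assumption).
    static String formatDate(LocalDateTime when) {
        return when.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
    }

    // Builds the same message text as the tJava snippet in this recipe.
    static String buildMessage(String job, LocalDateTime when) {
        return "Executing job " + job + " at " + formatDate(when);
    }

    public static void main(String[] args) {
        // A tJava snippet is pasted into the generated method as is, so each
        // statement needs its own terminating semicolon, exactly as here.
        System.out.println(buildMessage(jobName, LocalDateTime.now()));
    }
}
```

Passing the time in as a parameter keeps the message builder easy to check with a fixed date, which is not possible when the current time is read directly inside the statement.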

See also

  • Setting the context and globalMap variables using tJava, in this chapter.

Setting the context and globalMap variables using tJava


Although this recipe is centered on the use of tJava, it also acts as a convenient means of illustrating how the context and globalMap variables can be directly referenced from within the majority of Talend components.

Getting ready

Open jo_cook_ch05_0010_tJavaContextGlobalMap, then open the context panel, and you should see a variable named testValue.

How to do it…

  1. Open tJava_1 and type in the following code:

    System.out.println("tJava_1");
    context.testValue = "testValue is now initialized";
    globalMap.put("gmTestValue", "gmTestValue is now initialized");
  2. Open tJava_2 and type in the following code:

    System.out.println("tJava_2");
    System.out.println("context.testValue is: "+context.testValue);
    System.out.println("gmTestValue is: "+(String) globalMap.get("gmTestValue"));
  3. Run the job. You will see that the variables initialized in the first tJava are printed correctly in the second.

How it works…

The context and globalMap variables are stored as globally...
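
The shared-state behaviour seen in this recipe can be imitated outside Talend with a rough sketch like the following. The class SharedStateSketch, its nested Context class, and the methods tJava_1 and tJava_2 are hypothetical stand-ins for illustration only. The sketch assumes that the context behaves like an object with one field per context variable and that globalMap behaves like a map from String keys to Object values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of two tJava components sharing job-level state.
class SharedStateSketch {
    // Stand-in for the job's context: one field per context variable.
    static class Context {
        String testValue;
    }

    // Both "components" below see the same two objects.
    static final Context context = new Context();
    static final Map<String, Object> globalMap = new HashMap<>();

    // Plays the role of the first tJava: initializes both variables.
    static void tJava_1() {
        context.testValue = "testValue is now initialized";
        globalMap.put("gmTestValue", "gmTestValue is now initialized");
    }

    // Plays the role of the second tJava: reads what the first one wrote.
    static String tJava_2() {
        // The map stores Object values, so a cast is needed on the way out.
        String gmTestValue = (String) globalMap.get("gmTestValue");
        return context.testValue + " / " + gmTestValue;
    }

    public static void main(String[] args) {
        tJava_1();
        System.out.println(tJava_2());
    }
}
```

The cast on the value read from the map mirrors the (String) cast used in the recipe's second code snippet. The context field needs no cast because it is already declared with its own type.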

Adding complex logic into a flow using tJavaRow


The tJavaRow component allows Java logic to be performed for every record within a flow.

Getting ready

Open the job jo_cook_ch05_0020_tJavaRow.

How to do it…

  1. Add the tJavaRow and tLogRow components.

  2. Link the flows as shown in the following screenshot:

  3. Open the schema and you will see that there are no fields in the output. Highlight name, dateOfBirth, and age, and click on the single arrow.

  4. Use the + button to add new columns cleansedName (String) and rowCount (Integer), so that the schema looks like the following:

  5. Close the schema by clicking on OK and then click on the Generate code button in the main tJavaRow screen. The generated code will be as follows:

    //Code generated according to input schema and output schema
    output_row.name = input_row.name;
    output_row.dateOfBirth = input_row.dateOfBirth;
    output_row.age = input_row.timestamp;
    output_row.cleansedName = input_row.age;
    output_row.rowCount = input_row.age;
  6. Change the row output_row.age = input_row.timestamp...
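
The remaining steps of this recipe are not reproduced in this excerpt, so the following is only a generic sketch of what "logic performed for every record" means, not the recipe's own code. The class TJavaRowSketch, the row classes, the helper row, and the whitespace-tidying rule used for cleansedName are all hypothetical. The loop stands in for Talend calling the tJavaRow body once per incoming row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a tJavaRow: the loop body runs once per input row.
class TJavaRowSketch {
    static class InputRow {
        String name;
        Integer age;
    }

    static class OutputRow {
        String name;
        Integer age;
        String cleansedName;
        Integer rowCount;
    }

    // Small helper to build an input row for the demonstration.
    static InputRow row(String name, Integer age) {
        InputRow r = new InputRow();
        r.name = name;
        r.age = age;
        return r;
    }

    static List<OutputRow> process(List<InputRow> rows) {
        List<OutputRow> out = new ArrayList<>();
        int rowCount = 0; // state that survives from one row to the next
        for (InputRow input_row : rows) {
            OutputRow output_row = new OutputRow();
            // Everything below corresponds to the tJavaRow code area.
            output_row.name = input_row.name;
            output_row.age = input_row.age;
            // Assumed cleansing rule: trim and collapse repeated whitespace.
            output_row.cleansedName = input_row.name == null
                    ? null
                    : input_row.name.trim().replaceAll("\\s+", " ");
            rowCount++;
            output_row.rowCount = rowCount;
            out.add(output_row);
        }
        return out;
    }

    public static void main(String[] args) {
        for (OutputRow r : process(Arrays.asList(row("  Ann   Lee ", 30), row("Bob", 41)))) {
            System.out.println(r.rowCount + ": " + r.cleansedName);
        }
    }
}
```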

Creating pseudo components using tJavaFlex


The tJavaFlex component is similar to the tJavaRow component in that it is included in a flow. The difference is that tJavaFlex also has pre and post processes, which run before the first row and after the last row is processed, so it behaves much like a pre-built Talend component.

Getting ready

Open the job jo_cook_ch05_0030_tJavaFlex.

How to do it…

  1. Open the tJavaFlex component.

  2. In the Start Code section, enter the following:

    String allNames = "";
    Integer NB_LINE = 0;
  3. In the Main Code section, enter the following:

    allNames = allNames + row1.name + "|";
    NB_LINE += 1;
  4. In the End Code section, enter the following:

    globalMap.put("allNames", allNames);
    globalMap.put("tJavaFlex_1_NB_LINE", NB_LINE);
  5. Open tJava and enter the following:

    System.out.println("All names concatenated: "+(String) globalMap.get("allNames"));
    System.out.println("Count of rows: "+(Integer) globalMap.get("tJavaFlex_1_NB_LINE"));
  6. Run the job. You will see that...
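
One way to picture how the three code sections fit together is as a single loop: the Start Code runs once before it, the Main Code runs once per row inside it, and the End Code runs once after it. The sketch below illustrates this with the snippets from this recipe. The class TJavaFlexSketch and its run method are hypothetical, a plain list of names stands in for the row1 flow, and the real generated code may be arranged differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of tJavaFlex: start code, per-row main code, end code.
class TJavaFlexSketch {
    static Map<String, Object> run(List<String> names) {
        Map<String, Object> globalMap = new HashMap<>();

        // Start Code: executed once, before any row arrives.
        String allNames = "";
        Integer NB_LINE = 0;

        for (String name : names) { // stands in for the incoming row1 flow
            // Main Code: executed once for every row.
            allNames = allNames + name + "|";
            NB_LINE += 1;
        }

        // End Code: executed once, after the last row.
        globalMap.put("allNames", allNames);
        globalMap.put("tJavaFlex_1_NB_LINE", NB_LINE);
        return globalMap;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = run(Arrays.asList("Ann", "Bob"));
        System.out.println("All names concatenated: " + (String) globalMap.get("allNames"));
        System.out.println("Count of rows: " + (Integer) globalMap.get("tJavaFlex_1_NB_LINE"));
    }
}
```

Because the variables are declared in the Start Code, they are still in scope in the Main Code and End Code, which is what allows the totals to be accumulated across rows.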

Creating custom functions using code routines


Code routines enable the developer to create re-usable Java classes that can be integrated into Talend jobs, and in particular within tMap.

In the validation chapter, there is an example of a simple code routine. This recipe is a fuller explanation of creating and using code routines within Talend.

Getting ready

Open the job jo_cook_ch05_0040_codeRoutine.

How to do it…

  1. In the metadata section, open the Code folder and right-click on Routines. Select Create routine.

  2. Name the routine regexUtilities and click on Finish. This will open a Java package and create a new class called regexUtilities, and a test method called helloExample.

  3. Copy the following code immediately after the end of the helloExample method.

        /**
         * regexData: return the first instance of regex pattern in a string.  
         * Returns null if there is no text matching the pattern.
         * e.g. regexData(".*r", "world") # returns "wor"
         * 
         * {talendTypes} String
         * 
       ...
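
The body of the routine is cut off in this excerpt. As an illustration only, the following is one possible implementation that matches the behaviour described in the comment above, written with the standard java.util.regex classes. It is not the book's code: the class name RegexUtilitiesSketch and the null handling are assumptions, and the argument order (pattern first, then data) is inferred from the example in the comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a routine class; routine methods are static.
class RegexUtilitiesSketch {
    /**
     * Returns the first substring of data that matches the regex pattern,
     * or null if nothing matches (or if either argument is null).
     */
    public static String regexData(String pattern, String data) {
        if (pattern == null || data == null) {
            return null;
        }
        Matcher matcher = Pattern.compile(pattern).matcher(data);
        // find() locates the first match anywhere in the string.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(regexData(".*r", "world"));
        System.out.println(regexData("[0-9]+", "world"));
    }
}
```

Within a tMap expression, a routine written along these lines would typically be called as ClassName.methodName(arguments), using whatever class name the routine was given when it was created.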

Importing JAR files to allow use of external Java classes


Occasionally during development, it is necessary (or simpler) to make use of Java classes that aren't already included within Talend. These may be pre-existing in-house Java code, such as financial calculations, or open source libraries, such as those provided by The Apache Software Foundation (www.apache.org).

In this example, we will make use of a simple Java class ExternalValidations and its ExternalValidateCustomerName method. This class performs the following simple validation:

if (customerName.startsWith("J ")) {
  return customerName.replace("J ", "James ");
} else {
  if (customerName.startsWith("Jo ")) {
    return customerName.replace("Jo ", "Joanne ");
  } else {
    return customerName;
  }
}
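
For readers who want to try the logic on its own, here is a sketch of how the fragment above might sit inside a complete class. The class and method names come from the text, but the static modifier, the String parameter and return type, and the absence of a package declaration are assumptions. A class packaged in a real JAR would normally be declared public and placed in a package.

```java
// Sketch of the external class; only the if/else logic comes from the text.
class ExternalValidations {
    // Assumed signature: takes a customer name and returns the expanded name.
    public static String ExternalValidateCustomerName(String customerName) {
        if (customerName.startsWith("J ")) {
            // Note: replace() changes every "J " in the string, not only
            // the leading one.
            return customerName.replace("J ", "James ");
        } else {
            if (customerName.startsWith("Jo ")) {
                return customerName.replace("Jo ", "Joanne ");
            } else {
                return customerName;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(ExternalValidateCustomerName("J Smith"));
        System.out.println(ExternalValidateCustomerName("Jo Bloggs"));
        System.out.println(ExternalValidateCustomerName("Jane Doe"));
    }
}
```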

Getting ready

Open the job jo_cook_ch05_0050_externalClasses.

How to do it…

  1. Create a code routine called externalValidation.

  2. Right-click and select the option Edit routine Libraries.

  3. In the next dialogue, click on New.

  4. Select the option Browse a library file...
