You're reading from  Talend Open Studio Cookbook

Product type: Book
Published in: Oct 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782167266
Edition: 1st Edition
Author: Rick Barton

Rick Barton is a freelance consultant who has specialized in data integration and ETL for the last 13 years as part of an IT career spanning over 25 years. After gaining a degree in Computer Systems from Cardiff University, he began his career as a firmware programmer before moving into Mainframe data processing and then into ETL tools in 1999. He has provided technical consultancy to some of the UK's largest companies, including banks and telecommunications companies, and was a founding partner of a Big Data integration consultancy. Four years ago he moved back into freelance development and has been working almost exclusively with Talend Open Studio and Talend Integration Suite, on multiple projects of various sizes in the UK. It is on these projects that he has learned many of the lessons that can be found in this, his first book.

Chapter 10. Debugging, Logging, and Testing

This chapter contains exercises that illustrate the methods provided by Talend to locate and correct coding errors, display logging information, and create test data. They are as follows:

  • Finding the location of compilation errors using the Problems tab

  • Locating execution errors from the console output

  • Using the Talend debug mode – row-by-row execution

  • Using the Java debugger to debug Talend jobs

  • Using tLogRow to show data in a row

  • Using tJavaRow to display row information

  • Using tJava to display status messages and variables

  • Printing out the context

  • Dumping console output to a file from within a job

  • Creating simple test data using tRowGenerator

  • Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences

  • Creating random test data using lookups

  • Creating test data using Excel

  • Testing logic – the most-used pattern

  • Killing a job from within tJavaRow

Introduction


When our code eventually runs as a production job, it is expected that it will be robust, reliable, and bug free. For this to happen, it will usually pass through various stages of testing, including the unit test stage performed by the developer.

This section shows some of the methods that can be used to ensure that developers can find and fix problems quickly during this testing phase.

Debugging

The ability to find and locate issues within code quickly and efficiently is the key to successful delivery of projects. Talend provides methods for debugging, and so does Eclipse.

Logging

Talend provides useful components for logging and capturing errors: tWarn, tDie, and tFlowMeter. It also provides mechanisms for logging information to the console, which can be a quick and valuable debugging tool and is a vital part of the developer's armory. During development, it is often quicker to send messages and information to the log output and view them there than it is to do the same with, say, a database...

Finding the location of compilation errors using the Problems tab


When you begin working with Talend, you will inevitably hit compilation errors when you run a job. This recipe will show you how to easily identify the errors using Talend.

Getting ready

Open the jo_cook_ch10_0010_findCompilationErrors job.

How to do it...

The steps for finding the location of compilation errors using the Problems tab are as follows:

  1. Run the job, and Talend will notify you that the job has compilation errors. Click on Cancel to stop the job executing.

  2. Now click on the Problems tab, and you will see the errors, as shown in the following screenshot:

  3. If you click on each one, you will see that the focus moves between the two tMap components. This means that there is an error in each of the tMap components.

  4. To locate the error exactly, click on the Code tab, highlighted in the previous diagram.

  5. You should now see the generated Java code and that there are two red markers on the right-hand side of the code.

  6. Click on the top marker...

Locating execution errors from the console output


This recipe shows that the often complex errors returned by Java can, in the main, be located fairly easily if you know how.

If you are already familiar with Java, this exercise is trivial; however, if you are not then Java errors can often seem very intimidating.

Getting ready

Open the jo_cook_ch10_0020_findExecutionError job.

How to do it...

The steps for locating execution errors from the console output are as follows:

  1. Run the job. It will fail with a null pointer error. Note the line number given in the first line of the list of lines in the error output: 2636.

  2. Open the Code tab and press CTRL + L.

  3. Type 2636 as the line number, and you will be taken to the following line:

  4. This is the line that caused the job to fail. There is null data in the customer.age field.

How it works...

The message does show that the error occurred in tMap_1, but this is only obvious if you are familiar with Java error messages. Unlike compilation errors, Talend does not list the error in...
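As a hedged illustration of why a null value produces this kind of error, the following sketch unboxes a null Integer and then reads the failing method name and line number from the top of the stack trace. The class and method names are hypothetical and are not taken from the Java code that Talend generates.

```java
// Hypothetical stand-alone example; this is not the code Talend generates.
final class NullAgeSketch {

    // Unboxing a null Integer (age + 1) throws a NullPointerException.
    static int ageNextYear(Integer age) {
        return age + 1;
    }

    // Returns "ok", or "<method>:<line>" taken from the top stack frame.
    static String describe(Integer age) {
        try {
            ageNextYear(age);
            return "ok";
        } catch (NullPointerException e) {
            StackTraceElement top = e.getStackTrace()[0];
            return top.getMethodName() + ":" + top.getLineNumber();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(30));   // a valid age
        System.out.println(describe(null)); // a null age, as in the recipe
    }
}
```

The first entry of the stack trace is the place where the exception was thrown, which is why the recipe takes the line number from the first line of the list.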

Using the Talend debug mode – row-by-row execution


This recipe will show how we can find Talend data issues by watching the data as it flows between components using the Talend debug mode.

Getting ready

Open the jo_cook_ch10_0030_useDebugMode job.

How to do it...

The steps for using the Talend debug mode are as follows:

  1. Open the Run tab, and select the Debug Run option on the left-hand side as shown in the following screenshot:

  2. Click on Traces Debug and the job will execute, and you can watch the data in the rows as they progress along the main flow of the sub-job until the error is hit, and the job fails.

How it works...

Being able to view the data progressing through the job in real time allows us to see that the third row failed. Because the reported error is a null pointer exception and the only null field in that row is the age, we can confirm that the input age value is the cause.

There's more…

You will notice that the execution of a job is slowed down considerably by using this...

Using the Java debugger to debug Talend jobs


Occasionally, it is necessary to delve deeper into the Java code generated by Talend in order to locate and understand the cause of a bug. This recipe is a very light introduction for debugging Talend code using the Java debugging mode in Talend.

Getting ready

Open the jo_cook_ch10_0040_useJavaDebugger job.

How to do it...

The steps for using the Java debugger to debug Talend jobs are as follows:

  1. Select the Debug Run option from the Run dialogue and click on the down arrow for the run type. Select Java Debug to run using the Java option.

  2. Confirm the perspective switch by clicking Yes.

  3. Click the resume icon to start the job running.

  4. The job will execute and return an error. Scroll through the console output (bottom panel), and you will see the error, as shown in the following screenshot:

  5. Click the hyperlink for line 2574. This will take you to the line that is causing an error.

    Adding a breakpoint to allow inspection of data:

  6. Right-click on the line number...

Using tLogRow to show data in a row


This recipe demonstrates some simple but interesting features of tLogRow, one of the simplest components in Talend.

Getting ready

Open the jo_cook_ch10_0050_tLogRow job and run it. This is the default format for tLogRow.

How to do it...

The steps for using tLogRow to show data in a row are as follows:

  1. Open the tLogRow component, change the Field Separator to a comma (,), and execute the job. This will give you a CSV output.

  2. Click on the option Use fixed lengths for values, and set all the Length columns to 30. You will see a formatted output. If not, then you will need to copy the console output to a text editor. Note that you will need to use CTRL + A followed by CTRL + C to copy, because right-click does not work in the console.

  3. Now, change the Length columns to -30. Notice that the information is now left-justified rather than right-justified.

  4. Close tLogRow and change the component name from tLogRow_1 to customers.

  5. Open the tLogRow component and change the type to Vertical...
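The effect of positive and negative lengths can be mimicked in plain Java with a format string. This is only a sketch of the same justification behavior, not a description of how tLogRow is implemented; the class and method names are hypothetical.

```java
// Hypothetical helper that pads a value to a fixed width.
final class FixedWidthSketch {

    // A positive length right-justifies the value; a negative length
    // left-justifies it, because "%-30s" uses the '-' flag of java.util.Formatter.
    // The length must not be zero.
    static String pad(String value, int length) {
        return String.format("%" + length + "s", value);
    }

    public static void main(String[] args) {
        System.out.println("[" + pad("Smith", 30) + "]");  // right-justified
        System.out.println("[" + pad("Smith", -30) + "]"); // left-justified
    }
}
```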

Using tJavaRow to display row information


Although tLogRow is flexible and very useful, it does have some limitations, in that it only prints what is defined in a schema. tJavaRow doesn't have the same limitations. This recipe will show you how it can be utilized.

Getting ready

Open the jo_cook_ch10_0060_tJavaRow job.

How to do it...

The steps for using tJavaRow to display row information are as follows:

  1. Run the job. You will see data in the console output sorted by customer key.

  2. Remove the tLogRow component, and add a tJavaRow component in its place.

  3. Open the tJavaRow component and add the following code:

    //Test for change of key and print heading lines if key has changed
    if (Numeric.sequence(input_row.name, 1, 1) == 1){
        System.out.println("\n\n******************** Records for customer name: "+input_row.name+" ***********************");
        System.out.printf("%-20s %-20s %-30s %-3s \n","name","DOB","timestamp","age");
    }    
    
    // print formatted output fields
    System.out.printf("%-20s %-20s %...
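The key-change test at the top of the code above depends on a sequence that is named after the customer, so that it returns 1 only for the first row of each name. The following sketch shows that idea in plain Java, assuming that a named sequence returns its start value on the first call and adds the step on each later call. The `sequence` method here is a hypothetical stand-in and is not Talend's Numeric routine.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-name sequence; not Talend's implementation.
final class KeyChangeSketch {

    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    // Returns start on the first call for a name, then start+step, and so on.
    static int sequence(String name, int start, int step) {
        Integer stored = SEQUENCES.get(name);
        int current = (stored == null) ? start : stored;
        SEQUENCES.put(name, current + step);
        return current;
    }

    // True only for the first row seen for a given key.
    static boolean isFirstRowForKey(String key) {
        return sequence(key, 1, 1) == 1;
    }

    public static void main(String[] args) {
        String[] names = {"Ann", "Ann", "Bob"}; // sample rows sorted by key
        for (String name : names) {
            if (isFirstRowForKey(name)) {
                System.out.println("*** Records for customer name: " + name + " ***");
            }
            System.out.printf("%-20s%n", name);
        }
    }
}
```

Because the sequence is keyed by name, a heading is printed once per customer provided that the rows arrive grouped by that key.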

Using tJava to display status messages and variables


tJava is a very useful component for logging purposes, because it can be used in its own sub-job. This enables tJava to be used to print job status information at given points in the process. The following recipe demonstrates this.

Getting ready

Open the jo_cook_ch10_0070_loggingWithtJava job.

How to do it...

The steps for using tJava to display status messages and variables are as follows:

  1. Open tJava_1 and add the following code:

    System.out.println("\n\nSearching directory "+context.cookbookData+"chapter10 for files matching wildcard *jo*\n\n");

  2. Open tJava_2 and add the following code:

    System.out.println("Processing file: "+((String)globalMap.get("tFileList_1_CURRENT_FILE")));

  3. Open tJava_3 and add the following code:

    System.out.println("\n\nCompleted......"+((Integer)globalMap.get("tFileList_1_NB_FILE"))+" files found\n\n");

How it works...

tJava_1 and tJava_3 simply print out process status information (starting process and process end). tJava_2...
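The casts in the code above are needed because the values are retrieved as plain objects. The following sketch uses an ordinary HashMap as a stand-in for globalMap (an assumption made so that the example runs outside Talend) with the same two keys; the file name and count are invented sample values.

```java
import java.util.HashMap;
import java.util.Map;

// A plain map stands in for Talend's globalMap so the example runs on its own.
final class GlobalMapSketch {

    // The stored value is an Object, so it is cast back to String.
    static String processingLine(Map<String, Object> globalMap) {
        return "Processing file: " + (String) globalMap.get("tFileList_1_CURRENT_FILE");
    }

    // The stored value is an Object, so it is cast back to Integer.
    static String completedLine(Map<String, Object> globalMap) {
        return "Completed......" + (Integer) globalMap.get("tFileList_1_NB_FILE") + " files found";
    }

    // Invented sample values for illustration only.
    static Map<String, Object> sampleMap() {
        Map<String, Object> map = new HashMap<>();
        map.put("tFileList_1_CURRENT_FILE", "sample_jo_file.csv");
        map.put("tFileList_1_NB_FILE", 3);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = sampleMap();
        System.out.println(processingLine(globalMap));
        System.out.println(completedLine(globalMap));
    }
}
```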

Printing out the context


This recipe is included for completeness rather than because it is in any way complex.

Getting ready

Open the jo_cook_ch10_0080_tContextDump job.

How to do it...

The steps for printing out the context are as follows:

  1. Open the Context tab, and you will see a set of context variables.

  2. Drag a tContextDump component from the palette.

  3. Attach a tLogRow component.

  4. Run the job.

How it works...

tContextDump simply dumps all the context variables defined within the job into a flow that can then be logged via tLogRow.
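The general idea of dumping every variable as a key and value pair can be sketched in plain Java. In this sketch a java.util.Properties object stands in for the job context, which is an assumption made for illustration, and the variable names and values are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical sketch: a Properties object stands in for the job context.
final class ContextDumpSketch {

    // Returns one "key=value" line per variable, sorted by key.
    static List<String> dump(Properties context) {
        List<String> lines = new ArrayList<>();
        for (String key : new TreeSet<>(context.stringPropertyNames())) {
            lines.add(key + "=" + context.getProperty(key));
        }
        return lines;
    }

    // Invented sample values for illustration only.
    static Properties sampleContext() {
        Properties context = new Properties();
        context.setProperty("rowLimit", "100");
        context.setProperty("cookbookData", "/tmp/cookbook/");
        return context;
    }

    public static void main(String[] args) {
        for (String line : dump(sampleContext())) {
            System.out.println(line);
        }
    }
}
```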

There's more…

This component is most useful when running code that has been deployed to a server, because the log information is usually stored in a file. This allows us to check the values of the context variables at the time of execution that would otherwise be hidden from us. This is invaluable for debugging a deployed process that has failed.

Tip

Often, contexts contain sensitive information, such as user names and passwords to system resources. If you do not want these to...

Dumping the console output to a file from within a job


This recipe shows how you can dump all logging data to a file, while still running the job in the Studio. It is particularly useful when debugging large data sets.

Getting ready

Open the jo_cook_ch10_0090_consoleToFile job.

How to do it...

The steps for dumping console output to a file from within a job are as follows:

  1. Run the job and view the console output.

  2. Add the following code to tJava_1:

    // redirect the console output to a file from within studio
    System.setOut(new java.io.PrintStream(new java.io.BufferedOutputStream(new java.io.FileOutputStream(context.cookbookData+"outputData/chapter10/chapter10_jo_0090_consoleOut.txt"))));

  3. Run the job. You will see only the job's start and end messages.

  4. Open the file named chapter10_jo_0090_consoleOut.txt in the cookbook data directory under outputData/chapter10. You will see that the logging information has been written to the file, as shown in the following screenshot:

How it works...

When the Java statement...
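The redirection can be tried outside Talend with a small stand-alone sketch. It uses the same System.setOut call as the recipe, but the file path, class name, and method name are hypothetical. Restoring the original stream and closing the file afterwards are additions made here so that the buffered output is flushed; they are not part of the recipe's code.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-alone sketch of redirecting System.out to a file.
final class ConsoleToFileSketch {

    // Prints the message via System.out while it points at the file,
    // restores the console stream, and returns what the file contains.
    static String printViaFile(Path target, String message) {
        PrintStream console = System.out;
        try {
            try (PrintStream fileOut = new PrintStream(
                    new BufferedOutputStream(Files.newOutputStream(target)), false, "UTF-8")) {
                System.setOut(fileOut);
                System.out.println(message); // goes to the file, not the console
            } finally {
                System.setOut(console);      // closing fileOut above flushed the buffer
            }
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("consoleOut", ".txt");
        String written = printViaFile(target, "row 1|Smith|42");
        System.out.println("File now contains: " + written.trim());
    }
}
```

Because the output stream is buffered, text may not appear in the file until the stream is flushed or closed, which is worth keeping in mind when inspecting the file while a long job is still running.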

Creating simple test data using tRowGenerator


This recipe shows how tRowGenerator allows dummy data to be created for test purposes.

Getting ready

  1. Open the jo_cook_ch10_0100_tRowGenerator job.

  2. Open the tRowGenerator component.

How to do it...

The steps for creating simple test data using tRowGenerator are as follows:

  1. Click on the Functions cell for customerId and select Numeric.sequence.

  2. Click on the Functions cell for firstName and select TalendDataGenerator.getFirstName.

  3. Click on the Functions cell for lastName and select TalendDataGenerator.getLastName.

  4. Click on the Functions cell for DOB and select TalendDate.getRandomDate. Your tRowGenerator should be as shown in the following screenshot:

  5. Exit the tRowGenerator component, and run the job.

How it works...

Talend provides a set of random generators for different field types to enable test data to be created very easily. So, as you can see, we are using a sequence to create a sequential customer key, random first names and last names, and a random date...
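The same combination of a sequence and random values can be sketched in plain Java. Everything in this sketch is an assumption made for illustration: the name lists, the date range, and the semicolon-separated output are invented, and the code does not call any Talend routine.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator: sequential id, random names, random date of birth.
final class TestDataSketch {

    private static final String[] FIRST_NAMES = {"Ann", "Bob", "Carla", "Dev"};
    private static final String[] LAST_NAMES = {"Jones", "Patel", "Smith", "Young"};

    // A fixed seed makes the "random" data repeatable between test runs.
    static List<String> generate(int rows, long seed) {
        Random random = new Random(seed);
        LocalDate earliest = LocalDate.of(1950, 1, 1);
        List<String> result = new ArrayList<>();
        for (int customerId = 1; customerId <= rows; customerId++) {
            String first = FIRST_NAMES[random.nextInt(FIRST_NAMES.length)];
            String last = LAST_NAMES[random.nextInt(LAST_NAMES.length)];
            LocalDate dob = earliest.plusDays(random.nextInt(365 * 50));
            result.add(customerId + ";" + first + ";" + last + ";" + dob);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String row : generate(5, 42L)) {
            System.out.println(row);
        }
    }
}
```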

Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences


This recipe shows how a more complex set of test data can be created. In this example, we will build a set of CSV data ready to be loaded into a database which has the following structure:

  • Customer has 1 or more orders

  • Order has 1 or more order items

Getting ready

  1. Open the jo_cook_ch10_0110_complexTestData job.

  2. You will see a section of code that has been deactivated. Do not activate this code until later in the exercise.

  3. Run the job, and you will see that the customer file is created.

How to do it...

The steps for creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences are as follows:

  1. Activate components tFixedFlowInput_2, tMap_2, and tFileOutputDelimited_2. These are exact copies of the customer create components.

  2. Change these newly activated components as follows:

    1. Open tFixedFlowInput_2 and change Number of rows to Numeric.random(1,5).

    2. Open tMap_2. Change the name of the variable...

Creating random test data using lookups


This simple technique shows how we can randomly assign values using lookups.

Getting ready

Open the jo_cook_ch10_0120_randomTestDataLookups job.

How to do it...

The steps for creating random test data using lookups are as follows:

  1. Open tMap.

  2. Open the tMap settings for the productData input flow.

  3. Change the Match Model to First Match.

  4. For the Expr. key for productData, add the code:

    Numeric.random(1,15)

  5. Drag all columns from both inputs to the output.

  6. Your tMap should now look like this:

  7. Exit tMap and run the job.

How it works...

As you will see from the output, the job will add a random product ID and product description to each order item row.

The match model of First Match ensures that only one match is returned for each order item line.

The Numeric.random(1,15) function returns a value from 1 through to 15, which is the number of products in the products list CSV file.

Thus the process will generate a random number for each order line and then use this random number...
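The lookup technique can be sketched in plain Java as a random key drawn from the range of product IDs and then used to fetch the matching product. The product descriptions, class name, and helper methods are hypothetical; `randomBetween` is a stand-in that returns a value from min through max inclusive, matching the behavior described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of assigning a random product to each order line.
final class RandomLookupSketch {

    // Returns a value from min through max, both inclusive.
    static int randomBetween(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    // Builds an invented product list with ids 1..count.
    static Map<Integer, String> sampleProducts(int count) {
        Map<Integer, String> products = new LinkedHashMap<>();
        for (int id = 1; id <= count; id++) {
            products.put(id, "Product " + id);
        }
        return products;
    }

    // Picks a random id and looks up its description.
    static String assignProduct(Map<Integer, String> products, Random random) {
        int productId = randomBetween(random, 1, products.size());
        return productId + ";" + products.get(productId);
    }

    public static void main(String[] args) {
        Map<Integer, String> products = sampleProducts(15);
        Random random = new Random();
        for (int orderLine = 1; orderLine <= 5; orderLine++) {
            System.out.println(orderLine + ";" + assignProduct(products, random));
        }
    }
}
```

The upper bound must equal the number of rows in the lookup; a larger bound would produce keys with no match.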

Creating test data using Excel


Another useful method of creating test data is to define the data in MS Excel, and then create a job to convert the Excel worksheets into the format required by the application, such as a CSV file or database table.

Getting ready

Open the Excel workbook chapter10_jo_0130_ExcelTestData.xlsx that can be found in the data directory. You will see two worksheets: customer and item.

How to do it...

The steps for creating test data using Excel are as follows:

  1. Highlight the first two rows in the customer table and drag them down to create two more customers.

  2. Copy the first 4 lines from the order worksheet and change the customers to be 3 for the first two new rows and 4 for the final two. Ensure that the order IDs are contiguous.

  3. Open the jo_cook_ch10_0130_excelTestDataLoad job. You will see that the customer Excel file is being copied to an equivalent XML file.

  4. Drag the order Excel object from the repository location, shown as follows:

  5. Drag a tXMLOutput component and link it to...

Testing logic – the most-used pattern


This is probably the most-used job design in Talend programming, and is used to ensure that a snippet of new code is not influenced by external factors within a large and complex job. This simple recipe shows how this can easily be achieved.

Getting ready

Open the jo_cook_ch10_0140_logicTest job.

How to do it...

The steps for testing logic are as follows:

  1. In tFixedFlowInput, tick the box labeled Use Inline Table.

  2. Add the values, as shown in the following screenshot:

  3. In the tMap, add a new field to the output named ageCheckValid, and populate it with the following code:

    customer.age >= 21 && customer.country.equals("UK") ? true : customer.age >= 18 && !customer.country.equals("UK") ? true : false

  4. Run the job to see the results of the test.

How it works...

In this example, we are testing an age limit: 21 or over is valid for the UK, and 18 or over is valid for the rest of the world.

In tFixedFlowInput, we defined a set of test values that would prove...
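The same rule can be checked outside Talend with a small stand-alone sketch. The class name, method name, and test values are hypothetical. The logic is written as a minimum age per country rather than as nested ternaries; one difference is that a null country is treated here as non-UK, whereas the tMap expression would throw an exception.

```java
// Hypothetical stand-alone version of the age check used in the tMap.
final class AgeCheckSketch {

    // 21 or over is valid for the UK, 18 or over for any other country.
    static boolean isAgeValid(int age, String country) {
        int minimumAge = "UK".equals(country) ? 21 : 18;
        return age >= minimumAge;
    }

    public static void main(String[] args) {
        // Boundary values on both sides of each limit.
        Object[][] cases = {
            {20, "UK"}, {21, "UK"}, {17, "FR"}, {18, "FR"}
        };
        for (Object[] testCase : cases) {
            int age = (Integer) testCase[0];
            String country = (String) testCase[1];
            System.out.println(age + " " + country + " -> " + isAgeValid(age, country));
        }
    }
}
```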

Killing a job from within tJavaRow


Most jobs at some point require validation and will often need to be stopped if the data is found to be in error. In most cases, you can use tDie; however, if your error is found in a tJavaRow or tJava, then using tDie becomes quite convoluted. This exercise shows how the same results can be achieved using simple Java functionality.

Getting ready

Open the jo_cook_ch10_0150_killingJobtJavaRow job.

How to do it...

The steps for killing a job from within tJavaRow are as follows:

  1. Run the job. You will see that it fails with a null pointer exception.

  2. Change the line output_row.age = input_row.age; to the following code:

    // stop the job with exit code 99 when the age is missing
    if (input_row.age == null) {
      System.out.println("Fatal Error: age is null");
      System.exit(99);
    } else {
      output_row.age = input_row.age;
    }

  3. Run the job again. You will see that the job has been killed in a much more elegant fashion, as shown in the following screenshot:

How it works...

System.exit is a Java kill command and as such will cause an immediate...
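The pattern can be tried as a stand-alone program. In this sketch the validation is separated from the exit call so that the rule can be tested without stopping the JVM; the class name, method name, and the command-line handling are hypothetical, and the exit code 99 is the one used in the recipe.

```java
// Hypothetical stand-alone sketch of validating a value and exiting on error.
final class FatalCheckSketch {

    static final int EXIT_NULL_AGE = 99;

    // Returns 0 when the age is present, or the fatal exit code when it is null.
    static int exitCodeFor(Integer age) {
        return age == null ? EXIT_NULL_AGE : 0;
    }

    public static void main(String[] args) {
        // No argument: use a sample age. Argument "null": simulate missing data.
        Integer age;
        if (args.length == 0) {
            age = 42;
        } else if ("null".equals(args[0])) {
            age = null;
        } else {
            age = Integer.valueOf(args[0]);
        }

        int code = exitCodeFor(age);
        if (code != 0) {
            System.out.println("Fatal Error: age is null");
            System.exit(code); // ends the JVM immediately with status 99
        }
        System.out.println("age = " + age);
    }
}
```

Running the program with the argument null ends the process with exit status 99, which a calling script or scheduler can test for.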
