Talend Open Studio Cookbook

Getting familiar with Talend Open Studio will greatly enhance your data handling and integration capabilities. This reference book for beginners and intermediate users offers a host of practical recipes that clarify even complex features.

Rick Barton

Book Details

ISBN 13: 9781782167266
Paperback: 270 pages

About This Book

  • A collection of exercises covering all aspects of development, including schemas, mapping with tMap, databases, and working with files
  • Get your code ready for the production environment by using contexts and scheduling jobs in Talend
  • Includes exercises for debugging and testing of code
  • Many additional hints and tips regarding the exercises and their real-life applications

Who This Book Is For

Talend Open Studio Cookbook is principally aimed at relative beginners and intermediate Talend Developers who have used the product to perform some simple integration tasks, possibly via a training course or beginner's tutorials.

Table of Contents

Chapter 1: Introduction and General Principles
Before you begin
Installing the software
Enabling tHashInput and tHashOutput
Chapter 2: Metadata and Schemas
Introduction
Hand-cranking a built-in schema
Propagating schema changes
Creating a generic schema from the existing metadata
Cutting and pasting schema information
Dropping schemas to empty components
Creating schemas from lists
Chapter 3: Validating Data
Introduction
Enabling and disabling reject flows
Gathering all rejects prior to killing a job
Validating against the schema
Rejecting rows using tMap
Checking a column against a list of allowed values
Checking a column against a lookup
Creating validation rules for more complex requirements
Creating binary error codes to store multiple test results
Chapter 4: Mapping Data
Introduction
Simple mapping and tMap time savers
Creating tMap expressions
Using the ternary operator for conditional logic
Using intermediate variables in tMap
Filtering input rows
Splitting an input row into multiple outputs based on input conditions
Joining data using tMap
Hierarchical joins using tMap
Using reload at each row to process real-time / near real-time data
Chapter 5: Using Java in Talend
Introduction
Performing one-off pieces of logic using tJava
Setting the context and globalMap variables using tJava
Adding complex logic into a flow using tJavaRow
Creating pseudo components using tJavaFlex
Creating custom functions using code routines
Importing JAR files to allow use of external Java classes
Chapter 6: Managing Context Variables
Introduction
Creating a context group
Adding a context group to your job
Adding contexts to a context group
Using tContextLoad to load contexts
Using implicit context loading to load contexts
Turning implicit context loading on and off in a job
Setting the context file location in the operating system
Chapter 7: Working with Databases
Introduction
Setting up a database connection
Importing the table schemas
Reading from database tables
Using context and globalMap variables in SQL queries
Printing your input query
Writing to a database table
Printing your output query
Managing database sessions
Passing a session to a child job
Selecting different fields and keys for insert, update, and delete
Capturing individual rejects and errors
Database and table management
Managing surrogate keys for parent and child tables
Rewritable lookups using an in-process database
Chapter 8: Managing Files
Introduction
Appending records to a file
Reading rows using a regular expression
Using temporary files
Storing intermediate data in the memory using tHashMap
Reading headers and trailers using tMap
Reading headers and trailers with no identifiers
Using the information in the header and trailer
Adding a header and trailer to a file
Moving, copying, renaming, and deleting files and folders
Capturing file information
Processing multiple files at once
Processing control/validation files
Creating and writing files depending on the input data
Chapter 9: Working with XML, Queues, and Web Services
Introduction
Using tXMLMap to read XML
Using tXMLMap to create an XML document
Reading complex hierarchical XML
Writing complex XML
Calling a SOAP web service
Calling a RESTful web service
Reading and writing to a queue
Ensuring lossless queues using sessions
Chapter 10: Debugging, Logging, and Testing
Introduction
Find the location of compilation errors using the Problems tab
Locating execution errors from the console output
Using the Talend debug mode – row-by-row execution
Using the Java debugger to debug Talend jobs
Using tLogRow to show data in a row
Using tJavaRow to display row information
Using tJava to display status messages and variables
Printing out the context
Dumping the console output to a file from within a job
Creating simple test data using tRowGenerator
Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
Creating random test data using lookups
Creating test data using Excel
Testing logic – the most-used pattern
Killing a job from within tJavaRow
Chapter 11: Deploying and Scheduling Talend Code
Introduction
Creating compiled executables
Using a different context
Adding command-line context parameters
Managing job dependencies
Capturing and acting on different return codes
Returning codes from a child job without tDie
Passing parameters to a child job
Executing non-Talend objects and operating system commands
Chapter 12: Common Mistakes and Other Useful Hints and Tips
Introduction
My tab is missing
Finding the code routine
Finding a new context variable
Reloads going missing at each row global variable
Dragging component globalMap variables
Some complex date formats
Capturing tMap rejects
Adding job name, project name, and other job specific information
Printing tMap variables
Stopping memory errors in Talend

What You Will Learn

  • Manipulate schemas quickly and easily
  • Validate your data and create test data
  • Use Java code within Talend
  • Debug your Talend code
  • Use tMap effectively
  • Create and manage files including complex file formats
  • Access queues, web services, and XML within Talend
  • Deploy and schedule your Talend code

In Detail

Data integration is a key component of an organization’s technical strategy, yet historically the tools have been very expensive. Talend Open Studio is the world’s leading open source data integration product and has played a huge part in making open source data integration a popular choice for businesses worldwide.

This book is a welcome addition to the small but growing library of Talend Open Studio resources. From working with schemas to creating and validating test data, to scheduling your Talend code, you will get acquainted with the various Talend database handling techniques. Each recipe is designed to provide the key learning point in a short, simple and effective manner.

This comprehensive guide provides practical exercises that cover all areas of the Talend development lifecycle including development, testing, debugging and deployment. The book delivers design patterns, hints, tips, and advice in a series of short and focused exercises that can be approached as a reference for more seasoned developers or as a series of useful learning tutorials for the beginner.

The book covers the basics in terms of schema usage and mappings, along with dedicated sections that will allow you to get more from tMap, files, databases and XML.

Geared towards the whole lifecycle, the Talend Open Studio Cookbook shows readers effective ways to handle everyday tasks and covers every stage of the development cycle, including coding, testing, and debugging, to provide start-to-finish coverage of the product.
