Talend Open Studio Cookbook


eBook: $26.99, now $22.94 (save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $71.98, now $44.99 (save 37%); print cover price: $44.99
Free shipping to the UK, US, Europe, and selected countries in Asia.

  • A collection of exercises covering all development aspects, including schemas, mapping using tMap, databases, and working with files
  • Get your code ready for the production environment by using contexts and scheduling jobs in Talend
  • Includes exercises for debugging and testing code
  • Many additional hints and tips on the exercises and their real-life applications

Book Details

Language: English
Paperback: 270 pages [235mm x 191mm]
Release Date: October 2013
ISBN: 1782167269
ISBN 13: 9781782167266
Author(s): Rick Barton
Topics and Technologies: All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Preface
Chapter 1: Introduction and General Principles
Chapter 2: Metadata and Schemas
  • Introduction
  • Hand-cranking a built-in schema
  • Propagating schema changes
  • Creating a generic schema from the existing metadata
  • Cutting and pasting schema information
  • Dropping schemas to empty components
  • Creating schemas from lists
Chapter 3: Validating Data
  • Introduction
  • Enabling and disabling reject flows
  • Gathering all rejects prior to killing a job
  • Validating against the schema
  • Rejecting rows using tMap
  • Checking a column against a list of allowed values
  • Checking a column against a lookup
  • Creating validation rules for more complex requirements
  • Creating binary error codes to store multiple test results
Chapter 4: Mapping Data
  • Introduction
  • Simple mapping and tMap time savers
  • Creating tMap expressions
  • Using the ternary operator for conditional logic
  • Using intermediate variables in tMap
  • Filtering input rows
  • Splitting an input row into multiple outputs based on input conditions
  • Joining data using tMap
  • Hierarchical joins using tMap
  • Using reload at each row to process real-time / near real-time data
Chapter 5: Using Java in Talend
  • Introduction
  • Performing one-off pieces of logic using tJava
  • Setting the context and globalMap variables using tJava
  • Adding complex logic into a flow using tJavaRow
  • Creating pseudo components using tJavaFlex
  • Creating custom functions using code routines
  • Importing JAR files to allow use of external Java classes
Chapter 6: Managing Context Variables
  • Introduction
  • Creating a context group
  • Adding a context group to your job
  • Adding contexts to a context group
  • Using tContextLoad to load contexts
  • Using implicit context loading to load contexts
  • Turning implicit context loading on and off in a job
  • Setting the context file location in the operating system
Chapter 7: Working with Databases
  • Introduction
  • Setting up a database connection
  • Importing the table schemas
  • Reading from database tables
  • Using context and globalMap variables in SQL queries
  • Printing your input query
  • Writing to a database table
  • Printing your output query
  • Managing database sessions
  • Passing a session to a child job
  • Selecting different fields and keys for insert, update, and delete
  • Capturing individual rejects and errors
  • Database and table management
  • Managing surrogate keys for parent and child tables
  • Rewritable lookups using an in-process database
Chapter 8: Managing Files
  • Introduction
  • Appending records to a file
  • Reading rows using a regular expression
  • Using temporary files
  • Storing intermediate data in the memory using tHashMap
  • Reading headers and trailers using tMap
  • Reading headers and trailers with no identifiers
  • Using the information in the header and trailer
  • Adding a header and trailer to a file
  • Moving, copying, renaming, and deleting files and folders
  • Capturing file information
  • Processing multiple files at once
  • Processing control/validation files
  • Creating and writing files depending on the input data
Chapter 9: Working with XML, Queues, and Web Services
  • Introduction
  • Using tXMLMap to read XML
  • Using tXMLMap to create an XML document
  • Reading complex hierarchical XML
  • Writing complex XML
  • Calling a SOAP web service
  • Calling a RESTful web service
  • Reading and writing to a queue
  • Ensuring lossless queues using sessions
Chapter 10: Debugging, Logging, and Testing
  • Introduction
  • Find the location of compilation errors using the Problems tab
  • Locating execution errors from the console output
  • Using the Talend debug mode – row-by-row execution
  • Using the Java debugger to debug Talend jobs
  • Using tLogRow to show data in a row
  • Using tJavaRow to display row information
  • Using tJava to display status messages and variables
  • Printing out the context
  • Dumping the console output to a file from within a job
  • Creating simple test data using tRowGenerator
  • Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
  • Creating random test data using lookups
  • Creating test data using Excel
  • Testing logic – the most-used pattern
  • Killing a job from within tJavaRow
Chapter 11: Deploying and Scheduling Talend Code
  • Introduction
  • Creating compiled executables
  • Using a different context
  • Adding command-line context parameters
  • Managing job dependencies
  • Capturing and acting on different return codes
  • Returning codes from a child job without tDie
  • Passing parameters to a child job
  • Executing non-Talend objects and operating system commands
Chapter 12: Common Mistakes and Other Useful Hints and Tips
  • Introduction
  • My tab is missing
  • Finding the code routine
  • Finding a new context variable
  • Reloads going missing at each row global variable
  • Dragging component globalMap variables
  • Some complex date formats
  • Capturing tMap rejects
  • Adding job name, project name, and other job specific information
  • Printing tMap variables
  • Stopping memory errors in Talend
Appendix A: Common Type Conversions
Appendix B: Management of Contexts
  • Introduction
  • Manipulating contexts in Talend Open Studio
  • Understanding implicit context loading
  • Understanding tContextLoad
  • Manually checking and setting contexts
Index

Rick Barton

Rick Barton is a freelance consultant who has specialized in data integration and ETL for the last 13 years, as part of an IT career spanning over 25 years. After gaining a degree in Computer Systems from Cardiff University, he began his career as a firmware programmer before moving into mainframe data processing and then, in 1999, into ETL tools. He has provided technical consultancy to some of the UK’s largest companies, including banks and telecommunications companies, and was a founding partner of a “Big Data” integration consultancy. Four years ago he moved back into freelance development and has since worked almost exclusively with Talend Open Studio and Talend Integration Suite on multiple projects of various sizes in the UK. It is on these projects that he learned many of the lessons found in this, his first book.

Code Downloads

Download the code and support files for this book.


Submit Errata

Please let us know if you find any errors not already listed below by completing our errata submission form. Our editors will check them and add them to the list. Thank you.


Errata

- 12 submitted: last submission 26 May 2014

Errata type: Typo | Page number: 73

In the How to do it... section, point 1: Open tMap_1 and type in the following code:

Should be:

Open tJava_1 and type in the following code:

Errata type: Typo | Page number: 73

In the How to do it... section, point 2: Open tMap_2 and type in the following code:

Should be:

Open tJava_2 and type in the following code:

Errata type: Technical | Page number: 140

Point 3

Given:

Connect an OnSubjobOk trigger from the tFileRowCount component to the tFileInputDelimited.

Should be:

Connect an OnSubjobOk trigger from the tFileRowCount component to the tFileInputFullRow.
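With the corrected wiring, the file is counted first and only then read line by line. One plausible reason for that order, assumed here rather than taken from the recipe, is that the total lets the job tell the header and trailer apart from the detail rows by position alone. The plain-Java sketch below mimics the two passes over an in-memory list of hypothetical lines; it is not Talend-generated code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HeaderTrailerDemo {

    // Labels each line by position: first = header, last = trailer, others = detail.
    static List<String> classify(List<String> lines) {
        // Pass 1: stands in for tFileRowCount, so the total is known before any row is read.
        // (In a job, tFileRowCount publishes this total as its COUNT global variable.)
        int rowCount = lines.size();

        // Pass 2: stands in for tFileInputFullRow, which reads each line whole.
        List<String> labelled = new ArrayList<>();
        for (int rowNum = 1; rowNum <= rowCount; rowNum++) {
            String kind;
            if (rowNum == 1) {
                kind = "HEADER";
            } else if (rowNum == rowCount) {
                kind = "TRAILER";
            } else {
                kind = "DETAIL";
            }
            labelled.add(kind + ": " + lines.get(rowNum - 1));
        }
        return labelled;
    }

    public static void main(String[] args) {
        // Hypothetical file content with no record-type identifiers.
        List<String> lines = Arrays.asList("2013-10-01", "Alice,42", "Bob,37", "2");
        for (String line : classify(lines)) {
            System.out.println(line);
        }
    }
}
```

Classifying by position only works because the count is complete before the second pass starts, which is what the OnSubjobOk trigger guarantees.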

Errata type: Code | Page number: 152

Point 13:

Given:

To make the job output more useful, open tJava and insert the following code:

    System.out.println("Processing file: "+((String)globalMap.get("tFileList_1_CURRENT_FILE")))

Should be:

To make the job output more useful, open tJava and insert the following code:

    System.out.println("Processing file: " + ((String) globalMap.get("tFileList_1_CURRENT_FILE")));
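The only change in the corrected code is the terminating semicolon, without which the tJava code does not compile. In a generated job, globalMap holds its values as Object, which is why the current file name is cast to String. The standalone sketch below imitates that behaviour; the HashMap and the file names are stand-ins chosen for illustration, not Talend's generated code.

```java
import java.util.HashMap;
import java.util.Map;

final class CurrentFileDemo {

    // Same expression as the corrected tJava line, returned instead of printed so it can be checked.
    static String processingMessage(Map<String, Object> globalMap) {
        // globalMap stores values as Object, hence the explicit (String) cast.
        return "Processing file: " + ((String) globalMap.get("tFileList_1_CURRENT_FILE"));
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a generated job (assumed here to be a plain HashMap).
        Map<String, Object> globalMap = new HashMap<>();

        // Hypothetical file names; in a real job, tFileList sets this key on every iteration.
        String[] files = {"customers_01.csv", "customers_02.csv"};
        for (String file : files) {
            globalMap.put("tFileList_1_CURRENT_FILE", file);
            // Equivalent of the corrected statement, including its closing ");".
            System.out.println(processingMessage(globalMap));
        }
    }
}
```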

Errata type: Typo | Page number: 92

In the How to do it... section, step 1:

Given:

Open the tFileInputDelimited component and change the delimiter to, so that it matches the format in the file.

Should be:

Open the tFileInputDelimited component and change the delimiter to =, so that it matches the format in the file.
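A delimited reader splits each line wherever the field separator occurs, so the delimiter has to match the file exactly. Assuming the file in this step holds key=value lines (an assumption, since the file itself is not shown here), the plain-Java sketch below shows the split that an = delimiter produces. Splitting on the first = only, so that a value may itself contain =, is a choice made for this sketch rather than documented tFileInputDelimited behaviour.

```java
final class KeyValueSplitDemo {

    // Splits one line on the first '=' into {key, value}.
    // The limit of 2 keeps any further '=' characters inside the value.
    static String[] splitOnEquals(String line) {
        String[] parts = line.split("=", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("No '=' delimiter in line: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input lines.
        String[] lines = {"dbHost=localhost", "dbPort=5432"};
        for (String line : lines) {
            String[] kv = splitOnEquals(line);
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```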

Errata type: Technical | Page number: 165

How to do it... section, steps 7 and 8:

Given:

7. Drag name and age to the Target Schema panel Fields to extract.
8. Drag the field customer from the Refresh Preview panel, and you will see the values as they will appear in the schema.

Should be:

7. Drag customerId, name, and age to the Target Schema panel Fields to extract.
8. Click on Refresh Preview, and you will see the values as they will appear in the schema.
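In the XML metadata wizard, a loop XPath expression selects the repeating element, each entry under Fields to extract is a path evaluated relative to that element, and Refresh Preview shows the resulting rows. The sketch below reproduces that idea with the JDK's javax.xml.xpath API. The XML layout (a customers root containing customer elements with customerId, name, and age children) is a hypothetical example, not the book's sample file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlExtractDemo {

    // Hypothetical loop expression and field paths (relative to each loop node).
    static final String LOOP_XPATH = "/customers/customer";
    static final String[] FIELDS = {"customerId", "name", "age"};

    // Returns one String[] per loop node, with values in FIELDS order.
    static List<String[]> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            NodeList loopNodes = (NodeList) xpath.evaluate(LOOP_XPATH, doc, XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < loopNodes.getLength(); i++) {
                String[] row = new String[FIELDS.length];
                for (int f = 0; f < FIELDS.length; f++) {
                    // Evaluated against the current customer node, like a Fields to extract entry.
                    row[f] = xpath.evaluate(FIELDS[f], loopNodes.item(i));
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract fields", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers>"
                + "<customer><customerId>1</customerId><name>Alice</name><age>42</age></customer>"
                + "<customer><customerId>2</customerId><name>Bob</name><age>37</age></customer>"
                + "</customers>";
        // Comparable to what Refresh Preview shows: one line per extracted row.
        for (String[] row : extract(xml)) {
            System.out.println(String.join(";", row));
        }
    }
}
```

Leaving customerId out of FIELDS would drop that column from every row, which is the mistake the corrected step 7 addresses.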

Errata type: Technical | Page number: 210

How to do it... section, step 5:

Given:

Drag a tXMLOutput component and link it to tFileInputExcel.

Should be:

Drag a tFileOutputXML component and link it to tFileInputExcel.
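The corrected component, tFileOutputXML, writes each incoming row as an XML element. As a rough illustration of that output shape, the sketch below turns a list of rows into XML with the JDK's StAX XMLStreamWriter. The root, row, and column element names and the sample rows are hypothetical; tFileOutputXML has its own settings for these names.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class RowsToXmlDemo {

    // Writes each row as <row><column>value</column>...</row> inside one <root> element.
    static String toXml(String[] columns, List<String[]> rows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("root");
            for (String[] row : rows) {
                w.writeStartElement("row");
                for (int c = 0; c < columns.length; c++) {
                    w.writeStartElement(columns[c]);
                    w.writeCharacters(row[c]); // special characters such as & and < are escaped
                    w.writeEndElement();
                }
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical rows, standing in for what tFileInputExcel would read from a sheet.
        String[] columns = {"customerId", "name"};
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice"},
                new String[] {"2", "Bob"});
        System.out.println(toXml(columns, rows));
    }
}
```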

Errata type: Technical | Page number: 208

How to do it... section, steps 3 and 4:

Given:

3. Change the Match Model to First Match.
4. For the Expr. key for productData, add the code:
    Numeric.random(1,15)

Should be:

3. For the Expr. key for productId, add the code:
    Numeric.random(1,15)
4. Change the Match Model to First Match.
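The corrected steps put a random expression on the productId lookup key and then set the match model, so each main row is joined to one randomly chosen product. In the plain-Java sketch below, java.util.Random stands in for Talend's Numeric.random routine (inclusive bounds are assumed), and a map that keeps only the first row seen per key stands in for tMap's First Match model. The product data is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class RandomLookupDemo {

    // Stand-in for Numeric.random(min, max): an int in [min, max] (inclusive bounds assumed).
    static int randomBetween(Random rnd, int min, int max) {
        return min + rnd.nextInt(max - min + 1);
    }

    // Builds a lookup that keeps only the first row for each key, mimicking First Match.
    static Map<Integer, String> firstMatchLookup(int[] ids, String[] names) {
        Map<Integer, String> lookup = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            lookup.putIfAbsent(ids[i], names[i]);
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Hypothetical product lookup with productId values 1 to 15.
        int[] ids = new int[15];
        String[] names = new String[15];
        for (int i = 0; i < 15; i++) {
            ids[i] = i + 1;
            names[i] = "Product " + (i + 1);
        }
        Map<Integer, String> products = firstMatchLookup(ids, names);

        Random rnd = new Random(42); // fixed seed so runs are repeatable
        for (int row = 0; row < 5; row++) {
            int productId = randomBetween(rnd, 1, 15); // plays the role of the Expr. key value
            System.out.println("row " + row + " -> productId " + productId
                    + " (" + products.get(productId) + ")");
        }
    }
}
```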

Errata type: Technical | Page number: 136

Given:

2. Replace the tFileOutputDelimited with a tHashInput component, having a generic schema of sc_cook_ch8_0040_genericCustomerOut.

Should be:

2. Replace the tFileInputDelimited_2 with a tHashInput component, having a generic schema of sc_cook_ch8_0040_genericCustomerOut.
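The correction swaps an input component for tHashInput, which reads back rows that a tHashOutput component stored in memory earlier in the job, instead of re-reading a file. The plain-Java sketch below models that hand-off with a static in-memory store; the class, its methods, and the "customers" key are hypothetical, and in Talend the components are configured in the GUI rather than coded like this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InMemoryHandoffDemo {

    // Hypothetical shared store: each key holds the rows written by one "hash output".
    private static final Map<String, List<String[]>> STORE = new HashMap<>();

    // Role of tHashOutput: append a row to the named in-memory buffer.
    static void hashOutput(String key, String[] row) {
        STORE.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    // Role of tHashInput: read back every row previously written under the key.
    static List<String[]> hashInput(String key) {
        return Collections.unmodifiableList(STORE.getOrDefault(key, Collections.emptyList()));
    }

    public static void main(String[] args) {
        // First subjob: rows go to memory instead of a temporary file.
        hashOutput("customers", new String[] {"1", "Alice"});
        hashOutput("customers", new String[] {"2", "Bob"});

        // Later subjob: rows are read back in the order they were written.
        for (String[] row : hashInput("customers")) {
            System.out.println(String.join(";", row));
        }
    }
}
```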

Errata type: Technical | Page number: 15

Given:

4. Type name into the column, and set the length to 50.

Should be:

4. Type name into the column, and set the length to 40.

Errata type: Grammar | Page number: 5

Given: Many any of the examples will write their output to the Talend log/console window when we could easily have written the data out to files or tables.

Should be: Many of the examples will write their output to the Talend log/console window when we could easily have written the data out to files or tables.

Sample chapters

You can view our sample chapters and prefaces of this title on PacktLib or download sample chapters in PDF format.

What you will learn from this book

• Manipulate schemas quickly and easily
• Validate your data and create test data
• Use Java code within Talend
• Debug your Talend code
• Use tMap effectively
• Create and manage files, including complex file formats
• Access queues, web services, and XML within Talend
• Deploy and schedule your Talend code

In Detail

Data integration is a key component of an organization’s technical strategy, yet historically the tools have been very expensive. Talend Open Studio is the world’s leading open source data integration product and has played a huge part in making open source data integration a popular choice for businesses worldwide.

This book is a welcome addition to the small but growing library of Talend Open Studio resources. From working with schemas, through creating and validating test data, to scheduling your Talend code, you will become acquainted with a wide range of Talend techniques, including database handling. Each recipe is designed to deliver its key learning point in a short, simple, and effective manner.

This comprehensive guide provides practical exercises that cover all areas of the Talend development lifecycle, including development, testing, debugging, and deployment. The book delivers design patterns, hints, tips, and advice in a series of short, focused exercises that can serve as a reference for more seasoned developers or as a series of learning tutorials for beginners.

The book covers the basics of schema usage and mappings, along with dedicated sections that will help you get more from tMap, files, databases, and XML.

Geared towards the whole lifecycle, the Talend Open Studio Cookbook shows readers effective ways to handle everyday tasks and provides start-to-finish coverage of the product, from coding through testing and debugging.

Approach

Primarily designed as a reference book, the Talend Open Studio Cookbook uses simple, effective exercises based on genuine real-world tasks to help developers reduce the time it takes to deliver results. Presenting the activities as recipes enables readers to grasp even complex concepts with ease.

Who this book is for

Talend Open Studio Cookbook is principally aimed at relative beginners and intermediate Talend developers who have used the product to perform some simple integration tasks, possibly via a training course or beginner's tutorials.
