Talend for Big Data

eBook: $20.99
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $55.98    Print cover: $34.99
  • Write complex processing jobs easily with the help of clear, step-by-step instructions
  • Compare, filter, evaluate, and group vast quantities of data using Apache Pig
  • Explore and perform HDFS and RDBMS integration with the Sqoop component

Book Details

Language : English
Paperback : 96 pages [ 235mm x 191mm ]
Release Date : February 2014
ISBN : 1782169490
ISBN 13 : 9781782169499
Author(s) : Bahaaldine Azarmi
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Chapter 1: Getting Started with Talend Big Data
  • Talend Unified Platform presentation
  • Knowing about the Hadoop ecosystem
  • Prerequisites for running examples
  • Downloading Talend Open Studio for Big Data
  • Installing TOSBD
  • Running TOSBD for the first time
  • Summary
Chapter 2: Building Our First Big Data Job
Chapter 3: Formatting Data
  • Twitter Sentiment Analysis
  • Writing the tweets in HDFS
  • Setting our Apache Hive tables
  • Formatting tweets with Apache Hive
  • Summary
Chapter 4: Processing Tweets with Apache Hive
Chapter 5: Aggregate Data with Apache Pig
Chapter 6: Back to the SQL Database
Chapter 7: Big Data Architecture and Integration Patterns
Appendix: Installing Your Hadoop Cluster with Cloudera CDH VM

Bahaaldine Azarmi

Bahaaldine Azarmi is the cofounder of reach5.co. Drawing on his experience at Oracle and Talend, he specializes in real-time architecture using service-oriented architecture products, Big Data projects, and web technologies.

Submit Errata

Please let us know if you find any errors not already listed here by completing our errata submission form. Our editors will review them and add them to this list. Thank you.


- 1 submitted: last submission 28 May 2014

The code files for this book are available at https://www.dropbox.com/s/9u16suq55dyddbm/9499OS.files.zip

The archive contains:

- extractHashTags.jar: the JAR used in Chapter 4 in the discussion of Hive custom UDFs.
- packt_talend_for_big_data_jobs.zip: exported from Talend Studio, it contains all the jobs implemented in the book. You can import it directly into the Studio (File > Import …).
- tweets-00.log: a file containing raw tweets streamed from Twitter.

Sample chapters

You can view our sample chapters and prefaces of this title on PacktLib or download sample chapters in PDF format.


What you will learn from this book

  • Discover the structure of the Talend Unified Platform
  • Work with Talend HDFS components
  • Implement ELT processing jobs using Talend Hive components
  • Load, filter, aggregate, and store data using Talend Pig components
  • Integrate HDFS with RDBMS using Sqoop components
  • Use the streaming pattern for big data
  • Reuse the partitioning pattern for big data

In Detail

Talend, a successful open source data integration solution, accelerates the adoption of new big data technologies and integrates them efficiently into your existing IT infrastructure. It does this through its intuitive graphical language, its many connectors to the Hadoop ecosystem, and its array of tools for data integration, quality, management, and governance.

This concise, pragmatic book guides you through designing and implementing big data transfers and performing big data analytics jobs using Hadoop technologies such as HDFS, HBase, Hive, Pig, and Sqoop. You will learn how to write complex processing jobs and how to leverage the power of Hadoop projects by designing graphical Talend jobs with the business modeler, the metadata repository, and a palette of configurable components.

You will start by learning how to process large amounts of data using Talend big data components, and then how to write job procedures in HDFS. Next, you will look at how to use Hadoop projects to process data and how to export the data to your favourite relational database system.

You will learn how to implement Hive ELT jobs, Pig aggregation and filtering jobs, and simple Sqoop jobs using the Talend big data component palette. You will also learn the basics of Twitter sentiment analysis and how to format data with Apache Hive.

Talend for Big Data will enable you to start working on big data projects immediately, from simple processing projects to complex projects using common big data patterns.


This book is written in a concise and easy-to-understand manner, and acts as a comprehensive guide on data analytics and integration with Talend Big Data processing jobs.

Who this book is for

If you are a chief information officer, enterprise architect, data architect, data scientist, software developer, software engineer, or a data analyst who is familiar with data processing projects and who wants to use Talend to get your first Big Data job executed in a reliable, quick, and graphical way, Talend for Big Data is perfect for you.
