Talend for Big Data

If you want to start working on big data projects fast, this is the guide you’ve been looking for. Delve deep into Talend and discover just how easily you can revolutionize your data handling and presentation.

Bahaaldine Azarmi

Book Details

ISBN 13: 9781782169499
Paperback: 96 pages

About This Book

  • Write complex processing jobs easily with the help of clear, step-by-step instructions
  • Compare, filter, evaluate, and group vast quantities of data using Apache Pig
  • Explore and perform HDFS and RDBMS integration with the Sqoop component

Who This Book Is For

If you are a chief information officer, enterprise architect, data architect, data scientist, software developer, software engineer, or data analyst who is familiar with data processing projects and wants to use Talend to get a first big data job running reliably, quickly, and graphically, Talend for Big Data is perfect for you.

Table of Contents

Chapter 1: Getting Started with Talend Big Data
  • Talend Unified Platform presentation
  • Knowing about the Hadoop ecosystem
  • Prerequisites for running examples
  • Downloading Talend Open Studio for Big Data
  • Installing TOSBD
  • Running TOSBD for the first time
  • Summary
Chapter 2: Building Our First Big Data Job
  • TOSBD – the development environment
  • A simple HDFS writer job
  • Checking the result in HDFS
  • Summary
Chapter 3: Formatting Data
  • Twitter Sentiment Analysis
  • Writing the tweets in HDFS
  • Setting our Apache Hive tables
  • Formatting tweets with Apache Hive
  • Summary
Chapter 4: Processing Tweets with Apache Hive
  • Extracting hashtags
  • Extracting emoticons
  • Joining the dots
  • Summary
Chapter 5: Aggregate Data with Apache Pig
  • Knowing about Pig
  • Extracting the top Twitter users
  • Extracting the top hashtags, emoticons, and sentiments
  • Summary
Chapter 6: Back to the SQL Database
  • Linking HDFS and RDBMS with Sqoop
  • Exporting and importing data to a MySQL database
  • Summary
Chapter 7: Big Data Architecture and Integration Patterns
  • The streaming pattern
  • The partitioning pattern
  • Summary

What You Will Learn

  • Discover the structure of the Talend Unified Platform
  • Work with Talend HDFS components
  • Implement ELT processing jobs using Talend Hive components
  • Load, filter, aggregate, and store data using Talend Pig components
  • Integrate HDFS with RDBMS using Sqoop components
  • Use the streaming pattern for big data
  • Reuse the partitioning pattern for big data

In Detail

Talend, a successful open source data integration solution, accelerates the adoption of new big data technologies and efficiently integrates them into your existing IT infrastructure. It can do this thanks to its intuitive graphical language, its multiple connectors to the Hadoop ecosystem, and its array of tools for data integration, quality, management, and governance.

This is a concise, pragmatic book that will guide you through designing and implementing big data transfers and running big data analytics jobs using Hadoop technologies like HDFS, HBase, Hive, Pig, and Sqoop. You will learn how to write complex processing jobs and how to leverage the power of Hadoop projects by designing graphical Talend jobs using the business modeler, the metadata repository, and a palette of configurable components.

You will start by understanding how to process large amounts of data using Talend big data components, and then learn how to write job procedures in HDFS. From there, you will look at how to use Hadoop projects to process data and how to export the data to your favourite relational database system.

You will learn how to implement Hive ELT jobs, Pig aggregation and filtering jobs, and simple Sqoop jobs using the Talend big data component palette. You will also learn the basics of Twitter sentiment analysis and how to format data with Apache Hive.

Talend for Big Data will enable you to start working on big data projects immediately, from simple processing projects to complex projects using common big data patterns.
