Avro Source Sink

by Steve Hoffman | July 2013 | Open Source

In this article by Steve Hoffman, author of Apache Flume: Distributed Log Collection for Hadoop, we'll put Avro to use in communication between Flume agents.


A typical tiered configuration might look something like this: two client agents on the left, each sending events to a single collector agent on the right.

To use the Avro Source, you specify the type property with a value of avro. You need to provide a bind address and port number to listen on:

collector.sources=av1
collector.sources.av1.type=avro
collector.sources.av1.bind=0.0.0.0
collector.sources.av1.port=42424
collector.sources.av1.channels=ch1
collector.channels=ch1
collector.channels.ch1.type=memory
collector.sinks=k1
collector.sinks.k1.type=hdfs
collector.sinks.k1.channel=ch1
collector.sinks.k1.hdfs.path=/path/in/hdfs

This configures the agent on the right to listen on port 42424, buffer events in a memory channel, and write to HDFS. I've used the memory channel only to keep this example configuration brief. Also, note that I've given this agent a different name, collector, just to avoid confusion.
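
If durability matters more than brevity, a file channel could stand in for the memory channel on the collector. The following is only a minimal sketch, not a tuned setup; the two directory paths are hypothetical placeholders and do not come from the configuration above:

```properties
# Sketch only: swap the memory channel for a file channel.
# Both directories are hypothetical; point them at local disks on the collector.
collector.channels=ch1
collector.channels.ch1.type=file
collector.channels.ch1.checkpointDir=/var/flume/checkpoint
collector.channels.ch1.dataDirs=/var/flume/data
```

The source and sink lines stay the same, since they refer to the channel only by its name, ch1.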

The agents on the left, which feed the collector tier, might have a configuration similar to this. I have left the sources off this configuration for brevity:

client.channels=ch1
client.channels.ch1.type=memory
client.sinks=k1
client.sinks.k1.type=avro
client.sinks.k1.channel=ch1
client.sinks.k1.hostname=collector.example.com
client.sinks.k1.port=42424

The hostname value, collector.example.com, has nothing to do with the agent name on that machine; it is the hostname (or IP address) of the target machine running the receiving Avro Source. This configuration, named client, would be applied to both agents on the left, assuming both had similar source configurations.
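
Because the sources are left off, the client configuration is not complete on its own. Purely as an illustration, a netcat source could feed ch1. The source name s1, the bind address, and port 44444 are assumptions made for this sketch; a real agent would use whichever source type matches its log input:

```properties
# Sketch only: a placeholder source so the client agent has something to read.
# s1, localhost, and 44444 are illustrative values, not part of the original example.
client.sources=s1
client.sources.s1.type=netcat
client.sources.s1.bind=localhost
client.sources.s1.port=44444
client.sources.s1.channels=ch1
```

Note that a source takes the plural channels property, while a sink takes the singular channel property, as in the Avro Sink lines above.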

Since I don't like single points of failure, I would instead configure two collector agents with the preceding configuration and set each client agent to round-robin between them using a sink group. Again, I've left off the sources for brevity:

client.channels=ch1
client.channels.ch1.type=memory
client.sinks=k1 k2
client.sinks.k1.type=avro
client.sinks.k1.channel=ch1
client.sinks.k1.hostname=collectorA.example.com
client.sinks.k1.port=42424
client.sinks.k2.type=avro
client.sinks.k2.channel=ch1
client.sinks.k2.hostname=collectorB.example.com
client.sinks.k2.port=42424
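# Put both sinks in one group so a single processor picks which sink runs next
client.sinkgroups=g1
client.sinkgroups.g1.sinks=k1 k2
client.sinkgroups.g1.processor.type=load_balance
client.sinkgroups.g1.processor.selector=round_robin
# Temporarily skip a sink that has failed instead of retrying it right away
client.sinkgroups.g1.processor.backoff=true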

Summary

In this article, we covered tiering data flows using the Avro Source and Sink. More information on this topic can be found in the book Apache Flume: Distributed Log Collection for Hadoop.

About the Author

Steve Hoffman

Steve Hoffman has 30 years of software development experience and holds a B.S. in computer engineering from the University of Illinois Urbana-Champaign and an M.S. in computer science from DePaul University. He is currently a Principal Engineer at Orbitz Worldwide.

More information on Steve can be found at http://bit.ly/bacoboy or on Twitter @bacoboy.

This is Steve's first book.
