Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Apache Flume: Distributed Log Collection for Hadoop

You're reading from  Apache Flume: Distributed Log Collection for Hadoop

Product type Book
Published in Feb 2015
Publisher
ISBN-13 9781784392178
Pages 178 pages
Edition 1st Edition
Languages
Author (1):
Steven Hoffman Steven Hoffman
Profile icon Steven Hoffman

Flume 1.X (Flume-NG)


There were many reasons why Flume was refactored. If you are interested in the details, you can read about them at https://issues.apache.org/jira/browse/FLUME-728. What started as a refactoring branch eventually became the main line of development as Flume 1.X.

The most obvious change in Flume 1.X is that the centralized configuration master(s) and Zookeeper are gone. The configuration in Flume 0.9 was overly verbose, and mistakes were easy to make. Furthermore, centralized configuration was really outside the scope of Flume's goals. Centralized configuration was replaced with a simple on-disk configuration file (although the configuration provider is pluggable so that it can be replaced). These configuration files are easily distributed using tools such as cf-engine, Chef, and Puppet. If you are using a Cloudera distribution, take a look at Cloudera Manager to manage your configurations. About two years ago, they created a free version with no node limit, so it may be an attractive option for you. Just be sure you don't manage these configurations manually, or you'll be editing these files manually forever.

Another major difference in Flume 1.X is that the reading of input data and the writing of output data are now handled by different worker threads (called Runners). In Flume 0.9, the input thread also did the writing to the output (except for failover retries). If the output writer was slow (rather than just failing outright), it would block Flume's ability to ingest data. This new asynchronous design leaves the input thread blissfully unaware of any downstream problem.

The first edition of this book covered all the versions of Flume up till Version 1.3.1. This second edition will cover till Version 1.5.2 (the current version at the time of writing this).

You have been reading a chapter from
Apache Flume: Distributed Log Collection for Hadoop
Published in: Feb 2015 Publisher: ISBN-13: 9781784392178
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}