You're reading from Apache Flume: Distributed Log Collection for Hadoop

Product type: Book
Published in: Feb 2015
Reading level: Intermediate
ISBN-13: 9781784392178
Edition: 1st Edition
Author: Steven Hoffman

Steve Hoffman has 32 years of experience in software development, ranging from embedded software development to the design and implementation of large-scale, service-oriented, object-oriented systems. For the last 5 years, he has focused on infrastructure as code, including automated Hadoop and HBase implementations and data ingestion using Apache Flume. Steve holds a BS in computer engineering from the University of Illinois at Urbana-Champaign and an MS in computer science from DePaul University. He is currently a senior principal engineer at Orbitz Worldwide (http://orbitz.com/). More information on Steve can be found at http://bit.ly/bacoboy and on Twitter at @bacoboy. This is the first update to Steve's first book, Apache Flume: Distributed Log Collection for Hadoop, Packt Publishing.

Considerations for multiple data centers


If you run your business out of multiple data centers and collect a large volume of data, you may want to consider setting up a Hadoop cluster in each data center rather than sending all your collected data back to a single one. There may be regulatory implications regarding data crossing certain geographic boundaries. Chances are there is somebody in your company who knows much more about compliance than you or I, so seek them out before you start copying data across borders. Of course, not collating your data will make it more difficult to analyze, as you can't just run one MapReduce job against all of it. Instead, you would have to run parallel jobs, one per cluster, and then combine the results in a second pass. Adjusting your data processing procedures is better than potentially breaking the law, so be sure to do your homework.
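
To make the "parallel jobs, then a second pass" idea concrete, here is a minimal sketch in plain Python rather than MapReduce. The data center names, the record layout, and the helper functions `count_events` and `combine` are all hypothetical, invented purely for illustration. The first pass runs separately against each data center's data and produces a small summary; only those summaries, not the raw records, are merged in the second pass.

```python
from collections import Counter


def count_events(records):
    """First pass: runs independently inside a single data center."""
    return Counter(record["event"] for record in records)


def combine(partials):
    """Second pass: merge the small per-data-center summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical sample data; the names and records are made up.
logs_by_data_center = {
    "us-east": [{"event": "login"}, {"event": "search"}, {"event": "login"}],
    "eu-west": [{"event": "login"}, {"event": "purchase"}],
}

# One job per data center, each touching only its local records.
partials = {
    name: count_events(records)
    for name, records in logs_by_data_center.items()
}

# Only the summaries are brought together for the second pass.
totals = combine(partials.values())
print(dict(totals))
```

This works directly for results that can be added together, such as counts and sums. A result such as an average cannot be merged this way; the first pass would need to emit both the sum and the count so that the second pass can recompute it.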

Pulling all your data into a single cluster may also be more than your networking can handle. Depending on how your data...
