Apache Accumulo for Developers


eBook: $17.84 (list price $20.99, save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $34.99 (list price $55.98, save 37%)
Free shipping to the UK, US, Europe, and selected countries in Asia.
  • Shows you how to build Accumulo, Hadoop, and ZooKeeper clusters from scratch on both Windows and Linux
  • Gives you hands-on knowledge of running Accumulo on the Amazon EC2, Google Cloud Platform, Rackspace, and Windows Azure cloud platforms
  • Packed with practical examples to enable you to manipulate Accumulo with ease

Book Details

Language : English
Paperback : 120 pages [ 235mm x 191mm ]
Release Date : October 2013
ISBN : 1783285990
ISBN 13 : 9781783285990
Author(s) : Guðmundur Jón Halldórsson
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Preface
Chapter 1: Building an Accumulo Cluster from Scratch
Chapter 2: Monitoring and Managing Accumulo
Chapter 3: Integrating Accumulo into Various Cloud Platforms
Chapter 4: Optimizing Accumulo Performance
Chapter 5: Security
Appendix A: Accumulo Command References
Appendix B: Hadoop Command References
Appendix C: ZooKeeper Command References
Index
  • Chapter 1: Building an Accumulo Cluster from Scratch
    • Necessary requirements
    • Setting up Cygwin
    • Setting up Hadoop
      • SSH configuration
        • Creating a Hadoop user
        • Generating an SSH key for the Hadoop user
      • Installing Hadoop
      • Configuring Hadoop
        • core-site.xml
        • mapred-site.xml
        • hdfs-site.xml
        • hadoop-env.sh
      • Preparing the Hadoop filesystem
      • Starting the Hadoop cluster
      • Multi-node configurations
        • The NameNode website
        • The JobTracker website
        • The TaskTracker website
    • Setting up ZooKeeper
      • Installing ZooKeeper
      • Configuring ZooKeeper
      • Starting ZooKeeper
    • Setting up and configuring Accumulo
      • Installing Accumulo
      • Configuring Accumulo
    • Starting the Accumulo cluster
      • The Accumulo website
    • Connecting to the Accumulo cluster using Java
    • Summary
  • Chapter 2: Monitoring and Managing Accumulo
    • Monitoring
      • Setting up Ganglia
        • Configuring Ganglia
      • Setting up the Graylog2 server
        • Logging using Graylog2
      • Setting up Nagios
      • Hadoop
        • NameNode web interface
        • Finding the logfiles
        • How does Accumulo store files in Hadoop?
        • Live, dead, and decommissioning nodes
      • Accumulo
      • Monitoring a system's overview
    • Elasticity
    • Failover
    • Resource management
    • Summary
  • Chapter 3: Integrating Accumulo into Various Cloud Platforms
    • Amazon EC2
      • Prerequisites for Amazon EC2
      • Creating Amazon EC2 Hadoop and ZooKeeper cluster
      • Setting up Accumulo
    • Google Cloud Platform
      • Prerequisites for Google Cloud Platform
      • Creating the project
      • Installing the Google gcutil tool
        • Configuring credentials
        • Configuring the project
      • Creating the firewall rules
      • Creating the cluster
        • Hadoop
        • ZooKeeper
        • Accumulo
      • Deleting the cluster
    • Rackspace
      • Configuration
      • Network
    • Windows Azure
      • Prerequisites
      • Creating the cluster
        • Hadoop
        • ZooKeeper
        • Accumulo
      • Deleting the cluster
    • Summary
  • Chapter 4: Optimizing Accumulo Performance
    • Prerequisites
    • Hadoop performance
      • Baseline
      • Tuning
        • Tuning parameters for mapred-default.xml
      • HDFS
        • Tuning parameters for mapred-site.xml
        • Tuning parameters for hdfs-site.xml
    • ZooKeeper performance
      • ZooKeeper overview
    • Accumulo performance
      • Tuning parameters for accumulo-site.xml
      • Accumulo overview
      • Accumulo's performance summary
        • Tables
        • Comparing bulk ingest versus batch write
        • Accumulo examples
    • Summary
  • Chapter 5: Security
    • Visibility
      • Creating an Accumulo user
      • Creating tables in Accumulo
      • How does visibility work?
    • Security expression
      • Writing a Java client
    • Authorization
    • User authorizations
    • Handling secure authorization
    • Query Services Layer
    • Summary

            Guðmundur Jón Halldórsson

            Guðmundur Jón Halldórsson is a Software Engineer who enjoys the challenges of complex problems and pays close attention to detail. He is an annual speaker at the Icelandic Computer Society (SKY, http://www.utmessan.is/). He has extensive experience and management skills, and works as a Senior Software Engineer for Five Degrees (www.fivedegrees.nl), a company that develops and sells high-quality banking software, where he is responsible for the development of the company's backend banking system. Guðmundur has a B.Sc. in Computer Science from Reykjavik University and has worked as a Software Engineer since 1996. He has worked for a large bank in Iceland, an insurance company, and a large gaming company, where he was on the core EVE Online team. Guðmundur is passionate about whatever he does. He loves to play online chess and Sudoku, and when he has time, he likes to read science fiction and history books. He maintains a Facebook page to network with his friends and readers, and blogs about the wonders of programming and cloud computing at http://www.gudmundurjon.net/.

            What you will learn from this book

            • Set up Hadoop, ZooKeeper, and Accumulo
            • Monitor clusters - both performance and application logs
            • Secure your data in Accumulo
            • Optimize Hadoop, ZooKeeper, and Accumulo performance
            • Integrate with various cloud platforms
            • Use the Accumulo command-line shell
            • Employ Ganglia to monitor the cluster and Graylog2 to monitor application logs
            • Understand what tools are needed to optimize Accumulo performance

            In Detail

            Accumulo is a sorted, distributed key/value store designed to handle large amounts of data. It is highly robust and scalable, and its performance makes it well suited to real-time data storage. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift.
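
To make "sorted key/value store" concrete, here is a minimal in-memory sketch in Python. It is not Accumulo's API: the `SortedKV` class, its method names, and the sample rows are hypothetical, chosen only to show why keeping keys sorted makes range scans over adjacent rows cheap.

```python
import bisect


class SortedKV:
    """Toy in-memory sorted key/value store (hypothetical, not Accumulo's API)."""

    def __init__(self):
        self._keys = []    # (row, family, qualifier) tuples, kept in sorted order
        self._values = {}  # key tuple -> value

    def put(self, row, family, qualifier, value):
        key = (row, family, qualifier)
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving sort order
        self._values[key] = value

    def scan(self, start_row, end_row):
        """Yield (key, value) pairs with start_row <= row <= end_row, in key order."""
        # (start_row,) sorts before every full key that begins with start_row.
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] <= end_row:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


store = SortedKV()
store.put("user2", "info", "name", "Bo")
store.put("user1", "info", "name", "Al")
store.put("user3", "info", "name", "Cy")
rows = [key[0] for key, _ in store.scan("user1", "user2")]
```

Because the keys stay sorted, a scan only has to locate the first matching key and then read forward. A real cluster additionally splits the sorted key space across many servers, which this sketch does not attempt.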

            Apache Accumulo for Developers is your guide to building single-node and multi-node Accumulo clusters, both on-site and in the cloud. Accumulo has been proven to handle petabytes of data with cell-level security and real-time analysis, and this step-by-step guide shows you how to take full advantage of that power.

            Apache Accumulo for Developers looks at the process of setting up three systems (Hadoop, ZooKeeper, and Accumulo) and then configuring, monitoring, and securing them.

            You will learn to connect Accumulo to both Hadoop and ZooKeeper. You will also learn how to monitor a single-node or multi-node cluster to find performance bottlenecks, and then how to integrate with Amazon EC2, Google Cloud Platform, Rackspace, and Windows Azure. The cloud integration chapters also focus on scripting.

            You will also learn to troubleshoot clusters with monitoring tools, and use Accumulo cell-level security to secure your data.
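
Cell-level security in Accumulo works by attaching a visibility expression to each entry, built from labels combined with `&` (and), `|` (or), and parentheses; a scan returns an entry only if the user's authorizations satisfy that expression. The following Python sketch illustrates the idea only. The function name `is_visible` and the labels `admin`, `audit`, and `ops` are hypothetical, and the sketch evaluates mixed operators left to right, whereas Accumulo itself requires parentheses when `&` and `|` are mixed.

```python
import re


def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression.

    Illustrative evaluator only; this is not Accumulo's own implementation.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations  # a bare label

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    if not tokens:
        return True  # an empty expression is visible to everyone
    return parse_expr()


allowed = is_visible("admin&(audit|ops)", {"admin", "ops"})
denied = is_visible("admin&audit", {"admin"})
```

Here a user holding `admin` and `ops` can read an entry labeled `admin&(audit|ops)`, while a user holding only `admin` cannot read one labeled `admin&audit`.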

            Approach

            The book takes a tutorial-based approach, showing readers how to build an Accumulo cluster from scratch, monitor the system, and implement aspects such as security.

            Who this book is for

            This book is for developers new to Accumulo who want a good grounding in how to use it. It assumes that you understand how Hadoop works, both HDFS and MapReduce. No prior knowledge of ZooKeeper is assumed.
