Mastering Apache Cassandra

eBook: $26.99, now $22.94 (save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $71.98, now $44.99 (save 37%)
Print cover price: $44.99
Free shipping to the UK, US, Europe, and selected countries in Asia.
Overview
  • Complete coverage of all aspects of Cassandra
  • Discusses prominent patterns, pros and cons, and use cases
  • Contains briefs on integration with other software

Book Details

Language : English
Paperback : 340 pages [ 235mm x 191mm ]
Release Date : October 2013
ISBN : 1782162682
ISBN 13 : 9781782162681
Author(s) : Nishant Neeraj
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Preface
Chapter 1: Quick Start
Chapter 2: Cassandra Architecture
Chapter 3: Design Patterns
Chapter 4: Deploying a Cluster
Chapter 5: Performance Tuning
Chapter 6: Managing a Cluster – Scaling, Node Repair, and Backup
Chapter 7: Monitoring
Chapter 8: Integration
Chapter 9: Introduction to CQL 3 and Cassandra 1.2
Index
  • Chapter 1: Quick Start
    • Introduction to Cassandra
      • Distributed database
      • High availability
      • Replication
      • Multiple data centers
    • A brief introduction to a data model
    • Installing Cassandra locally
    • CRUD with cassandra-cli
    • Cassandra in action
      • Modeling data
      • Writing code
        • Setting up
        • Application
    • Summary
  • Chapter 2: Cassandra Architecture
    • Problems in the RDBMS world
    • Enter NoSQL
      • The CAP theorem
        • Consistency
        • Availability
        • Partition-tolerance
      • Significance of the CAP theorem
    • Cassandra
    • Cassandra architecture
      • Ring representation
      • How Cassandra works
        • Write in action
        • Read in action
      • Components of Cassandra
        • Messaging service
        • Gossip
        • Failure detection
        • Partitioner
        • Replication
        • Log Structured Merge tree
        • CommitLog
        • MemTable
        • SSTable
        • Compaction
        • Tombstones
        • Hinted handoff
        • Read repair and Anti-entropy
    • Summary
  • Chapter 3: Design Patterns
    • The Cassandra data model
      • The counter column
      • The expiring column
      • The super column
      • The column family
      • Keyspaces
      • Data types – comparators and validators
        • Writing a custom comparator
        • The primary index
        • The wide-row index
        • Simple groups
        • Sorting for free, free as in speech
        • An inverse index with a super column family
        • An inverse index with composite keys
        • The secondary index
    • Patterns and antipatterns
      • Avoid storing an entity in a single column (wherever possible)
      • Atomic update
      • Managing time series data
        • Wide-row time series
        • High throughput rows and hotspots
        • Advanced time series
      • Avoid super columns
      • Transaction woes
      • Use expiring columns
      • batch_mutate
    • Summary
  • Chapter 4: Deploying a Cluster
    • Evaluating requirements
      • Hard disk capacity
        • RAM
        • CPU
        • Nodes
        • Network
    • System configurations
      • Optimizing user limits
      • Swapping memory
      • Clock synchronization
      • Disk readahead
    • The required software
      • Installing Oracle Java 6
        • RHEL and CentOS systems
        • Debian and Ubuntu systems
      • Installing the Java Native Access (JNA) library
    • Installing Cassandra
      • Installing from a tarball
      • Installing from ASFRepository for Debian/Ubuntu
      • Anatomy of the installation
        • Cassandra binaries
        • Configuration files
    • Configuring a Cassandra cluster
      • The cluster name
      • The seed node
        • Listen, broadcast, and RPC addresses
      • Initial token
      • Partitioners
        • The random partitioner
        • The byte-ordered partitioner
        • The Murmur3 partitioner
      • Snitches
        • SimpleSnitch
        • PropertyFileSnitch
        • GossipingPropertyFileSnitch
        • RackInferringSnitch
        • EC2Snitch
        • EC2MultiRegionSnitch
      • Replica placement strategies
        • SimpleStrategy
        • NetworkTopologyStrategy
      • Launching a cluster with a script
      • Creating a keyspace
    • Authorization and authentication
    • Summary
  • Chapter 5: Performance Tuning
    • Stress testing
    • Performance tuning
      • Write performance
      • Read performance
        • Choosing the right compaction strategy
        • Size tiered compaction strategy
        • Leveled compaction
        • Row cache
        • Key cache
        • Cache settings
        • Enabling compression
        • Tuning the bloom filter
      • More tuning via cassandra.yaml
        • index_interval
        • commitlog_sync
        • column_index_size_in_kb
        • commitlog_total_space_in_mb
      • Tweaking JVM
        • Java heap
        • Garbage collection
        • Other JVM options
      • Scaling horizontally and vertically
      • Network
    • Summary
  • Chapter 7: Monitoring
    • Cassandra JMX interface
      • Accessing MBeans using JConsole
    • Cassandra nodetool
      • Monitoring with nodetool
        • cfstats
        • netstats
        • ring and describering
        • tpstats
        • compactionstats
        • info
      • Administrating with nodetool
        • drain
        • decommission
        • move
        • removetoken
        • repair
        • upgradesstable
        • snapshot
    • DataStax OpsCenter
      • OpsCenter Features
      • Installing OpsCenter and an agent
        • Prerequisites
        • Running a Cassandra cluster
        • Installing OpsCenter from Tarball
        • Setting up an OpsCenter agent
      • Monitoring and administrating with OpsCenter
      • Other features of OpsCenter
    • Nagios – monitoring and notification
      • Installing Nagios
        • Prerequisites
        • Preparation
        • Installation
        • Nagios plugins
    • Cassandra log
      • Enabling Java Options for GC Logging
    • Troubleshooting
      • High CPU usage
      • High memory usage
      • Hotspots
      • OpenJDK may behave erratically
      • Disk performance
      • Slow snapshot
      • Getting help from the mailing list
    • Summary
  • Chapter 8: Integration
    • Using Hadoop
    • Hadoop and Cassandra
      • Introduction to Hadoop
        • HDFS – Hadoop Distributed File System
        • Data management
        • Hadoop MapReduce
        • Reliability of data and process in Hadoop
      • Setting up local Hadoop
      • Testing the installation
    • Cassandra with Hadoop MapReduce
      • ColumnFamilyInputFormat
      • ColumnFamilyOutputFormat
      • ConfigHelper
        • Wide-row support
        • Bulk loading
        • Secondary index support
    • Cassandra and Hadoop in action
      • Executing, debugging, monitoring, and looking at results
    • Hadoop in Cassandra cluster
      • Cassandra filesystem
    • Integration with Pig
      • Installing Pig
      • Integrating Pig and Cassandra
    • Cassandra and Solr
      • Development note on Solandra
        • DataStax Enterprise – the next level Solr integration
    • Summary
  • Chapter 9: Introduction to CQL 3 and Cassandra 1.2
    • CQL – the Cassandra Query Language
    • CQL 3 for Thrift refugees
      • Wide rows
      • Composite columns
    • CQL 3 basics
      • The CREATE KEYSPACE query
      • The CREATE TABLE query
      • Compact storage
      • Creating a secondary index
      • The INSERT query
      • The SELECT query
      • select expression
      • The WHERE clause
      • The ORDER BY clause
      • The LIMIT clause
      • The USING CONSISTENCY clause
      • The UPDATE query
      • The DELETE query
      • The TRUNCATE query
      • The ALTER TABLE query
        • Adding a new column
        • Dropping an existing column
        • Modifying the data type of an existing column
        • Altering table options
      • The ALTER KEYSPACE query
      • BATCH querying
      • The DROP INDEX query
      • The DROP TABLE query
      • The DROP KEYSPACE query
      • The USE statement
    • What's new in Cassandra 1.2?
      • Virtual Nodes
      • Off-heap Bloom filters
      • JBOD improvements
      • Parallel leveled compaction
      • Murmur3 partitioner
      • Atomic batches
      • Query profiling
      • Collections support
        • Sets
        • Lists
        • Maps
    • Support for programming languages
    • Summary

                    Nishant Neeraj

                    Nishant Neeraj (http://naishe.in) is a software engineer at the BrightContext corporation. He builds software that can handle massive in-stream data, process it, and store it reliably, efficiently, and most importantly, quickly. He also manages the cloud infrastructure and makes sure that things stay up no matter what hits the data center, whether hardware failures or sudden surges of data inflow. He has six years of experience in building web applications in Java as a backend engineer. He has been using Cassandra in production-ready web applications since version 0.6 in 2010. His interests lie in building scalable applications for large data sets. He works with Java, MySQL, Cassandra, Twitter Storm, Amazon Web Services, JavaScript, and Linux on a daily basis, and he has recently developed an interest in machine learning, data analysis, and data science in general.

                    What you will learn from this book

                    • Write programs that use Cassandra’s features more efficiently
                    • Learn how to get the most out of a given infrastructure, improve performance, and tweak the JVM
                    • Manage clusters and perform housekeeping activities
                    • Keep an eye on Cassandra processes and the machines that hold the data store, and get to know simple monitoring mechanisms, both open source and proprietary
                    • Squeeze the value out of the data that you hold in Cassandra
                    • Learn CQL 3 quickly and use Cassandra with Java, Python, Node.js, Scala, and PHP

                    In Detail

                    Apache Cassandra is a strong choice for building fault-tolerant and scalable databases. Implementing Cassandra enables you to take advantage of its features, which include low-latency replication of data across multiple data centers. This book details these features and guides you towards mastering the art of building high-performing databases.

                    Mastering Apache Cassandra aims to give you enough knowledge to program pragmatically and to understand the limitations of Cassandra. You will also learn how to deploy a production setup and monitor it, what happens under the hood, and how to optimize Cassandra and integrate it with other software.

                    Mastering Apache Cassandra begins with a discussion of Cassandra’s philosophy and design decisions, and shows how you can use it to resolve business issues and run complex applications.

                    You will also learn how the various components of Cassandra work with each other to form a robust distributed system. The different mechanisms that it provides to solve old problems in new ways are not as twisted as they seem; Cassandra is all about simplicity. Learn how to set up a cluster that can face a tornado of data reads and writes without wincing.

                    If you are a beginner, you can use the examples to play around with Cassandra and test the water. If you are at an intermediate level, you may prefer to use this guide to dive into the architecture. If you work in DevOps, this book will help you manage and optimize your infrastructure. If you are a CTO, this book will help you unleash the power of Cassandra and discover the resources that it requires.

                    Approach

                    Mastering Apache Cassandra is a practical, hands-on guide with step-by-step instructions. The smooth and easy tutorial approach focuses on showing people how to utilize Cassandra to its full potential.

                    Who this book is for

                    This book is aimed at intermediate Cassandra users. It is best suited for startups where developers have to wear multiple hats: programming, DevOps, release management, convincing clients, and handling failures. No prior knowledge of Cassandra is required.
