You're reading from Cassandra 3.x High Availability - Second Edition

Product type: Book
Published in: Aug 2016
Reading level: Intermediate
ISBN-13: 9781786462107
Edition: 2nd Edition
Author (1)
Robbie Strickland

Robbie Strickland has been involved in the Apache Cassandra project since 2010, and he initially went to production with the 0.5 release. He has made numerous contributions over the years, including work on drivers for C# and Scala and multiple contributions to the core Cassandra codebase. In 2013 he became the very first certified Cassandra developer, and in 2014 DataStax selected him as an Apache Cassandra MVP. Robbie has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has presented numerous webinars and conference talks over the years.

Chapter 4. Data Centers

One of Cassandra's most compelling high availability features is its support for multiple data centers. In fact, this feature gives it the capability to scale reliably with a level of ease that few other data stores can match.

In this chapter, we'll explore Cassandra's data center support, covering the following topics:

  • Use cases for multiple data centers

  • Using a separate data center for online analytics

  • Replication across data centers

  • An in-depth look at configuring snitches

  • Multi-region EC2 implementations

  • Multi-data center consistency levels

Database administrators have struggled for many years to reliably replicate data across multiple geographies, a task made especially difficult when the database must maintain ACID guarantees. The best we could typically hope for was to keep a relatively recent backup for failover purposes.

Distributed database designs have made this easier, but many still require complex configurations and have significant limitations...

Use cases for multiple data centers


There are several key use cases for deploying Cassandra across multiple data centers, including the obvious failover and load balancing scenarios. Let's examine a few of these cases.

Live backup

Traditional database backups involve taking periodic snapshots of the data and storing them offsite. If the system fails, there will be downtime while a new system is brought up and the data is restored. This strategy also inevitably loses any data written between the last backup and the point of failure.

Cassandra supports these types of backups, and we will discuss this in greater depth in Chapter 9, Failing Gracefully. While snapshot backups are still useful to protect against data corruption or accidental updates, Cassandra's data center support can be used to provide a current backup for cases such as hardware failures.

The basic idea involves setting up a second data center that maintains a current set of replicas that can be...

Data center setup


The mechanism for defining a data center depends on the snitch you specify in cassandra.yaml. Take a look at the previous chapter if you need a refresher on the various types of snitches. You'll recall that the snitch's role is to tell Cassandra what your network topology looks like, so it can know how to place replicas across your cluster. When configuring a snitch, it's important to make sure that the data centers resolved by the snitch match those in your schema.
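The requirement that snitch and schema agree on data center names can be illustrated with a small check. This is a minimal sketch, not a Cassandra tool: the function name `find_datacenter_mismatches`, the sample data center names, and the replication map are all hypothetical, and it assumes every key other than `class` in the replication map is a data center name.

```python
def find_datacenter_mismatches(snitch_datacenters, keyspace_replication):
    """Compare data center names reported by the snitch with those in a
    keyspace's replication map (hypothetical helper, for illustration)."""
    # Assumption: every key except 'class' names a data center.
    schema_datacenters = {k for k in keyspace_replication if k != "class"}
    snitch_datacenters = set(snitch_datacenters)
    return {
        # Named in the schema, but unknown to the snitch.
        "missing_from_snitch": sorted(schema_datacenters - snitch_datacenters),
        # Known to the snitch, but not named in the schema.
        "missing_from_schema": sorted(snitch_datacenters - schema_datacenters),
    }


# Hypothetical values: note the case difference between "DC2" and "dc2".
snitch_dcs = ["DC1", "DC2"]
replication = {"class": "NetworkTopologyStrategy", "DC1": 3, "dc2": 3}

report = find_datacenter_mismatches(snitch_dcs, replication)
print(report)
```

The comparison is an exact string match, so a difference in case alone is reported as a mismatch on both sides.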

With this in mind, let's take a closer look at what configuration looks like for each of the snitch options.

RackInferringSnitch

The RackInferringSnitch requires no configuration, as long as your IP addressing scheme matches your topology. Specifically, it uses the second, third, and fourth octets of each node's IP address to define the data center, rack, and node, respectively. For example, a node at 10.20.30.40 would be assigned to data center 20 and rack 30, with 40 identifying the node.
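The octet mapping can be sketched in a few lines of Python. This is an illustration of the rule described above, not Cassandra's own implementation; the function name `infer_topology` and the sample address are assumptions made for the example.

```python
import ipaddress


def infer_topology(ip):
    """Split an IPv4 address the way the text describes for the
    RackInferringSnitch (hypothetical helper, for illustration)."""
    # IPv4Address validates the input; .packed yields the four octets as bytes.
    octets = ipaddress.IPv4Address(ip).packed
    return {
        "data_center": str(octets[1]),  # second octet
        "rack": str(octets[2]),         # third octet
        "node": str(octets[3]),         # fourth octet
    }


# Sample address chosen for the example only.
print(infer_topology("10.20.30.40"))
```

Because the topology is derived entirely from the address, two nodes land in the same data center only if their second octets match.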

This strategy can work well for simple deployments in physical data centers where IP addresses can be predicted reliably. The...

Replication across data centers


In previous chapters, we have touched on the idea that Cassandra can automatically replicate across multiple data centers. There are other systems that allow for similar replication; however, the ease of configuration and general robustness set Cassandra apart. Let's take a detailed look at how this works.

Setting replication factors

You will recall from Chapter 3, Replication, that replication is configured via CQL at the keyspace level. Since we're on the topic of multiple data centers, it's important to understand that you'll always want to use the NetworkTopologyStrategy, since the SimpleStrategy does not allow you to set a replication factor for each data center.
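As a rough illustration of per-data center replication factors, the following Python sketch assembles a CQL `CREATE KEYSPACE` statement that uses NetworkTopologyStrategy. The helper `create_keyspace_cql`, the keyspace name `my_app`, and the data center names `DC1` and `DC2` are hypothetical; in a real cluster the data center names must match those reported by your snitch.

```python
def create_keyspace_cql(keyspace, replication_factors):
    """Build a CREATE KEYSPACE statement with one replication factor per
    data center (hypothetical helper, for illustration)."""
    if not replication_factors:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, factor in replication_factors.items():
        # Each data center gets its own replica count.
        options.append("'" + datacenter + "': " + str(int(factor)))
    body = ", ".join(options)
    return "CREATE KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


# Hypothetical keyspace and data center names.
statement = create_keyspace_cql("my_app", {"DC1": 3, "DC2": 2})
print(statement)
```

The resulting statement could be run through cqlsh or a driver; the sketch only builds the string and does not connect to a cluster.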

Attempting to use SimpleStrategy in a multi-data center environment would result in random replica placement across data centers. Coordination traffic across nodes would incur significant latency, as requests would often require nodes in more than one data center to satisfy the requested consistency...

Summary


After reading this chapter and the previous one, you should have a solid understanding of how Cassandra ensures that your data is available when needed and protected from loss due to node or data center failure. By now you should be able to set up and configure a cluster across multiple geographical regions, and be familiar enough with data centers to begin the journey to analyzing your live data without cumbersome and expensive ETL processes.

So far we've focused on what it takes to get started with a solid Cassandra foundation for your application. In the next chapter, we will talk about what it looks like when your use case grows beyond your original plan and you need to scale out your cluster.
