
You're reading from Cassandra High Availability

Product type: Book
Published in: Dec 2014
Publisher: Packt
ISBN-13: 9781783989126
Edition: 1st Edition
Author (1)
Robbie Strickland

Robbie Strickland has been involved in the Apache Cassandra project since 2010, and he initially went to production with the 0.5 release. He has made numerous contributions over the years, including work on drivers for C# and Scala and multiple contributions to the core Cassandra codebase. In 2013 he became the very first certified Cassandra developer, and in 2014 DataStax selected him as an Apache Cassandra MVP. Robbie has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has presented numerous webinars and conference talks over the years.

Chapter 3. Replication

Replication is perhaps the most critical feature of a distributed data store, as it would otherwise be impossible to make any sort of availability guarantee in the face of a node failure. As you learned in Chapter 1, Cassandra's Approach to High Availability, Cassandra employs a sophisticated replication system that allows fine-grained control over replica placement and consistency guarantees.

In this chapter, we'll explore Cassandra's replication mechanism in depth, including the following topics:

  • The replication factor

  • How replicas are placed

  • How Cassandra resolves consistency issues

  • Maintaining replication factor during node failures

  • Consistency levels

  • Choosing the right replication factor and consistency level

At the end of this chapter, you'll understand how to configure replication and tune consistency for your specific use cases. You'll be able to intelligently choose options that will provide the fault tolerance and consistency guarantees that are appropriate...

The replication factor


On the surface, setting the replication factor seems straightforward. You configure Cassandra with the number of replicas you want to maintain (during keyspace creation), and the system performs the replication for you, protecting you when something goes wrong. Defining a replication factor of three, for example, gives you a total of three copies of the data. In practice, a number of variables affect this picture, and we'll cover many of them in detail in this chapter. Let's start with the basic mechanics of setting the replication factor.

Replication strategies

One thing you'll quickly notice is that the syntax for setting the replication factor depends on the replication strategy you choose. The replication strategy tells Cassandra how you want replicas to be placed in the cluster.

There are two strategies available:

  • SimpleStrategy: This strategy is used for single data center deployments. It is fine to use this for testing...
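To make the dependence on the replication strategy concrete, here is a minimal Python sketch that renders the CQL `CREATE KEYSPACE` statement for a given strategy. The helper `create_keyspace`, the keyspace names, and the data center names `DC1` and `DC2` are hypothetical and chosen only for illustration. NetworkTopologyStrategy is the other strategy that ships with Cassandra; it takes a replica count per data center rather than a single `replication_factor`.

```python
def create_keyspace(name, strategy, options):
    """Render a CQL CREATE KEYSPACE statement (illustrative helper).

    options holds the strategy-specific settings:
      - SimpleStrategy: {"replication_factor": <int>}
      - NetworkTopologyStrategy: {"<data center name>": <int>, ...}
    """
    # The strategy class is a quoted string; replica counts are bare integers.
    pairs = [("class", f"'{strategy}'")]
    pairs += [(key, str(count)) for key, count in options.items()]
    body = ", ".join(f"'{key}': {value}" for key, value in pairs)
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"


# Single data center: one cluster-wide replication factor.
simple = create_keyspace("demo_simple", "SimpleStrategy",
                         {"replication_factor": 3})

# Multiple data centers: one replica count per data center (names assumed).
multi = create_keyspace("demo_multi", "NetworkTopologyStrategy",
                        {"DC1": 3, "DC2": 2})

print(simple)
print(multi)
```

The resulting strings can be pasted into cqlsh or passed to a driver session. The point is only that the keys inside the replication map change with the strategy.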

Snitches


As discussed earlier, Cassandra is able to intelligently place replicas across the cluster if you provide it with enough information about your topology. You give this insight to Cassandra through a snitch, which is set using the endpoint_snitch property in cassandra.yaml. The snitch is also used to help Cassandra route client requests to the closest nodes to reduce network latency.

As of version 2.0, there are eight available snitch options (and you can write your own as well):

  • SimpleSnitch: This snitch is a companion to the SimpleStrategy replication strategy. It is designed for simple single data center configurations.

  • RackInferringSnitch: As the name implies, this snitch attempts to infer your network topology. Using this snitch is discouraged because it assumes that your IP addressing scheme reflects your data center and rack configuration. For this to work properly, your addresses must be in the following form:

  • PropertyFileSnitch: Using this snitch allows the administrator...
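To illustrate why RackInferringSnitch depends on your addressing scheme, the following Python sketch mimics its convention: the second octet of a node's IPv4 address is treated as the data center and the third octet as the rack. The function name `infer_topology` and the sample address are assumptions made for illustration; this is not Cassandra's own code.

```python
import ipaddress


def infer_topology(address):
    """Split an IPv4 address the way RackInferringSnitch interprets it.

    Octet 2 -> data center, octet 3 -> rack, octet 4 -> node.
    Raises ValueError (via ipaddress) if the address is not valid IPv4.
    """
    octets = ipaddress.IPv4Address(address).packed  # four raw bytes
    return {
        "data_center": octets[1],
        "rack": octets[2],
        "node": octets[3],
    }


# Hypothetical node address: data center 20, rack 30, node 40.
print(infer_topology("10.20.30.40"))
```

If your network was not laid out with this convention in mind, two nodes in the same physical rack can easily land in different inferred racks, which is why this snitch is discouraged.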

Consistency conflicts


In Chapter 1, Cassandra's Approach to High Availability, we discussed Cassandra's tunable consistency characteristics. For any given call, it is possible to achieve either strong consistency or eventual consistency. In the former case, we can know for certain that the copy of the data that Cassandra returns will be the latest. In the case of eventual consistency, the data returned may or may not be the latest, or there may be no data returned at all if the node is unaware of newly inserted data. Under eventual consistency, it is also possible to see deleted data if the node you're reading from has not yet received the delete request.

Depending on the read_repair_chance setting and the consistency level chosen for the read operation (more on this in the anti-entropy section later in this chapter), Cassandra might block the client and resolve the conflict immediately, or this might occur asynchronously. If data in conflict is never requested, the system will resolve the...
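As a rough illustration of what resolving a conflict involves, the sketch below models a coordinator comparing replica responses for one cell and keeping the one with the newest write timestamp, which is the basic rule Cassandra applies. The names `Cell` and `resolve` and the replica labels are hypothetical, a delete is modeled as a cell whose value is `None`, and timestamp ties are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None stands in for a delete marker (tombstone)
    timestamp: int        # write timestamp; larger means newer


def resolve(
    responses: Dict[str, Optional[Cell]]
) -> Tuple[Optional[Cell], List[str]]:
    """Pick the newest cell and list the replicas that need repairing.

    A response of None means the replica has no data for the cell at all.
    """
    known = [cell for cell in responses.values() if cell is not None]
    if not known:
        return None, []
    winner = max(known, key=lambda cell: cell.timestamp)
    # Any replica whose answer differs from the winner is out of date.
    stale = sorted(name for name, cell in responses.items() if cell != winner)
    return winner, stale


responses = {
    "replica_a": Cell("v1", 100),   # has not yet seen the delete
    "replica_b": Cell(None, 200),   # holds the newer delete
    "replica_c": None,              # never received the data
}
winner, stale = resolve(responses)
print(winner, stale)
```

Here the delete wins because it carries the newer timestamp, and the two other replicas are flagged for repair. Whether the client waits for that repair or it happens in the background is the distinction drawn in the paragraph above.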

Balancing the replication factor with consistency


There are many considerations when choosing a replication factor, including availability, performance, and consistency. Since our topic is high availability, let's presume your desire is to maintain data availability in the case of node failure.

It's important to understand exactly what your failure tolerance is, and this will likely be different depending on the nature of the data. The definition of failure is probably going to vary among use cases as well, as one case might consider data loss a failure, whereas another accepts data loss as long as all queries return.

Achieving the desired availability, consistency, and performance targets requires coordinating your replication factor with your application's consistency level configurations. To help you strike this balance, let's consider a single data center cluster of 10 nodes and examine the impact of various configuration combinations:
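The arithmetic behind these combinations can be sketched in a few lines of Python. The sketch assumes the standard consistency level names ONE, TWO, QUORUM, and ALL, a quorum of floor(RF / 2) + 1 replicas, and the usual rule that reads see the latest write when read replicas plus write replicas exceed the replication factor. The function names are hypothetical. Note that the tolerated failures are counted among the replicas of a given key, not among arbitrary nodes in the 10-node cluster.

```python
def required_replicas(consistency_level, rf):
    """Number of replicas that must respond for the operation to succeed."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    required = levels[consistency_level]
    if required > rf:
        raise ValueError(f"{consistency_level} needs more replicas than RF={rf}")
    return required


def tolerated_failures(consistency_level, rf):
    """Replicas of a key that can be down while the operation still succeeds."""
    return rf - required_replicas(consistency_level, rf)


def is_strongly_consistent(write_cl, read_cl, rf):
    """True when every read overlaps at least one replica of the latest write."""
    return required_replicas(write_cl, rf) + required_replicas(read_cl, rf) > rf


rf = 3
for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(
        f"RF={rf} write={write_cl} read={read_cl} "
        f"strong={is_strongly_consistent(write_cl, read_cl, rf)} "
        f"write_tolerates={tolerated_failures(write_cl, rf)} "
        f"read_tolerates={tolerated_failures(read_cl, rf)}"
    )
```

With a replication factor of 3, QUORUM for both reads and writes gives strong consistency while tolerating one replica failure. ONE/ONE tolerates two failures but is only eventually consistent, and writing at ALL is strongly consistent but fails as soon as a single replica is down.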

Summary


In this chapter, we introduced the foundational concepts of replication and consistency. In our discussion, we outlined the importance of the relationship between replication factor and consistency level, and their impact on performance, data consistency, and availability.

By now, you should be able to make sound decisions specific to your use cases. This chapter might serve as a handy reference in the future as it can be challenging to keep all these details in mind.

In the previous two chapters, we've been gradually expanding from how Cassandra locates individual pieces of data to its strategy for replicating that data and keeping it consistent.

In the next chapter, we'll take things a step further and take a look at its multiple data center capabilities, as no highly available system is truly complete without the ability to distribute itself geographically.

RF | Write CL | Read...