You're reading from Cassandra High Availability

Product type: Book
Published in: Dec 2014
Publisher: Packt
ISBN-13: 9781783989126
Edition: 1st
Author: Robbie Strickland

Robbie Strickland has been involved in the Apache Cassandra project since 2010, and he initially went to production with the 0.5 release. He has made numerous contributions over the years, including work on drivers for C# and Scala and multiple contributions to the core Cassandra codebase. In 2013 he became the very first certified Cassandra developer, and in 2014 DataStax selected him as an Apache Cassandra MVP. Robbie has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has presented numerous webinars and conference talks over the years.

Chapter 6. High Availability Features in the Native Java Client

If you are relatively new to Cassandra, you may be unaware that the native client libraries from DataStax are a recent development. In fact, prior to their introduction, there were numerous libraries (and forks of those projects) for the Java language alone. Throw in the other languages, each with its own idiosyncrasies, and the situation was quite dire.

Complicating the scenario was the lack of any universally accepted query mechanism as CQL was initially poorly received. The only real common ground to describe queries and data models was the underlying Thrift protocol. While this worked reasonably well for early adopters, it made assimilation of newer users quite difficult. It is a testament to Cassandra's extraordinary architecture, speed, and scalability that it was able to survive those early days.

After several revisions of CQL, the introduction of a native binary protocol, and DataStax's work on...

Thrift versus the native protocol


Cassandra users fall into two general categories: those who have been using it for a while and have grown accustomed to working directly with storage rows via a Thrift-based client, and those who are relatively new to Cassandra and are confused by the role Thrift plays in the modern Cassandra world. Hopefully, we can clear up the confusion and set both groups on the right path.

Thrift is an RPC mechanism combined with a code generator, and for several years it formed the underlying protocol layer for clients communicating with Cassandra. This allowed the early developers of Cassandra itself to focus on the database rather than the clients. But, as we hinted at in the introduction, this strategy had numerous negative side effects:

  • There was no common language to describe data models and queries as each client implemented different abstractions on top of the underlying Thrift protocol.

  • Thrift was limited to the...

Setting up the environment


To get the most out of this chapter, you should prepare your development environment with the following prerequisites:

  • Java Development Kit (JDK) 1.7 for your platform, which can be obtained at http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html.

  • Integrated Development Environment (IDE) or any text editor of your choice.

  • Either a local Cassandra installation or the ability to connect to a remote cluster.

  • The DataStax native Java driver for your Cassandra version. If you're using Maven for dependency management, add the following lines of code to your pom.xml file:

    <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>[version_number]</version>
    </dependency>

If you're using the 1.x driver, you may notice that it has a significant number of dependencies (compared to only four with the 2.x version). For this reason, you should...

Connecting to the cluster


To get connected, start by creating a Cluster reference, which you will construct using a builder pattern. You will specify each additional option by chaining method calls together to produce the desired configuration, then finally, calling the build() method to initialize the Cluster instance.

Let's build a cluster that's initialized with a list of possible initial contact points:

private Cluster cluster; // defined at class level
// you should only build the cluster once per app
cluster = Cluster.builder()
  .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3")
  .build();

Note

You should only have one instance of Cluster in your application for each physical cluster as this class controls the list of contact points and key connection policies such as compression, failover, request routing, and retries.

While this basic example will suffice to play around with the driver locally, the Cluster builder supports a number of additional options that are relevant for...
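The chaining behavior described above can be illustrated independently of the driver with a minimal sketch of the builder pattern: each `with*` or `add*` call returns the builder itself, and `build()` produces the finished object. Note that `ClusterConfig` and its fields here are hypothetical stand-ins for illustration, not the real driver API (though the real `Cluster.Builder` works on the same principle).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterConfig {
    private final List<String> contactPoints;
    private final int port;

    private ClusterConfig(Builder b) {
        this.contactPoints = b.contactPoints;
        this.port = b.port;
    }

    public List<String> getContactPoints() { return contactPoints; }
    public int getPort() { return port; }

    public static Builder builder() { return new Builder(); }

    public static class Builder {
        private final List<String> contactPoints = new ArrayList<>();
        private int port = 9042; // default native protocol port

        public Builder addContactPoints(String... hosts) {
            contactPoints.addAll(Arrays.asList(hosts));
            return this; // returning 'this' is what enables chaining
        }

        public Builder withPort(int port) {
            this.port = port;
            return this;
        }

        public ClusterConfig build() { return new ClusterConfig(this); }
    }

    public static void main(String[] args) {
        ClusterConfig config = ClusterConfig.builder()
            .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3")
            .withPort(9042)
            .build();
        System.out.println(config.getContactPoints().size());
    }
}
```

The same shape applies to the real builder: options such as port, credentials, and compression are each a chained call, and nothing connects to the cluster until `build()` (and later `connect()`) is invoked.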

Executing statements


While the Cluster acts as a central place to manage connection-level configuration options, you will need to establish a Session instance to perform actual work against the cluster. This is done by calling the connect() method on your Cluster instance. Here, we connect to the contacts keyspace:

private Session session; // defined at class level
session = cluster.connect("contacts");

Once you have created the Session, you will be able to execute CQL statements as follows:

String insert = "INSERT INTO contact (id, email) " +
  "VALUES (" +
  "bd297650-2885-11e4-8c21-0800200c9a66," +
  "'contact@example.com' " +
");";
session.execute(insert);

You can submit any valid CQL statement to the execute() method, including schema modifications.
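Hand-concatenating literals into CQL strings, as in the example above, is easy to get wrong. As a small illustration, the same statement can be assembled from typed values using only the JDK; `insertContact` is a hypothetical helper for this sketch, and in real applications the driver's prepared statements are the better tool for parameterized queries.

```java
import java.util.UUID;

public class CqlBuilder {
    // Build the INSERT statement from typed values rather than by
    // splicing string literals together by hand. Single quotes are
    // doubled, which is how CQL escapes them inside string literals.
    public static String insertContact(UUID id, String email) {
        return String.format(
            "INSERT INTO contact (id, email) VALUES (%s, '%s');",
            id, email.replace("'", "''"));
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("bd297650-2885-11e4-8c21-0800200c9a66");
        System.out.println(insertContact(id, "contact@example.com"));
    }
}
```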

Note

Unless you have a large number of keyspaces, you should create one Session instance for each keyspace in your application, because it provides connection pooling and controls the node selection policy (it uses a round-robin approach by default...

Handling asynchronous requests


Since Cassandra is designed for significant scale, it follows that most applications using it will be designed with similar scalability in mind. One principal characteristic of high-performance applications is that they do not block threads unnecessarily, and instead attempt to maximize available resources.

As previously discussed, one of the downsides to the older Thrift protocol was its lack of support for asynchronous requests. Fortunately, this situation has been remedied with the native driver, making the process of building scalable applications on top of Cassandra significantly easier.

Tip

Blocking on I/O, such as with calls to Cassandra, can cause significant bottlenecks in high-throughput applications. Since a slow application can be the same as a dead application, you should use the asynchronous API to avoid blocking whenever possible.

If you are familiar with the java.util.concurrent package, and the Future class specifically, the asynchronous API will...
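The pattern behind the asynchronous API can be sketched with plain java.util.concurrent types: submit the work, keep the Future, do other useful work, and block only when the result is actually needed. Here, `queryDatabase` is a hypothetical stand-in for a driver call such as `executeAsync`, not the real API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSketch {
    // Hypothetical stand-in for an I/O-bound query; the sleep
    // simulates a network round trip to the cluster.
    static String queryDatabase(String cql) throws InterruptedException {
        Thread.sleep(50);
        return "result-for:" + cql;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Submit the query without blocking the calling thread...
        Future<String> future =
            pool.submit(() -> queryDatabase("SELECT * FROM contact"));

        // ...the thread is free to do other work here...

        // ...and we block only at the point the result is required.
        String result = future.get();
        System.out.println(result);
        pool.shutdown();
    }
}
```

The driver's asynchronous methods return a richer Future variant that also supports callbacks, but the control flow, submit now, collect later, is the same.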

Load balancing


Since Cassandra is a distributed database with the ability to add and remove nodes easily, it's important for the client to be able to send requests to new nodes that join the cluster, or to stop sending requests to removed or dead nodes.

Some databases use special middleman processes to broker requests to available nodes, thus relieving the client of the requirement to maintain a list of hosts. Since Cassandra is a peer-to-peer system, with no special nodes or broker processes, the client must be aware of the topology of the cluster.

You should not use a load balancer between the client and Cassandra, as the client handles this via its load balancing policies. Adding a separate load balancer will actually prevent the client from understanding the cluster, which is what allows it to perform many of its duties.

Behind the scenes, the native driver connects to the cluster and learns about the topology of the ring. While other legacy Thrift-based clients were able to make use of...
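The round-robin idea behind the driver's default node selection can be sketched in a few lines of plain Java. This is a conceptual illustration only, not the driver's actual `RoundRobinPolicy` implementation, which additionally tracks node liveness and topology changes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final List<String> hosts;
    private final AtomicInteger index = new AtomicInteger(0);

    public RoundRobinSketch(List<String> hosts) {
        this.hosts = hosts;
    }

    // Each call returns the next host, wrapping around the list,
    // so requests spread evenly across all known nodes.
    public String nextHost() {
        int i = Math.floorMod(index.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(
            Arrays.asList("10.10.10.1", "10.10.10.2", "10.10.10.3"));
        for (int n = 0; n < 4; n++) {
            System.out.println(lb.nextHost()); // cycles 1, 2, 3, 1
        }
    }
}
```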

Failing over to a remote data center


The foundation of any robust load balancing strategy is DCAwareRoundRobinPolicy, because we'll assume you will be deploying to more than one data center. However, the implementation hides an interesting failover feature in its constructor overloads, which is worth a look.

In Chapter 4, Data Centers, we discussed several use cases for multiple data centers, with failover being one key scenario. If you want to fail over to a backup data center should replicas in your client's primary data center fail, you might be interested in the two additional parameters you can pass to the DCAwareRoundRobinPolicy constructor:

  • usedHostsPerRemoteDc: This vaguely named parameter allows you to specify a number of hosts in a remote data center that can be used by this client, should your local data center fail to satisfy the request. Note that by default, this will be ignored for LOCAL_ONE and LOCAL_QUORUM consistency levels.

  • allowRemoteDCsForLocalConsistencyLevel...
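The effect of usedHostsPerRemoteDc can be sketched in plain Java: prefer hosts in the local data center, and only when none are available, fall back to at most that many hosts from each remote data center. This is a conceptual illustration with hypothetical class and method names, not the driver's DCAwareRoundRobinPolicy source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DcFailoverSketch {
    // Map of data center name -> live hosts in that data center.
    private final Map<String, List<String>> hostsByDc;
    private final String localDc;
    private final int usedHostsPerRemoteDc;

    public DcFailoverSketch(Map<String, List<String>> hostsByDc,
                            String localDc, int usedHostsPerRemoteDc) {
        this.hostsByDc = hostsByDc;
        this.localDc = localDc;
        this.usedHostsPerRemoteDc = usedHostsPerRemoteDc;
    }

    // Build the query plan: all local-DC hosts first; if the local DC
    // has no live hosts, take up to usedHostsPerRemoteDc hosts from
    // each remote DC as a fallback.
    public List<String> queryPlan() {
        List<String> local =
            hostsByDc.getOrDefault(localDc, Collections.emptyList());
        if (!local.isEmpty()) {
            return new ArrayList<>(local);
        }
        List<String> plan = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : hostsByDc.entrySet()) {
            if (e.getKey().equals(localDc)) continue;
            List<String> remote = e.getValue();
            plan.addAll(remote.subList(0,
                Math.min(usedHostsPerRemoteDc, remote.size())));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hosts = new LinkedHashMap<>();
        hosts.put("DC1", new ArrayList<>()); // local DC has no live hosts
        hosts.put("DC2", Arrays.asList("10.20.0.1", "10.20.0.2", "10.20.0.3"));

        DcFailoverSketch policy = new DcFailoverSketch(hosts, "DC1", 2);
        System.out.println(policy.queryPlan()); // only two DC2 hosts
    }
}
```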

Tying it all together


In attempting to develop a comprehensive approach to handling failure, we will start by assuming that you prefer consistency when possible but want your application to remain available even if the desired consistency level cannot be satisfied; you are also willing to experience slower client response rather than denying requests.

With these ideas in mind, we can tie the concepts you have learned throughout this chapter together in a policy that answers this demand. Take a look at the following example, which makes use of the previously discussed features:

// defined at class level
String localDC = "DC1";
ConsistencyLevel defaultCL = ConsistencyLevel.LOCAL_QUORUM;
private Cluster cluster;

// initialized once per application
cluster = Cluster.builder()
  .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3")
  .withRetryPolicy(new LoggingRetryPolicy(
    DowngradingConsistencyRetryPolicy.INSTANCE))
  .withLoadBalancingPolicy(new TokenAwarePolicy(
    new DCAwareRoundRobinPolicy...

Summary


In this chapter, you learned the value of the native driver as a tool to assist you in developing a highly available application built on top of Cassandra. Hopefully, it has been apparent that this objective involves a partnership between the application and the database, and that poor decisions on either end can dramatically affect availability.

However, the native driver has a wealth of functionality beyond what is covered here, so it would be worth your while to spend some time understanding its features and subtleties, as with any new piece of software.

In the next chapter, we will look at another aspect of designing highly available applications on Cassandra. We'll explore how the right data models can make or break your system, and what to do to ensure success.

