You're reading from  Cassandra 3.x High Availability - Second Edition

Product type: Book
Published in: Aug 2016
Reading level: Intermediate
ISBN-13: 9781786462107
Edition: 2nd Edition
Author: Robbie Strickland

Robbie Strickland has been involved in the Apache Cassandra project since 2010, and he initially went to production with the 0.5 release. He has made numerous contributions over the years, including work on drivers for C# and Scala and multiple contributions to the core Cassandra codebase. In 2013 he became the very first certified Cassandra developer, and in 2014 DataStax selected him as an Apache Cassandra MVP. Robbie has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has presented numerous webinars and conference talks over the years.

Chapter 6.  High Availability Features in the Native Java Client

If you are relatively new to Cassandra, you may be unaware that the native client libraries from DataStax are a recent development. In fact, prior to their introduction there were numerous libraries (and forks of those projects) just for the Java language. Throw in the other languages, each with their own idiosyncrasies, and the situation was really quite dire.

Complicating the scenario was the lack of any universally accepted query mechanism, as Cassandra Query Language (CQL) was initially poorly received. The only real common ground for describing queries and data models was the underlying Thrift protocol. While this worked reasonably well for early adopters, it made assimilation of newer users quite difficult. It is a testament to Cassandra's extraordinary architecture, speed, and scalability that it was able to survive those early days.

After several revisions of CQL, the introduction of a native binary protocol, and DataStax...

Thrift versus the native protocol


Cassandra users fall into two general categories:

  • Those who have been using it for a while and have grown accustomed to working directly with storage rows via a Thrift-based client.

  • Those who are relatively new to Cassandra and are confused by the role Thrift plays in the modern Cassandra world.

Hopefully we can clear up the confusion and set both groups on the right path. Thrift is a remote procedure call (RPC) mechanism combined with a code generator, and for several years it formed the underlying protocol layer for clients communicating with Cassandra. This allowed the early developers of Cassandra to focus on the database rather than the clients. But, as we hinted in the introduction, this strategy had numerous negative side effects:

  • There was no common language to describe data models and queries, as each client implemented different abstractions on top of the underlying Thrift protocol.

  • Thrift was limited to the lowest common denominator...

Setting up the environment


To get the most out of this chapter, you should prepare your development environment with the following prerequisites:

  • Java Development Kit (JDK) 1.8 for your platform, which can be obtained at http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html.

  • An integrated development environment (IDE) or text editor of your choice.

  • Either a local Cassandra installation, or the ability to connect to a remote cluster.

  • The DataStax native Java driver for your Cassandra version. If you're using Maven for dependency management, add the following lines to your pom.xml file:

      <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>[version_number]</version>
      </dependency>

Now that you're set up for coding, let's get familiar with some of the basics of the driver. The first step is to establish a connection to...

Connecting to the cluster


To get connected, you start by declaring a Cluster reference, which you then construct using the builder pattern. You specify each additional option by chaining method calls together to produce the desired configuration, and finally call the build() method to create the Cluster instance.

Let's build a cluster that's initialized with a list of possible initial contact points:

private Cluster cluster; // defined at class level 
// you should only build the cluster once per app 
cluster = Cluster.builder() 
   .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3") 
   .build(); 

Tip

You should only have one instance of Cluster in your application for each physical cluster, as this class controls the list of contact points and key connection policies such as compression, failover, request routing, and retries.
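
One way to honor the single-instance rule is to route every access through a holder that builds the object lazily and exactly once. The following is a minimal sketch in plain Java with no driver dependency. OncePerApp is a hypothetical helper name, not part of the driver, and in a real application the supplier would wrap your own Cluster.builder() chain.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of the driver): builds an expensive object
// at most once and hands the same instance to every caller.
final class OncePerApp<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile makes double-checked locking safe

    // The factory must not return null, or it would be invoked again.
    OncePerApp(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get(); // runs only on the first call
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Every call to get() after the first returns the already-built object, so threads that start at the same time still share one instance.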

While this basic example will suffice for playing around with the driver locally, the Cluster builder supports a number of additional options that are...

Executing statements


While the Cluster acts as a central place to manage connection-level configuration options, you will need to establish a Session to perform actual work against the cluster. This is done by calling the connect() method on your Cluster instance.

To run the following examples, you will need to create the contacts keyspace and contact table, as follows:

CREATE KEYSPACE contacts  
WITH REPLICATION = { 
   'class' : 'SimpleStrategy',  
   'replication_factor' : 1 
}; 
 
USE contacts; 
 
CREATE TABLE contact ( 
   id UUID, 
   email TEXT PRIMARY KEY 
); 

After the schema is created, you can connect to the contacts keyspace:

private Session session; // defined at class level 
session = cluster.connect("contacts"); 

Once you have created the Session, you will be able to execute CQL statements, as follows:

String insert = "INSERT INTO contact (id, email) " + 
               "VALUES (" + 
               "bd297650-2885-11e4-8c21-0800200c9a66," + 
               "'contact@example.com...

Handling asynchronous requests


Since Cassandra is designed for significant scale, it follows that most applications using it would be designed with similar scalability in mind. One principal characteristic of high performance applications is that they do not block unnecessarily, and instead attempt to maximize available resources.

As previously discussed, one of the downsides to the older Thrift protocol was its lack of support for asynchronous requests. Fortunately, this situation has been remedied with the native driver, making the process of building scalable applications on top of Cassandra significantly easier.

Tip

Blocking on I/O, such as with calls to Cassandra, can cause significant bottlenecks in high-throughput applications. Since a slow application can be the same as a dead application, you should use the asynchronous API to avoid blocking whenever possible.

If you are familiar with the java.util.concurrent package, and the Future interface specifically, the asynchronous API will look...
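
To illustrate the non-blocking style described above, here is a minimal sketch that uses only the JDK. It substitutes CompletableFuture for the driver's own future type (in the 3.x driver, executeAsync returns a Guava-based ResultSetFuture), and findEmailAsync is a hypothetical stand-in for a real query, so treat it as a model of the pattern rather than driver code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class AsyncLookup {
    // Hypothetical stand-in for an asynchronous query: the work runs on the
    // supplied pool and the caller gets a future back immediately.
    static CompletableFuture<String> findEmailAsync(String id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "contact@example.com", pool);
    }

    // Attaches callbacks instead of blocking: the result is transformed
    // when it arrives, and a failure is mapped to a fallback message.
    static CompletableFuture<String> greetingFor(String id, ExecutorService pool) {
        return findEmailAsync(id, pool)
            .thenApply(email -> "Hello, " + email)
            .exceptionally(err -> "lookup failed: " + err.getMessage());
    }
}
```

The calling thread never waits inside greetingFor(). It only blocks if it later chooses to call join() or get() on the returned future.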

Load balancing


Since Cassandra is a distributed database with the ability to add and remove nodes easily, it's important for the client to be able to send requests to new nodes that join the cluster, or to stop sending requests to removed or dead nodes.

Some databases use special middle-man processes to broker requests to available nodes, thus relieving the client of the requirement to maintain a list of hosts. Since Cassandra is a peer-to-peer system, with no special nodes or broker processes, the client must be aware of the topology of the cluster.

Tip

You should not use a load balancer between the client and Cassandra, as the client handles this via its load balancing policies. Adding a separate load balancer will actually prevent the client from understanding the cluster, which is what allows it to perform many of its duties.

Behind the scenes, the native driver connects to the cluster and learns about the topology of the ring. While legacy Thrift-based clients were able to make use of an...
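
To make client-side host selection concrete, the following plain-Java sketch models the idea behind a data-center-aware round-robin policy, similar in spirit to the DCAwareRoundRobinPolicy used later in this chapter. It is not the driver's implementation: the class name, the fixed host lists, and the remote addresses in the usage below are assumptions made for illustration, and a real policy also reacts to hosts joining and leaving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model only: rotates through local hosts and appends a
// limited number of remote hosts as a last resort.
final class DcAwareRoundRobin {
    private final List<String> localHosts;
    private final List<String> remoteHosts;
    private final int usedRemoteHosts;
    private final AtomicInteger counter = new AtomicInteger();

    DcAwareRoundRobin(List<String> localHosts, List<String> remoteHosts, int usedRemoteHosts) {
        this.localHosts = new ArrayList<>(localHosts);
        this.remoteHosts = new ArrayList<>(remoteHosts);
        this.usedRemoteHosts = Math.min(usedRemoteHosts, remoteHosts.size());
    }

    // Returns the hosts to try, in order, for a single request.
    List<String> queryPlan() {
        List<String> plan = new ArrayList<>();
        int start = counter.getAndIncrement();
        // Local hosts first, starting one position later on each call.
        for (int i = 0; i < localHosts.size(); i++) {
            plan.add(localHosts.get(Math.floorMod(start + i, localHosts.size())));
        }
        // Then at most usedRemoteHosts hosts from the remote data center.
        for (int i = 0; i < usedRemoteHosts; i++) {
            plan.add(remoteHosts.get(i));
        }
        return plan;
    }
}
```

Because the starting position advances on every call, consecutive requests begin at different local hosts, which spreads coordinator load without any external load balancer.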

Tying it all together


In attempting to develop a comprehensive approach to handling failure, we will start by assuming you prefer consistency when possible, but want your application to remain available even if the desired consistency level cannot be satisfied. You are also willing to experience slower client response rather than denying requests.

With these ideas in mind, we can tie the concepts you have learned throughout this chapter together in a policy that answers this demand. Take a look at the following example, which makes use of the previously discussed features:

// defined at class level 
private String localDC = "DC1"; 
private ConsistencyLevel defaultCL =  
 ConsistencyLevel.LOCAL_QUORUM; 
private Cluster cluster; 
 
LoadBalancingPolicy dcPolicy = 
 DCAwareRoundRobinPolicy.builder() 
  .withLocalDc(localDC) 
  .withUsedHostsPerRemoteDc(2) 
  .build(); 
 
// initialized once per application 
cluster = Cluster.builder()
   .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10...
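
The trade-off stated at the start of this section, prefer consistency but stay available, can be modeled as a small decision function. This is a plain-Java sketch under assumed names (ConsistencyFallback, Level, strongestAvailable) and is not the driver's retry logic, although the 3.x driver ships a DowngradingConsistencyRetryPolicy built around a similar idea.

```java
import java.util.Optional;

// Illustrative model only: picks the strongest consistency level that the
// currently alive replicas can still satisfy.
final class ConsistencyFallback {
    enum Level { QUORUM, TWO, ONE }

    // Number of replicas that must respond to satisfy a quorum.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Returns empty when no replica is alive, meaning the request must fail.
    static Optional<Level> strongestAvailable(int replicationFactor, int aliveReplicas) {
        if (aliveReplicas >= quorum(replicationFactor)) {
            return Optional.of(Level.QUORUM);
        }
        if (aliveReplicas >= 2) {
            return Optional.of(Level.TWO);
        }
        if (aliveReplicas >= 1) {
            return Optional.of(Level.ONE);
        }
        return Optional.empty();
    }
}
```

With a replication factor of 3, losing two replicas drops the result from QUORUM to ONE. The request still succeeds, but a read at the weaker level may return stale data, which is the price of choosing availability.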

Summary


In this chapter, you have learned the value of the native driver as a tool to assist you in developing a highly available application built on top of Cassandra. Hopefully it has been apparent that this objective involves a partnership between the application and the database, and that poor decisions on either end can dramatically affect availability.

However, the native driver has a wealth of functionality beyond what has been covered here, so it would be worth your while to spend some time understanding its features and subtleties, as with any new piece of software.

In Chapter 7, Modeling for Availability, we will look at another aspect of designing highly available applications in Cassandra. We'll explore how the right data models can make or break your system, and what to do to ensure success.
