You're reading from Cassandra High Availability

Product type: Book
Published in: Dec 2014
Publisher: Packt
ISBN-13: 9781783989126
Edition: 1st
Author: Robbie Strickland

Robbie Strickland has been involved in the Apache Cassandra project since 2010, and he initially went to production with the 0.5 release. He has made numerous contributions over the years, including work on drivers for C# and Scala and multiple contributions to the core Cassandra codebase. In 2013 he became the very first certified Cassandra developer, and in 2014 DataStax selected him as an Apache Cassandra MVP. Robbie has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has presented numerous webinars and conference talks over the years.

Chapter 6. High Availability Features in the Native Java Client

If you are relatively new to Cassandra, you may be unaware that the native client libraries from DataStax are a recent development. In fact, prior to their introduction, there were numerous libraries (and forks of those projects) for the Java language alone. Throw in the other languages, each with its own idiosyncrasies, and the situation was quite dire.

Complicating the scenario was the lack of any universally accepted query mechanism as CQL was initially poorly received. The only real common ground to describe queries and data models was the underlying Thrift protocol. While this worked reasonably well for early adopters, it made assimilation of newer users quite difficult. It is a testament to Cassandra's extraordinary architecture, speed, and scalability that it was able to survive those early days.

After several revisions of CQL, the introduction of a native binary protocol, and DataStax's work on...

Thrift versus the native protocol


Cassandra users fall into two general categories: those who have been using it for a while and have grown accustomed to working directly with storage rows via a Thrift-based client, and those who are relatively new to Cassandra and are confused by the role Thrift plays in the modern Cassandra world. Hopefully, we can clear up the confusion and set both groups on the right path.

Thrift is an RPC mechanism combined with a code generator, and for several years it formed the underlying protocol layer for clients communicating with Cassandra. This allowed the early developers of Cassandra itself to focus on the database rather than the clients. But, as we hinted at in the introduction, this strategy had numerous negative side effects:

  • There was no common language to describe data models and queries as each client implemented different abstractions on top of the underlying Thrift protocol.

  • Thrift was limited to the...

Setting up the environment


To get the most out of this chapter, you should prepare your development environment with the following prerequisites:

  • Java Development Kit (JDK) 1.7 for your platform, which can be obtained at http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html.

  • Integrated Development Environment (IDE) or any text editor of your choice.

  • Either a local Cassandra installation or the ability to connect to a remote cluster.

  • The DataStax native Java driver for your Cassandra version. If you're using Maven for dependency management, add the following lines of code to your pom.xml file:

    <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>[version_number]</version>
    </dependency>

If you're using the 1.x driver, you may notice that it has a significant number of dependencies (compared to only four with the 2.x version). For this reason, you should...

Connecting to the cluster


To get connected, start by creating a Cluster reference, which you will construct using a builder pattern. You will specify each additional option by chaining method calls together to produce the desired configuration, then finally, calling the build() method to initialize the Cluster instance.

Let's build a cluster that's initialized with a list of possible initial contact points:

private Cluster cluster; // defined at class level
// you should only build the cluster once per app
cluster = Cluster.builder()
  .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3")
  .build();

Note

You should only have one instance of Cluster in your application for each physical cluster as this class controls the list of contact points and key connection policies such as compression, failover, request routing, and retries.

While this basic example will suffice to play around with the driver locally, the Cluster builder supports a number of additional options that are relevant for...
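The chaining behavior described above can be illustrated independently of the driver with a minimal sketch of the builder pattern: each `with*` or `add*` call returns the builder itself, and `build()` produces the finished object. Note that `ClusterConfig` and its fields here are hypothetical stand-ins for illustration, not the real driver API (though the real `Cluster.Builder` works on the same principle).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterConfig {
    private final List<String> contactPoints;
    private final int port;

    private ClusterConfig(Builder b) {
        this.contactPoints = b.contactPoints;
        this.port = b.port;
    }

    public List<String> getContactPoints() { return contactPoints; }
    public int getPort() { return port; }

    public static Builder builder() { return new Builder(); }

    public static class Builder {
        private final List<String> contactPoints = new ArrayList<>();
        private int port = 9042; // default native protocol port

        public Builder addContactPoints(String... hosts) {
            contactPoints.addAll(Arrays.asList(hosts));
            return this; // returning 'this' is what enables chaining
        }

        public Builder withPort(int port) {
            this.port = port;
            return this;
        }

        public ClusterConfig build() { return new ClusterConfig(this); }
    }

    public static void main(String[] args) {
        ClusterConfig config = ClusterConfig.builder()
            .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3")
            .withPort(9042)
            .build();
        System.out.println(config.getContactPoints().size());
    }
}
```

The same shape applies to the real builder: options such as port, credentials, and compression are each a chained call, and nothing connects to the cluster until `build()` (and later `connect()`) is invoked.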

Executing statements


While the Cluster acts as a central place to manage connection-level configuration options, you will need to establish a Session instance to perform actual work against the cluster. This is done by calling the connect() method on your Cluster instance. Here, we connect to the contacts keyspace:

private Session session; // defined at class level
session = cluster.connect("contacts");

Once you have created the Session, you will be able to execute CQL statements as follows:

String insert = "INSERT INTO contact (id, email) " +
  "VALUES (" +
  "bd297650-2885-11e4-8c21-0800200c9a66," +
  "'contact@example.com' " +
");";
session.execute(insert);

You can submit any valid CQL statement to the execute() method, including schema modifications.
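Hand-concatenating literals into CQL strings, as in the example above, is easy to get wrong. As a small illustration, the same statement can be assembled from typed values using only the JDK; `insertContact` is a hypothetical helper for this sketch, and in real applications the driver's prepared statements are the better tool for parameterized queries.

```java
import java.util.UUID;

public class CqlBuilder {
    // Build the INSERT statement from typed values rather than by
    // splicing string literals together by hand. Single quotes are
    // doubled, which is how CQL escapes them inside string literals.
    public static String insertContact(UUID id, String email) {
        return String.format(
            "INSERT INTO contact (id, email) VALUES (%s, '%s');",
            id, email.replace("'", "''"));
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("bd297650-2885-11e4-8c21-0800200c9a66");
        System.out.println(insertContact(id, "contact@example.com"));
    }
}
```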

Note

Unless you have a large number of keyspaces, you should create one Session instance for each keyspace in your application, because it provides connection pooling and controls the node selection policy (it uses a round-robin approach by default...

Handling asynchronous requests


Since Cassandra is designed for significant scale, it follows that most applications using it will be designed with similar scalability in mind. One principal characteristic of high-performance applications is that they do not block threads unnecessarily, and instead attempt to maximize available resources.

As previously discussed, one of the downsides to the older Thrift protocol was its lack of support for asynchronous requests. Fortunately, this situation has been remedied with the native driver, making the process of building scalable applications on top of Cassandra significantly easier.

Tip

Blocking on I/O, such as with calls to Cassandra, can cause significant bottlenecks in high-throughput applications. Since a slow application can be the same as a dead application, you should use the asynchronous API to avoid blocking whenever possible.

If you are familiar with the java.util.concurrent package, and the Future class specifically, the asynchronous API will...
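The pattern behind the asynchronous API can be sketched with plain java.util.concurrent types: submit the work, keep the Future, do other useful work, and block only when the result is actually needed. Here, `queryDatabase` is a hypothetical stand-in for a driver call such as `executeAsync`, not the real API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSketch {
    // Hypothetical stand-in for an I/O-bound query; the sleep
    // simulates a network round trip to the cluster.
    static String queryDatabase(String cql) throws InterruptedException {
        Thread.sleep(50);
        return "result-for:" + cql;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Submit the query without blocking the calling thread...
        Future<String> future =
            pool.submit(() -> queryDatabase("SELECT * FROM contact"));

        // ...the thread is free to do other work here...

        // ...and we block only at the point the result is required.
        String result = future.get();
        System.out.println(result);
        pool.shutdown();
    }
}
```

The driver's asynchronous methods return a richer Future variant that also supports callbacks, but the control flow, submit now, collect later, is the same.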

Load balancing


Since Cassandra is a distributed database with the ability to add and remove nodes easily, it's important for the client to be able to send requests to new nodes that join the cluster, or to stop sending requests to removed or dead nodes.

Some databases use special middleman processes to broker requests to available nodes, thus relieving the client of the requirement to maintain a list of hosts. Since Cassandra is a peer-to-peer system, with no special nodes or broker processes, the client must be aware of the topology of the cluster.

You should not use a load balancer between the client and Cassandra, as the client handles this via its load balancing policies. Adding a separate load balancer will actually prevent the client from understanding the cluster, which is what allows it to perform many of its duties.

Behind the scenes, the native driver connects to the cluster and learns about the topology of the ring. While other legacy Thrift-based clients were able to make use of...
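The round-robin idea behind the driver's default node selection can be sketched in a few lines of plain Java. This is a conceptual illustration only, not the driver's actual `RoundRobinPolicy` implementation, which additionally tracks node liveness and topology changes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final List<String> hosts;
    private final AtomicInteger index = new AtomicInteger(0);

    public RoundRobinSketch(List<String> hosts) {
        this.hosts = hosts;
    }

    // Each call returns the next host, wrapping around the list,
    // so requests spread evenly across all known nodes.
    public String nextHost() {
        int i = Math.floorMod(index.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(
            Arrays.asList("10.10.10.1", "10.10.10.2", "10.10.10.3"));
        for (int n = 0; n < 4; n++) {
            System.out.println(lb.nextHost()); // cycles 1, 2, 3, 1
        }
    }
}
```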

Failing over to a remote data center


The foundation of any robust load balancing strategy is DCAwareRoundRobinPolicy, because we'll assume you will be deploying to more than one data center. However, the implementation hides an interesting failover feature in its constructor overloads, which is worth a look.

In Chapter 4, Data Centers, we discussed several use cases for multiple data centers, with failover being one key scenario. If you want to fail over to a backup data center should replicas in your client's primary data center fail, you might be interested in the two additional parameters you can pass to the DCAwareRoundRobinPolicy constructor:

  • usedHostsPerRemoteDc: This vaguely named parameter allows you to specify a number of hosts in a remote data center that can be used by this client, should your local data center fail to satisfy the request. Note that by default, this will be ignored for LOCAL_ONE and LOCAL_QUORUM consistency levels.

  • allowRemoteDCsForLocalConsistencyLevel...
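The effect of usedHostsPerRemoteDc can be sketched in plain Java: prefer hosts in the local data center, and only when none are available, fall back to at most that many hosts from each remote data center. This is a conceptual illustration with hypothetical class and method names, not the driver's DCAwareRoundRobinPolicy source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DcFailoverSketch {
    // Map of data center name -> live hosts in that data center.
    private final Map<String, List<String>> hostsByDc;
    private final String localDc;
    private final int usedHostsPerRemoteDc;

    public DcFailoverSketch(Map<String, List<String>> hostsByDc,
                            String localDc, int usedHostsPerRemoteDc) {
        this.hostsByDc = hostsByDc;
        this.localDc = localDc;
        this.usedHostsPerRemoteDc = usedHostsPerRemoteDc;
    }

    // Build the query plan: all local-DC hosts first; if the local DC
    // has no live hosts, take up to usedHostsPerRemoteDc hosts from
    // each remote DC as a fallback.
    public List<String> queryPlan() {
        List<String> local =
            hostsByDc.getOrDefault(localDc, Collections.emptyList());
        if (!local.isEmpty()) {
            return new ArrayList<>(local);
        }
        List<String> plan = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : hostsByDc.entrySet()) {
            if (e.getKey().equals(localDc)) continue;
            List<String> remote = e.getValue();
            plan.addAll(remote.subList(0,
                Math.min(usedHostsPerRemoteDc, remote.size())));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hosts = new LinkedHashMap<>();
        hosts.put("DC1", new ArrayList<>()); // local DC has no live hosts
        hosts.put("DC2", Arrays.asList("10.20.0.1", "10.20.0.2", "10.20.0.3"));

        DcFailoverSketch policy = new DcFailoverSketch(hosts, "DC1", 2);
        System.out.println(policy.queryPlan()); // only two DC2 hosts
    }
}
```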

Tying it all together


In attempting to develop a comprehensive approach to handling failure, we will start by assuming that you prefer consistency when possible but want your application to remain available even if the desired consistency level cannot be satisfied; you are also willing to experience slower client response rather than denying requests.

With these ideas in mind, we can tie the concepts you have learned throughout this chapter together in a policy that answers this demand. Take a look at the following example, which makes use of the previously discussed features:

// defined at class level
String localDC = "DC1";
ConsistencyLevel defaultCL = ConsistencyLevel.LOCAL_QUORUM;
private Cluster cluster;

// initialized once per application
cluster = Cluster.builder()
  .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3")
  .withRetryPolicy(new LoggingRetryPolicy(
    DowngradingConsistencyRetryPolicy.INSTANCE))
  .withLoadBalancingPolicy(new TokenAwarePolicy(
    new DCAwareRoundRobinPolicy...

Summary


In this chapter, you learned the value of the native driver as a tool to assist you in developing a highly available application built on top of Cassandra. Hopefully, it has been apparent that this objective involves a partnership between the application and the database, and that poor decisions on either end can dramatically affect availability.

However, the native driver has a wealth of functionality beyond what is covered here, so it would be worth your while to spend some time understanding its features and subtleties, as with any new piece of software.

In the next chapter, we will look at another aspect of designing highly available applications on Cassandra. We'll explore how the right data models can make or break your system, and what to do to ensure success.

