
You're reading from Cassandra 3.x High Availability - Second Edition

Product type: Book
Published in: Aug 2016
Reading level: Intermediate
ISBN-13: 9781786462107
Edition: 2nd Edition
Languages: Java
Tools: Cassandra
Concepts: Database Administration
Author: Robbie Strickland

Chapter 6. High Availability Features in the Native Java Client

If you are relatively new to Cassandra, you may be unaware that the native client libraries from DataStax are a recent development. In fact, prior to their introduction there were numerous libraries (and forks of those projects) just for the Java language. Throw in the other languages, each with its own idiosyncrasies, and the situation was really quite dire.

Complicating the scenario was the lack of any universally accepted query mechanism, as Cassandra Query Language (CQL) was initially poorly received. The only real common ground for describing queries and data models was the underlying Thrift protocol. While this worked reasonably well for early adopters, it made it quite difficult for newer users to get started. It is a testament to Cassandra's extraordinary architecture, speed, and scalability that it was able to survive those early days.

After several revisions of CQL, the introduction of a native binary protocol, and DataStax...

Thrift versus the native protocol

Cassandra users fall into two general categories:

Those who have been using it a while and have grown accustomed to working directly with storage rows via a Thrift-based client.
Those who are relatively new to Cassandra and are confused by the role Thrift plays in the modern Cassandra world.

Hopefully we can clear up the confusion and set both groups on the right path. Thrift is a remote procedure call (RPC) mechanism combined with a code generator, and for several years, it formed the underlying protocol layer for clients communicating with Cassandra. This allowed the early developers of Cassandra itself to focus on the database rather than the clients. But, as we hinted at in the introduction, this strategy had numerous negative side effects:

There was no common language to describe data models and queries, as each client implemented different abstractions on top of the underlying Thrift protocol.
Thrift was limited to the lowest common denominator...

Setting up the environment

To get the most out of this chapter, you should prepare your development environment with the following prerequisites:

Java Development Kit (JDK) 1.8 for your platform, which can be obtained at http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html .
An integrated development environment (IDE) or text editor of your choice.
Either a local Cassandra installation, or the ability to connect to a remote cluster.
The DataStax native Java driver for your Cassandra version. If you're using Maven for dependency management, add the following dependency inside the <dependencies> element of your pom.xml file:

      <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>[version_number]</version>
      </dependency>

Now that you're set up for coding, let's get familiar with some of the basics of the driver. The first step is to establish a connection to...

Connecting to the cluster

To connect, start by creating a Cluster reference using the builder pattern: specify each option by chaining method calls on Cluster.builder(), then call the build() method to create the Cluster instance.

Let's build a cluster that's initialized with a list of possible initial contact points:

private Cluster cluster; // defined at class level 
// you should only build the cluster once per app 
cluster = Cluster.builder() 
   .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10.3") 
   .build();
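The builder pattern described above can be illustrated without the driver. The following is a minimal JDK-only sketch; `ClusterConfig` and its methods are hypothetical stand-ins that mimic the shape of `Cluster.builder()`, not the driver's actual implementation (9042 is Cassandra's standard native protocol port).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a driver configuration object; NOT the DataStax Cluster class.
final class ClusterConfig {
    private final List<String> contactPoints;
    private final int port;

    private ClusterConfig(Builder b) {
        // Defensive, unmodifiable copy so the built object cannot change afterwards.
        this.contactPoints = Collections.unmodifiableList(new ArrayList<>(b.contactPoints));
        this.port = b.port;
    }

    List<String> contactPoints() { return contactPoints; }
    int port() { return port; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private final List<String> contactPoints = new ArrayList<>();
        private int port = 9042; // assumed default: Cassandra's native protocol port

        // Each option method returns this, which is what makes call chaining possible.
        Builder addContactPoints(String... hosts) {
            Collections.addAll(contactPoints, hosts);
            return this;
        }

        Builder withPort(int port) {
            this.port = port;
            return this;
        }

        // build() validates the accumulated options and produces the final object.
        ClusterConfig build() {
            if (contactPoints.isEmpty()) {
                throw new IllegalStateException("at least one contact point is required");
            }
            return new ClusterConfig(this);
        }
    }
}
```

Because build() validates and snapshots the options, the resulting object is immutable and safe to share, which suits a long-lived, application-wide object like Cluster.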

Tip

You should only have one instance of Cluster in your application for each physical cluster, as this class controls the list of contact points and key connection settings, such as compression, along with the policies for failover, request routing, and retries.
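One way to honor the one-Cluster-per-physical-cluster rule in a larger application is to cache the instance by cluster name. This is a sketch assuming your code identifies each physical cluster by a name; `ClusterRegistry` is hypothetical, and the factory passed to it would typically wrap your `Cluster.builder()...build()` call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical helper: caches exactly one heavyweight client object per physical cluster name.
final class ClusterRegistry<T> {
    private final ConcurrentMap<String, T> byCluster = new ConcurrentHashMap<>();

    // ConcurrentHashMap.computeIfAbsent applies the factory at most once per key,
    // even when several threads ask for the same cluster at the same time.
    T get(String clusterName, Supplier<T> factory) {
        return byCluster.computeIfAbsent(clusterName, name -> factory.get());
    }

    int size() { return byCluster.size(); }
}
```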

While this basic example will suffice for playing around with the driver locally, the Cluster builder supports a number of additional options that are...

Executing statements

While the Cluster acts as a central place to manage connection-level configuration options, you will need to establish a Session to perform actual work against the cluster. This is done by calling the connect() method on your Cluster instance.

To run the following examples, you will need to create the contacts keyspace and contact table, as follows:

CREATE KEYSPACE contacts  
WITH REPLICATION = { 
   'class' : 'SimpleStrategy',  
   'replication_factor' : 1 
}; 
 
USE contacts; 
 
CREATE TABLE contact ( 
   id UUID, 
   email TEXT PRIMARY KEY 
);
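Given the contact table above, a literal INSERT must write the id as an unquoted UUID and the email as a single-quoted text literal, doubling any embedded single quote per CQL's string escaping rules. The hypothetical `ContactCql` helper below only illustrates that formatting; for application code, the driver's prepared and bound statements avoid hand-built CQL altogether. The sample id used in this chapter's examples, bd297650-2885-11e4-8c21-0800200c9a66, is a version 1 (time-based) UUID, which `java.util.UUID.version()` confirms.

```java
import java.util.UUID;

// Illustrative, hypothetical helper (not part of the driver) for rendering
// a literal INSERT against the contact table.
final class ContactCql {
    // CQL text literals are single-quoted; an embedded single quote is escaped by doubling it.
    static String textLiteral(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // UUID literals are written unquoted in CQL.
    static String insertContact(UUID id, String email) {
        return "INSERT INTO contact (id, email) VALUES (" + id + ", " + textLiteral(email) + ")";
    }
}
```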

After the schema is created, you can connect to the contacts keyspace:

private Session session; // defined at class level 
session = cluster.connect("contacts");

Once you have created the Session, you will be able to execute CQL statements, as follows:

String insert = "INSERT INTO contact (id, email) " + 
               "VALUES (" + 
               "bd297650-2885-11e4-8c21-0800200c9a66," + 
               "'contact@example.com...

Handling asynchronous requests

Since Cassandra is designed for significant scale, it follows that most applications using it would be designed with similar scalability in mind. One principal characteristic of high-performance applications is that they do not block unnecessarily, and instead make full use of available resources.

As previously discussed, one of the downsides to the older Thrift protocol was its lack of support for asynchronous requests. Fortunately, this situation has been remedied with the native driver, making the process of building scalable applications on top of Cassandra significantly easier.

Tip

Blocking on I/O, such as with calls to Cassandra, can cause significant bottlenecks in high-throughput applications. Since a slow application can be the same as a dead application, you should use the asynchronous API to avoid blocking whenever possible.
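To make the blocking point concrete, here is a JDK-only sketch of non-blocking fan-out using `CompletableFuture`. `fetchContact` is a hypothetical placeholder for an asynchronous query; the driver's own `Session.executeAsync()` returns a `ResultSetFuture` (a Guava `ListenableFuture`) rather than a `CompletableFuture`, so treat this as an illustration of the pattern, not the driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

// JDK-only illustration of non-blocking fan-out; fetchContact is hypothetical.
final class AsyncFanOut {
    // Simulates an I/O-bound query on a worker thread; the caller is not blocked.
    static CompletableFuture<String> fetchContact(String email, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "row for " + email, pool);
    }

    // Starts every request first, then combines them into one future without blocking.
    static CompletableFuture<List<String>> fetchAll(List<String> emails, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String email : emails) {
            futures.add(fetchContact(email, pool));
        }
        return CompletableFuture
            .allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join) // every future has completed by this point
                .collect(Collectors.toList()));
    }
}
```

All requests are in flight before any result is awaited; if a caller chooses to block at all, only its final `join()` waits, and results come back in request order.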

If you are familiar with the java.util.concurrent package, and the Future class specifically, the asynchronous API will look...

Load balancing

Since Cassandra is a distributed database with the ability to add and remove nodes easily, it's important for the client to be able to send requests to new nodes that join the cluster, or to stop sending requests to removed or dead nodes.

Some databases use special middle-man processes to broker requests to available nodes, thus relieving the client of the requirement to maintain a list of hosts. Since Cassandra is a peer-to-peer system, with no special nodes or broker processes, the client must be aware of the topology of the cluster.
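Because the client, not a broker, must track which nodes exist and which are reachable, a load balancing policy ultimately produces a per-request list of hosts to try. The sketch below is a deliberately simplified, hypothetical round-robin planner (the class and method names are illustrative, not the driver's); it rotates the starting node for each request, picks up newly added nodes, and skips nodes marked down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified client-side round robin over the client's own view of the ring.
final class RoundRobinPlanner {
    private final List<String> hosts;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinPlanner(List<String> initialHosts) {
        this.hosts = new CopyOnWriteArrayList<>(initialHosts);
    }

    void addHost(String host) { hosts.add(host); }       // a node joined the cluster
    void removeHost(String host) { hosts.remove(host); } // a node was decommissioned
    void markDown(String host) { down.add(host); }       // a node stopped responding
    void markUp(String host) { down.remove(host); }      // a node came back

    // Returns the order in which hosts would be tried for the next request.
    List<String> nextPlan() {
        List<String> snapshot = new ArrayList<>(hosts); // stable view for this request
        List<String> plan = new ArrayList<>();
        if (snapshot.isEmpty()) {
            return plan;
        }
        int start = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        for (int i = 0; i < snapshot.size(); i++) {
            String host = snapshot.get((start + i) % snapshot.size());
            if (!down.contains(host)) {
                plan.add(host);
            }
        }
        return plan;
    }
}
```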

Tip

You should not put a load balancer between the client and Cassandra, as the driver handles this itself through its load balancing policies. A separate load balancer hides the individual nodes from the driver, preventing it from learning the cluster topology that it relies on to perform many of its duties.

Behind the scenes, the native driver connects to the cluster and learns about the topology of the ring. While legacy Thrift-based clients were able to make use of an...

Tying it all together

In attempting to develop a comprehensive approach to handling failure, we will start by assuming you prefer consistency when possible, but want your application to remain available even if the desired consistency level cannot be satisfied. You are also willing to accept slower client responses rather than deny requests.
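The quorum arithmetic behind this trade-off is simple: a quorum is a majority of replicas, floor(RF / 2) + 1, so with a replication factor of 3 in the local data center, LOCAL_QUORUM needs 2 live replicas and tolerates one replica being down. The hypothetical sketch below expresses "prefer consistency, fall back when necessary" by choosing the strongest level the live replicas can still satisfy. It is loosely modeled on the idea of a downgrading retry policy; the enum and method names are illustrative only and are not the driver's implementation.

```java
// Hypothetical levels for this sketch only; not the driver's ConsistencyLevel enum.
enum FallbackLevel { LOCAL_QUORUM, THREE, TWO, ONE }

final class Downgrade {
    // A quorum is a majority of replicas: floor(rf / 2) + 1 (integer division floors).
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Prefer LOCAL_QUORUM; otherwise pick the strongest level the live replicas can satisfy.
    static FallbackLevel choose(int localReplicationFactor, int aliveReplicas) {
        if (aliveReplicas >= quorum(localReplicationFactor)) return FallbackLevel.LOCAL_QUORUM;
        if (aliveReplicas >= 3) return FallbackLevel.THREE;
        if (aliveReplicas >= 2) return FallbackLevel.TWO;
        if (aliveReplicas >= 1) return FallbackLevel.ONE;
        throw new IllegalStateException("no live replicas: the request cannot be served");
    }
}
```

A downgraded request gives up the guarantee of the original level for that request, which is exactly the trade-off described above.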

With these ideas in mind, we can tie the concepts you have learned throughout this chapter together in a policy that answers this demand. Take a look at the following example, which makes use of the previously discussed features:

// defined at class level 
private String localDC = "DC1"; 
private ConsistencyLevel defaultCL =  
 ConsistencyLevel.LOCAL_QUORUM; 
private Cluster cluster; 
 
LoadBalancingPolicy dcPolicy = 
 DCAwareRoundRobinPolicy.builder() 
  .withLocalDc(localDC) 
  .withUsedHostsPerRemoteDc(2) 
  .build(); 
 
// initialized once per application 
cluster = Cluster.builder()
   .addContactPoints("10.10.10.1", "10.10.10.2", "10.10.10...
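The `withUsedHostsPerRemoteDc(2)` setting in the example above limits how many hosts in each remote data center the policy may fall back to. The following simplified, hypothetical planner sketches that shape: local-DC hosts first in round-robin order, then at most N hosts from each remote DC as a last resort. It is an illustration under those assumptions, not the driver's `DCAwareRoundRobinPolicy` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified DC-aware query plan; not thread-safe, kept small for illustration.
final class DcAwarePlanner {
    private final String localDc;
    private final int usedHostsPerRemoteDc;
    private final Map<String, List<String>> hostsByDc;
    private int counter;

    DcAwarePlanner(String localDc, int usedHostsPerRemoteDc, Map<String, List<String>> hostsByDc) {
        this.localDc = localDc;
        this.usedHostsPerRemoteDc = usedHostsPerRemoteDc;
        this.hostsByDc = new LinkedHashMap<>(hostsByDc); // keeps the caller's DC order
    }

    List<String> nextPlan() {
        List<String> plan = new ArrayList<>();
        // 1. Local data center hosts, rotated so load spreads evenly across requests.
        List<String> local = hostsByDc.getOrDefault(localDc, new ArrayList<>());
        int start = local.isEmpty() ? 0 : Math.floorMod(counter++, local.size());
        for (int i = 0; i < local.size(); i++) {
            plan.add(local.get((start + i) % local.size()));
        }
        // 2. At most usedHostsPerRemoteDc hosts from every remote data center.
        for (Map.Entry<String, List<String>> entry : hostsByDc.entrySet()) {
            if (entry.getKey().equals(localDc)) {
                continue;
            }
            List<String> remote = entry.getValue();
            plan.addAll(remote.subList(0, Math.min(usedHostsPerRemoteDc, remote.size())));
        }
        return plan;
    }
}
```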

Summary

In this chapter, you have learned the value of the native driver as a tool to assist you in developing a highly available application built on top of Cassandra. Hopefully it has been apparent that this objective involves a partnership between the application and the database, and that poor decisions on either end can dramatically affect availability.

However, the native driver has a wealth of functionality beyond what has been covered here, so it would be worth your while to spend some time understanding its features and subtleties, as with any new piece of software.

In Chapter 7, Modeling for Availability, we will look at another aspect of designing highly available applications in Cassandra. We'll explore how the right data model can make or break your system, and what you can do to ensure success.


About the author

Robbie Strickland has been involved in the Apache Cassandra project since 2010, and he initially went to production with the 0.5 release. He has made numerous contributions over the years, including work on drivers for C# and Scala and multiple contributions to the core Cassandra codebase. In 2013 he became the very first certified Cassandra developer, and in 2014 DataStax selected him as an Apache Cassandra MVP. Robbie has been an active speaker and writer in the Cassandra community and is the founder of the Atlanta Cassandra Users Group. Other examples of his writing can be found on the DataStax blog, and he has presented numerous webinars and conference talks over the years.