Chapter 5.  Scaling Out

In the old days, a significant increase in system traffic would cause excitement for the sales organization and strike fear in the hearts of the operations team. Fortunately, Cassandra makes the process of scaling out a relatively pain-free affair, so both your sales and operations teams can enjoy the fruits of your success.

This chapter will give you a complete rundown of the processes, tools, and design considerations when adding nodes or data centers to your topology. We will cover the following topics:

  • Choosing the right hardware configuration

  • Scaling out versus scaling up

  • Adding nodes

  • The bootstrapping process

  • Adding a data center

  • How to size your cluster correctly

It should go without saying that making proper choices regarding the underlying infrastructure is key to achieving good performance and high availability. Conversely, poor choices can lead to a host of issues, and recovery can sometimes be difficult.

Let's begin the chapter with some guidance on choosing the right hardware configuration.

Choosing the right hardware configuration


There are a number of points to consider when deciding on a node configuration, including disk sizes, memory requirements, and number of processor cores. The right choices here depend quite a bit on your use case and whether you are on physical or virtual infrastructure, but we will discuss some general guidelines here.

Since Cassandra is designed to be deployed in large-scale clusters on commodity hardware, an important consideration is whether to use fewer large nodes or a greater number of smaller nodes.

Regardless of whether you're using physical or virtual machines, there are a few key principles to keep in mind:

  • More RAM equals faster reads: the more memory you have, the better reads will perform, because Cassandra can take advantage of its caches as well as larger memtables. More space for memtables means fewer reads that have to go to the on-disk SSTables. More memory also results in better filesystem caching, which reduces disk operations...
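A quick way to see whether that extra memory is actually paying off is to check the cache hit rates and memtable sizes that Cassandra already reports. The following is a minimal sketch using standard nodetool commands; the keyspace and table names are placeholders, and the exact output fields vary slightly between versions:

    # Key cache and row cache sizes and hit rates for this node
    nodetool info

    # Per-table memtable data size, SSTable count, and read latencies
    # (my_keyspace.my_table is a placeholder for one of your own tables)
    nodetool tablestats my_keyspace.my_table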

Scaling out versus scaling up


So you know it's time to add more muscle to your cluster, but how do you know whether to scale up or out?

If you're not familiar with the difference, scaling up refers to converting existing infrastructure to better or more robust hardware (or instance types in cloud environments). This could mean adding storage capacity, increasing memory, moving to newer machines with more cores, and so on.

Scaling out simply means adding more machines that roughly match the specifications of the existing machines. Since Cassandra scales linearly with its peer-to-peer architecture, scaling out is often more desirable.

Tip

In general, it is better to replace physical hardware components incrementally rather than all at once. In large systems, failures tend to occur once hardware reaches a certain age, and components installed at the same time are statistically likely to reach that point together for some subset of your nodes. For example, purchasing a large lot of drives from a single source at one...

Growing your cluster


The process of adding a node to an existing Cassandra cluster ranges from trivial when vnodes are used to somewhat tedious if you are manually assigning tokens. Let's start with the manual case, as the vnodes process is a subset of this.

Adding nodes without vnodes

As previously mentioned, the procedure for adding a node to a cluster without vnodes enabled is straightforward, if a bit tedious. In general, you should add one node at a time, unless you're able to double the size of the cluster. Doubling removes the need to reassign tokens, as Cassandra's default behavior of bisecting another node's range will be sufficient. The first step is to determine the new total cluster size, and then compute tokens for all nodes.

To compute tokens, follow the DataStax documentation at http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configGenTokens_c.html. There are also several useful online tools to help you, such as this one at http://www.geroba.com/cassandra...
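For reference, the documented calculation simply spaces tokens evenly around the ring. The following is a minimal sketch for the default Murmur3Partitioner, assuming a hypothetical six-node cluster; each resulting value would then be assigned to initial_token in the corresponding node's cassandra.yaml:

    # Evenly spaced tokens for a 6-node cluster using the Murmur3Partitioner
    python3 -c 'n = 6; print([str(i * (2**64 // n) - 2**63) for i in range(n)])'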

How to scale up


Properly scaling up your Cassandra cluster is not a difficult process, but it does require you to carefully follow established procedures to avoid undesirable side effects. There are two general approaches to consider:

  • Upgrade in place: Upgrading in place involves taking each node out of the ring, one at a time, bringing its new replacement online, and allowing the new node to bootstrap (a command sketch follows this list). This choice makes the most sense if a subset of your cluster needs upgrading rather than an entire data center. It assumes, of course, that your replication factor is greater than one. To upgrade an entire data center, it may be preferable to allow replication to build the new nodes automatically.

  • Using data center replication: Since Cassandra already supports bringing up another data center via replication, you can use this mechanism to populate your new hardware with existing data and then switch to the new data center when replication is complete.
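The following is a minimal sketch of the node-at-a-time replacement described in the first option. The commands are standard nodetool operations, but status names and defaults can vary slightly between versions, so treat this as an outline rather than an exact recipe:

    # On the node being retired: stream its data to the remaining replicas and leave the ring
    nodetool decommission
    nodetool netstats        # monitor streaming progress until the node has fully left

    # On the replacement node: set cluster_name, seeds, and the listen/rpc addresses in
    # cassandra.yaml, then start Cassandra and let it bootstrap
    nodetool status          # the new node should move from UJ (joining) to UN (normal)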

Upgrading in place

If you have determined...

Removing nodes


While the material in this chapter is primarily focused on adding capacity to your cluster, there may be times when reducing capacity is what you're hoping to accomplish. There are a number of valid reasons for doing this. Perhaps you're experiencing smaller transaction volumes than originally anticipated for a new application, or maybe you've changed your data retention plan. In some cases you may want to move to a smaller cluster with more capable nodes, especially in cloud environments where this transition is made easier.

Regardless of your reasons for doing so, knowing how to remove nodes from your cluster will certainly come in handy at some point in your Cassandra experience. Let's take a look at this process now.

Removing nodes within a data center

Fortunately, the process for removing a node is quite simple (a command sketch follows these steps):

  1. Run nodetool repair on all your keyspaces. This will ensure that any updates which may be present only on the node you're removing will be preserved in the remaining...
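Although the remaining steps are truncated here, the typical sequence is short. The following is a hedged sketch: nodetool repair and nodetool decommission are the standard commands for a live node, and nodetool removenode is the usual fallback for a node that is already down. The host ID placeholder comes from the output of nodetool status:

    # Step 1: make sure data that lives only on the departing node is replicated elsewhere
    nodetool repair

    # For a live node: stream its ranges to the remaining replicas and leave the ring
    nodetool decommission

    # For a node that is already down: run this from any live node, using the host ID
    # shown by nodetool status
    nodetool removenode <host-id>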

Other data migration scenarios


At times you may need to migrate large amounts of data from one cluster to another. A common reason for this is the need to transition data between networks that cannot see each other, or to move from classic Amazon EC2 to a newer Virtual Private Cloud (VPC) infrastructure.

If you find yourself in this situation, you can use these steps to ensure a smooth transition to the new infrastructure (a command sketch for the snapshot and load steps follows the list):

  1. Set up and configure your new cluster using the information you learned from this chapter, and duplicate the schema from your existing cluster.

  2. Change your application to write to both clusters. This is certainly the most significant change, as it likely requires code changes in your application.

  3. Verify that both clusters are receiving writes, to avoid potential data loss.

  4. Create a snapshot of your old cluster using the nodetool snapshot command.

  5. Load the snapshot data into your new cluster using the sstableloader command. This command actually streams the data into...
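The following is a minimal sketch of steps 4 and 5 under typical defaults. The snapshot tag, host addresses, and paths are placeholders, and the data directory layout shown here can differ depending on your installation:

    # On each node in the old cluster: flush memtables and snapshot all keyspaces
    nodetool flush
    nodetool snapshot -t migration

    # Snapshots are written under each table's data directory, typically something like:
    #   /var/lib/cassandra/data/<keyspace>/<table>-<id>/snapshots/migration/

    # Arrange each snapshot as <keyspace>/<table> and stream it into the new cluster
    sstableloader -d 10.0.0.1,10.0.0.2 /tmp/migration/<keyspace>/<table>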

Snitch changes


As you should recall from Chapter 4, Data Centers, the snitch tells Cassandra what your network topology looks like and therefore affects data placement in the cluster. If you haven't inserted any data, you can change the snitch without consequence. Otherwise, multiple steps are required, as is a full cluster restart, which will result in downtime.

The following procedure should be used to change snitches (an example configuration follows these steps):

  1. Update your topology properties files (cassandra-topology.properties or cassandra-rackdc.properties, depending on which snitch you specify). In the case of the PropertyFileSnitch, make sure all nodes have the same file. For the GossipingPropertyFileSnitch or EC2MultiRegionSnitch, each node should have a file indicating its own place in the topology.

  2. Update the snitch in cassandra.yaml. You will need to do this for every node in the cluster.

  3. Restart all nodes, one at a time. Any time you make a change to cassandra.yaml, you must restart the node.

  4. Change the replication strategy...
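As an illustration of steps 1 through 3, here is roughly what the per-node file and the cassandra.yaml setting might look like when moving to the GossipingPropertyFileSnitch. The data center and rack names are purely illustrative, and the restart command depends on how Cassandra is installed on your systems:

    # cassandra-rackdc.properties on one node (each node describes only its own location)
    dc=dc1
    rack=rack1

    # cassandra.yaml: the same setting on every node
    endpoint_snitch: GossipingPropertyFileSnitch

    # Then restart each node in turn, for example:
    sudo systemctl restart cassandra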

Summary


This chapter has covered quite a few procedures for handling a variety of cluster changes, from adding a single node, to expanding with a new data center, to migrating your entire cluster.

While it would be unreasonable to expect anyone to commit all these processes to memory, let this chapter serve as a reference for the times when these sometimes rare events occur. And perhaps most importantly, take note of these scenarios so you can know when it's time to read the manual rather than just trying to figure it out on your own. Distributed databases can be wonderful when handled correctly, but quite unforgiving when misused.

We've spent the last five chapters looking at a variety of mostly administrative and design related concepts, but now it's time to dig in and look at some application code. In the next chapter, we will take a look at the native client library (specifically the Java variant, although there are also drivers for C# and Python that follow similar principles).

The native...
