
Mastering Apache Cassandra 3.x - Third Edition

Product type: Book
Published in: Oct 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789131499
Edition: 3rd Edition
Authors (3):
Aaron Ploetz

Aaron Ploetz is the NoSQL Engineering Lead for Target, where his DevOps team supports Cassandra, MongoDB, and Neo4j. He has been named a DataStax MVP for Apache Cassandra three times and has presented at multiple events, including the DataStax Summit and Data Day Texas. Aaron earned a BS in Management/Computer Systems from the University of Wisconsin-Whitewater, and an MS in Software Engineering from Regis University. He and his wife, Coriene, live with their three children in the Twin Cities area.

Tejaswi Malepati

Tejaswi Malepati is the Cassandra Tech Lead for Target. He has been instrumental in designing and building custom Cassandra integrations, including a web-based SQL interface and data validation frameworks between Oracle and Cassandra. Tejaswi earned a master's degree in computer science from the University of New Mexico, and a bachelor's degree in electronics and communication from Jawaharlal Nehru Technological University in India. He is passionate about identifying and analyzing data patterns in datasets using R, Python, Spark, Cassandra, and MySQL.

Nishant Neeraj

Nishant Neeraj is an independent software developer with experience in developing and planning out architectures for massively scalable data storage and data processing systems. Over the years, he has helped to design and implement a wide variety of products and systems for companies, ranging from small start-ups to large multinational companies. Currently, he helps drive WealthEngine's core product to the next level by leveraging a variety of big data technologies.


Configuring a Cluster

This chapter will cover planning, configuring, and deploying an Apache Cassandra cluster. By the end of this chapter, you will understand the decisions behind provisioning resources, installing Cassandra, and getting nodes to behave properly while working together to serve large-scale datasets. Where appropriate, considerations for cloud deployments are also included.

Specifically, this chapter will cover the following topics:

  • Sizing hardware and compute resources for Cassandra deployments
  • Operating system optimizations
  • Tips and suggestions on orchestration
  • Configuring the JVM
  • Configuring Cassandra

At the end of this chapter, you should be able to make good decisions about architecting an Apache Cassandra cluster. You should have an understanding of the target instances and providers to deploy on, and be able to articulate the pros/cons of deploying to...

Evaluating instance requirements

Knowing how to size hardware appropriately for a new Cassandra cluster is a vital step in helping your application team succeed. Instances running Apache Cassandra must have sufficient resources to support the required operational workload.

One important note about the hardware/instance requirements for Cassandra is that it was designed to run on commodity-level hardware. While some enterprise RDBMS suppliers recommend copious amounts of RAM and several dozen CPU cores on a proprietary chassis, Cassandra can run on much, much less. In fact, Cassandra can be made to run on small to mid-sized cloud instances, or even on something as meager as a Raspberry Pi. However, as with most databases, Cassandra will perform better with more resources.
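To make the sizing discussion concrete, here is a rough, back-of-the-envelope sketch for estimating a minimum node count. It assumes that every row is stored `replication_factor` times, that compression is ignored, and that each node's data disk is kept no more than about half full to leave headroom for compaction (a commonly cited rule of thumb for size-tiered compaction, not a hard limit). The function name and the 50% default are illustrative assumptions, not official Cassandra guidance.

```python
import math


def nodes_needed(raw_data_gb: float,
                 replication_factor: int,
                 disk_per_node_gb: float,
                 max_disk_utilization: float = 0.5) -> int:
    """Estimate the minimum number of nodes for a dataset (hypothetical helper).

    Assumptions for this sketch:
    - each row is stored replication_factor times across the cluster;
    - compression and per-row overhead are ignored;
    - each node's disk is filled only to max_disk_utilization, leaving
      headroom for compaction, which can temporarily need extra space.
    """
    if not 0 < max_disk_utilization <= 1:
        raise ValueError("max_disk_utilization must be in (0, 1]")
    total_on_disk_gb = raw_data_gb * replication_factor
    usable_per_node_gb = disk_per_node_gb * max_disk_utilization
    by_capacity = math.ceil(total_on_disk_gb / usable_per_node_gb)
    # Each replica of a row should live on a different node, so never
    # plan for fewer nodes than the replication factor.
    return max(replication_factor, by_capacity)


# 2 TB of raw data, RF=3, 1 TB data disks kept at most 50% full:
# 6000 GB on disk / 500 GB usable per node = 12 nodes.
print(nodes_needed(2000, 3, 1000))
```

A real estimate should also account for compression, the compaction strategy in use, and expected data growth; treat the result as a starting point for load testing rather than a final answer.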

The word instance was chosen here instead of hardware or machine. This is because...

Operating system optimizations

Apache Cassandra has a long-standing presence on Linux-based operating systems (OS), and will run just fine on many flavors of Linux (both RHEL- and Debian-based), UNIX, and BSD. Since Apache Cassandra 2.2, Windows has also been supported as a host operating system.

For production clusters, Apache Cassandra should be deployed on the most recent Long Term Support (LTS) release of a Debian- or RHEL-based Linux distribution.

Cassandra on Windows is still a very new development. If you want to ensure that your cluster is high-performing and problem-free, run it on Linux. The majority of the material in this book assumes that Cassandra is being deployed on Linux.

This book will describe installation variants for Ubuntu 16.04 LTS (Debian-based) and CentOS 7.4 (RHEL-based) Linux.
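As a quick pre-flight check for the Linux recommendation above, the following sketch parses the standard `/etc/os-release` file and reports whether the host belongs to the Debian or RHEL family, based on its `ID` and `ID_LIKE` fields (Ubuntu declares `ID_LIKE=debian`; CentOS declares `ID_LIKE="rhel fedora"`). The helper names are hypothetical, and the check only identifies the distribution family; it does not confirm that the release is the latest LTS.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_debian_or_rhel_family(info: Dict[str, str]) -> bool:
    """True if ID or ID_LIKE names Debian or RHEL (e.g. Ubuntu, CentOS)."""
    families = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    return bool(families & {"debian", "rhel"})


# Usage on a real host (Linux only):
#   from pathlib import Path
#   info = parse_os_release(Path("/etc/os-release").read_text())
#   print(info.get("PRETTY_NAME"), is_debian_or_rhel_family(info))
```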

Disable...

Configuring the JVM

Apache Cassandra was written in Java, and therefore requires a JVM to run. Make sure to download a Java Runtime Environment (JRE) or Java Development Kit (JDK) to include as a part of your Cassandra installation. Unless otherwise noted, the latest patch of version 8 of the Oracle JDK or OpenJDK should be used.

At the time of writing, Apache Cassandra is not compatible with Java 9 or higher.
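Because Cassandra 3.x expects Java 8, it can help to verify the runtime before starting a node. The sketch below parses the version string that `java -version` prints (to standard error) and extracts the major version, handling both the legacy `1.8.0_181` style used by Java 8 and the `11.0.2` style used from Java 9 onward. The function names are hypothetical.

```python
import re


def java_version_from_output(output: str) -> str:
    """Extract the quoted version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    return match.group(1)


def java_major_version(version: str) -> int:
    """Return the major version: "1.8.0_181" -> 8, "11.0.2" -> 11."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    # Java 8 and earlier use the legacy "1.<major>" numbering scheme.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def suitable_for_cassandra_3x(version: str) -> bool:
    """Cassandra 3.x (at the time of writing) requires Java 8."""
    return java_major_version(version) == 8


# Usage (requires a java binary on the PATH):
#   import subprocess
#   out = subprocess.run(["java", "-version"],
#                        capture_output=True, text=True).stderr
#   print(suitable_for_cassandra_3x(java_version_from_output(out)))
```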

Adjustments to the JVM settings used by Cassandra can be made in the jvm.options configuration file. Many of those settings deal with the configuration and tuning of the garbage collector.
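For context on heap sizing: when no explicit heap size (`-Xms`/`-Xmx`) is set, Cassandra 3.x's `cassandra-env.sh` derives defaults from the machine's memory and core count. The sketch below re-expresses that calculation as we understand it: maximum heap = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), and a young generation (relevant to CMS) of min(100 MB x cores, 1/4 heap). It is an approximation for illustration; check the `cassandra-env.sh` shipped with your version before relying on it.

```python
from typing import Tuple


def default_heap_sizes_mb(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Approximate cassandra-env.sh defaults: (max heap MB, young gen MB)."""
    # Integer division mirrors the shell script's use of `expr`.
    half_mb = system_memory_mb // 2
    quarter_mb = half_mb // 2
    half_mb = min(half_mb, 1024)        # 1/2 RAM, capped at 1 GB
    quarter_mb = min(quarter_mb, 8192)  # 1/4 RAM, capped at 8 GB
    max_heap_mb = max(half_mb, quarter_mb)

    # Young generation (used with CMS): 100 MB per core, at most 1/4 heap.
    young_gen_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, young_gen_mb


# A 16 GB, 4-core instance: 4096 MB heap, 400 MB young generation.
print(default_heap_sizes_mb(16384, 4))
```

Note that the heap never exceeds 8 GB under these defaults, no matter how much RAM the instance has; larger heaps must be set explicitly.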

Garbage collection

Probably the most noticeable aspect of the JVM is how it manages garbage collection. There are two main...

Configuring Cassandra

Configuring a single node for Apache Cassandra is done in a few files located in the conf directory of the instance's Cassandra installation. For local or development deployments, modifying many of these files is optional (the defaults should suffice). For production deployments, however, most of these files should be adjusted.

cassandra.yaml

The cassandra.yaml file is the main configuration file for each node in a Cassandra cluster. Many of the behaviors of a node can be controlled or influenced from this file.

While the settings in cassandra.yaml are specific to the node on which the file resides, some settings do need to be the same throughout the cluster (these will be noted). Failure...
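Since some `cassandra.yaml` settings must agree across the whole cluster, a simple drift check can catch mistakes before a node is started. The sketch below compares already-parsed configurations (one dictionary per node) and reports any cluster-wide key whose value differs. The node names are hypothetical, and the key list is an assumption for illustration: `cluster_name` and `partitioner` are well-known examples of settings that must match, but you should extend the list to fit your cluster.

```python
from typing import Any, Dict, Iterable

# Assumed for this sketch: settings that must be identical on every node.
CLUSTER_WIDE_KEYS = ("cluster_name", "partitioner")


def find_mismatches(configs: Dict[str, Dict[str, Any]],
                    keys: Iterable[str] = CLUSTER_WIDE_KEYS
                    ) -> Dict[str, Dict[str, Any]]:
    """Return {key: {node: value}} for each key whose value differs across nodes."""
    mismatches: Dict[str, Dict[str, Any]] = {}
    for key in keys:
        values = {node: cfg.get(key) for node, cfg in configs.items()}
        # repr() lets us compare unhashable values such as lists.
        if len({repr(v) for v in values.values()}) > 1:
            mismatches[key] = values
    return mismatches


MURMUR3 = "org.apache.cassandra.dht.Murmur3Partitioner"
configs = {
    "node1": {"cluster_name": "Prod Cluster", "partitioner": MURMUR3},
    "node2": {"cluster_name": "Prod Cluster", "partitioner": MURMUR3},
    "node3": {"cluster_name": "Test Cluster", "partitioner": MURMUR3},
}
print(find_mismatches(configs))
```

In practice, each dictionary could come from loading a node's `cassandra.yaml` with a YAML parser such as PyYAML's `yaml.safe_load` (a third-party package, deliberately not used here to keep the sketch dependency-free).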

Managing a deployment pipeline

When you start working with large, production-level clusters, having a good orchestration tool can save you a lot of work. After all, building and configuring a three-node cluster is one thing, but building and maintaining a 300-node cluster requires a different approach.

This comes into play when cluster-wide changes must be applied, such as a new SSL certificate or an upgrade to a new patch level. Manual methods, which are fine for the three-node cluster, quickly become untenable at a large scale.

For some Cassandra teams, a collection of Python or shell scripts will suffice for running the repeatable parts of their deployment process. But as scale increases and configurations change, this approach depends on the team continually adjusting the scripts to keep up with changing requirements.
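One pattern that such scripts and orchestration tools commonly automate is the rolling change: apply an update (a new SSL certificate, a patch upgrade) to one node at a time, and wait for that node to become healthy before touching the next, so the cluster keeps serving requests throughout. The sketch below shows only the control flow; `apply_change` and `is_healthy` are hypothetical callables you would implement with your own tooling (for example, a restart followed by a check of `nodetool status`).

```python
import time
from typing import Callable, Iterable, List


def rolling_apply(nodes: Iterable[str],
                  apply_change: Callable[[str], None],
                  is_healthy: Callable[[str], bool],
                  timeout_s: float = 600.0,
                  poll_s: float = 10.0) -> List[str]:
    """Apply a change one node at a time, halting if a node fails to recover.

    Returns the nodes that were updated and came back healthy.
    """
    done: List[str] = []
    for node in nodes:
        apply_change(node)
        deadline = time.monotonic() + timeout_s
        # Do not move on until this node is healthy again.
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                raise RuntimeError(
                    f"{node} unhealthy after change; halting (completed: {done})")
            time.sleep(poll_s)
        done.append(node)
    return done
```

Halting on the first unhealthy node is the key design choice: a bad change then affects at most one replica at a time, instead of being pushed to the entire cluster.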

As the problems of maintaining a distributed...

Summary

There certainly is a lot to consider when planning and building a new Apache Cassandra cluster, and this chapter has put forth a great deal of information. We have considered details regarding compute resources, networking, and sizing strategies. Linux operating system adjustments to help optimize Apache Cassandra have also been discussed.

As Cassandra runs on a JVM, we have analyzed approaches to sizing and configuring the Java heap. After that, we examined Apache Cassandra configuration files. Detailed explanations of various properties were provided, along with the benefits (and possible drawbacks) of each. Finally, we made some brief recommendations on deployment strategy, contrasting configuration management with orchestration.

One last piece of advice is to test thoroughly. Start the cluster, examine its performance,...
