You're reading from Mastering Apache Cassandra 3.x - Third Edition

Product type: Book

Published in: Oct 2018

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781789131499

Edition: 3rd Edition

Languages: Java

Tools: Cassandra

Concepts: Database Programming

Authors (3):

Aaron Ploetz

Tejaswi Malepati

Nishant Neeraj


Configuring a Cluster

This chapter will cover planning, configuring, and deploying an Apache Cassandra cluster. By the end of this chapter, you will understand the decisions behind provisioning resources, installing Cassandra, and getting nodes to behave properly while working together to serve large-scale datasets. When appropriate, considerations for cloud deployments will be interjected.

Specifically, this chapter will cover the following topics:

Sizing hardware and computer resources for Cassandra deployments
Operating system optimizations
Tips and suggestions on orchestration
Configuring the JVM
Configuring Cassandra

At the end of this chapter, you should be able to make good decisions about architecting an Apache Cassandra cluster. You should have an understanding of the target instances and providers to deploy on, and be able to articulate the pros/cons of deploying to...

Evaluating instance requirements

Knowing how to appropriately size hardware for a new Cassandra cluster is a vital step to helping your application team succeed. Instances running Apache Cassandra must have sufficient resources available to be able to support the required operational workload.

One important note about the hardware/instance requirements for Cassandra is that it was designed to run on commodity-level hardware. While some enterprise RDBMS suppliers recommend copious amounts of RAM and several dozen CPU cores on a proprietary chassis, Cassandra can run on much, much less. In fact, Cassandra can be made to run on small to mid-sized cloud instances, or even something as meager as a Raspberry Pi. However, as with most databases, Cassandra will perform better with more resources.

The word instance was chosen here instead of hardware or machine. This is because...

Operating system optimizations

Apache Cassandra has a long-standing presence on Linux-based operating systems (OS), and will run just fine on many flavors of Linux (both RHEL and Debian-based), UNIX, and BSD. As of Apache Cassandra 2.2, Windows is now supported as a host operating system.

For production clusters, Apache Cassandra should be deployed on the most recent Long Term Support (LTS) release of a Debian-based or RHEL-based Linux distribution.

Cassandra on Windows is still a very new development. If you want to ensure that your cluster is high-performing and problem-free, run it on Linux. The majority of the material in this book assumes that Cassandra is being deployed on Linux.

This book will describe installation variants for Ubuntu 16.04 LTS (Debian-based) and CentOS 7.4 (RHEL-based) Linux.

Disable...

Configuring the JVM

Apache Cassandra was written in Java, and therefore requires a JVM to run. Make sure to download a Java Runtime Environment (JRE) or Java Development Kit (JDK) to include as a part of your Cassandra installation. Unless otherwise noted, the latest patch of version 8 of the Oracle JDK or OpenJDK should be used.

At the time of writing, Apache Cassandra is not compatible with Java 9 or higher.
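The version requirement above can be checked before a node is started. The following is a minimal sketch, not part of the book's tooling: the function names are hypothetical, and it assumes the usual `java -version` banner, in which Java 8 and earlier report themselves as `1.x` while Java 9 and later lead with the major version.

```python
import re


def java_major_version(version_output):
    """Extract the Java major version from the text printed by `java -version`."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    parts = match.group(1).split(".")
    # Java 8 and earlier use the 1.x scheme (for example 1.8.0_181);
    # Java 9 and later lead with the major version (for example 11.0.2).
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(re.match(r"\d+", parts[0]).group())


def is_supported_for_cassandra_3(version_output):
    """Return True only for Java 8, the version this chapter calls for."""
    return java_major_version(version_output) == 8
```

Note that `java -version` writes its banner to standard error, so a caller would typically pass in the `stderr` captured from `subprocess.run(["java", "-version"], capture_output=True, text=True)`.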

Adjustments to the JVM settings used by Cassandra can be made in the jvm.options configuration file. Many of those settings deal with the configuration and tuning of the garbage collector.
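As a rough illustration of how jvm.options is laid out, the sketch below lists the flags that are actually active in such a file and picks out the heap settings. It assumes the common layout of one JVM flag per line with `#` marking comments. The helper names and the sample contents (including the 8G heap values) are illustrative assumptions, not recommendations from the book.

```python
# Sample contents for illustration only; the values are not tuning advice.
SAMPLE = """\
# Heap size
-Xms8G
-Xmx8G

# Garbage collector selection
#-XX:+UseG1GC
-XX:+UseConcMarkSweepGC
"""


def active_jvm_options(text):
    """Return the uncommented, non-blank lines of a jvm.options file."""
    options = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            options.append(stripped)
    return options


def heap_settings(options):
    """Map -Xms / -Xmx to their configured sizes, if present."""
    return {opt[:4]: opt[4:] for opt in options if opt.startswith(("-Xms", "-Xmx"))}
```

Running `active_jvm_options` against a real file (read with `open(path).read()`) before and after an edit is a quick way to confirm that only the intended flags changed.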

Garbage collection

Probably the most noticeable aspect of the JVM is how it manages garbage collection. There are two main...

Configuring Cassandra

Configuring a single node for Apache Cassandra is done in a few files located in the conf directory of the instance's Cassandra installation. For local or development deployments, modifying many of these files is optional, as the defaults should suffice. For production deployments, however, most of these files should be adjusted.

cassandra.yaml

The cassandra.yaml file is the main configuration file for each node in a Cassandra cluster. Many of the behaviors of a node can be controlled or influenced from this file.

While the settings in cassandra.yaml are specific to the node on which the file resides, some settings do need to be the same throughout the cluster (these will be noted). Failure...
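One way to catch cluster-wide settings that have drifted apart is to compare them across nodes. The sketch below assumes each node's cassandra.yaml has already been loaded into a dictionary (for example with a YAML parser such as PyYAML). The function name is hypothetical, and `cluster_name` and `partitioner` are used here only as two settings that are commonly required to match; they are not the book's full list.

```python
def find_mismatches(node_configs, shared_keys=("cluster_name", "partitioner")):
    """Return {key: {node: value}} for each shared key whose value differs between nodes.

    node_configs maps a node name to that node's parsed cassandra.yaml settings.
    """
    mismatches = {}
    for key in shared_keys:
        values = {node: cfg.get(key) for node, cfg in node_configs.items()}
        # More than one distinct value means at least one node disagrees.
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches
```

An empty result means the checked settings agree everywhere; any entry in the result shows exactly which node holds which value.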

Managing a deployment pipeline

When you start working with large, production-level clusters, having a good orchestration tool can save you a lot of work. After all, building and configuring a three-node cluster is one thing, but building and maintaining a 300-node cluster requires a different approach.

This comes into play when cluster-wide changes must be applied, such as a new SSL certificate or an upgrade to a new patch level. Manual methods, which are fine for the three-node cluster, quickly become untenable at a large scale.

For some Cassandra teams, a collection of Python or shell scripts will suffice for running some of the repeatable parts of their deployment process. But as scale increases and configurations change, this approach relies on the team to adjust the scripts so that they continue to work with changing requirements.
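To make the scripted approach concrete, here is a minimal sketch of a rolling change in Python. It is an assumption about how such a script might be structured, not a method described in the book: `rolling_apply`, `apply_change`, and `is_healthy` are hypothetical names, and the two callables stand in for whatever the team uses to push a change and to verify a node afterward.

```python
def rolling_apply(nodes, apply_change, is_healthy):
    """Apply a change to one node at a time, stopping at the first unhealthy node.

    Returns (completed_nodes, failed_node); failed_node is None on full success.
    """
    completed = []
    for node in nodes:
        apply_change(node)
        if not is_healthy(node):
            # Halt here so a bad change does not spread to the rest of the cluster.
            return completed, node
        completed.append(node)
    return completed, None
```

Working through nodes serially and halting on the first failure limits the impact of a bad change to a single node, at the cost of a longer rollout on large clusters.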

As the problems of maintaining a distributed...

Summary

There certainly is a lot to consider when planning and building a new Apache Cassandra cluster, and this chapter has put forth a great deal of information. We have considered details regarding compute resources, networking, and sizing strategies. Linux operating system adjustments to help optimize Apache Cassandra have also been discussed.

As Cassandra runs on a JVM, we have analyzed approaches to sizing and configuring the Java heap. After that, we examined the Apache Cassandra configuration files, with detailed explanations of various properties and the benefits (and possible drawbacks) that each can provide. Finally, we offered some brief recommendations on deployment strategy, contrasting configuration management with orchestration.

One last piece of advice is to test thoroughly. Start the cluster, examine its performance,...


Authors (3)

Aaron Ploetz

Aaron Ploetz is the NoSQL Engineering Lead for Target, where his DevOps team supports Cassandra, MongoDB, and Neo4j. He has been named a DataStax MVP for Apache Cassandra three times and has presented at multiple events, including the DataStax Summit and Data Day Texas. Aaron earned a BS in Management/Computer Systems from the University of Wisconsin-Whitewater, and an MS in Software Engineering from Regis University. He and his wife, Coriene, live with their three children in the Twin Cities area.

Tejaswi Malepati

Tejaswi Malepati is the Cassandra Tech Lead for Target. He has been instrumental in designing and building custom Cassandra integrations, including a web-based SQL interface and data validation frameworks between Oracle and Cassandra. Tejaswi earned a master's degree in computer science from the University of New Mexico, and a bachelor's degree in electronics and communication from Jawaharlal Nehru Technological University in India. He is passionate about identifying and analyzing data patterns in datasets using R, Python, Spark, Cassandra, and MySQL.

Nishant Neeraj

Nishant Neeraj is an independent software developer with experience in developing and planning out architectures for massively scalable data storage and data processing systems. Over the years, he has helped to design and implement a wide variety of products and systems for companies, ranging from small start-ups to large multinational companies. Currently, he helps drive WealthEngine's core product to the next level by leveraging a variety of big data technologies.
