An overview of architecture and modeling in Cassandra

Cassandra Design Patterns


January 2014

$10.00

This book is a fantastic guide to the ins and outs of the Cassandra database solution and how to apply the right design patterns in real-world situations. An essential tutorial for architects and developers.

(For more resources related to this topic, see here.)

Cassandra uses a peer-to-peer architecture, unlike a master-slave architecture, which is prone to single point of failure (SPOF) problems. Cassandra is deployed on multiple machines with each machine acting as a node in a cluster. Data is autosharded, that is, automatically distributed across nodes using key-based sharding, which means that the keys are used to distribute the data across the cluster. Each key-value data element in Cassandra is replicated across the cluster on other nodes (the default replication is 3) for high availability and fault tolerance. If a node goes down, the data can be served from another node having a copy of the original data.

Sharding is an old concept used for distributing data across different systems. Sharding can be horizontal or vertical. In horizontal sharding, in case of RDBMS, data is distributed on the basis of rows, with some rows residing on a single machine and the other rows residing on other machines. Vertical sharding is similar to columnar storage, where columns can be stored separately in different locations.

Hadoop Distributed File Systems (HDFS) use data-volumes-based sharding, where a single big file is sharded and distributed across multiple machines using the block size. So, as an example, if the block size is 64 MB, a 640 MB file will be split into 10 chunks and placed in multiple machines.

The same autosharding capability is used when new nodes are added to Cassandra, where the new node becomes responsible for a specific key range of data. The details of what node holds what key ranges is coordinated and shared across the cluster using the gossip protocol. So, whenever a client wants to access a specific key, each node locates the key and its associated data quickly within a few milliseconds. When the client writes data to the cluster, the data will be written to the nodes responsible for that key range. However, if the node responsible for that key range is down or not reachable, Cassandra uses a clever solution called Hinted Handoff that allows the data to be managed by another node in the cluster and to be written back on the responsible node once that node is back in the cluster.

The replication of data raises the concern of data inconsistency when the replicas might have different states for the same data. Cassandra uses mechanisms such as anti-entropy and read repair for solving this problem and synchronizing data across the replicas. Anti-entropy is used at the time of compaction, where compaction is a concept borrowed from Google BigTable. Compaction in Cassandra refers to the merging of SSTable and helps in optimizing data storage and increasing read performance by reducing the number of seeks across SSTables. Another problem that compaction solves is handling deletion in Cassandra. Unlike traditional RDBMS, all deletes in Cassandra are soft deletes, which means that the records still exist in the underlying data store but are marked with a special flag so that these deleted records do not appear in query results. The records marked as deleted records are called tombstone records. Major compactions handle these soft deletes or tombstones by removing them from the SSTable in the underlying file stores. Cassandra, like Dynamo, uses a Merkle tree data structure to represent the data state at a column family level in a node. This Merkle tree representation is used during major compactions to find the difference in the data states across nodes and reconciled.

The Merkle tree or Hash tree is a data structure in the form of a tree where every non-leaf node is labeled with the hash of children nodes, allowing the efficient and secure verification of the contents of the large data structure.

Cassandra, like Dynamo, falls under the AP part of the CAP theorem and offers a tunable consistency level. Cassandra provides multiple consistency levels, as illustrated in the following table:

Operation

ZERO

ANY

ONE

QUORUM

ALL

Read

Not supported

Not supported

Reads from one node

 

Read from a majority of nodes with replicas

Read from all the nodes with replicas

Write

Asynchronous write

Writes on one node including hints

Writes on one node with commit log and Memtable

Writes on a majority of nodes with replicas

Writes on all the nodes with replicas

A summary of the features in Cassandra

The following table summarizes the key features of Cassandra with respect to its origins in Google BigTable and Amazon Dynamo:

Feature

Cassandra implementation

Google BigTable

Amazon Dynamo

Architecture

Peer-to-peer architecture, ring-based deployment architecture

No

Yes

 

Data model

Multidimensional map

(row,column, timestamp) -> bytes

Yes

 

No

CAP theorem

AP with tunable consistency

No

Yes

 

Storage architecture

SSTable, Memtables

Yes

 

No

Storage layer

Local filesystem storage

No

No

Fast reads and efficient storage

Bloom filters, compactions

Yes

 

No

Programming language

Java

No

Yes

 

Client programming language

Multiple languages supported: Java, PHP, Python, REST, C++, .NET, and so on.

Not known

Not known

Scalability model

Horizontal scalability; multiple nodes deployment than a single machine deployment

Yes

 

Yes

 

Version conflicts

Timestamp field (not a vector clock as usually assumed)

No

No

Hard deletes/updates

Data is always appended using the timestamp field—deletes/updates are soft appends and are cleaned asynchronously as part of major compactions

Yes

 

No

Summary

Cassandra packs the best features of two technologies proven at scale—Google BigTable and Amazon Dynamo. However, today Cassandra has evolved beyond these origins with new unique and enterprise-ready features such as Cassandra Query Language (CQL), support for collection columns, lightweight transactions, and triggers.

Resources for Article:


Further resources on this subject:


Books to Consider

comments powered by Disqus