Apache Cassandra: Working in Multiple Datacenter Environments

Exclusive offer: get 50% off this eBook here
Cassandra High Performance Cookbook

Cassandra High Performance Cookbook — Save 50%

Over 150 recipes to design and optimize large scale Apache Cassandra deployments

$26.99    $13.50
by Edward Capriolo | July 2011 | Cookbooks Open Source

Apache Cassandra is a fault-tolerant, distributed data store which offers linear scalability allowing it to be a storage platform for large high volume websites.

The tunable consistency model of Cassandra extends beyond a single datacenter to complex multiple datacenter scenarios. This article by Edward Capriolo, author of Cassandra High Performance Cookbook, discusses the features inside Cassandra that are designed for this type of deployment.

 

Cassandra High Performance Cookbook

Cassandra High Performance Cookbook

Over 150 recipes to design and optimize large scale Apache Cassandra deployments

        Read more about this book      

(For more resources related to this subject, see here.)

Changing debugging to determine where read operations are being routed

Cassandra replicates data to multiple nodes; because of this, a read operation can be served by multiple nodes. If a read at QUORUM or higher is submitted, a Read Repair is executed, and the read operation will involve more than a single server. In a simple flat network which nodes have chosen for digest reads, are not of much consequence. However, in multiple datacenter or multiple switch environments, having a read cross a switch or a slower WAN link between datacenters can add milliseconds of latency. This recipe shows how to debug the read path to see if reads are being routed as expected.

How to do it...

  1. Edit <cassandra_home>/conf/log4j-server.properties and set the logger to debug, then restart the Cassandra process:

    log4j.rootLogger=DEBUG,stdout,R

  2. On one display, use the tail -f <cassandra_log_dir>/system.log to follow the Cassandra log:

    DEBUG 06:07:35,060 insert writing local
    RowMutation(keyspace='ks1', key='65', modifications=[cf1])
    DEBUG 06:07:35,062 applying mutation of row 65

  3. In another display, open an instance of the Cassandra CLI and use it to insert data. Remember, when using RandomPartitioner, try different keys until log events display on the node you are monitoring:

    [default@ks1] set cf1[‘e'][‘mycolumn']='value';
    Value inserted.

  4. Fetch the column using the CLI:

    [default@ks1] get cf1[‘e'][‘mycolumn'];

    Debugging messages should be displayed in the log.

    DEBUG 06:08:35,917 weakread reading SliceByNamesReadComman
    d(table='ks1', key=65, columnParent='QueryPath(columnFami
    lyName='cf1', superColumnName='null', columnName='null')',
    columns=[6d79636f6c756d6e,]) locally
    ...
    DEBUG 06:08:35,919 weakreadlocal reading SliceByNamesReadCo
    mmand(table='ks1', key=65, columnParent='QueryPath(columnFa
    milyName='cf1', superColumnName='null', columnName='null')',
    columns=[6d79636f6c756d6e,])

How it works...

Changing the logging property level to DEBUG causes Cassandra to print information as it is handling reads internally. This is helpful when troubleshooting a snitch or when using the consistency levels such as LOCAL_QUORUM or EACH_QUORUM, which route requests based on network topologies.

Using IPTables to simulate complex network scenarios in a local environment

While it is possible to simulate network failures by shutting down Cassandra instances, another failure you may wish to simulate is a failure that partitions your network. A failure in which multiple systems are UP but cannot communicate with each other is commonly referred to as a split brain scenario. This state could happen if the uplink between switches fails or the connectivity between two datacenters is lost.

Getting ready

When editing any firewall, it is important to have a backup copy. Testing on a remote machine is risky as an incorrect configuration could render your system unreachable.

How to do it...

  1. Review your iptables configuration found in /etc/sysconfig/iptables. Typically, an IPTables configuration accepts loopback traffic:

    :RH-Firewall-1-INPUT - [0:0]
    -A INPUT -j RH-Firewall-1-INPUT
    -A FORWARD -j RH-Firewall-1-INPUT
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT

  2. Remove the highlighted rule and restart IPTables. This should prevent instances of Cassandra on your machine from communicating with each other:

    #/etc/init.d/iptables restart

  3. Add a rule to allow a Cassandra instance running on 10.0.1.1 communicate to 10.0.1.2:

    -A RH-Firewall-1-INPUT -m state --state NEW -s 10.0.1.1 -d
    10.0.1.2 -j ACCEPT

How it works...

IPTables is a complete firewall that is a standard part of current Linux kernel. It has extensible rules that can permit or deny traffic based on many attributes, including, but not limited to, source IP, destination IP, source port, and destination port. This recipe uses the traffic blocking features to simulate network failures, which can be used to test how Cassandra will operate with network failures.

Choosing IP addresses to work with RackInferringSnitch

A snitch is Cassandra's way of mapping a node to a physical location in the network. It helps determine the location of a node relative to another node in order to ensure efficient request routing. The RackInferringSnitch can only be used if your network IP allocation is divided along octets in your IP address.

Getting ready

The following network diagram demonstrates a network layout that would be ideal for RackInferringSnitch.

Apache Cassandra: Working in Multiple Datacenter Environments

How to do it...

  1. In the <cassandra_home>/conf/cassandra.yaml file:

    endpoint_snitch: org.apache.cassandra.locator.RackInferringSnitch

  2. Restart the Cassandra instance for this change to take effect.

How it works...

The RackInferringSnitch requires no extra configuration as long as your network adheres to a specific network subnetting scheme. In this scheme, the first octet, Y.X.X.X, is the private network number 10. The second octet, X.Y.X.X, represents the datacenter. The third octet, X.X.Y.X, represents the rack. The final octet represents the host, X.X.X.Y. Cassandra uses this information to determine which hosts are ‘closest'. It is assumed that ‘closer' nodes will have more bandwidth and less latency between them. Cassandra uses this information to send Digest Reads to the closest nodes and route requests efficiently.

There's more...

While it is ideal if the network conforms to what is required for RackInferringSnitch, it is not always practical or possible. It is also rigid in that if a single machine does not adhere to the convention, the snitch will fail to work properly.

Cassandra High Performance Cookbook Over 150 recipes to design and optimize large scale Apache Cassandra deployments
Published: July 2011
eBook Price: $26.99
Book Price: $44.99
See more
Select your format and quantity:
        Read more about this book      

(For more resources related to this subject, see here.)

Scripting a multiple datacenter installation

Testing out some multiple datacenter capabilities of Cassandra can sometimes require a large number of instances. This recipe installs and creates all the configuration files required to run a multiple datacenter simulation of Cassandra locally.

Getting ready

This recipe creates many instances of Cassandra, and each instance uses a minimum of 256 MB RAM. A Cassandra release in tar.gz format needs to be in the same directory as the script.

How to do it...

  1. Open <hpcbuild>scripts/multiple_instances_dc.sh and add this content:

    #!/bin/sh
    #wget http://www.bizdirusa.com/mirrors/apache//cassandra/0.7.0/
    apache-cassandra-0.7.0-beta1-bin.tar.gz
    HIGH_PERF_CAS=${HOME}/hpcas
    CASSANDRA_TAR=apache-cassandra-0.7.5-bin.tar.gz
    TAR_EXTRACTS_TO=apache-cassandra-0.7.5
    mkdir ${HIGH_PERF_CAS}
    mkdir ${HIGH_PERF_CAS}/commit/
    mkdir ${HIGH_PERF_CAS}/data/
    mkdir ${HIGH_PERF_CAS}/saved_caches/
    cp ${CASSANDRA_TAR} ${HIGH_PERF_CAS}
    pushd ${HIGH_PERF_CAS}

    This script will take a list of arguments such as ‘dc1-3 dc2-3'. The string before the first dash is the name of the datacenter, and the string after the dash is the number of instances in that datacenter.dcnum=0

    while [ $# -gt 0 ]; do
    arg=$1
    shift
    dcname=`echo $arg | cut -f1 -d ‘-'`
    nodecount=`echo $arg | cut -f2 -d ‘-'`
    #rf=`echo $arg | cut -f2 -d ‘-'`
    for (( i=1; i<=nodecount; i++ )) ; do
    tar -xf ${CASSANDRA_TAR}
    mv ${TAR_EXTRACTS_TO} ${TAR_EXTRACTS_TO}-${dcnum}-${i}
    sed -i ‘1 i MAX_HEAP_SIZE="256M"' ${TAR_EXTRACTS_TO}-${dcnum}-
    ${i}/conf/cassandra-env.sh

  2. Use the datacenter number as the value of the second octet, and the node number in the fourth octet:

    sed -i ‘1 i HEAP_NEWSIZE="100M"' ${TAR_EXTRACTS_TO}-${dcnum}-
    ${i}/conf/cassandra-env.sh
    sed -i "/listen_address\|rpc_address/s/localhost/127.${dcnum}.
    0.${i}/g" ${TAR_EXTRACTS_TO}-${dcnum}-${i}/conf/cassandra.yaml
    sed -i "s|/var/lib/cassandra/data|${HIGH_PERF_CAS}/
    data/${dcnum}-${i}|g" ${TAR_EXTRACTS_TO}-${dcnum}-${i}/conf/
    cassandra.yaml
    sed -i "s|/var/lib/cassandra/commitlog|${HIGH_PERF_CAS}/
    commit/${dcnum}-${i}|g" ${TAR_EXTRACTS_TO}-${dcnum}-${i}/conf/
    cassandra.yaml
    sed -i "s|/var/lib/cassandra/saved_caches|${HIGH_PERF_CAS}/
    saved_caches/${dcnum}-${i}|g" ${TAR_EXTRACTS_TO}-${dcnum}-${i}/
    conf/cassandra.yaml
    sed -i "s|8080|8${dcnum}0${i}|g" ${TAR_EXTRACTS_TO}-${dcnum}-
    ${i}/conf/cassandra-env.sh

  3. Change the snitch from SimpleSnitch to RackInferringSnitch. This will use the listen address of the Cassandra machine to locate it in the datacenter and rack:

    sed -i "s|org.apache.cassandra.locator.SimpleSnitch|org.
    apache.cassandra.locator.RackInferringSnitch|g" ${TAR_EXTRACTS_
    TO}-${dcnum}-${i}/conf/cassandra.yaml
    done
    dcnum=`expr $dcnum + 1`
    done
    popd

  4. Run this script passing arguments to create two datacenters with each having three nodes:

    $ sh scripts/multiple_instances_dc.sh dc1-3 dc2-3

  5. Start up each node in the cluster. Then, connect to a node with the cassandra-cli. Create a keyspace. Ensure the placement_strategy is the NetworkTopologyStrategy and supply it strategy_options to configure how many replicas to place in each datacenter:

    [default@unknown] create keyspace ks33 placement_strategy = 'org.
    apache.cassandra.locator.
    NetworkTopologyStrategy' and strategy_options=[{0:2,1:2}];s

How it works...

This script takes command-line arguments and uses those to set up multiple Cassandra instances using specific IP addresses. The IP addresses are chosen to work with the RackInferringSnitch. After starting the cluster, a keyspace using NetworkTopologyStrategy with a replication factor of 6 is created. The strategy_ options specify three replicas in datacenter 0 and three in datacenter 1.

Determining natural endpoints, datacenter, and rack for a given key

When troubleshooting, it is valuable to know which rack and datacenter your snitch believes a node belongs to. Also, knowing which machines would store a specific key is important when troubleshooting specific failures or determining how your strategy is spreading data across the cluster. This recipe shows how to use JConsole to find this information.

How to do it...

Inside JConsole, select the Mbeans tab, expand the org.apache.cassandra.db tree, expand the EndPointSnitch Mbean, then select Operations. In the right pane, find the button labeled getRack. Enter the IP address of a node to find rack information in the text box next to the button. Enter 127.0.1.0, then click on the getRack button.

Apache Cassandra: Working in Multiple Datacenter Environments

In the operations list, another method getDatacenter is defined. Supply the IP address of a node in the text box next to the button and then click on OK.

Apache Cassandra: Working in Multiple Datacenter Environments

How it works...

This operation is used internally to intelligently route requests. Calling this operation is a way to check that the PropertyFileSnitch or RackInferringSnitch is working correctly.

Manually specifying Rack and Datacenter configuration with a property file snitch

The job of the Snitch is to determine which nodes are in the same Rack or Datacenter. This information is vital for multiple datacenter Cassandra deployments. The property file snitch allows the administrator to define a property file that Cassandra uses to determine what datacenter and rack nodes are a part of. This recipe shows how to configure the property file snitch.

Getting ready

Review the diagram in the Choosing IP Addresses to work with RackInferringSnitch recipe. We will be using the same theoretical network for this recipe.

How to do it...

  1. Open <cassandra_home>conf/cassandra-topology.properties in your text editor. Create an entry for each host:

    10.1.2.5=ny:rack2
    10.1.2.6=ny:rack2
    10.1.3.7=ny:rack3
    10.2.5.9=tx:rack5
    10.2.3.4=tx:rack3
    10.2.3.9=tx:rack3

  2. Edit <cassandra_home>/conf/cassandra.yaml in your text editor:

    endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch

  3. Replicate this file to all hosts in your cluster and restart the Cassandra process.

How it works...

The cassandra-topology.properties file is a simple Java properties file. Each line is an entry in the form of <ip>=<data center>:<rack>. The property file snitch reads this information on startup and uses it to route requests. This optimization attempts to handle digest reads to a host on the same switch or datacenter.

There's more...

See the previous recipe, Determining natural endpoints, datacenter, and rack for a given key to see how to test if you have performed this setup correctly.

Troubleshooting dynamic snitch using JConsole

The Dynamic Snitch is a special snitch that wraps another snitch such as the PropertyFileSnitch. Internally, Cassandra measures latency of read traffic on each host and attempts to avoid sending requests to nodes that are performing slowly. This recipe shows how to use JConsole to find and display the scores that the snitch has recorded.

How to do it...

In the left pane, expand the view for org.apache.cassandra.db and expand the DynamicEndpointSnitch item below it. An Mbean with a randomly chosen number will be below, which you need to expand again. Click on the attributes and the Scores information will appear in the right panel.

Apache Cassandra: Working in Multiple Datacenter Environments

How it works...

When Cassandra nodes are under CPU or IO load due to a heavy number of requests, compaction, or an external factor such as a degraded RAID volume, that node should have a higher score. With dynamic snitch enabled nodes coordinating read operations will send fewer requests to slow servers. This should help balance requests across the server farm.

Quorum operations in multi-datacenter environments

Most applications use Cassandra for its capability to perform low-latency read and write operations. When a cluster is all located in a single physical location, the network latency is low and bandwidth does not (typically) have a cost. However, when a cluster is spread across different geographical locations, latency and bandwidth costs are factors that need to be considered. Cassandra offers two consistency levels: LOCAL_QUORUM and EACH_QUROUM. This recipe shows how to use these consistency levels.

Getting ready

The consistency levels LOCAL_QUORUM and EACH_QUORUM only work when using a datacenter-aware strategy such as the NetworkTopologyStrategy. See the recipe Scripting a multiple datacenter installation for information on setting up that environment.

READ.LOCAL_QUORUM returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
READ.EACH_QUORUM returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
WRITE.LOCAL_QUORUM ensures that the write has been written to <ReplicationFactor> / 2 + 1 nodes within the local datacenter (requires network topology strategy).
WRITE.EACH_QUORUM ensures that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires network topology strategy).

How it works...

Each datacenter-aware level offers tradeoffs versus non-datacenter-aware levels. For example, reading at QUORUM in a multi-datacenter configuration would have to wait for a quorum of nodes across several datacenters to respond before returning a result to the client. Since requests across a WAN link could have high latency (40ms and higher), this might not be acceptable for an application that returns results to the clients quickly. Those clients can use LOCAL_QUORUM for a stronger read then ONE while not causing excess delay. The same can be said for write operations at LOCAL_QUORUM, although it is important to point out that writes are generally faster than reads.

It is also important to note how these modes react in the face of network failures. EACH_ QUORUM will only succeed if each datacenter is reachable and QUORUM can be established in each. LOCAL_QUORUM can continue serving requests even with the complete failure of a datacenter.

Cassandra High Performance Cookbook Over 150 recipes to design and optimize large scale Apache Cassandra deployments
Published: July 2011
eBook Price: $26.99
Book Price: $44.99
See more
Select your format and quantity:
        Read more about this book      

(For more resources related to this subject, see here.)

Using traceroute to troubleshoot latency between network devices

Internet communication over long distances has inherent latency due to the speed of light; however, some areas of the world have more robust network links and better peering. A common tool used to check for network latency is traceroute.

How to do it...

  1. Use traceroute to test the path to an Internet-based host.

    $ traceroute -n www.google.com

    traceroute to www.google.com (74.125.226.113), 30 hops max, 60
    byte packets

    1 192.168.1.1 2.473 ms 6.997 ms 7.603 ms
    2 10.240.181.29 14.890 ms 15.394 ms 15.590 ms
    3 67.83.224.34 15.970 ms 16.573 ms 16.947 ms
    4 67.83.224.9 22.606 ms 22.969 ms 23.324 ms
    5 64.15.8.1 23.841 ms 24.710 ms 24.283 ms
    6 64.15.0.41 30.699 ms 29.384 ms 24.861 ms
    7 * * *
    8 72.14.238.232 15.931 ms 16.388 ms 16.810 ms
    9 216.239.48.24 19.833 ms 16.504 ms 16.732 ms
    10 74.125.226.113 19.616 ms 19.158 ms 19.021 ms

How it works...

Traceroute tracks the route packets taken from an IP network on their way to a given host. It utilizes the IP protocol's time to live (TTL) field and attempts to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to the host.

By analyzing the response time of each device, hops that are taking a long time can be identified as the source of problems. Depending on where the slowness occurs, you can take appropriate action. That may be contacting your network administrator or your ISP.

Ensuring bandwidth between switches in multiple rack environments

For clusters with a few nodes, it is generally advisable to place these nodes on a single switch for simplicity. Multiple rack deployments are suggested when the number of nodes is more than the ports on a typical switch or for redundancy. Cassandra does a high amount of intra-cluster communication. Thus, when nodes are divided across switches, ensure that the links between switches DO not become choke points. This recipe describes how to monitor network interface traffic.

How to do it...

Monitor the traffic on all interfaces, especially on the uplink interfaces between switches using a NMS such as mrtg or cacti. Know the maximum capacity of your network gear and plan for growth.

Apache Cassandra: Working in Multiple Datacenter Environments

There's more...

If the uplinks between switches are network contention points, there are several solutions. One option is to spread your nodes out over more switches. Another option is to upgrade the uplink between your switches, for example if your switches supports 10 GB uplinks, these would give more bandwidth than the standard 1 GB uplinks. Enterprise-level switches often support Link Aggregation Groups (LAG), which bundle multiple interfaces together in an active/active fashion to make a single logical interface that is as fast as the sum of the links aggregated.

Increasing rpc_timeout for dealing with latency across datacenters

Cassandra servers in a cluster have a maximum timeout that they will use when communicating with each other. This is different than the socket timeout used for Thrift clients talking to Cassandra. Operations will have more latency with nodes communicating across large distances. This recipe shows how to adjust this timeout.

How to do it...

  1. Open <cassandra_home>/conf/cassandra.yaml in a text editor. Increase the timeout value:

    rpc_timeout_in_ms: 20000

  2. Restart Cassandra for this change to take effect.

How it works...

When clusters are spread out over large geographical distances, intermittent outages have a greater chance of making requests exceed the timeout value. Remember, if a client is using a consistency level such as ONE, they may receive a result quickly, but the cluster may still be working to write that data to replicas in other datacenters. Raising the timeout gives the cluster more time to complete the process in the background.

Changing consistency level from the CLI to test various consistency levels with multiple datacenter deployments

By default, the command-line interface reads and writes data at consistency level ONE. When using Cassandra with multiple node and multiple datacenter environments, being able to execute operations at other consistency levels is essential to testing and troubleshooting problems. This recipe shows to change the consistency level while using the CLI.

Getting ready

This recipe assumes a multiple data setup such as the one created in the recipe Scripting a multiple datacenter installation.

How to do it...

  1. Use the consistencylevel statement to change the level the CLI will execute operations at:

    [default@unknown] connect 127.0.0.1/9160;
    Connected to: "Test Cluster" on 127.0.0.1/9160

    [default@unknown] create keyspace ks33 with placement_strategy =
    'org.apache.cassandra.locator.
    NetworkTopologyStrategy' and strategy_
    options=[{0:3,1:3,replication_factor=6}];
    [default@unknown] use ks33;
    Authenticated to keyspace: ks33
    [default@ks33] create column family cf33;

    Down all the nodes that are in one of the datacenters.

    $ <cassandra_home>/bin/nodetool -h 127.0.0.1 -p 9001 ring

    Address Status State Load Owns Token
    127.1.0.3 Down Normal 42.62 KB 42.18%
    127.0.0.1 Up Normal 42.62 KB 6.74%
    127.0.0.2 Up Normal 42.62 KB 11.78%
    127.0.0.3 Up Normal 42.62 KB 22.79%
    127.1.0.2 Down Normal 42.62 KB 4.62%
    127.1.0.1 Down Normal 42.62 KB 11.90%

  2. Insert a row at the default consistency level of ONE:

    [default@ks33] set cf33[‘a'][‘1']=1;
    Value inserted.

  3. Change the consistency level to EACH_QUORUM:

    [default@ks33] consistencylevel as EACH_QUORUM;
    Consistency level is set to ‘EACH_QUORUM'.
    [default@ks33] set cf33[‘a'][‘1']=1;
    null
    [default@ks33] get cf33[‘a'][‘1'];
    null

  4. Change the consistency level to LOCAL_QUORUM:

    [default@ks33] consistencylevel as LOCAL_QUORUM;
    Consistency level is set to ‘LOCAL_QUORUM'.
    [default@ks33] set cf33[‘a'][‘1']=1;
    Value inserted.
    [default@ks33] get cf33[‘a'][‘1'];
    => (column=31, value=31, timestamp=1304897159103000)

How it works...

The consistencylevel statement in the command-line interface changes the level operations run at. The default level of ONE will succeed as long as a single natural endpoint for the data acknowledges the operation. For LOCAL_QUORUM, a quorum of nodes in the local datacenters must acknowledge the operation for it to succeed. With EACH_QUROUM, a quorum of nodes in all datacenters much acknowledge the operation for it to succeed. If the CLI displays null after a set or get, the operation failed.

Using the consistency levels TWO and THREE

In multiple datacenter scenarios, replication factors higher than three are common. In some of these cases, users want durability of writing to multiple nodes, but do not want to use ONE or QUORUM. The Thrift code generation file, <cassandra_home>/interface/cassandra. thrift, has inline comments that describe the different consistency levels available.

* Write consistency levels make the following guarantees before
reporting success to the client:
...
* TWO Ensure that the write has been written to at least 2
nodes' commit log and memory table
* THREE Ensure that the write has been written to at least 3
nodes' commit log and memory table
...
* Read consistency levels make the following guarantees before
returning successful results to the client:
...
* TWO Returns the record with the most recent timestamp
once two replicas have replied.
* THREE Returns the record with the most recent timestamp
once three replicas have replied.

Getting ready

This recipe requires a multiple datacenter installation as described in the recipe Scripting a multiple datacenter installation.

How to do it...

  1. Create a two-datacenter cluster with two nodes in each datacenter:

    $ sh multiple_instances_dc.sh dc1-3 dc2-3

  2. Create a keyspace with a replication factor of 4 and two replicas in each datacenter:

    [default@unknown] create keyspace ks4 with placement_strategy =
    'org.apache.cassandra.locator.
    NetworkTopologyStrategy' and strategy_
    options=[{0:2,1:2,replication_factor=4}];

    [default@unknown] use ks4;
    [default@ks4] create column family cf4;

  3. Down multiple nodes in the cluster inside the same datacenter:

    $ <cassandra_home>/bin/nodetool -h 127.0.0.1 -p 9001 ring
    Address Status State Load Owns Token
    127.0.0.3 Down Normal 42.61 KB 39.37%
    127.1.0.1 Up Normal 42.61 KB 2.43%
    127.0.0.2 Down Normal 42.61 KB 10.85%
    127.1.0.2 Up Normal 42.61 KB 17.50%
    127.0.0.1 Up Normal 42.61 KB 5.40%
    127.1.0.3 Up Normal 42.61 KB 24.44%

  4. Set the consistency level to TWO and then insert and read a row:

    [default@ks4] connect 127.0.0.1/9160;
    [default@unknown] consistencylevel as two;
    Consistency level is set to ‘TWO'.

    [default@unknown] use ks4;
    [default@ks4] set cf4[‘a'][‘b']='1';
    Value inserted.
    [default@ks4] get cf4[‘a'];
    => (column=62, value=31, timestamp=1304911553817000)
    Returned 1 results.

How it works...

Replication factors such as TWO and THREE can be helpful. An example of this is a two-datacenter deployment with a replication factor of four. QUORUM would require three natural endpoints, and since each datacenter has two replicas, this would mean that the operation depends on the system in a remote datacenter. If the remote datacenter is down—EACH_QUORUM would fail, and if a local replica was down—LOCAL_QUORUM would fail. Consistency level TWO would allow the first two acknowledgments, local or remote, to successfully complete the operation.

Calculating Ideal Initial Tokens for use with Network Topology Strategy and Random Partitioner

NetworkTopologyStrategy works in conjunction with an EndpointSnitch to determine a relative proximity of each of your Cassandra nodes and distribute replica data in an explicit user specified manner. The replica insertion behavior of NetworkTopologyStrategy requires that the "standard" ring concept of even token distribution between all nodes that span multiple datacenters is not used, but instead create mirrored logical rings between datacenters.

Apache Cassandra: Working in Multiple Datacenter Environments

Getting Ready

For NetworkTopologyStrategy to work, you must have a correctly configured Endpoint Snitch. For absolute control, use PropertyFileSnitch to specify which Cassandra nodes are in which datacenter and rack.

How to do it...

For this example, assume two datacenters each with two nodes. Calculate the tokens for the nodes in a datacenter as if they were the entire ring.

  1. The formula to calculate the ideal Initial Tokens is:

    Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_
    Nodes)

  2. For the first node in the first datacenter (N0DC0):

    initial token = 0 * ((2^127) / 2)
    initial token = 0

  3. For the second node in the first datacenter (N1DC0):

    initial token = 1 * ((2^127) / 2)
    initial token = 85070591730234615865843651857942052864

    Now, for the second datacenter, do the exact same process, but no two nodes can have the same token, so offset the tokens by adding 1 to the token value.

  4. For the first node in the second datacenter (N0DC1):

    initial token = 1

  5. For the second node in the second datacenter (N1DC1):

    initial token = 85070591730234615865843651857942052865

How it works...

Continuing with our two-datacenter example, a replica for token 3 is set to go from DC0 to DC1. Cassandra determines which node will get the write in the remote datacenter the same way it would do for primary insertion. Cassandra will write to the node whose Initial Token is closest without being larger than the data's token. When using Network Topology strategy, Cassandra only has nodes in the remote datacenter to choose from when placing the replica, not the entire ring. Thus, the replica will write to DC1N0.

There's more...

NetworkToplogySnitch is versatile as it can be used with more than two datacenters and even when datacenters have differing numbers of nodes. However, it must be set up properly.

More than two datacenters

If there are more than two datacenters, follow the same steps but keep incrementing the offset so that no nodes have the same Initial Token. For example, add 2 in the third datacenter.

Datacenters with differing numbers of nodes

NetworkTopologyStrategy also works with multiple datacenters that each have different numbers of nodes. Follow the recipe of computing the tokens for that datacenter independently, and then check to make sure there are no token collisions on any other node in any datacenter. If the numbers collide, increment the token on one of those nodes.

Endpoint Snitch

Furthermore, using a different, or improperly configured Endpoint Snitch, will not guarantee you even replication.

Summary

The recipes in this article taught us how to configure and use features that control and optimize how Cassandra works in multiple datacenter environments.


Further resources on this subject:


About the Author :


Edward Capriolo

Edward Capriolo, who also authored the previous book, Cassandra High Performance Cookbook, is currently system administrator at Media6degrees where he helps design and maintain distributed data storage systems for the Internet advertising industry. Edward is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. He has experience as a developer as well as a Linux and network administrator and enjoys the rich world of open source software.

Books From Packt

Pentaho Data Integration 4 Cookbook
Pentaho Data Integration 4 Cookbook

Mastering phpMyAdmin 3.3.x for Effective MySQL Management
Mastering phpMyAdmin 3.3.x for Effective MySQL Management

NHibernate 3 Beginner's Guide
NHibernate 3 Beginner's Guide

Moodle as a Curriculum and Information Management System
Moodle as a Curriculum and Information Management System

PostgreSQL 9.0 High Performance
PostgreSQL 9.0 High Performance

CMS Made Simple Development Cookbook
CMS Made Simple Development Cookbook

JasperReports 3.6 Development Cookbook
JasperReports 3.6 Development Cookbook

MySQL Admin Cookbook
MySQL Admin Cookbook

Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software