Cassandra uses gossip state information to locally determine if another node in the system is up or down. This failure detection information is used by Cassandra to avoid routing client requests to unreachable nodes whenever possible.
The gossip inter-node communication process tracks heartbeats from other nodes, both directly (from nodes gossiping with it) and indirectly (from nodes heard about secondhand, thirdhand, and so on). Rather than using a fixed threshold to mark nodes without a heartbeat as down, Cassandra uses an accrual failure detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, and other factors that might affect the perceived heartbeat rate.
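The idea behind accrual failure detection can be illustrated with a simplified sketch. The version below is a hypothetical approximation, not Cassandra's actual implementation: it models heartbeat inter-arrival times with an exponential distribution and computes a suspicion level ("phi") that grows the longer a heartbeat is overdue relative to the node's observed heartbeat rate, rather than tripping a fixed timeout.

```python
import math
from collections import deque

class PhiAccrualDetector:
    """Simplified phi accrual failure detector sketch.

    Assumes an exponential model of heartbeat inter-arrival times;
    Cassandra's real detector differs in detail.
    """

    def __init__(self, window_size=1000, threshold=8.0):
        # Sliding window of recent inter-arrival times for this node.
        self.intervals = deque(maxlen=window_size)
        self.last_heartbeat = None
        # Suspicion level above which the node is considered down
        # (cf. Cassandra's phi_convict_threshold, default 8).
        self.threshold = threshold

    def heartbeat(self, now):
        """Record a heartbeat observed at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: higher means more likely the node is down."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # P(next heartbeat arrives later than `elapsed`) under the
        # exponential model; phi is its negative base-10 logarithm.
        p_later = math.exp(-elapsed / mean)
        return -math.log10(p_later)

    def is_available(self, now):
        return self.phi(now) < self.threshold
```

Because phi scales with the node's own observed heartbeat interval, a node that normally gossips every second is suspected much sooner after silence than one on a slow or congested link, which is the point of the per-node threshold.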
Node failures can result from various causes such as hardware failures, network outages, and so on. Node outages are often transient but can last for extended intervals. A node outage rarely signifies a permanent departure from the cluster, and therefore does not automatically result in permanent removal of the failed node from the cluster. Other nodes will still try to periodically initiate gossip contact with failed nodes to see if they are back up.
When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Writes missed during short, transient outages are saved for a period of time on other replicas and, via Cassandra’s built-in hinted handoff feature, replayed on the failed node once it recovers. If a node is down for an extended period, an administrator can run the nodetool repair utility after the node is back online to ‘catch it up’ with its corresponding replicas.
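As an illustration of the recovery step, a repair can be triggered from the command line on the recovered node (the commands below assume a running cluster; the hint window value shown is the commonly documented default and may differ in your version):

```shell
# Hints are only kept for a limited window (max_hint_window_in_ms in
# cassandra.yaml, typically 3 hours by default). After a longer outage,
# run a manual repair on the recovered node:
nodetool repair
```

Repair compares the node's data with its replicas and streams over anything it missed, so it is the safety net for outages that outlast the hint window.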
To permanently change a node’s membership in a cluster, administrators must explicitly remove a node from a Cassandra cluster using the nodetool management utility.
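For example, removal looks different depending on whether the departing node is still alive (these are standard nodetool commands, shown here as an illustrative sketch; `<host-id>` is a placeholder for the actual Host ID):

```shell
# On a live node that should leave the cluster
# (it streams its data to other replicas first):
nodetool decommission

# For a node that is already dead, run from any live node,
# using the Host ID reported by `nodetool status`:
nodetool removenode <host-id>
```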
Cassandra can easily replicate data between different physical datacenters by creating a keyspace that uses the NetworkTopologyStrategy replication strategy. This strategy allows you to configure Cassandra to automatically replicate data to different datacenters, and even to different racks within a datacenter, to protect against a specific rack or hardware failure taking down a cluster. It can also replicate data between public Clouds and on-premises machines.
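A keyspace using this strategy is defined in CQL. The example below is a hypothetical sketch: the keyspace name `sales` and the datacenter names `DC1` and `DC2` are placeholders, and the datacenter names must match those reported by the cluster's snitch:

```cql
-- Keep three replicas in DC1 and two in DC2.
CREATE KEYSPACE sales
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 2
  };
```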
The main Cassandra configuration file is cassandra.yaml, which contains the primary options that control how Cassandra operates.
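A few of the options it houses are shown below. This is an illustrative fragment, not a complete file, and the addresses and values are made up for the example:

```yaml
# Excerpt of commonly edited cassandra.yaml settings (values illustrative):
cluster_name: 'My Cluster'
listen_address: 192.168.1.10        # address other nodes use to reach this node
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.10,192.168.1.11"
phi_convict_threshold: 8            # sensitivity of the accrual failure detector
```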
Cassandra’s architecture makes it well suited for full Cloud deployments as well as hybrid implementations that store some data in the Cloud and other data on premises.
DataStax provides an Amazon AMI that allows you to quickly deploy a Cassandra cluster on EC2. See the online documentation for a step-by-step guide to installing a Cassandra cluster on Amazon.
Cassandra’s distributed architecture, fast write throughput, and internal memory caching structures eliminate the need for an extra software caching layer such as memcached.