In general, restore a Cassandra node by following this procedure:
1. Shut down the node that is to be restored.
2. Clear the commit log by removing all the files in the commit log directory,
   e.g. rm /var/lib/cassandra/commitlog/*
3. Remove the database files for the keyspace,
   e.g. rm /var/lib/cassandra/data/keyspace1/*.db
   Take care not to remove the snapshot directory for the keyspace.
4. Copy the contents of the latest snapshot directory for the keyspace to the keyspace’s data directory,
   e.g. cp -p /var/lib/cassandra/data/keyspace1/snapshots/56046198758643-snapshotkeyspace1/* /var/lib/cassandra/data/keyspace1
5. Copy any incremental backups taken for the keyspace into the keyspace’s data directory.
6. Repeat steps 3-5 for each keyspace.
7. Restart the node.
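The file-level part of the procedure above (clearing the commit log, removing database files, copying the snapshot and any incremental backups) can be sketched as two small shell functions. This is only an illustration under stated assumptions: the function names and argument order are hypothetical, the snapshot to restore is passed in by name rather than detected, and the "backups" subdirectory used for incremental backups is an assumption that should be checked against your installation. Stopping and restarting the node is left to whatever service mechanism your system uses.

```shell
#!/bin/sh
# Sketch only: run with the node shut down, and restart the node afterwards.
# Function names, argument order, and the "backups" subdirectory are
# assumptions made for this example, not part of Cassandra itself.

# Clear the commit log by removing every file in the commit log directory.
clear_commitlog() {
    commitlog_dir=$1
    rm -f "$commitlog_dir"/*
}

# Restore one keyspace from a named snapshot.
# Usage: restore_keyspace DATA_DIR KEYSPACE SNAPSHOT_NAME
restore_keyspace() {
    data_dir=$1
    keyspace=$2
    snapshot=$3
    ks_dir="$data_dir/$keyspace"

    # Remove the live database files. Only top-level *.db files are matched,
    # so the snapshots directory is left untouched.
    rm -f "$ks_dir"/*.db

    # Copy the snapshot contents back, preserving timestamps and permissions.
    cp -p "$ks_dir/snapshots/$snapshot"/* "$ks_dir"/

    # Copy incremental backups on top, if the (assumed) directory exists.
    if [ -d "$ks_dir/backups" ]; then
        for f in "$ks_dir/backups"/*; do
            if [ -f "$f" ]; then
                cp -p "$f" "$ks_dir"/
            fi
        done
    fi
}
```

With the paths from the examples above, a restore of one keyspace might then look like `clear_commitlog /var/lib/cassandra/commitlog` followed by `restore_keyspace /var/lib/cassandra/data keyspace1 56046198758643-snapshotkeyspace1`, with the second call repeated for each keyspace.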
Currently, no uninstaller exists for Cassandra. To remove Cassandra from a machine, manually delete the Cassandra software, data, and log files.
Yes. First, data durability is fully supported in Cassandra: any data written to a database cluster is first written to a commit log, just as in nearly every popular RDBMS.
Second, Cassandra offers tunable data consistency so that a developer or administrator can choose how strong they wish consistency across nodes to be. The strongest form of consistency is to mandate that any data modifications be made to all nodes, with any unsuccessful attempt on a node resulting in a failed data operation. Cassandra provides consistency in the CAP sense in that all readers will see the same values.
Other forms of tunable consistency involve writing to a quorum of nodes or, for the loosest form of consistency, to just one node. Cassandra is very flexible and allows data consistency to be chosen on a per-operation basis if needed, so that very strong consistency can be used when desired and very loose consistency when the use case permits.
In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its replicas. Cassandra offers a number of built-in features to ensure data consistency:
Cassandra extends the concept of ‘eventual consistency’ by offering ‘tunable consistency’. For any given read or write operation, the client application decides how consistent the requested data should be.
Consistency levels in Cassandra can be set on any read or write query. This allows application developers to tune consistency on a per-query basis depending on their requirements for response time versus data accuracy. Cassandra offers a number of consistency levels for both reads and writes.
Choosing a consistency level for reads and writes involves determining your requirements for consistent results (always reading the most recently written data) versus read or write latency (the time it takes for the requested data to be returned or for the write to succeed).
If latency is a top priority, consider a consistency level of ONE (only one replica node must successfully respond to the read or write request). There is a higher probability of stale data being read with this consistency level (as the replicas contacted for reads may not always have the most recent write). For some applications, this may be an acceptable trade-off.
If consistency is the top priority, you can ensure that a read will always reflect the most recent write by satisfying the following formula:
(nodes_written + nodes_read) > replication_factor
For example, if your application uses the QUORUM consistency level for both write and read operations with a replication factor of 3, then 2 nodes are always written and 2 nodes are always read. Because the number of nodes written plus the number read (2 + 2 = 4) is greater than the replication factor (3), at least one of the nodes read always holds the most recent write, which ensures strong read consistency.
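The arithmetic behind this rule can be checked with a few lines of shell. This is a minimal sketch: the function names are hypothetical, and the quorum size is computed as floor(replication_factor / 2) + 1, which gives 2 for a replication factor of 3 as in the example above.

```shell
#!/bin/sh
# Sketch only: helper names are made up for this example.

# quorum RF: smallest majority of RF replicas, floor(RF / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# reads_see_latest_write W R RF: succeeds when
# (nodes_written + nodes_read) > replication_factor.
reads_see_latest_write() {
    [ $(( $1 + $2 )) -gt "$3" ]
}

rf=3
w=$(quorum "$rf")   # nodes written at QUORUM
r=$(quorum "$rf")   # nodes read at QUORUM

if reads_see_latest_write "$w" "$r" "$rf"; then
    echo "QUORUM/QUORUM at RF=$rf: $w + $r > $rf, reads overlap the latest write"
fi

# Writing and reading at ONE does not satisfy the formula when RF=3.
if ! reads_see_latest_write 1 1 "$rf"; then
    echo "ONE/ONE at RF=$rf: 1 + 1 <= $rf, stale reads are possible"
fi
```

The same check shows other combinations that satisfy the formula, such as writing to all 3 nodes and reading from 1, which trades slower writes for faster reads.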