Cassandra can be used in many different data management situations. Some of the most common use cases for Cassandra include:
Cassandra is typically not the choice for transactional data that needs per-transaction commit/rollback capabilities. Note that Cassandra does have atomic transactional abilities on a per row/insert basis (but with no rollback capabilities).
The primary difference between Cassandra and Hadoop is that Cassandra targets real-time/operational data, while Hadoop has been designed for batch-based analytic work.
There are many different technical differences between Cassandra and Hadoop, including Cassandra’s underlying data structure (based on Google’s Bigtable), its fault-tolerant, peer-to-peer architecture, multi-data center capabilities, tunable data consistency, all nodes being the same (no concept of a namenode, etc.) and much more.
HBase is an open-source, column-oriented data store modeled after Google Bigtable, and is designed to offer Bigtable-like capabilities on top of data stored in Hadoop. However, while HBase shared the Bigtable design with Cassandra, its foundational architecture is much different.
A Cassandra cluster is much easier to setup and configure than a comparable HBase cluster. HBase’s reliance on the Hadoop namenode equates to there being a single point of failure in HBase, whereas with Cassandra, because all nodes are the same, there is no such issue.
In internal performance tests conducted at DataStax (using the Yahoo Cloud Serving Benchmark – YCSB), Cassandra offered literally 5X better performance in writes and 4X better performance on reads than HBase.
MongoDB is a document-oriented database that is built upon a master-slave/sharding architecture. MongoDB is designed to store/manage collections of JSON-styled documents.
By contrast, Cassandra uses a peer-to-peer, write/read-anywhere styled architecture that is based on a combination of Google BigTable and Amazon Dynamo. This allows Cassandra to avoid the various complications and pitfalls of master/slave and sharding architectures. Moreover, Cassandra offers linear performance increases as new nodes are added to a cluster, scales to terabyte-petabyte data volumes, and has no single point of failure.