With respect to loading external data, Cassandra supplies a load utility called the sstableloader. The sstableloader is able to load flat files into Cassandra, however the files must first be converted into SSTable format.
Most RDBMS’s have an unload utility that allows data to be unloaded to flat files. Once in flat file format, the sstableloader utility can be used to load the data into Cassandra column families.
Some developers write programs to connect to both an RDBMS and Cassandra and move data in that way.
Read operations trigger consistency checks across all replicas for a requested row using a process called read repair. For reads, there are two types of read requests that a coordinator node can send to a replica; a direct read request and a background read repair request. The number of replicas contacted by a direct read request is determined by the read consistency level specified by the client. Background read repair requests are sent to any additional replicas that did not receive a direct request. Read repair requests ensure that the requested row is made consistent on all replicas. Read repair is an optional feature and can be configured per column family.
There are a number of CQL (Cassandra Query Language) drivers and native client libraries available for most all popular development languages (e.g. Java, Ruby, etc.) All drivers and client libraries can be downloaded from: http://www.datastax.com/download/clientdrivers.
The Cassandra data model is a schema-optional, column-oriented data model. This means that, unlike a relational database, you do not need to model all of the columns required by your application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by your application as they are needed without incurring downtime to your application.
Although it is natural to want to compare the Cassandra data model to a relational database, they are really quite different. In a relational database, data is stored in tables and the tables comprising an application are typically related to each other. Data is usually normalized to reduce redundant entries, and tables are joined on common keys to satisfy a given query.
In Cassandra, the keyspace is the container for your application data, similar to a database or schema in a relational database. Inside the keyspace are one or more column family objects, which are analogous to tables. Column families contain columns, and a set of related columns is identified by an application-supplied row key. Each row in a column family is not required to have the same set of columns.
Cassandra does not enforce relationships between column families the way that relational databases do between tables: there are no formal foreign keys in Cassandra, and joining column families at query time is not supported. Each column family has a self-contained set of columns that are intended to be accessed together to satisfy specific queries from your application.