
Cassandra Interview Questions

If you are using a replicated column family (CF) on the ring, having only one seed does not mean a single point of failure. The ring can operate and boot without the seed; it will simply take longer to propagate node status changes across the ring. Multiple seeds are recommended for a production system.

Some JMX operations can't be invoked from jconsole because their buttons are inactive. Jconsole does not support array arguments, so operations that take an array argument can't be invoked through it. You need to write a JMX client to call such operations, or use a JMX monitoring tool that can handle arrays.
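
As a sketch, a minimal client using the standard javax.management API can pass an array where jconsole can't. The StorageService MBean name is Cassandra's real one, but the operation shown (forceTableFlush), the keyspace/column family names, and the JMX port (7199 is the default on recent versions) are illustrative and vary by version, so check your node's MBeans for the exact signature:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ArrayArgJmxClient {
    public static void main(String[] args) throws Exception {
        // 7199 is the default Cassandra JMX port on recent versions;
        // adjust host/port for your node.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService =
                new ObjectName("org.apache.cassandra.db:type=StorageService");
            // forceTableFlush(String, String[]) takes an array argument,
            // which jconsole greys out; invoking it programmatically works.
            // Keyspace/CF names here are placeholders.
            mbs.invoke(storageService,
                "forceTableFlush",
                new Object[] { "MyKeyspace", new String[] { "MyColumnFamily" } },
                new String[] { String.class.getName(), String[].class.getName() });
        } finally {
            connector.close();
        }
    }
}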

The key (and column names) must be under 64 KB.

Routing is O(N) in the size of the key, and querying and updating are O(N log N). In practice these factors are usually dwarfed by other overhead, but some users with very large "natural" keys use hashes of those keys instead to cut down the size.
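
As a sketch of that hashing approach (the class and method names are made up for illustration), you digest the natural key once on write and apply the same digest on every read, so the stored row key stays a fixed 16 bytes no matter how long the natural key is:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class KeyHasher {
    // Replace a very large "natural" key with its fixed-size digest
    // before using it as the row key. MD5 gives a 16-byte key; any
    // stable digest works, as long as reads apply the same one.
    public static byte[] rowKeyFor(String naturalKey) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
    }
}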

We have come across several different but similar sets of symptoms that might match what you're seeing. They might all have the same root cause; it's not clear. One common element is messages like this in dmesg:

INFO: task (some_taskname):(some_pid) blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

No one seems to have had the time to track this down to the real root cause, but upgrading the linux-image package and rebooting your instances does appear to fix it. There is likely a bug in several of the kernel builds distributed by Ubuntu that is fixed in later versions. Versions of linux-image-* known not to have this problem include:

  • linux-image-2.6.38-10-virtual (2.6.38-10.46) (Ubuntu 11.04/Natty Narwhal)
  • linux-image-2.6.35-24-virtual (2.6.35-24.42) (Ubuntu 10.10/Maverick Meerkat)

Uninstalling libjna-java, or recompiling Cassandra with the mlockall() call in CLibrary.tryMlockall() commented out, also makes at least some variants of this problem go away, but that is a much less desirable fix.

If you have more information on the problem and better ways to avoid it, please do update this space.  

Cassandra schema updates assume that schema changes are made one at a time. If you make multiple changes at the same time, some nodes can end up with a different schema than others. (Before 0.7.6, this could also be caused by cluster system clocks being substantially out of sync with each other.)

To fix schema disagreements, you need to force the disagreeing nodes to rebuild their schema. Here's how:

Open the cassandra-cli and run: 'connect localhost/9160;', then 'describe cluster;'. You'll see something like this:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
      75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
      5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]

Note which schemas are in the minority and write down those IPs -- in the example above, 192.168.1.27. Log in to each of those machines and stop the Cassandra service/process by running 'sudo service cassandra stop' or 'kill <pid>'. Then remove the schema* and migration* sstables inside your system keyspace (/var/lib/cassandra/data/system, if you're using the defaults).

After starting Cassandra again, this node will notice the missing information and pull in the correct schema from one of the other nodes.

To confirm everything is on the same schema, verify that 'describe cluster;' only returns one schema version. 
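
If you'd rather check agreement programmatically, something like the sketch below works against the Thrift API of this era: describe_schema_versions() reports the same version-to-nodes mapping that 'describe cluster;' shows. The host and port are assumptions to adjust for your cluster:

import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SchemaAgreementCheck {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        try {
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            // Maps each schema version UUID to the nodes reporting it.
            // Down nodes may be grouped under an "UNREACHABLE" pseudo-version.
            Map<String, List<String>> versions = client.describe_schema_versions();
            for (Map.Entry<String, List<String>> entry : versions.entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
            versions.remove("UNREACHABLE");
            System.out.println(versions.size() == 1
                ? "All reachable nodes agree on the schema."
                : "Schema disagreement among reachable nodes!");
        } finally {
            transport.close();
        }
    }
}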
