At first I wanted to name this article “My personal list of grievances against Apache Cassandra”, but I decided to take a more positive approach.
Only Java developers need apply
Sure, you can use Cassandra with non-Java clients, but the reality is that administering and maintaining Cassandra itself requires significant knowledge of Java, and there is no way around it. The engineer must know Java garbage collection in great detail, and this requires significant experience. As for myself, with twenty-plus years of Java experience, I still don’t know how to prevent long GC pauses in Cassandra completely. To get there, I would have no choice but to run Cassandra in a JVM profiler and familiarize myself with its source code.
Point is, Cassandra needs to be self-tuning: it should dynamically adjust itself to the workload. It should not require above-average Java expertise to maintain and tune. Ideally it shouldn’t require any Java expertise at all.
I don’t want to become an ops engineer
Cassandra is far from zero-maintenance. It requires a regular “repair” process to ensure consistency and to make sure that deleted items don’t come back. Compactions can temporarily double disk space utilization. And if you have disk space issues, deleting data doesn’t mean you reclaim the space, because deletes are written as tombstones that are only purged by a later compaction. As I said above, Cassandra needs to be more self-tuning: it should dynamically adjust itself to the workload and not require babysitting.
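To make the compaction point concrete, here is a rough back-of-the-envelope sketch in Python. The function name and the default factor of 2 are my own illustration, not anything Cassandra ships; the factor simply encodes "can temporarily double", and the real peak depends on the compaction strategy and on how large your biggest tables are.

```python
def compaction_headroom_gb(disk_capacity_gb: float,
                           live_data_gb: float,
                           worst_case_factor: float = 2.0) -> float:
    """Free space (in GB) left at the peak of a worst-case compaction.

    worst_case_factor is an assumption: 2.0 models the case where the
    data being compacted temporarily exists twice on disk.
    A negative result means the compaction would not fit.
    """
    peak_usage_gb = live_data_gb * worst_case_factor
    return disk_capacity_gb - peak_usage_gb


# A 1000 GB disk holding 400 GB of data still has room at the peak.
print(compaction_headroom_gb(1000, 400))

# The same disk holding 600 GB does not, even though it is only 60% full.
print(compaction_headroom_gb(1000, 600))
```

In other words, under this assumption a disk that looks comfortably half-empty can already be too full to compact safely, which is exactly the kind of thing I don't want to have to keep track of by hand.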
I want to see a proper cloud version of Cassandra
Today’s cloud environments like Google App Engine, Heroku and even Amazon give you standard JDBC access to a SQL database without requiring a developer to become an ops engineer or a DBA. To put it bluntly, I couldn’t care less about the nitty-gritty details of cassandra.yaml. I want to use Cassandra to build applications, not spend an eternity tuning YAML files. I want Thrift and CQL access to a cloud Cassandra cluster maintained by someone else. That’s it.