Banking Technology is in Dire Need of Standardization and Openness

Old Bank. Photo credit: Toby Dickens

A few weeks ago Investors Bank in New Jersey overhauled their systems. As a result, Mint became incompatible with Investors, and Investors customers could no longer view their accounts in Mint. There is anecdotal evidence1 that Mint uses the Yodlee platform2 for the integration. As it turns out, there is no standard mechanism by which external applications can work with banks. Yodlee’s own page states:

Through a proprietary system of direct data access and HTML parsing, Yodlee delivers financial data from more than 14,000 sources, and growing.

While the technology world is moving towards open APIs and standard authentication protocols3, the banking industry continues to rely on proprietary systems and HTML screen scraping. It seems that even using the Yodlee platform it is not possible to integrate with banks in any standard way. Each time a bank updates its systems, a team of engineers at Intuit must update integration scripts to ensure their customers can continue to use Mint with that bank4:

When a financial institution updates their system, our engineers have to rewrite the script on our end to match so that we can continue supporting them. Typically, they are notified when this is going to happen and can get it updated pretty quickly. However, please open a ticket by filling out our Contact Mint form to make sure this is on their radar and they can get the script updated as soon as possible.

The way Mint integrates with banks is by asking users to enter and store their bank credentials. Mint expects us to trust its security5. The technology industry, however, long ago established a protocol by which an application (like Mint) needing access to an outside resource (a user’s bank account) does not need to capture the user’s credentials. It is called OAuth6.

Had banks implemented OAuth, Mint would use the protocol to obtain authorization from the user to act against the bank’s API on the user’s behalf. In the event of a security breach at Mint, the banks could invalidate all tokens and disable all further access by Mint. Users would gain control over which applications may access their data and which may not.
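To make the idea concrete, here is a minimal, purely illustrative Python sketch of such a token-based flow. The class and method names are hypothetical stand-ins, not any real bank API or OAuth library:

```python
# Hypothetical sketch of the token flow described above -- not a real bank API.
# An OAuth-style authorization server issues revocable tokens so an aggregator
# (like Mint) never sees the user's bank credentials.
import secrets

class AuthorizationServer:
    def __init__(self):
        self.tokens = {}  # token -> (user, client_app)

    def authorize(self, user, client_app):
        """User grants client_app access; a token is issued, no password is shared."""
        token = secrets.token_hex(16)
        self.tokens[token] = (user, client_app)
        return token

    def is_valid(self, token):
        return token in self.tokens

    def revoke_client(self, client_app):
        """On a breach at the client, the bank invalidates every token at once."""
        self.tokens = {t: v for t, v in self.tokens.items() if v[1] != client_app}

bank = AuthorizationServer()
token = bank.authorize("alice", "mint")
assert bank.is_valid(token)       # the aggregator can call the bank's API
bank.revoke_client("mint")        # breach at the aggregator: kill all its tokens
assert not bank.is_valid(token)   # access is cut off; the user's password never changed
```

The key property is the last line: the bank severs access without the user ever having to change a password, because no password was shared in the first place.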

In 2015 there is no need for HTML screen scraping or proprietary technologies. Would the Yodlee platform even be around if banks used OAuth and a standard API7? This is an industry in dire need of innovation. Banks need to learn how to recruit and retain top talent from technology companies, not the other way around. They need to look beyond their traditional, well-accepted consulting vendors and service providers and think outside the box, especially considering that the technology challenges they face have already been solved by others.

I Stand With Ahmed

This week a precocious 14-year-old immigrant, Ahmed Mohamed, wanted to impress his teachers with a clock he made at home. He built it into one of those pencil boxes you buy at a craft store that look like a small briefcase. The teachers and school officials thought it looked suspicious and called the police. The police proceeded to arrest him as a terrorism suspect1.

This is a technology blog, so I won’t get into the topics of politics, racism, and terrorism. Let’s even set aside the seemingly incompetent reaction of Irving, TX law enforcement, who did not evacuate the school. Instead I am going to focus on the topic of STEM education in the United States.

My 8-year-old daughter building an Arduino LCD circuit

It just so happened that a few days prior to this incident my 8-year-old daughter asked if she could bring the Arduino LCD circuit I had built with her to school to show her friends and teachers. It had not even occurred to me that an elementary school teacher might think a circuit with batteries, wires, and a display is a bomb, and that it might result in her arrest.

To see the sorry state of American STEM education, all one needs to do is take a tour of top engineering universities and visit science and engineering classrooms. A keen observer will find that the majority of students are immigrants. These students have multiple advantages over American students: they come from cultures that value knowledge and education, families that invest in their children’s futures, and teachers who can tell a bomb from a clock.

Of course, what starts in universities transfers to workplaces. A visit to any software company or even an IT department just about anywhere will reveal that the majority of developers are immigrants as well. They come from India, China, Ukraine, Belarus, Russia, and elsewhere in Asia and Europe.

Meanwhile, American politicians draw crowds at campaign rallies by fanning the flames of fear over American jobs2. The reality, however, is that a much bigger threat to the future of American middle-class jobs starts in schools. When teachers, school officials, and law enforcement can’t tell the difference between an explosive and a homemade clock, how can American kids look up to them?

Setting Up Cross-Region Replication of AWS RDS for PostgreSQL

As of today AWS RDS for PostgreSQL1 does not offer cross-region replication. Short of switching to one of the RDS offerings that do support it, there are a few options to consider.

1. Custom Configured EC2 Instances with Master-Slave Replication

Master/slave light rail

This setup sacrifices the benefits of the AWS RDS service in exchange for greater control over replication settings. One region hosts the master PostgreSQL host, and another region hosts a slave, which can also act as a read replica2.


Pros:

  • Greater control over replication settings.


Cons:

  • Gives up all the advantages of running in the AWS RDS environment.
  • Writes can only be performed in the master region.
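For illustration, the settings such a setup relies on might look like the following. This is a sketch for the PostgreSQL 9.x streaming replication of that era; the hostname is a placeholder, and the exact values depend on your version and workload:

```
# master: postgresql.conf (PostgreSQL 9.x streaming replication)
wal_level = hot_standby        # ship enough WAL for a hot standby
max_wal_senders = 3            # allow replication connections
wal_keep_segments = 64         # retain WAL for lagging slaves

# slave: recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'

# slave: postgresql.conf
hot_standby = on               # serve read-only queries on the slave
```

With `hot_standby = on` the slave region doubles as the read replica mentioned above.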

2. Software-defined Two-phase Commit

Commitment. Photo credit: Ed Schipul

In this setup there are two independent AWS RDS instances. The application, however, utilizes a two-phase commit protocol3 to guarantee that all writes make it into both databases in a transactional fashion.


Pros:

  • Simple configuration.
  • Does not sacrifice any of the AWS RDS advantages.


Cons:

  • Responsibility for ensuring that writes make it into all regions falls on the application itself.
  • Increased application code complexity.
  • Write performance is sacrificed since all regional databases must participate synchronously.
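The protocol itself can be sketched in a few lines. The in-memory Participant below is a hypothetical stand-in for a regional database; a real implementation would issue PREPARE TRANSACTION / COMMIT PREPARED against each PostgreSQL instance instead:

```python
# Illustrative in-memory sketch of the two-phase commit idea.
class Participant:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.staged = None

    def prepare(self, write):
        self.staged = write          # phase 1: stage the write durably
        return True                  # vote "yes"

    def commit(self):
        key, value = self.staged     # phase 2: make the write visible
        self.data[key] = value
        self.staged = None

    def rollback(self):
        self.staged = None

def two_phase_commit(participants, write):
    # Phase 1: every region must vote yes before anyone commits.
    if all(p.prepare(write) for p in participants):
        for p in participants:       # Phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:           # any "no" vote aborts everywhere
        p.rollback()
    return False

us_east = Participant("us-east-1")
eu_west = Participant("eu-west-1")
assert two_phase_commit([us_east, eu_west], ("user:42", "active"))
assert us_east.data == eu_west.data == {"user:42": "active"}
```

Note how the synchronous phase 1 round trip to every region is exactly where the write-performance cost listed above comes from.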

3. Asynchronous Writers

Fanning Out. Photo credit: Tim Haynes

In this approach each region hosts an asynchronous writer that listens on an SQS queue4. All writes are published to an SNS topic configured with all of the regional writer queues as subscriptions5. When the application running in any of the regions wants to write to the database, it publishes a message to this SNS topic, which then fans it out to all of the regional SQS queues.


Pros:

  • Simple configuration.
  • Does not sacrifice any of the AWS RDS advantages.
  • Does not sacrifice write performance.


Cons:

  • Subject to software bugs.
  • Subject to SNS and SQS bugs and outages.
  • No guarantee of consistency.
  • Requires a mechanism for periodically reconciling differences between regions.
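The flow can be illustrated with an in-memory sketch. The Topic and Queue objects below are hypothetical stand-ins for SNS and the regional SQS queues, and the drain function plays the role of a regional writer process:

```python
# In-memory sketch of the fan-out pattern -- in practice the topic is SNS,
# the queues are SQS, and each writer runs as a process in its own region.
from queue import Queue

class Topic:
    def __init__(self):
        self.queues = []

    def subscribe(self, q):
        self.queues.append(q)

    def publish(self, message):
        for q in self.queues:        # SNS fans the message out to every queue
            q.put(message)

def drain_writes(q, db):
    """A regional writer: pull messages off the queue and apply them."""
    while not q.empty():
        key, value = q.get()
        db[key] = value

topic = Topic()
us_queue, eu_queue = Queue(), Queue()
topic.subscribe(us_queue)
topic.subscribe(eu_queue)

topic.publish(("user:42", "active"))   # the application writes once...
us_db, eu_db = {}, {}
drain_writes(us_queue, us_db)          # ...and each regional writer applies it
drain_writes(eu_queue, eu_db)
assert us_db == eu_db == {"user:42": "active"}
```

Because each writer applies messages independently and asynchronously, the regions converge only eventually, which is where the reconciliation requirement above comes from.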

Top Ten Differences Between ActiveMQ and Amazon SQS

Taxi queue at LaGuardia. Photo credit: Scott Beale / Laughing Squid

1. Persistence and Durability


ActiveMQ:

Depending on the configuration, ActiveMQ can maintain a message journal1. Each message is first written into the journal before being shipped to consumers. Ultimately, the number of messages that can be persisted is constrained by the available disk capacity.


SQS:

Amazon SQS stores messages in distributed storage across all availability zones in a given region2. Each message can be up to 256KB in size, and SQS can store an unlimited number of messages across an unlimited number of queues3.

2. Redundancy


ActiveMQ:

ActiveMQ offers a number of different configuration options for clustering4:

  • Broker Clusters and Networks of Brokers: producers on each broker can reach consumers across the entire cluster. This is most appropriate for a use case such as delivering market data to all consumers across the entire network (JMS topics). It is not a truly redundant configuration – failure of a single broker results in the loss of that broker’s messages.
  • Master-Slave: in this configuration two or more ActiveMQ brokers use some form of shared5 storage for the journal. Prior to ActiveMQ 5.9 one had to rely either on a shared file system such as a SAN or on a SQL database – which simply shifted the replication responsibility to a different technology. Starting with ActiveMQ 5.9 there is an option to use Replicated LevelDB with ZooKeeper6.


SQS:

SQS stores messages in redundant storage across all availability zones in a given region. To achieve high levels of redundancy and to guarantee that no message is ever lost, it relaxes some of the properties of a queueing system7. In practice this means that on rare occasions messages may arrive out of order, and the same message may be delivered more than once.

3. Graceful Failure


ActiveMQ:

In a master-slave8 configuration all clients fail over to the next available slave and continue processing messages. In any other configuration, all processing stops until the client is able to reconnect to its broker.

In the event of high memory, temp storage, or journal space usage, ActiveMQ can pause producers until space frees up. This creates the potential for a deadlock where some consumers also act as publishers and become unable to either publish or consume messages. There is a risk of the entire system locking up until space is freed or the configuration is changed.


SQS:

When your application attempts to retrieve messages from a queue, SQS picks a subset of its servers and returns messages from those servers. This means that if a server is unavailable, a message stored on it may not be returned by a given request – but it will be by subsequent requests. This is mitigated to a certain extent by the use of long polling9.

4. Message Order and Delivery Guarantee


ActiveMQ:

Messages are delivered in the order they are sent10. When there are multiple consumers on the same queue some of the ordering may be lost – however, that is the case with any queue that has multiple consumers, and it is exacerbated by clustering configurations.


SQS:

In order to achieve high levels of scalability and redundancy, SQS relaxes some of the guarantees of a traditional queuing system. On rare occasions messages may be delivered out of order or more than once, but they will be delivered and no message will be lost. Applications sensitive to duplicate or out-of-order processing need to implement logic to cover these scenarios11.
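One common way to implement that logic is to deduplicate on a message id assigned by the producer. A minimal sketch, using a hypothetical message format rather than the SQS SDK:

```python
# Sketch of a consumer that tolerates at-least-once, loosely ordered delivery
# by deduplicating on a producer-assigned message id.
processed_ids = set()
results = []

def handle(message):
    """Process each logical message exactly once, even if delivered twice."""
    if message["id"] in processed_ids:
        return                        # duplicate delivery: ignore
    processed_ids.add(message["id"])
    results.append(message["body"])

# The queue may deliver m1 twice and out of order relative to m2:
handle({"id": "m2", "body": "second"})
handle({"id": "m1", "body": "first"})
handle({"id": "m1", "body": "first"})   # duplicate delivery
assert sorted(results) == ["first", "second"]
assert len(results) == 2                # each message processed once
```

In a real system the set of processed ids would live in durable storage shared by all consumers, not in process memory.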

5. Monitoring and Utility API

This may seem off topic but I find it necessary to mention. It is often useful, from an application standpoint, to perform various utility functions against queues. An application may measure the rate of dequeuing, calculate the number of pending messages, and self-optimize.

JMS does not offer an API to retrieve this information; ActiveMQ, however, does expose some of it via JMX12. SQS offers metrics and a utility API as part of the SDK.

6. Standards Compliance


ActiveMQ:

ActiveMQ conforms to the JMS API specification in the Java universe and has drivers for other platforms and API specifications.


SQS:

SQS uses an HTTP REST protocol and a proprietary SDK. However, Amazon does offer a JMS implementation on top of the SQS SDK13.

7. Push Messages as They Become Available


ActiveMQ:

The default ActiveMQ protocol is based on a socket connection that allows messages to be pushed to the consumer as soon as they are published. With JMS one can implement the MessageListener14 interface and receive messages as they arrive.


SQS:

SQS does not natively support push; one has to poll to retrieve messages. This is a minor inconvenience since Amazon provides both long polling and a JMS implementation. Various approaches exist to mimic push behavior, including one I described in my post on “Guaranteeing Delivery of Messages with Amazon SQS.”15

8. Scalability and Performance


ActiveMQ:

ActiveMQ can handle tens of thousands of messages per second on a single broker16. A great deal of tuning affects ActiveMQ performance, including host computer capacity, network topology, and so on. Scalability is achieved either vertically, by upgrading broker hardware, or horizontally, by expanding the broker cluster.


SQS:

SQS does not return from a SendMessage request until the message has been successfully stored, and as a result it has a request-response latency of around 20ms. At first glance this may suggest that it cannot handle more than a few hundred messages per second.

However, when dealing with a distributed queue like SQS one has to distinguish between latency and throughput17. SQS scales horizontally: by using multiple threads it is possible to increase message throughput almost indefinitely.
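A back-of-the-envelope calculation shows why latency does not cap throughput when requests are issued concurrently:

```python
# Back-of-the-envelope math behind the latency-vs-throughput distinction.
latency_ms = 20                        # ~20ms per SendMessage round trip

# A single thread is latency-bound: it issues one request at a time.
one_thread_rate = 1000 // latency_ms
assert one_thread_rate == 50           # ~50 messages/second

# SQS scales horizontally: with many requests in flight, the rates add up.
threads = 100
combined_rate = threads * 1000 // latency_ms
assert combined_rate == 5000           # ~5,000 messages/second
```

Each individual message still takes ~20ms to send; there are simply many of them in flight at once.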

9. Setup, Operations and Support


ActiveMQ:

ActiveMQ is like any other software that one has to install, configure, monitor, and maintain. Configuring and tuning ActiveMQ requires a thorough understanding of hundreds of different settings18. ActiveMQ itself is written in Java, so an understanding of Java topics like memory management and garbage collection is helpful.


SQS:

As long as you are operating in the AWS environment there is nothing to configure, install, or maintain. SQS is a completely managed service.

10. Costs


ActiveMQ:

ActiveMQ needs hosts to run on and storage it can use, and someone has to support and maintain it. The costs of ActiveMQ are a function of the resources it needs to run and the time it takes to tune, configure, and maintain it. These costs persist during periods of low utilization since it doesn’t scale automatically.


SQS:

SQS is priced as a function of the number of requests and the amount of data transferred. You are only charged for what you consume, so during periods of low utilization the costs are lower.


The discussion in this post boils down to the choice between a fully managed cloud service and an installable software product, just like DynamoDB vs Cassandra19. A managed service simplifies development and maintenance at the expense of standards compliance and customization options.

We Live in a Mobile Device Notification Hell

Notification Hell

On a hot Sunday afternoon I found myself walking around Menlo Park Mall in central NJ with my wife and kids. My phone vibrated because someone’s automatic spambot had just faved a dozen of my photos on Flickr1. The Bitstrips app wanted to let me know that I had new Bitstrips waiting for me. 10App demanded my attention, reminding me to make a YouTube video of what my kids had done that day.

As we walked past the Verizon store, my phone got all excited telling me about all the things I could buy there. The Apple Store wanted to remind me that my order was waiting for pick-up, even though I had picked it up a week ago.

Flipboard decided to notify me that a barely dressed coffee aficionado interior decorator started following me. I have hundreds of messages unread in my personal email account and dozens of LinkedIn notifications of recruiters telling me about “Urgent Java openings” that have nothing to do with my career goals.

When I got back home my iPad’s screen was filled with the exact same notifications my iPhone had told me about, as if the iPad were unaware it is owned by the same person and that I had already acknowledged them. To make matters worse, my MacBook’s notification screen was repeating them as well.

We live in a smartphone notification hell, and every year it gets worse. Our presumably smart devices are incapable of differentiating between what is important and what is not. Social sharing apps like Facebook, Twitter, and Instagram want our constant attention. Flickr is now a spam-bot haven: any time I post a picture, at any hour of the day, it is immediately favorited by the same 3 people who have millions of favorites in their photostreams.

No wonder I have no desire to buy a smart-watch2 and I keep my iPhone permanently in “Do Not Disturb” mode. Why would I want to add yet another device that will constantly demand my attention?

Buried in their smartphones

While I miss the days of simple flip-phones, I can’t deny the convenience of smart mobile devices. They allow us to work where and when we want. They help us get the best price on the products we shop for. Yet, I would love nothing more than to stop all the meaningless blinking, beeping, and flashing.

We need intelligence built into mobile push notifications. While it is possible to selectively enable or disable notifications per app, that is simply not enough. When I see a notification I want to swipe it and say “It’s not important,” and have my device learn over time and stop alerting me about it3. That learning should then propagate to all of my devices.

Once I have acknowledged a notification there is no need for my other devices to tell me about it again. There is nothing stopping my MacBook, iPhone, and iPad from knowing that I already read my brother’s Facebook update. They can decrement the notification counters and remove that notification from their respective screens.

There used to be a joke in software engineering circles that a software platform reaches the end of its natural lifecycle when it becomes capable of browsing the web. In 2015 it seems that an app loses its usefulness the moment it adds social sharing and public APIs. Once social sharing is enabled and public APIs are published, the app becomes a medium for spam. Consider all the outfits that let you “buy” Twitter, Instagram, Flickr, or Facebook followers.

Social Media

I used to love Flipboard and used it daily to read the news. Then one day Flipboard allowed “likes” and “follows”. Within days I went from zero followers to a few dozen, all of them skinny women calling themselves “internet mavens,” “social media aficionados,” and “interior decorators,” who were somehow all interested in Big Data, international politics, and stock market investments. I uninstalled Flipboard until I read somewhere that they now allow private profiles that one has to opt into.

It is not complicated for social media platforms to tell who is a bot and who is not. On Flickr, for example, an account with a million favorites but only a couple hundred photos that haven’t been updated in a couple of years is a spam bot4. These platforms can impose API limits: it is simply not humanly possible for someone to have a million favorite photos on Flickr.

The vast majority of us are not doctors5, military, police, or firefighters: we have no real work emergencies. Most of us do not deal with life-and-death situations as part of our jobs. In software engineering, what we typically call emergencies are self-inflicted, manufactured crises. And yet, with the proliferation of smart mobile devices, we are expected to be in constant contact with our work.

We need enterprise apps on our devices to know what’s important and to learn what is not. Enterprise apps should not be constantly notifying us of “work” we would rather not be doing in our spare time. Instead, they should be reminding us of our goals and helping us succeed.

  2. Why I am not getting an Apple Watch 
  3. I am thinking something along the lines of a Bayesian-network based spam filter. 
  4. It is disappointing to see some well-known photographers utilize the services of spammers. 
  5. Software Engineers Are Not Doctors 

What Every College Computer Science Freshman Should Know

The Clarkson School bridging year program class of 2015 graduates
In a few weeks new college freshmen will begin their classes. Some of them will choose to pursue a degree in Computer Science. Over the course of four years in college they will be surrounded by like-minded people who are at least as smart as they are and just as interested in computers.

When they enter the job market they will compete for the same jobs with colleagues who do not have computer science degrees or any STEM degree at all. This group learned computer programming as a way to advance their careers. Anya Kamenetz of NPR writes1:

Virtually unknown just four years ago, today at least 50 of these programs have sprung up around the country and overseas. Collectively, the sector has taken in an estimated $73 million in tuition since 2011.

And the top programs say they are placing the vast majority of their graduates into jobs earning just under six figures in a rapidly expanding field — filling a need for practical, hands-on skills that traditional college programs, in many cases, don’t.

“The main portion [that attracted me] was the empowerment — being able to create something in terms of technology,” says Frausto, a slight man in a baseball cap with a mustache waxed straight out to sharp points. “That, and obtaining a trade.”

Coder boot camps are poised to get much, much bigger. This past summer, Kaplan, one of the largest education companies, acquired Dev Bootcamp, where Frausto is enrolled. These programs constitute nothing less than a new business model for for-profit vocational education. But their creators believe their greatest innovation may actually be in the realm of learning itself.

Software is so pervasive that every successful professional needs to know how to program. Whether they are an accountant using Excel, a statistician using Python, or a physicist using C, they are all writing software that solves some problem. The vast majority of software is not downloaded by consumers from an App Store or bundled with a computer as part of the operating system. The vast majority of software solves some mind-numbing business problem with no technical complexity that requires a computer science degree. In fact, much of that software is better off written by a business user who took a programming class. Patrick McKenzie writes2:

Most software is not sold in boxes, available on the Internet, or downloaded from the App Store. Most software is boring one-off applications in corporations, under-girding every imaginable facet of the global economy. It tracks expenses, it optimizes shipping costs, it assists the accounting department in preparing projections, it helps design new widgets, it prices insurance policies, it flags orders for manual review by the fraud department, etc etc. Software solves business problems. Software often solves business problems despite being soul-crushingly boring and of minimal technical complexity.

There is a mismatch3 between the expectations of the business world and those of computer science graduates. Many computer science graduates who expect to end up at Google, Facebook, or Amazon are bound to be disappointed – either because they won’t be able to, or because the actual jobs they are assigned to do will be as pragmatic as anywhere else4.

Ms. Kamenetz writes1:

Patrick Sarnacke has hired many Dev Bootcamp and other “boot camp” graduates at ThoughtWorks. It’s a global software consultancy headquartered in Chicago, and Sarnacke is head of the associate consultant program.

“Just because someone has a four-year computer science degree doesn’t mean they’re going to be great coders in the business world,” he says. “A lot of traditional programs aren’t teaching the skills people need.”

If someone can become a marketable computer programmer in just a few weeks, why would they go through a four-year computer science program that seems like it is not preparing them for the real world? What’s in it for them? How do they differentiate themselves in the job market?

The problem is in the definitions of the terms ‘coder’ and ‘programmer.’ In his visionary work Harry Braverman wrote back in the 1970s5:

A great deal of the work of programming was routine and could be delegated to cheaper employees. Thus the designation of “programmer” has by this time become somewhat ambiguous, and can be applied to expert program analysts who grasp the rationale of the systems they work on, as well as to program coders who take as their materials the pre-digested instructions for the system or subsystem and simply translate them mechanically into specialized terminology.

The training for this latter work occupies no more than a few months, and peak performance is realized within a one-to two-year period. In accordance with the logic of the capitalist division of labor, most programmers have been reduced to this level of work.5

What Braverman is warning us about is that a typical enterprise programmer or coder implements specific requirements and designs given to them. They may be constrained by the IT department in various ways with regard to which tools to use and which platforms to program for. They are likely to deal with an existing system that was built a decade or more ago, their job being to fix bugs and shoehorn in new features. They must work within the constraints of a strategic direction set by their management, in which they have little say. Their role in the company is under constant scrutiny and comparison with foreign outsourcing companies.

By the time today’s Computer Science freshmen enter the job market, every person in the business world will be able to code a computer program. Programming languages and platforms are becoming the great equalizers of skills, allowing any business professional with a few weeks of training to configure their own enterprise apps and write their own code in a language that makes the most sense to them6. In other words, the profession of “programmer” as a standalone role will be gone. Becoming a “programmer” should not be the goal of anyone entering a Computer Science program.

The good news is that there are areas where a Computer Science background is the differentiating factor. By training, Computer Science graduates are better positioned for jobs where a STEM background is valuable3. Their ability to learn and adopt new technologies and cross-pollinate ideas from other fields cannot be learned in a six-week program.

At the top of the list of highly lucrative Computer Science opportunities is any sort of financial trading and analytics. Wall Street is continuously seeking ways to make more money and lose less, while doing it faster and more efficiently than its competitors. Wall Street quants can earn mid-to-upper six figures, and they typically come from STEM backgrounds. Many of the smaller Wall Street companies also have an atmosphere meant to attract top talent from the tech industry, and many also offer services and software to outside customers. Wall Street jobs can be stressful – companies can rise and fall with technology7.

Software companies and startups are also a natural fit for a Computer Science graduate. With very few exceptions, the products they make serve some line of business. The challenge of making business software, however, is that rather than building internal enterprise apps they have to build software that can be used by many customers. That requires a scientific approach to software development, engineering discipline, and abstract thinking. A six-week course in JavaScript does not teach that.

Technology consultancies serve customers who are either not in a position to have their own technology team or do not have the in-house expertise for a challenging project. In addition to STEM skills, these jobs require the ability to present ideas to decision makers. A Master’s program graduate with teaching and publishing experience, for instance, is well positioned for a successful career at a technology consulting firm.

Finally, there has never been a better time to build a software product that many customers will pay for. Whereas in the past getting an enterprise application into a company meant getting past CTOs, CIOs, CEOs, and IT directors, the cloud offers an opportunity to appeal directly to business users. A Salesforce user, for example, does not need permission from their IT department to purchase third-party software that makes them more productive. Likewise, anyone using Google business apps or Microsoft Office365 can do the same. The barrier to entry has been lowered, and now anyone with an idea, a laptop, and a skillset can build a useful product that millions of people will use and pay for.

  1. Twelve Weeks to a Six Figure Job 
  2. Don’t Call Yourself a Programmer 
  3. Attracting STEM Graduates to Traditional Enterprise IT 
  4. I have a personal story of an internship at IBM back in the 1990s. I was really excited to work for IBM, but when I started my job I realized that what I was working on was a boring old internal financial data warehouse and reporting application. I was bored out of my mind for six month but I learned an important lesson: that internal financial application was as important to IBM as any of their customer facing products. 
  5. “Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century” by Harry Braverman, p. 227 
  6. Look no further than SQL, various SAP products, and Salesforce 
  7. Thoughts on Wall St. Technology 

Ten Questions to Consider Before Choosing Cassandra


I spent several years working at a financial company in the Pavonia-Newport area of Jersey City, NJ. On my way to work, over lunch breaks, or any time I needed some fresh air, I’d take my camera with me and take pictures: the sights, the views, and the people of that neighborhood, the commuters, workers, and residents.
Things to Consider.

1. Do you know what your queries will look like?

In traditional SQL you design your data model to represent your business objects. Your queries can then evolve over time and can be ad-hoc. You can even create views, materialized or otherwise, to facilitate even more complex analytical queries.

Cassandra does not offer the flexibility of traditional SQL. While your data model can evolve over time and you are not tied to a hard schema, you have to design your data model around the queries you plan to run. The problem with that approach is that it is very rare for end users to say with certainty what they want. Over time their needs change and so do the queries they want to run. Changes to the storage model in Cassandra involve running massive data reloads.
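As a hypothetical illustration of query-first modeling, suppose the one query you know you must serve is “fetch the most recent events for a given user.” The table (names invented for the example) is designed around exactly that query:

```
-- Hypothetical example: a table shaped around one known query.
CREATE TABLE events_by_user (
    user_id    uuid,
    event_time timeuuid,
    payload    text,
    PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Served efficiently by the partition key and clustering order:
SELECT * FROM events_by_user WHERE user_id = ? LIMIT 100;

-- A different question ("all events of type X across all users") needs a
-- second table populated at write time, not an ad-hoc query against this one.
```

When the users later decide they also need the second question answered, adding that table means reloading existing data into it, which is the massive-reload cost mentioned above.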

2. What is your team’s skillset?

Consider the human factor. Ability to get to the application’s data and build reports and run analytical queries is critical to developer and business user productivity. This is often overlooked but it can mean a difference between delivering features in days vs. weeks.

Not all developers are created equal. SQL is a widely accepted and simple query language that business users should be capable of learning and using. Yet, many have trouble with even the simplest SQL. Introducing a whole new mechanism for querying their data, even one as deceptively similar to SQL as CQL, could be a problem.

Traditional SQL databases have well-established libraries and tool sets. While other platforms are supported, Java is the primary target platform for Cassandra. Cassandra itself is written in Java, and the vast majority of tools and client libraries for it are written in Java. Running and operating a Cassandra cluster requires an understanding of JVM intricacies, such as garbage collection.

3. What is your anticipated amount of data?

Consider whether the amount of data you expect to store justifies an entirely new way of thinking about your data. Cassandra was conceived at a time when multi-core SSD-backed servers were expensive and Amazon EC2 was in its infancy.

A single modern multi-core SSD-backed server running PostgreSQL or MySQL can offer a good balance of performance and query flexibility. AWS RDS offers storage of up to 3 terabytes. Both PostgreSQL and MySQL can easily handle tables hundreds of gigabytes in size.

4. What are your anticipated write performance requirements?

In Cassandra, all writes are O(1) and do not require a read-before-write. You write your data, it is appended to the commit log, and control is returned to your application. The data eventually becomes available for reads, usually very quickly.

In SQL, things are a little more complex. If a table is indexed, every write requires a lookup in each index, which in most cases is an O(log n) operation, where n is the number of unique values in the index. Over time this slows down writes, which end up costing roughly as much as reads.
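A toy illustration of the difference, assuming nothing about either engine beyond the text above: an append-only commit log is a constant-time write, while keeping an index ordered requires a search on every write.

```python
import bisect

# Cassandra-style write path: append to a commit log. O(1) per write.
commit_log = []

# SQL-style indexed write path: every write must also place the key
# into a sorted index, which costs a binary search -- O(log n) --
# plus the work of keeping the structure ordered.
index = []

for key in [42, 7, 99, 7, 13]:
    commit_log.append(key)     # blind append, no read first
    bisect.insort(index, key)  # read-before-write: find the slot

print(commit_log)  # [42, 7, 99, 7, 13] -- arrival order
print(index)       # [7, 7, 13, 42, 99] -- sorted, costlier to maintain
```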

5. What type of data do you plan on storing?

Cassandra has some advantages over traditional SQL when it comes to storing certain types of data, and ordered sets and logs are one such case. Set-style data structures in SQL require a read-before-write (an upsert). Logs that will be analyzed later with another set of tools can take advantage of Cassandra’s high write throughput.
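A small SQLite sketch of that distinction (table names invented): a set forces the engine to check each key before writing, while a log is a blind append, which is the pattern Cassandra’s write path is optimized for.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# A set in SQL: the PRIMARY KEY forces the engine to look up each key
# before writing (a read-before-write) to enforce uniqueness.
db.execute("CREATE TABLE tags (tag TEXT PRIMARY KEY)")
for t in ["red", "blue", "red", "green", "blue"]:
    db.execute("INSERT OR IGNORE INTO tags VALUES (?)", (t,))
set_size = db.execute("SELECT COUNT(*) FROM tags").fetchone()[0]

# A log: nothing to enforce, so every write is a blind append.
db.execute("CREATE TABLE log (line TEXT)")
for t in ["red", "blue", "red", "green", "blue"]:
    db.execute("INSERT INTO log VALUES (?)", (t,))
log_size = db.execute("SELECT COUNT(*) FROM log").fetchone()[0]

print(set_size, log_size)  # 3 5
```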

One has to be cautious about frequently updated data in Cassandra. Compactions are unpredictable, and repeatedly updating your data will introduce read performance penalties and increase disk storage costs.

6. What are your anticipated read performance requirements?

Depending on the type of data you are storing, Cassandra may or may not hold advantages. Cassandra is eventually consistent: a read issued immediately after a write may not reflect that write, although replicas typically converge very quickly. Workloads that read data right after writing it and expect to see exactly what was written are therefore not a good fit for Cassandra. If that is your requirement, you need an ACID-capable database.
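A toy simulation of the read-after-write hazard (purely illustrative; real replication is far more involved than a queue between two dicts):

```python
# Toy eventually-consistent store: writes land on one replica and
# reach the other only when "replication" runs.
replica_a, replica_b = {}, {}
pending = []  # replication queue

def write(key, value):
    replica_a[key] = value
    pending.append((key, value))  # shipped to replica_b later

def replicate():
    while pending:
        k, v = pending.pop(0)
        replica_b[k] = v

write("balance", 100)
stale = replica_b.get("balance")  # read-after-write from a replica
replicate()                       # replicas converge "eventually"
fresh = replica_b.get("balance")
print(stale, fresh)  # None 100
```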

Any data that can be referred to by its primary key at all times is a good fit for Cassandra. A traditional SQL database must traverse an index before it takes you to the right row, whereas Cassandra primary key lookups are essentially O(1) operations and work equally fast across extremely large data sets. Secondary indices, on the other hand, present a challenge.
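A rough sketch of why a primary-key lookup stays fast as data grows (the hash-modulo routing here is a simplification; Cassandra actually uses a partitioner over a token ring):

```python
# Three "nodes", each a hash table. The partition key alone decides
# which node owns a row, so a lookup is: hash the key, ask that node.
# No index traversal; the cost does not grow with the data set.
NODES = [{}, {}, {}]

def node_for(key):
    return NODES[hash(key) % len(NODES)]

def put(key, row):
    node_for(key)[key] = row

def get(key):
    return node_for(key).get(key)  # one hash, one dict lookup: O(1)

for i in range(1000):
    put(f"user:{i}", {"id": i})

print(get("user:42"))  # {'id': 42}
```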

7. Are you prepared for the operations costs?

Since Cassandra is scaled by adding more nodes to the cluster, operations can become quite expensive. Using an SQL database per se doesn’t solve this problem, however; the question becomes managed cloud service versus rolling your own cluster. On-premises, there is no difference between a multi-node Cassandra cluster and an RDBMS with multiple read replicas in terms of operations and administrative costs. On-premises versus cloud is a topic for another discussion.

Amazon RDS is a managed RDBMS service. Changing the instance type, number of cores, RAM, or storage is a matter of modifying the instance and letting it upgrade automatically during the maintenance window. Backups and monitoring are automatic, and beyond using the database itself very little human interaction is required. You do not need a DBA to manage it.

There are services that will manage a Cassandra cluster in the cloud for you. You can and should weigh these against rolling your own cluster. You should also consider not using Cassandra at all and see whether DynamoDB meets your needs.

8. What are your burst requirements?

The one area where Cassandra is better than DynamoDB (or other similar services) is burst performance. Cassandra’s “0-60”, so to speak, is instantaneous. It can handle and sustain thousands of operations per second on relatively cheap hardware.

Compared to the scaling profile of DynamoDB in AWS, Cassandra has the upper hand. While DynamoDB can be autoscaled, the autoscaling action can take minutes or hours to complete, and in the meantime your users suffer. With DynamoDB you are stuck either paying a premium for always-on capacity or coming up with clever ways to work around it.
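A simplified model of that trade-off (every number here is made up for illustration): with fixed provisioned capacity, everything above it during a burst is rejected until the slow scaling action finally lands.

```python
# Toy timeline: a burst arrives at t=2; a scale-up is requested but
# takes several ticks to apply, DynamoDB-style.
provisioned = 100                        # requests/tick paid for
arrivals = [80, 90, 250, 250, 250, 120]  # requests arriving per tick
SCALE_DELAY = 3                          # ticks until new capacity lands
scale_at = None
throttled_total = 0

for t, load in enumerate(arrivals):
    if load > provisioned and scale_at is None:
        scale_at = t + SCALE_DELAY       # ask for more capacity
    if scale_at is not None and t >= scale_at:
        provisioned = 300                # new capacity finally applies
    throttled_total += max(0, load - provisioned)

print(throttled_total)  # 450 requests rejected while waiting to scale
```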

A relational OLTP database may meet your requirements here, however. Even a managed service like AWS RDS can handle short bursts of read/write IOPS above the provisioned limit before being gracefully throttled back.

9. Are you installing on-premises or in the cloud?

Compared to cloud environments, your options on-premises are limited. You don’t have managed services like DynamoDB, RDS, or RedShift. Hardware and maintenance costs of a Cassandra cluster compared to a similarly sized RDBMS cluster are going to be about the same.

In a cloud environment, however, you are in a much better position to make the right decision. You have managed options like Google Bigtable, DynamoDB, and RDS.

10. What are your disaster recovery requirements?

Cassandra can play a crucial role in your design for failure because all data centers can be kept hot, without the need for a master-slave failover. That type of configuration is a lot more complex to achieve with an SQL database. SQL gives you consistency, while Cassandra gives you partition tolerance.


Cassandra is constantly evolving, as are traditional databases and managed cloud services. What I see happening is a convergence of functionality. There is a lot of cross-pollination of ideas going on in the industry, with NoSQL databases adopting some SQL functionality (think: CQL’s resemblance to SQL) and SQL databases adopting some NoSQL functionality (think: PostgreSQL’s NoSQL features). It is important to keep a cool head and not jump on any new tech without understanding your use cases and skill sets.