Big Data is not all about Hadoop

Punchcard (photo credit: Jan Andersen)

Big Data is not Hadoop, and Hadoop is not Big Data.

Many people are surprised that Big Data adoption keeps growing while Hadoop struggles. There is some speculation as to why, but I have a much more pragmatic explanation: Hadoop is not SQL.

Not all developers are created equal. Not all developers can pick up new skills – and enjoy doing so. The vast majority of enterprise developers are business analysts who know how to configure business software like Salesforce or SAP. Many know SQL, which is effectively a well-established business language in its own right. Some may also know a programming language or two, such as Java, JavaScript, C# or even Python, but programming is not their primary job function or even interest. The mere concept of Map-Reduce might as well be a foreign language to this group of people.
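To make that gap concrete, here is a minimal sketch of the same question ("total sales per region") asked both ways. The sample data, the table name `sales` and both function names are made up for illustration, and the map-reduce half is a single-process imitation of the programming model in plain Python, not Hadoop's actual API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) sales records.
SALES = [("east", 100), ("west", 250), ("east", 50), ("west", 25), ("north", 10)]


def totals_sql(rows):
    """Total sales per region, expressed as one declarative SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    result = dict(conn.execute(query))
    conn.close()
    return result


def totals_map_reduce(rows):
    """The same totals, spelled out as explicit map, shuffle and reduce phases."""
    # Map: emit one (key, value) pair per input record.
    mapped = [(region, amount) for region, amount in rows]

    # Shuffle: group all values by key. A real framework does this
    # across many nodes; here it is just an in-memory dictionary.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: collapse each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}
```

Both functions return the same dictionary of totals. The difference is that the SQL version states what is wanted in one line an analyst can read, while the map-reduce version asks the author to think in terms of keys, grouping and per-key reduction, and that is before any cluster is involved.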

Most IT departments don’t understand the implications of adopting distributed storage tools like Hadoop or Cassandra. Expansion and scalability happen by adding new nodes, which increases IT maintenance costs. The reality is that the vast majority of businesses do not need Hadoop. Dramatic improvements in storage technology (especially SSDs), declining costs of multi-core servers, and seamless support for replicas offered by environments like AWS mean that traditional, well-established data processing and reporting systems (i.e. SQL) can actually be better at “Big Data” than Hadoop.
