Oleg’s Utilities : Exploring Node, Angular and Heroku

I needed a sandbox where I could experiment with Heroku, Node, JavaScript, and Angular, so I created a little suite of apps that I call “Oleg’s Utilities”. For now it is just a date calculator that my wife told me would be useful to her, but I’ll add more. A few observations:

  1. I am not an experienced web developer, but I went through an Angular and Node tutorial and was able to get started with a simple app in about 20–30 minutes.
  2. Even though it was my first time ever using Heroku, I was able to deploy a useful app in minutes.

It is amazing what one can do these days with cloud services. As I pointed out before, it truly is a Kodak moment for IT departments and technology vendors. The fact that anybody can put together an app and deploy it to the cloud in minutes, at essentially zero cost, is truly disruptive.

Your IT Department’s Kodak Moment

Your IT department’s Kodak moment is now, but it is not the kind of a moment where you get to take a cute picture and save it forever.

George Eastman founded Kodak in 1888. The company dominated the market for photographic film during most of the 20th century. Even though Kodak invented the first digital camera in 1975, it dismissed the idea of digital photography: as the dominant player in the industry, it did not want to introduce anything that would threaten its near-monopoly on film products. While consumer electronics companies with no vested interest in film introduced amazing digital cameras, Kodak fell into steep decline in the late 1990s and in 2012 filed for bankruptcy.

Today’s enterprise IT market is monopolized by on-premise data centers. It is dominated by big vendors that have a vested interest in maintaining the status quo. They would all love to tell you that they have some sort of magic solution that brings the cloud to you. Complacent enterprise IT departments are more than willing to listen – after all, IT departments view themselves as gatekeepers to technology adoption in their companies.

The reality is that they will never keep up. The cloud brought the barriers to entry to near zero. It used to take months or years and millions of dollars for a company to scale out its on-premise IT; the same can now be done in hours or days with zero upfront cost. Companies that adopt cloud services will find themselves delivering applications, tools, and products to their customers faster and at a lower cost. Companies that continue to look for excuses will find themselves outcompeted by peers that do not.

This is not limited to software technology companies, although they will feel the impact first. IT departments at companies for whom software is more of a tool than a product are in danger of rendering themselves obsolete by resisting cloud adoption. A business unit no longer needs IT to build and deploy an application; all it needs is a budget and an internet connection. IT departments, therefore, could make themselves more useful by facilitating API and data integration with cloud applications rather than standing in the way of progress.

The Perils of Division of Labor in Software Engineering

One of the key tenets of modern capitalism is division of labor. But is it a good thing for software development?

Prior to the late 19th century, a violin was produced from raw materials to completion by a single person, who may himself have been an expert violinist. He may have had members of his family work for him. Everyone involved in the process had a deep connection to the craft and to the final product. The Wikipedia article on Stradivarius states:

The name “Stradivarius” has become a superlative often associated with excellence; to be called “the Stradivari” of any field is to be deemed the finest there is. The fame of Stradivarius instruments is widespread, appearing in numerous works of fiction.

A modern violin, on the other hand, is assembled from many pieces, each one manufactured by someone else who may not even be musically inclined. At each point in the process, someone worked on this violin and contributed parts to it without knowing who the end user of the final product was going to be. Nobody in this chain of manufacturing can claim pride in the final product.

It also used to be that a stuffed bear was put together by a craftsman who would sew the toy together, stuff it with material, and decorate the outside. Today, a child can walk into a Build-a-Bear Workshop at their local mall, pick out the shell, and observe how a minimum-wage worker sticks a tube into the shell and operates the machine that inflates it with stuffing.

A few years ago I needed my Swiss mechanical watch serviced. It used to be that a watchmaker had a shop downtown and serviced every single watch himself. To my horror, I realized that I had no choice but to drop my watch off at a mall kiosk, whose operator promptly put my precious possession in a UPS envelope and handed it off to a UPS truck driver.

Needless to say, when I got a call to come pick up my watch, it turned out to be in worse shape than when I dropped it off – in fact, the hands fell off the moment I turned the crown to adjust it. Eventually the watch repair company (Precision Time, by the way) rectified the problem at their expense, but not without hassle and stress for me.

In that entire process there was not a single person aside from me (the end user) who cared about the outcome. The kiosk owner had liability insurance, and so he couldn’t care less if my watch got damaged as he put it in the UPS envelope. UPS themselves don’t even know what’s in the package, and so they don’t care if it sits at the bottom of a pile in a dark wet corner of a truck. And finally, the person in a shop somewhere working on watches has to adhere to timelines and productivity goals that have nothing to do with the end result. There was not a single person in that entire chain who was able to make an informed decision about the entire process. It is no wonder that my watch came back damaged.

How is this relevant to software development, you might ask? Well, in some companies the roles of business analyst, developer, tester, deployer and maintainer are distinct and separate. In the event of a production issue, it takes orders of magnitude longer for developers to explain to maintainers what to do than it would take to fix the issue themselves. All cognitive aspects of software engineering are split up and separated, keeping developers and maintainers as far removed from the knowledge as possible.

This hierarchical setup has historical roots. In 1974 Harry Braverman made the following observations in his Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century (p. 227):

The upper level of the computer hierarchy is occupied by the systems analyst and the programmer. The systems analyst is the office equivalent of the industrial engineer, and it is his or her job to develop a comprehensive view of the processing of data in the office and to work out a machine system which will satisfy the processing requirements. The programmer converts this system into a set of instructions for the computer. In early computer installations, the programmer was generally a systems analyst as well, and combined the two functions of devising and writing the system. But with the encroachment of the division of labor, these functions were increasingly separated as it became clear that a great deal of the work of programming was routine and could be delegated to cheaper employees. Thus the designation of “programmer” has by this time become somewhat ambiguous, and can be applied to expert program analysts who grasp the rationale of the systems they work on, as well as to program coders who take as their materials the pre-digested instructions for the system or subsystem and simply translate them mechanically into specialized terminology. The training for this latter work occupies no more than a few months, and peak performance is realized within a one- to two-year period. In accordance with the logic of the capitalist division of labor, most programmers have been reduced to this level of work.

Below this level, computer work leaves the arena of specialized or technical skills and enters the realm of working-class occupations. The computer operator runs the computer in accordance with a set of rigid and specific instructions set down for each routine. The training and education required for this job may perhaps best be estimated from the pay scales, which in the case of a Class A operator are on about the level of the craftsman in the factory, and for Class C operators on about the level of the factory operative.

Of course, when Braverman published his book in the 1970s it was still very early in the history of the computer industry. However, attempts to separate the cognitive aspects of software craftsmanship from the work of a computer programmer did not stop there. The thinking was that by reducing the role of a programmer to a mere translator of very specific and very detailed requirements and designs into machine language, and by separating programmers from any ability to make informed decisions, it would be possible to recruit a much cheaper workforce – or outsource the work altogether.

Andrew Hunt and David Thomas warned us in The Pragmatic Programmer: From Journeyman to Master (chapter 8, “Pragmatic Projects”):

Traditional team organization is based on the old-fashioned waterfall method of software construction. Individuals are assigned roles based on their job function. You’ll find business analysts, architects, designers, programmers, testers, documenters, and the like. There is an implicit hierarchy here—the closer to the user you’re allowed, the more senior you are.

Taking things to the extreme, some development cultures dictate strict divisions of responsibility; coders aren’t allowed to talk to testers, who in turn aren’t allowed to talk to the chief architect, and so on. Some organizations then compound the problem by having different subteams report through separate management chains.

It is a mistake to think that the activities of a project—analysis, design, coding, and testing—can happen in isolation. They can’t. These are different views of the same problem, and artificially separating them can cause a boatload of trouble. Programmers who are two or three levels removed from the actual users of their code are unlikely to be aware of the context in which their work is used. They will not be able to make informed decisions.

Just as a violin builder who can’t play music and doesn’t work with musicians can’t make a good violin, a programmer who doesn’t have a stake in the end product cannot build a good product. Environments and organizations with hierarchical bureaucracies discourage developers from proactively taking responsibility for the end product. The longer the chain of responsibility, the less likely it is that anyone in the hierarchy can actually accept it.

Big Data is not all about Hadoop

Big Data is not Hadoop, and Hadoop is not Big Data.

A lot of people are surprised that Big Data adoption is growing while Hadoop is struggling. There is plenty of speculation as to why, but I have a much more pragmatic explanation: Hadoop is not SQL.

Not all developers are created equal. Not all developers can pick up new skills – or enjoy doing so. The vast majority of enterprise developers are business analysts who know how to configure business software like Salesforce or SAP. Many know SQL, which is effectively a well-established business language in its own right. Some may also know a programming language or two, such as Java, JavaScript, C# or even Python, but that is not their primary job function or even interest. The mere concept of MapReduce might as well be a foreign language to this group of people.
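To make the gap concrete, here is a hypothetical word-count query written both ways. The SQL version is a one-liner any analyst can read; the map-reduce formulation (sketched below with Java streams, since real Hadoop boilerplate would run pages) forces you to think in terms of mapping records to pairs and reducing groups of pairs. The class and data here are illustrative, not from any real project.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // The SQL analyst simply writes:
    //   SELECT word, COUNT(*) AS cnt FROM words GROUP BY word;
    //
    // The map-reduce equivalent, sketched with Java streams:
    // "map" each word to a key, then "reduce" each group by counting.
    public static Map<String, Long> countWords(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints the per-word counts, e.g. {sql=1, cloud=2}
        System.out.println(countWords(List.of("cloud", "sql", "cloud")));
    }
}
```

Both produce the same answer, but only one of them reads like a sentence to a business analyst.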

Most IT departments don’t understand the implications of adopting distributed storage tools like Hadoop or Cassandra: expansion and scaling happen by adding new nodes, which increases IT maintenance costs. The reality is that the vast majority of businesses do not need Hadoop. Dramatic improvements in storage technology, especially SSDs, declining costs of multi-core servers, and seamless support for replicas offered by environments like AWS mean that traditional, well-established data processing and reporting systems (i.e. SQL) can actually be better at “Big Data” than Hadoop.

Smart IT Departments Productize Business API and Take Ownership of Data Governance

Satya Nadella explained Microsoft’s “secret” weapon against AWS and Google:

To me what matters is having the right mix of SaaS value. I don’t think of my server business as somehow “old school” or “legacy.” I actually think of the server as the edge of my cloud.

We now have the ability to tie together the cloud and the server. That is a very unique capability that we have. So who am I competing with? Amazon has no capability to compete there. They don’t have a server. Nor does Google. Oracle doesn’t have the equivalent capability. So those are the places where we want to really excel.

The reality is that whether or not you have some sort of server on-premises shouldn’t matter, and that is why Google and Amazon are not concerned in the long term – while Microsoft will continue to be a follower in the cloud arena rather than a leader.

Let’s consider what a public cloud like AWS has done for the software industry: it dropped the cost of entry for a startup to near zero. Whereas in the past a startup would need to get a redundant, enterprise-grade Internet connection and build out a server infrastructure, today all they need to do is go to their AWS dashboard and provision a server. The point is, the Googles and Amazons of tomorrow are not being built on-premises. The startups of today will be the dominant players of tomorrow, and they are being built on AWS and Google without a care in the world for on-premise IT.

A smart enterprise does not rely on any particular server. I’ve long been advising my employers and customers not to use heavyweight application servers like IIS, WebLogic or GlassFish, for example, and instead rely on lightweight platforms (Spring and Jetty for Java, Node.js, etc.). Smart enterprises build out enterprise APIs that make the location of their applications (on-premise or in the cloud) irrelevant to the business. Smart IT departments develop data governance policies that improve insights while decentralizing data.
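The embedded-server style is worth a sketch. Instead of deploying into an application server someone else administers, the application owns its own HTTP endpoint and can run anywhere a JVM runs. To keep this self-contained, the example uses the JDK’s built-in `com.sun.net.httpserver` package as a stand-in for Jetty or Spring Boot; the class name and `/health` route are illustrative assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class LightweightServer {
    // Start an embedded HTTP server owned by the application itself:
    // no external application server to install, license, or administer.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            byte[] body = "OK".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8080);
        System.out.println("Listening on http://localhost:8080/health");
    }
}
```

Because the process is just `java LightweightServer`, it deploys identically to a laptop, an on-premise VM, or an EC2 instance, which is exactly the location-independence argued for above.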

The cloud shifted the center of technology management and thought leadership away from enterprise IT departments and CTOs down to individual teams. A team armed with a budget no longer needs to go through red tape and beg its IT department for a place to run its applications in a scalable fashion. Just as BYOD disrupted enterprise mobility, so do “Bring your own Salesforce”, “Bring your own AWS” and “Bring your own Heroku.” Enterprise IT can make itself relevant by not restricting where applications are hosted and instead offering secure enterprise APIs as described above, along with data governance and best-practice procedures.

An enterprise that owns and productizes its business API and has sound data governance is not beholden to any particular cloud vendor – not even Microsoft. Traditional vendors will continue to sell their hybrid on-premise/cloud products, but the reality is that these only kick the can down the road and further entrench the vendors’ influence in your organization. Own your API and data governance, and set your enterprise free from the shackles of enterprise IT vendors!

Guaranteeing Delivery of Messages with AWS SQS

I wanted to scratch an itch and get feedback from the open-source community, so I put together a little GitHub project that I like to call SQS-RetryQueue.

Amazon SQS can be utilized to guarantee delivery and processing of messages. This project serves the following purposes:

  1. Demonstrate an example of using AWS SQS with Java to send and receive messages.
  2. Provide an algorithm for retrying failed deliveries.
  3. Provide an approach to keeping SQS costs to a minimum while maintaining real-time processing of messages.
  4. Seek feedback on the approach from the open-source community.

AWS SQS is priced per request, so one of the goals should be to minimize the number of requests.

Processing of messages can happen on either the server that sent the message or any other server subscribing to this queue. The goal is to begin processing messages as soon as possible.

Each receiver thread acts as follows:

wait on the monitor object for up to visibilityTimeout
while there are messages on the queue:
    receive message
    try processing message
    if processing was successful, delete the message

Sending a message then involves the following:

send the message
notify all receivers waiting on the monitor object
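The pseudocode above can be sketched in Java as follows. To keep it self-contained and runnable, a hypothetical in-memory `BlockingQueue` stands in for the actual SQS client (the real project would use the AWS SDK); the point here is the monitor-based coordination between sender and receivers, with `visibilityTimeout` doubling as the receiver’s maximum wait.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

public class RetryQueue {
    // Stand-in for the SQS queue; a real implementation would call the AWS SDK.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Object monitor = new Object();
    private final long visibilityTimeoutMs;

    public RetryQueue(long visibilityTimeoutMs) {
        this.visibilityTimeoutMs = visibilityTimeoutMs;
    }

    // Sender: enqueue the message, then wake any receivers parked on the monitor.
    public void send(String message) {
        queue.add(message);
        synchronized (monitor) {
            monitor.notifyAll();
        }
    }

    // One pass of the receiver loop: wait on the monitor for up to
    // visibilityTimeout, then drain the queue, deleting a message only
    // when processing succeeds. Returns how many messages were processed.
    public int receiveOnce(Predicate<String> processor) throws InterruptedException {
        synchronized (monitor) {
            if (queue.isEmpty()) {
                monitor.wait(visibilityTimeoutMs);
            }
        }
        int processed = 0;
        String message;
        while ((message = queue.peek()) != null) {
            if (processor.test(message)) {
                queue.remove(message); // delete only after successful processing
                processed++;
            } else {
                break; // failed: leave the message queued so a later pass retries it
            }
        }
        return processed;
    }
}
```

Because the sender notifies the monitor only after the message is enqueued, and the receiver checks for emptiness while holding the monitor, a wakeup cannot be lost; the bounded wait keeps receivers from polling SQS constantly, which is what keeps the per-request costs down.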

I had wanted to get this out of the way for some time. So, here it is! Any feedback is greatly appreciated.

We Need a Cloud Version of Cassandra

Google recently launched Cloud Bigtable – a cloud NoSQL service that is compatible with the Apache HBase API. This means the existing ecosystem of Hadoop applications is immediately binary-compatible with the new service, with no API changes required.

Google is marketing this not only as an alternative to Hadoop, but also as an alternative to Cassandra.

Now, I’ve written in the past that what we need is a proper cloud version of Cassandra. DataStax is a rising star, but the fact that no cloud provider has a managed, low-cost, automatically scaled cloud offering compatible with the Cassandra API is going to be their downfall.

What burned me with Cassandra is the fact that as you add storage and capacity by adding more nodes, you increase devops costs. What I would like to see from DataStax is not a partnership with HP, whose ultimate goal is to sell more hardware and services to on-premise data centers. Instead, we need DataStax to form a strategic partnership with Amazon AWS and offer a zero-maintenance, zero-initial-investment, zero-devops, auto-scaled and fully managed API-compatible Cassandra offering on AWS.