Infrastructure in the cloud vs on-premise


I made a comment on Twitter saying that if you are still operating an on-premise data center in the second decade of the 21st century, you are wasting a ton of money. I was talking specifically about AWS vs on-premise. I got some pushback on that assertion in the form of private messages. Here is a summary of the feedback I received:

  1. AWS only makes sense if you need to spin up hundreds of servers fast. Otherwise it is a costly, low-quality proposition.
  2. In AWS you have zero control over your infrastructure and therefore no control over the outcome of failures.
  3. On-premise data centers are built to stay operational, whereas on AWS you must build your infrastructure with the expectation of failure.

As an application developer, my experience is quite the opposite. In the AWS environment I am able to provision resources as needed based on the requirements of my application. I cannot do so in an on-premise data center, where any sort of upgrade or installation can take weeks or months of red tape. At this point in my career, having seen what is possible in AWS and in the cloud in general, I have zero interest in building anything out in an on-premise data center.

As for control of the infrastructure, where is the line drawn? At what point can you say with certainty that you have full control over your infrastructure? Even if you control your LAN and other on-premise resources, you still have to rely on your power company for electricity. Power companies solved the problem of offering energy as a utility, so why shouldn't IT infrastructure companies offer their resources as a utility?

The only way an on-premise data center can be better than anything AWS can offer is if you build the exact same infrastructure they have, with the same resources and tools to help you design for failure. Yes, that includes multiple data centers in geographically distinct regions (such as Virginia and California). Sure, there are flaws in AWS, and it does occasionally have outages. But so do on-premise data centers, and in my experience they have them more often and with greater impact.

Consider the April 2011 EBS outage at Amazon in one of their availability zones:

What about Netflix, an AWS customer that kept on going because they had proper “design for failure”? Try doing that in your private IT infrastructure with the complete loss of a data center. What about another AWS/enStratus startup customer who did not design for failure, but took advantage of the cloud DR capabilities to rapidly move their systems to California? What startup would ever have been able to relocate their entire application across country within a few hours of the loss of their entire data center without already paying through the nose for it?

Sure, when you move to the cloud, you give up control over your infrastructure, but the whole point of designing your applications for failure is to make that lack of control less relevant.