Our civilization has a single point of failure


When I was in Las Vegas in October 2015 for AWS re:Invent, I took some time to walk around the Venetian and came upon Bauman Rare Books. In the store, I saw a copy of Saint-Exupéry’s “Le Petit Prince,” number 231 of 260 copies signed by the author. It was selling for $23,000.

When the Amazon Kindle first came out in 2008, I was one of the early adopters. The device and the format proved perfect for me. The ability to impulse-purchase an electronic book wherever I am and to carry hundreds of books in one thin device allowed me to read dozens of books in my first year with the Kindle. I can change the font size and read even when my eyes are tired. I don’t get motion sick reading the Kindle on the bus, something that routinely happens to me with printed books.

None of those books is special in any way. They all look the same on my Kindle, just rows in the catalog. Nothing makes any of them a “limited edition.” None of them will ever be signed by an author. None of those electronic books will ever find its way into a rare book store. The Kindle itself is unlikely to ever end up on a collector’s shelf.

To make matters more precarious, when you buy an e-book you don’t actually own it. In 2009, Amazon remotely wiped copies of Orwell’s “1984” from customers’ Kindle devices. There are articles and guides out there on how to protect your e-books from the same fate, such as “DRM be damned: How to protect your Amazon e-books from being deleted.”

On the opposite side of the debate, there are opinions justifying Amazon’s actions and their right to delete e-books from customer devices. When you pay for an e-book, you pay for a license to read it on your Kindle. This is analogous to how software is bought: when you pay for an app, you do not own it; you purchase a license to use it, and that license can be revoked at any time, for any reason, with little warning. The same can happen, has happened, and will happen with e-books and electronically purchased music.

Despite all that, I have long since stopped buying printed books. The convenience outweighs the perception of impermanence. If a book is valuable or special, I buy both: the e-book for convenience and the printed version for long-term value. I can count on the fingers of one hand the number of printed versions of e-books I have purchased since getting a Kindle.

If the power outages following Hurricane Sandy in October 2012 had lasted a couple of days longer, I would have had nothing to read. Imagine for a moment a catastrophic solar flare that would at the very least suspend modern civilization, if not completely up-end it. If the power is out for months and electronic device memories are wiped clean, what will humanity refer to for knowledge and guidance? What will happen to our family photographs, our music collections, and our Kindles?

People no longer collect music; they subscribe to it. We post thousands of photographs to Instagram and Flickr, most of which are forgotten within hours of posting. We e-publish articles and blog posts, much like this one, that we know will be lost in the noise by tomorrow morning. We build apps that become outdated within days or weeks. There is hardly anything we create today in electronic form that will be discovered by our descendants a decade from now, never mind a century or a millennium.

Enterprises store years’ worth of data and process tens of thousands of transactions daily. For many industries, it is no longer possible to go back to pencil, paper, and handwritten order forms. The financial industry is more reliant on electronics than ever. A major climate event or a world war affecting electronics is bound to disrupt business as we know it.

I do not know what the solution is. I do know that humanity has created a single point of failure for the entire civilization. Our long-term strategic thinking has been reduced to near-term instant gratification that will hardly last a generation.

Cloud Power: Operations costs are the Achilles’ heel of NoSQL

Check out the latest post at my Cloud Power blog at Computerworld:

Companies interested in adopting NoSQL should consider their options carefully. The vast majority of database use cases do not need massive horizontal scalability. Most applications could be better off with traditional SQL databases. In the cloud, there are NoSQL alternatives that cost less and are easier to maintain.

via Operations costs are the Achilles’ heel of NoSQL | Computerworld.

Cloud Power: IT departments must transform in the face of the cloud revolution

In case anyone’s been wondering why there hasn’t been an update to this blog in a couple of weeks: I am now part of the Computerworld contributor network with a blog called “Cloud Power”.

My first post is on the topic of Shadow IT:

Cloud computing democratizes developer and end user productivity at the expense of transparency and IT control. Since developers and users are able to provision and utilize resources as needed, it is easy for costs, overall architecture and security to get out of control. Rather than getting in the way of productivity, however, the IT departments must evolve their role from that of the gatekeepers into that of enablers.

Enjoy the read and join the conversation!

Banking Technology is in Dire Need of Standardization and Openness

Old Bank. Photo credit: Toby Dickens

A few weeks ago, Investors Bank in New Jersey overhauled its systems. As a result, Mint became incompatible with Investors, and Investors customers could no longer view their accounts in Mint. There is anecdotal evidence1 that Mint uses the Yodlee platform2 for the integration. As it turns out, there is no standard mechanism by which external applications can work with banks. Yodlee’s own page states:

Through a proprietary system of direct data access and HTML parsing, Yodlee delivers financial data from more than 14,000 sources, and growing.

While the technology world is moving towards open APIs and standard authentication protocols3, the banking industry continues to rely on proprietary systems and HTML screen scraping. It seems that even with the Yodlee platform it is not possible to integrate with banks in any standard way. Each time a bank updates its systems, a team of engineers at Intuit must update integration scripts to ensure their customers can continue to use Mint with that bank4:

When a financial institution updates their system, our engineers have to rewrite the script on our end to match so that we can continue supporting them. Typically, they are notified when this is going to happen and can get it updated pretty quickly. However, please open a ticket by filling out our Contact Mint form to make sure this is on their radar and they can get the script updated as soon as possible.

The way Mint integrates with banks is by asking users to enter and store their bank credentials. Mint expects us to trust its security5. The technology industry, however, has long since established a protocol by which an application (like Mint) that needs access to an outside resource (a user’s bank account) does not need to capture the user’s credentials. It is called OAuth6.

Had banks implemented OAuth, Mint would use the protocol to obtain authorization from the user to call the bank’s API on the user’s behalf. In the event of a security breach at Mint, the banks could invalidate all tokens and disable all further access by Mint. Users would gain control over which applications can access their data and which cannot.
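For illustration, here is a minimal sketch of what the standard authorization-code flow could look like against a hypothetical bank API. The endpoints, client identifiers, and field names below are assumptions made up for the example, not any real bank’s interface:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BankOAuthSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Step 1: after the user approves access on the bank's own site, the bank
        // redirects back to the application with a short-lived authorization code.
        String authorizationCode = "code-received-on-redirect"; // placeholder

        // Step 2: exchange the authorization code for a revocable access token.
        HttpRequest tokenRequest = HttpRequest.newBuilder()
                .uri(URI.create("https://auth.examplebank.com/oauth/token")) // hypothetical endpoint
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "grant_type=authorization_code"
                        + "&code=" + authorizationCode
                        + "&client_id=mint-like-app"
                        + "&client_secret=app-secret"
                        + "&redirect_uri=https://app.example.com/callback"))
                .build();
        String tokenJson = http.send(tokenRequest, HttpResponse.BodyHandlers.ofString()).body();
        // Naive extraction for illustration only; a real client would use a JSON parser.
        String accessToken = tokenJson.replaceAll(".*\"access_token\"\\s*:\\s*\"([^\"]+)\".*", "$1");

        // Step 3: call the bank's API with the token instead of the user's credentials.
        HttpRequest accountsRequest = HttpRequest.newBuilder()
                .uri(URI.create("https://api.examplebank.com/v1/accounts")) // hypothetical endpoint
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
        System.out.println(http.send(accountsRequest, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

The important point is that the application only ever holds a revocable token; the user’s banking password never passes through the Mint-like application at all.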

In 2015 there is no need for HTML screen scraping or proprietary technologies. Would the Yodlee platform even be around if banks used OAuth and a standard API7? This is an industry in dire need of innovation. Banks need to learn how to recruit and retain top talent from technology companies, not the other way around. They need to look beyond their traditional, well-accepted consulting vendors and service providers and think outside the box, especially considering that the technology challenges they face have already been solved by others.

I Stand With Ahmed

This week a precocious 14-year-old immigrant, Ahmed Mohamed, wanted to impress his teachers with a clock he had made at home. He built it into one of those pencil boxes you buy at a craft store that look like a small briefcase. The teachers and school officials thought it looked suspicious and called the police. The police proceeded to arrest him as a terrorism suspect1.

This is a technology blog, so I won’t get into the topics of politics, racism, and terrorism. Let’s even set aside the seemingly incompetent reaction of Irving, TX law enforcement, who did not evacuate the school. Instead, I am going to focus on STEM education in the United States.

My 8-year-old daughter building an Arduino LCD circuit

It just so happened that a few days prior to this incident my 8-year-old daughter asked if she could bring the Arduino LCD circuit I had built with her to school to show her friends and teachers. It had not even occurred to me that an elementary school teacher might think a circuit with batteries, wires, and a display is a bomb, and that it might result in her arrest.

To see the sorry state of American STEM education, all one needs to do is tour top engineering universities and visit science and engineering classrooms. A keen observer will find that the majority of students are immigrants. These students have multiple advantages over American students: they come from cultures that value knowledge and education, families that invest in their children’s futures, and teachers who can tell a bomb from a clock.

Of course, what starts in universities transfers to workplaces. A visit to any software company or even an IT department just about anywhere will reveal that the majority of developers are immigrants as well. They come from India, China, Ukraine, Belarus, Russia, and elsewhere in Asia and Europe.

Meanwhile, American politicians draw crowds at campaign rallies by fanning the flames of fear over American jobs2. The reality, however, is that a much bigger threat to the future of American middle-class jobs starts in schools. When teachers, school officials, and law enforcement can’t tell the difference between an explosive and a homemade clock, how can American kids look up to them?

Setting Up Cross-Region Replication of AWS RDS for PostgreSQL

As of today, AWS RDS for PostgreSQL1 does not offer cross-region replication. Short of switching to one of the RDS offerings that do support it, there are a few options to consider.

1. Custom Configured EC2 Instances with Master-Slave Replication

Master/slave light rail

This setup sacrifices the benefits of the AWS RDS service in exchange for greater control over replication settings. One region hosts a master PostgreSQL instance, and another region hosts a slave, which can also act as a read replica2.

Advantages

  • Greater control over replication settings.

Disadvantages

  • Gives up all the advantages of running in the AWS RDS environment.
  • Writes can only be performed in the master region.

2. Software-defined Two-phase Commit

Commitment. Photo credit: Ed Schipul

In this setup there are two independent AWS RDS instances. The application, however, uses a two-phase commit protocol3 to guarantee that all writes make it into both databases in a transactional fashion, as sketched below.
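As a rough sketch of the idea, PostgreSQL’s native two-phase commit commands (PREPARE TRANSACTION / COMMIT PREPARED) could be driven from the application over plain JDBC. The hostnames, credentials, and table below are placeholders, and both instances would need max_prepared_transactions set to a non-zero value; a production implementation would also need recovery logic for in-doubt transactions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.UUID;

public class TwoPhaseCommitSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoints of two independent RDS PostgreSQL instances.
        Connection east = DriverManager.getConnection(
                "jdbc:postgresql://appdb.us-east-1.example.com/app", "app", "secret");
        Connection west = DriverManager.getConnection(
                "jdbc:postgresql://appdb.us-west-2.example.com/app", "app", "secret");
        Connection[] regions = {east, west};

        // A single global transaction id ties the prepared transactions together.
        String gid = UUID.randomUUID().toString();
        try {
            // Phase 1: perform the write and PREPARE it on every regional database.
            for (Connection c : regions) {
                try (Statement s = c.createStatement()) {
                    s.execute("BEGIN");
                    s.executeUpdate("INSERT INTO orders(id, total) VALUES (42, 99.95)");
                    s.execute("PREPARE TRANSACTION '" + gid + "'");
                }
            }
            // Phase 2: every region prepared successfully, so commit everywhere.
            for (Connection c : regions) {
                try (Statement s = c.createStatement()) {
                    s.execute("COMMIT PREPARED '" + gid + "'");
                }
            }
        } catch (Exception e) {
            // Something failed: roll back any prepared transaction so no region is left hanging.
            for (Connection c : regions) {
                try (Statement s = c.createStatement()) {
                    s.execute("ROLLBACK PREPARED '" + gid + "'");
                } catch (Exception ignored) {
                    // the transaction may never have been prepared on this connection
                }
            }
            throw e;
        }
    }
}
```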

Advantages

  • Simple configuration
  • Does not sacrifice any of the AWS RDS advantages

Disadvantages

  • Responsibility for ensuring that writes make it into all regions falls on the application itself.
  • Increased application code complexity.
  • Write performance is sacrificed since all regional databases must participate synchronously.

3. Asynchronous Writers

Fanning Out. Photo credit: Tim Haynes

In this approach each region hosts an asynchronous writer that listens on an SQS queue4. All writes are published to an SNS topic that has all of the regional writer queues as subscriptions5. When an application running in any region wants to write to the database, it publishes a message to this SNS topic, which then fans it out to all of the regional SQS queues.
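A minimal sketch of the two halves of this approach, using the AWS SDK for Java, might look like the following. The topic ARN, queue URL, and payload format are illustrative assumptions, and the sketch assumes raw message delivery is enabled on the SNS-to-SQS subscriptions:

```java
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class AsyncWriterSketch {
    // Hypothetical ARN/URL; the topic would have one subscribed queue per region.
    static final String WRITE_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:db-writes";
    static final String REGIONAL_QUEUE_URL =
            "https://sqs.us-west-2.amazonaws.com/123456789012/db-writes-us-west-2";

    // Application side: publish the write once; SNS fans it out to every regional queue.
    static void publishWrite(String writePayload) {
        AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();
        sns.publish(WRITE_TOPIC_ARN, writePayload);
    }

    // Regional writer side: drain the local queue and apply each write to the local database.
    static void runRegionalWriter() {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        while (true) {
            ReceiveMessageRequest request = new ReceiveMessageRequest(REGIONAL_QUEUE_URL)
                    .withWaitTimeSeconds(20)        // long polling
                    .withMaxNumberOfMessages(10);
            for (Message m : sqs.receiveMessage(request).getMessages()) {
                applyToLocalDatabase(m.getBody()); // body is the raw payload with raw delivery enabled
                sqs.deleteMessage(REGIONAL_QUEUE_URL, m.getReceiptHandle());
            }
        }
    }

    static void applyToLocalDatabase(String payload) { /* JDBC write omitted for brevity */ }
}
```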

Advantages

  • Simple configuration
  • Does not sacrifice any of the AWS RDS advantages
  • Does not sacrifice write performance

Disadvantages

  • Subject to software bugs
  • Subject to SNS and SQS bugs and outages
  • No guarantee of consistency
  • Requires a mechanism for periodically reconciling differences between regions

Top Ten Differences Between ActiveMQ and Amazon SQS

Taxi queue at LaGuardia. Photo credit: Scott Beale / Laughing Squid

1. Persistence and Durability

ActiveMQ

Depending on the configuration, ActiveMQ can maintain a message journal1. Each message is first written to the journal before being shipped to consumers. Ultimately, the number of messages that can be persisted is constrained by the available disk capacity.

SQS

Amazon SQS stores messages in distributed storage across all availability zones in a given region2. Each message can be up to 256 KB in size, and SQS can store an unlimited number of messages across an unlimited number of queues3.

2. Redundancy

ActiveMQ

ActiveMQ offers a number of different configuration options for clustering4:

  • Broker Clusters and Networks of Brokers: producers on each broker can reach consumers across the entire cluster. This architecture is most appropriate for use cases such as delivering market data to all consumers across the entire network (JMS topics). It is not exactly a redundant configuration: failure of a single broker results in message loss on that broker.
  • Master-Slave: in this configuration two or more ActiveMQ brokers use some sort of shared5 storage for the journal. Prior to ActiveMQ 5.9 one had to rely either on a shared file system such as a SAN or on an SQL database, which simply shifted the replication responsibility to a different technology. Starting with ActiveMQ 5.9 there is an option to use Replicated LevelDB with ZooKeeper6.

SQS

SQS stores messages in redundant storage across all availability zones in a given region. To achieve high levels of redundancy and guarantee that no message is ever lost, it relaxes some of the properties of a queueing system7. What that means is that on rare occasions messages may arrive out of order, and the same message may be delivered more than once.

3. Graceful Failure

ActiveMQ

In a master-slave8 configuration all clients fail over to the next available slave and continue processing messages. In any other configuration, all processing stops until the client is able to reconnect to its broker.

In the event of high memory, temp storage, or journal space usage, ActiveMQ can pause producers until space frees up. This creates the potential for a deadlock where consumers that also act as producers become unable to publish or consume messages. There is a risk of the entire system locking up until space is freed or the configuration is changed.

SQS

When your application attempts to retrieve messages from a queue, SQS picks a subset of its servers and returns messages from those servers. What that means is that if a server is unavailable for some reason, a message may not be retrieved on that request, but it will be on a subsequent request. This is mitigated to a certain extent by the use of long polling9.

4. Message Order and Delivery Guarantee

ActiveMQ

Messages are delivered in the order they are sent10. When there are multiple consumers on the same queue, some of the ordering may be lost; however, that is the case with any queue that has multiple consumers, and it is exacerbated by clustering configurations.

SQS

In order to achieve high levels of scalability and redundancy, SQS relaxes some of the guarantees of a traditional queueing system. On rare occasions messages may be delivered out of order or more than once, but they will be delivered, and no message will be lost. Applications sensitive to duplicate or out-of-order processing need to implement logic to cover these scenarios11, for example along the lines of the sketch below.
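One common way to cover the duplicate-delivery case is to make the consumer idempotent by remembering which message IDs have already been processed. The sketch below keeps that set in memory purely for illustration; a real implementation would persist it in shared, durable storage (for example a database table keyed by message ID):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Guard against duplicate delivery by tracking the IDs of messages already processed
// and skipping any message that has been seen before.
public class IdempotentConsumerSketch {
    private final Set<String> processedMessageIds = ConcurrentHashMap.newKeySet();

    public void handle(String messageId, String body) {
        // add() returns false if the ID was already present, i.e. this is a duplicate.
        if (!processedMessageIds.add(messageId)) {
            return; // duplicate delivery: safe to ignore
        }
        process(body);
    }

    private void process(String body) {
        // Business logic goes here; it should itself tolerate being re-run for the same
        // message (e.g. after a crash between add() and process()).
        System.out.println("processing " + body);
    }
}
```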

5. Monitoring and Utility API

This may seem off topic, but I find it necessary to mention. It is often useful, from the application’s standpoint, to perform various utility functions against queues. An application may measure the rate of dequeuing, calculate the number of pending messages, and self-optimize.

JMS does not offer an API to retrieve this information; ActiveMQ, however, does expose some of it via JMX12. Similarly, SQS offers metrics and a utility API as part of the SDK.
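For example, the approximate number of pending messages in an SQS queue can be read through the GetQueueAttributes call in the AWS SDK for Java; the queue URL below is a placeholder:

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.GetQueueAttributesRequest;

public class QueueDepthSketch {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"; // placeholder

        // Ask SQS for the approximate number of visible messages in the queue.
        GetQueueAttributesRequest request = new GetQueueAttributesRequest(queueUrl)
                .withAttributeNames("ApproximateNumberOfMessages");
        String pending = sqs.getQueueAttributes(request)
                .getAttributes()
                .get("ApproximateNumberOfMessages");
        System.out.println("Approximately " + pending + " messages pending");
    }
}
```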

6. Standards Compliance

ActiveMQ

ActiveMQ conforms to the JMS API specification in the Java universe and has drivers for other platforms and API specifications.

SQS

SQS uses an HTTP-based REST protocol and a proprietary SDK. However, Amazon does offer a JMS implementation on top of the SQS SDK13.

7. Push Messages as They Become Available

ActiveMQ

The default ActiveMQ protocol is based on a socket connection that allows messages to be pushed to the consumer as soon as they are published. With JMS, one can implement the MessageListener14 interface and receive messages as they arrive.
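A minimal push-style JMS consumer against ActiveMQ might look like the sketch below; the broker URL and queue name are placeholders:

```java
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PushConsumerSketch {
    public static void main(String[] args) throws Exception {
        Connection connection =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // The listener is invoked as soon as a message arrives on the connection;
        // no polling loop is required on the consumer side.
        session.createConsumer(session.createQueue("orders"))
               .setMessageListener(new MessageListener() {
                   @Override
                   public void onMessage(Message message) {
                       try {
                           System.out.println("received: " + ((TextMessage) message).getText());
                       } catch (Exception e) {
                           e.printStackTrace();
                       }
                   }
               });
        connection.start(); // message delivery begins once the connection is started
    }
}
```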

SQS

SQS does not natively support push; one has to poll to retrieve messages. This is a minor inconvenience since Amazon provides both long polling and a JMS implementation. Various approaches exist to mimic the push behavior, including one that I described in my post on “Guaranteeing Delivery of Messages with Amazon SQS.”15

8. Scalability and Performance

ActiveMQ

ActiveMQ can handle tens of thousands of messages per second on a single broker16. There is a great deal of tuning that affects ActiveMQ performance, including host capacity, network topology, and so on. Scalability is achieved either vertically, by upgrading broker hardware, or horizontally, by expanding the broker cluster.

SQS

SQS does not return from a SendMessage request until the message has been successfully stored, and as a result it has a request-response latency of around 20 ms. At first glance this may suggest that it cannot handle more than a few hundred messages per second.

However, when dealing with a distributed queue like SQS, one has to distinguish between latency and throughput17. SQS scales horizontally. By using multiple threads it is possible to increase message throughput almost indefinitely.
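A rough sketch of the idea: each SendMessage call pays the per-request latency, but the calls are independent, so issuing them from a thread pool multiplies overall throughput. The queue URL, message count, and thread count below are arbitrary placeholders:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class ParallelSendSketch {
    public static void main(String[] args) throws InterruptedException {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"; // placeholder

        // 50 threads each waiting ~20 ms per request yields far higher aggregate throughput
        // than a single sequential sender.
        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 10_000; i++) {
            final int n = i;
            pool.submit(() -> sqs.sendMessage(queueUrl, "message-" + n));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }
}
```

Batching with SendMessageBatch (up to 10 messages per request) pushes throughput further still.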

9. Setup, Operations and Support

ActiveMQ

ActiveMQ is just like any other software that one has to install, configure, monitor, and maintain. Configuring and tuning ActiveMQ requires a thorough understanding of hundreds of different settings18. ActiveMQ itself is written in Java, so an understanding of Java topics like memory management and garbage collection is helpful.

SQS

As long as you are operating in the AWS environment there is nothing to configure, install or maintain. SQS is a completely managed service.

10. Costs

ActiveMQ

ActiveMQ needs hosts to run on and storage it can use, and someone has to support and maintain it. The costs of ActiveMQ are a function of the resources it needs to run and the time it takes to tune, configure, and maintain it. These costs remain during periods of low utilization, since it doesn’t scale automatically.

SQS

SQS is priced as a function of the number of requests and the amount of data transferred. You are only charged for what you consume, so during periods of low utilization the costs are lower.

Conclusion

The discussion in this post boils down to the choice between a fully managed cloud service and an installable software product, just like DynamoDB vs Cassandra19. A managed service simplifies development and maintenance at the expense of standards compliance and customization options.