Why I Am Not Getting an Apple Watch for Now, or Ever

A friend of mine received his Apple Watch recently and I had a chance to play with it. In short, I am not swayed.

I am a fan of wrist watches. Personal timepieces have been marvels of engineering for centuries. I have an Oris Complication automatic, a couple of Citizen Eco-Drive wrist watches, and a $25 Casio sports watch. The one I wear 90% of the time is the Citizen Eco-Drive world traveler watch with atomic timekeeping.

With the exception of the $25 Casio sports watch, whose battery will die any day now, all of my watches can last a lifetime without looking outdated. The Oris requires regular maintenance due to its mechanical nature, but the Citizens don’t even require charging, ever. All of them are water-resistant to 20 bar.

While smartwatches are a huge step above the dorky Google Glass as far as wearable computing goes, they are far from convincing me to purchase any of them – Apple, Android or Pebble.

One issue is battery life. It means dragging extra cables and extra chargers with me when I travel. It means that on a 10-hour flight to Europe I have to worry about yet another device dying on me. From what I gather, even the app that controls a GoPro from the Apple Watch requires the phone to be paired with the GoPro first, so now I am draining the batteries on three devices instead of two.

Second issue is that I just don’t get the point of having what is effectively a smaller external screen for a device that is already in my pocket. To take advantage of any of the functionality, the Watch needs to be in the vicinity of the smartphone. That is a make-or-break issue for me.

I would gladly give up my smartphone and exchange it for a smart watch with its own cell chip. I imagine a watch, just like the Apple Watch, that has its own cellular and Wi-Fi chips and can work completely independently of the phone. I can use it as a hotspot for my laptop or my iPad. I can use it to make phone calls, using either speakerphone or a headset. In fact, the wrist band can have a speaker on one side and a microphone on the other, so that when you unclasp it from your wrist the watch works as a flip phone. When I get into my car, the watch pairs with my car’s Bluetooth system and lets me make phone calls. All of this without having a smartphone, or yet another device that requires cables and charging.

Third issue is that I happen to like the classic looks of the watches I already own, and the Apple Watch (and all other smartwatches, for that matter) is quite ugly. Walk into any watch store and you will see a huge variety of shapes and sizes. To imagine a world where everyone wears an ugly smartwatch is to imagine an Orwellian or Brave New World reality where everyone thinks and acts exactly the same.

Fourth issue revolves around notifications. Today, when someone messages me on Facebook, my phone vibrates and shows a notification. My iPad, which is at home, also gets a notification. So does my computer. Even if I read them on my phone, the same exact notifications are waiting for me when I come home. Adding yet another device that will beep and blink any time my mother clicks “Like” on Facebook, or any time my boss has a question, would send me into therapy.

I am sorry, Apple and Google, but for now I am simply not interested. Make me a product whose battery lasts for at least a week, that can be used without also owning a smartphone, that won’t become outdated in a year, and I will take a look again.

My Brief Affair With Android

As a software engineer I like to experiment with different technologies and step outside of my comfort zone once in a while. Having used iOS devices for a very long time, sometime last year I bought myself a used Samsung Galaxy Note 3. I can now confidently say that whoever thinks Android is better than iOS must also be the kind of person who thinks Windows 98 is better than Mac OS X.

Allow me to explain.

When my Samsung Galaxy Note 3 running Android 4.4 KitKat arrived, the first thing I was greeted with was a bunch of pre-installed apps I did not ask for and did not need, one of them unfortunately named “ISIS Wallet.” Verizon and Samsung preinstall a bunch of crappy apps that you cannot uninstall, just as Acer and Dell used to (and still do) bundle unnecessary apps with their Windows laptops and desktops. Fine, we can move past that.

The device had a stylus. Woohoo! Exciting! I never used it. Ever.

The next thing I observed was the absolutely horrendous notifications. If you lock the screen with a passcode, you don’t see your notifications unless you unlock it, meaning you can’t tell at a glance what’s going on. App icons don’t show little badges the way iOS does, and across the top there is an incomprehensible bar of icons and indicators that fills up with meaningless nonsense. Oh, and the LED on the front of the device would light up in Christmas-light colors when there were pending notifications, but the colors mean nothing.

So that was Android 4.4 KitKat, and I honestly thought the device was so old it couldn’t support Android 5.0 Lollipop. I was fine with that. A couple of days ago, however, I got a notification saying something like “Samsung has prepared the Android 5.0 Lollipop update for your device. It has exciting new features.” So I said “Give it to me.”

After about 15-30 minutes of updating, the device greeted me with an error message saying that it was not compatible with Google Play Services. Furthermore, my Google Calendar stopped syncing.

Sorry, Google, I am not interested. I want my stuff to work. I have no desire to debug what really should have been an automated process. If my device was not compatible with Google Play Services, and installing Lollipop was going to break some of the things that make smartphones smart, it should not have installed Lollipop.

In any case, I have no desire to mess with settings, to install and uninstall software, and so on. All I know is that my iOS devices have always worked after updates. My contract with Verizon is up, so I am going over there this weekend to get a new iPhone.

Apple is (or was) the Biggest User of Apache Cassandra

One thing I did not realize about Cassandra is that Apple is (or was) one of the biggest Cassandra users out there:

Word in Goldmacher’s circles is that Apple will be “replacing” its huge Cassandra noSQL implementation with FoundationDB. Apple uses Cassandra for “iMessage, iTunes passwords, a bunch of stuff,” he says.

In fact, Apple is touted as having one of the largest production deployments of Cassandra of all, with over 75,000 nodes storing over 10 petabytes of data. Cassandra is a free and open source database with a commercial version offered by DataStax.

The article further states that FoundationDB can operate on cheaper hardware, with fewer nodes, and faster. It states that Apple could reduce its cluster size by 5-10%.

5-10% off a cluster that size is nothing to sneeze at. We are talking upwards of 7,500 servers, which means millions of dollars in hardware savings and even more in devops costs.

Since RAM is the new disk and disk is the new tape, an in-memory data store backed by disk is going to support reads that are orders of magnitude faster than a data store like Cassandra, which uses disk as its primary storage mechanism. For example, Redis has a data model broadly similar to Cassandra’s, but it is entirely in-memory.

Of course, it all depends on your requirements. If your needs are to accumulate massive amounts of information that is queried infrequently or in off-peak batches, then Cassandra is very appropriate. But if you require consistent performance for both reads and writes you should look elsewhere.

Building a Supercomputer in AWS: Is It Even Worth It?

The fact that Cray is still around is mind-boggling. You’d think that commodity hardware and network technologies would have long since made supercomputing affordable for anyone interested. And yet, Cray Sells One of the World’s Fastest Systems:

“This, to IDC’s knowledge, is the largest supercomputer sold into the O&G sector and will be one of the biggest in any commercial market,” the report stated. “The system would have ranked in the top dozen on the November 2014 list of the world’s Top500 supercomputers.”

Building one of the dozen fastest supercomputers isn’t new for Cray – they’ve got three in the current top 12 now. But what is unique is that most of those 12 belong to government research labs or universities, not private companies. This may be starting to change, however. For example, IDC notes that overall supercomputing spending in the oil and gas sector alone is expected to reach $2 billion in the period from 2013-2018.

Supercomputers come with astronomical costs:

So, you’re in the market for a top-of-the-line supercomputer. Aside from the $6 to $7 million in annual energy costs, you can expect to pay anywhere from $100 million to $250 million for design and assembly, not to mention the maintenance costs

In the 1990s I was involved in a student project to build a Linux Beowulf cluster out of commodity components. It involved half a dozen quad-processor servers with something like a gigabyte of RAM each. It cost a fortune, and it required us to obtain NSF funding for the project. I don’t recall the exact details.

I know that a similarly configured modern cluster in AWS would cost a few hundred bucks a month if it was used continuously. But even the cluster we built at Clarkson was not used 24/7, so done right, the same cluster would have cost a fraction of that in AWS.

Turns out I am not the only one who had an idea to build a Beowulf cluster in AWS:

After running through Amazon’s EC2 Getting Started Guide, and Peter’s posts I was up and running with a new beowulf cluster in well under an hour. I pushed up and distributed some tests and it seems to work. Now, it’s not fast compared to even a low-end contemporary HPC, but it is cheap and able to scale up to 20 nodes with only a few simple calls. That’s nothing to sneeze at and I don’t have to convince the wife or the office to allocate more space to house 20 nodes.

That last statement is important. Setting aside the costs, imagine the red tape involved in putting something like that together with the help of your on-premise IT department.

At an AWS Summit a couple of years ago, Bristol-Myers Squibb gave a talk on running drug trial simulations in AWS:

Bristol-Myers Squibb (BMS) is a global biopharmaceutical company committed to discovering, developing and delivering innovative medicines that help patients prevail over serious diseases. BMS used AWS to build a secure, self-provisioning portal for hosting research so scientists can run clinical trial simulations on-demand while BMS is able to establish rules that keep compute costs low. Compute-intensive clinical trial simulations that previously took 60 hours are finished in only 1.2 hours on the AWS Cloud. Running simulations 98% faster has led to more efficient and less costly clinical trials and better conditions for patients.

If I interpret that case study correctly, BMS didn’t even bother with an on-premise supercomputer for this.

AWS of course is happy to oblige:

AWS allows you to increase the speed of research by running high performance computing in the cloud and to reduce costs by providing Cluster Compute or Cluster GPU servers on-demand without large capital investments. You have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.

So, what would it cost to set up one of the world’s most powerful supercomputers in AWS and run it for one month? I fully realize that this may not be a very accurate discussion, but let’s humor ourselves and try to imagine the biggest of the Top 500 supercomputers built in AWS.

As of June 2013, the biggest supercomputer was at the National University of Defense Technology in China, and it had 3,120,000 CPU cores. Let’s eyeball this in AWS using Amazon’s cost calculator. I put together a couple of different HPC configurations.

Amazon’s g2.2xlarge instances have 8 cores and 15 gigabytes of RAM each. To get to the 3,120,000 cores one would need 390,000 instances, which would cost $185,562,000.00 for a month, not including business support.
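
(Back-of-the-envelope check: assuming the g2.2xlarge on-demand rate of the time, roughly $0.65 per instance-hour, and the calculator’s 732-hour month, 390,000 instances × $0.65 × 732 hours comes out to the $185,562,000 quoted above.)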

If you use No-Upfront Reserved instances for one year, the cost becomes $134,947,800.00 per month for that year. Three-Year All-Upfront Reserved costs $2,889,900,000.00 up front and $100 a month.

Now, here is an important factor. On premises you have to build out the maximum capacity you will ever use, but in the cloud you can dynamically scale up and down as your workload requires. Whereas supercomputing was once the domain of governments and wealthy corporations, it is now within reach of anyone building out in AWS.

Let’s try this with c4.8xlarge instances, which have 36 vCPUs each, so roughly 86,700 of them would be needed. On-demand this costs $119,901,600.00 a month. Three-Year All-Upfront is $1,609,335,000.00.

Of course, we don’t even know whether such a thing is possible on AWS: quickly spinning up a few hundred thousand servers. How long would that take? This would probably require a conversation with AWS sales, and probably a volume discount. But either way, something tells me that for such large, specialized computational workloads it would be naive to assume that building a supercomputer in the cloud would be cheaper.

This is why renting supercomputing time is still more efficient than either owning a supercomputer or trying to spin one up in the cloud.

Week Of 4/6/2015 in Review

The Data Science Hype Cycle Must be at an All-Time High

White House Appoints First Chief Data Scientist:

Patil, an alumnus of the University of Maryland, previously served as VP of Product at RelateIQ and held positions at LinkedIn, Greylock Partners, Skype, PayPal, and eBay. At LinkedIn, the Wall Street Journal reports, he was responsible for setting up Silicon Valley’s first data science team in the mid-2000s.

Can Cloud Outperform Raw Supercomputing Iron?

The fact that Cray is still around is mind-boggling. You’d think that commodity hardware and network technologies would have long since made supercomputing affordable for anyone interested. And yet, Cray Sells One of the World’s Fastest Systems. My question is this: what would be involved, and how much would it cost, to assemble a cluster in AWS that would perform comparably to a commercial raw-iron supercomputer?

Reddit Eliminates Salary Negotiations for New Employees

Supposedly this is going to level the playing field for women wishing to work at Reddit. But once hired, can you negotiate for a raise? If yes, then this whole idea is just a charade.

We have pay inequity across genders, across ages, and across many other dimensions. GM recently hired a female CEO who not only earns a fraction of what her male predecessor did; the male predecessor is also still on the board, earning the same salary he did before. All over the industry we have examples of younger employees earning less than older employees for the same work, allegedly because of their lack of experience.

The only way to solve the problem of pay inequity is to make compensation numbers by role open and non-negotiable. If each company published what it pays for any given role, and how much that compensation increases year over year, there would be no such thing as pay inequity.

Can you hear me now?

I swear by all that is holy, each time I make a Skype or a Google Hangout call, the first thing one of us says is “Can you hear me?” If not, there is some fumbling with the settings, with the headphones, and so on, and minutes later everything works.

Let me just say this: 1980s rotary phones worked better than any of this.

SCO Revives Their Lawsuit Against IBM

That patent troll SCO, in which Microsoft is a major investor, has revived its Linux lawsuit against IBM. This is why I don’t trust any of Microsoft’s recent overtures toward the open-source and Linux communities.

Then again, VMware got sued for failure to comply with the GPL.

Microsoft Struggles to Maintain Relevance

Why does Microsoft offer Windows 10 for the Raspberry Pi? Because they are struggling for relevance.

Majority of “Big Data” Fits on Your Laptop

This poll shows that 85% of data scientists use Excel and other tools that run on their workstations rather than anything that even remotely requires cloud tools.

It’s like I always say, just because a data set has all of your data, it does not mean it is big.

Now Everyone Can be a Data Scientist

AWS launched its Machine Learning service. Now everyone can be a data scientist. I am not even being sarcastic: this means you no longer need to invest in home-grown ML.

Amazon Wants to Replace On-Premise Data Centers

Enterprise IT departments will fight this tooth and nail, but it is inevitable: on-premise IT is an endangered species.

Ordered Sets and Logs in Cassandra vs SQL

I’ve written before that Cassandra’s Achilles’ heel is devops:

Storage, redundancy and performance are expanded by adding more nodes. This can happen during normal business hours as long as consistency parameters are met. Same applies to node replacements.

As the number of servers grows, be prepared to hire a devops army or look for a managed solution. The DataStax offering helps but is still not enough. Even in the cloud we have found no good managed solution. Cassandra.io requires that you give up Thrift and CQL, and Instaclustr as of this moment does not use third-generation SSD-backed instance types.

Since I have a total of about two dozen Cassandra nodes to upgrade, and an application to regression test afterwards, I am not really looking forward to that exercise. So, one of the things I am working on is finding out whether we actually need Cassandra at all.

Having eliminated all relational-style queries (which are not a use case for Cassandra) by moving some of the data to PostgreSQL (which, by the way, in some cases outperforms Cassandra by at least a factor of two on larger data sets [1]), I am now looking at some of the more obscure data structures that may be perfect for Cassandra but do not map onto SQL easily.

Frequently Updated Ordered Sets of Tuples

In Cassandra, there is a very convenient mechanism for maintaining an ordered set. You create a column family for a collection of ordered sets, with the row key being the set identifier and the column name being the name of an element in the set. Since columns are sorted and column names are unique within a row, you get an ordered set. In Cassandra, a column family with ordered sets of tuples looks like this:

OrderedSetsCF   ->
    SetA    ->  (X=1),(Y=2),(Z=3)
    SetB    ->  (Y=10),(Z=4)
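
For readers more used to CQL than to the old Thrift view of the world, here is a rough CQL3 sketch of the same structure. The table and column names are mine, purely for illustration: the partition key plays the role of the row key, and the clustering column plays the role of the sorted column name.

CREATE TABLE ordered_sets (
    set_id        text,    -- plays the role of the row key
    element_name  text,    -- plays the role of the sorted column name
    element_value int,
    PRIMARY KEY ((set_id), element_name)
);

INSERT INTO ordered_sets (set_id, element_name, element_value) VALUES ('SetA', 'X', 1);
INSERT INTO ordered_sets (set_id, element_name, element_value) VALUES ('SetA', 'Y', 2);
INSERT INTO ordered_sets (set_id, element_name, element_value) VALUES ('SetA', 'Z', 3);
INSERT INTO ordered_sets (set_id, element_name, element_value) VALUES ('SetB', 'Y', 10);
INSERT INTO ordered_sets (set_id, element_name, element_value) VALUES ('SetB', 'Z', 4);

-- Elements come back sorted by element_name within the partition:
SELECT element_name, element_value FROM ordered_sets WHERE set_id = 'SetA';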

In SQL, such a data structure would look like a table where the set identifier and the set element name are parts of the primary key, and the element’s value is the third column. In this approach the set identifier becomes denormalized and repeated for each set element. For large sets this can become a very big table, especially if you are using UUIDs for primary keys (a common technique with Cassandra):

OrderedSetsTable

SetID(PK)   ElementName(PK) ElementValue
----------------------------------------
SetA        X               1
SetA        Y               2
SetA        Z               3
SetB        Y               10
SetB        Z               4

Note that you can mitigate the primary key denormalization problem in situations where your keys are much bigger than the universe of your data actually requires (e.g. UUIDs): create a separate lookup table that maps an integer code onto each UUID, and then use the integer code as the primary key in the sets table, as sketched below.
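
A minimal PostgreSQL-flavored sketch of that arrangement could look like the following; the names and types are illustrative, and a serial column is just one way to generate the integer code.

CREATE TABLE set_ids (
    set_code serial PRIMARY KEY,    -- compact surrogate key used everywhere else
    set_uuid uuid NOT NULL UNIQUE   -- the original, much wider identifier
);

CREATE TABLE ordered_sets (
    set_code      integer NOT NULL REFERENCES set_ids (set_code),
    element_name  text    NOT NULL,
    element_value integer NOT NULL,
    PRIMARY KEY (set_code, element_name)
);

-- Reading a set back in order is an index-backed range scan:
SELECT element_name, element_value
  FROM ordered_sets
 WHERE set_code = 42
 ORDER BY element_name;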

Setting the value of a tuple in Cassandra is an O(1) operation, because Cassandra writes into a commit log and a memtable and returns immediately. If you are writing into your set very rapidly, the values may not be readable until somewhat later, especially if you have a high replication factor and your reads do not always hit the same node as your writes. Eventually, over some period of time, your data becomes consistent across the cluster.

Since Cassandra is not ACID, a column does not need to be inserted before it can be updated; in Cassandra, a create is effectively the same as an update, and every write is an upsert. SQL, by contrast, has no standard single-statement UPSERT; individual databases offer vendor-specific variants (MySQL’s INSERT ... ON DUPLICATE KEY UPDATE, MERGE in SQL Server and Oracle), and there has been long-running debate over how to standardize one.
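
To make the contrast concrete, here is a hedged sketch using the illustrative tables above (the CQL table from the first sketch, the PostgreSQL tables from the second). The Cassandra write is a single blind statement; the portable SQL version is a read-modify-write inside a transaction, simplified here in that it ignores the race where a concurrent writer inserts the same row between the two statements.

-- Cassandra: this either creates the (SetA, W) element or silently overwrites it.
INSERT INTO ordered_sets (set_id, element_name, element_value) VALUES ('SetA', 'W', 7);

-- Portable SQL without a vendor-specific upsert: try the update, insert if nothing was there.
BEGIN;
UPDATE ordered_sets SET element_value = 7
 WHERE set_code = 42 AND element_name = 'W';
INSERT INTO ordered_sets (set_code, element_name, element_value)
SELECT 42, 'W', 7
 WHERE NOT EXISTS (
     SELECT 1 FROM ordered_sets WHERE set_code = 42 AND element_name = 'W'
 );
COMMIT;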

In SQL, an insert into such a table is an O(log(n)) operation (roughly speaking, depending on the type of index used), where n is the number of (SetID, ElementName) combinations. As the number and size of such sets grow, and as the size of the keys grows, inserts become increasingly slow. Depending on your ingestion rate this may or may not be much of an issue, considering that the upside of using SQL is that your data is immediately readable and you don’t have an eventual consistency problem [2].

Since standard SQL has no single-statement UPSERT (and even a vendor-specific one effectively does a read before the write), updating a value will incur roughly a 2*O(log(n)) performance penalty: once to look up the existing row and once to update it, depending on how you do this and what your RDBMS query planner does. Again, this is the price you pay for ACID.

Updates and deletes complicate the situation for both Cassandra and an SQL database, but differently. Cassandra uses append-only SSTable files, meaning that all writes are O(1). However, in the background it must perform compaction, and until that happens you may use more disk space than you actually need, and your reads become orders of magnitude slower because they have to skip over obsolete versions and tombstones.

To illustrate this penalty, let’s pretend that value X in SetA above has been updated 3 times and value Y was deleted. Until compaction happens, what is actually stored on disk is this:

OrderedSetsCF   ->
    SetA    ->  (X=1),(Y=2),(Z=3)
            ->  (X=4) //(X=1) now obsolete
            ->  (X=5) //(X=4) now obsolete
            ->  (X=6) //(X=5) now obsolete
            ->  {Y=2} //tombstone for (Y=2)

Your set of 3 elements now takes up 4 extra units of storage until compaction happens. Furthermore, to get to the latest value of X, and to tell that Y has been deleted, Cassandra has to reconcile the obsolete versions and the tombstone. So, what could have been an O(1) + O(log(n)) operation (first get to SetA using the hash of the row key, then get to X using the column index) now becomes O(1) + O(log(n)) + O(m), where n is the number of columns (elements in your set) and m is the number of times you updated X. If you just did a few hundred thousand updates to the same value over a relatively short period of time, you have created a serious bottleneck for your read operations.

A Cassandra row compaction itself requires additional storage and significant I/O capacity. Without getting into the intricacies of the mechanism, consider the fact that in Cassandra SSTables are immutable: they are written once and never modified in place. So, in order to compact the SetA row from the above example, Cassandra scans the row, drops the obsolete versions and tombstones, and writes the result out into a new SSTable file. Once that file is written, the original fragmented SSTables are deleted. As a result, Cassandra may temporarily double the storage it requires. Furthermore, you have little control over when compaction actually happens.

Now, SQL databases are not immune to the problem of frequently updated data. PostgreSQL’s VACUUM, for example, does much the same thing: it reclaims the space left behind by dead row versions, a mechanism very similar to Cassandra’s compactions. There is really no escaping that problem. SQL databases like PostgreSQL may give you better control over storage reclamation than Cassandra does, though, and because of the differences between VACUUM and compaction you are not incurring the tombstone penalty on reads.

The reason one would want to use SQL for this type of structure is simple: you need to run real-time analytics over the data. It comes down to one question: are you accumulating rapidly changing data to process later (in which case Cassandra or DynamoDB is very appropriate), or are you accumulating the same data in order to gain meaningful insights from it in real time (in which case SQL is more appropriate)?

Log-style Structures

Let’s suppose you want to store events from a multitude of remote sensors. These could be temperature sensors in different locations, market data by ticker symbol, or clicks in apps across thousands or millions of users. Suppose also that you only want to retain this data for 2 days.

Each event is identified by its source, time (in milliseconds), event type, event id, and some event value.

In a Cassandra column family, one would store it as follows: source is your row key, time + eventType + eventId is your composite column name, and eventValue is your column value. Each column gets a TTL of 2 days (expressed in seconds). It looks like this:

EventsCF >
    SourceA -> (0:TypeA:1, X, ttl=172800), (1:TypeA:2,Y, ttl=172800), (1:TypeB:3,X, ttl=172800)
    SourceB -> (0:TypeB:1, X, ttl=172800), (1:TypeA:2,Y, ttl=172800)

and so on and so forth. In SQL, it would look like this:

EventsTable

source(PK)  time(PK)    eventType(PK)   eventId(PK) eventValue
---------------------------------------------------------------
SourceA     0           TypeA           1           X   
SourceA     1           TypeA           2           Y   
SourceA     1           TypeB           3           X

and so on. You’d need a batch job to run regularly to delete rows older than 2 days.
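
As a concrete sketch (again, the names are mine and purely illustrative), the Cassandra side writes each column with a per-write TTL, while the SQL side has no TTL and relies on that scheduled delete.

-- CQL: the composite clustering key gives the time/type/id ordering within a source.
CREATE TABLE events (
    source      text,
    time        bigint,    -- milliseconds since the epoch
    event_type  text,
    event_id    bigint,
    event_value text,
    PRIMARY KEY ((source), time, event_type, event_id)
);

INSERT INTO events (source, time, event_type, event_id, event_value)
VALUES ('SourceA', 0, 'TypeA', 1, 'X') USING TTL 172800;    -- 2 days, in seconds

-- SQL (PostgreSQL-flavored): the batch job boils down to a scheduled delete.
DELETE FROM events
 WHERE time < (extract(epoch FROM now()) - 172800) * 1000;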

In Cassandra, all writes are O(1). In SQL, all writes are O(log(n)), where n is the number of PK combinations. As the size of your log grows, so will the time it takes to insert a row. One could mitigate this in SQL by not using a PK or indexes, but then querying this table would become nearly impossible.

Cassandra has the concept of a TTL on values, meaning that they logically disappear when the TTL is up. However, that does not mean the disk space is reclaimed. This too suffers from the compaction problem, and until compaction happens this data structure may consume an enormous amount of disk space. Suppose you accumulate 1 million log entries per day per source. Five days later, unless compaction has happened, you are actually storing 3 days more of data than you require.

Retrieval of this data out of Cassandra becomes a bit tricky. If you naively assume that by reading the entire row you are only reading the last two days’ worth of data, you are wrong: until compaction happens, Cassandra has to scan over the expired columns, which are treated as tombstones, and in this example that is three days’ worth of them! Even if you optimize a bit and use a slice query starting at two days ago, the best you will get out of Cassandra is O(log(n)), where n is the total number of log entries you made in the last five days (until compaction happens).
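
For reference, the slice query mentioned above, run against the CQL table sketched earlier, is an equality on the partition key plus a range on the first clustering column; the cutoff is something the client computes as “now minus two days” in milliseconds.

SELECT time, event_type, event_id, event_value
  FROM events
 WHERE source = 'SourceA'
   AND time >= 1428192000000;   -- illustrative cutoff: two days before "now"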

The disk storage problem is further exacerbated here. Since data with an expired TTL won’t actually get deleted until compaction happens, and compaction itself may temporarily double the disk storage requirement, you need to make sure you leave extra space on each node. Furthermore, this type of structure in Cassandra may create an imbalance in the cluster if the amount of data varies a lot between sources.

Cassandra is a clear winner here from the performance perspective if the goal is to collect immense amounts of data, especially if that data never expires. However, in a cloud environment like AWS I’d use Amazon’s own facilities such as DynamoDB, EMR, or Redshift. Cassandra, as it grows, does become a devops nightmare: over time you may end up with dozens or hundreds of nodes if you never expire or delete data.

Conclusion

So what am I really getting at here? Well, Cassandra really is a devops nightmare. I know I am going to stir up some debate on Twitter by saying that. I’d love nothing more than to stop using it. However, it continues to be a useful tool for some of the use cases I deal with, and for all its flaws I have not found a better option yet. As I keep saying, all I want is a Cassandra that is offered as a managed service, like DynamoDB, where I don’t have to worry about devops.


  [1] Yes, I know I need to provide a benchmark. In this post I wanted to spark a conversation; if I find the time, I’ll post a benchmark.
  [2] This advantage is negated by the use of read replicas, which may lag behind the primary.

Exploration of Software Engineering as a Profession

In 1992 Ed Yourdon wrote Decline and Fall of the American Programmer, followed by Rise and Resurrection of the American Programmer just four years later. The first book spelled doom and gloom for American programmers, who were going to be replaced by cheaper counterparts in India, Russia, the Philippines, and elsewhere. The second book revisited some of those predictions in light of the changes the software industry had undergone in the years between the two.

I read both books as a freshman in college, and both were incredibly thought-provoking. As a talented computer science student I did not feel seriously threatened by the predictions of Decline and Fall, nor was I convinced by the conclusions of Rise and Resurrection. These books did spark controversy in the industry, but, as with all such literature, they were opinions rooted in the facts of their time. As I like to tell people who ask me for advice, any recommendation I make is based on the facts known to me at this moment and is not a guarantee of future results. Decline and Fall and Rise and Resurrection have to be viewed through the same prism.

Both books were based on popular management techniques of the time, which emphasized separating the cognitive aspects of software development from programming. Indeed, popular software engineering project management techniques of that era were borrowed from electrical and other engineering disciplines, which put more weight on design than on implementation.

What I’d like to do is a modern exploration of the future of software engineering in the United States as a craft and as a profession.

As it turned out, software engineering is not really an engineering discipline, and computer science is not really a science. In civil engineering, for example, a bridge that is safe and lasts for centuries takes months or years to design, by highly qualified and well-paid engineers, and is then built to those specifications by skilled craftsmen working in teams. A bridge is subject to forces beyond its designers’ and engineers’ control. Once built, a bridge is extremely difficult to upgrade incrementally. That is obviously not the case with software.

Furthermore, unlike other engineering disciplines, software has an incredibly low cost of entry. While some engineering disciplines require years of education and apprenticeship, software engineering does not (though it could benefit from them). An architect requires a substantial capital investment to put up a building. A software engineer, on the other hand, just needs food, $1,000 worth of equipment, and some spare time to build the next Twitter or Facebook.

Many of the predictions about outsourcing have not panned out either. Software engineers need to be domain experts, for example, something that is not easily achieved if you intend to have your software built by a generic pool of engineers overseas. Open source is a great equalizer: whereas in the 1980s and 1990s one needed to hire an army of programmers to build boilerplate code, the majority of platform code is out there in the open today. Cloud platforms like AWS eliminate the need for an army of on-premise IT personnel, although they do create a temporary opening for outsourcing vendors that help customers migrate.

These are the topics that I’d like to explore over the next few months on this blog. Is there a future for software engineering as a profession in the United States? What is its present state? What are the forces at play?