When Amazon Echo with Alexa service came out in November 2014 I was skeptical. A speaker with voice recognition seemed like an unnecessary oddity. When a friend of mine purchased one in 2015 I had a chance to play with it but was still unimpressed.
The Alexa SDK has been open to third-party developers for a year now. As a software engineer, it is important for me to keep up with emerging technologies. I purchased an Amazon Echo about a month ago and had an opportunity to interact with the technology and try out the SDK.
More useful than Siri
Comparing Alexa to Siri is like comparing apples to oranges. Yes, both are speech bots. That’s probably as much as they have in common.
The primary Alexa service revolves around information lookup, home automation, and shopping on Amazon. Users can enable “skills”, which are essentially speech-based apps, and expand Alexa’s functionality.
From the speech recognition standpoint, Alexa is definitely more responsive than Siri. This is a family-friendly product and as such it needs to handle different speech patterns – children, adults, and the elderly. In my experiments, I found Alexa to be more accurate than both Siri and Google, but of course your mileage may vary.
Don’t expect it to pass a Turing test
In a Turing test a human operator uses a text-only terminal to interact with two test subjects separated from one another. The operator is aware that one subject is a machine and the other is a human, but they do not know which one. The machine subject is considered to have passed the test if the operator cannot tell which one is which.
Ask Alexa if she can pass a Turing test and she will answer: “I don’t need to pass that, I am not pretending to be human.” Expecting Alexa to pass this test is a sure recipe for disappointment. It is more advanced than interactive voice response systems and sure as hell more powerful than Siri, but it is not human.
The first analogy that occurred to me was that of Palm OS and Graffiti. Palm couldn’t pack in the computing power needed to process handwriting while also keeping the cost of the device low. Instead, they asked users to learn a dumbed-down, script-like mechanism to input data into their PDAs.
Likewise, Alexa’s users are expected to adapt a bit to Alexa’s capabilities. It doesn’t respond to an infinite variety of sentence structures, nor does it maintain a conversation like a human would. In short, it is a “chat bot.”
The good news is that Alexa keeps getting better. All the software needed to handle voice recognition and AI lives in the cloud, where Amazon continuously updates the platform.
Amazon made it easy to contribute skills
The Alexa Skills Kit is well documented and easy to learn, especially if you use AWS Lambda. The developer needs to provide sample phrases, or utterances. The utterances get mapped onto intents and can have slots for custom words. Alexa’s machine learning backend does all of the analysis, and by the time your code runs everything has been broken down into intents and slot values.
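To make that concrete, here is a minimal sketch of what a skill's Lambda code might look like in Python. It is not one of the skills I submitted: the intent name GetForecastIntent and the slot name City are hypothetical, made up for illustration. The event layout follows the JSON request and response format of a custom skill as I understand it, so check the official reference before relying on it.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the JSON envelope a custom skill returns to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda with the already-parsed request."""
    request = event["request"]

    # The user opened the skill without asking for anything specific.
    if request["type"] == "LaunchRequest":
        return build_response("Welcome. Ask me for a forecast.", end_session=False)

    # Alexa has already mapped the spoken utterance onto an intent and slots.
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "GetForecastIntent":  # hypothetical intent name
            # "City" is a hypothetical slot; it has no value if Alexa heard none.
            city = intent.get("slots", {}).get("City", {}).get("value")
            if city is None:
                return build_response("Which city?", end_session=False)
            # A real skill would call a weather service here.
            return build_response(f"I would look up the forecast for {city} here.")

    return build_response("Sorry, I did not understand that.")
```

Note that the code never sees the raw audio or even the exact sentence. A sample utterance such as "what is the forecast for {City}" would be declared in the skill configuration, and the handler only receives the resulting intent name and slot value.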
To get a sense of what’s involved in building speech bots I built a few simple skills and submitted them to Amazon for certification. Amazon provides a checklist to set expectations for developers. My experience working through the process is that it is very subjective – much like the experience of using Alexa itself.
The Alexa Skills Kit is still in its early stages. I wish Amazon would put a little more effort into making it work more smoothly with build tools such as Jenkins. I would also like to see a monetization scheme similar to Amazon Underground.
Some final thoughts
After using Alexa for a few weeks, I’ve become acutely aware of the contrast between dealing with a call center and dealing with AI. I must say that dealing with AI is far more pleasant.
Shortly after getting the Echo we needed to resolve an issue with our airline for an upcoming family trip. Unable to solve this problem using their website, we had to call their customer service. As expected, I had to navigate the frustrating tree of menus. When I finally got to speak to someone they could barely speak English. They could only speak to a script, and any diversion resulted in being transferred to someone in another department in what seemed like an endless vortex of incompetence.
Patrick Thibodeau has written a lot about outsourcing and the flow of U.S. white-collar jobs to low-cost countries. However, there is a bigger, more secular change happening – and it will happen faster than anything we’ve experienced before. Any job that involves information lookup, scheduling, or following a script is bound to get replaced with an AI.
This story was originally published at my “Cloud Power” blog at Computerworld on July 19th, 2016. Featured image credit: Ken M Earney via Flickr.