"I definitely saw some power in voice. It's a very powerful form of storytelling."
– Akilah Bolden-Monifa
As the brains of our human ancestors evolved, so did their language, from signs and sounds to a more sophisticated form of oral speech, which made them capable of having complex conversations and forming the social ties required for their survival. Unlike written communication, oral speech leaves no traces of its own, so it has been hard for historians to determine an exact date for the origin of speech. However, using various methods, historians have speculated that speech developed about 300,000 years ago, symbols about 30,000 years ago, and writing about 7,000 years ago. Ever since then, humans have been putting speech and voice to various creative uses.
In this chapter, we shall explore one such use of our voice: the ability to command interactive voice-based personal assistants to perform specific tasks at will. Before that, we will understand what an intelligent voice-based personal assistant is, what needs it fulfills, and which voice-based personal assistants (including Alexa) are available in the current market, by going through the following topics:
- The Need for Voice-Based Personal Assistants
- Applications of Voice-Based Personal Assistants
- A Comparison of Various Voice-Based Personal Assistants
So, let's move on to our first topic.
To understand the evolution of voice-based personal assistants, we will have to go back in time and see some of the important events that led to their advent. One of these many events was the evolution of computers. Although not directly related to the voice revolution, the evolution of computers played a key role in the evolution of voice-based personal assistants because it marked the invention of the internet, which is the backbone of most voice-based personal assistants. The computer revolution also introduced critical changes concerning hardware and integrated circuits, which we shall discuss next.
The computer revolution began in the 19th century, when Charles Babbage designed the Analytical Engine, which earned him the nickname the Father of Computers. The 1950s and 1960s brought tremendous advances in computer science, chief among them a groundbreaking invention: the integrated circuit. Integrated circuits replaced bulky discrete components such as diodes and vacuum tubes, which allowed existing computers to shrink to far more compact sizes. It was also the time when Gordon Moore made his famous observation that the number of transistors in an integrated circuit doubles roughly every two years; in other words, we would be able to pack more and more processing power into a circuit of the same size or smaller. Moore's observation foresaw the future of our hardware: anyone following it could have predicted that computers would keep getting smaller, a lot smaller. And voilà, today nearly everyone carries a small computer in their hands: the smartphone.
The late '60s and early '70s also saw the advent of the Advanced Research Projects Agency Network (ARPANET), which eventually evolved into the internet as we came to know it in the '80s. All this may sound trivial at first, until you realize that without these inventions we would never have seen voice-based personal assistants in action.
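Moore's observation is easy to express as arithmetic. The following Python sketch is illustrative only: the function name `projected_transistors`, the two-year default doubling period, and the starting count of 1,000 are assumptions chosen for the example, not figures from this chapter or from historical data.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count, assuming it doubles once per doubling period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    # Each elapsed doubling period multiplies the count by 2.
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    start = 1_000  # hypothetical starting count, for illustration only
    for years in (0, 2, 10, 20):
        count = projected_transistors(start, years)
        print(f"after {years:2d} years: {count:,.0f} transistors")
```

Under these assumptions, ten years corresponds to five doublings, so the count grows by a factor of 32, and twenty years corresponds to ten doublings, a factor of 1,024.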
Prior to voice-based personal assistants, the traditional way of sending commands to a computer system was either through the GUI using a mouse or through the terminal using a keyboard. As the form factor of traditional computing systems shrank, the input methods evolved too, and early handheld devices and mobile phones introduced a stylus in addition to the traditional keyboard to take advantage of the device's touchscreen:
Figure 1.1: A smartphone with a stylus, captured in the year 2010
The evolution continued, and the place of the stylus was taken by what Steve Jobs called "the best pointing device in the world": our fingers.
Steve Jobs introduced touch on the iPhone by using the term "best pointing device in the world" for a user's fingers in 2007 during the MacWorld Conference in San Francisco. The highlights of this conference are available on YouTube at https://www.youtube.com/watch?v=P-a_R6ewrmM.
As the interface between computers and humans grew thinner, it was only natural that voice would become the next input medium for computing devices, which led to the advent of voice-based personal assistants.
The idea of having voice as an input medium for computing devices was not new; parallel to the computer revolution, there was also a voice revolution, many important milestones of which are listed at the following link: https://voicebot.ai/2017/07/14/timeline-voice-assistants-short-history-voice-revolution/
Of the many milestones of the voice revolution, almost every reader will be familiar with at least a few of the latest ones, namely Siri, Google Now, Cortana, and Amazon's Alexa. The most popular ones are Apple's Siri and Google's Google Now, which initially appeared integrated with iOS and Android mobile devices, respectively.
Apple's Siri initially appeared as an app on Apple's App Store, but was later acquired by Apple and became much more closely integrated with iOS devices. Siri uses a natural language interface to listen to commands from the user and perform the necessary actions. Also, with the coming of macOS Sierra, its capabilities were no longer limited to iOS devices:
Figure 1.3: The capabilities of Siri also extend to desktops in addition to iPhones
Google closely followed in the footsteps of Apple and, shortly after the introduction of Siri in 2011, introduced Google Now in 2012. Unlike Siri, Google Now was available natively for Android and also as a separate app for iOS devices. Google Now seamlessly integrated with other Android/Google features such as Gmail, Google Calendar, and the mighty Google Search itself:
Figure 1.4: Google Now is available on iOS as part of a native app (Google and the Google logo are registered trademarks of Google Inc., used with permission.)
Closely behind Google was Microsoft with its own intelligent voice-based assistant, Cortana, which it introduced in 2014 for desktop and mobile devices:
Figure 1.5: Microsoft's Cortana was initially introduced for Microsoft's mobile and desktop computing systems
As time passed, it became evident that voice-based personal assistants were here to stay and needed exclusive hardware and space of their own. Amazon took the lead here with the introduction of Amazon Echo, a family of smart speakers designed and developed by Amazon Inc. to enable its users to use the services of an interactive voice-based personal assistant called Alexa (hence the title of the chapter):
Figure 1.6: The Amazon Echo device family
- Amazon Echo: Original flagship smart speaker.
- Echo Dot: Smaller and cheaper version of Echo without the amplified speaker, so its sound quality is inferior to that of Echo.
- Echo Plus: Latest version of Echo with Zigbee integration.
- Echo Show: Alexa-enabled device with a large touchscreen so that a user's interaction with Alexa is not just auditory but also visual.
- Echo Spot: Show + Dot = Spot. All the basic functionality of the Show and Dot devices in a much smaller form factor.
- Amazon Tap: Alexa-enabled Bluetooth speaker.
The Echo family marked Amazon's second foray into the hardware domain, the first being its introduction of the popular ebook reader, Kindle. Google also recognized the fact that interactive voice assistants can do much more by specifically leveraging the smart home concept and closely followed behind Amazon with its Google Home Smart Speaker, which contained Google Assistant as Alexa's counterpart:
Figure 1.7: Launch timeline for various voice-based personal assistants (source: www.citiusminds.com)
Please note that the preceding diagram does not include Google Now, which was introduced in 2012.
We have discussed the evolution of voice-based interactive personal assistants and how they developed from just another app on the user's smartphone to the user's smart home.
We discussed the evolution of voice-based personal assistants in the previous section. In this section, we shall extend that discussion to some of the popular uses of each of the interactive voice-based personal assistants, irrespective of whether the assistant in question is desktop, smartphone, or smart home-based. We shall begin with one of the earliest and most well-known ones, Apple's Siri.
As indicated earlier, Siri started as a separate smartphone app for iOS, which was later acquired by Apple and integrated into iOS in 2011. Initially, the capabilities of Siri were limited to smartphones and simple functions such as:
- Looking up contacts
- Messaging (SMS)
- Fetching weather updates on user demand, plus other simple queries as mentioned in the previous section
Apple later extended these capabilities by opening Siri to third-party apps through SiriKit. To know more about SiriKit, please visit https://developer.apple.com/sirikit/.
If the user has the following third-party apps installed, he/she can request a ride using Siri:
If the user has the following third-party apps installed, he/she can set those to send a message (and not just an SMS) using Siri:
Please note that the preceding lists are not exhaustive. However, third-party integrations were not the only thing on Apple's roadmap to extend the capabilities of Siri. The launch of macOS Sierra also brought the capabilities of Siri to the desktop. To know more about Siri's desktop capabilities, please visit https://support.apple.com/en-us/HT206993.
Siri can also help a user to:
- Search files on his/her Mac
- Notify the user about their storage space
- Send requests to Contacts, and many others as shown here:
Figure 1.8: List of things Siri can help with (non-exhaustive) (source: www.osxdaily.com)
With a fair idea about Siri's desktop and smartphone capabilities, let's now move on to another popular voice assistant.
Next, we are going to discuss Android, which at the time of writing is the biggest player in the smartphone market, and Google Now, the voice assistant that Google introduced for Android smartphones in 2012.
In early 2010, the smartphone market was split among many players. Over the years, the field has narrowed, and only two major players remain in the market, as depicted here:
Figure 1.9: Smartphone market share distribution comparison between the years 2010 and 2016 (Data sourced from Gartner)
Google Now can do pretty much all that Siri can accomplish; however, it has better integration with the web and web-based queries, since the web is Google's main forte. Some of the things that a user can ask Google Now are:
Figure 1.10: Some of the things that Google Now can do (Data source: www.cnet.com)
Apart from Google Now, Google has also introduced Google Assistant, a more evolved version of Google Now with which the user can hold full-length conversations, something that is not possible with Google Now.
It is very likely that Google Now will be phased out and Google Assistant will take its place; however, Google Assistant is currently only available on Google Home (Google's smart home speaker), the Pixel 2 smartphone, and Android Wear:
Figure 1.11: Devices on which Google Assistant is available (Google and the Google logo are registered trademarks of Google Inc., used with permission.)
Figure 1.12: Desktop market share as of January 2017 (Data source: www.windowscentral.com)
As shown in the preceding graph, as of January 2017, the desktop market had Windows, Linux, and Mac OS X as major players, with Microsoft being the dominant force, which brings us to our next personal assistant.
Figure 1.13: List of some things that Cortana can help with
Not just limited to Windows 10, Windows 10 Mobile, and Windows Phone 8.1, Cortana is also available for:
- iOS (as a separate app)
- Android (as a separate app)
- Xbox One
- Invoke smart Bluetooth speaker by Harman Kardon
Cortana can help with tasks such as:
- Answering web-based queries using Bing Search (for example, "Who is the President of the United States?")
- Launching apps and turning Wi-Fi/Bluetooth on or off
- Answering questions about the weather
- Managing appointments, reminders, and events
With that, we come to the star of this book.
Alexa, the focal point of this chapter and the book, is the interactive voice-based personal assistant by Amazon, originally introduced with its family of Echo devices. Alexa is oriented towards the smart home concept, so most of its use comes from Amazon Echo, a smart speaker designed to be kept in the living room of the user's home. The user can ask it day-to-day queries about weather, food recipes, and jokes, or play interactive trivia games, set alarms, shop for day-to-day items, and much more. The following diagram shows some of the things that a user can ask Alexa:
Figure 1.14: List of some things that Alexa can help with
The capabilities of Alexa can also be extended by installing third-party skills (similar to Google Home's third-party apps). Each third-party skill is meant to serve a specific purpose. For example, the Uber skill allows you to order a ride, the Domino's skill allows you to order a pizza—all from the comfort of your home and through the magic of your voice working together with Alexa.
At the time of writing, there are more than 15,000 skills available for Alexa, with Uber and Lyft being the most used ones in the travel category, Pandora and Spotify in music streaming, and multiple other skills being used for home automation.
From our previous discussions, we already know that each market, whether it is desktop, smartphones, or smart homes, has a steady supply of interactive voice-based personal assistants. Almost every assistant can do whatever its counterparts can accomplish, which raises the question: where do the actual differences lie? Is there something that Alexa can do better than Google Assistant, or vice versa?
This book is based on Alexa, which is a smart home-based personal assistant, so in this section, we shall compare Alexa and Google Assistant to understand the finer differences between the two:
Alexa:
- Uses the invocation phrase, "Alexa"
- Flagship hardware: Amazon Echo device family
- Responds slightly better to e-commerce/shopping-related queries, since that is Amazon's main forte
- Slightly inferior contextual awareness
- Capabilities of Alexa can be extended by installing third-party "skills"
- A wider range of integration with smart home devices such as smart lights, smart locks, smart switches, and smart thermostats
Google Assistant:
- Uses the invocation phrase, "OK, Google"
- Flagship hardware: Google Home, Pixel 2, Android Wear
- Responds slightly better to web-based queries, since Google's major forte is web searching
- Better contextual awareness, hence conversations seem a little more natural
- Capabilities of Google Assistant can be extended by installing third-party apps; however, it has fewer apps currently available for it in the market than Alexa has skills
- Slightly narrower range of integration with smart home devices
In a nutshell, both Google Assistant and Alexa are very capable voice-based assistants that accomplish a lot for their users. Since Google Assistant is fairly new to the market, its integration and compatibility with third-party apps and hardware is still evolving, albeit at a very rapid pace. Even so, despite being the newer of the two, Google Assistant fares better in terms of web integration and contextual awareness.
In this chapter, we covered the evolution of interactive voice-based personal assistants and the various factors involved in their move from a user's smartphone to their smart home. We also saw the various interactive voice-based personal assistants in the smartphone, desktop, and smart home markets, and the capabilities of each.
Our goal was to familiarize the reader with the history of interactive voice-based personal assistants so that, over the course of the book, we can direct our focus onto Alexa, the interactive personal assistant bundled with Amazon Echo. The next chapter will enable the reader to understand the anatomy of an Alexa Skill and to program an Amazon Echo hands-on, so that Alexa can learn to say one of the oldest phrases in computer programming, "Hello, World."