Alexa Skills Projects

By Madhur Bhargava
About this book

Amazon Echo is a smart speaker developed by Amazon, which connects to Amazon’s Alexa Voice Service and is entirely controlled by voice commands. Amazon Echo is currently being used for a variety of purposes such as home automation, asking generic queries, and even ordering a cab or pizza.

Alexa Skills Projects starts with a basic introduction to Amazon Alexa and Echo. You will then deep dive into Alexa Programming concepts such as Intents, Slots, Lambdas and maintaining your skill’s state using DynamoDB. You will get a clear understanding of how some of the most popular Alexa Skills work, and gain experience of working with real-world Amazon Echo applications. In the concluding chapters, you will explore the future of voice-enabled applications and their coverage with respect to the Internet of Things.

By the end of the book, you will have learned to design Alexa Skills for specific purposes and interact with Amazon Echo to execute these skills.

Publication date: June 2018
Publisher: Packt
Pages: 250
ISBN: 9781788997256

 

Chapter 1. What is Alexa?

"I definitely saw some power in voice. It's a very powerful form of storytelling."

– Akilah Bolden-Monifa

As the brains of our human ancestors evolved, so did their language, from signs and sounds to a more sophisticated form of oral speech, which let them hold complex conversations and form the social ties required for their survival. Unlike written communication, oral speech leaves no traces of its own, so it has been hard for historians to date its origin exactly. However, using various methods, historians have speculated that speech developed around 300,000 years ago, symbols around 30,000 years ago, and writing around 7,000 years ago. Ever since, humans have been putting speech and voice to various creative uses.

In this chapter, we shall explore one such use of our voice: the ability to command interactive voice-based personal assistants to perform specific tasks at will. Before that, we will look at what an intelligent voice-based personal assistant is, what needs it fulfills, and which voice-based personal assistants (including Alexa) are available in the current market, by going through the following topics:

  • The Need for Voice-Based Personal Assistants
  • Applications of Voice-Based Personal Assistants
  • A Comparison of Various Voice-Based Personal Assistants

So, let's move on to our first topic.

 

The Need for Voice-Based Personal Assistants


To understand the evolution of voice-based personal assistants, we have to go back in time and look at some of the important events that led to their advent. One of these events was the evolution of computers. Although not directly related to the voice revolution, it played a key role in the evolution of voice-based personal assistants because it led to the invention of the internet, which is the backbone of most voice-based personal assistants. The computer revolution also introduced critical changes in hardware and integrated circuits, which we shall discuss next.

The computer revolution began in the 19th century when Charles Babbage designed the Analytical Engine, which earned him the nickname the Father of Computers. The 1950s and 1960s were interesting times that brought tremendous advances in computer science, including a groundbreaking invention: the integrated circuit. Integrated circuits replaced circuits built from discrete components such as diodes and vacuum tubes, which led to dramatic form factor changes in existing computers and, in turn, to smaller, more compact machines. This was also the period when Gordon Moore made his famous observation that the number of transistors in an integrated circuit doubles roughly every two years; in other words, we would be able to pack more and more processing power into a circuit of the same size, or the same power into an ever smaller one. Moore's observation foresaw the future of our technology and hardware, and by following it we could easily have predicted at least one thing: that our computers would get smaller, a lot smaller. Today, nearly everyone has a small computer in their hands: their smartphone.
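To make the doubling in Moore's observation concrete, here is a minimal Python sketch of the arithmetic. The function name `projected_transistor_count` and the starting count of 1,000 transistors are assumptions chosen for illustration; they are not figures from this chapter or from Moore's original data.

```python
def projected_transistor_count(initial_count, years_elapsed, doubling_period_years=2.0):
    """Project a transistor count that doubles once every doubling period.

    This is just the compound-doubling formula
        count = initial_count * 2 ** (years_elapsed / doubling_period_years)
    and is an idealized illustration, not a model of any real chip.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return initial_count * 2 ** (years_elapsed / doubling_period_years)


if __name__ == "__main__":
    starting_count = 1_000  # hypothetical starting point, for illustration only
    for years in (0, 2, 10, 20):
        count = projected_transistor_count(starting_count, years)
        print(f"After {years:>2} years: about {count:,.0f} transistors")
```

Ten years at a two-year doubling period is five doublings, so the count grows 32-fold. This compounding is what lets the same processing power fit into ever smaller devices.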

The late '60s and early '70s also saw the advent of the Advanced Research Projects Agency Network (ARPANET), which eventually evolved into the internet as we came to know it in the '80s. All this may sound trivial at first, until you realize that without these key developments we would never have seen voice-based personal assistants in action.

Prior to voice-based personal assistants, the traditional way of sending commands to a computer system was either through the GUI using a mouse or through the terminal using a keyboard. As the form factor of traditional computing systems shrank, the input methods evolved too: early handheld devices and mobile phones introduced a stylus, in addition to the traditional keyboard, to take advantage of the device's touchscreen:

Figure 1.1: A smartphone with a stylus, captured in the year 2010

The evolution continued, and the place of the stylus was taken by what Steve Jobs called "the best pointing device in the world": our fingers.

Note

Steve Jobs introduced touch on the iPhone in 2007 at the Macworld conference in San Francisco, describing a user's fingers as the "best pointing device in the world." The highlights of this conference are available on YouTube at https://www.youtube.com/watch?v=P-a_R6ewrmM.

As the interface between computers and humans grew thinner, it was only natural that voice would become the next input medium for computing devices, which led to the advent of voice-based personal assistants.

Note

The idea of using voice as an input medium for computing devices was not new; parallel to the computer revolution there was also a voice revolution, many important milestones of which are listed at https://voicebot.ai/2017/07/14/timeline-voice-assistants-short-history-voice-revolution/

Of the many milestones of the voice revolution, almost every reader will be familiar with at least a few of the latest ones, namely Siri, Google Now, Cortana, and Amazon's Alexa. The most popular ones are Apple's Siri and Google's Google Now, which initially appeared integrated with iOS and Android mobile devices, respectively.

Siri initially appeared as a third-party app on Apple's App Store; Apple later acquired it and integrated it much more closely with iOS devices. Siri uses a natural language interface to listen to commands from the user and perform the necessary actions. With the arrival of macOS Sierra, its capabilities were no longer limited to iOS devices:

Figure 1.3: The capabilities of Siri also extend to desktops in addition to iPhones

Google closely followed in the footsteps of Apple and, shortly after the introduction of Siri in 2011, introduced Google Now in 2012. Unlike Siri, Google Now was available natively for Android and also as a separate app for iOS devices. Google Now seamlessly integrated with other Android/Google features such as Gmail, Google Calendar, and the mighty Google Search itself:

Figure 1.4: Google Now is available on iOS as part of a native app (Google and the Google logo are registered trademarks of Google Inc., used with permission.)

Closely behind Google was Microsoft with its own intelligent voice-based assistant, Cortana, which it introduced in 2014 for desktop and mobile devices:

Figure 1.5: Microsoft's Cortana was initially introduced for Microsoft's mobile and desktop computing systems 

As time passed, it became evident that voice-based personal assistants were here to stay and needed dedicated hardware and a space of their own. Amazon took the lead here with the introduction of Amazon Echo, a family of smart speakers designed and developed by Amazon Inc. to let users use the services of an interactive voice-based personal assistant called Alexa (hence the title of the chapter):

Figure 1.6: The Amazon Echo device family

The complete Echo family and their functionalities are described in the following table:

Device | Use
Amazon Echo | Original flagship smart speaker.
Echo Dot | Smaller and cheaper version of Echo without the amplified speaker, so its sound quality is inferior to Echo's.
Echo Plus | Latest version of Echo with Zigbee integration.
Echo Show | Alexa-enabled device with a large touchscreen so that a user's interaction with Alexa is not just auditory but also visual.
Echo Spot | Show + Dot = Spot. All the basic functionality of the Show and Dot devices in a much smaller form factor.
Amazon Tap | Alexa-enabled Bluetooth speaker.

The Echo family marked Amazon's second foray into the hardware domain, the first being its introduction of the popular ebook reader, Kindle. Google also recognized the fact that interactive voice assistants can do much more by specifically leveraging the smart home concept and closely followed behind Amazon with its Google Home Smart Speaker, which contained Google Assistant as Alexa's counterpart:

Figure 1.7: Launch timeline for various voice-based personal assistants (source: www.citiusminds.com)

Please note that the preceding diagram does not include Google Now, which was introduced in 2012.

We have discussed the evolution of voice-based interactive personal assistants and how they developed from just another app on the user's smartphone to the user's smart home.

In the next section, we shall discuss some of the popular uses of voice-based interactive personal assistants.

 

Applications of Voice-Based Personal Assistants


We discussed the evolution of voice-based personal assistants in the previous section. In this section, we shall extend that discussion to some of the popular uses of each of the interactive voice-based personal assistants, irrespective of whether the assistant in question is desktop, smartphone, or smart home-based. We shall begin with one of the earliest and most well-known ones, Apple's Siri.

Siri

As indicated earlier, Siri started in 2010 as a separate smartphone app for iOS, which Apple acquired later that year and built into iOS in 2011. Initially, the capabilities of Siri were limited to smartphones and simple functions such as:

  • Looking up contacts
  • Messaging (SMS)
  • Fetching weather updates on user demand, plus other simple queries as mentioned in the previous section

However, Apple's roadmap was to extend Siri's capabilities by integrating it closely with third-party apps and, true to its promise, Apple released SiriKit with iOS 10.

Note

To know more about SiriKit, please visit https://developer.apple.com/sirikit/.

If the user has any of the following third-party apps installed, they can request a ride using Siri:

  • Uber
  • Lyft

If the user has any of the following third-party apps installed, they can use Siri to send a message (and not just an SMS) through them:

  • WhatsApp
  • LinkedIn
  • WeChat
  • Slack

A user can also make VoIP calls using the following apps via Siri:

  • Skype
  • Viber

Note

Please note that the preceding lists are not exhaustive. However, third-party integrations were not the only thing on Apple's roadmap to extend the capabilities of Siri. The launch of macOS Sierra also brought the capabilities of Siri to the desktop. To know more about Siri's desktop capabilities, please visit https://support.apple.com/en-us/HT206993.

Siri can also help a user to:

  • Search files on their Mac
  • Check their storage space
  • Send requests to FaceTime with contacts, and many others as shown here:

Figure 1.8: List of things Siri can help with (non-exhaustive) (source: www.osxdaily.com)

With a fair idea about Siri's desktop and smartphone capabilities, let's now move on to another popular voice assistant.

Google Now

Next, we discuss Android, which at the time of writing is the biggest player in the smartphone market, and Google Now, the voice assistant that Google introduced for Android smartphones in 2012.

In early 2010, the smartphone market was shared among many players. Over the years, the field has narrowed and only two major players remain, as depicted here:

Figure 1.9: Smartphone market share distribution comparison between the years 2010 and 2016 (Data sourced from Gartner)

Google Now can do pretty much all that Siri can accomplish; however, it has better integration with the web and web-based queries, since the web is Google's main forte. Some of the things that a user can ask Google Now are:

Figure 1.10: Some of the things that Google Now can do (Data source: www.cnet.com)

Apart from Google Now, Google has also introduced Google Assistant, a more evolved version of Google Now: the user can hold full-length conversations with Google Assistant, which is not possible with Google Now.

It is very likely that Google Now will be phased out and Google Assistant will take its place; however, Google Assistant is currently only available on Google Home, which is Google's smart home speaker; the Pixel 2 Android smartphone; and Android Wear:

Figure 1.11: Devices on which Google Assistant is available (Google and the Google logo are registered trademarks of Google Inc., used with permission.)

Now, moving on from the smartphone market to the desktop market:

Figure 1.12: Desktop market share as of January 2017 (Data source: www.windowscentral.com)

As shown in the preceding graph, as of January 2017, the desktop market had Windows, Linux, and Mac OS X as major players, with Microsoft being the dominant force, which brings us to our next personal assistant.

Cortana

With Microsoft's clear dominance of the desktop market, we cannot ignore Cortana, which is Microsoft's answer to Siri and Google Assistant, but focused on desktop and Windows Mobile:

Figure 1.13: List of some things that Cortana can help with

Not just limited to Windows 10, Windows 10 Mobile, and Windows Phone 8.1, Cortana is also available for:

  • iOS (as a separate app)
  • Android (as a separate app)
  • Xbox One
  • Invoke smart Bluetooth speaker by Harman Kardon

Some of the many things that Cortana can accomplish are:

  • Web-based queries using Bing Search (for example, "Who is the President of the United States?")
  • Launch apps and turn on/off Wi-Fi/Bluetooth
  • Ask about weather
  • Manage appointments, reminders, and events

With that, we come to the star of this book.

Alexa

Alexa, the focal point of this chapter and the book, is the interactive voice-based personal assistant by Amazon, originally introduced with its family of Echo devices. Alexa is oriented towards the smart home concept, so most of its use comes from Amazon Echo, a smart speaker designed to be kept in the living room of the user's home so that the user can ask it day-to-day queries about weather, food recipes, and jokes, or play interactive trivia games, set alarms, shop for day-to-day items, and much more. The following diagram shows some of the things that a user can ask Alexa:

Figure 1.14: List of some things that Alexa can help with 

The capabilities of Alexa can also be extended by installing third-party skills (similar to Google Home's third-party apps). Each third-party skill is meant to serve a specific purpose. For example, the Uber skill allows you to order a ride, the Domino's skill allows you to order a pizza—all from the comfort of your home and through the magic of your voice working together with Alexa.

At the time of writing, there are more than 15,000 skills available for Alexa, with Uber and Lyft being the most used in the travel category, Pandora and Spotify in music streaming, and many others used in home automation.

 

A Comparison of Various Voice-Based Personal Assistants


From our previous discussions, we already know that each market, whether desktop, smartphone, or smart home, has a steady supply of interactive voice-based personal assistants. Almost every assistant can do whatever its counterparts can accomplish, which leads to the question: where do the actual differences lie? Is there something that Alexa can do better than Google Assistant, or vice versa?

This book is based on Alexa, which is a smart home-based personal assistant, so in this section, we shall compare Alexa and Google Assistant to understand the finer differences between the two:

Alexa | Google Assistant
Uses the invocation phrase, "Alexa" | Uses the invocation phrase, "OK, Google"
Flagship hardware: Amazon Echo device family | Flagship hardware: Google Home, Pixel 2, Android Wear
Responds slightly better to e-commerce/shopping-related queries, since that is Amazon's main forte | Responds slightly better to web-based queries since Google's major forte is web searching
Slightly inferior contextual awareness | Better contextual awareness, hence conversations seem a little more natural
Capabilities of Alexa can be extended by installing third-party "skills" | Capabilities of Google Assistant can be extended by installing third-party apps; however, it has fewer apps currently available for it in the market than Alexa has skills
A wider range of integration with smart home devices such as smart lights, smart locks, smart switches, and smart thermostats | Slightly narrower range of integration with smart home devices

In a nutshell, both Google Assistant and Alexa are very capable voice-based assistants and accomplish a lot for their users; however, since Google Assistant is fairly new to the market, its integration and compatibility with third-party apps and hardware is still evolving, albeit at a very rapid pace. Even as the newer of the two, Google Assistant still fares better in terms of web integration and contextual awareness.

It would be really interesting to see what the evolution of AI and Machine Learning brings to the table in the coming era and how these assistants are able to leverage that.

 

Summary


In this chapter, we covered the evolution of interactive voice-based personal assistants and the various factors involved in their move from a user's smartphone to their smart home. We also saw the various interactive voice-based personal assistants in the smartphone, desktop, and smart home markets, and the capabilities of each.

Our goal was to familiarize the reader with the history of interactive voice-based personal assistants so that, over the course of the book, we can focus on Alexa, the interactive personal assistant bundled with Amazon Echo. The next chapter will help the reader understand the anatomy of an Alexa Skill and get hands-on experience programming an Amazon Echo so that Alexa can learn to say one of the oldest phrases in computer programming, "Hello, World."

About the Author
  • Madhur Bhargava

    Madhur Bhargava specialized in Wireless and Mobile Computing at CDAC ACTS Pune, India. He started his career at Electronic Arts as a software engineer working on mobile games. He later addressed problems in personalized healthcare, leveraging the power of mobile and voice computing. He is proficient in various mobile/embedded technologies and strives to be a software generalist. He believes that good software is a result of talented individuals working together as a communicative team in an Agile manner. He likes to spend time with his family, read, and watch movies.
