The age of intelligent machines has arrived, and conversational interfaces are leading the charge. Over the past couple of years, we have been swarmed by a number of new kinds of machines and software collectively known as bots. Bots are automated hardware or software machines that are powered by the advances in Artificial Intelligence (AI) technologies. Recent developments in machine learning algorithms, such as deep learning and deep reinforcement learning, have improved the performance of AI tasks such as Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Text to Speech Synthesis (TTS), and Image Recognition. This has accelerated humankind's journey toward the technological singularity, the point in time when AI surpasses natural human intelligence by leaps and bounds.
One of the long-term goals in the field of AI is to build computer systems that can have human-like conversations with users. With recent advances in AI technologies, we are now one step closer to achieving this goal. Now, it is no longer fictional that we are able to interact with devices and gadgets in our homes and offices using nothing but voice. We still have a long way to go toward creating standards and building digital beings that are capable of seamless natural language conversation. However, a recent surge in interests and massive investments in pursuing these ideas suggest that we are on track toward evolving such a global standard. If you are excited about the recent developments in AI and automation technologies, this book is for you. We will embark on a journey toward a point in time in the future that the design guru Mark Curtis calls conversational singularity, when conversational devices disappear and conversation between man and machine is seamless and natural.
This is a book for programmers beginning to build conversational interfaces. Today, basic button-based chatbots can be built without even having to write a single line of code. In this book, that is where we will start. We will gradually move toward more complex and flexible architectures, and we will explore channels to use, such as Facebook Messenger, SMS, and Twitter. We will also be exploring tools for understanding natural language and conversation management as we proceed. Finally, we will end our journey by building voice-enabled bots on platforms such as Amazon Alexa and Google Assistant.
Conversational user interfaces are as old as modern computers themselves. ENIAC, the first programmable general-purpose computer, was built in the year 1946. In 1950, Alan Turing, a British computer scientist, proposed to measure the level of intelligence in machines using a conversational test called the Turing test. The test involved having the machine compete with a human as a dialogue partner to a set of human judges (yet another human). The judges would interact with each of the two participants (the human and the machine) using a text type interface that is not unlike most of the modern messaging chat applications. Over chat, the judges were supposed to identify which of the two participants was the machine. If at least 30% of the judges couldn't differentiate between the two participants, the machine was considered to have passed the test. This was one of the earliest human thoughts on conversational interfaces and their bearing on the intelligence levels of machines that have such capabilities. However, attempts to build such interfaces have not been very successful for several following decades.
For about 35 years, since the 1980s, Graphical User Interfaces (GUI) have been dominating the way in which we have been interacting with machines. With recent developments in AI and growing constraints such as the shrinking size of gadgets (from laptops to mobile phones), reducing on-screen real estates (smart watches), and the need for interfaces to become invisible (smart home and robots), conversational user interfaces are once again becoming a reality. For instance, the best way to interact with mobile robots that are distributed gadgets in smart homes would be using voice. The system should, therefore, be able to understand the users' requests and responses in natural human language. Such capabilities of systems can reduce human effort in learning and understanding current complex interfaces.
Conversational user interfaces have been known under several names: natural language interfaces, spoken dialogue systems, chatbots, intelligent virtual agents, virtual assistants, and so on. The actual difference between these systems is in terms of the backend integrations (for example, databases, and task/control modules), modalities (for example, text, voice, and visual avatars), and channels they get deployed on. However, one of the common themes among these systems is their ability to interact with users in a conversational manner using natural language.
The origins of modern chatbots can be traced back to 1964 when Joseph Weizenbaum at Massachusetts Institute of Technology (MIT) developed a chatbot called Eliza. It used simple rules of conversation and rephrased most of what the users said to simulate a Rogerian therapist. While it showed that naive users may be fooled into thinking that they are talking to an actual therapist, the system itself did not understand the user's problem. Following this, in 1991, the Loebner prize was instituted to encourage AI researchers to build chatbots that can beat the Turing test and advance the state of AI. Although no chatbots beat the test until 2014, many notable chatbots won prizes for winning other constrained challenges. These include ALICE, JabberWacky, Rose, and Mitsuku. However, in 2014, in a Turing test competition to mark the 60th anniversary of Alan Turing's death, a chatbot called Eugene Goostman, portraying a 13 year old kid, managed to fool 33% of the judges—thereby beating the test. Artificial Intelligence Markup Language (AIML) and ChatScript were developed as a way to script the knowledge and conversational content for most of these chatbots. Scripts developed using these scripting languages can then be fed into interpreters to create conversational behavior. Chatbots developed to beat the Turing test were largely chatty with just one objective—to beat the Turing test. This was not considered by many as advancement in AI or toward building useful conversational assistants.
On the other hand, research in artificial intelligence, specifically in machine learning and natural language processing, gave rise to various conversational interfaces such as question answering systems, natural language interfaces to databases, and spoken dialogue systems. Unlike chatbots built to beat the Turing test, these systems had very clear objectives. Question answering systems processed natural language questions and found answers in unstructured text datasets. Natural Language Interfaces to Database Systems (NLIDBS) were interfaces to large SQL databases that interpreted database queries posed in a natural language such as English, converted them into SQL, and returned the hits as response. Spoken Dialogue Systems (SDS) were systems that could maintain contextful conversations with users to handle conversational tasks such as booking tickets, controlling other systems, and tutoring learners. These were the precursors of modern chatbots and conversational interfaces.
In 2011, Apple released an intelligent assistant called Siri as part of their iPhones. Siri was modeled to be the user's personal assistant, doing tasks such as making calls, reading messages, and setting alarms and reminders. This is one of the most significant events in the recent past that rebooted the story of conversational interfaces. During the initial days of Siri, users used it only a few times a month to perform tasks such as searching the internet, sending SMS, and making phone calls. Although novel, Siri was treated as a work in progress with a lot more features to be added in the following years. In the early days, Siri had many clones and competition on Android and other smartphone platforms. Most of these were modeled as assistants and were available as mobile apps.
In the same year (2011), IBM introduced Watson, a question answering system that participated in a game show called Jeopardy and won it against previous human winners, Brad Rutter and Ken Jennings. This marked a milestone in the history of AI as Watson was able to process open domain natural language questions and answer them in real time. Since then, Watson has been refashioned into a toolkit with an array of cognitive service tools for natural language understanding, sentiment analysis, dialogue management, and so on.
Following Siri and Watson, the next major announcement came from Microsoft in 2013, when they introduced Cortana as a standard feature on Windows phones and later in 2015 on Windows 10 OS. Like Siri, Cortana was a personal assistant that managed tasks such as setting reminders, answering questions, and so on.
In November 2014, Amazon invited its Prime members to try out its very own personal assistant called Alexa. Alexa was made available on Amazon's own product called Echo. Echo was a first-of-its-kind smart speaker that housed within it an assistant like a "ghost" in the machine. Although called a speaker, it was actually a tiny computer with the voice as its only interface, unlike smartphones, tablets, and personal computers. Users can speak to Alexa using voice, ask her to do tasks such as setting reminders, playing music, and so on.
Recently, in April 2016, Facebook announced that it is opening up its popular Messenger platform for chatbots. This was a radically different approach to conversational interfaces compared to Siri, Alexa, and Cortana. Unlike these personal assistants, Facebook's announcement led to the creation of custom built and branded chatbots. These bots are very much like Siri, Cortana, and Alexa, but can be custom tuned to the requirements of the business building them. Chatbots are now poised to disrupt several markets, including customer service, sales, marketing, technical support, and so on. Many messaging platforms, such as Skype, Telegram, and others, also opened up to chatbots around the same time.
In May 2016, Google announced Assistant, its version of a personal chatbot that was accessible on multiple platforms such as Allo app and Google Home (a smart speaker like Echo). All assistants like Siri, Cortana, Alexa, and Google Assistant have also opened up as channels for third-party conversational capabilities. So, it is now possible to make your Alexa and Google Assistant personalized by adding conversational capabilities (called skills or actions) from a library of third-party solutions. Just as brands can develop their own chatbots for various messaging services (for example, Skype and Facebook Messenger), they can also develop skills for Alexa or actions for Google Assistant. Apple's very own smart speaker, Homepod, powered by Siri, is slated to be released in 2018.
Parallel to these developments, there has also been major growth in terms of tools that are available to build and host chatbots. Over the last two years, there has been an exponential growth of tools to design, mock, build, deploy, manage, and monetize chatbots. This has resulted in the creation of an ecosystem that designs and builds custom conversational interfaces for businesses, charities, governmental, and other organizations across the globe.
In this section, let's take a look at the basic architecture of a conversational interface:
The core module of a conversational interface is the conversation manager. This module controls the flow of the conversation. It takes the semantic representation of what the user says as input, and decides what the response of the system should be. It will maintain a representation of the conversational context in some form, say a set of key value pairs, in order to meaningfully carry out the conversation over several turns between the user and the system.
The semantic representation of the user input can be directly fed from button pushes. In systems that can understand language, user utterances will be translated into semantic representation, consisting of user intents and parameters (slots and entities), by a natural language understanding module. This module may need to be previously trained to understand a set of user intents identified by the developer pertaining to the conversational tasks at hand.
Voice-enabled interfaces that accept user's speech inputs also need a speech recognition module that can transcribe speech into text before feeding it into the natural language understanding module. Symmetrically, on the other side, there is a need for a speech synthesizer (or text-to-speech engine) module that converts the system's text response into speech.
The conversational manager will interact with backend modules. It can be a database or an online data source that gets queried in order to answer a user's question (for example, TV schedule) or an online service to carry out a user's instruction (for example, booking a ticket).
The channel is where the chatbot actually meets the user. Depending on the channel, there may be one or more modules that make up this layer. For instance, if the chatbot is on Facebook Messenger, this layer consists of a Facebook Page and a Facebook App that connects to the rest of the chatbot modules wrapped as a web app.
Conversational user interfaces have found themselves applied in various scenarios. Their applications can be classified broadly into two categories: enterprise assistants and personal assistants.
Enterprise Assistants are chatbots and other conversational user interfaces that are modeled after customer service representatives and store assistants. Just like human customer service representatives, the bots engage customers in conversation carrying out marketing, sales, and support tasks. Most chatbots deployed on channels such as Facebook Messenger, Skype, Slack, and many more are enterprise assistants. They are designed and built to do tasks that store assistants and customer service representatives would do. Enterprise assistants are being developed in many business sectors, automating a variety of conversational tasks.
On the other hand, personal assistants are bots like Alexa, Siri, and Cortana, which act as a user's personal assistant, doing tasks such as managing a calendar, sending texts, taking calls, and playing music. These personal assistants can be extended in terms of their capabilities. For instance, Alexa allows for such augmentation by letting developers build skills that users can choose to add to their own Alexa. Brands can, therefore, develop skills for Alexa or actions for Google Assistant that will enable Alexa and Assistant to interact with the brand's IT services and perform tasks such as placing orders, checking delivery status, and many more. For instance, popular brands like PizzaHut, Starbucks, and Domino's have developed skills that can be enabled on Alexa.
Although chatbots have been under development for at least a few decades, they did not become mainstream channels for customer engagement until recently. Over the past two years, due to serious efforts by industry giants like Apple, Google, Microsoft, Facebook, IBM, and Amazon, and their subsequent investments in developing toolkits, chatbots and conversational interfaces have become a serious contender to other customer contact channels. In this time, chatbots have been applied in various sectors and various conversational scenarios within these sectors: retail, banking and finance, governmental, health, legal and third sector, and many more.
In retail, chatbots have been applied for product marketing, brand engagement, product assistance, sales, and support conversations. Brand-engagement chatbots offer tips and advice to loyal customers of a brand related to the use of products sold by the brand. For instance, Sephora chatbot advises users on how to select their ideal lipstick. Similarly, TK-Maxx chatbot assisted users in choosing gifts for their friends and family during Christmas 2016. One of the first retailers to explore chatbots for sales was H&M. The H&M chatbot helped users browse through the product catalogue and add products to their shopping carts. Car manufacturers like Tesla, Kia, and Mercerdes have developed chatbots that can help car users with information regarding their cars.
Chatbots have been very successful in the banking and finance industry. Banking was one of the first sectors that experimented with conversational interfaces. Banking chatbots can answer generic questions about financial products, secure banking, and so on, along with providing specific and personalized information about user's accounts. Many global banks and financial service providers including Bank of America, ICICI bank, HSBC, Royal Bank of Scotland, Capital One, Mastercard, and so on have deployed chatbots to assist their customers. Many fintech companies are building chatbots that can act as financial assistant to users. Ernest.ai and Cleo are chatbots that can link to your bank accounts and talk to you about your spending, balances, and also provide tips to save money. Chatbots are also being widely deployed in the insurance sector, where they act as assistants that can get you tailored quotes (for example, SPIXII).
Chatbots are also being used in legal, health, governmental, and third sectors. A chatbot called DoNotPay has assisted people to challenge parking tickets in London and New York in over 160,000 cases. Following this, more chatbots have been developed to help people access justice and legal services: assessment of crime (LawBot), business incorporation (LawDroid), help tenants (RentersUnion), help with legal questions and documentation (Lisa, LegaliBot, Lexi, DocuBot), and find lawyers (BillyBot).
In the third sector, chatbots have been used to spread awareness of issues that charities care about. Stoptober is a Facebook chatbot that was developed by the National Health Services (NHS) in the UK to help smokers quit. Another chatbot, Yeshi, was developed to draw awareness to Ethiopia's water crisis. Chatbots are beginning to make their entry into healthcare as well. Chatbots like Your.MD and HealthTap were designed to diagnose health issues based on symptoms. Emily is a chatbot designed by LifeFolder to help make the end of life decisions (for example, legal documentation, life support, organ donation, and many more).
Chatbots are not only being used to be customer facing but also internally, to face employees. Chatbots, in a sense, are becoming coworkers by helping fellow employees with tasks that are repetitive, mundane, and boring. Messaging services such as Slack and Microsoft Teams have been encouraging chatbots on their platforms to automate office communication. These bots aim to engage coworkers in chat on fun and essential tasks. For instance, there are bots to help coworkers share knowledge (Obie.ai), access other services such as GDrive (WorkBot), set up meetings (Meekan), discuss lunch (LunchTrain), and even help with decision making (ConcludeBot, SimplePoll).
If you are interested in finding out more use cases, I would recommend you to take a look at some of the bot directory services like botlist.co and www.chatbots.org, where you can find more information and inspiration.
Over the last few years, an ecosystem of tools and services has grown around the idea of conversational interfaces. There are a number of tools that we can plug and play to design, develop, and manage chatbots.
Mockups can be used to show clients as to how a chatbot would look and behave. These are tools that you may want to consider using during conversation design, after coming up with sample conversations between the user and the bot on the back of a napkin. Mockup tools allow you to visualize the conversation between the user and the bot and showcase the dynamics of conversational turn-taking. BotSociety.io (https://botsociety.io/) and BotMock.com (https://botmock.com/) are some of the popular mockup tools. Some of these tools allow you to export the mockup design and make videos.
Channels refer to places where users can interact with the chatbot. There are several deployment channels over which your bots can be exposed to users. These include messaging services such as Facebook Messenger, Skype, Kik, Telegram, WeChat, and Line; office and team chat services such as Slack, Microsoft Teams, and many more; traditional channels such as the web chat, SMS, and voice calls; and smart speakers such as Amazon Echo and Google Home. Choose the channel based on your users and the requirements of the project. For instance, if you are building a chatbot targeting consumers, Facebook Messenger can be the best channel because of the growing number of users who use the service already to keep in touch with friends and family. To add your chatbot to their contact list may be easier then getting them to download your app. If the user needs to interact with the bot using voice in a home or office environment, smart speaker channels can be an ideal choice. And finally, there are tools that can connect chatbots to many channels simultaneously (for example, Dialogflow integration, MS Bot Service, and Smooch.io, and so on).
There are many tools that you can use to build chatbots without having to code even a single line: Chatfuel, ManyChat, Dialogflow, and so on. Chatfuel allows designers to create the conversational flow using visual elements. With ManyChat, you can build the flow using a visual map called the FlowBuilder. Conversational elements such as bot utterances and user response buttons can be configured using drag and drop UI elements. Dialogflow can be used to build chatbots that require advanced natural language understanding to interact with users.
On the other hand, there are scripting languages such as Artificial Intelligence Markup Language (AIML), ChatScript, and RiveScript that can used to build chatbots. These scripts will contain the conversational content and flow that then needs to be fed into an interpreter program or a rules engine to bring the chatbot to life. The interpreter decides how to progress the conversation by matching user utterances to templates in the scripts. While it is straightforward to build conversational chatbots using this approach, it becomes difficult to build transactional chatbots without generating explicit semantic representations of user utterances. PandoraBots is a popular web-based platform for building AIML chatbots.
Alternatively, there are SDK libraries that one can use to build chatbots: MS Bot Builder, BotKit, BotFuel, and so on provide SDKs in one or more programming languages to assist developers in building the core conversational management module. The ability to code the conversational manager gives developers the flexibility to mould the conversation and integrate the bot to backend tasks better than no-code and scripting platforms. Once built, the conversation manager can then be plugged into other services such as natural language understanding to understand user utterances.
Like other digital solutions, chatbots can benefit from collecting and analyzing their usage statistics. While you can build a bespoke analytics platform from scratch, you can also use off-the-shelf toolkits that are widely available now. Many off-the-shelf analytics toolkits are available that can be plugged into a chatbot, using which incoming and outgoing messages can be logged and examined. These tools tell chatbot builders and managers the kind of conversations that actually transpire between users and the chatbot. The data will give useful information such as the conversational tasks that are popular, places where conversational experience breaks down, utterances that the bot did not understand, and the requests which the chatbots still need to scale up to. Dashbot.io, BotAnalytics, and Google's Chatbase are a few analytic toolkits that you can use to analyze your chatbot's performance.
Chatbots can be built without having to understand utterances from the user. However, adding the natural language understanding capability is not very difficult. It is one of the hallmark features that sets chatbots apart from their digital counterparts such as websites and apps with visual elements. There are many natural language understanding modules that are available as cloud services. Major IT players like Google, Microsoft, Facebook, and IBM have created tools that you can plug into your chatbot. Google's Dialogflow, Microsoft LUIS, IBM Watson, SoundHound, and Facebook's Wit.ai are some of the NLU tools that you can try. We will explore Dialogflow (previously called Api.Ai) in some of the chapters.
One of the challenges of building the bot is to get users to discover and use it. Chatbots are not as popular as websites and mobile apps, so a potential user may not know where to look to find the bot. Once your chatbot is deployed, you need to help users find it. There are directories that list bots in various categories. Chatbots.org is one of the oldest directory services that has been listing chatbots and virtual assistants since 2008. Other popular ones are Botlist.co, BotPages, BotFinder, and ChatBottle. These directories categorize bots in terms of purpose, sector, languages supported, countries, and so on. In addition to these, channels such as Facebook and Telegram have their own directories for the bots hosted on their channel. In the case of Facebook, you can help users find your Messenger bot using their Discover service.
Chatbots are built for many purposes: to create awareness, to support customers after sales, to provide paid services, and many more. In addition to all these, chatbots with interesting content can engage users for a long time and can be used to make some money through targeted personalized advertising. Services such as CashBot.ai and AddyBot.com can integrate with your chatbot to send targeted advertisements and recommendations to users, and when users engage, your chatbot makes money.
The aforementioned is not an exhaustive list of tools and nor are the services listed under each type. These tools are evolving over time as chatbots are finding their niche in the market. This list is to give you an idea of how multidimensional the ecosystem is and help you explore the space and feed your creative mind.
Conversational user interfaces bring in the best of both worlds: human-like natural interaction combined with the benefits of digital technology.
- Availability: Like any other automated digital technologies, conversational interfaces are low-cost and are available 24/7. It is like having someone man the web chat desk all the time so that customers always have someone to get answers from.
- Personalized experience: Unlike websites and smartphone apps, chatbots can provide a very personalized experience owing to the conversational nature of interaction. One-to-one conversation settings provide ample opportunity to understand and adapt to a user's goals, preferences, and constraints.
- Low cost: Chatbots are digital solutions and therefore provide customer support services at least ten times cheaper than humans doing the very same tasks.
- Consistency: Chatbots can be consistent in services, which may be hard to achieve with human operators and may be very important in certain sectors.
- Quick response times: Unlike human-based systems, the response time for chatbots is much quicker. Users no longer have to wait for their call to be picked up and during a conversation, the chatbot responses will be quicker than human responses, especially when human operators are tasked with more than one simultaneous chat (sometimes up to five). The ability of chatbots to handle simultaneous conversations also removes the bottleneck of limited customer support bandwidth and therefore helps businesses scale up.
- Scale up: Chatbots can easily scale up to handle increasing and seasonal traffic, which is not easy to do when using a battery of live advisors. Holiday season may particularly drive up demand for customer support. At such times, chatbots can be used to handle low priority and easy tasks, thereby reducing the load on live advisors—and human assistance can be used judiciously to handle high-value conversations.
The conversational user interface technologies are currently one of the top trending topics in the technology business. Most big brands have started formulating their chatbot strategy within their larger AI and automation strategy. Innovations such as chatbots, smart speakers, and self-driving cars are driving such major policy decisions. The world is gearing up to bear the onslaught of automation technologies that are poised to replace humans in repetitive and structured tasks.
The recent rise of chatbots has been fueled by many factors:
- Milliennials have been steadily moving toward chat as their preferred channel to interact with brands. Customer contact surveys show that people want to use web chat channels when available, compared to other traditional channels, such as email and phone, to contact businesses.
- The growth of chat messaging apps on smartphones and other devices has surpassed the usage of social media apps such as Facebook and Twitter. Now people spend more time on messaging apps, chatting with friends, family, colleagues, and even businesses.
- Rising customer demand on chat is putting tremendous pressure on brands. The lack of skilled human resources to handle growing chat traffic is also an important contributor of the rise of chatbots.
- Availability of cognitive service tools for natural language understanding, speech recognition, speech synthesis, conversation management, analytics, and so on has made the design and development of chatbots easier than it was a few years ago.
- Opening up of messaging channels and innovative new avenues, such as smart speakers, has made delivering services over chatbots a reality. The fact that there is a growing interest in messaging apps and devices such as smart speakers presents an attractive opportunity for brands to build chatbots to take advantage of the users who are already available on these channels.
There are several surveys and statistics that show that conversational interfaces are here to stay. Through the following list, we offer some of the most compelling survey findings and predictions that show that chatbots are here for the long run:
- Gartner (https://www.gartner.com/smarterwithgartner/gartner-top-strategic-predictions-for-2018-and-beyond/) predicts that by 2021, brands that design their websites to include voice and visual search will increase their revenue by 30% and that more that 50% of businesses will spend more on chatbots than traditional mobile apps.
- In an Oracle survey, 80% of respondents (C-level executives) said that they are planning to introduce chatbot services by 2020, if not already [OR].
- Juniper research predicts that use of chatbots will produce annual cost savings of USD 8 billion by 2022, up from USD 20 million in 2017 [JR].
- A Hubspot survey found that about 47% of consumers are open to buying items through a chatbot and around 40% don't care whether they talk to a chatbot or a human as long as they get help easily and quickly [HB].
- Finally, according to a recent Grand View Research report, the global chatbot market is poised to reach a staggering USD 1.25 billion by 2025, growing at a CAGR of 24.3% from USD 190 million in 2016 [GVR].
So are you ready to get started and build some chatbots yet? I hope I have given you a good introduction to the world of chatbots in this chapter. We covered historical and recent developments, classification of chatbots, their application in various sectors, their benefits, their future, and their basic architecture. Over the course of the next eight chapters, I will introduce you to several tools, techniques, and concepts that will enable you to build amazing conversational interfaces. Let the journey begin!
- [Stoptober] https://www.marketingweek.com/2016/09/20/stoptober-uses-facebook-messenger-bot-to-help-people-quit-smoking/
- [Yeshi] https://www.akqa.com/work/lokai/walk-with-yeshi/
- [Emily] https://medium.com/life-folder/introducing-emily-the-chatbot-that-talks-about-death-97b390119cce
- [SPIXII] https://www.insly.com/en/blog/chatbot-is-the-future-of-automated-insurance/
- [OR] http://uk.businessinsider.com/80-of-businesses-want-chatbots-by-2020-2016-12
- [JR] https://www.juniperresearch.com/researchstore/innovation-disruption/chatbots/retail-ecommerce-banking-healthcare
- [HB] https://research.hubspot.com/reports/artificial-intelligence-is-here
- [GVR] http://www.grandviewresearch.com/industry-analysis/chatbot-market