
You're reading from Building Slack Bots

Product type Book

Published in Jun 2016

Publisher Packt

ISBN-13 9781786460806

Pages 182 pages

Edition 1st Edition

Languages

JavaScript

Concepts

Chatbots

Chapter 5. Understanding and Responding to Natural Language

We've built bots that can play games, store data, and provide useful information. The next step isn't information gathering; it's processing. This chapter will introduce natural language processing (NLP) and show how we can use it to enhance our bots even further.

In this chapter, we will cover:

A brief introduction to natural language
A Node implementation
Natural language processing
Natural language generation
Displaying data in a natural way

A brief introduction to natural language

You should always strive to make your bot as helpful as possible. In all the bots we've made so far, we've awaited clear instructions via a keyword from the user and then followed said instructions as far as the bot is capable. What if we could infer instructions from users without them actually providing a keyword? Enter natural language processing (NLP).

NLP can be described as a field of computer science that strives to understand communication and interactions between computers and human (natural) languages.

In layman's terms, NLP is the process of a computer interpreting conversational language and responding by executing a command or replying to the user in an equally conversational tone.

Examples of NLP projects are digital assistants such as the iPhone's Siri. Users can ask questions or give commands and receive answers or confirmation in natural language, seemingly from a human.

One of the more famous projects using NLP is IBM's Watson system...

Fundamentals of NLP

NLP, at its core, works by splitting a chunk of text (also referred to as a corpus) into individual segments or tokens and then analyzing them. These tokens might simply be individual words but might also be word contractions. Let's look at how a computer might interpret the phrase: I have watered the plants.

If we were to split this corpus into tokens, it would probably look something like this:

['I', 'have', 'watered', 'the', 'plants']

The word the in our corpus is unnecessary, as it does not help us understand the phrase's intent; the same goes for the word have. We should therefore remove the surplus words:

['I', 'watered', 'plants']

Already, this is starting to look more usable. We have a personal pronoun in the form of an actor (I), an action or verb (watered), and a recipient or noun (plants). From this, we can deduce exactly which action was performed, on what, and by whom. Furthermore, from the conjugation of the verb watered, we can establish that this action occurred in the past. Consider...
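The tokenize-then-filter steps described above can be sketched in plain JavaScript. This is only an illustration: the stop-word list and the tokenize and removeStopWords helper names are assumptions made for this example, and a real tokenizer handles far more cases than this regular expression does.

```javascript
'use strict';

// A deliberately tiny stop-word list, assumed for this example only
const STOP_WORDS = new Set(['the', 'have', 'a', 'an']);

// Split a phrase into word tokens, dropping punctuation
function tokenize(text) {
  return text.match(/[A-Za-z']+/g) || [];
}

// Remove tokens that do not help to establish intent
function removeStopWords(tokens) {
  return tokens.filter((token) => !STOP_WORDS.has(token.toLowerCase()));
}

const tokens = tokenize('I have watered the plants.');

console.log(tokens);                  // [ 'I', 'have', 'watered', 'the', 'plants' ]
console.log(removeStopWords(tokens)); // [ 'I', 'watered', 'plants' ]
```

Keeping the stop words in a Set makes each lookup cheap, and lowercasing only for the comparison leaves the original tokens untouched.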

Tokenizers

Start by creating a new project with npm init. Name your bot "weatherbot" (or something similar), and install the Slack and Natural APIs with the following command:

npm install @slack/client natural --save

Copy our Bot class from the previous chapters and enter the following in index.js:

'use strict';

// import the natural library
const natural = require('natural');

const Bot = require('./Bot');

// initialize the tokenizer
const tokenizer = new natural.WordTokenizer();

const bot = new Bot({
  token: process.env.SLACK_TOKEN,
  autoReconnect: true,
  autoMark: true
});

// respond to any message that comes through
bot.respondTo('', (message, channel, user) => {

  let tokenizedMessage = tokenizer.tokenize(message.text);

  bot.send(`Tokenized message: ${JSON.stringify(tokenizedMessage)}`, channel);
});

Start up your Node process and type a test phrase into Slack:

The returned tokenized message

Through the use of tokenization, the bot has split the given phrase into short fragments...

Stemmers

Sometimes, it is useful to find the root or stem of a word. In the English language, irregular verb conjugations are not uncommon. By deducing the root of a verb, we can dramatically decrease the number of calculations needed to find the action of the phrase. Take the verb searching, for example; for the purpose of bots, it would be much easier to process the verb in its root form, search. Here, a stemmer can help us determine said root. Replace the contents of index.js with the following to demonstrate stemmers:

'use strict';

// import the natural library
const natural = require('natural');

const Bot = require('./Bot');

// initialize the stemmer
const stemmer = natural.PorterStemmer;

// attach the stemmer to the prototype of String, enabling
// us to use it as a native String function
stemmer.attach();

const bot = new Bot({
  token: process.env.SLACK_TOKEN,
  autoReconnect: true,
  autoMark: true
});

// respond to any message that comes through
bot.respondTo('', (message, channel...
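To make the idea of stemming concrete without depending on any library, here is a rough sketch of a suffix-stripping stemmer. It is not the Porter algorithm used by natural; the naiveStem name and the suffix list are assumptions for illustration, and the approach will mangle many real words.

```javascript
'use strict';

// Suffixes to strip, checked in this order. This list is an assumption
// for illustration and is far cruder than the Porter algorithm.
const SUFFIXES = ['ing', 'ed', 'es', 's'];

// Return a rough stem by removing the first matching suffix,
// as long as at least three characters remain
function naiveStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

console.log(naiveStem('searching')); // search
console.log(naiveStem('searched'));  // search
console.log(naiveStem('searches'));  // search
console.log(naiveStem('search'));    // search
```

All four forms collapse to the same root, which is what lets a bot match one action regardless of how the verb was conjugated.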

String distance

A string distance measuring algorithm is a calculation of how similar two strings are to one another. The strings smell and bell can be defined as similar, as they share three characters. The strings bell and fell are even closer, as they share three characters and are only one character apart from one another. When calculating string distance, the string fell will receive a higher ranking than smell when the distance is measured between them and bell.

The NPM package natural provides three different algorithms for string distance calculation: Jaro-Winkler, the Dice coefficient, and the Levenshtein distance. Their main differences can be described as follows:

Dice coefficient: This calculates the similarity between strings and represents it as a value between zero and one, where zero means completely different and one means identical.
Jaro-Winkler: This is similar to the Dice coefficient, but gives greater weighting to similarities at the beginning of the string.
Levenshtein...
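As an illustration of what a string distance calculation does, the following is a minimal plain JavaScript sketch of the Levenshtein distance, which counts single-character edits. The levenshtein function here is written for this example and is not taken from natural, whose own implementation may differ in options and details.

```javascript
'use strict';

// Levenshtein distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn a into b
function levenshtein(a, b) {
  // previous[j] holds the distance between the first i - 1 characters
  // of a and the first j characters of b
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,                   // deletion
        current[j - 1] + 1,                // insertion
        previous[j - 1] + substitutionCost // substitution (or match)
      );
    }
    previous = current;
  }

  return previous[b.length];
}

console.log(levenshtein('bell', 'fell'));  // 1
console.log(levenshtein('bell', 'smell')); // 2
```

A lower number means a closer match, so fell ranks nearer to bell than smell does.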

Inflection

An inflector can be used to convert a noun between its singular and plural forms. This is useful when generating natural language, as the plural versions of nouns might not be obvious:

let inflector = new natural.NounInflector();

console.log(inflector.pluralize('virus'));
console.log(inflector.singularize('octopi'));

The preceding code will output viri and octopus, respectively.

Inflectors may also be used to transform numbers into their ordinal forms; for example, 1 becomes 1st, 2 becomes 2nd, and so on:

let inflector = natural.CountInflector;

console.log(inflector.nth(25));
console.log(inflector.nth(42));
console.log(inflector.nth(111));

This outputs 25th, 42nd, and 111th, respectively.
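The rule behind ordinal suffixes is small enough to sketch by hand. The toOrdinal function below is a hypothetical helper written for this example, not the code natural uses internally; it shows why 111 becomes 111th rather than 111st.

```javascript
'use strict';

// Append the English ordinal suffix to a whole number
function toOrdinal(n) {
  const lastTwo = Math.abs(n) % 100;
  const lastOne = Math.abs(n) % 10;

  // Numbers ending in 11, 12, or 13 always take 'th'
  if (lastTwo >= 11 && lastTwo <= 13) {
    return `${n}th`;
  }
  if (lastOne === 1) return `${n}st`;
  if (lastOne === 2) return `${n}nd`;
  if (lastOne === 3) return `${n}rd`;
  return `${n}th`;
}

console.log([1, 2, 3, 25, 42, 111].map((n) => toOrdinal(n)).join(', '));
// 1st, 2nd, 3rd, 25th, 42nd, 111th
```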

Here's an example of the inflector used in a simple bot command:

let inflector = natural.CountInflector;

bot.respondTo('what day is it', (message, channel) => {
  let date = new Date();

  // use the ECMAScript Internationalization API to convert 
  // month numbers into names
  let...

Displaying data in a natural way

Let's build our bot's weather functionality. To do this, we will be using a third-party API called Open Weather Map. The API is free to use for up to 60 calls per minute, with further pricing options available. To obtain the API key, you will need to sign up here: https://home.openweathermap.org/users/sign_up.

Note

Remember that you can pass variables such as API keys into Node from the command line. To run the weather bot, you could use the following command:

SLACK_TOKEN=[YOUR_SLACK_TOKEN] WEATHER_API_KEY=[YOUR_WEATHER_KEY] nodemon index.js

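Once you have signed up and obtained your API key, copy and paste the following code into index.js. You can either replace process.env.WEATHER_API_KEY with your newly acquired Open Weather Map key or leave it in place and pass the key in from the command line as described in the preceding note: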

'use strict';

// import the natural library
const natural = require('natural');

const request = require('superagent');

const Bot = require('./Bot');

const weatherURL = `http://api.openweathermap.org/data/2.5/weather?&units=metric&appid...

When to use NLP?

It might be tempting to have weatherbot listen to and process all messages sent in the channel. This immediately poses some problems:

How do we know if the message sent is a query on the weather or is completely unrelated?
Which geographic location is the query about?
Is the message a question or a statement? For example, the difference between Is it cold in Amsterdam and It is cold in Amsterdam.

Although an NLP-powered solution to the preceding questions could probably be found, we have to face facts: it's likely that our bot will get at least one of the above points wrong when listening to generic messages. This will lead the bot to either provide bad information or provide unwanted information, thus becoming annoying. If there's one thing we need to avoid at all costs, it's a bot that sends too many wrong messages too often.
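To see how easily such guesses go wrong, consider a naive check for whether a message is a question. The looksLikeQuestion helper and its word list are assumptions made up for this sketch; the point is that it handles the Amsterdam example correctly and still misfires on other input.

```javascript
'use strict';

// Words that commonly open an English question. This list is an
// assumption for the example and is far from complete.
const QUESTION_STARTERS = new Set([
  'is', 'are', 'will', 'does', 'do', 'can', 'what', 'when', 'where', 'how'
]);

// Guess whether a message is a question: it either ends with a
// question mark or starts with a typical question word
function looksLikeQuestion(text) {
  const trimmed = text.trim();
  if (trimmed.endsWith('?')) {
    return true;
  }
  const firstWord = trimmed.split(/\s+/)[0].toLowerCase();
  return QUESTION_STARTERS.has(firstWord);
}

console.log(looksLikeQuestion('Is it cold in Amsterdam')); // true
console.log(looksLikeQuestion('It is cold in Amsterdam')); // false
console.log(looksLikeQuestion('What a cold day'));         // true, although it is a statement
```

The third call is the kind of false positive that would make a bot reply when nobody asked it anything.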

Here's an example of a bot using NLP and completely missing the point of the message sent:

A clearly misunderstood message

If a bot were to often mistake...

Mentions

To implement the second point, we need to revisit our Bot class and add mention functionality. In the Bot class' constructor, replace the RTM_CONNECTION_OPENED event listener block with the following:

this.slack.on(CLIENT_EVENTS.RTM.RTM_CONNECTION_OPENED, () => {
  let user = this.slack.dataStore.getUserById(this.slack.activeUserId);
  let team = this.slack.dataStore.getTeamById(this.slack.activeTeamId);

  this.name = user.name;
  this.id = user.id;

  console.log(`Connected to ${team.name} as ${user.name}`);
});

The only change here is the addition of the bot's id to the this object. This will help us later. Now, replace the respondTo function with this:

respondTo(opts, callback, start) {
  if (!this.id) {
    // if this.id doesn't exist, wait for slack to connect
    // before continuing
    this.slack.on(CLIENT_EVENTS.RTM.RTM_CONNECTION_OPENED, () => {
      createRegex(this.id, this.keywords);
    });  
  } else {
    createRegex(this.id, this.keywords);
  }
      
  function...
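Slack delivers a mention inside the message text as the user's ID wrapped in <@ and >. The following standalone sketch shows one way to detect a leading mention and strip it from the text. The createMentionRegex and stripMention names are hypothetical and are not the createRegex function from the Bot class; the sketch also assumes the ID contains only letters and digits, so it needs no regular expression escaping.

```javascript
'use strict';

// Build a regular expression that matches a message starting with a
// mention of the given user id, e.g. "<@U024BE7LH> is it cold?"
function createMentionRegex(botId) {
  return new RegExp(`^<@${botId}>:?\\s*`);
}

// Return the message text without the leading mention, or null if the
// bot was not mentioned at the start of the message
function stripMention(text, botId) {
  const mention = createMentionRegex(botId);
  return mention.test(text) ? text.replace(mention, '') : null;
}

const botId = 'U024BE7LH'; // hypothetical id, for illustration only

console.log(stripMention('<@U024BE7LH> is it cold in Amsterdam', botId));
// is it cold in Amsterdam
console.log(stripMention('is it cold in Amsterdam', botId));
// null
```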

Classifiers

Classification is the process of training your bot to recognize a phrase or pattern of words and to associate them with an identifier. To do this, we use a classification system built into natural. Let's start with a small example:

const classifier = new natural.BayesClassifier();

classifier.addDocument('is it hot', ['temperature', 'question','hot']);
classifier.addDocument('is it cold', ['temperature', 'question', 'cold']);
classifier.addDocument('will it rain today', ['conditions', 'question', 'rain']);
classifier.addDocument('is it drizzling', ['conditions', 'question', 'rain']);

classifier.train();


console.log(classifier.classify('will it drizzle today'));
console.log(classifier.classify('will it be cold out'));

The first log prints:

conditions,question,rain

The second log prints:

temperature,question,cold

The classifier stems the string to be classified first, and then calculates which of the trained phrases it is the most similar to by assigning a weighting to each possibility...
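For readers curious about how such a weighting can be calculated, here is a minimal sketch of a naive Bayes text classifier in plain JavaScript. It is an illustration under simplifying assumptions (no stemming, whitespace tokenization, add-one smoothing, single string labels) and is not the implementation inside natural; the TinyBayes class name is made up for this example.

```javascript
'use strict';

// A minimal multinomial naive Bayes text classifier with add-one
// (Laplace) smoothing. No stemming is performed here.
class TinyBayes {
  constructor() {
    // label -> { docs, total, counts: Map(word -> occurrences) }
    this.labels = new Map();
    this.vocabulary = new Set();
    this.totalDocs = 0;
  }

  addDocument(text, label) {
    if (!this.labels.has(label)) {
      this.labels.set(label, { docs: 0, total: 0, counts: new Map() });
    }
    const entry = this.labels.get(label);
    entry.docs += 1;
    this.totalDocs += 1;
    for (const word of text.toLowerCase().split(/\s+/)) {
      entry.counts.set(word, (entry.counts.get(word) || 0) + 1);
      entry.total += 1;
      this.vocabulary.add(word);
    }
  }

  classify(text) {
    let best = null;
    let bestScore = -Infinity;
    for (const [label, entry] of this.labels) {
      // log P(label) + sum of log P(word | label)
      let score = Math.log(entry.docs / this.totalDocs);
      for (const word of text.toLowerCase().split(/\s+/)) {
        const count = entry.counts.get(word) || 0;
        score += Math.log((count + 1) / (entry.total + this.vocabulary.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

const tiny = new TinyBayes();
tiny.addDocument('is it hot', 'temperature');
tiny.addDocument('is it cold', 'temperature');
tiny.addDocument('will it rain today', 'conditions');
tiny.addDocument('is it drizzling', 'conditions');

console.log(tiny.classify('will it rain'));   // conditions
console.log(tiny.classify('is it cold out')); // temperature
```

Working with logarithms avoids multiplying many small probabilities together, and the smoothing keeps an unseen word such as out from reducing a label's score to zero.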

Using trained classifiers

An example classifier.json file that contains training data for weather is included with this book. For the rest of this chapter, we will assume that the file is present and that we are loading it in via the preceding method.

Replace your respondTo method call with the following snippet:

let settings = {};

bot.respondTo({ mention: true }, (message, channel, user) => {
  let args = getArgs(message.text);

  if (args[0] === 'set') {
    let place = args.slice(1).join(' ');
    settings[user.name] = place;
    
    bot.send(`Okay ${user.name}, I've set ${place} as your default location`, channel);
    return;
  }

  if (args.indexOf('in') < 0 && !settings[user.name]) {
    bot.send(`Looks like you didn\'t specify a place name, you can set a city by sending \`@weatherbot set [city name]\` or by sending \`@weatherbot ${args.join(' ')} in [city name]\``, channel);
    return;
  }

  // The city is usually preceded by the word 'in'  
  let city = args.indexOf...
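The idea of reading the place name from the words that follow in can be sketched in isolation. The extractCity helper below is hypothetical and is not the remainder of the snippet above; it assumes the arguments have already been split into an array of words, and it falls back to a stored default when no place is given.

```javascript
'use strict';

// Return the place name that follows the first 'in' in the argument
// list, or the fallback when no place name is present
function extractCity(args, fallback) {
  const index = args.indexOf('in');
  if (index < 0 || index === args.length - 1) {
    return fallback;
  }
  // Join the remaining words and drop any trailing question marks
  return args.slice(index + 1).join(' ').replace(/\?+$/, '');
}

console.log(extractCity(['is', 'it', 'cold', 'in', 'New', 'York?'], 'Amsterdam'));
// New York
console.log(extractCity(['is', 'it', 'cold'], 'Amsterdam'));
// Amsterdam
```

Joining everything after in keeps multi-word place names intact, at the cost of also swallowing any extra words the user adds after the city.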

Natural language generation

Natural language generation can be defined as giving a bot's response a conversational tone. The purpose here is not to hide the fact that the bot is not human, but to make the information easier to digest.

The flavorText variable from the previous snippet is an attempt to make the bot's responses sound more natural; in addition, it is a useful technique to cheat our way out of performing more complex processing to reach a conversational tone in our response.

Take the following example:

Weatherbot's politician-like response

Notice how the first weather query is asking whether it's cold or not. Weatherbot gets around giving a yes or no answer by making a generic statement on the temperature to every question.

This might seem like a cheat, but it reflects an important aspect of NLP: the more complex the generated language, the more likely it is to go wrong. Generic answers are better than outright wrong answers.

This particular problem could be solved by adding more...
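A generic remark of this kind can be produced with very little code. The sketch below is an assumption-laden illustration: the getFlavorText and describeTemperature names, the temperature thresholds, and the wording are all invented for this example and are not the book's flavorText logic.

```javascript
'use strict';

// Pick a generic remark for a temperature in degrees Celsius.
// The thresholds and wording are assumptions for this example.
function getFlavorText(temperature) {
  if (temperature < 5) return 'brrr, wrap up warm';
  if (temperature < 15) return 'a jacket would not hurt';
  if (temperature < 25) return 'quite pleasant';
  return 'time to find some shade';
}

// Combine the raw figure with the remark in one sentence
function describeTemperature(city, temperature) {
  const rounded = Math.round(temperature);
  return `It is currently ${rounded} degrees in ${city}: ${getFlavorText(temperature)}.`;
}

console.log(describeTemperature('Amsterdam', 3.4));
// It is currently 3 degrees in Amsterdam: brrr, wrap up warm.
```

Because the sentence always states the temperature, it stays correct whether the user asked if it is cold, hot, or something else entirely.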

When should we use natural language generation?

The answer is: sparingly. Consider Slackbot, Slack's own in-house bot used for setting up new users, amongst other things. Here's the first thing Slackbot says to a new user:

The humble bot

Immediately, the bot's restrictions are outlined and no attempt is made to hide the fact that it is not human. Natural language generation is at its best when used to transform data-intensive constructs such as JSON objects into easy-to-comprehend phrases.
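As a small illustration of turning a data object into a readable phrase, consider the following sketch. The object shape, the toSentenceList helper, and the describeWeather function are all hypothetical and chosen for this example; a real API response would need to be mapped onto this shape first.

```javascript
'use strict';

// Join a list of phrases the way a person would:
// 'a', 'a and b', 'a, b, and c'
function toSentenceList(items) {
  if (items.length <= 1) return items.join('');
  if (items.length === 2) return items.join(' and ');
  return `${items.slice(0, -1).join(', ')}, and ${items[items.length - 1]}`;
}

// Turn a weather object into a readable phrase. The object shape is
// hypothetical and chosen for this example.
function describeWeather(data) {
  const conditions = toSentenceList(data.conditions);
  const temperature = Math.round(data.temperature);
  return `${data.city}: ${conditions}, ${temperature} degrees, wind at ${data.windSpeed} m/s.`;
}

const sample = {
  city: 'Amsterdam',
  temperature: 11.6,
  windSpeed: 4,
  conditions: ['light rain', 'mist']
};

console.log(describeWeather(sample));
// Amsterdam: light rain and mist, 12 degrees, wind at 4 m/s.
```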

The Turing Test is a famous test developed in 1950 by Alan Turing to assess a machine's ability to make itself indistinguishable from a human in a text-only sense. Like Slackbot, you should not strive to make your bot Turing Test complete. Instead, focus on how your bot can be the most useful and use natural language generation to make your bot as easy to use as possible.

The uncanny valley

The uncanny valley is a term used to describe systems that act and sound like humans, but are somehow slightly off. This slight discrepancy actually leads to the bot feeling a lot more unnatural, which is the exact opposite of what we are trying to accomplish with natural language generation. We should therefore avoid trying to make the bot perfect in its natural language responses; the more human-like we try to make a bot sound, the higher the chances of finding ourselves in the uncanny valley.

Instead, we should focus on making our bots useful and easy to use, rather than on making their responses sound natural. A good principle to follow is to build your bot to be as smart as a puppy, a concept championed by Matt Jones (http://berglondon.com/blog/2010/09/04/b-a-s-a-a-p/):

"Making smart things that don't try to be too smart and fail, and indeed, by design, make endearing failures in their attempts to learn and improve. Like puppies."

Let's expand our weatherbot to make the generated response...

Summary

In this chapter, we discussed what NLP is and how it can be leveraged to make a bot seem far more complex than it really is. By using these techniques, natural language can be read, processed, and responded to in equally natural tones. We also covered the limitations of NLP and learned how to differentiate between good and bad uses of it.

In the next chapter, we will explore the creation of web-based bots, which can interact with Slack using webhooks and slash commands.